Wilson disease tissue classification and characterization using seven artificial intelligence models embedded with 3D optimization paradigm on a weak training brain magnetic resonance imaging datasets: a supercomputer application

Agarwal, Mohit; Saba, Luca; Gupta, Suneet K.; Johri, Amer M.; Khanna, Narendra N.; Mavrogeni, Sophie; Laird, John R.; Pareek, Gyan; Miner, Martin; Sfikakis, Petros P.; Protogerou, Athanasios; Sharma, Aditya M.; Viswanathan, Vijay; Kitas, George D.; Nicolaides, Andrew; Suri, Jasjit S.

doi:10.1007/s11517-021-02322-0


Original Article
Published: 05 February 2021

Volume 59, pages 511–533, (2021)

Medical & Biological Engineering & Computing


Mohit Agarwal¹,
Luca Saba²,
Suneet K. Gupta¹,
Amer M. Johri³,
Narendra N. Khanna⁴,
Sophie Mavrogeni⁵,
John R. Laird⁶,
Gyan Pareek⁷,
Martin Miner⁸,
Petros P. Sfikakis⁹,
Athanasios Protogerou¹⁰,
Aditya M. Sharma¹¹,
Vijay Viswanathan¹²,
George D. Kitas¹³,
Andrew Nicolaides¹⁴ &
Jasjit S. Suri¹⁵


Abstract

Wilson’s disease (WD) is caused by copper accumulation in the brain and liver, and if not treated early, can lead to severe disability and death. WD shows white matter hyperintensity (WMH) in brain magnetic resonance imaging (MRI) scans, but the diagnosis is challenging due to (i) subtle intensity changes and (ii) weak training MRI data when using artificial intelligence (AI). The objective of this study was to design and validate seven types of high-performing AI-based computer-aided diagnosis (CADx) systems consisting of 3D optimized classification and characterization of WD against controls. We propose a “conventional deep convolution neural network” (cDCNN) and an “improved DCNN” (iDCNN) in which the rectified linear unit (ReLU) activation function was modified to make it differentiable at zero. Three-dimensional optimization was achieved by recording accuracy while varying the number of CNN layers and the augmentation fold. WD was characterized using (i) CNN-based feature map strength and (ii) Bispectrum strengths of pixels having higher probabilities of WD. We further computed (a) the area under the curve (AUC), (b) the diagnostic odds ratio (DOR), (c) reliability, (d) stability, and (e) benchmarking. Optimal results were achieved using 9 CNN layers with 4-fold augmentation. iDCNN yielded superior performance compared to cDCNN, with accuracy and AUC of 98.28 ± 1.55% and 0.99 (p < 0.0001) for iDCNN versus 97.19 ± 2.53% and 0.984 (p < 0.0001) for cDCNN. The DOR of iDCNN was fourfold that of cDCNN. iDCNN also outperformed (a) the transfer learning–based “Inception V3” paradigm by 11.92% and (b) four types of “conventional machine learning–based systems” (k-NN, decision tree, support vector machine, and random forest) by 55.13%, 28.36%, 15.35%, and 14.11%, respectively. The AI-based systems can potentially be useful in early WD diagnosis.


1 Introduction

Wilson’s disease (WD) is caused by excessive copper accumulation in the liver and brain [1]. The National Organization for Rare Disorders (NORD) reported that 1 in every 30,000 to 40,000 people worldwide is affected by WD [1]. It is estimated that there were nearly 600 cases of WD in the USA in 2007¹ and that 9000 people in the USA will be affected by WD by the end of 2020.

WD causes severe disability and death, if not treated early. The present diagnosis of WD uses anatomical tests, but they are not reliable [2, 3]. MRI has shown promising signs for diagnosing WD since it shows white matter hyperintensity (WMH) in the brain [4,5,6,7]. However, due to the volumetric nature of the MRI and subtle nature of the hyperintensity between WD and controls, human bias and interobserver variability may complicate diagnosis. To overcome these challenges, computer-aided diagnosis (CADx) methods can play a vital role in improving the classification and characterization of WD [8, 9].

Artificial intelligence (AI) is a branch of computer science that can handle classification effectively as it can map nonlinearity between input variations and disease severity [10]. AI methods can be broadly divided into two supervised learning categories namely machine learning (ML) and deep learning (DL). Machine learning methods [11,12,13,14,15,16] like decision tree (DT), k-nearest neighbor (k-NN), support vector machine (SVM), and random forest (RF) can be applied for classification, but they use manually identified features and can yield low performance. In contrast, deep learning (DL) [17,18,19,20] methods are more reliable because they can automatically learn features using hidden layers within a dataset. One such example is a deep convolution neural network (DCNN) that has been well-adopted by industry for image classification [11, 19].

Furthermore, DL systems can augment the size of the input data to ensure a balance between classes leading to stronger learning protocols.

DL systems have several parameters, in particular the number of layers in the DCNN architecture. These layers are typically (i) convolution layers, (ii) max-pooling layers, and (iii) the combination of dense layers that constitutes the neural network, along with the softmax layer. Since the low-level and high-level MRI features of WD are extracted by the sets of convolution and max-pooling layers, it is important to determine how many such sets are needed to optimize the DCNN system [21]. Furthermore, since a DCNN is sensitive to the training data size, it is necessary to understand what the total size of the input training data should be. This training data size can be altered by the “augmentation procedure” that is well-developed in the AI industry [22]. We therefore need to optimize the DL system over the combination of “augmentation folds” and “number of layers” of the DCNN for the best classification and characterization of WD. Thus, WD classification depends directly on (a) the number of DCNN layers and (b) the augmentation folds of the training data. Such an optimization paradigm is being attempted for the first time for the WD application. Furthermore, because the conventional DCNN (cDCNN) uses the rectified linear unit (ReLU) as an activation function, which is not continuously differentiable at the origin, we have designed an improved DCNN (iDCNN) whose activation is smooth near the origin, thereby improving its performance. Finally, we validate the hypothesis that there is WMH in WD MRI images. The system uses two novel characterization strategies that take advantage of the CNN layers and the Bispectrum signal processing framework [12, 19, 23]. This is another unique feature of our paradigm. Furthermore, we benchmark our two DCNN systems against the transfer learning–based “Inception V3” [24] framework and four types of machine learning systems. Overall, we designed, applied, and compared seven kinds of AI approaches for the diagnosis of WD.

As part of performance evaluation, we conducted several new experiments: (i) We computed the diagnostic odds ratio (DOR) and correlated it with classification accuracy, which had not been attempted previously. (ii) A power study was conducted to estimate the optimum dataset size. (iii) Since all implementations were run on a supercomputer having 8 GPUs, a timing analysis was performed to demonstrate the computational power of our design. (iv) To find the best operating point of the DL system in terms of training data size, we optimized the CNN models by computing classification accuracy while varying the percentage of training data.

This study is the first study of its kind having the following novel approaches:

(i) Design of 3D optimization of two deep learning systems by classification accuracy with (a) changing DL layers and (b) folds of augmentation. The DL layers were varied between 5, 7, 9, and 11, and each design was tested with different dataset sizes created using the augmentation protocol. Augmentation was utilized to increase the dataset size by 2×, 3×, 4×, and 5× folds to create 5 sets of data including the original size. Three-dimensional optimization was carried out among these 4 DL designs and 5 augmented dataset sizes, creating 20 combinations, which could then be used for choosing the best DCNN design for optimized classification. The final estimate is the number of layers and the augmentation fold that give the highest accuracy.
(ii) Design of inter-comparison benchmarking between the three kinds of DL systems and four types of ML classification systems.
(iii) Hypothesis validation by characterization of WD vs. controls using a combination of AI-based feature map strength (FMS) and the Bispectrum signal processing approach.
(iv) Performance evaluation by computing (a) DOR, (b) generalization of DL paradigms, (c) reliability, (d) stability, and (e) time analysis of supercomputing.

This study is organized as follows: Section 2 details the ongoing research in the field of classification in MR neuroimaging. Section 3 explains the methods and materials used in this study. Section 4 presents the classification results, and Section 5 discusses the characterization of WD. Section 6 shows the performance evaluation, and Section 7 presents the discussion on the novel techniques used in this paper. The paper concludes with Section 8.

2 Background literature

The role of AI in WD classification has not yet taken center stage. Few studies use AI on this topic compared with other diseases such as Alzheimer’s, Parkinson’s, or cancer, and with neuroimaging in general. One therefore cannot ignore studies in the WD area that are not AI-based. Our recent work is the culmination of two decades of research, where biomarkers like serum or urine were used for diagnosis or segregation of patients having WD, primarily based on the threshold ranges of these biomarkers [1, 2, 25]. These methods consisted of 24-h urinary and serum laboratory tests for the identification of WD. Another class of methods for diagnosis of WD consisted of eye examination for Kayser-Fleischer rings and gene mutations [25]. A more recent method used all four types of biomarkers (serum, eye, urine, and brain imaging) to confirm WD [1].

Other frameworks for diagnosis of WD have been studied, such as laboratory-based (blood) tests or genetic mechanisms for WD classification. Vrabelova et al. [26] described the utilization of blood tests involving DNA analysis for the ATP7B gene mutation study. Rosencrantz and Schilsky [3] used ATP7B mutation analysis along with Kayser-Fleischer rings in the eyes and elevated copper levels in urine. These tests were better at diagnosing the disease than the serum/urine biomarker tests.

With the advent of MRI, several studies turned towards neuroimaging-based classification approaches; however, they remained manual in nature. WMH has recently been explored in many diseases [27,28,29]. Kim et al. [30] analyzed hyperintensity in T1-weighted (T1W) and T2-weighted (T2W) MRI scans of suspected patients and found WMH in different parts of the brain such as the globus pallidus, thalamus, midbrain, and pons.

Recently, the AI community has started implementing this technology for characterization and classification of WD. In 2011, imaging took a leap towards fMRI for WD. Hu et al. [31] studied changes in the amplitude of low-frequency fluctuations (ALFF) while conducting fMRI on WD patients. Resting-state functional magnetic resonance imaging (fMRI) has shown promise and was employed to measure ALFF in different parts of the brain [32, 33]. Furthermore, ML started to penetrate the imaging domain for WD classification [34, 35]. Kaden et al. [34] demonstrated the use of support vector machine (SVM) and parameterized generalized learning vector quantization (PGLVQ) for WD classification with accuracies of 87.5% and 90.1%, respectively. Jing et al. [35] used independent component analysis (ICA) with functional networks and SVM for WD classification and obtained an AUC of 0.94 and an accuracy of 89.4% (specificity: 90.0%, sensitivity: 89.3%) with aberrant functional networks (FN). None of the above studies demonstrated automated approaches for WD classification and characterization. Our study uses a 3D optimized deep learning–based paradigm for classification of WD against controls and further extends the DL models combined with signal processing for tissue characterization. The CADx system includes three kinds of DL and four kinds of ML systems for WD classification and offers a novel approach to the diagnosis of WD.

3 Methodology

3.1 Patient demographics, acquisition, and data augmentation

T2W-TSE MRI scans from a cohort of 46 patients (average age: 40.73 ± 11.3 years, equal M/F ratio), acquired between 2011 and 2015, were analyzed (approval was obtained from the Institutional Ethics Committee, Azienda Ospedaliero Universitaria (A.O.U.), Cagliari, Italy).

Imaging examinations were performed using a 1.5-T superconducting magnet (Philips, Best, The Netherlands) with a head coil according to a standardized protocol. In each subject, the conventional diffusion-weighted imaging (DWI) was performed with single-shot spin-echo with 2 diffusion-sensitivity values of 0 and 1000 s/mm² along the transverse axis. As part of our general brain protocol, axial and sagittal 2D FLAIR images (10000/140/2200 ms for TR/TE/TI; matrix: 512 × 512; FOV: 240 × 240 mm²; section thickness: 5 mm) were acquired. In addition to FLAIR and DWI sequences, axial spin-echo T1-weighted images (500–600/15/2 for TR/TE/excitations) and fast spin-echo T2-weighted images (2200–3200/80–120/1,2 for TR/TE/excitations; turbo factor, 2) were also obtained with the same section thickness.

3.1.1 Data augmentation

The initial MRI data were manually classified by our radiological team and then prepared for further processing. Because the cohort consisted of 37 controls and 9 WD patients, we had an unequal number of images in the two classes. As each patient MRI study had 12–13 slices, this resulted in 458 control images and 115 WD images. To handle this unbalanced dataset, the augmentation protocol using the Python “Augmentor” API was applied to the WD class, generating 343 more WD images and bringing the WD class to 458 images. Because a deep CNN (DCNN) needs a large number of images for proper training and performance, we increased the number of images from 458 by 2×, 3×, 4×, and 5× folds in both classes, and the system was then trained and tested to find which augmented set yielded optimal results. To avoid unrealistic brain MRI scans during the augmentation protocol, we followed the accepted protocol of rotating each image randomly by −10 to 10°. This excludes transformations such as horizontal or vertical flipping or rotation by larger angles.
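The image counts in the augmentation protocol can be checked with a short sketch. It is only an illustration of the arithmetic described above: the function names are hypothetical, the actual image rotation (done in the study with the Augmentor package) is not reproduced, and the random seed is an arbitrary choice.

```python
import random

def augmentation_plan(n_control, n_wd, fold):
    """Return how many synthetic images each class needs so that both
    classes reach fold * n_control images (fold=1 only balances classes)."""
    target = fold * n_control
    return {"control": target - n_control, "wd": target - n_wd, "target": target}

def random_rotation_angle(rng, max_degrees=10.0):
    """Draw one rotation angle uniformly from [-max_degrees, +max_degrees]."""
    return rng.uniform(-max_degrees, max_degrees)

rng = random.Random(0)  # fixed seed, an assumption made only for reproducibility
for fold in (1, 2, 3, 4, 5):
    # 458 control and 115 WD images are the counts reported in the text
    print(fold, augmentation_plan(n_control=458, n_wd=115, fold=fold))

# Example angles that a rotation-only augmentation step could use
angles = [random_rotation_angle(rng) for _ in range(5)]
```

With fold = 1 the plan asks for 343 extra WD images and no extra control images, which matches the balancing step described in the text.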

3.1.2 Preprocessing: skull and background removal

Preprocessing is an essential component of the classification process. It helps extract the region-of-interest (ROI) from the MRI images. There are two important steps: (i) removal of the skull region and (ii) removal of the black background to prepare the segmented ROI. As well-accepted and published standard packages are available, we used BrainSuite [36] combined with volBrain [37] to segment the images and remove the background. BrainSuite was used to read the DICOM images, convert them to nii files (the nii file type is primarily associated with the NIfTI-1 Data Format by the Neuroimaging Informatics Technology Initiative), and obtain the grayscale images of the brain with the skull. volBrain creates a mask of the brain, which can be used to remove the skull from the original MRI grayscale images. These were then further segmented and morphologically cleaned to remove the background. A sample pair of images from a patient with WD and a control is shown in Fig. 1. The WD segmented brain images had brighter regions (higher WMH) in the convoluted zones of the brain (as shown inside the yellow dotted rectangles in Fig. 1) compared to control images.

3.2 Local architecture: deep CNN configurations

Our group has developed several CNN architectures covering a wide range of applications, namely, radiological imaging [38], stroke [19, 21, 39,40,41], liver [42], and cancer [43]. We extend this work to Wilson disease for the first time, and this is the first study of its kind that uses deep learning. The DCNN architecture used is shown in Fig. 2. It is composed of three convolution layers, each followed by a max-pooling layer, for a total of six layers. A flattening layer follows these six layers and converts the 2D signal to 1D. Next is a hidden dense layer consisting of 128 nodes. The final output is a softmax layer with two outputs corresponding to WD or control. This design with fewer layers was chosen because the number of classes was only two and the DCNN was able to reach the desired accuracy. The current configuration therefore needs less storage space and inference time than pre-trained CNN models. We adopted the ReLU function for the convolution and dense layers since it converges to the solution faster than the sigmoid or tanh activation functions [44]. Because augmentation was applied, we consider several layer options corresponding to different DCNN configurations, shown in Table 1, which lists 5 types of DCNN combinations consisting of different convolution layers, max-pool layers, and dense layers. It is therefore necessary to perform 3D optimization over the accuracy, the CNN layers, and the folds of augmentation. The block diagram of the DL-based classification and characterization system is shown in Fig. 3. As seen in Fig. 3, the MRI scans are preprocessed and split into training and testing sets. Training images are used, along with gold-standard labels, to train the deep learning model, generating the trained weights. These weights are then applied to the test patients to predict their class labels. Bispectrum, the DL model’s mean feature strength, and the histogram were used for the characterization process to yield mean feature strength (MFS) and Bispectrum (B) values.
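To make the layer stack concrete, the following sketch traces how the spatial size of a feature map would shrink through three convolution + max-pooling blocks, a flattening layer, a 128-node dense layer, and a two-way softmax. The input size (128 × 128), kernel size (3 × 3, no padding), pooling size (2 × 2), and filter count (32) are all assumptions chosen for illustration; the text does not state them.

```python
def dcnn_shapes(input_size=128, n_blocks=3, kernel=3, pool=2, filters=32):
    """Trace feature-map sizes through conv + max-pool blocks.

    All default values are hypothetical; only the layer ordering
    (conv, pool) x 3 -> flatten -> dense(128) -> softmax(2) follows the text.
    """
    size = input_size
    trace = []
    for block in range(1, n_blocks + 1):
        size = size - kernel + 1          # 'valid' convolution, stride 1
        trace.append((f"conv{block}", size))
        size = size // pool               # non-overlapping max-pooling
        trace.append((f"pool{block}", size))
    trace.append(("flatten", size * size * filters))
    trace.append(("dense", 128))          # hidden dense layer from the text
    trace.append(("softmax", 2))          # WD vs. control
    return trace

for name, width in dcnn_shapes():
    print(name, width)
```

Changing `n_blocks` gives a rough feel for how deeper configurations reduce the size of the flattened vector, which is one reason the number of convolution and pooling sets is treated as a tunable parameter.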

Table 1 Five types of cDCNN models consisting of different CNN layers


The conventional ReLU is defined as σ = max(0, x), where σ is the activation value and x is the input to the ReLU function. This equation was modified to σ = (max(0, x)) ^ 1.00001. Note that the derivative of the modified function is 0 at x = 0, whereas the conventional ReLU is not differentiable at the origin. Since loss minimization in the DCNN uses gradient descent, which requires derivatives with respect to the trainable variables, it is necessary to make ReLU continuously differentiable at x = 0. The equation for the loss is given in formula 13 in the Appendix. In the improved deep CNN (iDCNN), this modified activation function was used in all the convolution layers and the dense layer for better performance.
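The two activation functions can be compared numerically with a minimal sketch. The function names are hypothetical, and the gradient shown is the analytic derivative of the modified formula, not code taken from the study.

```python
def relu(x):
    """Conventional ReLU: max(0, x)."""
    return max(0.0, x)

def improved_relu(x, power=1.00001):
    """Modified ReLU from the text: (max(0, x)) ** 1.00001."""
    return max(0.0, x) ** power

def improved_relu_grad(x, power=1.00001):
    """Analytic derivative of improved_relu.

    For x > 0 it is power * x ** (power - 1); for x <= 0 it is 0,
    and the right-hand limit at x = 0 is also 0.
    """
    if x <= 0.0:
        return 0.0
    return power * x ** (power - 1.0)

for x in (-1.0, 0.0, 1e-6, 0.5, 2.0):
    print(x, relu(x), improved_relu(x), improved_relu_grad(x))
```

Because the exponent is so close to 1, the outputs of the two functions are nearly identical for ordinary inputs, and the gradient of the modified function is already close to 1 for very small positive x; the difference is confined to the immediate neighborhood of the origin.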

3.2.1 Transfer learning

As part of the overall CADx system, we benchmark our DL systems against the “transfer learning–based Inception V3” [24] pre-trained CNN and four machine learning paradigms, namely k-NN, DT, SVM, and RF. Inception V3 is a 42-layered deep model consisting of 11 inception modules (each comprising multiple convolution layers and max-pooling filters), followed by three fully connected layers and a softmax activation layer. It was originally designed for the 1000-class ImageNet dataset of the ImageNet Large-Scale Visual Recognition Competition (ILSVRC). For this study, the model was customized to a two-class problem and trained further after loading the pre-trained ImageNet weights. Inception V3 was designed to reduce the overall number of parameters in order to reduce network size and inference time. The reduction in parameters is achieved by factorizing convolutions. For example, a 5 × 5 filter convolution can be replaced by two stacked 3 × 3 filter convolutions. The number of parameters then drops from 5 × 5 = 25 to 3 × 3 + 3 × 3 = 18, a 28% reduction. With fewer parameters, the model is less prone to overfitting, which can also increase accuracy.
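The parameter saving from factorizing a convolution can be verified with a few lines. This is only the arithmetic from the paragraph above for a single input and output channel, ignoring bias terms; the helper name is hypothetical.

```python
def kernel_weights(k):
    """Number of weights in one k x k filter (single channel, no bias)."""
    return k * k

single = kernel_weights(5)        # one 5 x 5 filter
stacked = 2 * kernel_weights(3)   # two stacked 3 x 3 filters
reduction = (single - stacked) / single

print(single, stacked, f"{reduction:.0%}")
```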

3.2.2 Machine learning

Our group has been very active in machine learning (ML) in several tissue characterization and classification applications, namely, diabetes [45], plaque [46,47,48,49,50], thyroid cancer [51, 52], ovarian cancer [53, 54], prostate cancer [23], liver cancer [42, 55, 56], lung cancer [57], skin cancer [58], bladder cancer [59], heart [60], cardiovascular disease risk [61,62,63,64], coronary artery disease [65], stroke [50, 66, 67], arrhythmia [68], and gene expression characterization [69]. We adapted a similar paradigm to the current Wilson disease application. Different feature sets were used, consisting of Haralick, Hu moments, and LBP feature extraction frameworks. In Table 2, ML4 (random forest) shows the highest accuracy with the feature combination FC3, which consisted of Haralick features and Hu moments. A brief description of these features is given below:

Haralick features: These are based on the texture of the image and generated using Gray Level Co-occurrence Matrix (GLCM) using one of energy, entropy, or homogeneity of these matrix element values.
Hu moments: These are features of the object in the image and generated using centralized moments.
LBP features: LBP (Local Binary Pattern) is also a powerful texture-based feature calculated by comparing a pixel with 8 neighboring pixels.

Table 2 Combination of feature types for four types of ML systems (bold cell indicates maximum accuracy obtained with FC3 feature combination and random forest ML technique)


The equations for energy, entropy, and homogeneity used in Haralick features are given in Eqs. 1–3.

$$ \mathrm{Energy}={\sum}_i\ {\sum}_j{P}_d{\left(i,j\right)}^2 $$

(1)

$$ \mathrm{Entropy}=-{\sum}_i\ {\sum}_j{P}_d\left(i,j\right)\ \log \left({P}_d\left(i,j\right)\right) $$

(2)

$$ \mathrm{Homogeneity}={\sum}_i\ {\sum}_j\frac{1}{1+{\left(i-j\right)}^2}\ {P}_d\left(i,j\right) $$

(3)
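As an illustration of how these three texture measures are obtained from a gray-level co-occurrence matrix, the sketch below builds a normalized GLCM for one pixel offset and evaluates energy, entropy, and homogeneity. It is a pure-Python toy, not the feature extractor used in the study: the offset (one pixel to the right), the natural logarithm, and the tiny 2 × 2 test image are assumptions, and homogeneity uses the squared difference (i − j)², the usual inverse-difference-moment weighting.

```python
import math

def glcm(image, levels, dr=0, dc=1):
    """Normalized co-occurrence matrix P_d for the offset d = (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]

def haralick(p):
    """Energy, entropy, and homogeneity of a normalized GLCM."""
    energy = sum(v * v for row in p for v in row)
    # Zero entries contribute nothing to the entropy and are skipped
    entropy = -sum(v * math.log(v) for row in p for v in row if v > 0)
    homogeneity = sum(v / (1 + (i - j) ** 2)
                      for i, row in enumerate(p) for j, v in enumerate(row))
    return energy, entropy, homogeneity

toy_image = [[0, 0],
             [1, 1]]  # hypothetical 2-level image
energy, entropy, homogeneity = haralick(glcm(toy_image, levels=2))
print(energy, entropy, homogeneity)
```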

The equation for Hu moments is given by Eq. 4.

$$ {\mu}_{pq}=\sum \limits_x\ \sum \limits_y{\left(x-\overline{x}\right)}^p{\left(y-\overline{y}\right)}^qf\left(x,y\right) $$

(4)

where μ_pq are the centralized moments; x, y are pixel coordinates and f(x, y) are pixel intensities at these coordinates. Here p = 0, 1, 2, 3 and q = 0, 1, 2, 3.
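The central moments of Eq. 4 can be computed directly from the pixel grid, as in the sketch below. It is a minimal illustration under the assumption that x indexes columns and y indexes rows; the function name and the toy image are hypothetical, and the further step of combining normalized central moments into the seven Hu invariants is not shown.

```python
def central_moment(image, p, q):
    """Central moment mu_pq of a 2D intensity image given as a list of rows."""
    m00 = sum(sum(row) for row in image)
    # Intensity-weighted centroid (x: column index, y: row index)
    x_bar = sum(x * v for row in image for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(image) for v in row) / m00
    return sum((x - x_bar) ** p * (y - y_bar) ** q * v
               for y, row in enumerate(image)
               for x, v in enumerate(row))

toy_image = [[1, 1],
             [1, 1]]  # hypothetical uniform 2 x 2 patch
for p, q in [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1)]:
    print((p, q), central_moment(toy_image, p, q))
```

By construction the first-order central moments vanish, which is what makes the higher-order moments independent of the object's position in the image.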

The block diagram of the ML-based classification system is shown in Fig. 4. As seen in Fig. 4, the preprocessing block processes the acquired MRI scans to yield the segmented brain region. This was implemented using the BrainSuite and volBrain software, which give a clean mask for segmenting the grayscale images. The engineered features were extracted using a combination of Haralick, Hu moments, and LBP feature–based methods. ML-based models (k-NN, DT, SVM, or RF) already trained on labeled segmented images are then used in the prediction process to classify the test MRI input. The final output is a binary class label, either WD or control.

3.3 Performance evaluation protocol

To evaluate the performance of the DCNN systems, different tests were adopted: (a) the Wilson Disease Segregation Index (WDSI) to estimate the feature strength of different AI techniques between the two classes; (b) the diagnostic odds ratio (DOR) for DCNN and ML methods; (c) power analysis to find the optimum dataset size; (d) timing analysis of supercomputer vs. local computer; (e) optimization of the CNN model with the percentage of training data, to find the best operating point; and (f) validation of the DL models against a well-accepted and published biometric facial dataset.

3.3.1 Wilson Disease Segregation Index of WD against control

We compute the WDSI, which is an indicator for the class separation between the controls and Wilson disease, expressed as percentage, and is mathematically given by Eq. 5:

$$ \mathrm{WDSI}=\left(\frac{\mid {\mu}_{WD}-{\mu}_C\mid }{\mu_C}\right)\times 100 $$

(5)

where μ_C is the mean feature strength of control class and μ_WD is the mean feature strength of WD class.
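Eq. 5 translates directly into a small helper. The function name and the example values are hypothetical and serve only to show the calculation.

```python
def wdsi(mu_wd, mu_c):
    """Wilson Disease Segregation Index (Eq. 5), in percent."""
    if mu_c == 0:
        raise ValueError("mean feature strength of the control class must be non-zero")
    return abs(mu_wd - mu_c) / mu_c * 100.0

# Made-up mean feature strengths, chosen only to illustrate the formula
print(wdsi(mu_wd=120.0, mu_c=100.0))
```

Because of the absolute value, the index measures the size of the separation between the two class means relative to the control mean, not its direction.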

4 Results

This section primarily demonstrates three optimization experiments on DCNN and four comparative experiments between DCNN and ML systems. The first three experiments show the 3D optimization of DCNN layers during the augmentation process (DCNN9*), the effect of training on DCNN performance, and the optimal sample size selection for the DCNN to be generalized. The second batch focuses on benchmarking of the DL systems against four ML systems, AUCs of DL vs. ML systems, and the segregation index of WD vs. controls.

4.1 Three-dimensional optimization of DCNN layers during augmentation process (DCNN9*)

The objective of this protocol is to find the best combination of CNN layers and augmentation. There are 20 “DCNN + Augm” combinations (5 types of CNN layers and 4 types of augmentations), and each combination is run under the K10 protocol (10 train/test combinations per protocol), giving a total of 200 different runs (or jobs). We therefore take advantage of supercomputer power to run the five types of DCNN designs over the four kinds of augmentations. Since DCNN accuracy varies with the number of hidden convolution layers, this optimization run is vital (see Table 1). The results can be seen in the 3D surface plot in Fig. 5 and the 3D bar graphs in Fig. 14 in the Appendix, and the corresponding values for cDCNN are shown in Table 3. As seen in the 3D surface plot, with an increase in the CNN layers (down the rows R1 to R5), the accuracy increases and then gradually falls. Similarly, with an increase in augmentation, the accuracy increases initially (from C1 to C3) and then falls. The best CNN layer-augmentation combination was DCNN9*-Augm4*. All the subsequent experiments were then conducted at this combination point, DCNN9*-Augm4*, or “DL9A4” for short. The equation for accuracy is given as formula 8 in the Appendix. The equation for standard deviation is given as formula 14 in the Appendix.
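The bookkeeping behind this search can be sketched as a plain grid enumeration followed by an argmax over mean accuracies. The row and column labels, the function names, and the accuracy values in the toy dictionary are placeholders, not results from the study; training and evaluation of the networks are not shown.

```python
from itertools import product

def enumerate_runs(layer_configs, augmentations, k=10):
    """List every (layer config, augmentation, fold index) job of the search."""
    return [(layers, augm, fold)
            for layers, augm in product(layer_configs, augmentations)
            for fold in range(k)]

def best_combination(mean_accuracy):
    """Return the (layer config, augmentation) key with the highest mean accuracy."""
    return max(mean_accuracy, key=mean_accuracy.get)

layer_configs = ["R1", "R2", "R3", "R4", "R5"]   # placeholder labels for 5 designs
augmentations = ["C1", "C2", "C3", "C4"]         # placeholder labels for 4 settings
runs = enumerate_runs(layer_configs, augmentations, k=10)
print(len(runs))  # 5 x 4 x 10 jobs

# Made-up mean accuracies for two grid cells, only to show the selection step
toy_accuracy = {("R1", "C1"): 0.80, ("R2", "C1"): 0.85}
print(best_combination(toy_accuracy))
```

Each job is independent of the others, which is what makes the search easy to distribute across several GPUs.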

Table 3 Accuracy of different cDCNN layers vs. augmentation (bold cell indicates that maximum accuracy obtained with 4× fold augmentation using 9 layered cDCNN)


4.2 Effect of training on DCNN performance using “DL9A4”

K-fold cross-validation protocols were executed on the DCNN9*-Augm4* combination dataset, using different train and test splits (K2, K3, K4, K5, K10, and TT) as required.
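For readers unfamiliar with the K-fold notation, the sketch below generates the train/test index splits for a given K and shows the fraction of data used for training in each protocol. It is a generic illustration with hypothetical function names; it does not shuffle or stratify the samples, and it does not cover the TT protocol.

```python
def kfold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def training_fraction(k):
    """Share of the data used for training in one fold of a K-fold protocol."""
    return (k - 1) / k

for k in (2, 3, 4, 5, 10):
    print(f"K{k}: {training_fraction(k):.0%} training data per fold")
```

Moving from K2 to K10 therefore raises the training share from 50% to 90% per fold, which is the sense in which the K-fold protocols vary the training data size.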

Table 4 shows the effect of training data size (with increasing K-fold) on the three types of DCNN systems. For convenience, we have added the transfer learning system (tDCNN) here as well. The comparisons of cDCNN and iDCNN results are given in Fig. 15 in the Appendix and in Fig. 6.

Table 4 Results on the “effect of training data size” using DCNN9*-Augm4* (DL9A4) combination


As seen, accuracy slowly rises from K2 to K10 and is best for TT (training is the same as testing) protocol, which was used for validation. Note the order of performance was iDCNN > cDCNN > tDCNN.

4.3 Optimal sample selection for generalization of the DL system

The DL9A4 system was tested with different percentages of training data, and the K10 accuracy was computed for each data size, as shown in Fig. 7. The curve shows that accuracy increases until the point of inflection, which is at 60% of the training dataset. This shows that the system has the capacity to generalize beyond 60% of the dataset.

The comparison of cDCNN and iDCNN performance for different percent of training data is shown in Fig. 8.

4.4 Benchmarking of three DL systems against four ML systems

The benchmarking was conducted for DCNN9*-Augm4* (DL9A4) combination against the transfer learning–based Inception V3 pre-trained model (tDCNN) and four types of ML systems (k-NN, DT, SVM, and RF) using K10 protocol as discussed in Section 3. The comparative results of benchmarking can be seen in Table 5.

Table 5 Comparison of 7 AI systems for WD classification (in the increasing order of AUC) (bold cell indicates the maximum AUC value obtained with iDCNN)


The best performance was achieved for iDCNN9* using a modified ReLU function.

4.5 Receiver operating characteristics curves (DL vs. ML)

A receiver operating characteristics (ROC) curve plots the true-positive rate (TPR, y-axis) against the false-positive rate (FPR, x-axis). The AUC validates our hypothesis. The ROC curves for the four ML classifiers and the three DCNNs are given in Fig. 9, while the AUC values are shown in Table 5 (column C2). The equations for TPR and FPR are given as formulas 11 and 12 in the Appendix.
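A minimal sketch of how ROC points and the AUC can be derived from classifier scores is given below. It assumes the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN), sweeps a threshold over the observed scores, and integrates with the trapezoidal rule; the function names and the four toy samples are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy data: label 1 = WD, label 0 = control; scores are made-up probabilities
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_points = roc_points(toy_labels, toy_scores)
print(toy_points, auc(toy_points))
```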

4.6 Wilson Disease Segregation Index

The Wilson Disease Segregation Index (WDSI) was calculated using the mean feature strengths of DCNN and ML features. The large value of WDSI shows larger segregation between WD and controls which justifies the ability of the AI method as seen in Table 6. The order of the WDSI is iDCNN > cDCNN > tDCNN > ML.

Table 6 WDSI between DCNN and ML systems


5 WD characterization

Characterization [12, 19, 23, 70] is vital for validating our hypothesis that WD has a higher WMH compared to controls and that this accounts for the increase in feature strength in the hidden layers of the DCNN. We thus evaluate the FMS for these layers and compare the FMS values of WD against controls.

5.1 Hypothesis validation 1: mean feature map strength using DCNN9*-Augm4* (DL9A4)

Feature map strength (FMS) is the mean of activation values over all images in a class. FMS for a trained DCNN9*-Augm4* model at 8 hidden layers is shown in Fig. 10a.

As seen from Fig. 10a, the FMS values are consistently higher for WD class of the output layers. The mean FMS values for control and WD class are 500.14 ± 46.09 and 529.68 ± 47.23, respectively, showing an increase of 5.77% (C.I. 4.5463 to 54.5426, p value < 0.0001). This supports our 1st hypothesis that WMH of WD class is higher than the controls.

5.2 Camel hunch phenomenon

The WMH could be better visualized by understanding the histogram distribution of the brain region. The histogram is computed by considering the bin size of 4, leading to 64 bins (256 values, divided by 4). This is repeated for both classes. As seen in Fig. 11c, d, the histograms show a camel hunch-like shape in WD class from 25th to 35th bin corresponding to intensity range 100–140. This phenomenon occurs due to regions of WMH [5,6,7] in WD MRI scans, mainly at the convoluted edge of the folds of the brain.
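The binning described above can be reproduced with a short sketch. It assumes 8-bit intensities (0 to 255) and zero-based bin indices, under which bin 25 starts at intensity 100 and bin 35 starts at intensity 140; the function names and the sample pixel values are hypothetical.

```python
def intensity_histogram(pixels, bin_width=4, n_levels=256):
    """Histogram of integer intensities using fixed-width bins."""
    hist = [0] * (n_levels // bin_width)
    for value in pixels:
        hist[value // bin_width] += 1
    return hist

def bin_to_intensity_range(bin_index, bin_width=4):
    """Inclusive intensity range covered by a zero-based bin index."""
    low = bin_index * bin_width
    return low, low + bin_width - 1

toy_pixels = [0, 3, 4, 100, 140, 255]  # made-up intensities
hist = intensity_histogram(toy_pixels)
print(len(hist), bin_to_intensity_range(25), bin_to_intensity_range(35))
```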

5.3 Hypothesis validation 2: bispectrum strength computation

Bispectrum (B) falls in the category of higher order spectra (HOS) [23]. To calculate HOS, the Radon transform of the images was computed at angles from 0° to 180° in steps of 15°. Here, the Radon transform was applied to images in which the pixels of the MRI scans in the intensity range 100–140 were segregated (shown in red in Fig. 11a, b). On calculating the mean B-values over all images of the two classes, the B-value for WD was found to be consistently higher than for control (see Fig. 10b), with a mean of 20.87 for WD and 13.47 for the control class, a rise of 54.71% (C.I. 5.5490 to 9.2526, p value < 0.0001). Figure 12 shows the 2D representation of the Bispectrum strength for WD against the control, and Fig. 13 shows the higher Bispectrum strength for WD in a 3D plot. The Bispectrum is given by $ B\left(f_1,f_2\right)=E\left[\mathcal{F}\left(f_1\right)\times \mathcal{F}\left(f_2\right)\times \mathcal{F}^{*}\left(f_1+f_2\right)\right] $, where B is the Bispectrum value, $ \mathcal{F} $ is the Fourier transform, $ \mathcal{F}^{*} $ is its complex conjugate, and E is the expectation operator. The region Ω of computation of the bispectrum and bispectral features of a real signal is uniquely given by the triangle $ 0\le f_2\le f_1\le f_1+f_2\le 1 $.
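The bispectrum estimate can be illustrated with a minimal sketch. The code below is not the authors' implementation: it uses a synthetic one-dimensional signal as a stand-in for a single Radon projection (the Radon transform itself is not shown), a naive DFT, and the standard direct estimator in which the third Fourier term is complex-conjugated. The function names `dft` and `bispectrum` and the test signal are illustrative assumptions.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def bispectrum(segments, k1, k2):
    """Direct estimate of B(k1, k2): mean of X(k1) * X(k2) * conj(X(k1 + k2))."""
    total = 0j
    for segment in segments:
        spectrum = dft(segment)
        total += spectrum[k1] * spectrum[k2] * spectrum[k1 + k2].conjugate()
    return total / len(segments)


# Synthetic signal with components at bins 3, 5 and 8 = 3 + 5 (all zero phase).
n = 32
signal = [
    math.cos(2 * math.pi * 3 * t / n)
    + math.cos(2 * math.pi * 5 * t / n)
    + math.cos(2 * math.pi * 8 * t / n)
    for t in range(n)
]
b_coupled = bispectrum([signal], k1=5, k2=3)    # frequency pair present in the signal
b_uncoupled = bispectrum([signal], k1=4, k2=2)  # frequency pair absent from the signal
```

For this test signal the magnitude at the bin pair (5, 3) is large, whereas at (4, 2) it is essentially zero. In practice the estimate would be averaged over many segments or projections and restricted to the triangular region Ω.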

6 Performance evaluation

6.1 Diagnostics odds ratio

Diagnostic odds ratio (DOR) measures how well a test discriminates subjects with a target disorder from subjects without it. DOR is calculated according to Eq. 6 [71] and can take any value from 0 to infinity. A higher value indicates better test performance. A value of 1 means the test gives no information about the disease, and a value less than 1 means the test points in the wrong direction and predicts the opposite outcomes.

$$ \mathrm{DOR}=\frac{\mathrm{TP}/\mathrm{FN}}{\mathrm{FP}/\mathrm{TN}}=\frac{\mathrm{sens}/\left(1-\mathrm{sens}\right)}{\left(1-\mathrm{spec}\right)/\mathrm{spec}} $$

(6)

where TP, FP, TN, and FN represent true positives, false positives, true negatives, and false negatives, respectively. Sens and spec stand for the sensitivity and specificity, whose equations are given in Formulas 9 and 10 of the Appendix. The DOR values for all ML and DL methods are shown in Table 7.
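Eq. 6 can be checked numerically with a short sketch. The confusion-matrix counts below are hypothetical and chosen only for illustration (they are not values from Table 7), and the function names are assumptions. The count-based form uses the algebraically equivalent expression (TP × TN)/(FP × FN).

```python
def dor_from_counts(tp, fp, tn, fn):
    """Diagnostic odds ratio (TP/FN) / (FP/TN), written as TP*TN / (FP*FN)."""
    denominator = fp * fn
    if denominator == 0:
        # No false positives or no false negatives: the ratio is unbounded.
        return float("inf")
    return (tp * tn) / denominator


def dor_from_rates(sens, spec):
    """The same ratio expressed through sensitivity and specificity."""
    if sens == 1.0 or spec == 1.0:
        return float("inf")
    return (sens / (1.0 - sens)) / ((1.0 - spec) / spec)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, tn, fn = 90, 5, 95, 10
sens = tp / (tp + fn)  # sensitivity = TP / (TP + FN)
spec = tn / (tn + fp)  # specificity = TN / (TN + FP)
dor_counts = dor_from_counts(tp, fp, tn, fn)
dor_rates = dor_from_rates(sens, spec)
```

Both forms give the same value for these counts, which illustrates the equality of the two fractions in Eq. 6.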

Table 7 Sensitivity and specificity with increasing order of DOR for the 7 AI systems (the bold rows indicate the maximum values using cDCNN and iDCNN)

6.2 Power analysis

The sample size was calculated according to Eq. 7 [72] using the mean difference between the K10 mean accuracies of DCNN9* and DCNN11 while keeping the augmentation at 4× (see Table 3, cells (C3, R3) and (C3, R4)).

$$ \mathrm{Sample}\ \mathrm{Size}=\frac{2\times {\left({Z}_{\alpha }+{Z}_{1-\beta}\right)}^2\times {\sigma}^2}{\varDelta^2} $$

(7)

Here, $ Z_{\alpha} $ = 3.2905 for a type I error of 1%, and $ Z_{1-\beta} $ = 1.6449 for a type II error of 1%. Further, σ (standard deviation) = 2.53 and Δ (mean difference) = 0.627. Substituting these values in Eq. 7, the required sample size is 793. The database we used has 1832 samples for WD or controls using 4× augmentation. Thus, our database is 2.31 times the required size, exceeding it by 1039 samples.
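The arithmetic of Eq. 7 can be reproduced with a few lines. This is a sketch using only the values quoted in the text; the function and variable names are illustrative assumptions, and the result is rounded to the nearest integer to match the reported figure.

```python
def required_sample_size(z_alpha, z_power, sigma, delta):
    """Sample size = 2 * (Z_alpha + Z_(1-beta))^2 * sigma^2 / delta^2."""
    return 2.0 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2


# Values quoted in the text for Eq. 7.
n_required = required_sample_size(z_alpha=3.2905, z_power=1.6449, sigma=2.53, delta=0.627)
n_available = 1832  # samples available with 4x augmentation

ratio = n_available / round(n_required)    # how many times the required size
surplus = n_available - round(n_required)  # samples above the required size
```

With these inputs the formula evaluates to roughly 793.2, which rounds to the reported 793 and gives the stated ratio of about 2.31 and surplus of 1039 samples.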

6.3 Timing analysis

A supercomputer was used to train the DCNN for optimal performance of the CADx system. We therefore calculated the gain as the ratio of the time taken by the local computer (LC; HP Desktop 2010) to the time taken by the supercomputer (SC; NVIDIA). The gain values are shown in Table 8.

Table 8 Timing analysis and gains in time for the supercomputer against the local computer

As seen, a local computer using a CPU takes around 7–9 times longer than the supercomputer. For example, a job that takes 72 h (3 days) on the local computer takes only 8 h on the supercomputer. The C.I. of the timings is 113.2824 to 215.3842 (p value = 0.0052).

6.4 Reliability analysis

Reliability is calculated using the formula $ \mathrm{Reliability}\ \left(\%\right)=\left(1-\frac{\sigma_{\mathrm{N}}}{\mu_{\mathrm{N}}}\right)\times 100 $, where $ \mu_{\mathrm{N}} $ and $ \sigma_{\mathrm{N}} $ are the mean and standard deviation of the classification accuracy. The variation of reliability with data size is given in Table 9. For the system to be reliable and stable, three criteria must be met [70, 73]: (i) if the reliability of the DCNN is above 95%, the system is considered reliable; (ii) if the SD is below 5%, the DCNN is considered stable; and (iii) if the variation in accuracy does not exceed 5%, the system is likewise considered stable. Our system meets all three criteria: for data sizes above 20% (row R3), reliability (column C3) is above 95%, SD is below 5% (row R3, column C4), and the variation in accuracy is below 5% for rows R4 to R10. We therefore conclude that our DCNN system is stable and reliable.
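A minimal sketch of the reliability check follows, assuming the index takes the form (1 − σ/μ) × 100, that is, one minus the ratio of the standard deviation to the mean accuracy, so that a small spread gives a value near 100%. The mean and SD are the iDCNN accuracy reported in this paper (98.28 ± 1.55%); the accuracy variation of 2.0 percentage points and the function names are hypothetical.

```python
def reliability_percent(mean_accuracy, sd_accuracy):
    """Reliability (%) = (1 - SD / mean) * 100."""
    return (1.0 - sd_accuracy / mean_accuracy) * 100.0


def is_reliable_and_stable(mean_accuracy, sd_accuracy, accuracy_variation):
    """Apply the three criteria: reliability > 95%, SD < 5%, variation <= 5%."""
    return (
        reliability_percent(mean_accuracy, sd_accuracy) > 95.0
        and sd_accuracy < 5.0
        and accuracy_variation <= 5.0
    )


# iDCNN accuracy reported in the paper: 98.28 +/- 1.55 %.
reliability = reliability_percent(98.28, 1.55)
# The accuracy variation of 2.0 points is a hypothetical value for illustration.
meets_criteria = is_reliable_and_stable(98.28, 1.55, accuracy_variation=2.0)
```

For these inputs the reliability comes out slightly above 98%, which satisfies the 95% threshold.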

Table 9 Reliability analysis for different percent of training data

7 Discussion

The objective of this study was to classify images from patients with WD against controls using unbalanced and weak brain MRI training datasets. The system design consisted of a 3D optimization of the best DCNN model under the best augmentation conditions, and the optimal combination was DCNN9*-Augm4*. The iDCNN design was comparable to cDCNN but performed better. We also showed that iDCNN outperformed tDCNN by 11.92% and four types of “conventional machine learning–based systems” (k-NN, decision tree, support vector machine, and random forest) by 55.13%, 28.36%, 15.35%, and 14.11%, respectively. The performance of the DCNN system was evaluated using DOR and WDSI, and both showed consistent results. We also showed the effect of training data size on the system accuracy at this optimal point. The hypothesis was validated using two novel strategies for WD characterization, namely FMS and Bispectrum analysis.

7.1 Benchmarking

We benchmarked our DCNN systems against existing systems, as shown in Table 10. To the best of our knowledge from the existing research, no classification work has been done on WD, so the benchmarking table compares the proposed study with previous studies on other brain diseases. Overall, this was the first paper using state-of-the-art technology to optimize several AI methods for the diagnosis of WD. The benchmarked papers are listed by author and year in column C1, followed in column C2 by the type of brain disease, such as Alzheimer (ALZ), brain tumor (BT), and mild cognitive impairment (MCI). Column C3 gives the techniques used, such as SVM, CNN, and DBM. Column C4 gives the imaging modality, such as spatial MRI or functional MRI (fMRI). Column C5 states whether the authors used an ML- or DL-based AI technique. Columns C6 and C7 give the accuracy and AUC (p value) for comparison with these papers. The majority of the studies use ML or DL models. The accuracy of these systems has an average value of 87%, while our system showed about a 10% improvement over this average. Rows R10 and R11 show the proposed cDCNN and iDCNN methods, which had accuracies of ~ 97.2% and ~ 98.3%, respectively. The AUC values for our proposed methods were 0.98 (p < 0.0001) and 0.99 (p < 0.0001).

Table 10 Benchmarking of our proposed DCNN strategy against the previously published literature (the bold rows indicate that best accuracy/AUC values obtained using cDCNN and iDCNN)

7.2 A short note on WD characterization

Even though we were able to characterize WD using the Bispectrum values and AI models, it is important to note that the Bispectrum values were computed in the pixel zone corresponding to the camel hunch region. This was valid for our datasets but needs wider clinical validation. Furthermore, an average of 12 spatial slices per patient was considered. In addition, the number of control patients was four times the number of WD patients, which indicates a class imbalance. Thus, a larger pool of data is required to further validate this paradigm.

7.3 A short note on the role of the gold standard for the design of the Wilson disease system

The gold standard plays a crucial role in the design of deep learning systems. It acts as a binary event label, such as “high risk vs. low risk,” “benign cancer vs. malignant cancer,” “cardiovascular event vs. no cardiovascular event,” or “cerebrovascular event vs. no cerebrovascular event.” Such binary events can be detected well when a deep learning model is trained on them. Recently, our group developed a method for classifying symptomatic patients likely to have a stroke vs. asymptomatic patients not likely to have a stroke [21]. The deep learning solution was very successful because the deep CNN model was trained using a gold standard designed by the neurologist. Such a process is also called characterization of the disease, since the deep learning system uses the features of the disease to classify into binary events such as Wilson disease vs. normal. Examples of several kinds of well-defined characterization systems can be seen in the machine learning section. Although characterization is typically used for binary classification, multiclass scenarios can also be developed [61, 79]; these simply provide several levels of risk rather than two.

7.4 Strength, weakness, and extensions

This is the first study of its kind that considered DCNN architecture for classification and characterization of the WD. The architecture was optimized by changing the number of layers of the DCNN architecture and augmentation protocol. The AI system showed high performance and the results were validated. The WD characterization was conducted using two different models, first using the AI framework, and second using signal processing framework using higher order spectra by computing the Bispectrum values. The system showed consistent results and the hypothesis was validated.

Even though the pilot study showed powerful results, the manual segmentation step could be replaced by automated methods [9, 80, 81]. Furthermore, more ML alternatives and more features can be used in the future [59]. More validation needs to be conducted, such as cross-modality fusion using registration methods [82, 83], and more neurological model-based techniques can be designed [84, 85].

The system can be extended to transfer learning–based approaches that use pre-trained weights, thereby avoiding the heavy supercomputer processing time during training [79]. Our successful pilot study, with its set of seven AI models, can be taken as a launching pad for multicenter data collection for bigger trials. The scanners used for imaging can also play a role when acquiring the MRI data, just as in other modalities [86].

8 Conclusion

This is the first study of its kind to use an advanced CADx system based on seven kinds of AI combinations to classify images from patients with WD vs. controls to achieve the best possible architecture of DCNN and to attain the best accuracy of 98.28 ± 1.55%. The three DCNN methods were compared with four ML methods showing the benefit of deep learning. The study also used the characterization of WD using two hypotheses showing feature map strength and Bispectrum strength of WD higher due to regions of WMH in MRI scans. A detailed performance evaluation was also implemented using diagnostics odds ratio, power analysis, supercomputer timing analysis, and generalization analysis of DCNN performance.

References

Dusek P, Litwin T, Czlonkowska A (2015) Wilson disease and other neurodegenerations with metal accumulations. Neurol Clin 33(1):175–204
Medici V, Rossaro L, Sturniolo G (2007) Wilson disease—a practical approach to diagnosis, treatment and follow-up. Dig Liver Dis 39(7):601–609
Rosencrantz R, Schilsky M (2011) Wilson disease: pathogenesis and clinical considerations in diagnosis and treatment. Semin Liver Dis 31(3):245–259
Roberts EA, Schilsky ML (2003) A practice guideline on Wilson disease. Hepatology 37(6):1475–1492
Singh P, Ahluwalia A, Saggar K, Grewal CS (2011) Wilson’s disease: Mri features. J Pediatr Neurosci 6(1):27
Parekh JR, Agrawal PR (2014) Wilson’s disease: ‘face of giant panda’ and ‘trident’ signs together. Oxford Medical Case Reports 2014(1):16–17
Yousaf M, Kumar M, Ramakrishnaiah R, Vanhemert R, Angtuaco E (2009) Atypical MRI features involving the brain in Wilson’s disease. Radiol Case Rep 4(3):312
El-Baz A, Suri JS (2011) Lung imaging and computer aided diagnosis. CRC Press
El-Baz A, Jiang X, Suri JS (2016) Biomedical image segmentation: advances and trends. CRC Press
Khanna NN, Jamthikar AD, Gupta D, Araki T, Piga M, Saba L, Carcassi C, Nicolaides A, Laird JR, Suri HS, Gupta A, Mavrogeni S, Protogerou A, Sfikakis P, Kitas GD, Suri JS (2019) Effect of carotid image-based phenotypes on cardiovascular risk calculator: Aecrs1. 0. Med Biol Eng Comput 57(7):1553–1566
Abiwinanda N, Hanif M, Hesaputra ST, Handayani A, and Mengko TR (2019) “Brain tumor classification using convolutional neural network,” in World Congress on Medical Physics and Biomedical Engineering 2018. Springer, pp. 183–189.
Sharma AM, Gupta A, Kumar PK, Rajan J, Saba L, Nobutaka I, Laird JR, Nicolades A, Suri JS (2015) A review on carotid ultrasound atherosclerotic tissue characterization and stroke risk stratification in machine learning framework. Curr Atheroscler Rep 17(9):55
Saba L, Tiwari A, Biswas M, Gupta SK, Godia-Cuadrado E, Chaturvedi A, Turk M, Suri HS, Orru S, Sanches JM et al (2019) Wilson’s disease: a new perspective review on its genetics, diagnosis and treatment. Front Biosci (Elite edition) 11:166–185
Wee C-Y, Yap P-T, Zhang D, Wang L, Shen D (2014) Group-constrained sparse fMRI connectivity modeling for mild cognitive impairment identification. Brain Struct Funct 219(2):641–656
Prasad G, Joshi SH, Nir TM, Toga AW, Thompson PM, Alzheimer’s Disease Neuroimaging Initiative (ADNI) et al (2015) Brain connectivity and novel network measures for Alzheimer’s disease classification. Neurobiol Aging 36:S121–S131
Khedher L, Ramírez J, Górriz JM, Brahim A, Segovia F, Alzheimer’s Disease Neuroimaging Initiative (ADNI) et al (2015) Early diagnosis of Alzheimer’s disease based on partial least squares, principal component analysis and support vector machine using segmented MRI images. Neurocomputing 151:139–150
Affonso C, Rossi ALD, Vieira FHA, de Leon Ferreira ACP et al (2017) Deep learning for biological image classification. Expert Syst Appl 85:114–122
Baloglu UB, Talo M, Yildirim O, San Tan R, Acharya UR (2019) Classification of myocardial infarction with multi-lead ECG signals and deep CNN. Pattern Recogn Lett 122:23–30
Biswas M, Kuppili V, Saba L, Edla DR, Suri HS, Sharma A, Cuadrado-Godia E, Laird JR, Nicolaides A, Suri JS (2019) Deep learning fully convolution network for lumen characterization in diabetic patients using carotid ultrasound: a tool for stroke risk. Med Biol Eng Comput 57(2):543–564
Druzhkov P, Kustikova V (2016) A survey of deep learning methods and software tools for image classification and object detection. Pattern Recognit Image Anal 26(1):9–15
Skandha SS, Gupta SK, Saba L, Koppula VK, Johri AM, Khanna NN, Mavrogeni S, Laird JR, Pareek G, Miner M, Sfikakis PP, Protogerou A, Misra DP, Agarwal V, Sharma AM, Viswanathan V, Rathore VS, Turk M, Kolluri R, Viskovic K, Cuadrado-Godia E, Kitas GD, Nicolaides A, Suri JS (2020) 3-D optimized classification and characterization artificial intelligence paradigm for cardiovascular/stroke risk stratification using carotid ultrasound-based delineated plaque: Atheromatic™ 2.0. Comput Biol Med 125:103958
Saba L, Agarwal M, Sanagala S, Gupta S, Sinha G, Johri A, Khanna N, Mavrogeni S, Laird J, Pareek G et al (2020) Brain MRI-based Wilson disease tissue classification: an optimised deep transfer learning approach. Electronics Letters
Pareek G, Acharya UR, Sree SV, Swapna G, Yantri R, Martis RJ, Saba L, Krishnamurthi G, Mallarini G, El-Baz A et al (2013) Prostate tissue characterization/classification in 144 patient population using wavelet and higher order spectra features from transrectal ultrasound images. Technol Cancer Res Treat 12(6):545–557
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, and Wojna Z (2016) “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826.
Ferenci P (2006) Regional distribution of mutations of the atp7b gene in patients with Wilson disease: impact on genetic testing. Hum Genet 120(2):151–159
Vrabelova S, Letocha O, Borsky M, Kozak L (2005) Mutation analysis of the atp7b gene and genotype/phenotype correlation in 227 patients with Wilson disease. Mol Genet Metab 86(1-2):277–285
Saba L, Lucatelli P, Anzidei M, di Martino M, Suri JS, Montisci R (2018) Volumetric distribution of the white matter hyper-intensities in subject with mild to severe carotid artery stenosis: does the side play a role? J Stroke Cerebrovasc Dis 27(8):2059–2066
Saba L, Sanfilippo R, Porcu M, Lucatelli P, Montisci R, Zaccagna F, Suri JS, Anzidei M, Wintermark M (2017) Relationship between white matter hyperintensities volume and the circle of Willis configurations in patients with carotid artery pathology. Eur J Radiol 89:111–116
Porcu M, Balestrieri A, Siotto P, Lucatelli P, Anzidei M, Suri JS, Zaccagna F, Argiolas GM, Saba L (2018) Clinical neuroimaging markers of response to treatment in mood disorders. Neurosci Lett 669:43–54
Kim T, Kim IO, Kim WS, Cheon J-E, Moon S, Kwon J, Seo J, Yeon K (2006) MR imaging of the brain in Wilson disease of childhood: findings before and after treatment with clinical correlation. Am J Neuroradiol 27(6):1373–1378
Hu X, Chen S, Huang C-B, Qian Y, Yu Y (2017) Frequency-dependent changes in the amplitude of low-frequency fluctuations in patients with Wilson’s disease: a resting-state fMRI study. Metab Brain Dis 32(3):685–692
Corgiolu S, Barberini L, Suri JS, Mandas A, Costaggiu D, Piano P, Zaccagna F, Lucatelli P, Balestrieri A, Saba L (2018) Resting-state functional connectivity MRI analysis in human immunodeficiency virus and hepatitis c virus co-infected subjects. a pilot study. Eur J Radiol 102:220–227
Porcu M, Wintermark M, Suri JS, Saba L (2020) The influence of the volumetric composition of the intracranial space on neural activity in healthy subjects: a resting-state functional magnetic resonance study. Eur J Neurosci 51(9):1944–1961
Kaden M, Riedel M, Hermann W, Villmann T (2015) Border-sensitive learning in generalized learning vector quantization: an alternative to support vector machines. Soft Comput 19(9):2423–2434
Jing R, Han Y, Cheng H, Han Y, Wang K, Weintraub D, Fan Y (2019) Altered large-scale functional brain networks in neurological Wilson’s disease. Brain Imaging Behav:1–11
Shattuck DW, Leahy RM (2002) Brainsuite: an automated cortical surface identification tool. Med Image Anal 6(2):129–142
Manjón JV, Coupé P (2016) volBrain: an online MRI brain volumetry system. Front Neuroinform 10:30
Saba L, Biswas M, Kuppili V, Godia EC, Suri HS, Edla DR, Omerzu T, Laird JR, Khanna NN, Mavrogeni S et al (2019) The present and future of deep learning in radiology. Eur J Radiol 114:14–24
Saba L, Biswas M, Suri HS, Viskovic K, Laird JR, Cuadrado-Godia E, Nicolaides A, Khanna N, Viswanathan V, Suri JS (2019) Ultrasound-based carotid stenosis measurement and risk stratification in diabetic cohort: a deep learning paradigm. Cardiovasc Diagn Ther 9(5):439
Biswas M, Kuppili V, Araki T, Edla DR, Godia EC, Saba L, Suri HS, Omerzu T, Laird JR, Khanna NN, Nicolaides A, Suri JS (2018) Deep learning strategy for accurate carotid intima-media thickness measurement: an ultrasound study on Japanese diabetic cohort. Comput Biol Med 98:100–117
Biswas M, Saba L, Chakrabartty S, Khanna NN, Song H, Suri HS, Sfikakis PP, Mavrogeni S, Viskovic K, Laird JR, Cuadrado-Godia E, Nicolaides A, Sharma A, Viswanathan V, Protogerou A, Kitas G, Pareek G, Miner M, Suri JS (2020) Two-stage artificial intelligence model for jointly measurement of atherosclerotic wall thickness and plaque burden in carotid ultrasound: a screening tool for cardiovascular/stroke risk assessment. Comput Biol Med 123:103847
Biswas M, Kuppili V, Edla DR, Suri HS, Saba L, Marinhoe RT, Sanches JM, Suri JS (2018) Symtosis: a liver ultrasound tissue characterization and risk stratification in optimized deep learning paradigm. Comput Methods Prog Biomed 155:165–177
Tandel GS, Biswas M, Kakde OG, Tiwari A, Suri HS, Turk M, Laird JR, Asare CK, Ankrah AA, Khanna N et al (2019) A review on a deep learning perspective in brain cancer classification. Cancers 11(1):111
Li Y and Yuan Y (2017) “Convergence analysis of two-layer neural networks with relu activation,” in Adv Neural Inf Proces Syst, pp. 597–607.
Maniruzzaman M, Kumar N, Abedin MM, Islam MS, Suri HS, El-Baz AS, Suri JS (2017) Comparative approaches for classification of diabetes mellitus data: machine learning paradigm. Comput Methods Prog Biomed 152:23–34
Acharya UR, Faust O, Sree SV, Molinari F, Saba L, Nicolaides A, Suri JS (2011) An accurate and generalized approach to plaque characterization in 346 carotid ultrasound scans. IEEE Trans Instrum Meas 61(4):1045–1053
Suri JS, Kathuria C, and Molinari F (2010) Atherosclerosis disease management. Springer Science & Business Media.
Saba L, Jain PK, Suri HS, Ikeda N, Araki T, Singh BK, Nicolaides A, Shafique S, Gupta A, Laird JR et al (2017) Plaque tissue morphology-based stroke risk stratification using carotid ultrasound: a polling-based pca learning paradigm. J Med Syst 41(6):98
Acharya UR, Faust O, Alvin A, Krishnamurthi G, Seabra JC, Sanches J, Suri JS et al (2013) Understanding symptomatology of atherosclerotic plaque by image-based tissue characterization. Comput Methods Prog Biomed 110(1):66–75
Acharya UR, Mookiah MRK, Sree SV, Afonso D, Sanches J, Shafique S, Nicolaides A, Pedro LM, Fernandes JFE, Suri JS (2013) Atherosclerotic plaque tissue characterization in 2d ultrasound longitudinal carotid scans for automated classification: a paradigm for stroke risk assessment. Med Biol Eng Comput 51(5):513–523
Molinari F, Mantovani A, Deandrea M, Limone P, Garberoglio R, Suri JS (2010) Characterization of single thyroid nodules by contrast-enhanced 3-d ultrasound. Ultrasound Med Biol 36(10):1616–1625
Acharya UR, Swapna G, Sree SV, Molinari F, Gupta S, Bardales RH, Witkowska A, Suri JS (2014) A review on ultrasound-based thyroid cancer tissue characterization and automated classification. Technol Cancer Res Treat 13(4):289–301
Acharya UR, Molinari F, Sree SV, Swapna G, Saba L, Guerriero S, Suri JS (2015) Ovarian tissue characterization in ultrasound: a review. Technol Cancer Res Treat 14(3):251–261
Acharya UR, Sree SV, Kulshreshtha S, Molinari F, Koh JEW, Saba L, Suri JS (2014) Gynescan: an improved online paradigm for screening of ovarian cancer via tissue characterization. Technol Cancer Res Treat 13(6):529–539
Acharya UR, Sree SV, Ribeiro R, Krishnamurthi G, Marinho RT, Sanches J, Suri JS (2012) Data mining framework for fatty liver disease classification in ultrasound: a hybrid feature extraction paradigm. Med Phys 39(7Part1):4255–4264
Saba L, Dey N, Ashour AS, Samanta S, Nath SS, Chakraborty S, Sanches J, Kumar D, Marinho R, Suri JS (2016) Automated stratification of liver disease in ultrasound: an online accurate feature classification paradigm. Comput Methods Prog Biomed 130:118–134
Than JC, Saba L, Noor NM, Rijal OM, Kassim RM, Yunus A, Suri HS, Porcu M, Suri JS (2017) Lung disease stratification using amalgamation of Riesz and Gabor transforms in machine learning framework. Comput Biol Med 89:197–211
Shrivastava VK, Londhe ND, Sonawane RS, Suri JS (2016) Computer-aided diagnosis of psoriasis skin images with hos, texture and color features: a first comparative study of its kind. Comput Methods Prog Biomed 126:98–109
Wu DH, Chen Z, North JC, Biswas M, Vo J, Suri JS (2020) Machine learning paradigm for dynamic contrast-enhanced MRI evaluation of expanding bladder. Front Biosci (Landmark Edition) 25:1746–1764
Acharya UR, Sree SV, Krishnan MMR, Krishnananda N, Ranjan S, Umesh P, Suri JS (2013) Automated classification of patients with coronary artery disease using grayscale features from left ventricle echocardiographic images. Comput Methods Prog Biomed 112(3):624–632
Jamthikar AD, Gupta D, Mantella LE, Saba L, Laird JR, Johri AM, Suri JS (2020) Multiclass machine learning vs. conventional calculators for stroke/CVD risk assessment using carotid plaque predictors with coronary angiography scores as gold standard: a 500 participants study. Int J Cardiovasc Imaging:1–17
Jamthikar A, Gupta D, Khanna NN, Saba L, Laird JR, Suri JS (2020) Cardiovascular/stroke risk prevention: a new machine learning framework integrating carotid ultrasound image-based phenotypes and its harmonics with conventional risk factors. Indian Heart J 72(4):258–264
Jamthikar A, Gupta D, Khanna NN, Saba L, Araki T, Viskovic K, Suri HS, Gupta A, Mavrogeni S, Turk M et al (2019) A low-cost machine learning-based cardiovascular/stroke risk assessment system: integration of conventional factors with image phenotypes. Cardiovasc Diagn Ther 9(5):420
Khanna NN, Jamthikar AD, Gupta D, Piga M, Saba L, Carcassi C, Giannopoulos AA, Nicolaides A, Laird JR, Suri HS et al (2019) Rheumatoid arthritis: atherosclerosis imaging and cardiovascular risk assessment using machine and deep learning–based tissue characterization. Curr Atheroscler Rep 21(2):7
Banchhor SK, Londhe ND, Araki T, Saba L, Radeva P, Laird JR, Suri JS (2017) Wall-based measurement features provides an improved IVUS coronary artery risk assessment when fused with plaque texture-based features during machine learning paradigm. Comput Biol Med 91:198–212
Suri JS (2011) Imaging based symptomatic classification and cardiovascular stroke risk score estimation. US Patent App. 13/053,971, 20 Oct 2011
Cuadrado-Godia E, Dwivedi P, Sharma S, Santiago AO, Gonzalez JR, Balcells M, Laird J, Turk M, Suri HS, Nicolaides A et al (2018) Cerebral small vessel disease: a review focusing on pathophysiology, biomarkers, and machine learning strategies. Journal of Stroke 20(3):302
Martis RJ, Acharya UR, Prasad H, Chua CK, Lim CM, Suri JS (2013) Application of higher order statistics for atrial arrhythmia classification. Biomed Signal Proces Control 8(6):888–900
Maniruzzaman M, Rahman MJ, Ahammed B, Abedin MM, Suri HS, Biswas M, El-Baz A, Bangeas P, Tsoulfas G, Suri JS (2019) Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms. Comput Methods Prog Biomed 176:173–193
Kuppili V, Biswas M, Sreekumar A, Suri HS, Saba L, Edla DR, Marinhoe RT, Sanches JM, Suri JS (2017) Extreme learning machine framework for risk stratification of fatty liver disease using ultrasound tissue characterization. J Med Syst 41(10):152
Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM (2003) The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol 56(11):1129–1135
Kadam P, Bhalerao S (2010) Sample size calculation. Int J Ayurveda Res 1(1):55
Araki T, Ikeda N, Shukla D, Jain PK, Londhe ND, Shrivastava VK, Banchhor SK, Saba L, Nicolaides A, Shafique S, Laird JR, Suri JS (2016) PCA-based polling strategy in machine learning framework for coronary artery disease risk assessment in intravascular ultrasound: a link between carotid and coronary grayscale plaque morphology. Comput Methods Prog Biomed 128:137–158
Suk H-I, Lee S-W, Shen D, Alzheimer’s Disease Neuroimaging Initiative (ADNI) et al (2017) Deep ensemble learning of sparse regression models for brain disease diagnosis. Med Image Anal 37:101–113
Zhang Y, Zhang H, Chen X, Liu M, Zhu X, Lee S-W, Shen D (2019) Strength and similarity guided group-level brain functional network construction for mci diagnosis. Pattern Recogn 88:421–430
Abrol A, Bhattarai M, Fedorov A, Du Y, Plis S, Calhoun V, Alzheimer’s Disease Neuroimaging Initiative (ADNI) et al (2020) Deep residual learning for neuroimaging: an application to predict progression to Alzheimer’s disease. J Neurosci Methods:108701
Richhariya B, Tanveer M, Rashid A, Alzheimer’s Disease Neuroimaging Initiative (ADNI) et al (2020) Diagnosis of Alzheimer’s disease using universum support vector machine based recursive feature elimination (USVM-RFE). Biomed Signal Proces Control 59:101903
Liu J, Pan Y, Wu F-X, Wang J (2020) Enhancing the feature representation of multi-modal MRI data by combining multi-view information for MCI classification. Neurocomputing
Tandel GS, Balestrieri A, Jujaray T, Khanna NN, Saba L, Suri JS (2020) Multiclass magnetic resonance imaging brain tumor classification using artificial intelligence paradigm. Comput Biol Med:103804
El-Baz A, Gimel’farb G, Suri JS (2015) Stochastic modeling for medical image analysis. CRC Press
Suri JS, Wilson DL, Laxminarayan S (2005) Handbook of biomedical image analysis: segmentation models part B. Kluwer Academic/Plenum Publishers
Suri JS, Wilson D, Laxminarayan S (2005) Handbook of biomedical image analysis, vol 2. Springer Science & Business Media
Narayanan R, Kurhanewicz J, Shinohara K, Crawford ED, Simoneau A, Suri JS (2009) MRI-ultrasound registration for targeted prostate biopsy. In: 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE, pp 991–994
El-Baz A, Suri JS (2019) Neurological disorders and imaging physics, volume 2: engineering and clinical perspectives of multiple sclerosis
Acharya R, Ng YE, Suri JS (2008) Image modeling of the human eye. Artech House
Molinari F, Liboni W, Giustetto P, Badalamenti S, Suri JS (2009) Automatic computer-based tracings (act) in longitudinal 2-d ultrasound images using different scanners. J Mech Med Biol 9(04):481–505

Author information

Authors and Affiliations

CSE Department, Bennett University, Greater Noida, UP, India
Mohit Agarwal & Suneet K. Gupta
Department of Radiology, Azienda Ospedaliero Universitaria (A.O.U.), Cagliari, Italy
Luca Saba
Department of Medicine, Division of Cardiology, Queen’s University, Kingston, Ontario, Canada
Amer M. Johri
Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi, India
Narendra N. Khanna
Cardiology Clinic, Onassis Cardiac Surgery Center, Athens, Greece
Sophie Mavrogeni
Heart and Vascular Institute, Adventist Health St. Helena, St. Helena, CA, USA
John R. Laird
Minimally Invasive Urology Institute, Brown University, Providence, RI, USA
Gyan Pareek
Men’s Health Center, Miriam Hospital Providence, Providence, RI, USA
Martin Miner
Rheumatology Unit, National and Kapodistrian University of Athens, Athens, Greece
Petros P. Sfikakis
Department of Cardiovascular Prevention, National and Kapodistrian University of Athens, Athens, Greece
Athanasios Protogerou
Division of Cardiovascular Medicine, University of Virginia, Charlottesville, VA, USA
Aditya M. Sharma
MV Hospital for Diabetes & Professor M Viswanathan Diabetes Research Centre, Chennai, India
Vijay Viswanathan
R & D Academic Affairs, Dudley Group NHS Foundation Trust, Dudley, UK
George D. Kitas
Vascular Screening and Diagnostic Centre, University of Nicosia, Nicosia, Cyprus
Andrew Nicolaides
Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, 95661, USA
Jasjit S. Suri

Corresponding author

Correspondence to Jasjit S. Suri.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

If TP, FP, TN, FN, TPR, and FPR represent true positive, false positive, true negative, false negative, true-positive rate, and false-positive rate, respectively, then the performance parameters can be computed as follows:

$$ \mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}} $$

(8)

$$ \mathrm{Sensitivity}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} $$

(9)

$$ \mathrm{Specificity}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}} $$

(10)

$$ \mathrm{TPR}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} $$

(11)

$$ \mathrm{FPR}=\frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}} $$

(12)
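
As a minimal sketch of Eqs. 8–12, the Python function below computes the five quantities from raw confusion-matrix counts. The function name `confusion_metrics` and the example counts are illustrative assumptions and are not values from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, TPR and FPR (Eqs. 8-12)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Guard each ratio against an empty denominator (e.g. no positive samples).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return {
        "accuracy": (tp + tn) / total,   # Eq. 8
        "sensitivity": sensitivity,      # Eq. 9
        "specificity": specificity,      # Eq. 10
        "tpr": sensitivity,              # Eq. 11 has the same form as Eq. 9
        "fpr": fpr,                      # Eq. 12
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = confusion_metrics(tp=45, fp=5, tn=40, fn=10)
```

Two identities follow directly from the definitions and serve as a quick sanity check: TPR equals sensitivity, and FPR equals 1 − specificity.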

$$ \mathcal{L}\left(\Theta \right)=-\left[{y}_i\times \log {p}_i+\left(1-{y}_i\right)\times \log \left(1-{p}_i\right)\right] $$

(13)

where y_i is the class label (0 or 1) of the ith input and p_i is the predicted probability that this input belongs to the class labeled 1.
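
The loss in Eq. 13 can be sketched in a few lines of Python. This is an illustration under stated assumptions: labels are taken to be 0 or 1, `p` is taken to be the predicted probability of label 1, the clipping constant `eps` is added here only to avoid `log(0)`, and averaging over a batch is a common convention that Eq. 13 itself does not specify. Both function names are hypothetical.

```python
import math


def binary_cross_entropy(y, p, eps=1e-12):
    """Per-sample loss of Eq. 13 for a label y in {0, 1} and probability p."""
    # Clip p away from 0 and 1 so the logarithms stay finite (assumption).
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mean_binary_cross_entropy(labels, probs):
    """Average the per-sample losses over a batch (assumed convention)."""
    losses = [binary_cross_entropy(y, p) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)


# A confident correct prediction is penalized less than a confident wrong one.
loss_good = binary_cross_entropy(1, 0.9)
loss_bad = binary_cross_entropy(1, 0.1)
```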

$$ \sigma =\sqrt{\frac{\sum \limits_{i=1}^N{\left({x}_i-\mu \right)}^2}{N}} $$

(14)

where σ is the standard deviation, x_i is the accuracy at the ith combination of K-fold cross-validation, μ is the mean accuracy, and N is the number of combinations, which equals 10 for K10.
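
Eq. 14 divides by N rather than N − 1, so it is the population standard deviation of the fold accuracies. The short sketch below makes that explicit; the function name `mean_and_std` and the ten accuracy values are hypothetical and are not results from the paper.

```python
import math


def mean_and_std(accuracies):
    """Return (mu, sigma) following Eq. 14, i.e. dividing by N, not N - 1."""
    n = len(accuracies)
    if n == 0:
        raise ValueError("need at least one accuracy value")
    mu = sum(accuracies) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in accuracies) / n)
    return mu, sigma


# Hypothetical accuracies (%) for the N = 10 combinations of a K10 run.
fold_accuracies = [96.0, 97.0, 98.0, 99.0, 100.0,
                   96.0, 97.0, 98.0, 99.0, 100.0]
mu, sigma = mean_and_std(fold_accuracies)
```

In the Python standard library, the same quantity is returned by `statistics.pstdev`, whereas `statistics.stdev` uses the N − 1 denominator and would give a slightly larger value.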

Scientific validation of DCNN systems

The DCNN9*-Augm4* system was validated on a well-accepted, previously published facial biometric dataset. The dataset consisted of 72 subjects, each treated as one class, with 20 face images per subject, for a total of 1440 images. Using the K10 protocol, the accuracies obtained with cDCNN, tDCNN, and iDCNN were 96.72 ± 2.01%, 97.18 ± 1.23%, and 98.27 ± 1.55%, respectively. These values are comparable with the accuracies obtained for WD, and the order of performance was iDCNN > tDCNN > cDCNN. The consistently high accuracy on an independent, already published dataset supports the validity of the proposed DCNN methods. Table 11 shows the 10 K10 combinations of cDCNN, tDCNN, and iDCNN for the facial biometric dataset in sorted form.
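
To make the K10 protocol concrete for a dataset of this size, the sketch below partitions 1440 image indices into 10 disjoint folds, so that each combination trains on 1296 images and tests on 144. It is an illustration only: the shuffling, the seed, and the function name `k_fold_indices` are assumptions, and this section does not state whether the original split was stratified by subject.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k disjoint, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # assumed: random assignment
    folds = [indices[i::k] for i in range(k)]    # every k-th index per fold
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


n_images = 72 * 20                               # 72 subjects x 20 images
splits = list(k_fold_indices(n_images, k=10))
```

Each image appears in exactly one test fold, so the 10 accuracies obtained this way are the x_i values that enter the mean and standard deviation reported above.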

Table 11 K10 performance of three DCNN models on the facial biometric dataset

Table 12 Symbol table

About this article

Cite this article

Agarwal, M., Saba, L., Gupta, S.K. et al. Wilson disease tissue classification and characterization using seven artificial intelligence models embedded with 3D optimization paradigm on a weak training brain magnetic resonance imaging datasets: a supercomputer application. Med Biol Eng Comput 59, 511–533 (2021). https://doi.org/10.1007/s11517-021-02322-0

Received: 21 August 2020
Accepted: 18 January 2021
Published: 05 February 2021
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11517-021-02322-0
