1 Introduction

Multimedia data combine content of different forms: text, audio, image, animation, and video. In medical applications, multimedia data often refer to 3D volumetric data obtained by different imaging techniques.

Sensorineural hearing loss (SNHL) is a disease characterized by gradual deafness [1]. SNHL comprises three types: (i) sensory hearing loss (SHL), (ii) neural hearing loss (NHL), and (iii) both. SHL may be due to poor function of the cochlear hair cells, and NHL may result from impaired cochlear nerve function.

In this study, we aimed to use multimedia data obtained by magnetic resonance imaging (MRI) scanning [2] to differentiate left-sided SNHL from right-sided SNHL. The basis for detection is that SNHL patients show slight to severe structural changes in specific brain regions. Detection by the human eye is unreliable, since human eyes cannot perceive slight atrophy. Thus, artificial intelligence is employed in this study to develop a computer-aided diagnosis (CAD) system.

Traditional CAD systems mainly used the discrete wavelet transform (DWT) [3,4,5] to learn global image features, and then employed the latest pattern recognition tools. For example, Mao, Ma and Tian [6] used DWT to analyze local field potential signals. Ikawa [7] employed DWT to perform auditory brainstem response (ABR) operation. Nayak, Dash and Majhi [8] employed DWT to identify brain images, using AdaBoost with random forests as classifiers. Lahmiri [9] utilized three multi-resolution techniques: DWT, empirical mode decomposition (EMD), and variational mode decomposition (VMD). Chen and Chen [10] used principal component analysis (PCA) and generalized eigenvalue proximal support vector machine (GEPSVM). Gorriz and Ramírez [11] proposed a directed acyclic graph support vector machine method.

Nevertheless, DWT suffers from the disadvantage of translational variance [12]. That is, even a slight translation may lead to a different decomposition result [13]. Besides, for a 256 × 256 image, DWT decomposition leads to a feature space of larger dimension (~10^6) than the original image (~10^5), so it requires dimension reduction techniques such as principal component analysis [14].
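The translational variance can be illustrated with a minimal sketch. The Haar filter, the toy 8-sample pulse, and the helper name `haar_detail` are assumptions chosen for illustration and are not taken from the study; the point is only that shifting the same pulse by one sample changes the detail coefficients.

```python
import math

def haar_detail(x):
    """One-level Haar detail coefficients with even-indexed downsampling."""
    s = 1 / math.sqrt(2)
    return [s * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]

# Toy pulse (illustrative only) and the same pulse moved right by one sample.
signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shifted = [0.0] + signal[:-1]

detail_energy = sum(c * c for c in haar_detail(signal))
shifted_energy = sum(c * c for c in haar_detail(shifted))

# The pulse is aligned with the sample pairs in `signal` (zero detail energy)
# but straddles two pairs in `shifted` (non-zero detail energy).
print(detail_energy, shifted_energy)
```

A feature computed directly from such coefficients would therefore differ between two images that differ only by a small shift.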

To solve this problem, we introduced a relatively new technique, discrete wavelet packet entropy (DWPE) [15,16,17], which yields merely a few (~10^1) translation-invariant features. Besides, we used a single-hidden layer neural network as the classifier, which was trained by the gradient descent with adaptive learning rate back propagation method.

2 Materials

Subjects were enrolled from the outpatients of the department of otorhinolaryngology and head-neck surgery and from the community. They were excluded if there was evidence of known psychiatric or neurological diseases, brain lesions, use of psychotropic medications, or contraindications to MR imaging.

In total, the study included 15 patients with left-sided SNHL (LSNHL), 14 patients with right-sided SNHL (RSNHL), and 20 age- and sex-matched healthy controls (HC), as shown in Table 1.

Table 1. Subject characteristics

Preprocessing was implemented with the FMRIB Software Library (FSL) v5.0. The brain extraction tool (BET) was utilized to extract brain tissues. The results are shown in Fig. 1. Then, the extracted brains of all subjects were registered to MNI space. Three experienced radiologists were instructed to select the slice (around the 40th) that was most distinctive between SNHLs and HCs.

Fig. 1. The green lines label the edge of the BET result (Color figure online)

3 Methodology

3.1 Discrete Wavelet Packet Transform

In the field of signal processing, the standard discrete wavelet transform (DWT) [18, 19] decomposes the given signal at each level by submitting the previous approximation subband to the quadrature mirror filters (QMF) [20]. Its even-indexed downsampling causes the translational variance problem [21].

On the other hand, the discrete wavelet packet transform (DWPT) [22] is an improvement on the standard DWT. DWPT passes both the approximation and the detail coefficients of the previous decomposition level to the QMF, so it creates a full binary tree [23]. In general, DWPT offers more features than DWT at the same decomposition level [24].

Suppose x represents the original signal, c the channel index, d the decomposition level, p the position parameter, D the decomposition coefficients, and ψ the wavelet function, then DWPT is calculated as below:

$$ D_{p}^{c,d} = \int_{ - \infty }^{\infty } {x(t)\psi_{c} (2^{ - d} t - p){\text{d}}t} $$
(1)

At decomposition level d, 2^d sequences are yielded. Based on the d-level decomposition, the results at level (d + 1) are:

$$ D_{k}^{2c,d + 1} = \sum\limits_{p \in Z} {h(p - 2k) \times D_{p}^{c,d} } ,D_{k}^{2c + 1,d + 1} = \sum\limits_{p \in Z} {l(p - 2k) \times D_{p}^{c,d} } $$
(2)
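The recursion in Eq. (2) can be sketched for a one-dimensional sequence as follows. This is a minimal illustration, not the authors' implementation: the length-2 Haar filter pair, the reading of h as high-pass and l as low-pass, and the function names `split` and `wavelet_packet` are all assumptions.

```python
import math

# Assumed Haar filter pair; the text does not state which wavelet was used.
S = 1 / math.sqrt(2)
H = [S, -S]  # h in Eq. (2), assumed here to be the high-pass filter
L = [S, S]   # l in Eq. (2), assumed here to be the low-pass filter

def split(node, filt):
    """D_k = sum_p filt(p - 2k) * D_p for a length-2 filter (p = 2k, 2k + 1)."""
    return [filt[0] * node[2 * k] + filt[1] * node[2 * k + 1]
            for k in range(len(node) // 2)]

def wavelet_packet(x, levels):
    """Return the 2**levels leaf sequences of the full binary tree."""
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            next_nodes.append(split(node, H))  # channel 2c
            next_nodes.append(split(node, L))  # channel 2c + 1
        nodes = next_nodes
    return nodes

x = [4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
leaves = wavelet_packet(x, levels=2)
print(len(leaves), [len(leaf) for leaf in leaves])
```

Unlike the DWT, every node is split again at each level, which is what produces the full binary tree.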

From Fig. 2, we can observe that, for an image, a d-level DWT offers in total (1 + 3d) coefficient subbands. In contrast, a d-level DWPT generates in total 4^d coefficient subbands. Thus, DWPT can provide much more information than DWT.

Fig. 2. Comparison between 2-level DWT and 2-level DWPT (x denotes an image, H denotes the high-pass filter result, L denotes the low-pass filter result)
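The two subband counts can be checked on a toy image with a short sketch. The Haar filters, the 8 × 8 synthetic image, and the helper names (`filter_rows`, `split2d`, `dwt2d`, `dwpt2d`) are assumptions made for illustration only.

```python
import math

S = 1 / math.sqrt(2)

def filter_rows(img, sign):
    """Haar filtering and downsampling along each row (+1 low-pass, -1 high-pass)."""
    return [[S * (row[2 * j] + sign * row[2 * j + 1]) for j in range(len(row) // 2)]
            for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def split2d(img):
    """Split one subband into four subbands: LL, LH, HL, HH."""
    out = []
    for row_sign in (1, -1):
        rows = filter_rows(img, row_sign)
        for col_sign in (1, -1):
            out.append(transpose(filter_rows(transpose(rows), col_sign)))
    return out

def dwt2d(img, levels):
    """DWT: only the approximation (LL) subband is split again."""
    approx, details = img, []
    for _ in range(levels):
        approx, *new_details = split2d(approx)
        details.extend(new_details)
    return [approx] + details

def dwpt2d(img, levels):
    """DWPT: every subband is split again at each level."""
    nodes = [img]
    for _ in range(levels):
        nodes = [sub for node in nodes for sub in split2d(node)]
    return nodes

# Synthetic 8 x 8 image, illustrative only.
image = [[float((r * 8 + c) % 5) for c in range(8)] for r in range(8)]
print(len(dwt2d(image, 2)), len(dwpt2d(image, 2)))  # 1 + 3*2 = 7 and 4**2 = 16
```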

3.2 Shannon Entropy

Entropy was originally utilized to measure the degree of disorder of a system [25]. It was generalized by Shannon to measure the information contained in a given message [26]. Suppose m is the index of the grey level, h_m the probability of the m-th grey level, and T the total number of grey levels; then the Shannon entropy S is:

$$ S = - \sum\nolimits_{m = 1}^{T} {h_{m} \log_{2} (h_{m} )} $$
(3)

When h_m equals zero, the value of 0 log2(0) is taken to be 0 [27]. We calculated the Shannon entropies of all subbands obtained from DWPT, and termed the results discrete wavelet packet entropy (DWPE). A brain image of size 256 × 256 originally has 65,536 features. A two-level DWPE reduces these 65,536 features to only 2^4 = 16 features.
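A minimal sketch of Eq. (3) applied per subband is given below. The quantization of coefficients into 256 grey levels, the toy subbands, and the function names `shannon_entropy` and `dwpe` are assumptions; the text does not specify how the probabilities h_m were estimated.

```python
import math
from collections import Counter

def shannon_entropy(values, levels=256):
    """Eq. (3) on a histogram of `values` quantized into `levels` grey levels.

    Empty grey levels never enter the sum, which matches the
    convention that 0 * log2(0) is taken to be 0.
    """
    lo, hi = min(values), max(values)
    if hi == lo:  # a constant subband carries no information
        return 0.0
    scale = (levels - 1) / (hi - lo)
    counts = Counter(round((v - lo) * scale) for v in values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def dwpe(subbands):
    """One entropy value per subband, each subband given as a list of rows."""
    return [shannon_entropy([v for row in band for v in row]) for band in subbands]

# Three toy 2 x 2 subbands, illustrative only.
toy_subbands = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[5.0, 5.0], [5.0, 5.0]],
    [[0.0, 1.0], [2.0, 3.0]],
]
features = dwpe(toy_subbands)
print(features)
```

Applied to the 16 subbands of a two-level DWPT, the same function would return a 16-element feature vector.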

3.3 Single-Hidden Layer Neural Network

The features were then fed into a classifier. Many classifiers exist in various fields, such as logistic regression [28], linear regression classifier [29, 30], extreme learning machine [31], decision tree [32, 33], etc.

In this study, we chose a single-hidden layer neural network (SLNN) [34] as the classifier due to its superior performance. We did not employ multiple hidden layers [35], because a one-hidden-layer model is expressive enough to represent our data. In an SLNN, the input nodes are connected to the hidden neuron layer, which is in turn connected to the output neuron layer.

The number of hidden neurons is initially assigned a large value and then decreased gradually until the classification performance peaks. The gradient descent with adaptive learning-rate back propagation (ALBP) algorithm [36] was employed to train the weights and biases of the SLNN. The initial learning rate was set to 0.01. The increasing ratio and decreasing ratio of the learning rate were set to 1.05 and 0.07, respectively. The maximum number of epochs was set to 5000.
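The adaptive learning-rate idea can be sketched on a toy one-parameter problem. The learning-rate values are those stated above; the acceptance rule, the 4% error-increase threshold `max_err_inc`, the quadratic toy loss, and the function name `train_albp` are assumptions made for illustration and may differ from the exact ALBP variant in [36].

```python
def train_albp(loss, grad, w, lr=0.01, lr_inc=1.05, lr_dec=0.07,
               max_err_inc=1.04, epochs=5000):
    """Gradient descent with an adaptive learning rate (illustrative sketch)."""
    err = loss(w)
    for _ in range(epochs):
        candidate = w - lr * grad(w)
        new_err = loss(candidate)
        if new_err > err * max_err_inc:
            # Error grew too much: discard the step and shrink the learning rate.
            lr *= lr_dec
        else:
            if new_err < err:
                # Error decreased: keep the step and grow the learning rate.
                lr *= lr_inc
            w, err = candidate, new_err
    return w, err, lr

# Toy problem (not from the study): minimize (w - 3)^2 starting from w = 0.
# A short run of 200 epochs is enough for this toy loss.
w, err, lr = train_albp(lambda w: (w - 3.0) ** 2,
                        lambda w: 2.0 * (w - 3.0),
                        w=0.0, epochs=200)
print(w, err, lr)
```

In the SLNN setting, `w` would be the full set of weights and biases and `grad` would be obtained by back propagation.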

4 Experiments and Results

4.1 DWPT Result

The 2-level DWPT result of a left-sided SNHL image is shown in Fig. 3. Here we can see in total 4 subbands are generated for 1-level decomposition, and 16 subbands are generated for 2-level decomposition.

Fig. 3. DWPT of a left-sided sensorineural hearing loss image

4.2 Accuracy Performance

We repeated 5-fold cross validation [37] 10 times. The BP algorithm achieved an overall accuracy of 87.14% (left side of Table 2), and the ALBP algorithm achieved an overall accuracy of 95.31% (right side of Table 2). In the table, y/z means that y out of z instances were correctly identified.
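The evaluation protocol can be sketched as follows, assuming plain (non-stratified) random shuffling in each run; the text does not say how the folds were drawn. The names `repeated_kfold`, `overall_accuracy`, and the `evaluate` callback are hypothetical, and the placeholder classifier simply counts every test instance as correct.

```python
import random

def repeated_kfold(n_samples, k=5, runs=10, seed=0):
    """Yield (run, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for run in range(runs):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for each run
        for fold in range(k):
            test = order[fold::k]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield run, fold, train, test

def overall_accuracy(n_samples, evaluate, **kwargs):
    """Sum correct predictions over all folds of all runs.

    `evaluate(train, test)` is a hypothetical callback that trains on `train`
    and returns the number of correctly classified instances in `test`.
    """
    correct = total = 0
    for _, _, train, test in repeated_kfold(n_samples, **kwargs):
        correct += evaluate(train, test)
        total += len(test)
    return correct, total

# 15 + 14 + 20 = 49 subjects; the placeholder counts every instance as correct.
correct, total = overall_accuracy(49, lambda train, test: len(test))
print(correct, total)
```

With 49 subjects and 10 runs, each subject is tested once per run, so the overall accuracy is computed over 490 test predictions.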

Table 2. Accuracy performance by BP and ALBP (R = Run; F = Fold; T = Total)

The results of the 10 repetitions of 5-fold cross validation indicate that the proposed ALBP performs better than the classical BP algorithm. The reason is that the adaptive learning rate can accelerate the training procedure [38]. In standard BP, the learning rate is fixed, and thus the performance is sensitive to the initial weights [39]. The left side of Table 2 shows that the accuracy of BP varies from 79.59% to 91.84% across runs. In contrast, ALBP makes the learning rate responsive to the local error surface, and thus it is less sensitive than BP. The right side of Table 2 shows that the accuracy of ALBP varies from 91.84% to 97.76% across runs. Thus, ALBP is much more stable than BP.

4.3 Comparison

Finally, we compared our DWPE + SLNN + ALBP approach with the following three methods: (i) the combination of fractional Fourier transform (FRFT) and principal component analysis (PCA) [40], abbreviated as FRFT + PCA; (ii) the combination of wavelet entropy (WE) and decision tree (DT) [41], abbreviated as WE + DT; and (iii) the hybrid system based on wavelet entropy (WE) and Markov random field (MRF) [42], abbreviated as WE + MRF.

Table 3 shows that our method achieves an overall accuracy of 95.31%, which is superior to the other three methods: FRFT + PCA [40], WE + DT [41], and WE + MRF [42]. There may be two reasons. First, our method used DWPE, which combines two successful components, DWPT and Shannon entropy. Second, the wavelet packet transform is more efficient than the fractional Fourier transform in image texture extraction. In the future, we shall try advanced classifiers, such as sparse autoencoder [43], convolutional neural network [44], and shared-weight neural network [45].

Table 3. Comparison with state-of-the-art methods

5 Conclusions

We developed a new computer-aided diagnosis system in this paper for detecting unilateral hearing loss, viz., left-sided or right-sided. The experiments gave promising results. In the future, we shall collect more data to further validate our method.