1 Introduction

Content-based Image Retrieval (CBIR) is a major task in image processing and is widely used in applications such as security, medicine, and environmental monitoring [1, 2]. CBIR efficiently describes the visual information of images, which can then be used to classify and retrieve medical and natural images [3]. A CBIR system finds the images in a database that are most similar to a given query image using distance metrics [4]. It performs two essential tasks: indexing and searching [5]. The first involves extracting suitable feature vectors (FVs) from the images and storing them in the database. The second computes the FV of a query image and compares it with the FVs stored in the database [6]. The CBIR system represents each image in its repository as an FV [7]. Feature extraction transforms the key points and regions of an image, which contain raw pixel values, into a compact set of values [8]. Features are the visual representations of an image; in general, texture, color, and shape are the low-level features that capture its various perceptual aspects [9, 10]. Texture is an important characteristic of an image, and shape descriptors describe the shape of a specific region of an image [11]. The search is then carried out to find the images related to the extracted features [12]. An image consists of an immense number of features that are stored in digital devices.

Currently, a huge quantity of medical images is being produced in hospitals all across the world to diagnose diseases. These medical images are stored in the Digital Imaging and Communications in Medicine (DICOM) format, and the volume of images produced by medical imaging devices has increased vastly [13, 14]. Due to their importance, these images should be examined to acquire a better interpretation of the human body [15]. Managing, indexing, and retrieving such a large collection of images manually is both expensive and time-consuming [16]. CBIR systems have helped physicians identify disease early and take appropriate steps to treat it [17]. Finding the right images in the right databases is essential for the proper diagnosis and treatment of disease [18], yet extracting the appropriate and needed information automatically from a huge medical imaging dataset is challenging [19]. With the increasing amount of medical image data, the need for robust and efficient image retrieval and search systems has become more critical [20]. This paper presents an efficient content-based medical image retrieval (CBMIR) system based on a new Canny steerable texture filter (CSTF) and a Brownian motion weighting deep learning neural network (BMWDLNN).

The rest of the paper is structured as follows: Sect. 2 discusses various existing methods related to the CBMIR system; Sect. 3 summarizes the main contributions of this work; Sect. 4 elaborates the proposed methodology; the experimental results for the proposed system are discussed in Sect. 5; the paper is concluded in Sect. 6.

2 Literature survey

2.1 Image retrieval

Swati et al. [21] presented a CBMIR system for brain tumors using T1-weighted contrast-enhanced magnetic resonance images (CE-MRI). The system was developed using pre-trained VGG19 on a large ImageNet dataset (more than 1.2 million labeled images). The Closed-Form Metric Learning (CFML) distance measurement technique was carried out to determine the similarity between the extracted features of a database and the test/query images. This method performed better than the state-of-the-art methods on the CE-MRI dataset. The distance learning task results in an optimization problem, which makes it difficult to provide a closed-form solution.

Sundararajan et al. [22] proposed to retrieve Avascular Necrosis images using a Deep Belief-Convolutional neural network (DB-CNN) for feature description. Initially, a Median Filter (MF) was used to eradicate the image noise, and the images were then resized. For the retrieval task, the modified Hamming distance (MHD) was evaluated to determine the similarity between the database and query images. The test results showed that the work was superior to existing techniques; however, this method is limited to small datasets.

Cai et al. [23] presented a CBMIR framework based on CNN and hash coding. A Siamese network (SN) was used with image pairs as inputs. Then, the compact binary hash codes of the query and database images were computed, and these hash codes were compared for the retrieval task. Two experiments were conducted on the cancer imaging archive-computed tomography (TCIA-CT) dataset and the vision and image analysis group/international early lung cancer action program (VIA/I-ELCAP) dataset. According to the results, the method outperformed conventional hash algorithms and CNN methods. However, because of its slower learning process, the Siamese network requires more training time.

Shinde et al. [24] proposed a series of local neighborhood wavelet feature descriptors (LNWFD) for CBMIR. The main components of the system are wavelet decomposition, feature extraction, and similarity measurement. A triplet half-band filter bank (THFB) was used to obtain the four sub-bands of the wavelet decomposition. The relationships among the wavelet coefficients were then computed at each sub-band to form the LNWFD. The Manhattan distance was calculated to determine the similarity between the query and the database feature vectors. The retrieval tests were performed over OASIS-MRI and NEMA-CT for the top ten matches; the average retrieval precision (ARP) over these databases was 74.57% and 99.51%, respectively. Similarly, this method was tested on the Emphysema-CT database for the top 50 matches and achieved 55.51% ARP. The computation errors in the THFB affect the regularity of the wavelets.

Owais et al. [25] developed a classification-based retrieval system for multimodal medical images that uses an enhanced residual network (ResNet) as its artificial intelligence component. The feature vector was extracted from the last convolutional layer and returned as a deep FV. The Euclidean distance was then computed to compare the query FV one by one with the generated database FVs. The test phase demonstrated that the deep-feature-based variable node classification framework could retrieve classes with better accuracy than previous methods. However, the increased number of layers in the network reduces the efficiency of the system.

Karthik et al. [26] proposed an approach to classify medical images using CNN, the results of which were used for supporting content-based medical image retrieval. For experimental evaluation, Image CLEF 2009 dataset was considered and a classification task was performed based on body orientation.

2.2 Image denoising

Image denoising is an important step in medical image analysis. The acquired medical images may be corrupted or contain artifacts introduced during acquisition, which leads to incorrect analysis. Gai [27] presented a color image denoising technique via monogenic matrix-based sparse representation. The technique treats the color image as a monogenic matrix, which transforms the independent color channels into a whole. A dictionary learning method was then designed using the monogenic matrix, and monogenic-based orthogonal matching was used in the sparse coding stage. Jia et al. [28] presented a novel cascading U-Nets architecture with multiscale dense processing for image denoising; the technique was good at edge recovery and structure preservation when denoising real noisy images. Jia et al. [29] proposed a color image denoising technique based on a pixel-attention CNN with a color correlation loss. The pixel-attention mechanism generates pixel-wise attention maps that help remove random noise, while the color correlation loss exploits color correlation to further improve denoising performance on noisy color images. Experimental results on several standard datasets demonstrated state-of-the-art (SOTA) performance of the proposed method.

2.3 Feature dimension reduction

A large number of input features can make a predictive modeling task more difficult. The difficulty can be reduced by deriving a smaller number of input features from the original data. Zhang et al. [30] developed an algorithm named dimension reduction window principal component analysis (DRWPCA). It performs dimension reduction by analyzing the correlation between dimensions, so the physical meaning of the original data set is retained. It uses mathematical statistics to obtain the correlation coefficient, i.e., the degree of correlation between attributes; features with high correlation are removed so as to reduce the dimension, and the original data need not be mapped onto another space for processing. Sanchez et al. [31] proposed a new feature relevance measure for star coordinates plots associated with the class of linear dimensionality reduction mappings defined through the solutions of eigenvalue problems, such as linear discriminant analysis or principal component analysis. This approach leads to enhanced feature subsets for class separation or variance maximization in the plots for numerous data sets of the UCI repository.

3 Main contribution of the work

Medical image retrieval has become highly important in imaging research, clinical surgery, and pattern recognition. Retrieval and classification performance depends on image features such as texture, shape, color, and local visual features. Many algorithms have been developed to improve the retrieval performance of medical images. Because medical images consist of texture-like regions, existing traditional texture features are used to represent the images in various medical image retrieval systems. The local feature descriptors presented in the published literature exploit the relationship of a reference pixel with its neighboring pixels or among the surrounding neighbors, but at the expense of high dimensionality [38]. To overcome these issues, a CSTF feature descriptor is proposed for efficient medical image retrieval and classification.

The key features of the proposed method are: (1) noise reduction using a Modified Kuan Filter (MKF), which reduces the speckle noise in medical images; (2) a novel feature descriptor, introduced because the existing feature descriptors are sensitive to image noise and their semantic representation; for this reason, the proposed feature descriptor uses a steerable texture filter; and (3) the Mean Coefficient Correlation Component Analysis (MCCCA) dimensionality reduction technique, which reduces the complexity and computational cost of the retrieval and classification tasks. The efficiency of the proposed approach is validated through experimentation over four medical image databases.

4 Proposed medical image retrieval system

Content-based image retrieval (CBIR) refers to the process of retrieving similar images from a database for a given query image. Due to the increasing number of medical images, improving the image retrieval process in the medical field has become challenging. The existing methods developed for medical image retrieval have several drawbacks, such as sensitivity to image corruption, performance degradation, and long retrieval times. To overcome these issues, this paper proposes an efficient CBMIR system based on a new CSTF and BMWDLNN. The proposed work contains two phases, namely (1) training and (2) testing. In the training phase, the database images undergo the following processes: noise reduction, contrast enhancement, feature extraction, dimensionality reduction, classification, and score value calculation. In the testing phase, the same processes are carried out for the query image. Primarily, the noise present in the image is reduced using the Modified Kuan Filter (MKF). Next, the contrast of the image is enhanced using the Gaussian Linear Contrast Stretching Model (GLCSM). Thereafter, the important features are extracted from the contrast-enhanced image by means of the Canny steerable texture filter (CSTF), and the dimensions of the extracted features are reduced with the help of Mean Coefficient Correlation Component Analysis (MCCCA). Then, the dimensionality-reduced features are given to the Brownian motion weighting deep learning neural network (BMWDLNN) classifier. For the classified images, the score value is calculated using the Harmonic Mean-based Fisher Score (HMFS). After calculating the score value, different distance values are measured and the average of all the distance values is determined. Using this average, the similar images with the minimum average value are retrieved. The proposed CBMIR system is shown in Fig. 1.

Fig. 1 Proposed CBMIR system

4.1 Noise reduction

Initially, the input medical image \(I\) is taken from the dataset. Then, the noise in the image is reduced using the Modified Kuan Filter (MKF) to attain a better retrieved image. The Kuan filter is a well-known image despeckling filter used in image processing systems. Several iterations of the Kuan filter can greatly reduce the noise; however, small details may be lost due to the repetitive smoothing operation. Therefore, the Kuan filter is modified in two ways: the exponential weight factor and the geometric mean calculation. The exponential weight acts as a scale factor, so the repetitive smoothing operation is easily avoided. In addition, the standard Kuan filter considers the local arithmetic mean of the image window; this mean usually does not cover all directions of the pixels in the image, which leads to repeated iterations. To avoid this problem, the geometric mean is calculated instead, which covers all the pixel values in the window. The weighting function in the Kuan filter is defined as

$$ \varpi = \exp \left( {\frac{{1 - \frac{{P_{{\text{c}}}^{{2}} }}{{P_{{\text{I}}}^{2} }}}}{{1 + P_{{\text{c}}}^{{2}} }}} \right), $$
(1)

where \(P_{{\text{c}}}\) and \( P_{{\text{I}}}\) are the variation coefficients of speckle \({\text{c}}\) and input image \({\text{I}}\). The final despeckled image \(I_{{\text{d}}}\) is obtained as

$$ I_{{\text{d}}} = m_{{{\text{fw}}}} + \varpi \left( {g_{{{\text{fw}}}} - m_{{{\text{fw}}}} } \right), $$
(2)

where \(m_{{{\text{fw}}}} \) is the geometric mean of pixel values in the filter window \({\text{fw}}\), and \(g_{{{\text{fw}}}}\) is the center pixel in \({\text{fw}}\).
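
To make the filtering step concrete, a minimal NumPy sketch of Eqs. (1)–(2) is given below. The window size, the speckle variation coefficient \(P_{\text{c}}\), and the local estimate of \(P_{\text{I}}\) are illustrative assumptions, not settings reported in this work.

```python
import numpy as np

def modified_kuan_filter(img, win=7, speckle_cv=0.25, eps=1e-8):
    """Sketch of the modified Kuan despeckling step (Eqs. 1-2).

    For every pixel, the geometric mean m_fw of the filter window replaces
    the usual arithmetic mean, and the exponential weight of Eq. (1) blends
    the centre pixel g_fw with m_fw. The window size and the speckle
    variation coefficient `speckle_cv` (P_c) are illustrative choices.
    """
    img = np.asarray(img, dtype=np.float64) + eps    # keep values positive for the log
    pad = win // 2
    padded = np.pad(img, pad, mode='reflect')
    out = np.empty_like(img)

    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + win, j:j + win]
            m_fw = np.exp(np.log(window).mean())           # geometric mean of the window
            g_fw = img[i, j]                               # centre pixel
            p_i = window.std() / (window.mean() + eps)     # local variation coefficient
            w = np.exp((1.0 - speckle_cv**2 / (p_i**2 + eps))
                       / (1.0 + speckle_cv**2))            # Eq. (1)
            out[i, j] = m_fw + w * (g_fw - m_fw)           # Eq. (2)
    return out
```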

4.2 Contrast enhancement

The contrast of the despeckled image \(I_{{\text{d}}}\) is enhanced using the Gaussian Linear Contrast Stretching Model (GLCSM). The Gaussian model enhances image contrast better than other algorithms. However, the existing Gaussian model uses histogram equalization, which is indiscriminate in the contrast enhancement process and also increases the contrast of the background noise. To address this issue, the proposed methodology uses linear contrast stretching within the Gaussian model. It mainly uses point operations to correct pixel gray values, linearly stretches the gray values of the image, and enhances the gray areas of interest while suppressing the indifferent gray areas. The Gaussian model includes three steps: modeling, partitioning, and mapping. In modeling, each pixel of the input image \(I_{{\text{d}}}\) is modeled by a Gaussian distribution. The gray level distribution function \( G(I_{{\text{d}}} |p)\) is expressed as

$$ G\left( {I_{{\text{d}}} |p} \right) = \mathop \sum \limits_{i = 1}^{N} c_{i} {\text{PDF}}\left( {I_{{\text{d}}} \left( {a,b} \right)} \right), $$
(3)
$$ {\text{PDF}}\left( {I_{{\text{d}}} \left( {a,b} \right)} \right) = \sqrt {\frac{1}{{2\pi v_{{{\text{c}}_{i} }}^{2} }}} \exp \left( { - \frac{{\left( {I_{{\text{d}}} - M_{{{\text{c}}_{i} }} } \right)^{2} }}{{2v_{{{\text{c}}_{i} }}^{2} }}} \right), $$
(4)

where \({\text{PDF}}\left( \bullet \right) \) denotes the probability density function, \(c_{i} \) is the weights associated with the \(i{\text{th}} \) Gaussian distribution, and \(M_{{{\text{C}}_{i} }} ,v_{{{\text{c}}_{i} }}\) are the mean and variance of the \(i{\text{th}}\) component. \(a,b\) are the pixel values of the image, \(N \) is the number of mixture components, and \(p \) is a parameter estimated using an expectation–maximization algorithm. The probability of components is chosen to satisfy the following constraints:

$$ \mathop \sum \limits_{i = 1}^{N} D\left( {c_{i} } \right) = 1\;\left( {0 \le D\left( {c_{i} } \right) \le 1} \right). $$
(5)

Then, the number of training set \(I_{{\text{d}}}\) is drawn independently to estimate parameters \(M_{{{\text{C}}_{i} }} ,v_{{{\text{c}}_{i} }}\) with the mixture of components \(c_{i}\). The parameters \( M_{{{\text{C}}_{i} }} ,v_{{{\text{c}}_{i} }} ,c_{i}\) are estimated by maximizing the log-likelihood function \(L\left( p \right)\) of the expectation–maximization method as

$$ L\left( p \right) = \mathop \sum \limits_{i = 1}^{N} \log \left( {G(I_{d} ,c_{i} |p)} \right). $$
(6)

The estimated parameters are denoted as

$$ p = \left\{ {M_{{{\text{c}}_{i} }} ,v_{{{\text{c}}_{i} }} ,c_{i} } \right\}. $$
(7)

Then, the expectation and maximization steps are involved to estimate the membership probabilities of the parameters and to update the new values of the parameters. After that, the image partitioning is done to represent the image in a way that is easier to analyze. For partitioning, all the intersection points within the dynamic range of the image are detected. Thereafter, the quality of the image is highlighted using the Linear Contrast Stretching (LCS) model, which provides the output gray level needed for further processing. The output gray levels are obtained as

$$ I_{{{\text{gout}}}} \left( {a,b} \right) = 255*\frac{{\left( {I_{{{\text{gin}}}} \left( {a,b} \right)} \right) - {\text{min}}}}{{{\text{max}} - {\text{min}}}}, $$
(8)

where the input gray level intervals \(I_{{{\text{gin}}}} \left( {a,b} \right)\) are converted to the output gray level intervals \(I_{{{\text{gout}}}} \left( {a,b} \right)\), \(255\) is the dynamic range of the image, and \({\text{min}}\) and \({\text{max}}\) are the minimum and the maximum intensity values of the image. After the output intervals are mapped to the corresponding input intervals, the contrast-enhanced image \(I_{{{\text{ce}}}} \left( {a,b} \right)\) is obtained.
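
The stretching step of Eq. (8) can be sketched as follows. The Gaussian-mixture modeling and partitioning of Eqs. (3)–(7) are abstracted into an optional list of gray-level intervals, so this is a simplified stand-in for the GLCSM mapping, not its exact implementation.

```python
import numpy as np

def linear_contrast_stretch(img, intervals=None):
    """Sketch of the LCS mapping of Eq. (8).

    `intervals` is an optional list of (low, high) gray-level intervals
    produced by the Gaussian-mixture partitioning of Eqs. (3)-(7); when it
    is None, the stretch is applied over the full dynamic range.
    """
    img = np.asarray(img, dtype=np.float64)
    if intervals is None:
        intervals = [(img.min(), img.max())]

    out = img.copy()
    for lo, hi in intervals:
        mask = (img >= lo) & (img <= hi)
        if hi > lo:
            out[mask] = 255.0 * (img[mask] - lo) / (hi - lo)   # Eq. (8)
    return np.clip(out, 0, 255).astype(np.uint8)
```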

4.3 Feature extraction

After contrast enhancement, the features of the image \(I_{{{\text{ce}}}}\) are extracted using the Canny steerable texture filter (CSTF) feature descriptor. This novel feature descriptor is used because the existing feature descriptors are sensitive to image noise and their semantic representation also depends on the shapes of the objects in the image. For this reason, the proposed feature descriptor uses the steerable texture filter. It extracts the texture, edge, shape, and wavelet features, among others.

The CSTF feature descriptor includes five steps: noise reduction, gradients calculation, non-maximum suppression, double thresholding, and edge tracking by hysteresis. In the first step, the noise present in the image is reduced using the steerable texture filter. This step is to avoid the issue of assuming the noise as edges and also to extract the texture features in addition to the edges. The image \(I_{{{\text{ce}}}} \left( {a,b} \right)\) is applied to the steerable filter and the smoothened image \(I_{{{\text{SF}}}} \left( {a,b} \right)\) is obtained as

$$ I_{{{\text{SF}}}} \left( {a,b} \right) = {\text{SF}}\left( {a,b} \right)*I_{{{\text{ce}}}} \left( {a,b} \right), $$
(9)

where \({\text{SF}}\left( {a,b} \right)\) is the steerable texture filter response, which can be expressed as

$$ {\text{SF}}\left( {a,b} \right) = \mathop \sum \limits_{z = 1}^{Z} U_{z} \left( \gamma \right)\delta_{z} \left( {a,b} \right), $$
(10)

where \(U_{z} \left( \gamma \right)\) is the interpolation function with respect to the orientation function \(\gamma\), and \(\delta_{z} \left( {a,b} \right)\) is the impulse response at \(\gamma\). In gradient calculation, the magnitude and angle are calculated for the horizontal and vertical gradients as follows:

$$ \left| \hbar \right| = \left( {\tau_{{{\text{hor}}}} \left( {I_{{{\text{SF}}}} \left( {a,b} \right)} \right)^{2} + \tau_{{{\text{ver}}}} \left( {I_{{{\text{SF}}}} \left( {a,b} \right)} \right)^{2} } \right)^{\frac{1}{2}} , $$
(11)
$$ \gamma = \arctan \left( {\frac{{\tau_{{{\text{hor}}}} }}{{\tau_{{{\text{ver}}}} }}} \right), $$
(12)

where \(\left| \hbar \right| \) denotes the magnitude of the horizontal and vertical gradients \(\tau_{{{\text{hor}}}} ,\tau_{{{\text{ver}}}}\). In the non-maximum suppression step, two neighboring pixels \(a_{n} ,b_{n} \) are selected in the positive and negative gradient directions. Then, the non-maximal (duplicate) pixels are suppressed by

$$ I_{{{\text{dup}}}} \left( {a,b} \right) = \left\{ {\begin{array}{*{20}l} {{\text{no}}\;{\text{changes}}} \hfill & {{\text{if}}\left( {\hbar \left( {a,b} \right) > \hbar \left( {a_{n} ,b_{n} } \right)} \right)} \hfill \\ {{\text{set}}\left( {a,b} \right) = 0} \hfill & {{\text{otherwise}}} \hfill \\ \end{array} .} \right. $$
(13)

Then, in double thresholding, the magnitudes are compared with the lower and higher threshold values to suppress the smaller gradients and obtain the stronger gradients and weaker gradients. The double thresholding \(I_{{{\text{DT}}}} \left( {a,b} \right)\) is done as

$$ I_{{{\text{DT}}}} \left( {a,b} \right) = \left\{ {\begin{array}{*{20}l} {\tau_{{{\text{smaller}}}} } \hfill & {{\text{if}}\;(\hbar < \wp_{{{\text{low}}}} )} \hfill \\ {\tau_{{{\text{stronger}}}} } \hfill & {{\text{if}}\;(\hbar > \wp_{{{\text{high}}}} )} \hfill \\ {\tau_{{{\text{weaker}}}} } \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right., $$
(14)

where smaller gradients are denoted as \(\tau_{{{\text{smaller}}}}\), stronger gradients are denoted as \(\tau_{{{\text{stronger}}}}\), weaker ones are denoted as \(\tau_{{{\text{weaker}}}}\), and the higher and lower thresholds are denoted as, \(\wp_{{{\text{high}}}} ,\;\wp_{{{\text{low}}}}\). Finally, the different features are identified and expressed as

$$ \psi_{{\text{s}}} = \left\{ {\psi_{1} ,\psi_{2} ,\psi_{3} , \ldots ,\psi_{f} } \right\}, $$
(15)

where \(f\) denotes the number of features, and \(\psi_{s} \) is the final feature set. The procedure of the proposed CSTF method is given in Algorithm 1.

Algorithm 1 CSTF feature extraction procedure

Algorithm 1 explains the steps involved in extracting the features of the images. The feature set \(\psi_{s} \) extracted by the CSTF method is then passed on to dimensionality reduction.
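
The five CSTF steps can be outlined in code as below. The steerable texture filter is approximated by a plain Gaussian smoothing, the non-maximum suppression is simplified to a 3 × 3 local-maximum test, and the final hysteresis edge tracking is omitted, so this sketch only illustrates the structure of Algorithm 1 under those assumptions.

```python
import numpy as np
from scipy import ndimage

def cstf_features(img, low=0.05, high=0.15, sigma=1.5):
    """Sketch of the CSTF steps (Eqs. 9-15).

    The returned 'feature set' simply concatenates the resulting maps; the
    full descriptor in the paper also encodes texture, shape, and wavelet
    information.
    """
    img = np.asarray(img, dtype=np.float64)

    # Step 1: smoothing standing in for the steerable texture filter (Eqs. 9-10)
    smoothed = ndimage.gaussian_filter(img, sigma=sigma)

    # Step 2: horizontal/vertical gradients, magnitude and angle (Eqs. 11-12)
    tau_hor = ndimage.sobel(smoothed, axis=1)
    tau_ver = ndimage.sobel(smoothed, axis=0)
    mag = np.hypot(tau_hor, tau_ver)
    gamma = np.arctan2(tau_hor, tau_ver)

    # Step 3: simplified non-maximum suppression (Eq. 13)
    local_max = ndimage.maximum_filter(mag, size=3)
    nms = np.where(mag >= local_max, mag, 0.0)

    # Step 4: double thresholding into smaller/weaker/stronger gradients (Eq. 14)
    lo_t, hi_t = low * mag.max(), high * mag.max()
    strong = (nms >= hi_t).astype(np.float64)
    weak = ((nms >= lo_t) & (nms < hi_t)).astype(np.float64)

    # Step 5: assemble the feature set psi_s (Eq. 15)
    return np.concatenate([nms.ravel(), strong.ravel(), weak.ravel(), gamma.ravel()])
```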

4.4 Dimensionality reduction

The dimensionality of a dataset refers to its number of input variables or features, and reducing this number is referred to as dimensionality reduction. The CSTF descriptor extracts features such as texture, edge, shape, and wavelet features. The extracted features have high dimensionality, which makes the retrieval and classification tasks more difficult and increases the computational cost. Therefore, the dimensionality of the features \(\psi_{s} \) is reduced using the Mean Coefficient Correlation Component Analysis (MCCCA) method. MCCCA is based on the idea of the correlation coefficient. The standard Principal Component Analysis (PCA) algorithm calculates the covariance matrix, but covariance only measures the directional relationship between two variables and does not show the strength of that relationship. For this reason, the proposed methodology uses the mean correlation coefficient, which measures the strength of the relationship more directly. In the MCCCA method, the mean of each feature dimension \(\overline{{\psi_{{\text{s}}} }}\) is calculated as

$$ \overline{\psi }_{s} = \frac{1}{S}\mathop \sum \limits_{s = 1}^{S} \psi_{s} , $$
(16)

where \(S\) is the number of input features. Then, the mean correlation coefficient \({\text{ Cc}}_{\psi }\) is calculated as

$$ {\text{Cc}}_{\psi } = \sum \frac{{\varsigma \left( {\psi_{s} ,\overline{\psi }_{s} } \right)}}{{D\left( {\psi_{s} } \right)D\left( {\overline{\psi }_{s} } \right)}}, $$
(17)

where \(\varsigma\) denotes the covariance of the input vectors, and \(D \) is the standard deviation of \(\psi_{s} ,\overline{\psi }_{s}\). For the correlation coefficient \({\text{Cc}}_{\psi }\), the eigenvalues are calculated as

$$ E_{m} \times \ell_{m} = Cc_{\psi } \times \ell_{m} , $$
(18)

where \(\ell_{m}\) represents the eigenvectors, and \(E_{m} \) denotes the eigenvalues. Then, the eigenvalues are sorted in descending order and the features are selected based on the \(m \) largest eigenvalues. After dimensionality reduction, the selected feature set is denoted as \(\psi_{r}\).
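
A compact sketch of the MCCCA reduction in Eqs. (16)–(18) follows. The exact form of the mean correlation coefficient in Eq. (17) is interpreted here as a standard correlation matrix over mean-centred, standardized features, which is an assumption.

```python
import numpy as np

def mccca_reduce(features, m=64):
    """Sketch of MCCCA dimensionality reduction (Eqs. 16-18).

    `features` is an (n_samples, n_features) matrix. A correlation matrix is
    built (one reading of Eq. 17), its eigen-decomposition is taken (Eq. 18),
    and the data are projected onto the m leading directions.
    """
    X = np.asarray(features, dtype=np.float64)
    X = X - X.mean(axis=0)                        # uses the feature means of Eq. (16)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = X / std
    corr = Z.T @ Z / X.shape[0]                   # correlation matrix, cf. Eq. (17)

    eigvals, eigvecs = np.linalg.eigh(corr)       # Eq. (18)
    order = np.argsort(eigvals)[::-1][:min(m, X.shape[1])]   # m largest eigenvalues
    return X @ eigvecs[:, order]                  # reduced feature set psi_r
```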

4.5 Classification of image category

Next, the Brownian motion weighting deep learning neural network (BMWDLNN) classifier is applied to the reduced features \(\psi_{r}\). In general, in deep learning neural networks, the weight values are selected randomly. Random weight selection increases the execution time and causes over-classification problems. To avoid this, the proposed methodology uses the Brownian motion weighting factor. Initially, the features \(\psi_{r}\) are given to the neurons of the input layer. Once the inputs are received, the weight values for the corresponding input vectors are randomly generated as follows:

$$ \lambda_{r} = \left\{ {\lambda_{1} ,\lambda_{2} ,\lambda_{3} , \ldots \lambda_{N} } \right\}, $$
(19)

where \(\lambda_{r} \) is the randomly generated weight value. To avoid the existing problems of larger execution time and over-classification, the weight values are initialized using the BMW method as

$$ B_{M} \left( {\mathop \lambda \limits^{ \Rightarrow }_{r} } \right) = \chi (\varphi \left( {\lambda_{r} } \right))^{\varepsilon } , $$
(20)

where \(B_{M}\) is the Brownian motion function, \(\chi\) is a constant, \(\varepsilon\) is known as diffusion parameter, \(\varphi\) is the number of sudden motions, and \(\mathop \lambda \limits^{ \Rightarrow }_{r}\) is the new set of weight values initialized by BMW method. Then, the input features \(\psi_{r} \) and initialized weight values \(\mathop \lambda \limits^{ \Rightarrow }_{r}\) are mapped to the hidden layer where the product of these two values is summed up. After the values are inputted, the activation function is determined as

$$ \eta_{r} \left( {\mathop \sum \limits_{r = 1}^{N} \psi_{r} \mathop \lambda \limits^{ \Rightarrow }_{r} } \right) = \exp \left( { - \mathop \sum \limits_{r = 1}^{N} \psi_{r} \mathop \lambda \limits^{ \Rightarrow }_{r} } \right). $$
(21)

The output of the hidden layer \(\xi_{{{\text{hid}}}}^{r}\) is computed as

$$ \xi_{{{\text{hid}}}}^{r} = \delta + \sum \eta_{r} \left( {\mathop \sum \limits_{r = 1}^{N} \psi_{r} \mathop \lambda \limits^{ \Rightarrow }_{r} } \right)\mathop \lambda \limits^{ \Rightarrow }_{{r,{\text{hid}}}} , $$
(22)

where \(\delta \) is the bias value,\( \eta_{r} \) is the Gaussian activation function, and \(\mathop \lambda \limits^{ \Rightarrow }_{{r,{\text{hid}}}} \) is the weight values between the input and hidden layer. Finally, all the weight values are added at the output layer and the output values are attained as

$$ \xi_{{{\text{out}}}}^{r} = \delta + \mathop \sum \limits_{r = 1}^{N} \xi_{{{\text{hid}}}}^{r} \mathop \lambda \limits^{ \Rightarrow }_{{r,{\text{out}}}} , $$
(23)

where \(\mathop \lambda \limits^{ \Rightarrow }_{{r,{\text{out}}}} \) is the weight matrix between the hidden and output layers, and \( \xi_{{{\text{out}}}}^{r}\) is the output unit of the classifier, which gives the category of the input image. After classification, the categorized image sets \(W_{n}\) under the different classes are obtained.
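
The forward computation of Eqs. (19)–(23) can be sketched as below. The Brownian motion factor of Eq. (20) is interpreted as rescaling randomly drawn weights by a discrete Brownian path, and the constants \(\chi\) and \(\varepsilon\) are illustrative; the training procedure is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def brownian_weights(shape, chi=0.1, eps=0.5):
    """Eq. (20): rescale randomly generated weights (Eq. 19) by the factor
    chi * phi(lambda)^eps. phi is modelled here as the magnitude of a
    discrete Brownian path (cumulative sum of random increments) -- an
    interpretation, since the paper does not define phi computationally."""
    lam = rng.standard_normal(shape)                               # Eq. (19)
    phi = np.abs(np.cumsum(rng.standard_normal(shape), axis=0)) + 1e-8
    return chi * np.sign(lam) * phi ** eps                         # Eq. (20)

def bmwdlnn_forward(psi_r, n_hidden=32, n_classes=4, bias=0.1):
    """Forward pass of Eqs. (21)-(23): exponential (Gaussian-type) activation
    over the weighted inputs, then a linear output layer. np.abs inside the
    activation is added for numerical stability; Eq. (21) uses the plain sum."""
    psi_r = np.asarray(psi_r, dtype=np.float64).ravel()
    w_in = brownian_weights((psi_r.size, n_hidden))
    w_out = brownian_weights((n_hidden, n_classes))

    eta = np.exp(-np.abs(psi_r @ w_in))          # activation, cf. Eq. (21)
    xi_hidden = bias + eta                       # hidden-layer output, cf. Eq. (22)
    xi_out = bias + xi_hidden @ w_out            # output layer, Eq. (23)
    return int(np.argmax(xi_out))                # predicted image category
```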

4.6 Score value calculation

Here, the score value is calculated for the categorized image set \({W}_{n}\) using the Harmonic Mean-based Fisher Score (HMFS). The standard Fisher score uses the arithmetic mean vector; the proposed methodology uses the harmonic mean instead, which is rigidly defined and does not ignore any item of a series. The Fisher score of the image set \(W_{{{\text{fs}}}} \) is computed as

$$ W_{{{\text{fs}}}} = \frac{{\mathop \sum \nolimits_{l = 1}^{L} {\rm M}_{l} H_{l}^{2} \left( {W_{n} } \right)}}{{\mathop \sum \nolimits_{l = 1}^{L} {\rm M}_{l} J_{l}^{2} \left( {W_{n} } \right)}}, $$
(24)

where \(H_{l}\) is the harmonic mean and \(J_{l}\) is the standard deviation (SD) of \(n{\text{th }}\) image category in the \(l{\text{th}} \) classes, and \(M_{l}\) is the number of instances in the \(l{\text{th}}\) classes. The harmonic mean \(H_{l}\) is expressed as

$$ H_{l} \left( {W_{n} } \right) = \left( {\frac{{\sum \left( {\frac{1}{{W_{n} }}} \right)}}{n}} \right)^{ - 1} , $$
(25)

where \(n\) is the number of categories.
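
A short sketch of the HMFS computation in Eqs. (24)–(25) is given below. It assumes positive feature values (required by the harmonic mean) and returns a per-feature score vector, which is one possible reading of the equations.

```python
import numpy as np

def hmfs_score(W, labels):
    """Sketch of the harmonic-mean Fisher score (Eqs. 24-25).

    W is an (n_samples, n_features) array of classified feature vectors and
    `labels` holds their class indices. For each class l, the harmonic mean
    H_l and standard deviation J_l are formed and combined as in Eq. (24).
    """
    W = np.asarray(W, dtype=np.float64)
    labels = np.asarray(labels)
    num, den = 0.0, 0.0
    for l in np.unique(labels):
        W_l = W[labels == l]
        M_l = len(W_l)                                      # instances in class l
        H_l = 1.0 / np.mean(1.0 / (W_l + 1e-12), axis=0)    # harmonic mean, Eq. (25)
        J_l = W_l.std(axis=0)                               # standard deviation
        num += M_l * H_l ** 2
        den += M_l * J_l ** 2
    return num / (den + 1e-12)                              # Eq. (24)
```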

All the aforementioned procedures, such as noise reduction, contrast enhancement, feature extraction, dimensionality reduction, classification, and score value calculation are also done for the query image in the testing phase. After that, the Manhattan distance, Euclidean distance, Jaccard distance, Hamming distance, and the relative standard deviation are calculated between the score value of the input database image and query image.

4.7 Image retrieval

In this section, the nearest neighbors of the query image are identified and retrieved based on the average of various distance values. The average value determines the similarity between the classified input and the query image. The Fisher scores calculated for the input image and the query image are denoted as \(W_{{{\text{fs}}}}\) and \( Q_{{{\text{fs}}}}\), respectively. Then, the distance values between the Fisher scores of the input image and the query image are initialized as follows:

$$ Y_{n} = \left\{ {Y_{M} ,Y_{E} ,Y_{J} ,Y_{H} ,Y_{{{\text{rsd}}}} } \right\}, $$
(26)

where \(Y_{n}\) is the set of distance values, \(Y_{{\text{M}}}\) is the Manhattan distance, \(Y_{{\text{E}}}\) is the Euclidean distance, \(Y_{{\text{J}}}\) is the Jaccard distance, \(Y_{{\text{H}}} \) is the Hamming distance, and \(Y_{{{\text{rsd}}}}\) is the relative standard deviation,

where

$$ Y_{{\text{M}}} = \mathop \sum \limits_{j = 1}^{R} \left| {W_{{{\text{f}}s\left( j \right)}} - Q_{{{\text{fs}}\left( j \right)}} } \right|, $$
(27)
$$ Y_{{\text{E}}} = \sqrt {\mathop \sum \limits_{j = 1}^{R} \left( {W_{{{\text{fs}}\left( j \right)}} - Q_{{{\text{fs}}\left( j \right)}} } \right)^{2} } , $$
(28)
$$ Y_{{\text{J}}} = 1 - \alpha \left( {W_{{{\text{fs}}}} ,Q_{{{\text{fs}}}} } \right), $$
(29)
$$ Y_{{\text{H}}} = \beta_{n} \left( {W_{{{\text{fs}}}} \oplus Q_{{{\text{fs}}}} } \right), $$
(30)
$$ Y_{{{\text{rsd}}}} = \frac{{\sigma \left( {W_{{{\text{fs}}}} ,Q_{{{\text{fs}}}} } \right)}}{{\mu \left( {W_{{{\text{fs}}}} ,Q_{{{\text{fs}}}} } \right)}}, $$
(31)

wherein \(R \) denotes the number of dimensions, \(\alpha\) is the Jaccard coefficient, \(\beta_{n}\) is the number of ones after the XOR operation \(\oplus\) of \(W_{{{\text{fs}}}}\) and \(Q_{{{\text{fs}}}}\), \(\sigma\) is the SD, and \(\mu\) is the mean of the score values. Then, the average of all distance values is calculated as

$$ Y_{{{\text{avg}}}} = \frac{{\sum Y_{n} }}{n}, $$
(32)

where \(n\) denotes the number of distance values. The image with the minimum average distance value is considered the most similar and is retrieved.
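
The five distance measures of Eqs. (27)–(31) and their average (Eq. 32) can be sketched as follows. Since the Jaccard and Hamming distances operate on sets or bits, the real-valued scores are binarised by their mean here, which is an assumption not stated above.

```python
import numpy as np

def average_distance(W_fs, Q_fs):
    """Sketch of Eqs. (26)-(32): Manhattan, Euclidean, Jaccard, Hamming and
    relative-standard-deviation distances between the database score vector
    W_fs and the query score vector Q_fs, and their average."""
    W = np.asarray(W_fs, dtype=np.float64)
    Q = np.asarray(Q_fs, dtype=np.float64)

    y_m = np.abs(W - Q).sum()                                  # Manhattan, Eq. (27)
    y_e = np.sqrt(((W - Q) ** 2).sum())                        # Euclidean, Eq. (28)

    Wb, Qb = W > W.mean(), Q > Q.mean()                        # binarised scores
    union = (Wb | Qb).sum()
    y_j = (1.0 - (Wb & Qb).sum() / union) if union else 0.0    # Jaccard, Eq. (29)
    y_h = (Wb ^ Qb).sum()                                      # Hamming, Eq. (30)

    scores = np.concatenate([W, Q])
    y_rsd = scores.std() / (scores.mean() + 1e-12)             # RSD, Eq. (31)

    return float(np.mean([y_m, y_e, y_j, y_h, y_rsd]))         # Eq. (32)

# Retrieval: the database image whose score vector has the smallest average
# distance to the query score vector is returned as the most similar image.
```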

5 Results and discussion

In this section, the retrieval performance of the proposed method is evaluated by conducting several experiments using MATLAB.

5.1 Database description

In this work, the Extraction of Airways from CT 2009 (EXACT-09) [32], The Cancer Imaging Archive (TCIA) [33], National Electrical Manufacturers Association (NEMA-CT) [34], and Open Access Series of Imaging Studies (OASIS) [35] databases are used for the experiments. EXACT-09 and TCIA are publicly available databases. In this work, the images in EXACT-09 are grouped into 19 categories, whereas the images in the TCIA database are grouped into 8 categories; all images in both databases have dimensions of 512 × 512. The NEMA-CT database contains 315 CT images categorized into 9 categories. OASIS is a magnetic resonance imaging (MRI) dataset that contains scans of 421 subjects ranging in age from 18 to 96 years, grouped into 4 categories. Detailed descriptions of all databases can be found in [38] and [45].

5.2 Performance analysis of classification

In this section, the proposed BMWDLNN classifier is compared against the existing Adaptive Neuro-Fuzzy Inference System (ANFIS), Artificial Neural Network (ANN), and Naive Bayes classifiers. The performance analysis is done in terms of sensitivity, specificity, accuracy, Negative Predictive Value (NPV), False Positive Rate (FPR), False Negative Rate (FNR), Matthews Correlation Coefficient (MCC), False Detection Rate (FDR), and False Rejection Rate (FRR) using the above-mentioned datasets.

Figure 2 illustrates the comparative analysis of the proposed BMWDLNN classifier with the existing classifiers based on sensitivity, specificity, and accuracy for the EXACT-09, TCIA, NEMA-CT, and OASIS databases. For the EXACT-09 database, the proposed method attains a sensitivity of 0.941176, a specificity of 0.996732, and an accuracy of 0.993808; these values are improved by 70.58% and 83.82%, 3.43% and 4.65%, and 6.96% and 8.82% in sensitivity, specificity, and accuracy compared with the existing ANN and Naive Bayes, respectively. For the TCIA database, the sensitivity, specificity, and accuracy of the proposed method are 0.88785, 0.983979, and 0.971963, improvements of 5.14%, 6.14%, and 11.79% over the existing ANN. Likewise, for the NEMA-CT database, the proposed method achieves a sensitivity of 0.964286, a specificity of 0.995536, and an accuracy of 0.992063, improvements of 60.71%, 4.68%, and 10.91% over the existing ANN. For the OASIS database, the proposed method yields a sensitivity of 0.952381, a specificity of 0.984127, and an accuracy of 0.97619, improvements of 69.04%, 23.01%, and 34.52% over the existing Naive Bayes. The values of the existing approaches are lower than those of the proposed method for all four databases, which demonstrates that the proposed method is superior to the existing methods.

Fig. 2 Sensitivity, specificity, and accuracy for the a EXACT-09, b TCIA, c NEMA-CT, and d OASIS databases

The NPV, FPR, and FNR of the proposed and existing methods are analyzed for the EXACT-09, TCIA, NEMA-CT, and OASIS databases in Fig. 3. The NPV achieved by the proposed method is 0.996732 for EXACT-09, 0.983979 for TCIA, 0.995536 for NEMA-CT, and 0.984127 for OASIS, which are higher than those of the existing methods. In terms of FPR, the proposed method has lower values than the existing methods: 0.003268, 0.016021, 0.004464, and 0.015873 for the EXACT-09, TCIA, NEMA-CT, and OASIS databases, respectively. For FNR, the proposed method attains 0.058824, 0.11215, 0.035714, and 0.047619 for EXACT-09, TCIA, NEMA-CT, and OASIS, respectively. The NPV of the proposed method shows improvements of 4.92% and 3.90% for EXACT-09, 10.72% and 7.23% for TCIA, 10.12% and 7.36% for NEMA-CT, and 23.31% for OASIS over the existing ANFIS and ANN, respectively. The FPR of the proposed method is lowered by 4.61%, 7.20%, 8.92%, and 23.01% compared with the existing ANFIS for the EXACT-09, TCIA, NEMA-CT, and OASIS databases, respectively, and the FNR is lowered by 88.97%, 78.50%, 82.14%, and 70.23%. The higher NPV and lower FPR and FNR values demonstrate that the proposed classifier is more efficient.

Fig. 3 NPV, FPR, and FNR for the a EXACT-09, b TCIA, c NEMA-CT, and d OASIS databases

Figure 4 shows the comparative analysis of the proposed classifier with the existing classifiers in terms of MCC, FRR, and FDR; for an efficient classifier, the MCC value should be high and the FRR and FDR values should be low. The analysis is given for the EXACT-09, TCIA, NEMA-CT, and OASIS databases, for which the proposed method achieves MCC values of 0.937908, 0.871829, 0.959821, and 0.936508, FRR values of 0.058824, 0.11215, 0.035714, and 0.047619, and FDR values of 0.058824, 0.11215, 0.035714, and 0.047619, respectively. The MCC of the proposed method is improved by 73.13% over ANFIS, 56.39% over ANN, 61.57% over ANN, and 92.06% over Naive Bayes for the EXACT-09, TCIA, NEMA-CT, and OASIS databases, respectively. Compared with the existing ANFIS, the FRR and FDR of the proposed method are lowered by 88.97% and 88.64%, 78.50% and 74.49%, 82.14% and 80.42%, and 70.23% and 70.21% for the EXACT-09, TCIA, NEMA-CT, and OASIS databases, respectively. The above analysis indicates that the proposed BMWDLNN classifier performs better than the existing methods.

Fig. 4 MCC, FRR, and FDR for the a EXACT-09, b TCIA, c NEMA-CT, and d OASIS databases

5.3 Retrieval performance analysis

In this section, the proposed method’s performance is assessed in terms of Average Recall Rate (ARR), Average Precision Rate (APR), and F-score. The \({\text{ARR}}\), \({\text{APR}}\) and \( F_{{\text{score }}}\) are calculated as in the following equations:

$$ {\text{ARR}} = { }\frac{{\mathop \sum \nolimits_{i = 1}^{{N_{I} }} R_{i} }}{{N_{I} }}, $$
(33)
$$ {\text{Where}}\;\;{\text{Recall }}\left( R \right) = \frac{{{\text{No}}{\text{. of relevant images retrieved }}}}{{{\text{No}}{\text{. of relevant images in the database}}}}. $$
(34)
$$ {\text{APR}} = { }\frac{{\mathop \sum \nolimits_{i = 1}^{{N_{I} }} P_{i} }}{{N_{I} }}, $$
(35)
$$ {\text{Where}}\;\;{\text{Precision }}\left( P \right) = \frac{{{\text{No}}{\text{. of relevant images retrieved }}}}{{{\text{No}}{\text{. of retrieved images}}}}. $$
(36)
$$ F_{{{\text{score}}}} = \frac{{2 \times {\text{APR}} \times {\text{ARR}}}}{{{\text{APR}} + {\text{ARR}}}}, $$
(37)

where \(N_{I}\) denotes the total number of database images. The top 10 retrieved images for the given query image (one of the best results with 100% precision) of EXACT-09, TCIA, NEMA-CT and OASIS databases are shown in Figs. 5, 6, 7, and 8.
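
For reference, the APR, ARR, and F-score of Eqs. (33)–(37) can be computed as in the short sketch below, using illustrative per-query counts that are not taken from the reported experiments.

```python
import numpy as np

def retrieval_metrics(relevant_retrieved, n_retrieved, relevant_in_db):
    """Per-query precision and recall (Eqs. 34 and 36); their means over the
    query set give APR and ARR (Eqs. 33 and 35), combined into the F-score
    of Eq. (37)."""
    precision = np.asarray(relevant_retrieved, dtype=float) / n_retrieved
    recall = np.asarray(relevant_retrieved, dtype=float) / np.asarray(relevant_in_db, dtype=float)
    apr, arr = precision.mean(), recall.mean()
    f_score = 2 * apr * arr / (apr + arr)
    return apr, arr, f_score

# Example: three queries with top-10 retrieval, 9/10/8 relevant hits each,
# and 30 relevant images per category in the database (illustrative only).
print(retrieval_metrics([9, 10, 8], 10, [30, 30, 30]))
```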

Fig. 5 Retrieval results over the EXACT-09 database: a query image, b retrieved images

Fig. 6 Retrieval results over the TCIA database: a query image, b retrieved images

Fig. 7 Retrieval results over the NEMA-CT database: a query image, b retrieved images

Fig. 8 Retrieval results over the OASIS database: a query image, b retrieved images

A quantitative comparison of the retrieval results over the EXACT-09, TCIA, NEMA-CT, and OASIS databases is given in Table 1a–c. The performance of the proposed technique is compared with the existing histogram of compressed scattering coefficients (HCSC) [36], Scattering Transform-Canonical Correlation Analysis vertical projection (ST-CCA-v) [37], local directional frequency encoded pattern (LDFEP) [38], local Ternary Pattern (LTP) [39], Local Derivative Pattern (LDP) [40], Local Tetra Pattern (LTrP) [41], Local Ternary Co-occurrence Patterns (LTCoP) [42], Local-Mesh Patterns (LMeP) [43], Spherical Symmetric 3D-LTP (SS-3D-LTP) [44], Local Wavelet Pattern (LWP) [45], and local bit-plane decoded AlexNet descriptor (LBpDAD) [46] features from the literature based on APR, ARR, and F-score values.

Table 1 Comparison of the retrieval performance over the different databases

Table 1 analyses the APR, ARR, and F-score values of various feature descriptor methods over the (a) EXACT-09 and TCIA, (b) NEMA-CT, and (c) OASIS databases. The proposed method obtains an APR of 0.9981 for EXACT-09, 0.99929 for TCIA, 0.99512 for NEMA-CT, and 99.40 (in %) for OASIS. The ARR and F-score are 0.30608 and 0.4685 for EXACT-09, 0.1500 and 0.2608 for TCIA, 0.3364 and 0.5013 for NEMA-CT, and 10.80 and 19.483 (in %) for OASIS, respectively. The retrieval performance of the proposed and existing methods in terms of APR, ARR, and F-score for the aforementioned databases is shown in Fig. 9a–d.

Fig. 9 Retrieval performance analysis in terms of APR, ARR, and F-score over the a EXACT-09, b TCIA, c NEMA-CT, and d OASIS databases

The improvements of the proposed method in terms of {APR, ARR, and F-score} are {9.08%, 6.16%, and 6.86%}, {6.92%, 3.96%, and 4.66%}, and {12.15%, 59.74%, and 48.58%} with respect to HCSC, ST-CCA-v, and LDFEP for the EXACT-09 database, respectively. Similarly, for the TCIA database, improvements of {5.05%, 3.30%, and 3.49%}, {3.60%, 1.97%, and 2.19%}, and {2.93%, 1.01%, and 1.28%} are observed in contrast with the HCSC, ST-CCA-v, and LBpDAD methods. Over the NEMA-CT database, an improvement of {4.39%, 4.21%, and 4.27%} is observed in comparison with LWP, and the APR is improved by 1.2% in contrast with HCSC. In addition, improvements of {13.76%, 53.40%, and 49.52%} and {55.15%, 77.19%, and 75.03%} are observed over the OASIS database in comparison with the LDFEP and LBpDAD descriptors, respectively. For all metrics, the proposed method obtains higher rates than the existing methods.

The category-wise retrieval results of the proposed method over the aforementioned databases are shown in Fig. 10a–d. As Table 1(b) shows, the proposed method yields a higher APR but lower ARR and F-score over the NEMA-CT database compared with the HCSC method. Among the nine categories of NEMA-CT, two categories, the 3rd and 6th, obtain lower precisions of 0.9889 and 0.97, respectively, as shown in Fig. 10c.

Fig. 10 Category-wise retrieval results of the proposed method over the a EXACT-09, b TCIA, c NEMA-CT, and d OASIS databases

6 Conclusions

Due to the increasing number of images in hospitals, the need for image retrieval systems has become more critical. This paper proposes a new CSTF feature descriptor to retrieve medical images with the aid of the BMWDLNN classifier. The proposed feature descriptor extracts the texture, edge, shape, and wavelet features, among others. The proposed method achieves classification accuracies of 99.38% for EXACT-09, 97.19% for TCIA, 99.20% for NEMA-CT, and 97.61% for OASIS, which are higher than those of the existing methods. The retrieval performance is evaluated based on APR, ARR, and F-score; the proposed method achieves 0.9981, 0.30608, and 0.4685 for EXACT-09, 0.9992, 0.15, and 0.2608 for TCIA, 0.9951, 0.3265, and 0.4917 for NEMA-CT, and 99.40, 10.80, and 19.483 (in %) for OASIS. The experimental results show that the proposed method outperforms the existing descriptors over the CT and MRI image databases.