Introduction

Avascular necrosis (AN) is the death of bone tissue in the femoral head, normally caused by an insufficient blood supply. Under continued weight bearing, this condition leads to (a) micro-fractures in the bone, (b) collapse of the sub-chondral bone, and (c) collapse of the overlying articular cartilage surface [1]. Collapse of the articular surface in turn creates the need for hip replacement [2]. AN of the femoral head generally affects younger and fully grown adults. Retrieving AN images from a large AN image dataset is difficult [3]. Medical content-based image retrieval (CBIR) systems have been introduced that typically operate by comparing the query image with the other images in the imaging archive. Finding similar images in large imaging archives, such as a picture archiving and communication system (PACS), can substantially aid diagnosis by surfacing multiple similar cases [4]. With the integration of devices and the Internet in the Internet of Things (IoT), large numbers of images are accessed via social networking, and CBIR has been proposed as a way to handle this problem. CBIR is an extensive and challenging research domain [5]. Image retrieval (IR) systems are broadly divided into two types: (i) text-based retrieval (TBR) and (ii) content-based retrieval (CBR) systems. A TBR system searches for images on the basis of keywords [6]. However, manually annotating a large number of images is nearly infeasible. Consequently, a purely text-based framework is not adequate for IR [7]. By contrast, CBIR is defined as a process that searches for and retrieves images from a large database according to their visual content, which is expressed by local or global features [8].

CBIR is a widely used IR technique in several computer vision applications, such as (i) the medical domain, for obtaining past patient details, (ii) e-commerce, for finding the required products, and (iii) information retrieval, for fetching images from a large database [9]. CBIR systems retrieve images from a massive database based on the desired content [10]. The exponential growth in the number of image databases makes IR an active research area [11]. Indexing those images and efficiently retrieving the required image from a large image database is a chief research concern of CBIR [12]. Given a medical image, it is feasible to consult the diagnoses that have been made for images with similar lesions. This is a valuable aid for physicians when they face conditions that are difficult to diagnose, or when external factors such as inexperience, subjectivity, or tiredness could lead to an inaccurate diagnosis [13].

The remainder of this paper is organized as follows. The second section reviews the related work. The third section describes the complete framework of the proposed system. The fourth section presents the experimental results, and the fifth section concludes the paper.

Literature review

Thiriveedhi Yellamanda Srinivasa Rao and Pakanati Chenna Reddy [14] proposed an improved classification approach combined with feature reduction methods for IR. The work presented a firefly-neural network (FNN) based IR scheme with three stages: (a) feature extraction, (b) feature reduction, and (c) classification together with IR. The proposed technique was evaluated on accuracy, recall, precision, and F-measure. The results showed that the FNN classifier used in the method outperformed the compared classifiers, with better accuracy.

Rehan Ashraf et al. [15] developed a mechanism for automated IR and presented a novel content-based image retrieval strategy that relies on color features. The work uses histogram, color, and DCT analysis because these are robust and require little computational power. Performance was compared in terms of precision, recall, retrieval time, and feature extraction. The comparison showed that the scheme outperformed the other conventional CBIR systems considered in terms of average precision and recall.

Cong Jin and Shan-Wu Ke [16] proposed an IR system based on low-level shape features (IRSFM). In IRSFM, the low-level characteristics of the image are used to determine the salient region, and the shape characteristics of that region are used to compute the similarity between salient regions. In the proposed shape feature extraction framework, the shape features are determined from the principal axis, and the value of each feature is represented by several scalars. Experiments confirmed that IRSFM performed best among all compared frameworks and delivered very good CBIR performance.

Carolina Reta et al. [17] presented an efficient content-based IR approach that uses histogram-based descriptors to represent color, texture, and edge features, and a k-nearest-neighbor classifier to retrieve the best matches for the query images. The results showed that the approach consistently attained notable mean average precision, recall, and precision. It outperformed contemporary strategies while producing results comparable to those of SIFT- and SURF-based approaches.

Zahid Mehmood et al. [18] developed an efficient visual words fusion (VWF) methodology based on HOG (histogram of oriented gradients) and SURF feature descriptors. The technique achieves a classification accuracy of 98.40% using an SVM and an IR performance of 80.61%. Qualitative and quantitative analyses on four standard image collections, (1) Corel-1000, (2) Corel-1500, (3) Corel-5000, and (4) Caltech-256, demonstrated the effectiveness of the technique based on VWF of HOG and SURF feature descriptors.

Ghanshyam Raghuwanshi and Vipin Tyagi [19] proposed an approach to content-based IR based on the Tetrolet transform and a feed-forward architecture. The method addresses the accuracy and retrieval time of the IR scheme. The retrieval framework operates in two stages: feature extraction and retrieval. Experimental results on the COREL-1K and CIFAR-10 benchmark databases showed that the system performed better than contemporary methods in terms of retrieval time and accuracy.

Drawback of existing approach

The existing approach underperforms with respect to the context of visual words: if a word takes on an alternate meaning, SS-BOVW is not able to recognize it. In deep learning, by contrast, the large number of hidden layers allows most of the context of the words to be determined.

CBIR using deep belief CNN feature representation

CBIR [20, 21] uses image content features to search for and retrieve images from a large database. In the proposed technique, AN images are retrieved from a large dataset. AN is the cellular death of bone constituents, including the bone marrow, owing to disruption of the bone's blood supply. Retrieving these images is difficult because of their varied appearance. This paper proposes an efficient technique for retrieving AN images using a deep belief CNN (DB-CNN) feature representation. First, the input dataset is preprocessed: image noise is removed using a median filter (MF), and the images are resized. Next, features are extracted using DB-CNN, and the image feature representations are converted to binary codes. The similarity is then measured using the Modified Hamming Distance (MHD). Finally, the images are retrieved based on the similarity values. The outline of the proposed method is shown in Fig. 1.

Fig. 1 The proposed framework

Pre processing

Preprocessing is the first step in the CBIR system. Its chief purpose is to improve the image data by suppressing undesired distortions or enhancing image features that matter for further processing. Noise removal and resizing are the main functions of image preprocessing. All the data set has been taken from.

Noise removal

The proposed technique uses a median filter (MF) to remove noise. This filter is very effective at preserving edges while removing noise. The MF moves over the image pixel by pixel and replaces each value with the median of the neighboring pixel values. A window Ws defines the pattern of neighbors and slides pixel by pixel across the entire image. To compute the median value Ms, all the pixel values in the window Ws are first sorted in numerical order, and the pixel under consideration is then replaced with the middle (median) value Ms. The median value computed from the window Ws is given in Eq. (1).

$$ {M}_s={s}_{\left(n+1\right)/2} $$
(1)

Here, s1, s2, ..., sn denote the grey levels of the pixels in the window, sorted in ascending order, and n is taken to be odd so that a single middle value exists.
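The median filtering step described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' MATLAB implementation; the function name `median_filter`, the default 3 × 3 window, and the edge-replication border policy are all assumptions, since the paper does not state them.

```python
from statistics import median


def median_filter(image, window=3):
    """Apply a window x window median filter to a 2-D list of grey levels.

    Assumption: border pixels are handled by replicating the nearest edge
    pixel; the paper does not specify a border policy.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that a middle value exists")
    rows, cols = len(image), len(image[0])
    half = window // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = []
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    # Clamp coordinates so the window stays inside the image.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    neighbours.append(image[rr][cc])
            # With n odd, the median is the middle of the sorted values.
            out[r][c] = median(neighbours)
    return out


# A single impulse-noise pixel (255) in an otherwise uniform patch.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
cleaned = median_filter(noisy)
```

In this toy patch the isolated bright pixel is replaced by the value of its neighbours, which is the behaviour the filter is chosen for.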

Resizing

The images are resized so that they can be processed efficiently. Here, images of varying size are resized to 256 × 256.
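As an illustration of the resizing step, the sketch below rescales a 2-D image to a fixed size. The paper does not state which interpolation method is used, so nearest-neighbour sampling is an assumption here, and the function name `resize_nearest` is hypothetical.

```python
def resize_nearest(image, out_rows=256, out_cols=256):
    """Resize a 2-D list of pixels with nearest-neighbour sampling.

    Assumption: nearest-neighbour interpolation; the paper only states
    the 256 x 256 target size.
    """
    in_rows, in_cols = len(image), len(image[0])
    return [
        [image[r * in_rows // out_rows][c * in_cols // out_cols]
         for c in range(out_cols)]
        for r in range(out_rows)
    ]


small = [[1, 2], [3, 4]]
resized = resize_nearest(small, 4, 4)   # small target for readability
full = resize_nearest(small)            # default 256 x 256 target
```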

Feature representation

DB-CNN focuses on the feature representation of the AN image. Feature extraction is the derivation of a set of features or image characteristics that represent the information meaningfully and efficiently, thereby making the image suitable for classification and analysis. The clearer the image on which feature extraction operates, the higher the accuracy of IR. The proposed technique uses DB-CNN to extract features.

Deep belief CNN

Deep learning (DL) is an advanced form of artificial neural network introduced by several researchers to take machine learning to a new level. The chief role of DL is to extract information at a higher level of abstraction.

The DB-CNN classifier comprises three types of layers: (a) a convolution layer (CL), (b) a max-average pooling layer, and (c) a fully connected layer (FCL).

(a) Convolutional layer:

It comprises multiple learned weight matrices (MLWM), termed filters or kernels, which slide over the input features. In this layer, the output of the previous layer is first convolved with the learned kernels (filter masks). A non-linear operation is then applied to the result to obtain the layer output. A kernel is the matrix that is convolved with the input features, and the stride controls how far the filter moves across those features at each step. The layer performs the convolution of the input data with the kernel using Eq. (2). The output of this convolution is termed the feature map.

$$ {g}_k=\sum \limits_{n=0}^{N-1}{x}_n{h}_{k-n} $$
(2)

Here, x denotes the input features, N is the number of elements in x, h is the filter, and g is the output vector. The subscripts indicate the element of the corresponding vector; for example, xn is the nth element of x.
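Equation (2) can be checked on a small example. The sketch below is a direct, unoptimized transcription of the sum for one-dimensional inputs; the function name `convolve_1d` is hypothetical, and treating filter taps outside the range of h as zero (a "full" convolution) is an assumption about the boundary handling.

```python
def convolve_1d(x, h):
    """Discrete convolution g_k = sum_n x_n * h_(k-n), as in Eq. (2).

    Assumption: filter taps outside the range of h are treated as zero,
    so the output has len(x) + len(h) - 1 elements.
    """
    n_x, n_h = len(x), len(h)
    g = [0] * (n_x + n_h - 1)
    for k in range(len(g)):
        for n in range(n_x):
            if 0 <= k - n < n_h:
                g[k] += x[n] * h[k - n]
    return g


# A simple difference kernel applied to a short ramp.
feature_map = convolve_1d([1, 2, 3], [1, 0, -1])
```

For the ramp input and the kernel [1, 0, -1], the resulting feature map is [1, 2, 2, -2, -3].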

(b) Max-average pooling layer:

The purpose of pooling is to obtain the essential features from the joint features, keeping the notable ones and discarding the irrelevant ones, which yields an improved feature representation. This layer is also termed the down-sampling layer. Pooling reduces the dimension of the output neurons from the CL, which lowers the computational cost and helps avoid over-fitting. Pooling layers are normally used immediately after a CL, where they simplify the information in the CL output.

For convolved feature maps of size u × v, two pooling types can be defined for every u-dimensional feature vector gi, namely max pooling and average pooling, which are given by Eqs. (3) and (4), respectively.

$$ {f}_m(g)=\max \left({g}_i\right) $$
(3)
$$ {f}_a(g)=\frac{1}{u}\sum \limits_{i=1}^u{g}_i $$
(4)

CL features carry information about nearby features, such as position and relative position. When the distribution of the image features is flat and smooth, the max-pooling function discards this local spatial information, which strongly affects feature extraction and representation. In that scenario, average pooling is expected to preserve the locally associated information. Therefore, in the proposed technique, the pooling function is defined as in Eq. (5).

$$ f(g)={\alpha}_1\max \left({g}_i\right)+{\alpha}_2\frac{1}{u}\sum \limits_{i=1}^u{g}_i $$
(5)

Here, α1 + α2 = 1.
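A minimal sketch of the combined pooling function in Eq. (5) is given below. The function name `max_average_pool` and the default α1 = 0.5 are assumptions made for illustration; the paper does not report the values of α1 and α2 that were used.

```python
def max_average_pool(g, alpha1=0.5):
    """Weighted sum of max pooling and average pooling, as in Eq. (5).

    alpha2 is derived from the constraint alpha1 + alpha2 = 1.
    Assumption: alpha1 = 0.5 by default; the paper gives no value.
    """
    if not 0.0 <= alpha1 <= 1.0:
        raise ValueError("alpha1 must lie in [0, 1]")
    alpha2 = 1.0 - alpha1
    return alpha1 * max(g) + alpha2 * sum(g) / len(g)


pooled = max_average_pool([1, 2, 3, 6])             # 0.5 * 6 + 0.5 * 3
only_max = max_average_pool([1, 2, 3, 6], 1.0)      # reduces to Eq. (3)
only_avg = max_average_pool([1, 2, 3, 6], 0.0)      # reduces to Eq. (4)
```

Setting α1 to 1 or 0 recovers plain max pooling or plain average pooling, which makes the role of the two weights easy to verify.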

(c) Fully connected layer:

This layer performs a matrix multiplication, which amounts to a feature-space transformation. It is generally used to integrate and extract useful information, and FCL features represent global information. After multiple convolution and pooling layers, the output must be produced in the form of a class. The convolution and pooling layers can only extract features and reduce the number of parameters from the original images. To produce the final output, an FCL is needed to generate an output whose size equals the number of required classes, which is difficult to obtain with CLs alone: CLs create activation maps, whereas the desired output is whether or not an image belongs to a specified class. The output layer includes a loss function to evaluate the prediction error. Once the forward pass is finished, back-propagation updates the weights and biases to reduce the loss.

Deep binary codes generation

For a pair of similar images, high-response feature maps are seen at the same index locations in the deep layer. Based on this observation, the image representation in the deep CLs is converted into binary codes. The binary code is produced by comparing the response from every feature map with the average response over all the feature maps. The activation function is given by Eq. (6).

$$ {a}_k^{HL}=\sigma \left({a}_k^7{W}^{HL}+{b}^{HL}\right) $$
(6)

Here, \( \sigma \left(\cdot \right) \) denotes the logistic sigmoid function, which bounds the outputs to the interval (0, 1), \( {a}_k^7 \) denotes the output feature vector of the FC7 layer, and \( {W}^{HL} \) and \( {b}^{HL} \) denote the weights and bias parameters of the hidden layer (HL), respectively. The binary code function is defined in Eq. (7).

$$ {b}_k=\begin{cases}1, & {a}_k^{HL}>0.5\\ 0, & {a}_k^{HL}\le 0.5\end{cases} $$
(7)

Deep binary code (DBC) generation by multi-scale pooling is illustrated in Fig. 2.

Fig. 2 Deep binary codes generation

In the binarization step at the HL, the feature vectors are mapped to binary codes. For IR, the proposed method extracts the image features of the query image and then obtains the binary codes from the HL output using the activation function.
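The mapping from FC7 features to a binary code in Eqs. (6) and (7) can be sketched as follows. The weights, biases, and feature values below are made-up toy numbers, and the function names are hypothetical; in the actual system these parameters would come from the trained network.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, bounding the output to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_activations(a7, weights, biases):
    """Eq. (6): sigmoid(a7 . W + b) for each hidden unit.

    weights[i][j] connects FC7 feature i to hidden unit j.
    """
    return [
        sigmoid(sum(a7[i] * weights[i][j] for i in range(len(a7))) + biases[j])
        for j in range(len(biases))
    ]


def binarize(activations, threshold=0.5):
    """Eq. (7): 1 where the activation exceeds 0.5, otherwise 0."""
    return [1 if a > threshold else 0 for a in activations]


# Toy values (assumptions for illustration only).
a7 = [1.0, -2.0]
W = [[1.0, 0.0, -1.0],
     [0.5, 0.0, 1.0]]
b = [0.5, 0.0, 0.0]
code = binarize(hidden_activations(a7, W, b))
```

With these toy values the pre-activations are 0.5, 0, and -3, so only the first hidden unit exceeds the 0.5 threshold and the code is [1, 0, 0]. An activation of exactly 0.5 maps to 0, following the non-strict lower case in Eq. (7).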

Similarity measurement

The similarity measurement provides a quantitative measure of the degree of match between two images. In the proposed technique, MHD is used to measure the similarity between two images.

Modified hamming distance

For a particular query image Q and a collection of deep binary codes bk, k = 1, 2, ..., L, the Hamming distance HQ is used to measure similarity. Under the plain Hamming distance, a higher similarity corresponds to a lower value. The proposed technique therefore uses the modified Hamming distance D − HQ, which takes a higher value for greater similarity. Here, D denotes the size of bk. The sorted search score based on the MHD returned by one bk is denoted by Ck, and it is normalized as in Eq. (8).

$$ {\overline{C}}_k=\frac{C_k-\min \left({C}_k\right)}{\max \left({C}_k\right)-\min \left({C}_k\right)} $$
(8)

Min-max normalization is applied to the scores returned by the MHD, so that the images most relevant to the query receive the maximum score of 1, while the least relevant images receive a score of 0.
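The modified Hamming distance and the min-max normalization of Eq. (8) can be sketched as below. The function names and the toy 4-bit codes are assumptions for illustration, as is the choice to return zeros when all scores are equal (a case in which Eq. (8) is undefined).

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("binary codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def modified_hamming(query, code):
    """D - H: larger values mean higher similarity (D is the code length)."""
    return len(code) - hamming_distance(query, code)


def min_max_normalise(scores):
    """Eq. (8): rescale scores to the interval [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: all-equal scores map to 0, since Eq. (8) is undefined.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Toy 4-bit codes (assumptions for illustration only).
query = [1, 0, 1, 1]
database = [[1, 0, 1, 1], [1, 1, 1, 1], [0, 1, 0, 0]]
scores = [modified_hamming(query, c) for c in database]
normalised = min_max_normalise(scores)
```

Here the identical code scores 4, the code differing in one bit scores 3, and the complementary code scores 0, giving normalized scores of 1.0, 0.75, and 0.0.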

The area under the normalized score curve \( {\overline{C}}_k \) is computed as in Eq. (9).

$$ {A}_k=\sum \limits_{l=1}^N{\overline{C}}_{k,l} $$
(9)

Here, N is the number of top nearest neighbors considered in every search score. This parameter limits the area size and avoids the situation in which, for a large N, the sorted curve of a good feature falls below that of a poor one. The area computed under each normalized score curve can then be used to select the top K highest-quality features.

Next, an adaptive weight is assigned to each of the top K scores, as given by Eq. (10).

$$ Weigh{t}_k=\frac{1}{A_k} $$
(10)

Finally, the fused search score from the higher-quality DBCs is computed using Eq. (11).

$$ Score=\sum \limits_{k=1}^K\left({C}_k\times Weigh{t}_k\right) $$
(11)

The proposed dynamic score-level late fusion strategy is highly adaptable: the quality of each DBC is measured automatically and in an unsupervised way. Finally, the top similar images are retrieved on the basis of the fused score.
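The weighting and fusion in Eqs. (9) to (11) can be illustrated with the sketch below. Several points are assumptions rather than details from the paper: the function names are hypothetical, the inputs are taken to be already normalized per-image scores, all supplied score lists are fused (the top-K selection step is omitted), and a zero area is rejected because Eq. (10) would divide by zero.

```python
def area_under_curve(normalised_scores, top_n):
    """Eq. (9): sum of the top-N values of the sorted, normalized scores."""
    ranked = sorted(normalised_scores, reverse=True)
    return sum(ranked[:top_n])


def fuse_scores(score_lists, top_n):
    """Eqs. (10) and (11): weight each score list by 1 / area and sum.

    score_lists[k][i] is the score that binary code k assigns to image i.
    Assumption: every supplied list is fused, i.e. K = len(score_lists).
    """
    fused = [0.0] * len(score_lists[0])
    for scores in score_lists:
        area = area_under_curve(scores, top_n)
        if area == 0:
            raise ValueError("area is zero, so the weight 1 / area is undefined")
        weight = 1.0 / area
        for i, s in enumerate(scores):
            fused[i] += s * weight
    return fused


# A sharply peaked score list gets a larger weight than a flat one.
sharp = [1.0, 0.0, 0.0, 0.0]   # area over top 2 = 1.0, weight = 1.0
flat = [1.0, 1.0, 0.0, 0.0]    # area over top 2 = 2.0, weight = 0.5
fused = fuse_scores([sharp, flat], top_n=2)
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

In this toy case the fused scores are [1.5, 0.5, 0.0, 0.0], so image 0 is ranked first. The example shows the intended effect of the inverse-area weight: a score curve that drops off quickly contributes more to the fused score than a flat one.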

Results and discussion

This section describes the experiments and reports the performance of the proposed DBC-based image representation and similarity measurement for AN image retrieval. Several evaluations are carried out to assess the performance of the proposed DB-CNN. The proposed approach is implemented in MATLAB.

Performance analysis

The performance of the proposed DB-CNN is compared with existing systems, namely IRSFM (IR algorithm based on shape feature matching) and SS-BOVW (bag of visual words based on SIFT and SURF). The proposed technique is assessed on recall, precision, retrieval time, and retrieval accuracy. Images of the Femur, Humerus, and Knee affected by avascular necrosis are used to analyze the IR performance of the proposed DB-CNN.

Precision

Precision is the ratio of the number of relevant images retrieved to the total number of images retrieved in response to a query image, as expressed in Eq. (12).

$$ \mathrm{Precision}=\frac{\mathrm{No}.\ \mathrm{of}\ \mathrm{Relevant}\ \mathrm{Images}\ \mathrm{Retrieved}}{\mathrm{Total}\ \mathrm{No}.\ \mathrm{of}\ \mathrm{Images}\ \mathrm{Retrieved}} $$
(12)

Figure 3 compares the proposed DB-CNN with the existing IRSFM and SS-BOVW techniques in terms of precision, using images of the Femur, Humerus, and Knee affected by AN. For Femur images, the precision of the proposed DB-CNN is higher than that of IRSFM and SS-BOVW by 10.98% and 7.07%, respectively. For Humerus images, it is higher by 8.38% and 12.18%, respectively. For Knee images, it is higher by 5.6% and 2.14%, respectively. These precision results show the advantage of the proposed technique.

Fig. 3 Comparison of the existing and the proposed techniques in terms of precision

Recall

Recall, a performance evaluation parameter, measures the sensitivity of the IR scheme. It is the ratio of the number of relevant images retrieved to the total number of relevant images in that semantic category of the image repository, as expressed in Eq. (13).

$$ \mathrm{Recall}=\frac{\mathrm{No}.\ \mathrm{of}\ \mathrm{Relevant}\ \mathrm{Images}\ \mathrm{Retrieved}}{\mathrm{Total}\ \mathrm{No}.\ \mathrm{of}\ \mathrm{Relevant}\ \mathrm{Images}\ \mathrm{in}\ \mathrm{the}\ \mathrm{set}} $$
(13)
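As a worked example of Eqs. (12) and (13), the sketch below computes precision and recall for one query. The function name `precision_recall` and the image identifiers are hypothetical, and returning 0 when a denominator is empty is an assumption.

```python
def precision_recall(retrieved, relevant):
    """Precision (Eq. 12) and recall (Eq. 13) for a single query.

    retrieved: identifiers of the images returned by the system.
    relevant:  identifiers of all relevant images in the repository.
    Assumption: an empty denominator yields 0.0 rather than an error.
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Four images retrieved, two of which are among the three relevant ones.
precision, recall = precision_recall(
    ["img1", "img2", "img3", "img4"],
    ["img1", "img2", "img5"],
)
```

In this example two of the four retrieved images are relevant, so precision is 2/4 = 0.5, and two of the three relevant images are retrieved, so recall is 2/3.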

Figure 4 compares the proposed DB-CNN with the existing IRSFM and SS-BOVW techniques in terms of recall. The recall of the existing IRSFM is 74.2% for Femur images, 75% for Humerus images, and 70.28% for Knee images. The recall of the existing SS-BOVW is 72.1% for Femur images, 79.36% for Humerus images, and 76.87% for Knee images. The recall of the proposed DB-CNN is 81.56% for Femur images, 80.23% for Humerus images, and 79.9% for Knee images, which is higher than that of both existing techniques in every case. The existing IRSFM performs worst on Knee images of AN, and the proposed DB-CNN performs best on Femur images of AN.

Fig. 4 Comparison of the existing and the proposed techniques in terms of recall

Retrieval time

Retrieval time is influenced by several factors, such as the method used for feature extraction, the number of features in the feature set, and the method used for similarity measurement. Retrieval time (or computation time) is the sum of the feature extraction time and the similarity matching (search) time. The total retrieval time of the system is given in Eq. (14).

$$ \mathrm{Retrieval}\ \mathrm{Time}=\mathrm{Deep}\ \mathrm{Binary}\ \mathrm{Code}\ \mathrm{Generation}\ \mathrm{Time}+\mathrm{Similarity}\ \mathrm{Measurement}\ \mathrm{Time} $$
(14)
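One way the two terms of Eq. (14) could be measured is sketched below, timing the two stages separately and adding the results. The helper names and the two toy stage functions are hypothetical stand-ins, not the authors' code; only the addition of the two measured times follows the equation.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def retrieval_time(generate_code, measure_similarity, query_image, database_codes):
    """Eq. (14): code generation time plus similarity measurement time."""
    code, code_time = timed(generate_code, query_image)
    ranking, similarity_time = timed(measure_similarity, code, database_codes)
    return ranking, code_time + similarity_time


# Hypothetical stand-ins for the two stages of the pipeline.
def toy_generate_code(image):
    return [1 if pixel > 127 else 0 for pixel in image]


def toy_measure_similarity(code, database_codes):
    matches = [sum(a == b for a, b in zip(code, other)) for other in database_codes]
    return sorted(range(len(database_codes)), key=lambda i: matches[i], reverse=True)


ranking, seconds = retrieval_time(
    toy_generate_code,
    toy_measure_similarity,
    [200, 10, 255],
    [[0, 1, 0], [1, 0, 1]],
)
```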

Figure 5 compares the proposed DB-CNN with the existing IRSFM and SS-BOVW techniques in terms of retrieval time. In the graph, the x-axis shows the AN-affected images of the Femur, Humerus, and Knee, and the y-axis shows the time (in seconds) taken for IR. For Femur images, the existing SS-BOVW performs worst, with a retrieval time of 25 s, and the existing IRSFM differs from SS-BOVW by 4 s. The proposed DB-CNN has the lowest retrieval time, 11 s, which indicates better performance. For Humerus images, the existing IRSFM performs worst, with a retrieval time of 25 s, the existing SS-BOVW takes 22 s, and the proposed DB-CNN again has the lowest retrieval time, 9 s. For Knee images, the existing SS-BOVW performs worst, with a retrieval time of 20 s, the existing IRSFM takes 14 s, and the proposed DB-CNN has the lowest retrieval time, 7 s. In all three cases the proposed technique is the fastest.

Fig. 5 Comparison of the existing and the proposed techniques in terms of retrieval time

Retrieval accuracy

Retrieval accuracy is the ability to distinguish relevant from irrelevant images. In general, accuracy measures how close a calculated or measured value is to its true value, that is, the degree to which the result of a calculation, specification, or measurement matches the standard (correct) value. The retrieval accuracy is computed using Eq. (15).

$$ \mathrm{Retrieval}\ \mathrm{Accuracy}=\frac{\mathrm{Correctly}\ \mathrm{Retrieved}\ \mathrm{Images}}{\mathrm{Total}\kern0.28em \mathrm{No}\ \mathrm{of}\ \mathrm{Retrieved}\ \mathrm{Images}} $$
(15)

Figure 6 compares the proposed DB-CNN with the existing IRSFM and SS-BOVW techniques in terms of retrieval accuracy. In the graph, the x-axis shows the AN-affected images of the Femur, Humerus, and Knee, and the y-axis shows the IR accuracy in percent. The proposed DB-CNN achieves a retrieval accuracy of 90.4% for Femur images, 95.1% for Humerus images, and 88.58% for Knee images. Among the three, the highest retrieval accuracy is obtained for Humerus images; the accuracy for Femur and Knee images is slightly lower, with Knee images the lowest. The existing IRSFM and SS-BOVW techniques show lower retrieval accuracy than the proposed technique.

Fig. 6 Comparison of the existing and the proposed techniques in terms of retrieval accuracy

Benefits of implementing deep learning in medical imaging

Considering all the measured parameters discussed in the “Results and discussion” section, the proposed method clearly shows a significant improvement in image retrieval in terms of accuracy, time, recall, and precision. All of these parameters play a vital role in image processing, so applying deep learning in medical applications could bring a major shift in technology. In future, deep learning could be extended to model and simulate the entire performance of a product in order to assess its overall behavior. Because deep learning deals with feature mapping and prediction, it is at present a suitable technique for image processing applications.

Demerits in the proposed system

The proposed system is limited by its dataset, since obtaining a large amount of data for analysis is very difficult. AN is a rare disease, specifically in India, and most of the dataset was acquired from websites, so the use of real clinical data in this paper is very limited.

Conclusion

This paper presents a novel DB-CNN based feature representation for CBIR of AN images. The performance of the proposed approach is analyzed and compared with the existing IRSFM and SS-BOVW techniques. The evaluation metrics are precision, recall, retrieval time, and retrieval accuracy. Images of the Femur, Humerus, and Knee affected by avascular necrosis are used to analyze the retrieval performance of the proposed DB-CNN. For Femur images, the proposed DB-CNN achieves a precision of 92.3%, a recall of 81.56%, a retrieval time of 11 s, and a retrieval accuracy of 90.4%. For Humerus images, it achieves a precision of 87.38%, a recall of 80.23%, a retrieval time of 9 s, and a retrieval accuracy of 95.1%. For Knee images, it achieves a precision of 80.5%, a recall of 79.9%, a retrieval time of 7 s, and a retrieval accuracy of 88.58%. These results show that the proposed DB-CNN performs better than the other existing techniques.