
1 Introduction and Motivation

With the advancement of internet technology and digital image repositories, large multimedia databases are being created in every field. Images can be retrieved from such repositories in two ways: text-based and content-based. Text-based retrieval is the conventional approach, but it has several drawbacks: manually annotating images is cumbersome for humans, and the annotations may contain spelling mistakes, so it is not feasible for large databases [1]. To overcome these problems, content-based image retrieval (CBIR) was introduced in 1992 by T. Kato; it retrieves images based on their visual attributes [2].

Since then, research in this area has grown rapidly, and CBIR has progressed from a basic low-level system to a high-level semantic and intelligent system. A CBIR system works in two steps: feature extraction followed by similarity calculation, as shown in Fig. 1. The system extracts the features of both the query image and the database images. It then compares the query feature vector with the stored feature vectors of the database images and retrieves the images that are visually similar on the basis of low-level features [3].

Fig. 1 Block diagram of CBIR system

CBIR systems have several applications. In the medical domain, large numbers of digital images are now produced by X-ray, MRI and similar modalities, and a CBIR system assists the physician in diagnosis by retrieving similar images from the medical database [4]. Another important application is criminal investigation, where a sketch of a suspect is searched against the available criminal database. These systems are also used in areas such as remote sensing, GIS, and graphic and fashion design.

2 Feature Extraction Techniques

Images are retrieved by extracting low-level features such as color, shape and texture. These features capture the visual attributes of the image, whereas retrieval based on high-level semantic features requires special techniques. Commonly used color feature extraction methods include the color histogram, the dominant color descriptor (DCD), color moments (CM) and others, which are shown in Fig. 2.

Fig. 2 Color feature extraction techniques

The color moment technique is well suited to color feature extraction because of its lower complexity and quicker response compared with other methods such as the color histogram or the dominant color descriptor. It computes statistical measures that can express the important details present in the image [5]. The mean and standard deviation are computed for each channel of the RGB color space, as described in Eqs. 1 and 2.

$${\text{Mean}}\left( I_{r} \right) = \frac{1}{X \times Y}\mathop \sum \limits_{i = 1}^{X} \mathop \sum \limits_{j = 1}^{Y} Pc_{ij} ,\quad r \in \{ {\text{R,G,B}}\}$$
(1)

where \(Pc_{ij}\) is the pixel value of channel r in the ith row and jth column of the image.

$${\text{Std}}(I_{r} ) = \left( {\frac{1}{X \times Y}\mathop \sum \limits_{i = 1}^{X} \mathop \sum \limits_{j = 1}^{Y} (Pc_{ij} - {\text{Mean}}(I_{r} ))^{2} } \right)^{{\frac{1}{2}}}$$
(2)

X and Y are the numbers of rows and columns of the image, respectively.
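As an illustration of Eqs. 1 and 2, the following minimal Python sketch computes the mean and standard deviation of each RGB channel. The function name `color_moments` and the nested-list image layout are assumptions made for this example; they are not part of the surveyed systems.

```python
import math

def color_moments(image):
    """Return [(mean, std), ...] for the R, G and B channels.

    `image` is a list of rows; each row is a list of (R, G, B) tuples.
    This layout is an assumption chosen to keep the sketch dependency-free.
    """
    n_pixels = len(image) * len(image[0])  # X * Y in Eqs. 1 and 2
    moments = []
    for channel in range(3):
        values = [pixel[channel] for row in image for pixel in row]
        mean = sum(values) / n_pixels
        variance = sum((v - mean) ** 2 for v in values) / n_pixels
        moments.append((mean, math.sqrt(variance)))
    return moments

# Hypothetical 2 x 2 image: red varies, green is absent, blue is saturated.
tiny = [
    [(10, 0, 255), (20, 0, 255)],
    [(30, 0, 255), (40, 0, 255)],
]
features = color_moments(tiny)
```

The resulting six numbers (two per channel) form a very compact feature vector, which is the reason for the low complexity noted above.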

The color histogram is also an extensively used feature extraction technique in retrieval systems. It represents the frequency distribution of the pixel colors in an image: it counts the number of pixels of each color and stores these counts. Its major drawback is that spatial information is not taken into account during its computation. As a result, two quite different images with the same color distribution produce the same histogram. In the color coherence vector (CCV) technique, every bin of the histogram is divided into two types: coherent and incoherent. A pixel is coherent if it belongs to a contiguous region of uniform color; otherwise it is incoherent. The CCV thus describes how each color present in an image is distributed between coherent and incoherent pixels.
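The histogram drawback described above can be seen in a small Python sketch. The quantization into 4 bins per channel and the function name `color_histogram` are assumptions chosen for illustration only.

```python
from collections import Counter

def color_histogram(image, bins_per_channel=4):
    """Count pixels per quantized (R, G, B) bin, normalized to sum to 1."""
    step = 256 // bins_per_channel  # width of one bin on the 0..255 scale
    counts = Counter(
        (r // step, g // step, b // step)
        for row in image
        for (r, g, b) in row
    )
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

red, blue = (255, 0, 0), (0, 0, 255)
left_right = [[red, blue], [red, blue]]   # vertical stripes
top_bottom = [[red, red], [blue, blue]]   # horizontal stripes

# Different spatial layouts, identical color distribution.
same = color_histogram(left_right) == color_histogram(top_bottom)
```

Both images are half red and half blue, so their histograms match even though the layouts differ. This is the kind of ambiguity that spatially aware descriptors such as the CCV are meant to reduce.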

Another important feature is the texture of the image. Texture features are classified as statistical, model-based, signal-processing-based and structural features. The most widely used texture techniques are based on signal processing because of their better performance; examples include the Discrete Cosine Transform (DCT), the Gabor filter and the wavelet transform. Some important and influential texture extraction techniques are given in Fig. 3.

Fig. 3 Texture feature extraction techniques

The local binary pattern (LBP) is widely used in image processing applications because of its simplicity, performance and ease of implementation. The LBP texture descriptor has illumination- and rotation-invariant properties. In this technique, the image is subdivided into smaller sub-matrices, from which the features are extracted. The features obtained from these sub-matrices are merged into one feature histogram that represents the whole image [6]. The gray-level co-occurrence matrix (GLCM) is also an accurate technique for extracting texture features from images. It computes various second-order statistical properties of the image. Some of the important properties are shown in Eqs. 3 to 5.

$${\text{Energy}}\;E = \mathop \sum \limits_{a} \mathop \sum \limits_{b} \left( {k\left( {a,b} \right)} \right)^{2}$$
(3)
$${\text{Contrast}}\;c = \mathop \sum \limits_{a} \mathop \sum \limits_{b} \left( {a - b} \right)^{2} k\left( {a,b} \right)$$
(4)
$${\text{Entropy}}\;T = - \mathop \sum \limits_{a} \mathop \sum \limits_{b} k\left( {a,b} \right)\log \left( {k\left( {a,b} \right)} \right)$$
(5)

where a and b are the gray-level indices of the co-occurrence matrix and k(a, b) is the corresponding normalized matrix entry.
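A minimal Python sketch of these GLCM properties follows. It assumes the usual definitions in which the matrix is normalized to sum to 1, energy sums the squared entries, and entropy carries a leading minus sign. The function names, the horizontal one-pixel offset and the 3-level test image are choices made for this example only.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix k for pixel pairs offset by (dy, dx)."""
    k = [[0.0] * levels for _ in range(levels)]
    pairs = 0
    for y in range(len(image)):
        for x in range(len(image[0])):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < len(image) and 0 <= x2 < len(image[0]):
                k[image[y][x]][image[y2][x2]] += 1
                pairs += 1
    return [[v / pairs for v in row] for row in k]

def glcm_features(k):
    """Energy, contrast and entropy of a normalized co-occurrence matrix."""
    cells = [(a, b, v) for a, row in enumerate(k) for b, v in enumerate(row)]
    energy = sum(v ** 2 for _, _, v in cells)
    contrast = sum((a - b) ** 2 * v for a, b, v in cells)
    # Zero entries are skipped because log(0) is undefined.
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    return energy, contrast, entropy

# Hypothetical gray-level image with levels 0, 1 and 2.
gray = [
    [0, 0, 1],
    [0, 0, 1],
    [2, 2, 2],
]
energy, contrast, entropy = glcm_features(glcm(gray, levels=3))
```

In practice several offsets and directions are usually combined; a single offset is used here only to keep the sketch short.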

The discrete wavelet transform (DWT) has both frequency and spatial characteristics, which make it capable of obtaining a multi-scale resolution of the image. To extract texture features with this technique, the distribution of the wavelet coefficients is computed. The mother wavelet, translated by b and scaled by a, is given in Eq. 6.

$$\Delta_{a,b} (t) = \frac{1}{\sqrt a }\varPsi \left( {\frac{t - b}{a}} \right)$$
(6)
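To make the idea of wavelet-coefficient features concrete, the sketch below applies one level of the Haar wavelet to a single image row and uses the mean energy of each sub-band as a texture feature. The Haar wavelet, the one-dimensional input and the energy feature are assumptions chosen for brevity; the surveyed systems may use other wavelets and statistics.

```python
import math

def haar_dwt(signal):
    """One level of the Haar wavelet transform for an even-length sequence."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def band_energy(coeffs):
    """Mean squared coefficient of one sub-band."""
    return sum(c * c for c in coeffs) / len(coeffs)

# Hypothetical image row: flat on the left, a sharp edge on the right.
row = [4.0, 4.0, 8.0, 0.0]
approx, detail = haar_dwt(row)
texture_features = [band_energy(approx), band_energy(detail)]
```

The detail band is zero where the row is flat and large at the edge, so its energy reflects how much fine-scale variation the signal contains.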

Shape is also an important feature for image retrieval, as it describes contour and position information. However, extracting shape features requires image segmentation, and it is very difficult to retrieve images based on shape alone. Models that describe target contours or shapes include spline fitting curves, line segments, Gaussian curves and Fourier descriptors [7].

3 Performance Evaluation Metrics

The second step of the system is similarity measurement. For this measurement, the difference between the feature vector of the query image and the feature vectors of the database images is calculated with the help of a distance metric. Commonly used distance metrics include the Euclidean, Manhattan, Mahalanobis and Minkowski distances, among others [8]. Some of the most significant distance measures are shown in Eqs. 7, 8 and 9.

$$D_{\text{Euclidean}} = \sqrt {\mathop \sum \limits_{i = 1}^{n} \left( {I_{i} - D_{i} } \right)^{2} }$$
(7)
$$D_{\text{Manhattan}} = \sum\limits_{i = 1}^{n} {\left| { I_{i} - D_{i} } \right|}$$
(8)
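$$D_{\text{Minkowski}} = \left[ {\mathop \sum \limits_{i = 1}^{n} \left( {\left| { I_{i} - D_{i} } \right|} \right)^{p} } \right]^{{\frac{1}{p}}}$$
(9)

In the above equations, I denotes the feature vector of the query image, D denotes the feature vector extracted from each database image, n is the length of the feature vectors, and p is the order of the Minkowski distance (p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance).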
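The similarity step can be sketched in a few lines of Python. The sketch below treats the Manhattan and Euclidean distances as the Minkowski distance of order 1 and 2, and ranks a small database by distance to the query. The function names and the toy feature vectors are hypothetical.

```python
def minkowski(query, candidate, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(i - d) ** p for i, d in zip(query, candidate)) ** (1 / p)

def manhattan(query, candidate):
    return minkowski(query, candidate, 1)

def euclidean(query, candidate):
    return minkowski(query, candidate, 2)

def rank_database(query, database, distance=euclidean):
    """Return database keys ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: distance(query, database[name]))

# Hypothetical two-dimensional feature vectors.
database = {
    "img_a": [0.0, 0.0],
    "img_b": [3.0, 4.0],
    "img_c": [1.0, 1.0],
}
ranking = rank_database([0.0, 0.0], database)
```

The choice of metric is passed in as a parameter because, as noted in the literature discussed later, retrieval quality can depend noticeably on which distance is used.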

After similarity evaluation, the performance of a CBIR system can be measured with two well-known metrics, precision and recall [9,10,11]. These are shown in Eqs. 10 and 11.

$${\text{Precision}} = \frac{\text{Number of retrieved relevant images}}{\text{Total number of images retrieved}}$$
(10)
$${\text{Recall}} = \frac{\text{Number of retrieved relevant images}}{\text{Total Number of relevant images in database}}$$
(11)
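As a worked example of Eqs. 10 and 11, the following Python sketch computes both metrics for one query. The image identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given lists of image identifiers."""
    retrieved_relevant = len(set(retrieved) & set(relevant))
    precision = retrieved_relevant / len(retrieved) if retrieved else 0.0
    recall = retrieved_relevant / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 images retrieved, 5 relevant images in the database,
# 2 of the retrieved images are relevant.
retrieved = ["img_1", "img_2", "img_3", "img_4"]
relevant = ["img_1", "img_3", "img_7", "img_8", "img_9"]
precision, recall = precision_recall(retrieved, relevant)
```

Here the precision is 2/4 = 0.5 and the recall is 2/5 = 0.4. Averaging the precision over many queries gives the average precision values typically reported for CBIR systems.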

4 Hybrid CBIR Systems and Their Performance

The capability of a CBIR system depends on the appropriate selection of feature extraction techniques. The low-level features described in the sections above do not provide satisfactory results when used individually. For a complex image, a single primary feature is not sufficient because it cannot capture the varied details present in the image. To overcome this limitation, features are combined. Different hybrid CBIR systems have been proposed in the literature and evaluated on different datasets, and they show better performance in terms of average precision and recall. In [12], a CBIR model was proposed in which retrieval is done in two stages. In the first stage the images are analyzed by splitting them into small patches, and in the next stage the same information is used to retrieve similar images. The described technique increased the precision and recall of the system by 55% and 25%, respectively.

Two texture techniques, LBP and Gabor filters, were used for extracting the image features. This approach proved to be less sensitive to histogram equalization and to be rotation invariant. Another method, proposed in [13], combines the DWT and the Hadamard matrix, which results in increased accuracy and speed of the CBIR system.

Singh and Kaur [14] designed a system based on a combination of texture and color features that was efficient and fast. New techniques based on block difference and block variation were used for texture features, and the color histogram was used for color features. Among several distance metrics, the square-chord distance was observed to give the best results. Non-training-based classifiers were used because of their efficiency and simplicity. Fadaei et al. [9] designed a CBIR scheme based on an optimized combination of two different features to improve retrieval precision. Dominant color descriptor (DCD) features were extracted from the HSV color space, wavelet and curvelet transforms were applied to extract texture features, and the two feature sets were combined optimally using particle swarm optimization, which provided much better accuracy than other systems. The performance of some hybrid systems on the Wang database, in terms of average precision, is given in Table 1. The Wang database is used because it is one of the standard databases for evaluating CBIR systems.

Table 1 Hybrid systems with their average precision values (%)

5 Hybrid and Intelligent CBIR Systems with Their Performance

The application of artificial intelligence, including machine learning and deep learning algorithms, in CBIR systems has increased considerably in recent years. With these algorithms, the basic CBIR system has become an intelligent CBIR system with improved efficiency and retrieval time. Deep learning algorithms are based on neural networks and have adaptive learning power, which is the main reason for their success in every field of multimedia processing [17]. Various machine learning and deep learning algorithms are used in CBIR systems for image classification or for feature extraction. The most widely used are convolutional neural networks (CNN), support vector machines (SVM), auto-encoders, extreme learning machines (ELM), clustering, and simple feed-forward and back-propagation neural networks. An intelligent hybrid CBIR system based on SVM was designed using a combination of color, edge and texture features; it increased the speed and accuracy of the system to a great extent [18].

The average precision of some CBIR systems that use intelligent techniques on the Wang dataset is presented in Table 2.

Table 2 The average value of precision in intelligent CBIR systems (%)

It can be clearly observed from Tables 1 and 2 that CBIR systems using intelligent algorithms achieve better results than simple hybrid CBIR systems. Their accuracy and retrieval time are also better, which is an important concern for CBIR systems operating on large datasets.

6 Semantic Gap Reduction

A basic CBIR system suffers from the semantic gap, which is the difference between the low-level image features captured by the system and human perception. This issue can be effectively reduced by relevance feedback [2]. It acts as an interface that connects the user with the search engine and uses the user's feedback to refine the retrieved images. When a query image is entered into the CBIR system, a number of images are retrieved. The user then examines these images and marks the best matches among them, and the system uses these selections to refine the next round of results. This operation repeats until appropriate results are obtained. The basic CBIR system with relevance feedback is shown in Fig. 4.

Fig. 4 Relevance feedback based CBIR system
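One simple way to realize such a feedback loop is a Rocchio-style query update, sketched below in Python. This particular update rule is an assumption chosen for illustration and is not necessarily the method used in the cited systems; the weights alpha, beta and gamma and the example vectors are arbitrary.

```python
def refine_query(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query vector toward relevant and away from non-relevant examples."""
    def centroid(vectors):
        # An empty feedback set contributes nothing to the update.
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos, neg = centroid(relevant), centroid(non_relevant)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]

# Hypothetical feature vectors marked by the user after the first retrieval round.
query = [1.0, 0.0]
marked_relevant = [[0.0, 2.0], [0.0, 4.0]]
marked_non_relevant = [[4.0, 0.0]]
new_query = refine_query(query, marked_relevant, marked_non_relevant)
```

The refined vector is then used as the query for the next retrieval round, and the process repeats until the user is satisfied with the results.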

Grigorova et al. [22] designed a relevance feedback (RF) based CBIR system to increase the accuracy of the system and to reduce the semantic gap. This system dynamically selects the features and assigns them appropriate weights depending on the user feedback. Various machine learning and deep learning algorithms are also used as relevance feedback techniques in these systems. Another study used the SFS algorithm together with relevance feedback to determine the matching images for a given query image [23]. The system was tested with eighteen different distance measures, and evaluation metrics were calculated.

These techniques can also improve retrieval from unlabeled image databases. In [24], at every feedback iteration the system is trained on the query image and the user feedback images, after which unlabeled images in the database are labeled. The system is then retrained, further unlabeled images are labeled, and these are merged with the previously labeled images.

7 Conclusion, Issues and Future Scope

This paper presents a concise overview of CBIR systems and their feature extraction techniques. The most important and critical issue in this area is feature selection and feature combination, as hybrid CBIR systems yield better performance than individual techniques. Another problem that arises with large image datasets is the curse of dimensionality, so a proper indexing technique should be used to reduce the feature dimensions of the images and obtain faster results. Some intelligent and hybrid CBIR systems are also discussed, and their performance is compared with that of simple hybrid CBIR systems. CBIR systems based on intelligent techniques achieve very high precision and accuracy. When deep learning and machine learning algorithms, indexing techniques, relevance feedback and other promising techniques are successfully employed in these systems, they have the potential to transform various image-related applications.