1 Introduction

The main goal of a content-based image retrieval (CBIR) [20, 38] scheme is to retrieve the most relevant images from a digital image repository using low level visual contents such as the color, texture and/or shape of a given query image. Digital image repositories have been growing larger day by day, fed by areas including social networking sites, entertainment, medical imaging, crime prevention, historical archives, and broadcasting, so retrieving the most relevant images from such a large repository is a challenging task. In a typical CBIR system, the low level visual contents are first computed from the query image and from the target images in the repository to construct feature vectors/descriptors. An appropriate similarity measure is then calculated between the feature descriptor of the query image and that of each target image, and the target images are ranked according to their similarity to the query image. Visual contents such as color, texture and shape play an important role in producing the results the user expects. Several CBIR schemes [6, 37, 41, 45] based on texture and shape features can be found in the literature. In this paper, the author therefore considers color, texture and shape visual descriptors/features [28] for the CBIR system: a probability histogram based color feature descriptor is computed using color moments, a novel texture feature extraction technique is proposed in the DCT domain with the help of the co-occurrence matrix, and a shape feature extraction technique is suggested based on multi-resolution sub-images.

The color feature descriptor is widely adopted in image retrieval schemes because of its rotation and scaling invariance. Liu et al. [18] computed a color feature descriptor for image retrieval using color difference histograms based on first and second order partial derivatives and a uniform quantization approach. Several image retrieval systems based on color visual descriptors have been proposed, such as the co-occurrence matrix [17], color autocorrelation [33], exponent color moments [52] and color moments [5]. In general, color moments represent the color characteristics of a digital image. They also capture important visual features of the image under different circumstances and various lighting environments. However, color moments alone are not sufficient to identify and differentiate the various image contents significantly and effectively.

Texture is one of the most essential image features and plays a vital role in developing effective CBIR applications. Texture moments or features capture distinguishing image properties such as smoothness, homogeneity, coarseness, and regularity. In an image, various objects can be distinguished solely by their texture patterns. Malik et al. [22] proposed an image retrieval scheme where texture features are extracted from non-overlapping DCT blocks of the grayscale image. In this scheme, the DC and first three AC coefficients of each DCT block are collected in zigzag scanning order and four histograms of the DC and three AC coefficients are constructed individually. The histograms are then quantized into different numbers of bins, and statistical parameters such as mean, standard deviation, skewness, kurtosis and smoothness are calculated from them to construct the texture feature descriptor. Various similarity distances are used for effective retrieval of images from the database. Phadikar et al. [27] also proposed a CBIR scheme in the DCT domain, where three visual features, i.e., color histogram, color moments, and edge histogram, are computed directly from the transform domain. Since not all of these texture visual features contribute equally to the feature vector, a genetic algorithm (GA) is run before similarity matching to optimize the set of visual features using weight factors, which increases the retrieval accuracy. Haralick [10] proposed the gray level co-occurrence matrix (GLCM) [29], which captures the correlation between pixels at a particular distance and provides much of the spatial information of an image. Wang et al. [48] suggested an image retrieval scheme based on texture features in which an image is first divided into 8-connectivity regions and GLCM based texture features are extracted from each connected region. Vahid et al. [24] proposed a new texture descriptor, known as the CoALTP texture feature vector, obtained by an efficient combination of the GLCM and the Local Ternary Pattern (LTP); it inherits the properties of both the co-occurrence matrix and the LTP. The retrieval results obtained with this texture feature vector under different distance metrics were then analyzed.

The shape feature is an important image component that is extensively used as a discriminating element in CBIR applications [11]. It yields accurate results when the image contains objects, various shapes, distinct structures and distinguishable edges in many directions. In general, shape is described in two ways: contour based methods, where Fourier descriptors represent the shape of the image, and region based methods, where invariant moments represent the shape features. Akrem et al. [7] developed a shape based image retrieval scheme in which shape signatures are extracted using the Fourier descriptor (FD) and the farthest point distance (FPD) technique. The shape signatures are computed at each point of the shape contour, which also provides scale and translation invariance and thereby improves retrieval accuracy. However, some valuable information is lost while acquiring these invariance properties. Another shape based image retrieval scheme was therefore suggested by Emir et al. [39], which overcomes the problem of the existing scheme [7] while preserving the invariance properties. They adopt only the phase of the Fourier coefficients and use it at specific points (or pseudo mirror points) as a shape orientation reference. The shape signature remains invariant under translation, scaling and rotation thanks to the phase-preserving Fourier descriptors. Li et al. [15] suggested an invariant-moment based CBIR scheme in which the shape feature vector is constructed by combining Zernike moment (ZM) based phase coefficients with ZM magnitudes. Kothyari et al. [14] proposed a CBIR scheme that directly computes the seven invariant moments to form the feature vector. However, natural images are not noise free, so some significant preprocessing is needed before extracting visual contents or moments from them. In this paper, we analyze images at different multi-resolution levels prior to extracting the shape features/moments, since information at a single resolution has proven to be insufficient.

Most of the CBIR schemes discussed above use only one of the three low-level visual contents, i.e., color, texture or shape. However, it is very difficult to achieve adequate retrieval results with a single feature descriptor alone, because natural images contain a variety of visual attributes. To improve retrieval performance, many researchers have developed CBIR schemes [21, 49] based on proficient combinations of texture and shape features. Wang et al. [49] suggested an image retrieval scheme in which shape and texture features are combined efficiently: shape features are computed from an RGB color image using exponent moments, which possess numerous desirable image characteristics, and texture features are computed using the histogram of the localized angular phase of the intensity plane. Liu et al. [21] proposed a CBIR scheme that combines texture and shape features through a weighted distance measurement, where the texture features are computed from the optimal non-subsampled shearlet transform based decomposed images and the shape features are extracted using low-order quaternion polar harmonic transforms (QPHTs). A single distance is then computed from the optimal weighted similarities of the texture and shape features, and this distance is used in the retrieval process. In the literature, a number of image retrieval schemes [42, 50] have been developed using various combinations of different image visual features, which improves retrieval performance to a certain extent. However, it has been observed that combining such visual features does not guarantee better retrieval results. For an effective CBIR scheme, it is necessary to extract suitable visual features that are significant enough to represent a low dimensional feature vector effectively without compromising retrieval performance. In the presented work, the author proposes a novel CBIR scheme based on color, shape and texture visual moments/features. The main contributions of the paper are highlighted by the following points:

  • 1. The color moments are computed from the probability histograms of the image planes after applying an effective preprocessing algorithm to the color image.

  • 2. The Gray Level Co-occurrence Matrix (GLCM) based texture moments are computed in a new fashion by selecting salient components in the Discrete Cosine Transform (DCT) domain after determining the inter-relationship between DCT blocks. In this way, a new texture feature extraction technique is proposed that computes GLCM features from matrices arranged from the DC and AC coefficients of the DCT image blocks.

  • 3. The shape moments are extracted from multi-resolution sub-images, since most of the detailed information of an image plane is not visible at a single resolution level, whereas significant visual information can be analyzed across different multi-resolution levels.

  • 4. Finally, the low dimensional feature descriptor is constructed by a simple fusion of the color, texture and shape moments of an image, which reduces the computational overhead. This simple fusion approach also improves the retrieval accuracy of the CBIR system.

  • 5. A new similarity distance is proposed, and comparative results against the Euclidean distance are presented on three standard image datasets.

The rest of the paper is organized as follows. Section 2 describes some preliminaries on the discrete cosine transform, the gray level co-occurrence matrix and the Gaussian image pyramid. Section 3 elaborates the proposed CBIR scheme in detail. Section 4 provides the experimental results and discussion. Finally, Section 5 presents the conclusions of the paper.

2 Preliminary concepts

Before presenting the proposed CBIR scheme, we briefly describe some basic concepts of the discrete cosine transformation, the Gaussian image pyramid and the gray level co-occurrence matrix. These concepts are used in the proposed feature extraction techniques during the formation of the feature descriptors.

2.1 Block level discrete cosine transformation

The discrete cosine transformation (DCT) [2] has been used intensively in signal and digital image processing applications. It is a proficient tool that converts an image from the pixel/spatial domain into the frequency/transform domain. The DCT considers only the real part of the frequency domain, which makes it faster than other transformation tools such as the discrete Fourier transformation. A DCT transformed block consists of DCT coefficients, where the top left component is known as the DC coefficient (or the energy of the image block) and all remaining components are called AC coefficients. Moreover, most of the significant visual information of the transformed block lies in the few coefficients located in its top left part. If the DCT coefficients are selected in zigzag scanning order, they therefore represent the most significant visual information of the image block. The selection of DCT coefficients from a transformed block is depicted in Fig. 1. The special characteristics of this transformation are pixel de-correlation and high energy compaction, which help to split an image into its different frequencies while preserving the energies. Due to these characteristics, the DCT plays an important role in various fields of digital image processing such as image compression, image segmentation, feature extraction, image enhancement and visual content based image retrieval [44]. The 2-D DCT of an image block of size N × N is defined as:

$$ \begin{array}{l} F(u, v) = \frac{2}{N}c\left( u \right)c\left( v \right)\sum\limits_{x = 0}^{N - 1} {\sum\limits_{y = 0}^{N - 1} {f(x, y)} } \cos \left[ {\frac{{(2x + 1)u\pi }}{{2N}}} \right] \times \cos \left[ {\frac{{(2y + 1)v\pi }}{{2N}}} \right]\\[4pt] c(u), c(v) = \left\{ \begin{array}{ll} \frac{1}{{\sqrt 2 }} & \text{if } u, v = 0\\ 1 & \text{if } u, v > 0 \end{array} \right. \end{array} $$
(1)

where F(u, v) and f(x, y) are the transformed and original image blocks respectively. The top left component F(0, 0), the DC coefficient, is the average intensity or energy of the image block. Only a few DCT coefficients are sufficient to represent an approximate image block without losing significant visual information. Therefore, in the presented work, 27 out of the 63 AC coefficients of each 8 × 8 DCT block are taken in zigzag scanning order and the corresponding statistical values are computed to represent the visual information. Several researchers have suggested techniques to extract visual information from DCT blocks; for example, Jiang et al. [12] proposed an image retrieval scheme in which the visual information is extracted by considering the spatial relationship between the DCT coefficients of a block and its sub-blocks. In the proposed work, the DC and AC coefficients are collected separately, with the DC coefficients kept in the form of a matrix. Only 27 of the 63 AC coefficients of each block are selected in zigzag scanning order and divided into three groups, where the first group contains the smallest number of AC coefficients and the last group the largest. Thereafter, the coefficient of variation (CV) is computed for each of the three groups G1, G2 and G3: the mean (μ) and standard deviation (σ) of each group are calculated, and the ratio of the standard deviation to the mean gives the CV. These three CVs are later used to construct three AC matrices, because this selection retains the local structure of the image block [4]. The computation of the CV is defined as:

$$ \begin{array}{l} C{V_{Gr}} = \sigma_{Gr}/\mu_{Gr}, \quad Gr \in \left\{ {G_{1}, G_{2}, G_{3}} \right\}\\[4pt] {\mu_{Gr}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {A{C_{i}}} , \quad \sigma_{Gr} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {{{(A{C_{i}} - {\mu_{Gr}})}^{2}}} } \end{array} $$
(2)

where n is the number of AC coefficients in a particular group Gr and ACi is its i-th AC coefficient. The value of the CV is high in areas of the DCT image block where edges exist, while it is very low in uniform areas [23]. Hence a high CV indicates that the AC coefficients of the block belong to edges, while a low CV indicates that they belong to a uniform region. Indeed, these characteristics show that the CV can be considered a good region detector; it is also used to detect non-spurious items in the image block. For texture feature extraction, the DC coefficients and the CVs of the selected AC coefficients are arranged in separate matrices. In this way, three matrices based on the AC coefficients and one matrix based on the DC coefficients are obtained, and these matrices are used for the texture feature representation. The whole texture feature extraction process is described in the proposed CBIR scheme section.
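
To make the block-level DCT arrangement above concrete, here is a minimal Python sketch (not the paper's MATLAB implementation) that applies an 8 × 8 block DCT to a single image plane, keeps the DC coefficient and the first 27 AC coefficients of each block in zigzag order, splits the AC coefficients into three groups G1–G3 and stores the coefficient of variation of each group, yielding one DC matrix and three CV matrices per plane. The equal split into three groups of nine coefficients and the small epsilon guarding a zero mean are assumptions made for illustration.

```python
import numpy as np
from scipy.fftpack import dct

def zigzag_indices(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scanning order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[1] if (rc[0] + rc[1]) % 2 == 0 else rc[0]))

def dc_and_cv_matrices(plane, block=8, n_ac=27, n_groups=3, eps=1e-8):
    """Block-wise DCT of one image plane; returns the DC matrix and three CV matrices."""
    h, w = (plane.shape[0] // block) * block, (plane.shape[1] // block) * block
    plane = plane[:h, :w].astype(float)
    zz = zigzag_indices(block)[1:n_ac + 1]               # skip DC, keep first 27 AC positions
    groups = np.array_split(np.arange(n_ac), n_groups)   # assumed equal split: 9/9/9
    dc = np.zeros((h // block, w // block))
    cv = np.zeros((n_groups, h // block, w // block))
    for bi in range(h // block):
        for bj in range(w // block):
            blk = plane[bi*block:(bi+1)*block, bj*block:(bj+1)*block]
            # separable 2-D DCT (type II, orthonormal)
            coeffs = dct(dct(blk.T, norm='ortho').T, norm='ortho')
            dc[bi, bj] = coeffs[0, 0]
            ac = np.array([coeffs[r, c] for r, c in zz])
            for g, ids in enumerate(groups):
                mu, sigma = ac[ids].mean(), ac[ids].std()
                cv[g, bi, bj] = sigma / (mu + eps)        # coefficient of variation, Eq. (2)
    return dc, cv
```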

Fig. 1

Selection of DCT coefficients in zigzag scanning order

2.2 Gray-level spatial dependence matrix

The gray-level spatial dependence matrix was introduced by Haralick [10] in 1973; it converts an image into a matrix using the frequency of occurrence of pairs of pixel values at a specific distance in the original image. The gray-level spatial dependence matrix is also known as the gray level co-occurrence matrix (GLCM), and it is one of the most effective tools for analyzing texture visual features in an image [44, 54, 55] using second order statistical parameters. Moreover, the GLCM approach captures the spatial relationship between two pixel values at a particular distance rather than describing a single pixel value. Let I(x, y) be an image consisting of Nx horizontal and Ny vertical resolution cells. To reduce the computational complexity, the gray level (gray tone) values of the resolution cells are quantized into Ng gray tone values. Let Lx = {1, 2, ..., Nx} and Ly = {1, 2, ..., Ny} represent the horizontal and vertical spatial domains, and N = {1, 2, ..., Ng} the set of quantized gray tone values obtained by uniform quantization. The image is then a function I: Ly × Lx → N. In the proposed work, the spatial relationship among pixel values at unit distance and in four different directions is determined. For each direction, the computed co-occurrence matrix is converted into a normalized gray level co-occurrence matrix (NGLCM). These NGLCMs are then used for computing the texture visual features via second order statistical parameters/moments. The element (i, j) of the GLCM counts how often a quantized gray tone value i occurs in a particular relation with a quantized gray tone value j in the image; each element is determined by the occurrence of a quantized gray tone value next to its neighboring values at a specific displacement and direction. Since the size of the GLCM depends on the number of quantized gray tones, we compute the GLCM using the symmetric property. The normalized co-occurrence matrix therefore contains the probabilities of occurrence of all pairs of gray level values at distance d = 1 and direction 𝜃, where 𝜃 ∈ {0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°}. The co-occurrence probability can then be computed as

$$ {P_{r}}(x) = \left\{ {{P_{ij}}\left| {(d, \theta )} \right.} \right\} $$
(3)

where the element Pij of the NGLCM for the pair of gray tones i and j is defined as:

$$ {P_{ij}} = \frac{{{C_{ij}}}}{{\sum\nolimits_{i = 1}^{G} {\sum\nolimits_{j = 1}^{G} {{C_{ij}}} } }} $$
(4)

Cij represents the frequency of occurrence of gray tones i and j with parameters (d, 𝜃), and G is the total number of gray tones obtained by the uniform quantization process. The value \({\sum \nolimits _{i = 1}^{G} {\sum \nolimits _{j = 1}^{G} {{C_{ij}}} } }\) is the sum of the elements of the GLCM for a particular (d, 𝜃). For ease of computation, the distance d is normally taken as less than or equal to 10, while the eight orientations 𝜃 ∈ {0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°} are adopted. The GLCM approach [10] suggests fourteen statistical parameters, from which similarity measures can be calculated for texture analysis. To reduce the computational complexity, we compute only four textural features, namely contrast, correlation, energy and homogeneity, from the GLCMs for distinct values of (d, 𝜃), since this statistical information is sufficient to characterize the texture of an image effectively.
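
As a quick illustration of these statistics, the sketch below quantizes a matrix/sub-image into a small number of gray tones, builds a symmetric normalized GLCM at unit distance for the four base orientations, and reads off contrast, correlation, energy and homogeneity with scikit-image. The names graycomatrix/graycoprops apply to scikit-image ≥ 0.19 (older releases spell them greycomatrix/greycoprops); the choice of 16 quantization levels and the averaging over orientations are assumptions for illustration, not details fixed by the paper.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_features(matrix, levels=16, distance=1):
    """Contrast, correlation, energy and homogeneity from normalized GLCMs
    at unit distance and the four base orientations (0, 45, 90, 135 degrees)."""
    # uniform quantization of the input matrix/sub-image into `levels` gray tones
    m = np.asarray(matrix, dtype=float)
    span = m.max() - m.min() + 1e-12
    q = np.floor((m - m.min()) / span * (levels - 1)).astype(np.uint8)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(q, distances=[distance], angles=angles,
                        levels=levels, symmetric=True, normed=True)
    feats = []
    for prop in ('contrast', 'correlation', 'energy', 'homogeneity'):
        # one value per orientation; averaged here to keep the descriptor compact
        feats.append(graycoprops(glcm, prop).mean())
    return np.array(feats)   # [f1, f2, f3, f4] as in Eq. (12)
```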

2.3 Gaussian image pyramid

The presented work uses a multiscale representation, since multiscale approaches are widely used for deriving significant visual information at different resolution levels. The visual information of a digital image can be analyzed easily at various scales, because it is very difficult to visualize all significant visual features of the original image directly. The Gaussian image pyramid (GIP) [1, 51] is the most popular multiscale technique for image analysis and feature extraction, and it is adopted in the proposed scheme [51] due to its easy implementation and low computational overhead. The GIP is built in two steps, averaging/smoothing and down-sampling. Smoothing is done by convolving a filter or mask with the image, after which the smoothed image is reduced by a factor of two. The whole process is repeated until the lowest resolution image of size 1 × 1 is obtained; each iteration yields a smaller image with increased smoothing. In this way the GIP is constructed as a collection of images of decreasing resolution stacked in a pyramidal shape. Let I0 be the original image of size N × N. It is convolved with a low pass filter (a Gaussian kernel) and down-sampled to generate the next level image I1 of size N/2 × N/2. Similarly, the image I2 at level two is obtained from I1, and the process continues until the lowest level Gn of the image pyramid is reached. The mathematical formula for the GIP of an original image I(x, y) is defined as

$$ \begin{array}{l} {I_{0}}\left( {x, y} \right) = I\left( {x, y} \right)\\ {I_{l}}\left( {x, y} \right) = \sum\limits_{a = - 2}^{2} {\sum\limits_{b = - 2}^{2} {w(a, b)} } {I_{l - 1}}\left( {2x + a, 2y + b} \right), \quad \forall\, 1 \le l \le G_{n} \end{array} $$
(5)

where w(a, b) is a low pass filter (an approximate Gaussian filter), sometimes also called the weighting function or Gaussian generating kernel. These weighting functions are constant, separable and symmetric for all decomposition levels. In the proposed work, the following Gaussian kernel is used:

$$\frac{1}{{256}}\left[ {\begin{array}{*{20}{c}} 1&4&6&4&1\\ 4&{16}&{24}&{16}&4\\ 6&{24}&{36}&{24}&6\\ 4&{16}&{24}&{16}&4\\ 1&4&6&4&1 \end{array}} \right]$$

The mean of the low pass filter lies at the middle of the Gaussian generating kernel, i.e., at w(a, b) = w(0, 0). The kernel weights can be generated using the following Gaussian function [35]:

$$w\left( {a, b} \right) = \frac{1}{{\sigma \sqrt {2\pi } }}{e^{- \frac{1}{{2{\sigma^{2}}}}\left[ {{{\left( {a - \mu } \right)}^{2}} + {{\left( {b - \mu } \right)}^{2}}} \right]}}$$

where a and b are the coordinates of the kernel along the horizontal and vertical axes, and μ and σ are the mean and standard deviation of the Gaussian distribution. The Gaussian function produces an almost symmetric curve (the familiar Gaussian shape) in two dimensions. The underlying idea of the GIP technique is that neighboring pixel values within a specific region or window of the image often have similar properties and are therefore highly correlated. For visualization purposes, a 3-level decomposition of the cameraman image of size 256 × 256 using the GIP is shown in Fig. 2, although the GIP can decompose an image down to a size of 1 × 1 at the lowest level.
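
A minimal sketch of Eq. (5), assuming reflective boundary handling and three levels as in Fig. 2, might look as follows.

```python
import numpy as np
from scipy.ndimage import convolve

# 5x5 approximate Gaussian generating kernel w(a, b) from Section 2.3
W = np.array([[1,  4,  6,  4, 1],
              [4, 16, 24, 16, 4],
              [6, 24, 36, 24, 6],
              [4, 16, 24, 16, 4],
              [1,  4,  6,  4, 1]], dtype=float) / 256.0

def gaussian_pyramid(image, levels=3):
    """Return [I0, I1, ..., I_levels]; each level is smoothed and halved (Eq. 5)."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        smoothed = convolve(pyramid[-1], W, mode='reflect')  # low pass filtering
        pyramid.append(smoothed[::2, ::2])                   # down-sample by a factor of two
    return pyramid
```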

Fig. 2

The visualization of gray-scale cameraman image using three levels of GIP

3 Proposed content-based image retrieval scheme

The main purpose of the proposed CBIR scheme is to provide an effective and efficient image retrieval system using color, texture and shape visual features/moments. Most existing CBIR schemes use high dimensional visual descriptors, which slows retrieval and increases retrieval time. The presented scheme not only constructs a low dimensional visual feature descriptor but also achieves retrieval accuracy comparable to existing schemes. In the presented work, the color feature descriptor is constructed by computing color moments from a probability histogram model, the texture feature descriptor is calculated using the DCT and GLCM tools, and the shape visual features are extracted using GIP based multi-resolution sub-images and geometric moments. The low dimensional feature descriptor is then obtained by a proficient combination of the color, texture and shape descriptors. During retrieval, the similarity metric is measured between the feature descriptors of the repository images and the given query image. The preprocessing step and the color, texture and shape feature extraction techniques are presented in the following subsections.

3.1 Preprocessing

The preprocessing approach consists of three steps, i.e., histogram equalization, sharpening and cropping of the image. The whole process is shown in Fig. 3. Images are generally captured in different environments, so their foreground and background may be too bright or too dark. Histogram equalization [5] is a widely used technique to enhance contrast by adjusting the intensity values, thereby improving the quality of the original image data. Moreover, human visual perception is highly sensitive to edges, isolated points and fine details of an image. To bring out thin lines and isolated points, the Laplacian filter [9] is used to compute a finer detailed image, since it is based on the second-order partial derivative, whereas filters such as Prewitt and Sobel rely on first-order partial derivatives. Let I(x, y) be an RGB color image; then the Laplacian operator can be defined as:

$$ {\nabla^{2}}\left[ {I(x, y)} \right] = \left[ \begin{array}{l} {\nabla^{2}}R(x, y)\\ {\nabla^{2}}G(x, y)\\ {\nabla^{2}}B(x, y) \end{array} \right] $$
(6)

where R, G and B represent the red, green and blue color components respectively, and x and y are the horizontal and vertical coordinates. The Laplacian filter is applied to each color component of the RGB image individually. The Laplacian derivative for an image component Icc(x, y), cc ∈ {R, G, B}, is defined as

$$ \begin{array}{l} {\nabla^{2}}\left[ {{I_{cc}}(x, y)} \right] = \frac{{{\partial^{2}}{I_{cc}}(x, y)}}{{\partial {x^{2}}}} + \frac{{{\partial^{2}}{I_{cc}}(x, y)}}{{\partial {y^{2}}}}\\ = {I_{cc}}(x + 1, y) + {I_{cc}}(x - 1, y) + {I_{cc}}(x, y + 1)\\ + {I_{cc}}(x, y - 1) + {I_{cc}}(x - 1, y - 1) + {I_{cc}}(x + 1, y - 1)\\ + {I_{cc}}(x - 1, y + 1) + {I_{cc}}(x + 1, y + 1) - 8{I_{cc}}(x, y) \end{array} $$
(7)

The 3 × 3 filter mask with center −8 given in (8) is obtained from (7); it covers the horizontal, vertical and diagonal edges of objects in the image. Because the Laplacian filter produces the filtered image from second order partial derivatives, it drives constant areas of the image towards zero.

$$ \left[ {\begin{array}{*{20}{c}} 1&~1&~1\\ 1&{-8}&~1\\ 1&~1&~1 \end{array}} \right] $$
(8)

During the filtering process, some information in the image is lost. To restore this information and obtain an enhanced/sharpened image g, the Laplacian filtered image L1 is subtracted from the original image I as follows:

$$ g = I - {L_{1}} $$
(9)

After the histogram equalization and sharpening processes, the resulting color image is cropped around its center. The color image of size R × S is decomposed into two parts, a peripheral region and a central region, as depicted in Fig. 3d, where R and S denote the rows and columns of the image. Since the main object most commonly lies at the central position of the image, the proposed scheme considers only the central area and eliminates the peripheral area. The cropped image, shown in Fig. 3e, is then used for the feature extraction process.

Fig. 3

Preprocessing of an RGB color image

The major steps of the preprocessing of a color image are described in Algorithm 1.


The image obtained in step 6 is used for the color and texture feature extraction process.
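
A rough sketch of the three preprocessing steps is given below: per-channel histogram equalization, Laplacian sharpening with the mask of Eq. (8) followed by the subtraction of Eq. (9), and a central crop. The fraction of the image kept by the crop (here, the middle half of each dimension) is an assumption, since the exact size of the central region is given only in Algorithm 1 and Fig. 3.

```python
import numpy as np
from scipy.ndimage import convolve

LAPLACIAN = np.array([[1,  1, 1],
                      [1, -8, 1],
                      [1,  1, 1]], dtype=float)     # mask of Eq. (8)

def equalize_channel(ch):
    """Plain histogram equalization of one 8-bit channel."""
    hist, _ = np.histogram(ch.flatten(), bins=256, range=(0, 256))
    cdf = hist.cumsum() / ch.size
    return np.interp(ch.flatten(), np.arange(256), 255 * cdf).reshape(ch.shape)

def preprocess(rgb, crop_frac=0.5):
    """Histogram equalization -> Laplacian sharpening (Eq. 9) -> central crop."""
    img = np.stack([equalize_channel(rgb[..., c]) for c in range(3)], axis=-1)
    lap = np.stack([convolve(img[..., c], LAPLACIAN, mode='reflect')
                    for c in range(3)], axis=-1)
    sharp = np.clip(img - lap, 0, 255)               # g = I - L1
    r, s = sharp.shape[:2]
    dr, ds = int(r * (1 - crop_frac) / 2), int(s * (1 - crop_frac) / 2)
    return sharp[dr:r - dr, ds:s - ds]               # keep the central region only
```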

3.2 Probability histogram based color moments

The color visual feature is one of the most widely used image contents due to its computational simplicity and its invariance to rotation, scaling, translation and other spatial transformations [26]. An appropriate global feature representation performs better than a local one, since it minimizes the computational overhead and increases image retrieval performance to a certain extent. Statistical color moments [43] represent the distribution of colors in an image and can be computed in any color model; in the proposed scheme, the HSV color model is used for the color feature representation. The HSV model is chosen for its relation to human visual perception: in accordance with the three elements of the color vision characteristics of the human eye, HSV is more in line with human visual perception than other commonly used color spaces and reveals visual consistency to human eyes well [40]. In the presented work, the cropped and enhanced RGB color image obtained in step 6 of Algorithm 1 is converted into an HSV image and decomposed into its hue (H), saturation (S) and value (V) color planes. The histogram of each color component is then constructed and the corresponding probability histogram is computed. The statistical color moments, namely mean, standard deviation, skewness and kurtosis, are computed from each probability histogram to form the color feature descriptor. These statistical moments are calculated as

$$ \small \begin{array}{l} \text{Mean: } {\mu_{CC}} = \sum\limits_{r = 0}^{L - 1} {rP(r)} \\[4pt] \text{Standard deviation: } {\sigma_{CC}} = \sqrt {\sum\limits_{r = 0}^{L - 1} {{{(r - {\mu_{CC}})}^{2}}P(r)} } \\[4pt] \text{Skewness: } s{k_{CC}} = \frac{1}{{\sigma_{CC}^{3}}}\sum\limits_{r = 0}^{L - 1} {{{(r - {\mu_{CC}})}^{3}}P(r)} \\[4pt] \text{Kurtosis: } {k_{CC}} = \frac{1}{{\sigma_{CC}^{4}}}\sum\limits_{r = 0}^{L - 1} {{{(r - {\mu_{CC}})}^{4}}P(r)} \\[4pt] \text{where } CC \in \{ {H_{H}}, {S_{H}}, {V_{H}}\} \text{ and the probability } P\left( r \right) = \frac{{\text{number of pixels at bin } r}}{{Width \times Height}} \end{array} $$
(10)

where CC ∈ {HH, SH, VH} indexes the probability histograms of the color components of the HSV image and the pixel values range from 0 to (L − 1). The mean represents the brightness and average color information of the image, while the standard deviation reflects the contrast and measures the spread of the pixel values about the mean of the histogram. The skewness measures the asymmetry of the pixel values about the mean, and the kurtosis measures the peakedness of the distribution about the mean. The color moments of all three probability histograms of the HSV planes together form a 12-D feature descriptor. Hence, the color feature descriptor is defined as

$$ F{V_{Color}} = \left\{ {{\mu_{CC}}, {\sigma_{CC}}, s{k_{CC}}, {k_{CC}}} \right\} , CC \in \{ {H_{H}}, {S_{H}}, {V_{H}}\} $$
(11)
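
The 12-D color descriptor of Eqs. (10)–(11) can be sketched roughly as below; using scikit-image's rgb2hsv for the color conversion and 256 histogram bins per plane are assumptions made for illustration.

```python
import numpy as np
from skimage.color import rgb2hsv

def color_moments(rgb, bins=256):
    """12-D color descriptor: mean, std, skewness and kurtosis of the
    probability histogram of each HSV plane (Eqs. 10-11)."""
    hsv = rgb2hsv(rgb)                       # H, S, V planes scaled to [0, 1]
    descriptor = []
    r = np.arange(bins)
    for plane in np.moveaxis(hsv, -1, 0):
        vals = np.clip((plane * (bins - 1)).astype(int), 0, bins - 1)
        p = np.bincount(vals.ravel(), minlength=bins) / vals.size   # probability P(r)
        mu = np.sum(r * p)
        sigma = np.sqrt(np.sum((r - mu) ** 2 * p))
        sk = np.sum((r - mu) ** 3 * p) / (sigma ** 3 + 1e-12)
        ku = np.sum((r - mu) ** 4 * p) / (sigma ** 4 + 1e-12)
        descriptor += [mu, sigma, sk, ku]
    return np.array(descriptor)              # FV_Color: 3 planes x 4 moments
```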

3.3 Gray level co-occurrence matrix based texture moments

In this section, the proposed texture feature extraction technique is presented. Initially, each color component is transformed by the block level DCT tool and the DC coefficients of all blocks are collected. These DC coefficients are arranged so that they form the DC matrix/sub-image; the formation of the DC matrix is depicted in Fig. 4.

Fig. 4

Formation of the DC matrix/sub-image

Thereafter, the CVs of the three groups of AC coefficients are computed and the corresponding three matrices/sub-images are formed separately; these are known as the CV matrices and are formed in the same way as the DC matrix. In this way, four matrices are obtained: one DC matrix and three CV matrices based on the AC coefficients. The DCT blocks are correlated and carry interrelated information: the DCT coefficients describe local feature information inside each block (intra-block), while more global features are obtained by exploiting the spatial information between each block and its neighboring blocks (inter-block). The GLCM is a statistical tool that provides exactly this spatial/inter-related information among values at a particular distance and a specific orientation. Here, the DC matrix contains the DC coefficients of the various DCT blocks, and the three CV matrices hold information about the AC coefficients of those blocks. To capture the inter-related information between DCT blocks, the GLCMs of the DC matrix and of the three CV matrices are used for extracting texture features, since they provide the spatial relationship among the matrix elements at a specific distance and particular orientations. The presented work uses GLCMs in four and eight different directions with unit distance between the matrix elements, exploits the symmetric property, and computes the corresponding normalized GLCMs (NGLCMs); the computational details of these matrices are discussed in Section 2.2. The statistical parameters contrast, correlation, energy and homogeneity are widely used for characterizing the texture properties of an image; these four features have been used in existing CBIR schemes [36, 53], where satisfactory retrieval results were reported. To form the texture feature descriptor, these four statistical texture parameters are computed from all the NGLCMs obtained from the DCT coefficient based matrices. These statistical moments are defined as

$$ \begin{array}{l} \text{Contrast: } {f_{1}} = \sum\limits_{i = 1}^{G} {\sum\limits_{j = 1}^{G} {{{\left( {i - j} \right)}^{2}}{P_{ij}}} } \\[4pt] \text{Correlation: } {f_{2}} = \sum\limits_{i = 1}^{G} {\sum\limits_{j = 1}^{G} {\frac{{\left( {i - {\mu_{i}}} \right)\left( {j - {\mu_{j}}} \right)}}{{{\sigma_{i}}{\sigma_{j}}}}} } {P_{ij}}, \quad {\sigma_{i}}, {\sigma_{j}} \ne 0\\[4pt] \text{Energy: } {f_{3}} = {\sum\limits_{i}^{G}} {{\sum\limits_{j}^{G}} {{P_{ij}}^{2}} } \\[4pt] \text{Homogeneity: } {f_{4}} = {\sum\limits_{i}^{G}} {{\sum\limits_{j}^{G}} {\frac{{{P_{ij}}}}{{1 + \left| {i - j} \right|}}} } \end{array} $$
(12)
$$\begin{array}{l} {\text{where}} \\ {\mu_{i}} = \sum\limits_{i = 1}^{G} {\sum\limits_{j = 1}^{G} {i{P_{ij}}} } , {\mu_{j}} = \sum\limits_{i = 1}^{G} {\sum\limits_{j = 1}^{G} {j{P_{ij}}} } \\ {\sigma_{i}} = \sum\limits_{i = 1}^{G} {\sum\limits_{j = 1}^{G} {{{\left( {i - {\mu_{i}}} \right)}^{2}}{P_{ij}}} } , {\sigma_{j}} = \sum\limits_{i = 1}^{G} {\sum\limits_{j = 1}^{G} {{{\left( {j - {\mu_{j}}} \right)}^{2}}{P_{ij}}} } \end{array}$$

where μi and μj are the means and σi and σj are the standard deviations. The contrast feature f1 measures the contrast between a value and its adjacent values over the matrix/image block and represents the variation among matrix elements in the texture. The correlation feature f2 measures how an element of the matrix is correlated with the other elements over the matrix/image block. The energy feature f3 is the sum of the squared elements of the NGLCM and is sometimes called the angular second moment or uniformity of energy; if the energy equals 1, the image block is constant. The homogeneity feature f4 measures the closeness of the distribution of elements in the NGLCM to its diagonal; homogeneity is always one for a diagonal matrix. For an image component I, the feature vector FVI is then formed as

$$ FV_{I} = \{ {f_{1}}, {f_{2}}, {f_{3}}, {f_{4}}\} $$
(13)

In the presented paper, an image component I is first decomposed into N × N blocks and each block is transformed using the DCT tool. The DC matrix is then constructed from the collected DC coefficients (the energies of the image blocks), together with the CV matrices of the different groups of AC coefficients. For the image component I, let FVDC, \(FV{_{AC\_G_{1}}}\), \(FV{_{AC\_G_{2}}}\) and \(FV{_{AC\_G_{3}}}\) be the feature vectors of the DC matrix and of the CV matrices of the three groups of AC coefficients, each computed by (13). The single feature vector \(FV{_{DCT\_I}}\) for the image component I is obtained as follows:

$$ F{V_{DCT\_I}} = [F{V_{DC}}, F{V_{AC\_G_{1}}}, F{V_{AC\_G_{2}}} , F{V_{AC\_G_{3}}}] $$
(14)

We now briefly describe the procedure for constructing the feature descriptor of an RGB color image. First, the RGB color image is decomposed into its red (R), green (G) and blue (B) planes and each plane is divided into non-overlapping N × N blocks. All blocks are then transformed using the DCT tool, and the matrices are constructed from the specific arrangement of DCT coefficients described above. Let \(FV{_{DCT\_R}}\), \(FV{_{DCT\_G}}\) and \(FV{_{DCT\_B}}\) be the visual feature vectors/descriptors of the red (R), green (G) and blue (B) components respectively, each computed using (14). The single RGB feature descriptor is then obtained as

$$ F{V_{DCT\_RGB}} = [FV{_{DCT\_R}}, FV{_{DCT\_G}}, FV{_{DCT\_B}}] $$
(15)
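
Putting the pieces together, a rough end-to-end sketch of the 48-D texture descriptor might look as follows; it reuses the dc_and_cv_matrices and glcm_texture_features helpers sketched in Sections 2.1–2.2 (names introduced in this write-up, not taken from the paper) and simply concatenates the four GLCM moments of the DC matrix and the three CV matrices over the R, G and B planes, in the spirit of Eqs. (13)–(15).

```python
import numpy as np

def texture_descriptor(rgb):
    """48-D texture descriptor: 3 color planes x 4 matrices x 4 GLCM moments.
    Relies on dc_and_cv_matrices() and glcm_texture_features() sketched earlier."""
    features = []
    for c in range(3):                                      # R, G, B planes in turn
        dc, cv = dc_and_cv_matrices(rgb[..., c])
        for matrix in (dc, cv[0], cv[1], cv[2]):            # DC matrix + three CV matrices
            features.append(glcm_texture_features(matrix))  # Eq. (13) per matrix
    return np.concatenate(features)     # Eq. (15): [FV_DCT_R, FV_DCT_G, FV_DCT_B]
```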

3.4 Multi-resolution based shape moments

Shape is one of the most important feature descriptors for identifying objects [13] in an image: humans can often recognize objects solely from their shapes. The main purpose of shape representation is to determine attributes of an object that can be used in the matching process during image retrieval. In general, there are two ways of extracting shape features from an image: edge based methods and region based methods. In CBIR applications, invariance properties are essential for effective retrieval of images from large databases. Because they represent shape descriptors efficiently, moments have been used as pattern features in many image retrieval applications [30, 46]. Often, most of the object/shape features of an image are not recognizable at a single resolution but can be visualized at different resolution levels. Therefore, in this paper, moment based shape features are extracted from the gray scale image using the GIP multi-resolution approach [51]. For a two dimensional discrete image f(x, y), the moment of order p and q is defined as

$$ {m_{p q}} = \sum\limits_{x} {\sum\limits_{y} {{x^{p}}{y^{q}}f(x,y)} } , \forall p, q = 0, 1, 2 $$
(16)

where x and y are spatial coordinates of the image. The central moments are defined as

$$ {\mu_{pq}} = \sum\limits_{x} {\sum\limits_{y} {{{\left( {x - \overline x } \right)}^{p}}{{\left( {y - \overline y } \right)}^{q}}f(x,y)} } $$
(17)

where \( \overline x=m_{10}/m_{00}\) and \( \overline y=m_{01}/m_{00}\) define the center of the region. Hence the central moments up to order three can be calculated as:

$$ \begin{array}{l} {\mu_{00}} = {m_{0 0}}\\ {\mu_{10}} = 0\\ {\mu_{01}} = 0\\ {\mu_{11}} = {m_{11}} - \overline y {m_{10}}\\ {\mu_{20}} = {m_{20}} - \overline x {m_{10}}\\ {\mu_{02}} = {m_{02}} - \overline y {m_{01}}\\ {\mu_{30}} = {m_{30}} - 3\overline x {m_{20}} + 2{\overline x^{2}}{m_{10}}\\ {\mu_{21}} = {m_{21}} - 2\overline x {m_{11}} - \overline y {m_{20}} + 2{\overline x^{2}}{m_{01}}\\ {\mu_{12}} = {m_{12}} - 2\overline y {m_{11}} - \overline x {m_{02}} + 2{\overline y^{2}}{m_{10}}\\ {\mu_{03}} = {m_{03}} - 3\overline y {m_{02}} + 2{\overline y^{2}}{m_{01}} \end{array} $$
(18)

The central moments of order p and q are normalized as

$$ {\mu_{pq}} = {\mu_{pq}}/{\mu^{\gamma} }_{00}, \forall p,q = 0, 1, 2,... $$
(19)

where γ = (p + q)/2 + 1. The set of seven moments (ϕ1–ϕ7) for (p + q) = 2, 3, ... can be calculated as follows:

$$ \begin{array}{l} {\phi_{1}} = {\mu_{20}} + {\mu_{02}}\\ {\phi_{2}} = {\left( {{\mu_{20}} - {\mu_{02}}} \right)^{2}} + 4{\mu_{11}^{2}}\\ {\phi_{3}} = {\left( {{\mu_{30}} - 3{\mu_{12}}} \right)^{2}} + {\left( {3{\mu_{21}} - {\mu_{03}}} \right)^{2}}\\ {\phi_{4}} = {\left( {{\mu_{30}} + {\mu_{12}}} \right)^{2}} + {\left( {{\mu_{21}} + {\mu_{03}}} \right)^{2}}\\ {\phi_{5}} = \left( {{\mu_{30}} - 3{\mu_{12}}} \right)\left( {{\mu_{30}} + {\mu_{12}}} \right)\left[ {{{\left( {{\mu_{30}} + {\mu_{12}}} \right)}^{2}} - 3{{\left( {{\mu_{21}} + {\mu_{03}}} \right)}^{2}}} \right]\\ ~~~~~~~+ \left( {3{\mu_{21}} - {\mu_{03}}} \right)\left( {{\mu_{21}} + {\mu_{03}}} \right)\left[ {3{{\left( {{\mu_{30}} + {\mu_{12}}} \right)}^{2}} - {{\left( {{\mu_{21}} + {\mu_{03}}} \right)}^{2}}} \right]\\ {\phi_{6}} = \left( {{\mu_{20}} - {\mu_{02}}} \right)\left[ {{{\left( {{\mu_{30}} + {\mu_{12}}} \right)}^{2}} - {{\left( {{\mu_{21}} + {\mu_{03}}} \right)}^{2}}} \right] + 4{\mu_{11}}\left( {{\mu_{30}} + {\mu_{12}}} \right)\left( {{\mu_{21}} + {\mu_{03}}} \right)\\ {\phi_{7}} = \left( {3{\mu_{21}} - {\mu_{03}}} \right)\left( {{\mu_{30}} + {\mu_{12}}} \right)\left[ {{{\left( {{\mu_{30}} + {\mu_{12}}} \right)}^{2}} - 3{{\left( {{\mu_{21}} + {\mu_{03}}} \right)}^{2}}} \right]\\ ~~~~~~~- \left( {{\mu_{30}} - 3{\mu_{12}}} \right)\left( {{\mu_{21}} + {\mu_{03}}} \right)\left[ {3{{\left( {{\mu_{30}} + {\mu_{12}}} \right)}^{2}} - {{\left( {{\mu_{21}} + {\mu_{03}}} \right)}^{2}}} \right] \end{array} $$
(20)

The six moments ϕ1–ϕ6 are invariant to size, orientation and translation, while ϕ7 is skew invariant, which allows it to distinguish mirror images. The set of seven central moments represents the shape feature descriptor of an image and is written as:

$$ F{V_{mu}} = \left[ {{\phi_{1}}, {\phi_{2}},{\phi_{3}},{\phi_{4}},{\phi_{5}}, {\phi_{6}}, {\phi_{7}}} \right] $$
(21)

Further, the set of seven central moments is computed from each multi-resolution decomposed gray scale image, and the collection of these sets over all resolutions forms the shape feature descriptor. Let FVM be the final multi-resolution shape feature descriptor, obtained as:

$$ F{V_{M}} = [FV_{mu1}, F{V_{mu2}}, F{V_{mu3}}] $$
(22)

where FVmu1, FVmu2, and FVmu3 represent the feature descriptors of the multi-resolution images (sub-images) at levels 1, 2 and 3 respectively.
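
A compact sketch of the 21-D shape descriptor is given below. It uses OpenCV's pyrDown, which smooths with the same 5 × 5 kernel as Section 2.3 and downsamples by two, and takes the seven invariant moments from cv2.HuMoments (computed from the normalized central moments of Eq. (19)) rather than evaluating Eqs. (16)–(20) term by term; treat it as an illustration of the multi-resolution idea rather than the paper's exact implementation.

```python
import cv2
import numpy as np

def shape_descriptor(gray, levels=3):
    """21-D shape descriptor: seven invariant moments per pyramid level."""
    descriptor = []
    level_img = gray.astype(np.float32)
    for _ in range(levels):                       # sub-images at levels 1, 2 and 3
        level_img = cv2.pyrDown(level_img)        # Gaussian smoothing + factor-2 downsampling
        m = cv2.moments(level_img)                # raw, central and normalized moments
        descriptor.append(cv2.HuMoments(m).flatten())   # seven invariant moments
    return np.concatenate(descriptor)             # FV_M = [FV_mu1, FV_mu2, FV_mu3]
```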

3.5 Fused features

Let \(F{V_{Color}} = \left \{ {{f_{c1}}, {f_{c2}}, {f_{c3}},..., {f_{cn}}} \right \}\), \(F{V_{Texture}} = \left \{ {{f_{t1}}, {f_{t2}}, {f_{t3}},..., {f_{tn}}} \right \}\) and \(F{V_{Shape}} = \left \{ {{f_{s1}}, {f_{s2}}, {f_{s3}},..., {f_{sn}}} \right \}\) be the color, texture and shape feature descriptors respectively, where cn, tn and sn denote the numbers of color, texture and shape feature components. Natural images are generally complex and cluttered, and are recognized and identified by visual contents such as color, shape and/or texture. To represent the characteristics of the color, shape and texture features simultaneously, a specific fusion technique is proposed. The whole feature extraction process is represented in Fig. 5.

Fig. 5

Block diagram for the formation of fused feature vector descriptor

The feature components should be normalized to a common range/scale, because different multimedia data have features/components with different ranges; normalization avoids component-variance effects during the similarity measurement process and is especially important for distinguishing different kinds of images. The major steps of the image feature representation are summarized in Algorithm 2.
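
The fusion step can be sketched as follows. Since the normalization of Eq. (23) is given only in Algorithm 2, the per-component min-max scaling to [−1, 1] used here is an assumption that merely reproduces the range reported in Table 1.

```python
import numpy as np

def fused_descriptor(color_fv, texture_fv, shape_fv):
    """Concatenate the 12-D color, 48-D texture and 21-D shape descriptors (81-D)."""
    return np.concatenate([color_fv, texture_fv, shape_fv])

def normalize_database(feature_matrix):
    """Component-wise min-max scaling of all database descriptors to [-1, 1].
    (Assumed form of the normalization; Eq. (23) itself is given in Algorithm 2.)"""
    fmin = feature_matrix.min(axis=0)
    fmax = feature_matrix.max(axis=0)
    scale = np.where(fmax - fmin > 0, fmax - fmin, 1.0)   # avoid division by zero
    return 2 * (feature_matrix - fmin) / scale - 1
```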


3.6 Similarity measurements and image retrieval

In this section, the author describes the proposed similarity measurement along with some existing similarity distances. Let the fused feature descriptor of the query image be FVQ = [FQ1, FQ2, FQ3, ..., FQn] and the fused feature descriptor of a target image in the database be FVT = [FT1, FT2, FT3, ..., FTn], where n is the length of the feature descriptor. These feature descriptors are computed using (23) of Algorithm 2. The aim of the similarity measure is to obtain from the digital repository the top images most similar to the given query image. In the presented work, a similarity distance (Dmn) based on the minimum and maximum values of the feature descriptors, called the min-max distance, is proposed. It is defined as:

$$ \small {\Delta} {D_{mn}} = \sum\limits_{i = 1}^{dd} {\sqrt {\frac{{\left| {\max \left\{ {F{V_{Qi}}, F{V_{Ti}}} \right\}} \right|}}{{\left| {\min \left\{ {F{V_{Qi}}, F{V_{Ti}}} \right\}} \right| + \epsilon }}} } , \quad \text{where } 0 < \epsilon < 0.5,\; i = 1, 2, ..., dd $$
(24)

where dd is the dimension of the fused feature descriptor, and FVQi and FVTi are the fused feature descriptors of the query image and a target image of the dataset. In this distance, the minimum of the two descriptor values can sometimes be zero, so a small quantity ε is added to the denominator to avoid an undefined distance value.

This paper also uses the Euclidean distance to measure the similarity between the query image and the target images in the database. It computes the distance between the feature descriptors of two images as the square root of the sum of their squared absolute differences. The Euclidean distance ΔD is computed as:

$$ {\Delta} D = \sqrt {\sum\limits_{i = 1}^{dd} {{{\left| {F{V_{Qi}} - F{V_{Ti}}} \right|}^{2}}} } , \quad i = 1, 2, ..., dd $$
(25)

A smaller distance represents a better retrieval result in terms of relevancy; if the distance is zero, the two images are identical. The block diagram of the proposed image retrieval scheme is depicted in Fig. 6.
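
The two similarity measures and the ranking step can be sketched as follows; min_max_distance follows Eq. (24) with a small epsilon guarding the denominator, euclidean_distance follows the textual definition accompanying Eq. (25), and retrieve simply sorts the database by increasing distance.

```python
import numpy as np

def min_max_distance(fv_q, fv_t, eps=0.1):
    """Proposed min-max distance of Eq. (24); eps avoids division by zero."""
    hi = np.abs(np.maximum(fv_q, fv_t))
    lo = np.abs(np.minimum(fv_q, fv_t))
    return np.sum(np.sqrt(hi / (lo + eps)))

def euclidean_distance(fv_q, fv_t):
    """Euclidean distance between two fused feature descriptors (Eq. 25)."""
    return np.sqrt(np.sum((fv_q - fv_t) ** 2))

def retrieve(fv_query, database_fvs, top_l=20, distance=euclidean_distance):
    """Return indices of the top-L database images closest to the query."""
    dists = np.array([distance(fv_query, fv_t) for fv_t in database_fvs])
    return np.argsort(dists)[:top_l]
```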

Fig. 6

Block diagram of proposed image retrieval scheme

The major algorithmic steps of the proposed image retrieval system are presented in Algorithm 3.


4 Experimental results and discussion

In this section, the experimental results are discussed and analyzed. The retrieval performance is also compared with some state-of-the-art CBIR schemes.

4.1 Databases

The retrieval performance of the proposed CBIR scheme is validated on three standard image datasets: Corel-1K [16], OT-8 [25] and GHIM-10K [19]. The Corel-1K dataset consists of 1000 images divided into 10 categories, where each category contains 100 images of a similar type. The semantic names of the categories are people, building, food, horse, bus, flower, elephant, mountain, beach and dinosaur. All images are in JPEG format with a size of either 384 × 256 or 256 × 384. Sample images of the Corel dataset are depicted in Fig. 7. The Corel dataset contains a wide variety of images and, owing to the diversity of its contents, meets all the requirements for evaluating image retrieval. The OT-8 dataset consists of 2688 images divided into 8 categories, where each category has a different number of images. The semantic names and numbers of images per category are coast (360), open country (410), forest (328), mountain (374), highway (260), street (292), inside city (308) and tall building (356). The forest category includes all forest and river scenes; most images contain sky regions, and there is no specific sky category. The images of this dataset have diverse contents that overlap with other categories. They are in JPEG format and each image has a size of 265 × 265. The dataset is imbalanced because each category has a different number of images. Sample images of each OT-8 category are shown in Fig. 8. The GHIM-10K database consists of 10000 images divided into 20 categories, where each category contains 500 images of a similar type, of size 400 × 300 or 300 × 400, in JPEG format. The semantic names of the categories include beaches, flowers, horses, ships, flies, cars, bikes, insects, etc. Sample images of the GHIM-10K dataset, one image from each category, are depicted in Fig. 9.

Fig. 7

Sample images of the Corel-1K dataset

Fig. 8

Sample images of OT-8 dataset

Fig. 9

Sample images of the GHIM-10K dataset

The proposed CBIR scheme was implemented in MATLAB 2013b on a computer with an Intel(R) Core i3 2.27 GHz processor, 6 GB RAM and Microsoft Windows 7 Ultimate 32-bit operating system.

4.2 Performance evaluation metrics

The retrieval performance of CBIR systems is measured by two standard metrics, precision and recall, which quantify the relevancy of the retrieved images with respect to the query image. The precision for a query image q is defined as:

$$ \begin{array}{l} P(q) = X/Y \end{array} $$
(26)

A precision of 100.00% means that all images retrieved from the database are relevant. The recall for a query image q is defined as:

$$ \begin{array}{l} R(q) = X/Z \end{array} $$
(27)

where X is the number of relevant retrieved images, Y is the total number of retrieved images, and Z is the number of relevant images available in the corresponding database category. Precision and recall alone are not capable of conveying the whole effectiveness of an image retrieval system, so their weighted harmonic mean, known as the F-score or F-measure, is also used. The F-score is computed as

$$ F(q) = \frac{{\left( {{\beta^{2}} + 1} \right) \times P(q) \times R(q)}}{{{\beta^{2}} \times P(q) + R(q)}}, \quad {\beta^{2}} \in [0, \infty ) $$
(28)

The parameter β balances the weighting of precision and recall. If β = 1, the measure is said to be balanced and the F-score is called the F1-score; β = 1 is the value most commonly used in CBIR systems. The F-score then reduces to

$$ F(q) = \frac{{2 \times P(q) \times R(q)}}{{P(q) + R(q)}} $$
(29)

The values of precision, recall, and F-score normally lie between 0 and 1, but they are also reported as percentages.
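
For completeness, a tiny helper computing the three metrics for a single query (X relevant images among the Y retrieved, Z relevant images in the category) might look like this:

```python
def retrieval_metrics(num_relevant_retrieved, num_retrieved, num_relevant_in_db):
    """Precision, recall and balanced F-score (beta = 1) for a single query,
    following Eqs. (26), (27) and (29)."""
    precision = num_relevant_retrieved / num_retrieved               # X / Y
    recall = num_relevant_retrieved / num_relevant_in_db             # X / Z
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall > 0 else 0.0)
    return precision, recall, f_score

# Example: 15 relevant among the top 20 retrieved, 100 relevant images in the category
# -> precision 0.75, recall 0.15, F-score 0.25
```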

4.3 Quantitative and qualitative results

In the proposed scheme, the author extracts three kinds of visual contents, i.e., color, texture and shape. Before retrieval, the preprocessing technique is applied to remove distorted/unwanted information from the image. The color based visual contents are then extracted by computing four statistical parameters from the color components of the HSV image, yielding a 12-D color feature descriptor that is very small compared with the original size of the image. Next, the texture features are computed using the DCT and GLCM tools: the preprocessed RGB color image is decomposed into its red, green and blue components, each component is divided into fixed 8 × 8 blocks, and each block is transformed by the DCT tool. The DC coefficients (the block energies) of all blocks are arranged into a matrix, and only 27 AC coefficients representing the significant information of each block are selected. The selected AC coefficients are divided into three groups, the CV of each group is computed, and these CVs are arranged into matrices. In this way four matrices are obtained, one for the DC coefficients and three for the AC coefficients, and four GLCM features are computed from each of them. The dimension of the texture feature descriptor is therefore Mn × Cn × GLn = 48, where Mn = 4 is the number of DC and AC coefficient based matrices, Cn = 3 the number of color components, and GLn = 4 the number of texture moments. These GLCM features, in combination with the other features, have been tested with 4 and 8 different directions on the Corel image dataset. Lastly, the multi-resolution shape features are computed: the original RGB color image is converted to gray scale, the Gaussian image pyramid is applied up to three levels, and seven moments are calculated at each level, giving a 21-D shape feature descriptor. Finally, the author concatenates the three feature descriptors into a single descriptor of overall dimension 12 + 48 + 21 = 81-D. Table 1 shows the component values of the fused feature vector descriptor of a bus sample image from the Corel-1K dataset, where each value lies within [−1, 1] since it is normalized individually by (23). The first 12 values represent the color moments, the next 48 values the texture moments, and the last 21 values the shape moments of the image.

Table 1 The component values of fused feature descriptor of bus image of Corel-1K image dataset

Table 2 shows the retrieval accuracy in terms of precision, recall and F-score (in percentages) for the different directional GLCM texture features using the Euclidean distance and the single feature descriptor described above, where only the top L = 20 images are retrieved from the Corel-1K dataset. The table shows that the dinosaur images reach 100.00% precision for both 4-direction settings and for the 8-direction setting, since these images do not have much structural information or complex attributes. The lowest precision varies with the directions, as one category may have its most prominent features in one direction while another category has them in other directions. For the horizontal and vertical directions, the beach images produce the lowest precision, 55.00%, and this drops to 50.00% for the diagonal directions, since beach images share features with other categories such as mountain, people and elephant images; it is therefore hard to derive significant features from such images. For 8 directions, two categories, beaches and elephants, have the lowest precision. In the Corel-1K database, the people, beach, elephant and mountain categories require high quality feature extraction algorithms to obtain features significant enough not to affect the results. Table 2 also shows that the average precision, average recall and average F-score for the 4 angles (0°–180° and 90°–270°) are 72.50%, 14.50% and 24.17%, while they become 70.50%, 14.10% and 23.50% for the other 4 angles (45°–225° and 135°–315°), i.e., the accuracy decreases slightly. The combined result for all 8 angles, i.e., the horizontal, vertical and diagonal directions (0°–180°, 90°–270°, 45°–225° and 135°–315°), is better and yields an average precision, recall and F-score of 73.50%, 14.70% and 24.50% respectively.

Table 2 The Precision, recall and F-score in percentages (%) for top L = 20 retrieved images from Corel-1K dataset using different directional texture features with color and shape descriptors based on Euclidean distance

In the proposed image retrieval scheme, the experimental results are evaluated with two similarity measures: the Euclidean distance and the distance suggested by the author. The proposed distance depends entirely on the minimum and maximum values of the feature vectors of the query and target images. It provides good results, although slightly lower than the Euclidean similarity measurement; its main advantage is its lower computational overhead compared with the Euclidean measurement. Table 3 reports the retrieval results for the top L = 20 images using the suggested and Euclidean distances with the eight-directional texture features combined with the color and shape visual contents. For the Euclidean distance, the minimum precision, recall and F-score of 55.00%, 11.00% and 18.00% are obtained for the beach and elephant images, while the newly proposed distance gives its lowest precision (30.00%), recall (6.00%) and F-score (10.00%) for the building category. The table shows that the average precision decreases from 73.50% to 54.00% when moving from the Euclidean distance to the proposed distance on the Corel-1K database. Nevertheless, the suggested distance may produce good results for real-life applications or other datasets. The experimental results were also computed for the other, larger image datasets, where the author found very little change in retrieval accuracy between the Euclidean distance and the suggested distance.

Table 3 Precision, recall and F-score (%) for the top L = 20 images retrieved from the Corel-1K dataset using the Euclidean distance and the proposed distance

The proposed CBIR scheme is also validated on the OT-8 dataset, which is imbalanced in nature. Table 4 shows the retrieval results in terms of precision, recall and F-score for the top L = 20 images retrieved from the OT-8 dataset. It is clear from the table that the street category images receive 100.00% precision with the Euclidean distance, which drops to 75.00% with the proposed distance. The lowest retrieval results for the Euclidean distance (precision 45.00%) and the proposed distance (precision 35.00%) are obtained for the tall-building image category, since the contents of these images are mixed with the contents of other categories; it is therefore very difficult for the proposed feature vector descriptor to distinguish the actual contents of the images. The average precision, average recall and average F-score are 68.13%, 4.17% and 7.84% for the Euclidean distance, while these metrics decrease to 52.50%, 3.24% and 6.10% for the proposed distance. Nevertheless, the overall retrieval performance is satisfactory both category-wise and in terms of the overall average metrics (i.e., precision, recall and F-score).
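For reference, the reported figures follow the usual top-L retrieval metrics. A minimal sketch is given below, assuming per-image category labels and a known category size (e.g., 100 images per category in Corel-1K, and variable sizes in the imbalanced OT-8 set, which is why the recall values stay small at L = 20).

```python
# Minimal sketch of the metrics reported in Tables 2-5 for a top-L retrieval:
# precision = relevant retrieved / L, recall = relevant retrieved / category size,
# F-score  = harmonic mean of precision and recall.
def retrieval_metrics(retrieved_labels, query_label, category_size, L=20):
    relevant = sum(1 for lab in retrieved_labels[:L] if lab == query_label)
    precision = relevant / L
    recall = relevant / category_size
    f_score = (2 * precision * recall / (precision + recall)) if relevant else 0.0
    return precision, recall, f_score
```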

Table 4 Precision, recall and F-score (%) for the top L = 20 images retrieved from the OT-8 dataset using the Euclidean distance and the proposed distance

The proposed scheme is also validated on the GHIM-10K image dataset, where it produces satisfactory retrieval results. Table 5 shows the retrieval accuracy in terms of precision, recall and F-score using the Euclidean distance and the proposed distance, where the top L = 20 images are retrieved from the GHIM-10K database. In this table, the horse and airplane images obtain the lowest retrieval performance, i.e., precision (35.00%), recall (1.40%) and F-score (2.69%), using the Euclidean distance, while the proposed distance gives the lowest precision (20.00%), recall (0.80%) and F-score (1.53%) for the wall image category. The highest retrieval performance is achieved by the fireworks images with both the Euclidean distance and the proposed distance. The average precision (52.25%), average recall (2.09%) and average F-score (4.02%) are achieved with the Euclidean distance, while the proposed distance obtains an average precision of 48.00%, average recall of 1.92% and average F-score of 3.69%. The precision thus decreases from 52.25% to 48.00% when moving from the Euclidean distance to the proposed distance. This difference is small and acceptable for a natural image dataset, and the proposed distance also requires a lower computational overhead in the retrieval process because it avoids the square-root computation.

Table 5 Precision, recall and F-score (%) for the top L = 20 images retrieved from the GHIM-10K dataset using the Euclidean distance and the proposed distance

4.4 Comparative results and discussion with related state-of-the-art CBIR schemes

To check the validity of the proposed image retrieval scheme, the experimental results have also been compared with some recently developed CBIR schemes [3, 8, 31, 32, 34, 47] in terms of retrieval accuracy. Alamin et al. [32] have computed color and texture visual descriptors in the singular value decomposition (SVD) domain, where an RGB color image is first transformed into the HSV color space and all color components, i.e., hue (H), saturation (S) and value (V), are divided into non-overlapping blocks. The SVD tool is then applied to each block of every color component to compute the color-texture information of the image, discarding the non-significant singular values. In that work, the building category images obtained the lowest precision, i.e., 24.00%, while the highest precision was achieved by the dinosaur category images; the overall average precision is acceptable at 63.00%. Rahimi et al. [31] have also suggested a color-texture-feature-based CBIR scheme, where the color information is extracted from the red, green and blue color components using the spatial relationship between pixel values. The texture information is computed from manually segmented image regions, where each region is processed by the dual-tree complex wavelet transform (DT-CWT) and the SVD tool. Since segmentation based on human visual perception is not suitable for image classification, this scheme does not produce good retrieval results on the Corel image dataset, and the average precision obtained is 49.69%. Fadaei et al. [8] have extracted color information from a uniform division of the H, S and V planes of the HSV color image using the dominant color descriptor (DCD) technique, and this color information is integrated with wavelet- and curvelet-based texture information. The integrated information is optimized using the particle swarm optimization (PSO) algorithm. In the DCD-based method, the dinosaur images produce the best precision (99.75%), while the worst precision (45.05%) is obtained for the mountain image category; the average precision of the DCD technique is 71.05%, and once the integrated information is optimized by the PSO algorithm, the average precision rises to 76.50%. Ashraf et al. [3] have proposed a CBIR scheme based on fused features from the edge histogram and discrete wavelet domain information. After the fused feature database is constructed, Artificial Neural Networks (ANN) are applied to it, and precision and recall values are computed from the images retrieved from the dataset. The building image category has the worst retrieval precision (50.00%) and the dinosaur image category the highest (100.00%). The average precision of this scheme is 73.50%, which is equivalent to that of our proposed CBIR scheme. Abdolreza et al. [34] have computed multi-resolution (wavelet) and color information features using DWT and color histogram techniques. The most relevant features are selected using an ant colony optimization technique, which maximizes the image retrieval accuracy of the CBIR system. In this work, the mountain image category has the lowest precision (39.80%), while the dinosaur image category has the highest precision (99.80%). Vimina et al. [47] have proposed a CBIR scheme based on a multi-cue fusion approach in a BoVW framework using early and late fusion methods.
For fusion, composite edge, Speeded Up Robust Features (SURF) and color visual feature descriptors are extracted to represent local regions of the image effectively. Independent vocabularies of these visual feature descriptors are then used to form histograms, which are fused to characterize the image. The retrieval process is performed on the basis of these histograms and satisfactory image retrieval accuracy is obtained. This scheme gives the worst results for the building image category, while the dinosaur category has the highest retrieval results. In the schemes discussed above, the image features are extracted from both the spatial and transform domains, but the fusion methods are complicated. In our proposed CBIR scheme, the image visual feature descriptors are extracted effectively from both the spatial and transform domains, a very simple fusion technique is applied, and satisfactory results are produced in most instances. It also produces 100.00% precision for the dinosaur image category and the worst precision (55.00%) for the beach image category. The average precision is reasonably good compared with the other existing CBIR methods. The comparative Table 6 shows the retrieval performance in terms of precision, recall and F-score for all image categories of the Corel-1K dataset, where the top L = 20 images are retrieved from the database. From Table 6, we observe that in most cases the worst results are obtained for the mountain, building and beach image categories. These categories share similar visual contents such as sky and ocean, so the actual contents are mixed across categories, and the images also have complex structures and shapes. Therefore, the proposed and existing feature extraction algorithms are unable to extract the actual contents and do not always produce satisfactory results; for such images, higher-level feature extraction algorithms are required.
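As an illustration of what such a simple fusion step can look like, the sketch below normalizes each feature block separately and concatenates the results into a single descriptor. This is an assumed, generic formulation rather than the authors' exact fusion rule.

```python
# Illustrative descriptor-level fusion (assumed form, not the authors' exact rule):
# each feature block is L2-normalized so that no single feature type dominates the
# distance computation, then the blocks are concatenated.
import numpy as np

def fuse_descriptors(color_vec, texture_vec, shape_vec, eps=1e-12):
    blocks = []
    for v in (color_vec, texture_vec, shape_vec):
        v = np.asarray(v, dtype=np.float64)
        blocks.append(v / (np.linalg.norm(v) + eps))   # per-block normalization
    return np.concatenate(blocks)                      # single fused descriptor
```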

Table 6 Comparison of proposed CBIR scheme with some recently developed schemes on Corel-1K image database in terms of average precision

The proposed CBIR scheme is also compared with existing schemes in terms of the dimensionality of the feature vectors. Table 7 lists the feature vector (FV) dimension and the average precision on the Corel-1K image dataset for four existing methods. The scheme in [3] achieves similar results, but its feature vector dimension is higher than that of the proposed feature vector descriptor. Similarly, [40] has the highest feature vector dimension among the existing methods but provides the lowest average precision. However, a low feature vector dimension can also yield good average precision [44]. In [47], the dimension of the feature vector descriptor is 86 and the average precision is 69.20%. From this discussion, it is clear that extracting a significant feature vector descriptor with a low dimension, without compromising retrieval performance, is very important in an image retrieval system.

Table 7 Comparison of the proposed CBIR scheme in terms of the dimensionality of feature vectors

4.5 Simulation results

To visualize the retrieval results, we present different image categories from the Corel-1K dataset using the Euclidean distance and the newly proposed distance. On the Corel-1K dataset, with the Euclidean distance the lowest precision (i.e., 55.00%) is obtained for the elephant and beach categories and the dinosaur category produces the best precision (100.00%), while with the proposed distance the building and dinosaur categories have the worst and best retrieval precision respectively. Figure 10 shows the retrieval results for the beach and dinosaur images using the Euclidean distance, while Fig. 11 depicts the results for the building and dinosaur images using the proposed distance.

Fig. 10 The retrieval results using the Euclidean distance for the Corel-1K image database, (a)-(b), where the top-left corner images are the queries

Fig. 11 The retrieval results using the proposed distance for the Corel-1K database, (a)-(b), where the top-left corner images are the queries

5 Conclusions

In this paper, a low-dimensional feature descriptor is constructed through a simple and effective fusion of color, texture and shape moments. The color moments are computed from the pre-processed HSV color components using a probability histogram model. The texture moments are calculated by determining an inter-relationship between DCT blocks, and GLCM-based statistics are computed from them, which provides significant textural information about the image. Lastly, the invariant shape moments of the image are computed at different resolutions of the GIP, since multi-resolution analysis captures significant information that is not covered at a single resolution. The combined visual feature descriptor is tested with two distances, the Euclidean distance and the suggested distance. Both distances provide satisfactory retrieval results on three standard image databases, i.e., Corel-1K, OT-8 and GHIM-10K, and the proposed scheme should also be valid for other standard natural image datasets. The experimental results also show that the proposed scheme outperforms several existing image retrieval schemes.