Abstract
In this paper, we propose a novel feature descriptor that combines color and texture information. In the proposed color descriptor component, the inter-channel relationship between the Hue (H) and Saturation (S) channels of the HSV color space is explored, which has not been done earlier. We quantize the H channel into a number of bins and perform voting with the saturation values, and vice versa, following a principle similar to that of the HOG descriptor, where the gradient orientation is quantized into a certain number of bins and voting is done with the gradient magnitude. This lets us study how saturation varies with Hue and how Hue varies with saturation. The texture component of our descriptor considers the co-occurrence relationship between the pixels symmetric about both diagonals of a 3 × 3 window. Our work is inspired by the work of Dubey et al. (IEEE Signal Process Lett 22(9):1215–1219, [2015]). These two components, viz. the color and texture information, individually perform better than existing color and texture descriptors. Moreover, when concatenated, the proposed descriptors provide a significant improvement over existing descriptors for content-based color image retrieval. The proposed descriptor has been tested for image retrieval on five databases, including the texture image databases (MIT-VisTex and the Salzburg texture database) and the natural scene databases (Corel 1K, Corel 5K and Corel 10K). The precision and recall values obtained on these databases are compared with those of several state-of-the-art local patterns, and the proposed method provides satisfactory results.
1 Introduction
Modernization of technology has led to advancement in various areas such as academics, medicine, forensic analysis and entertainment. In image processing, this has led to a rising need for information retrieval from images. A significant amount of research has been done in this area using various methods to retrieve text and related information from images. Image recognition has found wide use in real-time applications, for example tracking automobiles on CCTV cameras or helping blind people to travel. The main hindrance experienced in such work comes from image quality: blurry images or images with severe contour deformations provide challenging scenarios for text detection, and lighting conditions and complex backgrounds are further challenges that must be overcome for proper text detection in images. To avoid these difficulties of text-based image retrieval, content-based image retrieval (CBIR) was introduced. In CBIR, textual descriptions are avoided; instead, similarities in content (textures, colors, shapes, etc.) with respect to the query image are considered, and retrieval is done by listing images from large databases in descending order of similarity. A large amount of research has been done on content-based image retrieval [2,3,4,5,6,7,8,9,10] in the recent past involving both color and texture features.
Color quantization is closely related to color models. A large number of color models have been proposed and used for image retrieval and related tasks over the years, so selecting the appropriate color space plays an important role. The color model most commonly used in image processing and computer vision is the RGB model, which contains three color channels, namely Red (R channel), Green (G channel) and Blue (B channel). Apart from texture, color information also plays an important role in the content-based image retrieval task [11,12,13,14], since it provides global information about the image in terms of the distribution of the various color components. The drawback of the RGB color model is that the color information contained in the three channels is highly correlated. This prompted us to use the HSV color space in the content-based image retrieval framework in order to capture color information efficiently. In the HSV color space, H, S and V stand for Hue, Saturation and Value, respectively. The Hue component is defined as an angle; as it varies from 0 through 1, the corresponding color varies from red through yellow, green, cyan, blue and magenta, and back to red, so there are red values both at 0 and 1.0. Saturation is an indication of the purity of the color. The Value component indicates the brightness, which is almost similar to the gray-scale version of the RGB image.
In this paper, the primary motive for developing a feature descriptor is to efficiently capture both the color and the texture information present in the image. This makes the feature a multipurpose descriptor that can work for a large variety of images belonging to different databases. Our work stresses this point and aims at giving equally effective results on different publicly available image datasets. We develop a novel color descriptor by exploring the inter-channel, or mutual, relationship between the H and S channels, which has not been done in any previous work. Along with this, a texture descriptor is designed which considers the relationship between the pixels symmetric about the left and right diagonals of a 3 × 3 window. These two descriptors individually perform better than existing color and texture descriptors, and their concatenation gives a significant improvement over existing descriptors for content-based color image retrieval.
The rest of the paper is organized as follows. Related works are discussed in Sect. 1.1. In Sect. 1.2, the main contributions of our work are stated with respect to existing techniques. In Sect. 2, we detail the proposed color and texture descriptors. We summarize the proposed method in Sect. 3. Section 4 presents the experimental results and the advantages of the proposed descriptor. Finally, conclusions are drawn in the last section of the paper.
1.1 Related work
The image retrieval process mainly focuses on texture and color analysis. Local intensity defines texture to some extent, which is why local neighborhood features and statistical features are used for texture patterns; similarly, the color correlogram, color histogram, color coherence vector, etc., are used as low-level color feature descriptors. The most renowned method for statistical feature extraction from images, the Gray Level Co-occurrence Matrix (GLCM), was first proposed by Haralick [2]. The GLCM, also known as the Gray Level Spatial Dependence Matrix, examines texture by considering the spatial relationship of pixels: it characterizes the texture of an image by counting the co-occurrences of pairs of pixels with specific values in a particular spatial relationship, and statistical measures are then extracted from this matrix. Beyond applying the GLCM directly to the texture image, Zhang et al. used the edge image to gather even more concrete and relevant information [3]: the Prewitt edge detector was applied in four directions and the GLCM of the resulting edge images was calculated. For the LUV and RGB color channels, the GLCM was further extended to single-channel and then multi-channel co-occurrence matrices, introducing an application of the GLCM to color texture image retrieval [11]. The Gray Level Co-occurrence Matrix was used for the retrieval of rock texture images by Partio et al. [4]. Siqueira et al. [5] utilized pyramid representation and Gaussian smoothing for multi-scale image extraction and retrieval. Some other applications of the GLCM are [15,16,17].
An integrated color and intensity co-occurrence matrix was proposed to compute a composition of texture and color features; color representation was performed using the HSV color space instead of RGB, and image retrieval was done on labeled and unlabeled image datasets [18]. The color histogram considers the frequency of every intensity while discarding the spatial correlation of colors; this spatial correlation is utilized in the color correlogram [19]. The color correlogram combined with supervised learning was then used for feature vector extraction and improved image retrieval in two different ways, firstly by modifying the query image and secondly by distance metric learning [20]. For retrieving images, the color coherence vector was proposed using the color coherence and incoherence of image pixels, and it was compared with the color histogram [13]. An artificial neural network (ANN) technique was applied by Park et al. [21] for faster image retrieval by clustering images. Quantization of the color histogram for retrieving images was utilized in Gaussian Mixture Vector Quantization (GMVQ) [12]. The Motif Co-occurrence Matrix, which builds a 3D matrix corresponding to local statistics of images, was proposed for image retrieval [22]. An extension of this, the Modified Color Motif Co-occurrence Matrix (MCMCM), was used for image retrieval by exploiting the relationships between the color channels by Murala et al. [14]. Again, using the HSV color space, the motif matrix was combined with histogram information in [23].
The wavelet transform has found extensive application in the description of image texture, where texture quality is determined along the most prominent perceptual dimensions [24]. The wavelet transform was used in [25] to collect texture and color features from an image. Daubechies' wavelet transform was used for image searching and indexing by Wang et al. [26]; here, the feature vectors are constructed using the wavelet coefficients in the lowest few frequency bands and their variances. The idea of the wavelet correlogram in content-based image retrieval was first proposed in [27]. The discrete wavelet transform (DWT) extracts information from an image in only three directions (horizontal, vertical and diagonal); this directional limitation was removed by using Gabor wavelet feature-based texture analysis in [28]. The Gabor wavelet transform was used for texture classification by Ahmadian et al. in [29]. The Gabor wavelet correlogram, an extension of [27], was proposed as a rotation-invariant feature using Gabor wavelets in content-based image retrieval [30].
Ojala et al. [31] first proposed the local binary pattern (LBP) for texture feature extraction. In LBP, an eight-bit binary string represents the spatial relationship between the local neighboring pixels and the center pixel. Uniform and rotation-invariant versions of LBP have been introduced for image classification and retrieval. Various extensions of LBP, e.g., Completed LBP (CLBP) [32], Block-based Local Binary Pattern (BLK LBP) [33], Dominant LBP (DLBP) [34], Center Symmetric LBP (CS-LBP) [35], etc., were introduced for image retrieval and texture classification. One major drawback of the traditional LBP method is that anisotropic features are not described by its circular sampling region. The Multi-structure Local Binary Pattern (Ms-LBP) [36] operator was proposed as a solution to this problem, where an extended LBP operator was obtained by changing the shape of the sampling region for texture image classification. Gaussian as well as wavelet-based low-pass filters were used with LBP for texture classification in the Pyramid Local Binary Pattern (PLBP) [37] proposed by Qian et al., where multi-resolution images are extracted from the original image using a low-pass filter, and these low-pass images are used for LBP feature collection. A combination of LBP with the Gabor filter gave better results [38]. Besides these, moments were applied to feature extraction in [39], and features were extracted from edge information in the Directional Local Extrema Pattern [40]. Various further improvements over LBP were made in the Dominant Local Binary Pattern (DLBP) [34], Local Bit-plane Decoded Pattern (LBDP) [41], Local Edge Patterns for Segmentation and Image Retrieval (LEPSEG and LEPINV) [42], Local Mesh Pattern (LMP) [43], Average Local Binary Pattern (ALBP) [44], etc. Numerous algorithms have also been formulated to minimize the effect of noise on LBP.
In the Local Ternary Pattern (LTP) [45], a threshold value \( t \) is first chosen; if a neighboring pixel value \( I_{i} \) lies within the range \( \left( {I_{c} - t,I_{c} + t} \right) \) around the center pixel \( I_{c} \), then 0 is assigned, if it lies below this range − 1 is assigned, and otherwise + 1 is assigned. This ternary pattern is then converted into upper and lower binary bit patterns. An improved version of LTP, known as Improved LTP [46], gives better results. Noise-Resistant LBP (NR-LBP) [47] and Robust LBP (RLBP) [48] are used for noise reduction in LBP features. Second-order derivatives in the horizontal as well as the vertical direction are considered in Local Tetra Patterns [49], which are then transformed into binary patterns for calculation and give better results than LBP. An extended version of Local Tetra Patterns is the Local Oppugnant Pattern [50] in the RGB color space. Murala et al. proposed spherical symmetric 3D Local Ternary Patterns [51] using Gaussian filters and the RGB color space, which provide a 3D space and extract LTPs in every direction. A texture synthesis-based texture hashing framework has been proposed by Bhunia et al. [52]. Zhang et al. [53] introduced a novel learning framework to transform tree-structured data into a vector representation, and its performance has been examined in the content-based image retrieval task. Recently, there have been a few works [54,55,56,57,58] which introduced new methods for texture classification. A Dense Micro-block Difference (DMD)-based method is proposed by Dong et al. [55]. A multi-scale rotation-invariant representation (MRIR) of textures based on multi-scale sampling is proposed in [57].
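The LTP construction described above can be sketched for a single 3 × 3 window as follows; the neighbor ordering, function name and example threshold are our own illustrative choices, not taken from [45]:

```python
import numpy as np

def ltp_codes(window, t=5):
    """LTP for one 3x3 window (illustrative sketch).

    Neighbors within +/- t of the center get 0, those below the range get -1,
    those above get +1; the ternary string is then split into the upper
    (+1 positions) and lower (-1 positions) binary bit patterns.
    """
    center = int(window[1, 1])
    # Eight neighbors in clockwise order starting at the top-left
    # (a common convention; the paper does not fix one here).
    neighbors = np.array([window[0, 0], window[0, 1], window[0, 2],
                          window[1, 2], window[2, 2], window[2, 1],
                          window[2, 0], window[1, 0]], dtype=int)
    ternary = np.where(neighbors > center + t, 1,
                       np.where(neighbors < center - t, -1, 0))
    upper = sum(2 ** k for k, v in enumerate(ternary) if v == 1)
    lower = sum(2 ** k for k, v in enumerate(ternary) if v == -1)
    return upper, lower
```

Sliding this over the image yields the two binary pattern maps whose histograms form the LTP feature.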
It is to be noted that our work is inspired by the work in [59], which used both a color histogram and a texture descriptor in order to capture the global and local information of the image, respectively. However, to our knowledge, none of the earlier local descriptors considered the inter-channel relationship for histogram calculation in the HSV color space; in other words, the mutual relationship between the channels has not been thoroughly investigated for building feature descriptors in the content-based image retrieval task. Also, the relationship between diagonally symmetric pairs in a 3 × 3 window has not been explored earlier. Following this, we develop one color histogram descriptor and one texture descriptor which, upon concatenation, provide a significant improvement over the method in [59] as well as other existing methods.
1.2 Main contributions
A number of works [59,60,61] have focused on color content-based image retrieval. The work in [60] proposed a novel image feature representation method using color information from the L* a* b* color space, called the Color Difference Histogram (CDH), for image retrieval. Walia et al. [61] exploited the Color Difference Histogram (CDH) and Angular Radial Transform (ART) features to obtain color, texture and shape information of an image, using a modified Color Difference Histogram to improve retrieval performance. In [59], the authors simply quantized the histograms from the H and S channels into different bins and then concatenated those histograms to obtain the color feature. Very few works, like the one by Lu et al. [62], which developed a novel LBP-based color feature named Ternary-Color LBP (TCLBP), represent inter-channel information, and that in the RGB color space. However, to the best of our knowledge, none of the existing works have exploited the inter-channel information present in the HSV color space. In this paper, we exploit the inter-channel relationship between the H and S channels for color histogram computation. This is done by quantizing the H channel (an angle, i.e., a range of colors) into different bins and voting using the saturation value of the corresponding pixel position. This explores the inter-channel, or mutual, relationship between the H and S channels in a novel manner, because this histogram computation takes into account the actual saturation (S channel) values corresponding to a range of colors (a particular bin in the H channel) rather than just the number of occurrences of values within that range. With a similar motivation, we quantize the S channel into different bins and use the corresponding H channel (color range) value for voting.
A number of texture features based on local patterns have been proposed for texture-based image retrieval by considering the relation among neighbors symmetric with respect to the center. The most popular among them is the Center Symmetric Local Binary Pattern (CSLBP) [35]. In CSLBP, the relation between the center-symmetric pixels is considered for calculating the local pattern of the input image and the remaining neighbors are ignored. A modified form of CSLBP was proposed by Verma et al. [59] for calculating the feature map; moreover, they calculated the texture feature using the V channel. Not many works have focused on the mutual relationship between diagonally symmetric neighbors, though. The importance of representing the relationship between the diagonal neighbors was shown in the work of Dubey et al. [1], where first-order local diagonal derivatives are calculated to exploit the relationship among the diagonal neighbors of a given center pixel, and the intensity value of the center pixel is compared with the intensity values of the local diagonal neighbors. Motivated by this work, we explore the relationship among the diagonally symmetric neighbors along the left and right diagonals of a 3 × 3 window of an image. Moreover, we calculate the GLCM of the obtained feature map rather than its histogram, in order to preserve spatial correlation information.
The major contributions of our paper are as follows. Firstly, we introduce a novel method for color histogram calculation from the H and S channels of the HSV color space, with the objective of exploring the mutual relationship between the two channels. Secondly, a new texture descriptor is developed using the relationship between the diagonally symmetric neighbors along both diagonals of a 3 × 3 window of an image. These two feature descriptors, the color histogram and the texture feature, are concatenated in order to utilize both the global and the local information of the image, respectively, which was found to be beneficial in our experiments. Thirdly, the resultant feature descriptor has been used for color image retrieval on different databases (Corel-1K, Corel-5K, Corel-10K, the Salzburg texture database and the MIT-VisTex database) and has been found to perform significantly better than the method in [59] as well as other existing methods.
2 Color and texture descriptor
2.1 Color histogram using inter-channel voting
Since the primary objective of this work is to exploit the inter-channel relationship, we do not separately quantize the H and S channels into bins and concatenate the histograms as done in [59]. Our principle is motivated by the popular HOG (Histogram of Oriented Gradients) descriptor, which calculates the gradient of an image in two directions, X and Y; the orientation and magnitude of the gradient are calculated, and the gradient orientation is quantized into a histogram of P bins, each bin specifying a particular octant in the angular space. The histogram is formed by adding the gradient magnitude g(x,y) to the bin indicated by the quantized gradient orientation Ω(x,y). Similarly, our focus in this work is to quantize the Hue (holding color information) value \( \emptyset (x,y) \) into different bins and add the Saturation value \( S\left( {x,y} \right) \) to the bin indicated by \( \emptyset (x,y) \). Studying the variation of Hue with Saturation is equally as important as studying the variation of Saturation with Hue; to accomplish this, we also quantize the Saturation value \( S\left( {x,y} \right) \) into different bins and form a histogram by voting with the Hue values \( \emptyset (x,y) \). If the Hue value at a particular pixel position (i,j) of the image is \( \emptyset (i,j) \) and it belongs to the kth quantized histogram bin, then we may write:

$$ H_{1} \left( k \right) = H_{1} \left( k \right) + S\left( {i,j} \right) $$
(1)
where \( S\left( {i,j} \right) \) is the saturation value at pixel position (i,j). Following the same principle, if we quantize the Saturation values into L bins, with \( S\left( {i,j} \right) \) belonging to the lth bin, and vote with the corresponding Hue value, we may write:

$$ H_{2} \left( l \right) = H_{2} \left( l \right) + \emptyset \left( {i,j} \right) $$
(2)
Thus, we construct two sets of histograms one with K bins and another with L bins to exploit the inter-channel relationship. The traditional histogram quantization method and our proposed histogram quantization method are shown in Fig. 1a, b respectively.
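As a sketch, the proposed inter-channel voting could be implemented as follows; the function name, the default bin counts and the clamping of the boundary bin are our illustrative assumptions:

```python
import numpy as np

def interchannel_histograms(H, S, K=18, L=10):
    """Inter-channel voting between the H and S channels (sketch).

    H and S are float arrays in [0, 1]. The H channel is quantized into K
    bins and each pixel votes with its saturation value; symmetrically, the
    S channel is quantized into L bins and each pixel votes with its hue,
    analogous to HOG's magnitude-weighted orientation binning.
    """
    h_bins = np.minimum((H * K).astype(int), K - 1)  # bin index per pixel
    s_bins = np.minimum((S * L).astype(int), L - 1)
    # Weighted counts: each pixel adds the *other* channel's value to its bin.
    hist_h = np.bincount(h_bins.ravel(), weights=S.ravel(), minlength=K)
    hist_s = np.bincount(s_bins.ravel(), weights=H.ravel(), minlength=L)
    return np.concatenate([hist_h, hist_s])
```

With K = 18 and L = 10 this yields a 28-dimensional color feature; the paper's experiments also try K = 36/72 and L = 20.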
2.2 Local pattern
2.2.1 Local binary patterns
Ojala et al. proposed the Local Binary Pattern (LBP), which was used mainly in texture classification [63], but its computational ease led to its further use in medical imaging [64], image classification [31], object tracking [65] and facial expression recognition [66]. The method uses a small window of a gray-scale image. Each of the N neighboring pixels surrounding the center pixel is compared to the center pixel, and a binary value (0 or 1) is assigned based on this intensity difference (as given in Eq. 3). The final value is obtained after multiplying these bits by specific weights, and the center pixel is replaced with this binary pattern value. By replacing each center pixel with its binary pattern value, a local binary map of the gray-level image is generated, and a histogram of this map is calculated to create the feature vector. Equations (3)–(6) give the formulas for LBP and the histogram.
Here N represents the number of neighboring pixels. The kth surrounding pixel is denoted by \( I_{k} \) and center pixel is denoted by \( I_{c} \). The final histogram of the pattern map is computed by Eq. (5). An example window for LBP calculation is given in Fig. 3a.
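A minimal sketch of the basic LBP map computation follows; the neighbor ordering and bit weights below are one common convention, since Eqs. (3)–(6) fix the paper's exact convention:

```python
import numpy as np

def lbp_map(img):
    """Basic 8-neighbor LBP over a grayscale image (interior pixels only)."""
    img = np.asarray(img, dtype=int)
    out = np.zeros((img.shape[0] - 2, img.shape[1] - 2), dtype=int)
    # Offsets of the 8 neighbors relative to the center, clockwise from
    # the top-left; neighbor k contributes weight 2**k when >= center.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    for k, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy: img.shape[0] - 1 + dy,
                       1 + dx: img.shape[1] - 1 + dx]
        out += (neighbor >= center).astype(int) << k
    return out
```

The histogram of `lbp_map(img)` over its 256 possible codes then gives the LBP feature vector.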
2.2.2 Gray level co-occurrence matrix
The concept of the Gray Level Co-occurrence Matrix (GLCM) was proposed by Haralick et al. [2], in which a set of textural features was studied. The GLCM is used to study the co-occurrence of pixel pairs within a specific distance and in a particular direction in an image, and is a very popular statistical method for feature calculation. In this paper, we calculate the GLCM of the feature map obtained after applying the proposed texture descriptor DSCoP (Diagonally Symmetric Co-occurrence Pattern) rather than computing a histogram. This is done to exploit the spatial correlation of pixels in the feature map, which is lost in histogram computation since a histogram is purely a frequency distribution. The GLCM of an input image is calculated as:

$$ M_{d} \left( {m,n} \right) = \# \left\{ {\left( {\left( {a,b} \right),\left( {c,d} \right)} \right)\,|\,I\left( {a,b} \right) = m,\;I\left( {c,d} \right) = n} \right\} $$
(7)
where \( \left( {a,b} \right),\left( {c,d} \right) \in H_{a} \times H_{b} \) and \( \left( {c,d} \right) = \left( {a + k \times \emptyset_{1} ,\;b + k \times \emptyset_{2} } \right). \)
In this equation, \( M_{d} \) is the gray level co-occurrence matrix, \( k \) represents the distance and \( \left( {\emptyset_{1} ,\emptyset_{2} } \right) \) represents the direction. \( H_{a} \times H_{b} \) represents the horizontal and vertical spatial domains, and \( I\left( {a,b} \right) \) and \( I\left( {c,d} \right) \) are the pixel intensity values at positions \( \left( {a,b} \right) \) and \( \left( {c,d} \right) \). An example of GLCM calculation is shown in Fig. 2: Fig. 2a shows the original matrix, and Fig. 2b shows its GLCM calculated with adjacent pixel pairs (distance one) in the horizontal (zero-degree) direction.
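The GLCM construction can be sketched as a direct, unoptimized loop; here `dy` and `dx` together encode the distance and direction:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Co-occurrence matrix of pixel pairs at offset (dy, dx).

    img holds integer gray levels in [0, levels); the default offset
    (dy=0, dx=1) counts horizontally adjacent pairs at distance one.
    """
    img = np.asarray(img, dtype=int)
    M = np.zeros((levels, levels), dtype=int)
    rows, cols = img.shape
    for a in range(rows):
        for b in range(cols):
            c, d = a + dy, b + dx
            if 0 <= c < rows and 0 <= d < cols:
                M[img[a, b], img[c, d]] += 1  # one co-occurrence of (m, n)
    return M
```

For production use a vectorized routine (e.g. scikit-image's `graycomatrix`) computes the same matrix far faster.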
2.2.3 Diagonally symmetric co-occurrence pattern
In the present work, we calculate a texture feature named the Diagonally Symmetric Co-occurrence Pattern (DSCoP) for image retrieval. In this pattern, we consider the relationship between the diagonally symmetric neighboring pairs of a 3 × 3 window, as shown in Fig. 3b. There are two diagonals in every 3 × 3 window of an image: the principal diagonal and the counter diagonal. If the symmetric neighbor pair about a given diagonal is \( \left( {I_{k} ,I_{j} } \right) \), where \( k \in \left\{ {1,2, \ldots ,8} \right\} \) indexes the eight neighbors of the center and the center pixel is denoted by \( I_{c} \), then \( I^{{\prime }}_{k} \) may be written as:

$$ I^{{\prime }}_{k} = I_{k} - I_{c} $$
(8)
The values of k can be divided into two subsets, one for the principal diagonal (k = 1,7,8) and one for the counter diagonal (k = 1,2,3). Correspondingly, for the symmetric partner \( I_{j} \) of each neighbor, \( I_{j}^{{\prime }} \) may be expressed as:

$$ I^{{\prime }}_{j} = I_{j} - I_{c} $$
(9)
The relationship between \( I^{{\prime }}_{k} \) and \( I^{{\prime }}_{j} \) may be represented as:

$$ f\left( {I^{{\prime }}_{k} ,I^{{\prime }}_{j} } \right) = \left\{ {\begin{array}{ll} {1,} & {I^{{\prime }}_{k} \times I^{{\prime }}_{j} > 0} \\ {0,} & {\text{otherwise}} \\ \end{array} } \right. $$
(10)
Thus, when both \( I^{{\prime }}_{k} \) and \( I^{{\prime }}_{j} \) have the same sign, the resultant bit is 1; otherwise, it is zero. There are six neighbor pairs in total, three for each diagonal, so we obtain a six-bit binary string and calculate its decimal equivalent, which replaces the center pixel of the window. Thereafter, we calculate the GLCM, which helps to exploit the spatial correlation. For GLCM computation we follow the same set of specifications as in [59]. However, since our pattern takes 64 values instead of 16, we quantize the GLCM into 16 levels, from 0 to 15, to maintain the same feature dimension as used in [59].
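A sketch of the DSCoP code for a single 3 × 3 window follows; the exact ordering of the six symmetric pairs is our assumption, since it depends on the neighbor numbering of Fig. 3b:

```python
import numpy as np

def dscop_code(window):
    """DSCoP code for one 3x3 window (pair ordering is our assumption).

    Each of the six diagonally symmetric neighbor pairs contributes bit 1
    when both neighbors differ from the center with the same sign.
    """
    c = int(window[1, 1])
    d = window.astype(int) - c          # I'_k = I_k - I_c for every neighbor
    # Three pairs symmetric about the principal diagonal, then three about
    # the counter diagonal, given as (row, col) index pairs.
    pairs = [((0, 1), (1, 0)), ((0, 2), (2, 0)), ((1, 2), (2, 1)),
             ((0, 0), (2, 2)), ((0, 1), (1, 2)), ((1, 0), (2, 1))]
    code = 0
    for bit, (p, q) in enumerate(pairs):
        if d[p] * d[q] > 0:             # same nonzero sign -> bit set
            code |= 1 << bit
    return code
```

Sliding this over the Value channel produces the 64-level DSCoP map whose quantized GLCM forms the texture feature.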
2.3 Advantages of the proposed descriptor
1. The texture descriptor proposed in our work takes into account the relationship between the diagonally symmetric neighboring pairs about the principal and counter diagonals of a 3 × 3 window of an image, rather than considering only the relationship between center-symmetric pixels; this has not been studied in the literature so far.
2. A novel color descriptor is proposed that takes into consideration the inter-channel relationship between the H and S channels of an image. This type of relationship between the H and S channels has not been studied in the literature so far.
3. The proposed descriptor has been evaluated on a number of publicly available color and texture datasets. On each of them, the proposed descriptor has outperformed the existing descriptors for image retrieval.
3 Proposed system framework
The proposed method is illustrated with the help of a block diagram shown in Fig. 4, and the corresponding algorithm is given in Sect. 3.1. In this work, we compute the color feature by quantizing the H channel into different bins and studying the variation of Saturation with respect to Hue, following a principle similar to that of the HOG descriptor. Hue represents the color component and has a value between 0 and 1. For our experiments, the H channel has been divided into 18/36/72 bins. The variation of Hue with Saturation has also been studied following the same principle; for this purpose, the S channel has been quantized into 10/20 bins. All possible combinations of the Hue and Saturation bin counts have been used, and the results have been reported in the results section. We have used the same value of the normalization factor as used in [59] for all databases to appropriately justify the superiority of our method. For texture feature extraction, we calculate the GLCM of the DSCoP pattern, following the same set of specifications for GLCM computation as mentioned in [59]. The only difference is that we quantize our GLCM matrix into 16 levels, from 0 to 15, to obtain a 256-dimensional feature vector and keep the feature dimension the same as in [59].
The algorithm is shown in two parts. The first part describes the system framework: an image is fed as input and, as output, the feature vector is obtained by concatenating the Modified Color Histogram with the GLCM vector of the DSCoP feature. In Part 2, image retrieval is performed using the proposed feature extraction method: the query image is taken as input and, as output, the retrieved images are obtained based on the similarity measure between feature vectors computed as in Part 1.
3.1 System framework algorithm
Part 1: Construction of feature vector
Input: An Image from the database
Output: Feature vector.
1. Choose an image from the database and convert it from RGB to HSV color space.
2. Construct histograms by quantizing the Hue into different numbers of bins and voting with the corresponding Saturation value, and vice versa.
3. Obtain the DSCoP map from the Value channel of the HSV color space.
4. Form the GLCM of the DSCoP map by quantizing it into 16 levels.
5. Transform the GLCM into vector form.
6. Concatenate the GLCM vector of step 5 with the histogram of step 2; the final histogram thus constructed is the feature vector.
Part 2: Image retrieval using DSCoP + Modified Color Histogram
Input: Database query image
Output: Retrieved images after similarity measure
1. Take the query image as input from the database.
2. Perform steps 2–6 of Part 1 to extract the feature vector of the query image.
3. Using the different similarity measures, compute the similarity index of the query image vector with every database image.
4. Sort the similarity indices to produce the set of similar retrieved images as the final result.
3.2 Similarity measure
In content-based image retrieval, the similarity measure is as important as the color and texture feature computation for retrieving and classifying images. After feature calculation, the similarity measure gives the distance in feature space between the query image feature vector and the feature vector of every image in the database. Indexing is then done based on this measure, and the set of retrieved images is sorted so that images at smaller distances rank higher. Similarity matching is calculated using the following five distance measures.
a. d1 distance:

$$ \partial_{{D^{ } ,q_{k} }} = \mathop \sum \limits_{l = 1}^{n} \left| {\frac{{\rho_{d}^{k} \left( l \right) - \rho_{{q_{k} }} (l)}}{{1 + \rho_{d}^{k} \left( l \right) + \rho_{{q_{k} }} (l)}}} \right| $$
(11)

b. Euclidean distance:

$$ \partial_{{D^{ } ,q_{k} }} = \left( {\mathop \sum \limits_{l = 1}^{n} \left| {(\rho_{d}^{k} \left( l \right) - \rho_{{q_{k} }} (l))^{2} } \right|} \right)^{1/2} $$
(12)

c. Manhattan distance:

$$ \partial_{{D^{ } ,q_{k} }} = \mathop \sum \limits_{l = 1}^{n} \left| {\rho_{d}^{k} \left( l \right) - \rho_{{q_{k} }} (l)} \right| $$
(13)

d. Canberra distance:

$$ \partial_{{D^{ } ,q_{k} }} = \mathop \sum \limits_{l = 1}^{n} \left| {\frac{{\rho_{d}^{k} \left( l \right) - \rho_{{q_{k} }} (l)}}{{\rho_{d}^{k} \left( l \right) + \rho_{{q_{k} }} (l)}}} \right| $$
(14)

e. Chi-square distance:

$$ \partial_{{D^{ } ,q_{k} }} = \frac{1}{2}\mathop \sum \limits_{l = 1}^{n} \frac{{(\rho_{d}^{k} \left( l \right) - \rho_{{q_{k} }} (l))^{2} }}{{\rho_{d}^{k} \left( l \right) + \rho_{{q_{k} }} (l)}} $$
(15)
Here, the distance function between the database \( D \) and the query image \( q_{k} \) is represented by \( \partial_{{D ,q_{k} }} \), and n represents the length of the feature vector. \( \rho_{d}^{k} \left( l \right) \) and \( \rho_{{q_{k} }} (l) \) are the feature vectors of the kth database image and the query image, respectively.
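For illustration, three of the above distances can be implemented as follows; the guard against empty histogram bins (a zero denominator) is our addition, since Eqs. (11), (14) and (15) do not specify one:

```python
import numpy as np

def d1_distance(f, g):
    """d1 distance of Eq. (11) between two feature vectors."""
    return np.sum(np.abs((f - g) / (1.0 + f + g)))

def canberra(f, g):
    """Canberra distance of Eq. (14); zero denominators are skipped safely."""
    denom = np.where(f + g == 0, 1.0, f + g)  # |f-g| is 0 there anyway
    return np.sum(np.abs(f - g) / denom)

def chi_square(f, g):
    """Chi-square distance of Eq. (15) with the same zero-bin guard."""
    denom = np.where(f + g == 0, 1.0, f + g)
    return 0.5 * np.sum((f - g) ** 2 / denom)
```

The retrieved set is then simply the database images sorted by the chosen distance to the query.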
4 Experimental results and analysis
In this paper, we have evaluated the performance of our method on five datasets: the texture image databases MIT-VisTex and Salzburg texture, and the natural scene databases Corel 1K, Corel 5K and Corel 10K. The superiority of the proposed method has been validated by evaluating the precision and recall rates and comparing them with existing methods on all five datasets. Precision relates the total number of relevant images retrieved for a given query image to the total number of images retrieved from the database, as in Eq. 16; it decreases as we gradually retrieve more images. The equation for determining the precision rate may be given as:
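Following the definition just given, Eq. 16 takes the standard form:
$$ P_{k}(Q) = \frac{\text{number of relevant images retrieved for query}\;Q}{\text{total number of images retrieved}} $$(16)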
Here \( Q \) is the query image and \( P_{k} \) represents the precision rate for query \( Q \).
Another commonly used accuracy measure is recall, defined as the probability of retrieving a relevant image for the query. For an image retrieval system, recall improves as the number of retrieved images increases. Recall can be viewed as the ratio of the total number of relevant images retrieved for a given query image to the total number of relevant images of that class in the database, as in Eq. 17.
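Consistent with this ratio, Eq. 17 can be written as:
$$ R_{k}(Q) = \frac{\text{number of relevant images retrieved for query}\;Q}{N_{k}} $$(17)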
Here \( N_{k} \) indicates the number of images in each category of the database, i.e., the total number of relevant images in the database.
The average precision rate may be calculated as follows:
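With \( P_{i} \) denoting the precision obtained when the ith image of category \( M \) is used as the query, Eq. 18 takes the usual averaged form:
$$ P_{\text{avg}}(M) = \frac{1}{j}\mathop \sum \limits_{i = 1}^{j} P_{i} $$(18)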
In Eq. 18, \( P_{\text{avg}}\left( M \right) \) represents the average precision rate for category \( M \), where j is the total number of images in that category. Similarly, the recall rate for each category may be expressed as given in Eq. 19.
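Analogously, with \( R_{i} \) the recall for the ith query of category \( M \), Eq. 19 reads:
$$ R_{\text{avg}}(M) = \frac{1}{j}\mathop \sum \limits_{i = 1}^{j} R_{i} $$(19)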
In similar fashion, we can compute the total precision and total recall for our experiment using Eqs. 20 and 21.
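Averaging the per-category rates over all categories, Eqs. 20 and 21 take the form:
$$ P_{\text{total}} = \frac{1}{C}\mathop \sum \limits_{M = 1}^{C} P_{\text{avg}}(M) $$(20)
$$ R_{\text{total}} = \frac{1}{C}\mathop \sum \limits_{M = 1}^{C} R_{\text{avg}}(M) $$(21)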
Here, C is the total number of categories present in the particular database. Total recall is also known as Average Recall Rate (ARR). The performance of the proposed method has been compared with a number of state-of-the-art methods; the list of abbreviations for these methods is given in Table 1.
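The bookkeeping behind Eqs. 16–21 can be sketched in a few lines. This is an illustrative toy example, not the evaluation harness used in the paper; the tiny ranked lists and per-query precision values are placeholders.

```python
def precision(retrieved, relevant):
    """Eq. 16: fraction of retrieved images that are relevant."""
    hits = sum(1 for img in retrieved if img in relevant)
    return hits / len(retrieved)

def recall(retrieved, relevant):
    """Eq. 17: fraction of all relevant images that were retrieved."""
    hits = sum(1 for img in retrieved if img in relevant)
    return hits / len(relevant)

# One query from a category with N_k = 4 relevant images; 5 images retrieved.
relevant = {"a", "b", "c", "d"}
retrieved = ["a", "x", "b", "y", "c"]
print(precision(retrieved, relevant))  # 3 of 5 retrieved are relevant -> 0.6
print(recall(retrieved, relevant))     # 3 of 4 relevant retrieved -> 0.75

# Eqs. 18-19: average the per-query rates within a category;
# Eqs. 20-21 then average those category rates over all C categories.
per_query_precision = [0.6, 0.8, 0.7]
p_avg = sum(per_query_precision) / len(per_query_precision)
print(round(p_avg, 4))
```

This also makes the trade-off in the text concrete: retrieving more images can only raise recall (the denominator \( N_{k} \) is fixed) while it tends to lower precision (the denominator grows with every retrieved image).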
4.1 Dataset 1
The first dataset used in our experiment is the Corel 1K database. It consists of 10 categories with 100 images each, for a total of 1000 images. The categories include Asians, buildings, beaches, elephants, flowers, dinosaurs, buses, mountains, hills and floods. Each image in this database has a size of either 256 × 384 or 384 × 256. Some sample images from this database are shown in Fig. 5b. The precision, recall and average retrieval rate have also been evaluated for this database. The precision and recall curves for different numbers of retrieved images are shown in Fig. 6. For our experiment, we initially retrieved 10 images and then increased the number of retrieved images in steps of 10 until 100 images were retrieved. In Fig. 7, the first image of each row is the query image and the remaining images are those retrieved for it. A comparative study of the color and texture patterns for this dataset, considering the precision and recall of each individually, is shown in Fig. 8a, b.
4.2 Dataset 2
The second database that we have worked with in our experiment is the Corel 5K dataset. It consists of 5000 images in 50 categories of 100 images each. The dataset includes images of animals (e.g., bears, lions, foxes, tigers), humans, buildings, paintings, natural scenes, fruits, cars, etc. In this experiment, we again retrieved 10 images initially and increased the number up to 100 to provide a fair comparison. The precision and recall curves for this dataset with a varying number of retrieved images are shown in Fig. 9. Figure 5e shows some sample images from this dataset, and the images retrieved for these samples are shown in Fig. 10. The average retrieval rate for this dataset, given in Table 2, indicates that the proposed method performs better than the state-of-the-art methods listed in Table 1. As shown in Fig. 11, the texture pattern individually is more effective than the color pattern for CBIR.
4.3 Dataset 3
The third database we used in our experiment is the Corel 10K database.Footnote 1 It consists of 100 categories with 100 images each and is an extension of the Corel 5K database. Its categories include buses, ships, textures, food, army, airplanes, furniture, oceans, cats, fishes, etc. Some sample images from this database are shown in Fig. 5c. The precision and recall curves for different numbers of retrieved images are shown in Fig. 12. The average retrieval rate for this dataset shows an improvement over the existing methods given in Table 1. For the experimental study, we first retrieved 10 images and then progressively increased the number of retrieved images up to 100 to provide a detailed comparison. In Fig. 13, the first image of each row is the query image and the remaining images are those retrieved for it. Figure 14 compares both the texture and the color feature of our method with LECoP.
4.4 Database 4
We have evaluated the performance of the proposed method on our fourth database, the Salzburg texture (STex) database.Footnote 2 The database consists of 7616 images in 476 categories of 16 images each. Sample images from this dataset are presented in Fig. 5a. For the STex dataset, we retrieved 16 images to measure the precision and recall performance; to provide a more detailed study, we retrieved further images and report the resulting precision and recall curves in Fig. 15a, b. In Fig. 16, the first image of each row is the query image and the remaining images are those retrieved for it. Figure 17a, b compares both the texture and the color feature of our method with LECoP.
4.5 Database 5
Finally, our method is tested on the MIT-VisTexFootnote 3 database created by the MIT Vision and Modeling Group. The dataset contains 40 gray-scale texture images of size 512 × 512, each subdivided into images of size \( 128 \times 128 \), so the dataset comprises 40 different types of images with 16 images of each type. In this experiment, 16 images are retrieved initially and the number of retrieved images is then increased in steps of 16, up to a maximum of 96. The precision and recall rates for all images in the database are calculated and compared with the methods in Table 1; a graph in support of our observations is shown in Fig. 18. Some sample images from this dataset are shown in Fig. 5d, and some query images with their corresponding retrieved images are shown in Fig. 19. As shown in Fig. 20a, b, our proposed texture and color features perform better than the texture and color features of LECoP.
The average retrieval rate of the proposed method is reported in Table 2 and compared with recent methods on the different datasets. We have also studied the feature extraction and image retrieval times of several state-of-the-art feature descriptors and compared them with our method; the feature vector lengths and image retrieval times of these recently developed techniques are given in Table 3. The similarity metrics of Eqs. 11–15 have been considered for performance evaluation, and Table 4 shows the performance of the proposed method under these different similarity or distance metrics. The performance varies with the distance metric and the dataset; however, the best performance on all datasets is obtained with the d1 distance metric. We have also shown a comparative performance with LECoP (local extrema co-occurrence pattern) by studying the texture and color patterns separately. Both patterns show an improvement over the corresponding patterns of LECoP; however, the texture pattern turns out to be more effective than the color pattern on all datasets, as indicated by the precision–recall values shown in the bar graphs. In Table 5, the individual importance of hue, saturation and value in the HSV color space is analyzed for different quantization levels of the hue and saturation components on all databases.
Overall, our proposed color and texture descriptors capture richer color and texture information in the encoded feature representation and perform better than existing handcrafted texture descriptors. The large-scale proliferation of images due to the easy availability of smartphones demands an automatic annotation and retrieval system that can operate on the content of images in real time.
5 Conclusion
This paper presents a novel approach to content-based image retrieval by proposing a descriptor that combines color and texture information. The texture descriptor, named the Diagonally Symmetric Local Binary Co-occurrence Pattern, effectively captures the co-occurrence relationship between the neighbor pairs symmetric about the principal and counter diagonals of a 3 × 3 window. The color descriptor captures the inter-channel relationship between the H and S channels of the HSV color space by quantizing the H channel into bins and voting with the saturation values, and vice versa for the S channel. The method has been evaluated on the texture image databases MIT-VisTex and Salzburg texture and on the natural scene databases Corel 1K, Corel 5K and Corel 10K. The results have been compared with existing techniques by calculating the precision and recall values for all of them, and the proposed method turns out to be better than the existing approaches in terms of both precision and recall. The feature vector length and image retrieval time are also competitive with most approaches. Thus, this image retrieval technique is effective and efficient for real-time systems.
Notes
MIT Vision and Modeling Group, Cambridge, Vision texture, available online: http://vismod.media.mit.edu/pub/.
References
Dubey SR, Singh SK, Singh RK (2015) Local diagonal extrema pattern: a new and efficient feature descriptor for CT image retrieval. IEEE Signal Process Lett 22(9):1215–1219
Haralick RM, Shanmugam K (1973) Textural features for image classification. IEEE Trans Syst Man Cybern 3(6):610–621
Zhang J, Li GL, He SW (2008) Texture-based image retrieval by edge detection matching GLCM. In: Proceedings—10th IEEE international conference on high performance computing and communications, HPCC, pp 782–786
Partio M, Cramariuc B, Gabbouj M, Visa A (2002) Rock texture retrieval using gray level co-occurrence matrix. In: Proceedings 5th Nord Signal
de Siqueira FR, Schwartz WR, Pedrini H (2013) Multi-scale gray level co-occurrence matrices for texture description. Neurocomputing 120:336–345
Li Y, Zhou C, Geng B, Xu C, Liu H (2013) A comprehensive study on learning to rank for content-based image retrieval. Signal Process 93(6):1426–1434
Fadaei S, Amirfattahi R, Ahmadzadeh MR (2017) Local derivative radial patterns: a new texture descriptor for content-based image retrieval. Signal Process 137:274–286
Li W, Pan H, Li P, Xie X, Zhang Z (2017) A medical image retrieval method based on texture block coding tree. Signal Process Image Commun 59:131–139
Tiwari AK, Kanhangad V, Pachori RB (2017) Histogram refinement for texture descriptor based image retrieval. Signal Process Image Commun 53:73–85
Banerjee P, Bhunia AK, Bhattacharyya A, Roy PP, Murala S (2017) Local neighborhood intensity pattern: a new texture feature descriptor for image retrieval. arXiv preprint arXiv:1709.02463
Palm C (2004) Color texture classification by integrative co-occurrence matrices. Pattern Recognit 37(5):965–976
Jeong S, Won CS, Gray RM (2004) Image retrieval using color histograms generated by Gauss mixture vector quantization. Comput Vis Image Underst 94(1–3):44–66
Pass G, Zabih R, Miller J (1998) Comparing images using color coherence vectors. In: Proceedings fourth ACM international conference multimedia (MULTIMEDIA’96), pp 1–14
Subrahmanyam M, Wu QMJ, Maheshwari RP, Balasubramanian R (2013) Modified color motif co-occurrence matrix for image indexing and retrieval. Comput Electr Eng 39(3):762–774
Baraldi A, Parmiggiani F (1995) An investigation of the textural characteristics associated with gray level cooccurrence matrix statistical parameters. IEEE Trans Geosci Remote Sens 33(2):293–304
Kovalev V, Petrou M (1996) Multidimensional co-occurrence matrices for object recognition and matching. Graph Model Image Process 58(3):187–197
Davis LS, Johns SA, Aggarwal JK (1979) Texture analysis using generalized co-occurrence matrices. IEEE Trans Pattern Anal Mach Intell 1(3):251–259
Vadivel A, Sural S, Majumdar AK (2007) An integrated color and intensity co-occurrence matrix. Pattern Recognit Lett 28(8):974–983
Huang J, Kumar SR, Mitra M, Zhu W-J, Zabih R (1997) Image indexing using color correlograms. In: Proceedings, 1997 IEEE computer society conference on computer vision and pattern recognition, pp 762–768
Huang J, Kumar SR, Mitra M (1997) Combining supervised learning with color correlograms for content-based image retrieval. In: Proceedings fifth ACM international conference multimedia—multimedia’97, pp 325–334
Park ST, Seo K, Jang D (2005) Expert system based on artificial neural networks for content-based image retrieval. Expert Syst Appl 29(3):589–597
Jhanwar N, Chaudhuri S, Seetharaman G, Zavidovique B (2004) Content based image retrieval using motif cooccurrence matrix. Image Vis Comput 22(14):1211–1220
Vipparthi SK, Nagar SK (2014) Multi-joint histogram based modelling for image indexing and retrieval. Comput Electr Eng 40(8):163–173
Balmelli L, Mojsilovic A (1999) Wavelet domain features for texture description, classification and replicability analysis. In: Proceedings international conference on image processing, ICIP 99, vol 4, pp 440–444
Ardizzoni S, Bartolini I, Patella M (1999) Windsurf: region-based image retrieval using wavelets. In: Proceedings tenth international workshop database expert system application DEXA 99, pp 167–173
Wang JZ, Wiederhold G, Firschein O, Wei SX (1997) Content-based image indexing and searching using Daubechies’ wavelets. Int J Digit Libr 1(4):311–328
Moghaddam HA, Khajoie TT, Rouhi AH, Tarzjan MS (2005) Wavelet correlogram: a new approach for image indexing and retrieval. Pattern Recognit 38(12):2506–2518
Manjunath BS (1996) Texture features for browsing and retrieval of image data. IEEE Trans Pattern Anal Mach Intell 18(8):837–842
Ahmadian A, Mostafa A (2003) An efficient texture classification algorithm using Gabor wavelet. In: Proceedings of the 25th annual international conference of the IEEE engineering in medicine and biology society (IEEE Cat. No. 03CH37439), vol 1, pp 930–933
Moghaddam HA, Dehaji MN (2013) Enhanced Gabor wavelet correlogram feature for image indexing and retrieval. Pattern Anal Appl 16(2):163–177
Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
Guo Z, Zhang L, Zhang D (2010) A completed modeling of local binary pattern operator for texture classification. IEEE Trans Image Process 19(6):1657–1663
Takala V, Ahonen T, Pietikainen M (2005) Block-based methods for image retrieval using local binary patterns. In: Lecture notes in computer science, vol 3540, pp 882–891
Liao S, Law MWK, Chung ACS (2009) Dominant local binary patterns for texture classification. IEEE Trans Image Process 18(5):1107–1118
Heikkilä M, Pietikäinen M, Schmid C (2006) Description of interest regions with center-symmetric local binary patterns. In: Computer vision, graphics and image processing. Springer, Berlin, Heidelberg, pp 58–69
He Y, Sang N, Gao C (2012) Multi-structure local binary patterns for texture classification. Pattern Anal Appl 16(4):595–607
Qian X, Hua XS, Chen P, Ke L (2011) PLBP: an effective local binary patterns texture descriptor with pyramid representation. Pattern Recognit 44(10–11):2502–2515
Tlig L, Sayadi M, Fnaiech F (2012) A new fuzzy segmentation approach based on S-FCM type 2 using LBP-GCO features. Signal Process Image Commun 27(6):694–708
Papakostas GA, Koulouriotis DE, Karakasis EG, Tourassis VD (2013) Moment-based local binary patterns: a novel descriptor for invariant pattern recognition applications. Neurocomputing 99:358–371
Murala S, Maheshwari RP, Balasubramanian R (2012) Directional local extrema patterns: a new descriptor for content based image retrieval. Int J Multimed Inf Retr 1(3):191–203
Dubey SR, Singh SK, Singh RK (2016) Local bit-plane decoded pattern: a novel feature descriptor for biomedical image retrieval. IEEE J Biomed Health Inform 20(4):1139–1147
Yao CH, Chen SY (2002) Retrieval of translated, rotated and scaled color textures. Pattern Recognit 36(4):913–929
Murala S, Wu QMJ (2014) Local mesh patterns versus local binary patterns: biomedical image indexing and retrieval. IEEE J Biomed Health Inform 18(3):929–938
Hamouchene I, Aouat S (2014) A new texture analysis approach for iris recognition. AASRI Proc 9:2–7
Tan X, Triggs B (2010) Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans Image Process 19(6):1635–1650
Wu X, Sun J, Fan G, Wang Z (2015) Improved local ternary patterns for automatic target recognition in infrared imagery. Sensors (Switzerland) 15(3):6399–6418
Ren J, Jiang X, Yuan J (2013) Noise-resistant local binary pattern with an embedded error-correction mechanism. IEEE Trans Image Process 22(10):4049–4060
Zhao Y, Jia W, Hu RX, Min H (2013) Completed robust local binary pattern for texture classification. Neurocomputing 106:68–76
Murala S, Maheshwari RP, Balasubramanian R (2012) Local tetra patterns: a new feature descriptor for content-based image retrieval. IEEE Trans Image Process 21(5):2874–2886
Jacob IJ, Srinivasagan KG, Jayapriya K (2014) Local oppugnant color texture pattern for image retrieval system. Pattern Recognit Lett 42(1):72–78
Murala S, Wu QMJ (2015) Spherical symmetric 3D local ternary patterns for natural, texture and biomedical image indexing and retrieval. Neurocomputing 149(PC):1502–1514
Bhunia AK, Kishore PSR, Mukherjee P, Das A, Roy PP (2019) Texture synthesis guided deep hashing for texture image retrieval. In: IEEE winter conference on applications of computer vision (WACV), pp 609–618
Zhang H, Wang S, Xu X, Chow TWS, Wu QMJ (2018) Tree2Vector: learning a vectorial representation for tree-structured data. IEEE Trans Neural Netw Learn Syst 99:1–15
Wang T et al (2018) Jumping and refined local pattern for texture classification. IEEE Access 6:64416–64426
Dong Y, Wu H, Li X, Zhou C, Wu Q (2018) Multiscale symmetric dense micro-block difference for texture classification. IEEE Trans Circuits Syst Video Technol. https://doi.org/10.1109/TCSVT.2018.2883825
Dong Y, Feng J, Yang C, Wang X, Zheng L, Pu J (2018) Multi-scale counting and difference representation for texture classification. Vis Comput 34(10):1315–1324
Dong Y, Feng J, Liang L, Zheng L, Wu Q (2017) Multiscale sampling based texture image classification. IEEE Signal Process Lett 24(5):614–618
Dong Y, Tao D, Li X, Ma J, Pu J (2015) Texture classification and retrieval using shearlets and linear regression. IEEE Trans Cybern 45(3):358–369
Verma M, Raman B, Murala S (2015) Local extrema co-occurrence pattern for color and texture image retrieval. Neurocomputing 165:255–269
Liu G-H, Yang J-Y (2013) Content-based image retrieval using color difference histogram. Pattern Recognit 46(1):188–198
Walia E, Pal A (2014) Fusion framework for effective color image retrieval. J Vis Commun Image Represent 25(6):1335–1348
Lu Z, Jiang X, Kot A (2017) A novel LBP-based color descriptor for face recognition. In: 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1857–1861
Ahmadian A, Mostafa A, Abolhassani M, Salimpour Y (2005) A texture classification method for diffused liver diseases using Gabor wavelets. In: Conference proceedings IEEE engineering in medicine and biology society, vol 2(c), pp 1567–1570
Nanni L, Lumini A, Brahnam S (2010) Local binary patterns variants as texture descriptors for medical image analysis. Artif Intell Med 49(2):117–125
Ning J, Zhang L, Zhang D, Wu C (2009) Robust object tracking using joint color-texture histogram. Int J Pattern Recognit Artif Intell 23(07):1245–1263
Moore S, Bowden R (2011) Local binary patterns for multi-view facial expression recognition. Comput Vis Image Underst 115(4):541–558
Cite this article
Bhunia, A.K., Bhattacharyya, A., Banerjee, P. et al. A novel feature descriptor for image retrieval by combining modified color histogram and diagonally symmetric co-occurrence texture pattern. Pattern Anal Applic 23, 703–723 (2020). https://doi.org/10.1007/s10044-019-00827-x