Content-based image retrieval using feature weighting and C-means clustering in a multi-label classification framework

Ghodratnama, Samaneh; Abrishami Moghaddam, Hamid

doi:10.1007/s10044-020-00887-4

Content-based image retrieval using feature weighting and C-means clustering in a multi-label classification framework

Theoretical advances
Published: 18 June 2020

Volume 24, pages 1–10, (2021)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Pattern Analysis and Applications Aims and scope Submit manuscript

Content-based image retrieval using feature weighting and C-means clustering in a multi-label classification framework

Download PDF

709 Accesses
20 Citations
3 Altmetric
Explore all metrics

Abstract

In this paper, a novel learning algorithm based on feature weighting is proposed to improve the performance of image classification or retrieval systems in a multi-label framework. The goal is to exploit maximally the beneficial properties of each feature in the system. Since each feature can separate more effectively some of the image classes, it is hypothesized that the weights of various features at some states can be traded off against each other. The training phase of the suggested algorithm is performed in two stages: (1) The input images are clustered using a supervised C-means method iteratively; (2) image features are weighted using a local feature weighting method in each cluster. These weights are determined by considering the importance of each feature in minimizing the classification error on each cluster. In the testing phase, the cluster corresponding to the query is found first. Then, the most similar images are retrieved in the multi-label framework using the feature weights assigned to that cluster. Experimental results on three well-known, public and international image datasets demonstrate that our proposed method leads to significant performance gains over existing methods.

Toward semantic content-based image retrieval using Dempster–Shafer theory in multi-label classification framework

Article 20 September 2017

An Improvement Method of Kd-Tree Using k-Means and k-NN for Semantic-Based Image Retrieval System

Semi-supervised image clustering with multi-modal information

Article 30 October 2014

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Content-based image retrieval (CBIR) is a method to retrieve the most similar images to a query image. This system searches similar images using features such as color, texture, and shape. One of the most common methods that can be used for this purpose is the K-nearest neighbors algorithm (KNN) [1,2,3]. This algorithm returns the most similar image using a distance function within feature space.

Retrieving similar images using only low-level features is one of the main shortcomings of the conventional content-based image retrieval (CBIR) systems. Among various techniques, machine learning such as deep learning [4,5,6,7,8,9,10], sparse coding for bag-of-words (BoW)-based approach [11, 12], Fisher vectors [13], etc., have been actively investigated as possible directions to bridge the semantic gap in the long term. This paper presents an effort to overcome this drawback and proposes a CBIR approach in which the retrieved labels of images in the multi-label classification framework satisfy user expectations.

It is also noteworthy to consider that images are in different classes and each of them would be more separable based on one particular kind of feature. For instance, assume a color feature would be more discriminant in one image, so this image is more distinguishable based on the color feature, while another one would be more discernible based on texture or shape features. Therefore, to have an effective search with more accurate results, one way is to search images based on a combination of features using the distance function. Besides, for the purpose of using the discriminative power of each feature concurrently and more purposefully, it is advantageous to train the weights for features. For example, when color features are more likely to be discriminant than the other features, more weight must be assigned to the color features and vice versa [14,15,16].

Then, if weighting is supposed to be applied to the whole data, considering that the data scope is large and dispersed, it would be difficult to find an ideal set of weights for features. Contrastingly, if the weights are found within subspaces, it will be more probable to achieve weights which yield more accurate retrieval results. Thus, we suggest dividing feature space into some regions in order to do a more intelligent search within smaller regions to retrieve similar images to the query. This strategy makes searching faster and more accurate [17,18,19]. The clustering algorithm is an intelligent method to divide the space into regions containing more similar data. Obviously, finding samples similar to the query image would be easier in such regions [20].

Based on the aforementioned discussion, in this paper a local feature weighting algorithm [21] is used for CBIR application. This algorithm first clusters the data and simultaneously trains the features in each cluster to have the appropriate weights for such features based on minimizing the classification error rate. In other words, both clustering and classification methods are used conjointly to create a powerful framework for retrieving similar images to the query.

The rest of the paper is organized as follows. In Sect. 2, we review some related studies in our multi-label learning framework such as C-means clustering, KNN and a feature extraction algorithm, which are used in this paper. In Sect. 3, we explain our method for multi-label classification using feature weighting algorithms and C-means clustering based on classification error rate minimization (MLC-FWC). In Sect. 4, we present experimental results and show the superiority of our algorithm over its counterparts which do not use either feature weighting or a multi-label framework. Finally, conclusions are presented in Sect. 5.

2 Related works

2.1 Multi-label classification

Multi-label classification has become popular in many areas [22]. For instance, in text categorization, a document may belong to multiple classes simultaneously [23], or in an automatic image annotation scenario, a scene may be linked to multiple concepts [24]. Similarly, different labels can be assigned to each video in a video indexing domain [25]. This situation also happens in functional genomics where multiple functions may be represented by a single gene [26]. In all the above examples, each instance is associated with multiple labels, and this situation is different from traditional single-label classification. Figure 1 shows a sample image from the Corel1000 database that can be classified as beach, sea, sky, tree and even mountain by considering the semantic meaning of the image.

There are some methods that do CBIR in multi-label framework. For example, in [27, 28] the authors created a visual dictionary for each group in the dataset. Then, they proposed a system for image retrieval based on local features using a BoW model that considered mid-level representations. The methods based on BoW create a codebook of visual discriminating patches (visual words) and then compute statistics (using the codebook) on the visual word occurrences in the test image. The system tries to bring more accuracy with the option to use local rather than global features to reduce the semantic gap issue and enhance the performance of the CBIR by performing image annotation.

In what follows, we provide a formal definition of the multi-label classification problem [29]. Let X denote the instance space and Y the set of class labels. Then, the goal is to learn a function $ f:{\mathbf{X}} \to 2^{{\mathbf{Y}}} $ which is defined on a given dataset { $ ({\mathbf{x}}_{1} .{\mathbf{y}}_{1} ).({\mathbf{x}}_{2} .{\mathbf{y}}_{2} ) \ldots ({\mathbf{x}}_{N} .{\mathbf{y}}_{N} )\} $, where $ {\mathbf{x}}_{i} \in {\mathbf{X}}. i = 1 \ldots N $ is an instance and $ {\mathbf{y}}_{i} \subseteq {\mathbf{Y}} $ is a set of labels $ \left\{ {y_{i1} .y_{i2} \ldots y_{{il_{i} }} } \right\} $,$ y_{ik} \in {\mathbf{Y}} .k = 1.2 \ldots l_{i} $, $ l_{i} = 1 \ldots H. $ Here, H is the number of labels, N is the number of data samples and l_i denotes the number of labels in $ {\mathbf{y}}_{i} $.

2.2 KNN classification

The KNN classification rule has been used successfully in many pattern recognition applications. This method is simple, and the distance measure in this algorithm can be various kinds of distance function. We use Euclidean distance, mostly used for dissimilarity measurement in image retrieval due to its effectiveness. It measures the distance between two vectors of images by $ d\left( {{\mathbf{x}},{\mathbf{x}}^{{\prime }} } \right) = \sqrt {\mathop \sum \nolimits_{j = 1}^{n} \left( {{\text{x}}_{j} - {\text{x}}_{j}^{{\prime }} } \right)^{2} } $, where x and x’ are two different vectors and n is the feature dimension. Since the purpose of our algorithm is to assign weights to the features, in this framework we use the weighted Euclidean distance, defined as:

$$ d_{w} \left( {{\mathbf{x}},{\mathbf{x}}^{{\prime }} } \right) = \sqrt {\mathop \sum \limits_{j = 1}^{n} {\text{w}}_{j} \left( {{\text{x}}_{j} - {\text{x}}_{j}^{{\prime }} } \right)^{2} } $$

(1)

where $ {\text{w}}_{j} $ is the weight corresponding to the jth feature.

2.3 C-means clustering

C-means [30] is one of the most reliable and widely used clustering algorithms. The procedure follows a simple and quick way to cluster a dataset to a certain number of clusters. The main idea is to define C centroids, one for each cluster as a prototype. Then, a binding is created between each data point and one of these C cluster centroids. First, the cluster centroids are set randomly [31]. This algorithm minimizes iteratively the total squared Euclidean distance between the data points in feature space and the cluster centroids. In each iteration, the centroids change their location with respect to the data points in each cluster. After changing the centers, the data points associated with each cluster will change based on the new centroids. These steps will be repeated until no more changes happen or centroids move too slightly (less than a threshold).

Let $ {\mathbf{X}} = \left\{ {{\mathbf{x}}_{1} ,{\mathbf{x}}_{2} , \ldots {\mathbf{x}}_{N} } \right\} $ be a set of data points and V = $ \left\{ {{\mathbf{v}}_{1} ,{\mathbf{v}}_{2} , \ldots {\mathbf{v}}_{C} } \right\} $ represent the C cluster centers. C-partitioning $ \left\{ {\theta_{1} \ldots \theta_{C} } \right\} $ is created by the Euclidean C-means algorithm with the aim to minimize the following objective function:

$$ J_{1} \left( {{\mathbf{V}},{\mathbf{W}}} \right) = \mathop \sum \limits_{k = 1}^{C} \mathop \sum \limits_{i = 1}^{{{\text{N}}_{\text{k}} }} \|{\mathbf{x}}_{i} - {\mathbf{v}}_{k}\|_{{\mathbf{w}}_{\text{k}}}^{2}$$

(2)

where $ {\mathbf{W}} = \left\{ {\varvec{w}_{1} ,{\mathbf{w}}_{2} , \ldots ,\varvec{w}_{\varvec{C}} } \right\} $ is the set of local feature weights in each cluster and $ \|{\mathbf{x}}_{i} - {\mathbf{v}}_{k}\|_{{\mathbf{w}}_{\text{k}}}^{2}$ is a weighted Euclidean distance measure between a data point $ {\mathbf{x}}_{i} $ and the cluster center $ {\mathbf{v}}_{k} $. In the kth cluster, $ {\text{N}}_{\text{k}} $ is the number of samples and $ {\mathbf{w}}_{k} = \left\{ {w_{kj} , 1 \le j \le n} \right\} $, where n is the number of features (dimensions).

2.4 Feature weighting

Feature weighting is a technique used to approximate the optimal degree of influence of individual features using a training set. When successfully applied, highly relevant features are attributed a high weight value, whereas irrelevant features are given a weight value close to zero [32]. Feature weighting can be used not only to improve the classification accuracy but also to discard features with weights below a certain threshold value and thereby increase the resource efficiency of the classifier [33, 34].

2.5 Feature extraction

Feature extraction has an undeniable role in the efficiency of the classification process within the image retrieval system. In the proposed method, we use low-level features such as color and texture. For texture features, the wavelet correlogram method [35] and, for color, the method based on a color correlogram with vector quantization [36] are used, because they have shown a good individual ability to classify images.

2.5.1 Wavelet correlogram

Wavelet correlogram is a CBIR system based on texture. It applies the color correlogram method [37] on quantized wavelet coefficients and creates texture features. As a result, this method gains from the multiscale–multiresolution characteristic of the wavelet transform and inherits the translation–rotation invariance property from the color correlogram.

According to the enhanced Gabor wavelet correlogram (EGWC) indexing algorithm, three different scales of wavelet transform are calculated.

In each scale, the non-maximum suppression block is aimed to reduce the data redundancy as it is explained in [35]. Then, the coefficients are discretized using three sets of quantization thresholds. Thanks to optimal selection of thresholds, improved retrieval performance is guaranteed. Finally, the text feature is calculated based on horizontal and vertical auto-correlogram of quantized coefficients.

2.5.2 Vector-quantized color correlogram

A vector-quantized color correlogram (VQCC) is a compound CBIR system designed to utilize both color correlogram and vector quantization methods [36]. The VQCC algorithm quantizes the RGB color of the input image using a vector quantizer. The resulting pixel vector represents a cluster of points in the three-dimensional RGB space. Then, for each pixel inside a cluster, based on its Euclidean distance from the cluster center, a binary code is specified. In other words, the pixel is labeled with the code in the codebook corresponding to the closest cluster center. Finally, an index vector is created by computing the auto-correlogram of these binary codes.

Image retrieval is performed using a comparison between indices. The codebook is constructed by the following three steps:

a.
Some images are selected randomly from the dataset. (The number of images is determined experimentally based on the total number of images and image categories.)
b.
C-means clustering extracts pixel clusters for this population.
c.
The centers of clusters make code vectors in the codebook for quantization.

3 Multi-label classification using feature weighting and C-means clustering (MLC-FWC)

The goal of the learning algorithm is to minimize the total error rate of the KNN classifier [32]. To achieve this goal, we use C-means clustering and local feature weighting in each cluster. The weights of features as parameters of the distance function are learnt by our proposed method that attempts to minimize the average error rate per cluster.

The schematic diagram of the proposed method is shown in Fig. 2. In the training phase, first, texture and color features of database images are extracted. Then, in an iterative process, a supervised C-means clustering, which is explained later, is run to divide the input space into proper subspaces, and then a KNN classifier tries to minimize the classification error on each cluster by means of local feature weighting. This procedure is executed till the feature weights converge. In the test phase, first the nearest cluster to the query is found based on its distance from the center of each cluster. The nearest cluster together with the weights (found in the training phase) represents a new input space within which the most similar images to the input query are retrieved.

3.1 Training phase

In the training phase of this algorithm, clustering and classification are combined to construct a model that benefits from the advantages of both methods. In other words, clusters are improved by taking the classifier feedback, and classification is improved by using clustering information. The feedback from classification is used for relocating misclassified samples into other clusters and finding a new partitioning of the data, so a new labeling is assigned accordingly. In other words, a quasi-supervised clustering is used in this framework to classify more accurately. In contrast, classification can be improved by the clustering information, since it divides the space into some related subspaces. This process continues till no sample is displaced between clusters. For this process, a method is also proposed to weight features locally based on minimizing the error of each cluster [32].

So in our framework, clustering and classification are incorporated in a single objective function in order to partition the input space into related subspaces. It makes classification and retrieval simpler and faster. Our purpose is to construct related clusters and minimize the error rate of the NN classifier within each cluster using leave-one-out (LOO) error rate by doing feature weighting. These weights can be represented as a weight matrix: $ {\mathbf{W}} = \left\{ {{\mathbf{w}}_{k} ,1 \le k \le C} \right\} $, where C is the number of clusters. It is noteworthy that this definition assigns distinct weights to the different features in each cluster.

MLC-FWC first uses an approximation of the NN classification error as introduced in [32] to define a classification objective function. Then, it tries to minimize the objective function such that the samples are assigned to their corresponding cluster.

In the training phase, first the data are randomly clustered. Then, the idea is to learn the weights of the features in each cluster such that, for each sample $ {\mathbf{x}}_{i} $, the nearest same-class sample seems closer, while the nearest different-class sample appears farther. This strategy reduces the weighted distance between same-class samples. The LOO NN classification error as a function of feature weights can be written as [32]:

$$ J_{2} \left( {\mathbf{W}} \right) = \frac{1}{N}\mathop \sum \limits_{k = 1}^{C} \mathop \sum \limits_{i = 1}^{{{\text{N}}_{\text{k}} }} \text{S}_{\beta } \left( {\frac{{d_{w} \left( {{\mathbf{x}}_{i} ,{\mathbf{x}}_{ = } } \right)}}{{d_{w} \left( {{\mathbf{x}}_{i} ,{\mathbf{x}}_{ \ne } } \right)}}} \right) $$

(3)

where $ N $ is the number of samples, $ {\mathbf{x}}_{ = } , {\mathbf{x}}_{ \ne } $ are the nearest same-class and different-class samples of $ {\mathbf{x}}_{i} $ in the kth cluster and $ d_{w} $ is the weighted Euclidean distance as defined in (2). As the framework performs on the multi-label images, the same-class sample $ {\mathbf{x}}_{ = } $ denotes one which has the largest number of common labels with $ {\mathbf{x}}_{i} $, and different-class sample $ {\mathbf{x}}_{ \ne } $ indicates one that has the fewest common labels with $ {\mathbf{x}}_{i} $. If there are more than one sample with this condition, the nearest and farthest ones are selected as the same-class and different-class samples, respectively. Let $ \text{S}_{\beta } $ be the sigmoid function with slope β, centered at z = 1:

$$ \text{S}_{\beta } \left( z \right) = \frac{1}{{1 + e^{{\beta \left( {1 - z} \right)}} }} $$

(4)

For large β, if $ {\mathbf{x}}_{i} $ is closer to some prototypes of its own class than to any other from a different class, then $ d_{w} \left( {{\mathbf{x}}_{i} ,{\mathbf{x}}_{ = } } \right) $ < $ d_{w} \left( {{\mathbf{x}}_{i} ,{\mathbf{x}}_{ \ne } } \right) $ and the argument of $ \text{S}_{\beta } $ will be smaller than 1 and $ \text{S}_{\beta } $ approaches 0. On the contrary, if $ {\mathbf{x}}_{i} $ is closer to some prototypes of a different class than to any other from its own class, the argument of $ \text{S}_{\beta } $ will be greater than 1 and $ \text{S}_{\beta } $ approaches 1. Accordingly, $ J_{2} \left( {\mathbf{W}} \right) $ is in fact the LOO NN estimate of the misclassification probability over the training set X.

In this paper, gradient descent optimization is employed, so it requires the function $ J_{2} \left( {\mathbf{W}} \right) $ to be differentiable with respect to the corresponding parameters $ {\mathbf{w}}_{k} $ and $ {\mathbf{v}}_{k} $ to be minimized, which is why using the sigmoid function is adopted here.

Now, we combine the two objective functions ($ J_{1} \left( {{\mathbf{V}},{\mathbf{W}}} \right) $ and $ J_{2} \left( {\mathbf{W}} \right) $) which both try to minimize the error, one for classification and the other for clustering. So the overall objective function is defined as:

$$ J\left( {{\mathbf{V}},{\mathbf{W}}} \right) = J_{1} \left( {{\mathbf{V}},{\mathbf{W}}} \right) + \gamma J_{2} \left( {\mathbf{W}} \right) $$

(5)

where γ is a weight which determines the trade-off between the two objective functions. Substituting $ J_{1} \left( {{\mathbf{V}},{\mathbf{W}}} \right) {\text{and}} J_{2} \left( {\mathbf{W}} \right) $ yields to:

$$ J\left( {{\mathbf{V}},{\mathbf{W}}} \right) = \left( {\mathop \sum \limits_{k = 1}^{C} \mathop \sum \limits_{i = 1}^{{{\text{N}}_{\text{k}} }} \| \varvec{x}_{i} - \varvec{v}_k\|_{{{\mathbf{w}}_{k} }}^{2} + \frac{1}{N}\mathop \sum \limits_{k = 1}^{C} \mathop \sum \limits_{i = 1}^{{{\text{N}}_{\text{k}} }} \text{S}_{\beta } \left( {\frac{{d_{w} \left( {\varvec{x}_{i} ,\varvec{x}_{ = } } \right)}}{{d_{w} \left( {\varvec{x}_{i} ,\varvec{x}_{ \ne } } \right)}}} \right)} \right) $$

(6)

In this equation, the first term specifies the clustering error rate, while the second term designates the classification error and $ \gamma $ is set to 1. Within each cluster, the weight of features corresponding to ($ {\mathbf{x}}_{ = } $) should be modified to make this sample appear closer to $ {\mathbf{x}}_{i} $ in a feature-dependent manner, while those corresponding to ($ {\mathbf{x}}_{ \ne } $) should be modified such that it appears farther.

Obviously, (6) is differentiable. To solve the objective function, a gradient descent procedure is proposed which guarantees convergence. The gradient descent evolution equation is:

$$ \theta^{t + 1} = \theta^{t} - \alpha \left( {\frac{\partial J\left( \theta \right)}{\partial \theta }} \right) $$

(7)

where θ^t and α denote the optimal parameters (here v and w) in the tth iteration and the step size, respectively. Before replacing J(θ) with the objective function in (6), we define:

$$ R\left( {{\mathbf{x}}_{i} } \right) = \left( {\frac{{d_{w} \left( {{\mathbf{x}}_{i} ,{\mathbf{x}}_{ = } } \right)}}{{d_{w} \left( {{\mathbf{x}}_{i} ,{\mathbf{x}}_{ \ne } } \right)}}} \right) $$

(8)

The partial derivative of $ J\left( {{\mathbf{V}},{\mathbf{W}}} \right)_{ } $ with respect to $ {\mathbf{w}}_{k} $ is calculated as:

$$ \frac{{\partial J\left( {\varvec{V},\varvec{W}} \right)}}{{ \partial \varvec{w}_{k} }} \cong \mathop \sum \limits_{i = 1}^{{{\text{N}}_{\text{k}} }}\| {\mathbf{x}}_{i} - {\mathbf{v}}_{k}\|^{2} + \frac{1}{N}\mathop \sum \limits_{i = 1}^{{{\text{N}}_{\text{k}} }} \text{S}_{\beta }^{{\prime }} \left( {R\left( {{\mathbf{x}}_{i} } \right)} \right)\frac{{\partial R\left( {{\mathbf{x}}_{i} } \right)}}{{\partial \varvec{w}_{k} }} $$

(9)

where the derivative of $ \text{S}_{\beta }^{{\prime }} \left( z \right) $ and $ \frac{{\partial R\left( {{\mathbf{x}}_{i} } \right)}}{{\partial {\mathbf{w}}_{k} }} $ is:

$$ \text{S}_{\beta }^{{\prime }} \left( z \right) = \frac{{dS_{\beta } \left( z \right)}}{dz} = \frac{{\beta e^{{\beta \left( {1 - z} \right)}} }}{{\left( {1 + e^{{\beta \left( {1 - z} \right)}} } \right)^{2} }} $$

(10)

$$ \frac{{\partial R\left( {\varvec{x}_{i} } \right)}}{{\partial \varvec{w}_{k} }} = \frac{1}{{2d_{w}^{2} \left( {\varvec{x}_{i} ,\varvec{x}_{ \ne } } \right)}}\left( {\frac{1}{{R\left( {\varvec{x}_{i} } \right)}}\|\varvec{x}_{i} - \varvec{x}_{ = }\|^{2} - R\left( {\varvec{x}_{i} } \right)\|\varvec{x}_{i} - \varvec{x}_{ \ne }\|^{2} } \right) $$

(11)

And the partial derivation of $ J\left( {{\mathbf{V}},{\mathbf{W}}} \right)_{ } $ with respect to $ {\mathbf{v}}_{k} $ is calculated as:

$$ \frac{{\partial J\left( {{\mathbf{V}},{\mathbf{W}}} \right)}}{{\partial {\mathbf{v}}_{k} }} \cong \mathop \sum \limits_{i = 1}^{{{\text{N}}_{\text{k}} }} - 2\varvec{w}_{k} \odot \left( {{\mathbf{x}}_{i} - {\mathbf{v}}_{k} } \right) $$

(12)

Therefore, we have two parameters $ {\mathbf{w}}_{k} $ and $ {\mathbf{v}}_{k} $ that should be optimized. We use the gradient descent algorithm to optimize these parameters. In this approach, the number of clusters starts from an initial value and then is learnt dynamically by the algorithm; we first update W in each cluster and then based on the modified W, the cluster centers will be updated. Samples are clustered again with the new weights and centers.

The gradient descent algorithm minimizes $ J\left( {{\mathbf{V}},{\mathbf{W}}} \right) $ via an iterative procedure. At each iteration, the weights $ {\mathbf{w}}_{k} $ and $ {\mathbf{v}}_{k} $ are updated in the opposite direction to $ \nabla J\left( {{\mathbf{V}},{\mathbf{W}}} \right) $ using the step sizes of $ \alpha $ and $ \delta $:

$$ \varvec{w}_{k}^{{\left( {t + 1} \right)}} = \varvec{w}_{k}^{\left( t \right)} - \alpha \frac{{\partial J\left( {{\mathbf{V}},{\mathbf{W}}} \right)}}{{\partial \varvec{w}_{k} }} $$

(13)

$$ {\mathbf{v}}_{k}^{{\left( {t + 1} \right)}} = {\mathbf{v}}_{k}^{\left( t \right)} - \delta \frac{{\partial J\left( {{\mathbf{V}},{\mathbf{W}}} \right)}}{{\partial {\mathbf{v}}_{k} }} $$

(14)

The values of $ \alpha $ and $ \delta $ are referred to as learning rates or learning step factors.

A pseudo-code of our training algorithm is presented in Fig. 3. It is noteworthy that all the parameters are initialized randomly. The maximum number of iterations is set empirically. The other parameter α, $ \delta $ and β are selected experimentally for each dataset.

The computational complexity of the proposed algorithm as we can see in the pseudo-code of MLC-FWC algorithm in Fig. 3 is $ O\left( {I \times {\text{K}} \times \left( {N_{K} + {\text{N}}} \right)} \right) $, where I is the number of iterations, K is the number of clusters, N_K is the number of samples in each cluster and N is the number of all training samples. Based on the experiments, the algorithm converges within less than 10 iterations, which is not a large number, K is a constant much smaller than the number of samples, and therefore negligible. N_K is the number of samples in each cluster, which in the worst case is equal to N (for only one cluster case). Therefore, the overall complexity is O(N).

3.2 Test phase

In the test phase, for each query, its distance to all cluster centers is calculated based on local feature weights in each cluster. The nearest one is considered as the cluster corresponding to the query image. In other words, we find the proper cluster k which minimizes $ \sqrt {\mathop \sum \nolimits_{j = 1}^{n} {\text{w}}_{kj} \left( {x_{ij} - v_{kj} } \right)^{2} } $ among all clusters (k = 1,…,C). Based on the K value, the common labels among KNN will be assigned to the query as labels. If K = 1, all labels of NN will be considered as the query label or if K = 5, the labels that are repeated two times or more between all 5NN will be assigned to the query image.

Then, M similar images within that cluster with the same labels are assigned to the query and are retrieved as the similar images in the retrieval framework based on their distance to the query image.

3.2.1 Experimental results

The proposed method was evaluated on the Corel1000 database of images. The database consists of 1000 images from 10 categories in which each category has 100 images. The categories are African people, beaches, buildings, buses, dinosaurs, elephants, roses, horses, mountains and food. All these categories were used for the experiments. This dataset is most often used for CBIR systems as a single-label dataset. Since our algorithm is introduced in a multi-label classification framework, we labeled manually this dataset with 1 to maximum 5 labels based on the related semantics of the image. We did the same for the Corel5000 dataset [35]. We also evaluated our proposed method using the Scene multi-label dataset [38] which is a benchmark for multi-label image classification containing 2407 natural scene images. Their characteristics are summarized in Table 1.

Table 1 Characteristics of data sets

Full size table

The performance of the proposed algorithm can be evaluated within two frameworks: first, based on criteria which are used for evaluating multi-label classification algorithms; second, by criteria that are used for evaluation of image retrieval systems. In the following, various criteria of both approaches are discussed.

3.3 Evaluation criteria

The performance of any retrieval system can be measured in terms of its precision and recall. Precision measures the ability of the system to retrieve only models that are relevant, while recall measures the ability of the system to retrieve all models that are relevant. In this paper, we use the criteria for a multi-label framework as defined in [23] and for image retrieval as defined in [35].

3.3.1 Multi-label classification framework

Let D = $ \left\{ {x_{1} ,x_{2} , \ldots x_{N} } \right\} $ be a multi-label evaluation dataset containing N labeled examples. Let $ {\hat{\text{Y}}}_{i} = {\text{H}}\left( {{\text{Q}}_{\text{i}} } \right) $ be the predicted label set for the pattern Q_i as query image, while Y_i is the ground truth label set for x_i. A first metric called the averaged accuracy gives an average degree of similarity between the predicted and the ground truth label sets of all test examples:

$$ {\text{AA}}\left( {{\text{H}}, {\text{D}}} \right) = \frac{1}{\text{N}}\mathop \sum \limits_{{{\text{i}} = 1}}^{N} \frac{{\left| {{\text{Y}}_{\text{i}} \cap {\hat{\text{Y}}}_{i} } \right|}}{{\left| {{\text{Y}}_{\text{i}} \cup {\hat{\text{Y}}}_{i} } \right|}} $$

(15)

where |S| represents the cardinality of the set S. Two other metrics called average precision and average recall are also used in the literature [39] to evaluate a multi-label learning system. The former computes the proportion of correct positive predictions, while the latter calculates the proportion of true labels that have been predicted as positives:

$$ {\text{AP}}\left( {{\text{H}}, {\text{D}}} \right) = \frac{1}{\text{N}}\mathop \sum \limits_{{{\text{i}} = 1}}^{N} \frac{{\left| {{\text{Y}}_{\text{i}} \cap {\hat{\text{Y}}}_{i} } \right|}}{{\left| {{\hat{\text{Y}}}_{i} } \right|}} $$

(16)

$$ {\text{AR}}\left( {{\text{H}}, {\text{D}}} \right) = \frac{1}{\text{N}}\mathop \sum \limits_{{{\text{i}} = 1}}^{\text{N}} \frac{{\left| {{\text{Y}}_{\text{i}} \cap {\hat{\text{Y}}}_{i} } \right|}}{{\left| {{\text{Y}}_{\text{i}} } \right|}} $$

(17)

Another evaluation criterion is the F1 measure that is defined as the harmonic mean of the AP and AR metrics [40]:

$$ {\text{F}}1 = \frac{2}{{\frac{1}{\text{AP}} + \frac{1}{\text{AR}}}} $$

(18)

The values of these evaluation criteria are in the interval [0, 1]. Larger values of these metrics correspond to higher classification quality.

3.3.2 Image retrieval framework

Let $ {\text{Y}}\left( {{\text{Q}}_{\text{i}} } \right) $ be the set of retrieved images which are matched to the query image $ {\text{Q}}_{\text{i}} $:

$$ {\text{Y}}\left( {{\text{Q}}_{\text{i}} } \right) = \left\{ {{\text{I}}_{\text{i}} | {\text{rank}}\left( {{\text{I}}_{\text{i}} } \right) \le {\text{M}}, {\text{I}}_{\text{i}} \in {\text{A}}} \right\} $$

(19)

where $ {\text{rank}}\left( {{\text{I}}_{\text{i}} } \right) $ is the rank of $ {\text{I}}_{\text{i}} $ among M retrieved images and A is the subset of the image database having the same cluster as the query image. Retrieval precision was used for evaluation of the image retrieval. Precision is defined as the ratio of the number of retrieved relevant images to the total number of retrieved images:

$$ {\text{P}}\left( {{\text{Q}}_{\text{i}} } \right) = \frac{{\left| {{\text{Y}}\left( {{\text{Q}}_{\text{i}} } \right)} \right|}}{\text{M}} $$

(20)

where |Y($ {\text{Q}}_{\text{i}} $)| represents the size of Y($ {\text{Q}}_{\text{i}} $). The average precision per cluster is defined as:

$$ {\text{P}}\left( {{\text{cluster}}_{k} } \right) = \mathop \sum \limits_{{{\text{i}} = 1}}^{{N_{k} }} {\text{P}}\left( {{\text{Q}}_{\text{i}} } \right)/N_{k} $$

(21)

where N_k is the number of images in the kth cluster. Finally, the average precision is given by:

$$ {\text{P}}_{\text{avg}} = \mathop \sum \limits_{{{\text{k}} = 1}}^{\text{C}} {\text{P}}\left( {{\text{cluster}}_{k} } \right)/{\text{C }} $$

(22)

where C is the number of clusters.

4 Results and discussion

There are several approaches to improve retrieval effectivity by feature weighting, and also plenty of methods which address the multi-label classification problem. However, we could not find any algorithm which uses feature weighting in a multi-label classification and retrieval framework. Consequently, we decided to evaluate our algorithm in the four following ways. First, we compare our results before and after the MLC-FWC algorithm within the multi-label classification framework. Second, we compare the MLC-FWC method to methods in [41–47] which use feature weighting (FW) or feature fusion (FF) algorithms to combine features in a single-label image retrieval framework. Third, we compare our algorithm within a multi-label classification framework with the algorithms in [24, 29] which perform multi-label classification without feature weighting on the Scene multi-label dataset. Finally, the retrieval accuracy of the algorithm is evaluated by comparing it to two CBIR methods [35, 36] on the Corel1000 dataset within a multi-label image retrieval framework. For these results, all images are selected as query from the same dataset. In particular, a retrieved image is considered a match if and only if it is in the same category as the query in a single-label framework.

Tables 2 and 3 present the average values of accuracy, precision, recall and F1 measure for various groups of images in the Corel1000 and Corel5000 databases. Here, we used VQCC [36] as color feature and EGWC [35] as texture feature. Tables 2 and 3 demonstrate that the proposed MLC-FWC method provides better results when the EGWC and VQCC features are fused.

Table 2 Average values of accuracy, precision, recall and F1 measure resulting from applying MLC-FWC to Corel1000 database with/without feature combining by FW

Full size table

Table 3 Average values of accuracy, precision, recall and F1 measure resulting from applying MLC-FWC to Corel5000 database with/without feature combining by FW

Full size table

To evaluate our proposed method in comparison with other algorithms, the average precision results of the MLC-FWC algorithm and various state-of-the-art methods are shown in Table 4. In these algorithms [41,39,40,41,42,43,44,48], features are weighted or fused in a single-label image retrieval framework to retrieve more similar images and improve their result by feature weighting. All of these algorithms evaluated their methods on the Corel1000 dataset.

Table 4 Average precision of some new feature combination (FW and FF) methods in comparison with MLC-FWC with k = 3 on Corel1000 dataset within single-label image retrieval framework

Full size table

In Table 5, we also present the average precision and F1 measure results of our algorithm in comparison with some new multi-label classification algorithms [24, 29] on the Scene multi- label dataset with its original features within a multi-label classification framework. All the results are reported directly from their corresponding papers.

Table 5 Average precision of some new multi-label algorithms in comparison with MLC-FWC with k = 3 on scene dataset and within multi-label classification framework

Full size table

In Table 6, we evaluate the performance of the MLC-FWC algorithm in the retrieval framework. In the current implementation of MLC-FWC, the dataset images are indexed separately with the EGWC [35] and VQCC [36] algorithms. Therefore, we compared the results of the MLC-FWC with retrieval results of these two individual algorithms in terms of average precision (Table 6).

Table 6 Average precision for each category of EGWC and VQCC in comparison with MLC-FWC on Corel1000 dataset within multi-label image retrieval framework

Full size table

5 Conclusion

In this paper, we present a novel approach for image retrieval in a multi-label framework. The idea of this approach is to make a framework using clustering and classification together to reveal the structure of data and divide the input space into meaningful parts. Also, we fuse features by feature weighting learning algorithm to improve the retrieval results by purposefully using the ability of each feature to separate data.

Compared to the majority of papers that use deep learning architectures to perform CBIR [4,5,6,7,8,9,10], the proposed method simplifies the retrieval process and decreases the problem of “semantic gap,” by finding the labels of the query in the multi-label framework before doing retrieval, and also improves the performance through feature weighting. The main advantages of using a deep neural network are providing high accuracy and reducing the efforts in handcrafted feature design. The deep neural network automatically enables feature learning from raw data [10]. Nonetheless, it requires a huge computing power to train, so it is a costly and time-consuming process. It also suffers from difficulty in configuration and has not interpretable results. So if the features are ready to use with a satisfactory performance, it is not valuable to use that as a training model.

Experimental results on the Corel1000 and Corel5000 datasets demonstrate a significant supremacy of the proposed CBIR system. The results are evaluated by four statistical measures, all of which confirm the significant improvement in the proposed framework.

References

Irawan C, Listyaningsih W, Sari CA, Rachmawanto EH (2018) CBIR for herbs root using color histogram and GLCM based on K-nearest neighbor. In: 2018 International seminar on application for technology of information and communication. IEEE, pp 509–514
Singh S, Rajput ER (2015) Content based image retrieval using SVM, NN and KNN classification. Int J Adv Res Comput Commun Eng 4(6):549–552
Dharani T, Aroquiaraj IL (2013) Content based image retrieval system using feature classification with modified KNN algorithm. arXiv preprint arXiv:1307.4717
Zhou W, Li H, Tian Q (2017) Recent advance in content-based image retrieval: a literature survey. arXiv preprint arXiv:1706.06064
Tzelepi M, Tefas A (2018) Deep convolutional learning for content based image retrieval. Neurocomputing 275:2467–2478
Article Google Scholar
Wan J, Wang D, Hoi SCH, Wu P, Zhu J, Zhang Y, Li J (2014). Deep learning for content-based image retrieval: a comprehensive study. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 157–166
Yu J, Yang X, Gao F, Tao D (2016) Deep multimodal distance metric learning using click constraints for image ranking. IEEE Trans Cybern 47(12):4014–4024
Article Google Scholar
Yu J, Tao D, Wang M, Rui Y (2014) Learning to rank using user clicks and visual features for image retrieval. IEEE Trans Cybern 45(4):767–779
Article Google Scholar
Qayyum A, Anwar SM, Awais M, Majid M (2017) Medical image retrieval using deep convolutional neural network. Neurocomputing 266:8–20
Article Google Scholar
Sadeghi-Tehran P, Angelov P, Virlet N, Hawkesford MJ (2019) Scalable database indexing and fast image retrieval based on deep learning and hierarchically nested structure applied to remote sensing and plant biology. J. Imaging 5(3):33
Article Google Scholar
Sarwar A, Mehmood Z, Saba T, Qazi KA, Adnan A, Jamal H (2019) A novel method for content-based image retrieval to improve the effectiveness of the bag-of-words model using a support vector machine. J Inf Sci 45(1):117–135
Article Google Scholar
Tsai CF (2012) Bag-of-words representation in image annotation: a review. ISRN Artif Intell 2:1–19
Xu D, Yan S, Tao D, Lin S, Zhang HJ (2007) Marginal fisher analysis and its variants for human gait recognition and content-based image retrieval. IEEE Trans Image Process 16(11):2811–2821
Article MathSciNet Google Scholar
Da Silva SF, Avalhais LP, Batista MA, Barcelos CA, Traina AJ (2014) Findings on ranking evaluation functions for feature weighting in image retrieval. J Braz Comput Soc 20(1):7
Article MathSciNet Google Scholar
Chathurani NWUD, Geva S, Chandran V, Rajapaksha P (2016) Image retrieval based on multi-feature fusion for heterogeneous image databases. Int J Comput Inf Eng 10(10):1797–1802
Google Scholar
Cordeiro De Amorim R, Mirkin B (2012) Minkowski metric, feature weighting and anomalous cluster initialisation in K-means clustering. Pattern Recognit 45(3):1061–1075
Article Google Scholar
Modha DS, Spangler WS (2003) Feature weighting in k-means clustering. Mach Learn 52(3):217–237
Article Google Scholar
Saha A, Das S (2015) Automated feature weighting in clustering with separable distances and inner product induced norms: a theoretical generalization. Pattern Recognit Lett 63:50–58
Article Google Scholar
Chen X, Ye Y, Xu X, Huang JZ (2012) A feature group weighting method for subspace clustering of high-dimensional data. Pattern Recognit 45(1):434–446
Article Google Scholar
Magesan E, Gambetta JM, Córcoles AD, Chow JM (2015) Machine learning for discriminating quantum measurement trajectories and improving readout. Phys Rev Lett 114(20):200501
Article Google Scholar
Ghodratnama S, Boostani R (2015) An efficient strategy to handle complex datasets having multimodal distribution. In: ISCS 2014: interdisciplinary symposium on complex systems. Springer, Cham, pp 153–163
Tsoumakas G, Katakis I, Vlahavas I (2010) Random k-labelsets for multilabel classification. IEEE Trans Knowl Data Eng 23(7):1079–1089
Article Google Scholar
Younes Z, Abdallah F, Denœux T (2009) An evidence-theoretic k-nearest neighbor rule for multi-label classification. In: International conference on scalable uncertainty management. Springer, Berlin, pp 297–308
Yu Y, Pedrycz W, Miao D (2014) Multi-label classification by exploiting label correlations. Expert Syst Appl 41(6):2989–3004
Article Google Scholar
Jiang JY, Tsai SC, Lee SJ (2012) FSKNN: multi-label text categorization based on fuzzy similarity and k nearest neighbors. Expert Syst Appl 39(3):2813–2821
Article Google Scholar
Vens C, Struyf J, Schietgat L, Džeroski S, Blockeel H (2008) Decision trees for hierarchical multi-label classification. Mach Learn 73(2):185
Article Google Scholar
Wang M, Zhou X, Chua TS (2008) Automatic image annotation via local multi-label classification. In: Proceedings of the 2008 international conference on Content-based image and video retrieval. ACM, pp 17–26
Lin Z, Ding G, Hu M, Wang J (2014) Multi-label classification via feature-aware implicit label space encoding. In: International conference on machine learning, pp 325–333
Zhou ZH, Zhang ML, Huang SJ, Li YF (2012) Multi-instance multi-label learning. Artif Intell 176(1):2291–2320
Article MathSciNet Google Scholar
Duda RO, Hart PE, Stork DG (1973) Pattern classification and scene analysis, vol 3. Wiley, New York
MATH Google Scholar
Celebi ME, Kingravi HA, Vela PA (2013) A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst Appl 40(1):200–210
Article Google Scholar
Paredes R, Vidal E (2006) Learning weighted metrics to minimize nearest-neighbor classification error. IEEE Trans Pattern Anal Mach Intell 7:1100–1110
Article Google Scholar
Sharma A, Dey S (2012) A comparative study of feature selection and machine learning techniques for sentiment analysis. In: Proceedings of the 2012 ACM research in applied computation symposium. ACM, pp 1–7
Bouaguel W, Mufti GB, Limam M (2013) A fusion approach based on wrapper and filter feature selection methods using majority vote and feature weighting. In 2013 International conference on computer applications technology (ICCAT). IEEE, pp 1–6
Moghaddam HA, Dehaji MN (2013) Enhanced Gabor wavelet correlogram feature for image indexing and retrieval. Pattern Anal Appl 16(2):163–177
Article MathSciNet Google Scholar
Shad SM (2011). Color image indexing and retrieval using wavelet correlogram. M.Sc. thesis in artificial intelligence and robotics, faculty of computer engineering, K.N. Toosi University of Technology, Tehran, Iran
Huang J, Ravi Kumar S, Mitra M, Zhu WJ, Zabih R (1997) Image indexing using color correlograms. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 1:762–768
Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recognit 37(9):1757–1771
Article Google Scholar
Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Warehous Min 3(3):1–13
Article Google Scholar
Yang Y (1999) An evaluation of statistical approaches to text categorization. Inf Retrieval 1(1–2):69–90
Article Google Scholar
Guldogan E, Gabbouj M (2008) Feature selection for content-based image retrieval. SIViP 2(3):241–250
Article Google Scholar
Huang ZC, Chan PP, Ng WW, Yeung DS (2010) Content-based image retrieval using color moment and Gabor texture feature. In: 2010 International conference on machine learning and cybernetics, vol 2. IEEE, pp 719–724
Puviarasan N, Bhavani R, Vasanthi A (2014) Image retrieval using combination of texture and shape features. Int J Adv Res Comput Commun Eng 3(3)
Singha M, Hemachandran K (2012) Content based image retrieval using color and texture. Signal Image Process 3(1):39–57
Google Scholar
Lin CH, Chen RT, Chan YK (2009) A smart content-based image retrieval system based on color and texture feature. Image Vis Comput 27(6):658–665
Article Google Scholar
Raghupathi G, Anand RS, Dewal ML (2010) Color and texture features for content based image retrieval. In: Second international conference on multimedia and content based image retrieval
Hiremath PS, Pujari J (2007) Content based image retrieval based on color, texture and shape features using image and its complement. Int J Comput Sci Secur 1(4):25–35
Google Scholar
Rao MB, Rao BP, Govardhan A (2011) CTDCIRS: content based image retrieval system based on dominant color and texture features. Int J Comput Appl 18(6):40–46
Google Scholar
Li L, Feng L, Yu L, Wu J, Liu S (2016) Fusion framework for color image retrieval based on bag-of-words model and color local Haar binary patterns. J Electron Imaging 25(2):023022
Article Google Scholar
Ali N, Mazhar DA, Iqbal Z, Ashraf R, Ahmed J, Khan FZ (2017) Content-based image retrieval based on late fusion of binary and local descriptors. arXiv preprint arXiv:1703.08492
Ali N, Bajwa KB, Sablatnig R, Chatzichristofis SA, Iqbal Z, Rashid M, Habib HA (2016) A novel image retrieval based on visual words integration of SIFT and SURF. PLoS ONE 11(6):e0157428
Article Google Scholar
Montazer GA, Giveki D (2015) An improved radial basis function neural network for object image retrieval. Neurocomputing 168:221–233
Article Google Scholar
Tian X, Jiao L, Liu X, Zhang X (2014) Feature integration of EODH and Color-SIFT: application to image retrieval based on codebook. Sig Process Image Commun 29(4):530–545
Article Google Scholar
Moghaddam HA, Ghodratnama S (2017) Toward semantic content-based image retrieval using Dempster-Shafer theory in multi-label classification framework. Int J Multimedia Inf Retr 6(4):317–326
Article Google Scholar
Wu B, Lyu S, Ghanem B (2016) Constrained submodular minimization for missing labels and class imbalance in multi-label learning. In: Thirtieth AAAI conference on artificial intelligence

Download references

Author information

Authors and Affiliations

Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
Samaneh Ghodratnama
Faculty of Electrical and Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
Hamid Abrishami Moghaddam

Authors

Samaneh Ghodratnama
View author publications
You can also search for this author in PubMed Google Scholar
Hamid Abrishami Moghaddam
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Hamid Abrishami Moghaddam.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ghodratnama, S., Abrishami Moghaddam, H. Content-based image retrieval using feature weighting and C-means clustering in a multi-label classification framework. Pattern Anal Applic 24, 1–10 (2021). https://doi.org/10.1007/s10044-020-00887-4

Download citation

Received: 27 January 2018
Accepted: 08 June 2020
Published: 18 June 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s10044-020-00887-4

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Content-based image retrieval using feature weighting and C-means clustering in a multi-label classification framework

Abstract

Similar content being viewed by others

Toward semantic content-based image retrieval using Dempster–Shafer theory in multi-label classification framework

An Improvement Method of Kd-Tree Using k-Means and k-NN for Semantic-Based Image Retrieval System

Semi-supervised image clustering with multi-modal information

1 Introduction