1 Introduction

In recent years, with the explosive growth in the volume of digital images, content-based image retrieval (CBIR) has become popular, and many systems have been developed over the decades, including QBIC, Photobook, MARS, PicHunter and others [1, 2]. A CBIR system is often not perceptive enough to the user's intention to return satisfactory results, mostly owing to the large gap between high-level semantic concepts and low-level image features. To address this, relevance feedback methods were proposed [3], and the technique has been widely applied in various content-based image retrieval systems [4, 5].

In this paper, we propose a relevance feedback approach based on the Extreme Learning Machine (ELM) with a Gaussian kernel. First, we extract the features of the labeled data and use them to train the component ELM classifiers. Second, we predict labels for the whole database following the tri-training method and vote on the result. Third, the newly labeled data are used to retrain the component classifiers and obtain the final learner group.

Cox et al. [6] proposed an interactive relevance feedback scheme based on a Bayesian model that optimizes the features' probability distribution. Rui et al. [3] proposed a scheme that optimizes relevance feedback performance by analyzing the features of positive samples. Zhou et al. [7] combined a semi-supervised method with active learning by labeling the uncertain samples, which puts the unlabeled data to use.

The rest of this paper is organized as follows. Section 2 describes the proposed method; Sect. 2.1 describes our kernel-ELM-based relevance feedback system. Section 3 describes the experiments and the performance of our scheme. Section 4 concludes this paper.

2 Method

2.1 Extreme Learning Machine with Gaussian Kernel

The extreme learning machine (ELM) was first proposed by Huang et al. [8]. ELM works for generalized single-hidden-layer feedforward networks (SLFNs) [9, 10] and tends to provide better generalization performance at an extremely fast learning speed. The structure of an SLFN [10] is shown in Fig. 1.

Fig. 1. Single-hidden-layer feedforward networks

With L nodes in the hidden layer, the output function of an SLFN can be expressed as:

$$ f_{L}(\mathbf{x}) = \sum\limits_{i=1}^{L} \boldsymbol{\beta}_{i} g_{i}(\mathbf{x}) = \sum\limits_{i=1}^{L} \boldsymbol{\beta}_{i} G(\mathbf{a}_{i}, b_{i}, \mathbf{x}). $$
(1)

The function can also be written as \( f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta} \), where \( \boldsymbol{\beta} = [\boldsymbol{\beta}_{1}, \boldsymbol{\beta}_{2}, \ldots, \boldsymbol{\beta}_{L}] \) is the vector of output weights between the hidden layer of L neurons and the output neuron, and \( \mathbf{h}(\mathbf{x}) = [h_{1}(\mathbf{x}), h_{2}(\mathbf{x}), \ldots, h_{L}(\mathbf{x})] \) is the output vector of the hidden layer with respect to the input \( \mathbf{x} \), which maps the data from the input space to the ELM feature space [11].

When L equals N, the system can be solved exactly; however, this is a rare condition, because in actual problems L is far smaller than N, so there is an error between the output value and the actual value. The key step is therefore to find the least-squares solution \( \hat{\boldsymbol{\beta}} \) of the linear system:

$$ \mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \quad \hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}. $$
(2)

where \( \mathbf{H}^{\dagger} \) is the Moore–Penrose generalized inverse of the matrix \( \mathbf{H} \) [12, 13]; \( \mathbf{H}^{\dagger} = (\mathbf{H}'\mathbf{H})^{-1}\mathbf{H}' \) or \( \mathbf{H}'(\mathbf{H}\mathbf{H}')^{-1} \), depending on whether \( \mathbf{H}'\mathbf{H} \) or \( \mathbf{H}\mathbf{H}' \) is nonsingular.

In the more recently developed kernel ELM, introducing a positive coefficient into the learning system makes it more stable. If \( \mathbf{H}\mathbf{H}' \) is nonsingular, the coefficient \( 1/\lambda \) is added to its diagonal in the calculation of the output weights \( \boldsymbol{\beta} \), giving \( \boldsymbol{\beta} = \mathbf{H}'(\mathbf{I}/\lambda + \mathbf{H}\mathbf{H}')^{-1}\mathbf{T} \). The corresponding function of the regularized ELM is:

$$ f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta} = \mathbf{h}(\mathbf{x})\mathbf{H}'\left( \frac{1}{\lambda}\mathbf{I} + \mathbf{H}\mathbf{H}' \right)^{-1}\mathbf{T}. $$
(3)

Huang et al. [11] showed that ELM can be defined with a kernel matrix as follows. Let \( \boldsymbol{\Omega}_{\text{ELM}} = \mathbf{H}\mathbf{H}' \), with \( \boldsymbol{\Omega}_{\text{ELM}_{i,j}} = \mathbf{h}(\mathbf{x}_{i}) \cdot \mathbf{h}(\mathbf{x}_{j}) = K(\mathbf{x}_{i}, \mathbf{x}_{j}) \). The output function can then be written as:

$$ f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\mathbf{H}'\left( \frac{1}{\lambda}\mathbf{I} + \mathbf{H}\mathbf{H}' \right)^{-1}\mathbf{T} = \left[ \begin{array}{c} K(\mathbf{x}, \mathbf{x}_{1}) \\ \vdots \\ K(\mathbf{x}, \mathbf{x}_{N}) \end{array} \right]' \left( \frac{1}{\lambda}\mathbf{I} + \boldsymbol{\Omega}_{\text{ELM}} \right)^{-1}\mathbf{T}. $$
(4)

The hidden-layer feature mapping \( \mathbf{h}(\mathbf{x}) \) need not be known; instead, its corresponding kernel \( K(\mathbf{u}, \mathbf{v}) \) can be computed. Here the Gaussian kernel is used: \( K(\mathbf{u}, \mathbf{v}) = \exp(-\gamma \|\mathbf{u} - \mathbf{v}\|^{2}) \) [14].
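For concreteness, the following is a minimal NumPy sketch of training and prediction with the Gaussian-kernel ELM of Eq. (4); the function names and the hyperparameters `lam` (\( \lambda \)) and `gamma` (\( \gamma \)) are our own illustrative choices, not identifiers from the original system.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    # K(u, v) = exp(-gamma * ||u - v||^2), evaluated for all row pairs of A and B
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def kernel_elm_train(X, T, lam, gamma):
    # Omega_ELM = H H' is replaced by the kernel matrix (Eq. 4)
    omega = gaussian_kernel(X, X, gamma)
    n = X.shape[0]
    # Solve (I/lambda + Omega) alpha = T, so that f(x) = K(x, X) alpha
    return np.linalg.solve(np.eye(n) / lam + omega, T)

def kernel_elm_predict(X_new, X_train, alpha, gamma):
    return gaussian_kernel(X_new, X_train, gamma) @ alpha
```

For binary relevance feedback, T would hold the 0/1 (or ±1) labels of the training images, and thresholding the prediction gives the class.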

2.2 Algorithm Description

Because of the special requirements of relevance feedback, short processing time and high accuracy are the key points. Our method shows promising results in both respects.

  1. Step 1:

    Label the data.

    In the proposed image retrieval system, let L denote the labeled data set and U denote the unlabeled data set, with \( L = P \cup N \), where P denotes the labeled positive samples and N denotes the labeled negative samples.

  2. Step 2:

    Feature extraction and train the component classifiers.

    Inspired by the co-training paradigm [15], the labeled data are used as the first training set, and three different features are extracted to train the component classifiers, which reduces the variance of unstable procedures and thereby improves prediction [16].

  3. Step 3:

    Vote.

    The whole database, denoted U, is put into the combined classifier group. Figure 2 shows the framework of the voting procedure. Each component classifier gives its predicted label for the \( i \)th sample of the data set U, and the output is set as \( Y_{in} \). All the samples are fed into the m sub-ELMs. Finally, a class label set \( \Psi \) is obtained.

    Fig. 2. The framework of voting

  4. Step 4:

    Label the unlabeled data automatically.

    Let \( L\_temp \) denote the whole database with labels. \( L\_temp \) is used to retrain the component classifiers in order to improve their classification ability. As Fig. 3 shows, the data set U is classified by the updated classifier group.

    Fig. 3. The framework of labeling the unlabeled data

For the \( i \)th sample, \( P_{i} \) is the number of component classifiers that predict the class of the \( i \)th sample to be 0, and \( Q_{i} \) is the number that predict it to be 1; the sum of \( P_{i} \) and \( Q_{i} \) is 3. The vote function is shown below:

$$ \text{vote}(\Psi) = \left[ \begin{array}{c} Y_{1} \\ \vdots \\ Y_{M} \end{array} \right]_{M \times 1}, \quad Y_{i} = \left\lfloor \frac{Q_{i}}{P_{i}} \right\rfloor $$
(5)
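With three component classifiers, we read Eq. (5) as a majority vote: \( Y_{i} = 0 \) when the 0-votes dominate and 1 otherwise. A minimal sketch under that reading, with hypothetical array names:

```python
import numpy as np

def vote(predictions):
    # predictions: (3, M) array of 0/1 labels, one row per component classifier
    Q = predictions.sum(axis=0)       # votes for class 1 per sample
    P = predictions.shape[0] - Q      # votes for class 0 per sample
    # Majority decision over the three component classifiers
    return (Q > P).astype(int)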

  5. Step 5:

    Update the classifiers.

    As a result, let \( P^{*} \) denote the positive data set and \( N^{*} \) denote the negative data set. At each iteration of the feedback, the whole result list is returned in ascending order of the similarity rank.
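Putting Steps 2–5 together, one feedback iteration might look like the following schematic sketch, reusing the hypothetical helpers `kernel_elm_train`, `kernel_elm_predict`, and `vote` defined above; the 0.5 decision threshold and all names are our own assumptions.

```python
import numpy as np

def feedback_iteration(features, labeled_idx, labels, lam, gamma):
    # features: list of three (n_images, d_k) arrays, one per feature type
    # labels:   (n_labeled, 1) array of 0/1 labels for the samples in labeled_idx
    # Step 2: train one component kernel ELM per feature on the labeled data
    models = [kernel_elm_train(F[labeled_idx], labels, lam, gamma) for F in features]
    # Step 3: each component classifier predicts the whole database U, then vote
    preds = np.stack([
        (kernel_elm_predict(F, F[labeled_idx], a, gamma) > 0.5).ravel().astype(int)
        for F, a in zip(features, models)
    ])
    pseudo = vote(preds)              # Step 4: automatic labels, i.e. L_temp
    # Step 5: retrain the component classifiers on the newly labeled database
    models = [kernel_elm_train(F, pseudo[:, None].astype(float), lam, gamma)
              for F in features]
    return models, pseudo
```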

3 Experiments

In this section, we first introduce the image database on which our experiments are performed. Second, we describe the feature extraction methods. We then illustrate the structure of our relevance feedback system. Finally, we compare the performance of our methods with other methods in terms of average-AP values and processing time.

3.1 Image Database

We perform our experiments on the COREL photo gallery, which contains 1,000 images in 10 categories and 6,000 images in 60 categories. Every image in the database has a size of 256 × 384 or 384 × 256. A ground truth is needed to evaluate the performance of our experiments, so the natural categories of the COREL photo gallery are used as the semantic categories, and we define images belonging to the same category as relevant and all others as irrelevant.

3.2 Feature Selection

The image retrieval system with relevance feedback uses three image features: SIFT, color, and LBP.

Based on former work in our laboratory, a Bag-of-Features model is used to extract two image features, based on SIFT and color, respectively. The color information is the most informative feature because of its robustness with respect to scaling, rotation, perspective, and occlusion.

Local Binary Pattern (LBP) is an effective image descriptor and is used as the texture feature of the image.
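As an illustration, a texture descriptor of this kind could be computed with scikit-image as follows; this is a minimal sketch, and the uniform method with P = 8 neighbors and radius R = 1 is our assumption, not a parameter reported for the system.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_image, P=8, R=1.0):
    # Uniform LBP codes, then a normalized histogram as the texture descriptor
    codes = local_binary_pattern(gray_image, P, R, method="uniform")
    n_bins = P + 2  # the uniform method yields P + 2 distinct code values
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()
```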

3.3 Image Retrieval System

For relevance feedback, the image retrieval system is required to return the images that are most semantically relevant. As Fig. 4 shows, our system has three main parts, of which the feedback part is the key function.

Fig. 4. The framework of the image retrieval system with relevance feedback

  • Retrieval part: Extract the three features mentioned above from every image in the image database, compute the Euclidean distance between the query and each database image as the similarity, and return the top 20 images as the retrieval result (see the sketch after this list).

  • Feedback part: Let the user choose and label the negative images and extract the image features; the displayed images left unlabeled are set as positive. Train a classifier group for each kind of feature to label the unlabeled database automatically, then update the classifier groups with the newly labeled data; the resulting final classifier is used to classify the unlabeled database.

  • Display part: Display the retrieval result for the retrieval operation, and display the feedback result of each iteration for the relevance feedback operation.
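The retrieval part reduces to a nearest-neighbor search; a minimal sketch, where `query_vec` and `database_feats` are hypothetical names:

```python
import numpy as np

def retrieve(query_vec, database_feats, top_k=20):
    # Euclidean distance between the query and every image in the database
    d = np.linalg.norm(database_feats - query_vec, axis=1)
    return np.argsort(d)[:top_k]  # indices of the top-20 most similar images
```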

3.4 Experiment Results

Before the relevance feedback procedure, the top 20 images are displayed on the result panel. The user interface is shown in Fig. 5. The image at the top left is the query image, and the other images on the 'result panel' are the retrieval result. The user labels the 'negative' images, and the remaining images are labeled 'positive' automatically.

Fig. 5. The user interface and the retrieval result

As Fig. 5 shows, only 4 images of the initial retrieval result meet our request. After the first iteration of relevance feedback, the result improves notably; Figs. 6, 7 and 8 show the feedback results of iterations 1, 2 and 3, respectively. The number of 'right' images grows, so the result matches our request better and better semantically. It took only 3 iterations to get all the results right.

Fig. 6. The result of the first relevance feedback iteration

Fig. 7. The result of the second relevance feedback iteration

Fig. 8. The result of the third relevance feedback iteration

Fig. 9. The average-AP values of different methods performed on 10 categories

We use the Average Precision (AP) measure. The retrieval result is optimized by performing our relevance feedback scheme. At each iteration of relevance feedback, the AP value can be obtained; it is defined as the average of the precision values obtained after each relevant image is retrieved. Let \( \bar{P} \) denote the average precision obtained at the current iteration, computed by:

$$ \bar{P} = \mathop \sum \limits_{E_{i} \in S} \frac{P_{i}}{\left| S \right|} $$
(6)

where \( P_{i} \) denotes the precision value obtained after the system retrieves the \( i \) top-ranked images, \( E_{i} \) is an element of the relevant image set, S is the set of all relevant images that belong to the same category as the query, and \( \left| S \right| \) denotes the cardinality of S. Calculating the AP over all the relevant images avoids the fluctuation of precision that is usually encountered with the traditional precision measurement.
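A minimal sketch of this measure over a ranked result list, assuming the list is long enough to contain all relevant images; the function name is ours:

```python
import numpy as np

def average_precision(ranked_relevance):
    # ranked_relevance[i] is 1 if the (i+1)-th retrieved image is relevant, else 0
    r = np.asarray(ranked_relevance, dtype=float)
    n_relevant = r.sum()  # |S|, assuming every relevant image appears in the list
    if n_relevant == 0:
        return 0.0
    precision_at_i = np.cumsum(r) / np.arange(1, len(r) + 1)
    # Eq. (6): average the precision at the rank of each relevant image
    return float((precision_at_i * r).sum() / n_relevant)
```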

We verify the feasibility and utility of our algorithm and compare it with five feedback methods. Figures 9 and 10 show the average-AP values of the different methods.

Fig. 10. The average-AP values of different methods performed on 60 categories

Because of the uncertainty that the random parameters bring to ELM, 3 ELMs are combined to form each stable component classifier; thus, 9 ELM classifiers in total compose the component classifiers. Owing to this randomness, each experiment was performed 20 times to obtain stable data. From the figures we can see that the result of the 9-ELM-classifier method is very close to that of the 3-ELM-classifier method, so the 3-ELM-classifier method is preferable because it avoids unnecessary processing time. The average-AP values of the kernel ELM method are superior over all 4 iterations, and at the final iteration our method achieves the best result of all the methods.

Figures 11 and 12 show that the processing time of the various methods becomes shorter as the number of feedback iterations increases. They also show that our methods based on ELM with a Gaussian kernel have an enormous advantage in each iteration, on both 10 and 60 categories, which is the most important factor for relevance feedback. Taken together, the accuracy and time results demonstrate the effectiveness of the ELM-with-Gaussian-kernel-based relevance feedback scheme.

Fig. 11. The processing times of various methods performed on 10 categories

Fig. 12. The processing times of various methods performed on 60 categories

4 Conclusions

Recently, the Extreme Learning Machine has been widely applied to relevance feedback, an important and efficient way to improve the performance of image retrieval. Most of the advantages of ELM suit relevance feedback well: short processing time, high classification accuracy, good generalization ability, etc. However, conventional relevance feedback schemes cannot give consideration to both accuracy and speed. By combining ELM with a semi-supervised method, we overcome the limitations of the conventional approaches. The experiments performed on the COREL photo gallery show that our new method achieves excellent performance.