1 Introduction

Image retrieval is a rapidly growing field. Recent years have witnessed an explosion of personal and professional image collections, especially with the development of new technologies that allow users to share their images via the Internet. This created an urgent need for automatic tools that organize these large quantities of images and help users find the desired images within a reasonable time. These tools are called image retrieval engines.

Image retrieval systems can be classified into two main categories. The first category is called Content-Based Image Retrieval (CBIR). Here, the user is generally asked to formulate a visual query (an image, for example) that corresponds to what he/she is looking for. The search is performed by measuring similarities between the low-level features of the query and those of the images in the collection. Various visual features, including both global features [1,2,3] (e.g., color, texture, and shape) and local features (e.g., SIFT keypoints [4,5,6]), have been studied for CBIR.

The second category, which exploits the semantic concepts associated with images, is called Semantic-Based Image Retrieval (SBIR). The user formulates his/her query with textual terms that express his/her needs. In this case, the search is performed by comparing these terms with the textual annotations that represent the images in the collection, using semantic tools. Among the tools often used to represent semantics are “ontologies” [7].

There is another, secondary category where the user formulates his/her query with visual examples, as in CBIR, but the search is carried out on the basis of the semantics associated with those images instead of their low-level features [8]. This category tries to overcome the drawbacks of the two previous categories by sparing the user from identifying keywords and letting the images speak for themselves.

Whatever the category of the image retrieval system, its main objective is to understand the user’s needs and answer him/her accurately. However, these needs may vary from one person to another, and even for a given user at different times; therefore, the success of an image retrieval system depends on how accurately the system understands the intention of the user when he/she formulates a query [9].

Creating the query is a difficult problem. It raises two important challenges, for both the user and the retrieval system [9]: first, for the user, the challenge is how to express his/her needs accurately; second, for the system, the challenge is how to understand what the user wants based on the query he/she formulated.

The difficulty in dealing with these two challenges leads to several problems in image retrieval, such as noise and miss. Noise can be defined as the set of retrieved images that do not correspond to what the user wants [10], whereas miss is the set of images corresponding to what the user wants that have not been retrieved.

In this paper, our objective is to improve the efficiency of image retrieval by decreasing noise and miss through understanding the user’s intention. We focus on the systems of the last category, where the user formulates a visual query using images and the retrieval system exploits the semantic concepts associated with these images.

We propose an algorithm that performs a thorough analysis of the semantic concepts present in the user’s query. Our algorithm proceeds in two steps. In the first step, we perform a generalization from the explicit concepts to infer the hidden concepts that the user cannot express, which decreases the miss problem. In the second step, we ask the user to select negative examples to refine the results obtained in the first step. Images similar to the negative examples are then rejected, which reduces the noise. At the same time, the rejected images are replaced by others corresponding to what the user wants, so miss is decreased as well.

The rest of the paper is organized as follows: Sect. 2 briefly reviews previous related work. Section 3 presents the principle of our algorithm; we then present our data and our working hypotheses in Sect. 4. Section 5 details our algorithm. An experimental evaluation of the performance of our algorithm is presented in Sect. 6. Finally, Sect. 7 presents a conclusion and suggests some possibilities for future research.

2 Related Work

The user is at the centre of any image retrieval system. One of the primary challenges for these systems is to grasp user intent given the limited amount of data in the input query [11]. To address this challenge, several methods have been proposed. One of them is keyword expansion, where the user’s original query is augmented with new concepts of similar meaning in order to decrease the miss problem.

Keyword expansion has been implemented with different techniques. Some of them [12,13,14] are typically based on dictionaries, thesauri, or other similar knowledge models such as ontologies. Many researchers have noted that using an ontology for query expansion is advantageous, especially if the query words are disambiguated [11, 15]. Note that query expansion is very effective provided that the initial query perfectly matches the user’s need [16]. However, this is not always the case, especially when the user has difficulty expressing that need. In this situation, expanding the query can lead to results more inappropriate than those of the initial query because of irrelevant concepts, which significantly increases noise [17].

Another way to grasp the user’s intention is relevance feedback [10, 18,19,20] and pseudo relevance feedback. In the relevance feedback technique, the user chooses from the returned images those that he/she finds relevant (positive examples) and those that he/she finds not relevant (negative examples); a query-specific similarity metric is then learned from the selected examples.

In pseudo relevance feedback [21, 22], the query is expanded by taking the top N images visually most similar to the query image as positive examples.

For many researchers [23,24,25], the task of learning concepts from the user’s query to understand his/her intent has been seen as a supervised learning problem in which classifiers search through a hypothesis space to find an adequate hypothesis that will make good predictions. Several supervised learning algorithms have been proposed [26,27,28,29], such as Multi-Class Support Vector Machines (SVM), the Naive Bayes classifier, the Maximum Entropy (MaxEnt) classifier, and the Multi-Layer Perceptron (MLP).

A recent work [25] proposed ensembling multiple classifiers for detecting the user’s intention instead of focusing on single classifiers.

This problem has an analogue in the cognitive sciences, called human generalization behavior and inductive inference.

Bayesian models of generalization [30,31,32,33] are models from cognitive science that focus on understanding how humans generalize from a few positive examples. They have been remarkably successful at explaining human generalization behavior in a wide range of domains [33]; however, their success largely depends on the reliability of the initial query.

The main contribution of our work is an algorithm that addresses the challenge of grasping the user’s intention and consequently decreases miss and noise. Our algorithm refines the initial query formulated by the user to infer the relevant concepts. It uses an ontology and exploits its semantic richness, which allows us to extract the hidden concepts that the user could not express explicitly. The obtained results are then improved through relevance feedback with negative examples.

3 Principle of Our Algorithm

In medicine, doctors apply a characteristic rule to summarize the symptoms of a specific disease, whereas, to distinguish one disease from others, they apply a discrimination rule to summarize the symptoms that discriminate this disease from the others. This principle is exactly what we use in our algorithm to understand the user’s intention from a given query: we perform generalization using positive examples and specialization using negative examples. This principle is applied in content-based image retrieval [10] as follows:

  • For positive examples, a concentrated characteristic is important and a scattered characteristic is not important.

  • For positive and negative examples, a discriminating characteristic is important and a non-discriminating characteristic is not important.

We carry out similar work with semantic concepts, as follows (a sketch of these rules in code is given after the list):

  1. For positive examples, with a query that contains several images:

     (a) A concept which is salient in the majority of the query’s images is important.

     (b) A concept that appears in some images and not in others cannot immediately be judged unimportant. It can be classified into three types:

         i. either it has concepts that reinforce it in the other images via direct or indirect relationships (for example: apple - fruit - banana), and so it is important, or at least the common concept (here, the ancestor Fruit) is important;

         ii. or it has concepts that contradict it in the other images, directly or indirectly, and therefore it is not important;

         iii. or there is no concept that reinforces or contradicts it in the other images; in this case, it can either be considered of moderate importance or neglected.

  2. For positive and negative examples:

     (a) The common concepts between the two sets (positive and negative) are to be neglected. A common concept means either the same concept or concepts that have positive relations between them, as illustrated in the positive-examples case.

     (b) Discriminant concepts (directly or indirectly through relationships) between the two sets are important. If the concept is present in the positive set, it must be sought; if it is present in the negative set, it must be dismissed.
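The rules above can be made concrete with a short sketch. The snippet below is illustrative only: it assumes the query is given as a list of concept sets (one per image) and that pairwise relationship weights are available in a relevance matrix M (a dict here; the matrix and the 0 / [0.01, 0.49] / [0.5, 1] thresholds are defined in Sect. 4). All names are ours, not taken from the published algorithm.

```python
# Illustrative sketch of rules 1(a)-1(b); M maps a pair of concepts to the
# weight of their relationship (thresholds as defined in Sect. 4).

def relation(M, a, b):
    """Classify the relationship between concepts a and b by its weight."""
    w = M.get((a, b), 0.0)
    if w >= 0.5:
        return "positive"   # reinforcement relationship
    if w >= 0.01:
        return "neutral"
    return "negative"       # contradiction relationship

def importance(concept, query_images, M):
    """query_images: one set of annotated concepts per positive image."""
    present = sum(1 for img in query_images if concept in img)
    if present > len(query_images) / 2:
        return "important"              # rule 1(a): salient in the majority
    others = set().union(*query_images) - {concept}
    kinds = {relation(M, concept, o) for o in others}
    if "positive" in kinds:
        return "important"              # rule 1(b)i: reinforced elsewhere
    if "negative" in kinds:
        return "not important"          # rule 1(b)ii: contradicted elsewhere
    return "moderate"                   # rule 1(b)iii: neither
```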

4 Data and Hypothesis

4.1 Query Formulation

We ask the user to select, from the displayed images, the maximum number of images that correspond to what he/she is looking for. The set of concepts annotating the selected images constitutes our initial query.
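As a minimal illustration of this reading (the function and variable names are ours), the initial query is simply the union of the annotations of the selected images:

```python
def initial_query(selected_images, annotations):
    """annotations maps an image id to its set of ontology concepts."""
    query = set()
    for img in selected_images:
        query |= annotations[img]   # collect every concept annotating a selected image
    return query
```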

4.2 Ontology Creation

Our algorithm uses an ontology to exploit its semantic richness and apply the principle described above. We have chosen the domain of “nature” for our ontology. The first step of the ontology creation is to choose a collection of images representative of the “nature” domain. The ontology is then used to annotate the images in this collection. Relationships in our ontology are classified into three categories:

  1. Positive relationships (reinforcement relationships): the presence of a given concept implies the presence of another concept; for example, the presence of the concept “Dune” implies the presence of the concept “Desert”.

  2. Negative relationships (contradiction relationships): the presence of a given concept generally implies the absence of another concept; for example, the presence of the concept “Snow” implies the absence of the concept “Desert”.

  3. Neutral relationships: the relations between the concepts are null (i.e., neither positive nor negative).

The other steps of the ontology creation are the following:

  1. Identify the different concepts present in the image collection, for example: desert, camel, dune, snow, etc.

  2. Calculate the implication weights between concepts, which reflect the degree of semantic correlation between these concepts. For each concept, the probability of having the other concepts is calculated as follows:

     $$\begin{aligned} P(A\mid B)= \dfrac{P(A \cap B)}{P(B)} \end{aligned}$$

     where \( P(A\mid B)\) is the probability of having the concept A knowing that we are in the presence of the concept B.

     Because our challenge is not the ontology creation, we calculated this probability manually. First, we searched for the concept B using the “Google” retrieval engine; the obtained images represent P(B). Within this collection, we counted the images that also represent the concept A, which gives us \( P(A\cap B)\).

  3. Represent the ontology as a weighted directed graph where the nodes are concepts and the arcs are the semantic relationships. This graph can be seen as a probabilistic graph.

  4. Represent the relationships between concepts with a relevance matrix, a square matrix whose rows and columns are concepts and whose entry M[i, j] represents the probability of the relation between concept i and concept j. (A sketch of this computation in code is given below.)
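As an illustration of steps 2 and 4 (with names of our choosing, and hypothetical counts: if, say, 160 of the 200 images retrieved for “Dune” also contain “Desert”, then P(Desert | Dune) = 160/200 = 0.8), the conditional probabilities can be estimated from the labeled collection and stored in the relevance matrix as follows:

```python
import numpy as np

def relevance_matrix(concepts, labels):
    """labels maps each image id to its set of concepts (manual annotation)."""
    index = {c: i for i, c in enumerate(concepts)}
    M = np.zeros((len(concepts), len(concepts)))
    for b in concepts:
        # Images containing B play the role of the collection retrieved for B.
        with_b = [img for img, cs in labels.items() if b in cs]
        for a in concepts:
            if with_b:
                both = sum(1 for img in with_b if a in labels[img])  # ~ P(A ∩ B)
                M[index[a], index[b]] = both / len(with_b)           # P(A | B)
    return M, index
```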

A weight between two concepts models the type of semantic relationship between them:

  * A weight of 0 means that the two concepts can never be found in the same image, so it models a negative relationship;

  * a weight in [0.01, 0.49] means that the two concepts are sometimes found in the same image and sometimes not, which models a neutral relationship;

  * a weight in [0.5, 1] means that the two concepts are often found in the same image, so it models a positive relationship (see the sketch after this list).
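Continuing the sketch above, reading a relationship type off the matrix amounts to applying these thresholds; the concept names in the usage comment echo this section’s own examples and are indicative only:

```python
def relationship(M, index, a, b):
    """Map the weight M[a, b] = P(a | b) to the relationship type defined above."""
    w = M[index[a], index[b]]
    if w == 0.0:
        return "negative"                        # the concepts never co-occur
    return "positive" if w >= 0.5 else "neutral"

# e.g. relationship(M, index, "Desert", "Dune") would be expected to return
# "positive", and relationship(M, index, "Snow", "Desert") "negative".
```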

5 Our Algorithm

Our algorithm proceeds in two steps. We call the first step the “Generalization algorithm” (see Algorithm 1) and the second step the “Specialization algorithm” (see Algorithm 2).

[Algorithm 1: Generalization algorithm]

[Algorithm 2: Specialization algorithm]
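Since the pseudocode of Algorithms 1 and 2 is given as figures, the following is only our hedged reading of the two steps, based on the principle of Sect. 3 and reusing the relevance matrix M, the index mapping, and the thresholds of Sect. 4; it is a sketch, not the published pseudocode.

```python
POSITIVE = 0.5   # lower bound of a positive (reinforcement) relationship

def generalize(query_concepts, M, index):
    """Step 1: infer hidden concepts positively related to the explicit ones."""
    expanded = set(query_concepts)
    for c in query_concepts:
        for other, j in index.items():
            # P(other | c) high: the presence of c implies the presence of other.
            if other not in expanded and M[j, index[c]] >= POSITIVE:
                expanded.add(other)   # hidden concept inferred from c
    return expanded

def specialize(expanded, negative_concepts, M, index):
    """Step 2: use negative examples to discard undesired concepts."""
    keep = set()
    for c in expanded:
        if c in negative_concepts:
            continue                  # common concept: neglected (rule 2a)
        if any(max(M[index[c], index[n]], M[index[n], index[c]]) >= POSITIVE
               for n in negative_concepts):
            continue                  # positively related to a negative concept
        keep.add(c)                   # discriminant concept: sought (rule 2b)
    return keep
```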

6 Experimental Evaluation

6.1 Image Collection

To apply our algorithm, we need an ontology and an image collection annotated with the concepts of this ontology. We chose the domain of “nature” for our ontology. We then used Google Images as an image retrieval engine to collect images for each concept in the ontology. The top 200 images returned for each query were crawled and manually labeled to construct the experimental data set.

6.2 Experiments

We implemented an image retrieval system that integrates our algorithm and invited thirty users to participate in our experiments. We displayed to them a set of images randomly selected from the image collection; users could ask to see more images if the presented ones did not reflect what they wanted.

To evaluate the performance of our algorithm, we carried out a comparison between our algorithm, the Bayesian model of generalization used in [33], and the standard matching method (without generalization). As performance criteria, we use the precision and recall measures.
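For reference, precision and recall are taken in their standard sense, under which noise corresponds to low precision and miss to low recall:

$$\begin{aligned} \text{precision} = \dfrac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}, \qquad \text{recall} = \dfrac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|} \end{aligned}$$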

First Experiment. The aim of this experiment is to evaluate the effectiveness of the first step of our algorithm (the Generalization algorithm) in increasing recall, compared to the Bayesian model of generalization used in [33] and the standard matching method (without generalization). We asked users to formulate different queries and then calculated the recall of the three models. Figure 1 illustrates the results obtained from 16 queries.

[Fig. 1. The recall of the three models.]

We can see that recall increases considerably with our approach, and consequently the miss problem is decreased. Compared to our algorithm, the success of the Bayesian model of generalization largely depends on how reliably the examples present in the query reflect the user’s intention, which is not always the case: the query images may contain many objects while the user is interested in only some of them. Our algorithm overcomes this problem by refining the user’s initial query.

Second Experiment. The aim of this experiment is to illustrate the effect of negative examples in decreasing noise by improving precision. We asked users to formulate queries using the Generalization algorithm and calculated the precision; we then asked them to select negative images and restart the search. The obtained results are illustrated in Fig. 2.

[Fig. 2. The effect of negative examples on precision.]

It is clear that the use of negative examples increases precision. Indeed, after obtaining the results of a given query, the user can keep the positive images and enrich the query by including some undesired images as negative examples. Images similar to the negative examples are then rejected, which reduces the noise. At the same time, the rejected images are replaced by others corresponding to what the user wants, so miss is diminished as well.

7 Conclusion

In this paper, we proposed an algorithm that proceeds in two steps: we call the first step the “Generalization algorithm” and the second step the “Specialization algorithm”. The objective of this algorithm is to improve semantic image retrieval quality by reducing the two well-known problems of image retrieval, noise and miss, through understanding the user’s intention. Our algorithm is based on an ontology, which allows us to exploit its semantic richness. We implemented an image retrieval system that integrates the proposed algorithm. Experimental evaluation verifies the efficiency of our approach in comparison with others proposed in the state of the art. We have shown that the combination of positive and negative examples significantly improves precision and recall. A perspective of our work is to automate the ontology creation.