1 Introduction

Digital images are gaining importance in many domains, such as medicine, education, astronomy, fashion and security [12, 14]. Every day, huge numbers of images are generated by military and civilian equipment, and these need to be organized for efficient and accurate retrieval [3]. Image retrieval is the science of finding images that fulfill a user specified need [11]. The image retrieval process typically involves two steps: annotation (also known as indexing) and retrieval. In text based retrieval systems, images are annotated with keywords (i.e., textual descriptors) in a natural language based on human perception [28]. A user specifies his/her requirements through a query comprising keywords. For retrieval, the keywords in a query are matched with the keywords associated with images [9]. Text based retrieval computes relevancy on the basis of lexical matching of keywords; however, it does not consider the meaning of the keywords. It is very difficult for computers to automatically retrieve images with the intended meaning of the associated keywords [30]. This is why the retrieval process has been extended with ontologies to resolve the problem of semantic heterogeneity [7, 20, 30, 31].

An ontology is an explicit specification of the terms in a domain and the relations among them [9]. It provides an easy and feasible way of capturing a shared understanding of terms that can be used by humans and computers to exchange information [30]. Ontology based systems, such as OLYBIA and OntoPic, have been proposed in [20, 25]. In [20], visual and animal ontologies were built to reduce the semantic gap, whereas in [25] better object recognition was achieved using a landscape ontology.

In existing image retrieval systems, the annotation of images with keywords is binary, i.e., a keyword is either associated with an image or not. However, both the annotation and retrieval processes involve human perception, which is mostly approximate or uncertain [5, 37]. Figure 1 shows an image of a beautiful seaside view in which some pink flowers appear in the bottom left corner.

Fig. 1 Sea-side view annotated by a system analyst

A system analyst might annotate this image with keywords like water, mountain, sand, grass and flowers. If a user searching for flower images is shown this image as a first result, the user will be surprised, since images are annotated using a binary model and the search result is a crisp set of images in which all images are equally relevant to the given query. We believe that images cannot be precisely represented with keywords using a binary model of annotation; therefore, existing systems cannot produce the desired results. The objective of this research is to consider the relative importance of a particular keyword in both the annotation and retrieval processes, because the importance of a keyword varies from user to user. Retrieving relevant images is essential for the satisfaction of users.

To achieve this objective, a fuzzy ontology based system is proposed in this paper. The proposed system makes use of fuzzy ontology to improve retrieval performance. Images are represented with concepts and categories. In order to annotate an image with all the possible concepts, each image in our dataset is divided into regions. Regions are then classified into concepts by adopting the technique proposed in [34]. A concept describes an object that the image contains. The frequency of occurrence of the concepts inside an image determines a category, which depicts a scene. This categorization enables the semantic comparison of scenes and also helps in search space reduction when querying for specific concepts inside a category. Concepts, categories and images are linked among themselves with fuzzy values in the ontology. By adding a degree of membership to each concept and category in an image, the images retrieved by the ontology based search reflect the likely information need. For mapping the query terms to ontology concepts, a fuzzy search mechanism is applied that searches and ranks the retrieved results based on the degree of relevancy between the keywords of a query and the images. The main contribution of this research is two-fold: (i) a new image retrieval system using fuzzy ontology has been proposed to enhance retrieval performance and (ii) the proposed system has been subjectively evaluated to ensure its effectiveness.

The remainder of the paper is organized as follows: Section 2 describes the related work. Section 3 describes the methodology of the proposed image retrieval system. Section 4 contains results and discussion. Section 5 concludes the paper.

2 Related work

Image retrieval systems are either content based or text based. In content based image retrieval (CBIR) systems, low level features are extracted automatically and images are retrieved based on features like color, shape and texture [14]. However, there is a gap between the image features a system can recognize and what a human perceives from the image. The focus of this research is on text based image retrieval systems. Therefore, the related work is further categorized as: text based, ontology based and fuzzy ontology based retrieval systems.

2.1 Text based retrieval systems

In text based image retrieval systems, images are annotated with keywords. Image retrieval is based on matching the keywords associated with images against the user specified keywords [28]. The keyword based system proposed in [24] was built for qualitative spatial relationships like “before and after” or “more and less”. The system was evaluated using “psychophysical evaluation” [32]. In [16], a text based image retrieval system was combined with a content based model for efficient search. Text based search was applied first and then content based filtering was applied on the resulting set. Precision and recall measures were used for system evaluation.

2.2 Ontology based retrieval systems

In [30], an ontology based approach has been proposed for an exhibition system. The proposed system was compared with a text based approach using objective evaluation measures, such as precision and recall. In [13], the authors built a natural scenes ontology to reduce the gap between low level features, such as color, texture and shape, and high level semantics. Precision and recall metrics were used to measure the system performance. Keyword and ontology based image retrieval systems have been compared in [35]. Results show that the ontology based system performed better than the keyword based system in terms of precision. In [25], a supervised learning system, OntoPic, has been proposed that allows semantic search. It used the DARPA agent markup language and ontology inference layer (DAML+OIL) for domain knowledge, but the system performance was not reported. In [35], a semantic based image retrieval system has been proposed. A domain specific (i.e., flower family) low level feature based ontology was created. These low level features were represented as data properties in the web ontology language (OWL). Users can specify a query in the form of text or an image. Features were extracted from a query image and matched with the corpus images through the ontology, and the matched images were shown to the users. A semantic image representation model, containing local and global categorization of scenes, has been proposed in [33].

The ontology based image annotation (OLYBIA) system has been proposed in [20]. Low level features, such as color, shape and texture, were extracted to build a visual ontology. An inference engine was used to extract high level concepts, such as “Eagle” and “Cheetah”, using the visual and animal ontologies and inference rules. The experimental results were not compared with any other model. In [26], image annotation and retrieval through ontology have been discussed. An ontology was constructed for the animal domain. Although the system showed the benefits of using ontologies, the burden of manual annotation remained. In [1], an ontology based image retrieval system has been proposed that utilizes visual features and semantic features. The proposed model was evaluated using precision and recall.

2.3 Fuzzy ontology based retrieval systems

In image retrieval systems, fuzzy based models have been explored for object recognition [2, 4, 25, 27]. For example, if an object is recognized as sky with a value of 0.99, this means the system is 99% confident that the object is sky; it does not mean that the image contains 99% sky. In retrieval systems, users are not only concerned with object recognition but also want the maximum portion of the object in the retrieved images. This has been done in different document search engines using fuzzy ontology.

A document search using fuzzy set theory has been described in [23]. The model considered the importance of keywords in search and their relevancy score between the query and the documents. Highly relevant documents were retrieved based on fuzzy set operations and shown to the user. In the Ogawa model [19], a keyword connection matrix has been proposed for computing the relevance of a document to user keywords. In addition, users can enter compound queries containing operators such as and, or and not. In the Horng model [8], a multi-relationship fuzzy concept network has been proposed that shows the fuzzy relations between the concepts and their relevance degree with the documents. An information retrieval model based on an ontology encoded with fuzzy relations has been proposed in [21]. When a user enters a query composed of concepts, the system performs query expansion and adds new concepts based on ontology knowledge. After expansion, the similarity between the query and documents is calculated by fuzzy operations. The authors compared their proposed model with the Ogawa and Horng models [8, 19]. Results show that the model proposed in [21] gives better retrieval accuracy than the Ogawa and Horng models. The above mentioned fuzzy based systems were tested for text document retrieval.

3 Proposed methodology of the system

In this paper, a fuzzy ontology based image retrieval system is proposed that uses annotated images as input. Images were annotated with concepts and categories, as shown in Fig. 2, by adopting the technique followed in [34]. An input image was divided into a 10 × 10 grid of regions. Features, such as color and texture, were extracted from each region.

Fig. 2 Image annotation with concepts using [33]

Each region was annotated with one of the concepts, such as sky, mountain or water. Each image was assigned a category based on the concept occurrences in the image. An overview of the proposed image retrieval system is shown in Fig. 3. A fuzzy knowledge base and a fuzzy search mechanism are the two main modules of the proposed system. An image along with its associated concepts and categories is the input to the fuzzy knowledge base. To conceptually represent the images, a fuzzy ontology utilizing the concepts and categories associated with the images was constructed. The fuzzy values in the ontology were then computed by applying data mining approaches to the input images. For image retrieval, users were provided with an interface where they can input multiple keywords based on their requirements. A fuzzy search mechanism was applied in the proposed system, and the retrieved images were ranked and shown to the user based on the degree of relevancy between an image and the query keywords.

Fig. 3 The proposed image retrieval system

In the next subsections, the fuzzy knowledge base is discussed in detail, showing the step by step construction of the fuzzy ontology. The image retrieval algorithm is then discussed, showing how a query is processed, and finally a walk-through example is presented.

3.1 Fuzzy ontology construction

The fuzzy ontology in the proposed model was constructed by adopting the idea of [22], which was used for document retrieval. The fuzzy ontology represents the relationships between images and concepts, concepts and categories, and categories and images by values between 0 and 1 (both inclusive). The steps followed for computing the fuzzy values in the ontology are as follows:

Let I = {I1, I2, I3, ..., IM}, A = {A1, A2, A3, ..., AN} and B = {B1, B2, B3, ..., BO} be the sets of images, concepts and categories, consisting of M, N and O elements respectively. Let WCB be a matrix representing the binary weights for the relationship of a category to an image and is written as:

$$ \mathbf{WCB}=\left[\begin{array}{cccc} w_{11} & w_{12} & \cdots & w_{1M} \\ w_{21} & w_{22} & \cdots & w_{2M} \\ \vdots & \vdots & & \vdots \\ w_{O1} & w_{O2} & \cdots & w_{OM} \end{array}\right], $$
(1)

where wkj = 0 or wkj = 1, 1 ≤ k ≤ O and 1 ≤ j ≤ M. Let WCI be a matrix representing the frequency of concepts in an image and is written as:

$$ \mathbf{WCI}=\left[\begin{array}{cccc} f_{11} & f_{12} & \cdots & f_{1M} \\ f_{21} & f_{22} & \cdots & f_{2M} \\ \vdots & \vdots & & \vdots \\ f_{N1} & f_{N2} & \cdots & f_{NM} \end{array}\right], $$
(2)

where fij is the frequency of the concept Ai in the image Ij, 1 ≤ i ≤ N and 1 ≤ j ≤ M.

The relationships among image content (i.e., concepts, categories and the image itself) originally form a crisp set defined by WCB and WCI. These relationships are made fuzzy by the proposed methodology. In our system, image content is represented by three matrices, namely the weight of concept to image WA, the weight of concept to category WB, and the weight of category to image WCF, defined as:

$$ \mathbf{WA}=\left[\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1M} \\ a_{21} & a_{22} & \cdots & a_{2M} \\ \vdots & \vdots & & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NM} \end{array}\right], $$
(3)

where aij is the relevancy between the concept Ai and the image Ij, 0 ≤ aij ≤ 1, 1 ≤ i ≤ N and 1 ≤ j ≤ M. Each element aij of the weight of concept to image matrix is calculated as:

$$ a_{ij}=\frac{f_{ij}}{T_j}, $$
(4)

where fij is the frequency of the concept Ai in the image Ij and Tj is the total number of concept occurrences (i.e., annotated regions) in the image Ij. The weight of concept to category is a matrix as shown below:

$$ \mathbf{WB}=\left[\begin{array}{cccc} b_{11} & b_{12} & \cdots & b_{1O} \\ b_{21} & b_{22} & \cdots & b_{2O} \\ \vdots & \vdots & & \vdots \\ b_{N1} & b_{N2} & \cdots & b_{NO} \end{array}\right], $$
(5)

where bik is the relevancy between the concept Ai and the category Bk, and 0 ≤ bik ≤ 1, 1 ≤ i ≤ N and 1 ≤ k ≤ O. The proposed formula for calculating weight of the concept to the category bik is as follows:

$$ b_{ik}=\frac{\sum_{j=1}^{M} a_{ij}\, w_{kj}}{\sum_{j=1}^{M} w_{kj}}, $$
(6)

The relationship between category and image can be obtained implicitly, i.e., through a transitive property, from weight of concept to image and weight of concept to category matrices. The weight of the category to image is a matrix as shown below:

$$ \mathbf{WCF}=\left[\begin{array}{cccc} c_{11} & c_{12} & \cdots & c_{1M} \\ c_{21} & c_{22} & \cdots & c_{2M} \\ \vdots & \vdots & & \vdots \\ c_{O1} & c_{O2} & \cdots & c_{OM} \end{array}\right], $$
(7)

where ckj is the relevancy between the category Bk and the image Ij, 0 ≤ ckj ≤ 1, 1 ≤ k ≤ O and 1 ≤ j ≤ M. Each element ckj of the weight of category to image matrix is calculated as:

$$ c_{kj}=\frac{\sum_{i=1}^{N} a_{ij}\, b_{ik}}{F_{kj}}, $$
(8)

where Fkj is the number of concepts shared by the category Bk and the image Ij (i.e., concepts with nonzero weight in both).
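To make the computation concrete, the following minimal NumPy sketch (ours, not the authors' implementation) derives WA, WB and WCF from WCB and WCI using Eqs. (4), (6) and (8); the matrices are those of the walk-through example in Section 3.3:

import numpy as np

# Walk-through data (Section 3.3): 4 images, 4 concepts
# {Sky, Foliage, Grass, Water}, 2 categories {Sky_Cloud, Field}.
WCB = np.array([[1, 0, 1, 0],        # Eq. (1): binary category-to-image weights
                [0, 1, 0, 1]], dtype=float)
WCI = np.array([[80, 20, 70, 40],    # Eq. (2): concept frequencies per image
                [ 0, 20, 30,  0],
                [20, 60,  0, 60],
                [ 0,  0,  0,  0]], dtype=float)

# Eq. (4): a_ij = f_ij / T_j, with T_j the total number of concept
# occurrences (annotated regions) in image I_j.
WA = WCI / WCI.sum(axis=0)

# Eq. (6): b_ik = sum_j(a_ij * w_kj) / sum_j(w_kj), i.e., the average
# weight of concept A_i over the images of category B_k.
WB = (WA @ WCB.T) / WCB.sum(axis=1)

# Eq. (8): c_kj = sum_i(a_ij * b_ik) / F_kj, with F_kj the number of
# concepts shared by category B_k and image I_j (nonzero in both).
F = (WB.T[:, :, None] * WA[None, :, :] > 0).sum(axis=1)
WCF = (WB.T @ WA) / F                # assumes F has no zeros, as here

print(WA)   # matches WA in Section 3.3
print(WB)   # matches WB in Section 3.3
print(WCF)  # matches WCF in Section 3.3 up to rounding (0.1467 vs 0.15)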

3.2 Image retrieval

A user query consists of keywords that can be (i) a single concept or a combination of concepts, (ii) a single category or a combination of categories, or (iii) a combination of concepts and categories. The proposed retrieval algorithm is shown as Algorithm 1.

Algorithm 1. The proposed fuzzy ontology based image retrieval algorithm.

The details of the algorithm are illustrated below through an example.
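Since the listing of Algorithm 1 is not reproduced here, the following Python sketch gives one plausible reading of it, inferred from the walk-through in Section 3.3: each concept keyword selects a row of WA, each category keyword a row of WCF, the top-k images per keyword are intersected, and the intersection is ranked by summed relevancy (this final ordering is our assumption, as the walk-through only specifies the intersection):

from typing import List
import numpy as np

def retrieve(query: List[str], k: int, WA: np.ndarray, WCF: np.ndarray,
             concepts: List[str], categories: List[str]) -> List[int]:
    """Return the indices of the top-k images for a query made of
    concept and/or category keywords."""
    candidate_sets, scores = [], np.zeros(WA.shape[1])
    for term in query:
        if term in concepts:
            r = WA[concepts.index(term)]     # relevancy row from WA
        elif term in categories:
            r = WCF[categories.index(term)]  # relevancy row from WCF
        else:
            continue                         # unknown keyword: ignore
        candidate_sets.append(set(np.argsort(-r)[:k]))  # top-k per keyword
        scores += r                          # accumulate relevancy
    if not candidate_sets:
        return []
    result = set.intersection(*candidate_sets)
    ranked = sorted(result, key=lambda j: -scores[j])
    return [int(j) for j in ranked[:k]]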

3.3 Walk-through example

Let I = {I1, I2, I3, I4}, A = {Sky, Foliage, Grass, Water} and B = {Sky_Cloud, Field} be the sets of images, concepts and categories of the image collection. The matrix WCB represents the binary weights of categories to images and is given as:

$$ \mathbf{WCB}=\left[\begin{array}{cccc}1& 0& 1& 0\\ {}0& 1& 0& 1\end{array}\right] $$

The matrix WCI represents the frequency of concepts in images and is defined as:

$$ \mathbf{WCI}=\left[\begin{array}{cccc}80& 20& 70& 40\\ {}0& 20& 30& 0\\ {}20& 60& 0& 60\\ {}0& 0& 0& 0\end{array}\right] $$

The fuzzy weights in matrices WA, WB, and WCF were computed using Eq. (4), Eq. (6) and Eq. (8) and are as follows:

$$ \mathbf{WA}=\left[\begin{array}{cccc} 0.8 & 0.2 & 0.7 & 0.4 \\ 0 & 0.2 & 0.3 & 0 \\ 0.2 & 0.6 & 0 & 0.6 \\ 0 & 0 & 0 & 0 \end{array}\right] \quad \mathbf{WB}=\left[\begin{array}{cc} 0.75 & 0.3 \\ 0.15 & 0.1 \\ 0.1 & 0.6 \\ 0 & 0 \end{array}\right] \quad \mathbf{WCF}=\left[\begin{array}{cccc} 0.31 & 0.08 & 0.285 & 0.18 \\ 0.18 & 0.15 & 0.12 & 0.24 \end{array}\right] $$

The fuzzy ontology constructed according to the above computed weights is shown in Fig. 4. The next step in retrieving an image is to take the user requirements and the retrieval size (as users are interested in top results only) and apply the retrieval algorithm to get the list of retrieved images. If a query contains only a concept, i.e., Q = {A1}, the following vector is extracted from WA for the given query:

Fig. 4 An example of fuzzy ontology for four images, four concepts and two categories

$$ \boldsymbol{R}=\left[0.8\ 0.2\ 0.7\ 0.4\right], $$

The above vector was sorted in descending order as [0.8 0.7 0.4 0.2], and with a retrieval size of 3 out of 4 images, the images I1, I3 and I4, corresponding to these vector values, were returned to the user. From Fig. 4, we can see that I1 and I3 are highly relevant to the query “sky”, while I4 is less relevant because it contains a small portion of sky. When a query contains a category, i.e., Q = {B1}, the following vector is extracted from WCF for the given query:

$$ \boldsymbol{R}=\left[0.31\ 0.08\ 0.285\ 0.18\right], $$

The above vector was sorted in descending order as [0.31 0.285 0.18 0.08], and with a retrieval size of 3, the images I1, I3 and I4 were returned to the user. When a query contains both a concept and a category, i.e., Q = {A1, B1}, the query is first split into two queries, i.e., Q1 = {A1} containing the concept and Q2 = {B1} containing the category. Q1 returns the following vector from WA:

$$ \boldsymbol{R}1=\left[0.8\ 0.2\ 0.7\ 0.4\right], $$

and Q2 returns the following vector from WCF:

$$ \boldsymbol{R}2=\left[0.31\ 0.08\ 0.285\ 0.18\right] $$

Both vectors R1 and R2 are sorted in descending order, the intersection of the images corresponding to these vector values is taken, and the result is stored in R. With a retrieval size of 3, the images I1, I3 and I4 are retrieved and shown to the user.
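Assuming the retrieve function and the WA and WCF matrices from the sketches above, the three walk-through queries can be reproduced as follows (indices are 0-based, so 0, 2 and 3 correspond to I1, I3 and I4):

concepts = ["Sky", "Foliage", "Grass", "Water"]
categories = ["Sky_Cloud", "Field"]

# Query-by-concept, query-by-category, and the mixed query of Section 3.3.
print(retrieve(["Sky"], 3, WA, WCF, concepts, categories))               # [0, 2, 3]
print(retrieve(["Sky_Cloud"], 3, WA, WCF, concepts, categories))         # [0, 2, 3]
print(retrieve(["Sky", "Sky_Cloud"], 3, WA, WCF, concepts, categories))  # [0, 2, 3]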

4 Results and discussion

This section discusses the results achieved in this research. First, the experimental setup is explained to provide the context of this research. Then two types of evaluation, objective and subjective, are carried out to measure the performance of the proposed system.

4.1 Experimental setup

A dataset of seven hundred annotated images (i.e., M = 700) about natural scenes [33] has been used to validate the proposed retrieval model. The dataset consists of five categories (i.e., O = 5), namely sky_clouds, forest, field, waterscapes and landscape with mountains, and ten concepts (i.e., N = 10), namely sky, foliage, grass, rocks, mountains, trunks, flower, water, sand and fields. In order to compare our system, we have selected the fuzzy relational ontological model proposed in [22]. In that system, a user query is composed of concepts, categories or a combination of both. When a user enters a query, it is expanded based on ontological knowledge. After expansion, a relevancy score is calculated between the query keywords and the ontology concepts based on fuzzy operations.

Figure 5 shows the retrieved results of the proposed system (on the left) and the reference system (on the right) for three different queries with a retrieval size of 15. The first row shows the result of query-by-concept, i.e., “flower”, the second row shows the result of query-by-category, i.e., “field”, and the third row shows the result of query-by-concept & category, i.e., “flower and field”.

Fig. 5 Retrieval output of the proposed system and the reference system for retrieval size 15. Left column: proposed system; right column: reference system [22]

A total of 167 queries were designed for evaluation purposes, of which 10 queries were based on 01 × concept, such as “sky”, 41 queries on 02 × concepts, such as “sky and grass”, and 71 queries on 03 × concepts, such as “sky grass water”. Similarly, 5 queries were based on 01 × category, such as “Sky_Cloud”, and 6 queries on 02 × categories, such as “Sky_Cloud and Field”. 34 queries were based on 01 × concept & 01 × category, such as “Sky and Field”. The performance of the proposed system and the reference system was evaluated in two different ways: (i) objective and (ii) subjective.

4.2 Objective evaluation

The system was objectively evaluated using two different approaches: (i) mean and variance and (ii) precision, recall and average normalized modified retrieval rank (ANMRR). Mean and variance indicate the amount of required information in the retrieved images, whereas precision, recall and ANMRR indicate the retrieval of relevant images in the results. Both approaches are described in detail in the next sections.

4.2.1 Mean and variance

The mean is computed by taking the sum of all the values in the dataset and dividing it by the total number of values; the variance measures the spread of the values around the mean. In order to show that the proposed system retrieves images with the maximum amount of the user requested information, the mean and variance were computed for the top 15 retrieved results. In this evaluation, the mean is the average occurrence of a particular concept in an image and the variance is the variability of the concept’s occurrences around the mean. A higher mean indicates that images with a higher amount of concept occurrences are retrieved.
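As a small illustration (the occurrence values below are invented for demonstration and are not the values of Table 1), the two statistics for a top-15 result list can be computed as:

import numpy as np

# Eq. (4) occurrence of the queried concept in each of the top 15
# retrieved images (illustrative values only).
occurrences = np.array([1.00, 0.99, 0.98, 0.98, 0.97, 0.97, 0.96, 0.96,
                        0.96, 0.95, 0.95, 0.94, 0.94, 0.93, 0.92])
print(occurrences.mean())  # average concept occurrence in the top 15
print(occurrences.var())   # population variance: spread around the mean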

Table 1 shows the occurrence of foliage, computed using Eq. (4), in the top 15 results for a single query, i.e., “foliage”. The mean (i.e., 0.967) of the proposed system shows that the average occurrence of foliage in the top 15 results is around 1. On the other hand, the mean (i.e., 0.232) of the reference system indicates that the retrieved images contain foliage in a much smaller quantity. Similarly, the variance (i.e., 0.0017) of the proposed system shows that the variation in the occurrence of foliage across the retrieved images is very small compared to the variance (i.e., 0.0372) of the reference system.

Table 1 Occurrence of “foliage” in the top 15 retrieved results of the proposed and reference system in response to query-by-concept

Figure 6 shows the mean and variance of the proposed and the reference system against ten different 01 × concept based queries, described as: C1 = “Sky”, C2 = “Foliage”, C3 = “Mountain”, C4 = “Grass”, C5 = “Field”, C6 = “Rock”, C7 = “Water”, C8 = “Trunk”, C9 = “Flower”, and C10 = “Sand”. It is evident that the proposed system performs better as compared to the reference system.

Fig. 6 Comparison of the proposed system with the reference model [22] for ten different queries by concept in terms of (a) mean and (b) variance

Table 2 shows the mean and variance of two randomly selected 02 × concepts queries, i.e., Q1 = “Foliage-Trunk” and Q2 = “Rock-Water” and 03 × concepts queries, i.e., Q3 = “Sky-Foliage-Grass” and Q4 = “Sky-Foliage-Field”. Mean and variance values shown in Table 2 are better for the proposed system as compared to the reference system.

Table 2 Comparison of the proposed system with reference system [22] for different queries by concept and categories in terms of (a) mean and (b) variance

Figures 7 and 8 show the mean and variance of different concept occurrences for two randomly selected queries-by-category, i.e., Q5 = “Landscape with mountain” and Q6 = “Forest”, respectively. For Q5, the results of the reference system are slightly better, as it shows the presence of concepts C1, C2 and C6, i.e., sky, foliage and rocks, in a higher amount in the images as compared to the proposed system, whereas for Q6 the proposed methodology performs better in terms of mean and variance.

Fig. 7 Comparison of the proposed system with the reference system [22] showing occurrence of different concepts in category “Landscape with Mountain” in terms of (a) mean and (b) variance

Fig. 8 Comparison of the proposed system with the reference system [22] showing occurrence of different concepts in category “Forest” in terms of (a) mean and (b) variance

Figure 9 shows the mean and variance of 5 randomly selected queries-by-concept & category, i.e., Q7 = “Sky and Field”, Q8 = “Foliage and Sky_Cloud”, Q9 = “Flower and Sky_cloud”, Q10 = “Flower and Field” and Q11 = “Sand and Field”. The plots show the amount of a concept present in the top 15 images retrieved from a particular category. From the plots, it is evident that the proposed system performs better as compared to the reference system for all five queries.

Fig. 9 Comparison of the proposed system with the reference system [22] showing occurrence of different concepts in different categories in terms of (a) mean and (b) variance

4.2.2 Precision, recall and ANMRR

Three different evaluation measures, (i) precision, (ii) recall and (iii) ANMRR [10], were computed for each query. A high precision value indicates that more of the retrieved results are relevant, whereas a high recall value indicates that most of the relevant results are retrieved. The ANMRR score reflects the performance of an algorithm based on the ranking of the results; a low ANMRR value means the algorithm ranked the results well. Readers interested in ANMRR may consult [6, 15, 17] for details. Table 3 shows the results of the proposed system and the reference system in terms of precision, recall and ANMRR for different retrieval sizes, namely 15%, 30%, 50% and 100%. Varying the retrieval size allows us to judge a system's performance at different levels. For example, for the top 15% results of the proposed system when a query contains 01 × concept, the precision value of 1 indicates that all the retrieved results are relevant, whereas the recall value of 0.1425 and ANMRR value of 0.8737 indicate that there are still many relevant images in the ground truth list for that query. As the retrieval size is increased, the recall and ANMRR values for 01 × concept for the proposed system change to 0.2911 and 0.6735 respectively, showing that more relevant images are retrieved. For 100% retrieval size, the recall and ANMRR values change to 0.9816 and 0.0175 respectively. In the case of query-by-concept & category, the precision of around 1 for the proposed system shows that all the retrieved results are relevant, compared to the 0.7647 value of the reference system. However, the reference system shows slightly better results in the case of query-by-category.

Table 3 Comparison of the proposed and reference system in terms of precision, recall and ANMRR results for different queries
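For reference, a minimal sketch of the three measures follows; the NMRR computation uses the common MPEG-7 style formulation of ANMRR (penalizing ground-truth images missing from the top K with a rank of 1.25K), which may differ in detail from the exact variant of [10]:

from typing import List, Sequence, Set, Tuple

def precision_recall(retrieved: Sequence[int],
                     relevant: Set[int]) -> Tuple[float, float]:
    """Fraction of retrieved images that are relevant, and fraction
    of relevant images that were retrieved."""
    hits = sum(1 for img in retrieved if img in relevant)
    return hits / len(retrieved), hits / len(relevant)

def nmrr(retrieved: Sequence[int], relevant: Set[int], K: int) -> float:
    """Normalized modified retrieval rank for a single query."""
    NG = len(relevant)
    top_k = list(retrieved[:K])
    ranks = [top_k.index(img) + 1 if img in top_k else 1.25 * K
             for img in relevant]
    avr = sum(ranks) / NG                   # average rank
    mrr = avr - 0.5 - NG / 2                # modified retrieval rank
    return mrr / (1.25 * K - 0.5 - NG / 2)  # normalize to [0, 1]

def anmrr(nmrr_values: List[float]) -> float:
    """Mean NMRR over all queries; lower means better ranking."""
    return sum(nmrr_values) / len(nmrr_values)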

4.2.3 Discussion

It is evident from the results that the proposed system shows better performance in the case of query-by-concept and query-by-concept & category. In the case of query-by-category, the reference system shows slightly better performance because the dataset contains predefined categories in which the relationship between an image and a category is binary. Moreover, an image can belong to only one category, even if its content would allow it to belong to different categories with different degrees of membership. For example, see the images in Fig. 10, where the image on the left belongs to the category “Sky_Cloud” and the image on the right belongs to the category “Field”. However, they both include similar content, such as sky, field, mountain and foliage. If the search is based on a category (e.g., Sky_Cloud) and the retrieval system retrieves an image from a different category (e.g., Field) having similar content, can we judge the system to be successful despite the fact that the evaluation measures compute a very poor result? The proposed system retrieves images based on a fuzzy ontology in which the relationship between an image and a category is fuzzy: an image belongs to different categories with different degrees of membership based on the frequency of the concepts contained in the image. However, the objective measures used for evaluation compute results on the basis of the predefined categories, in which an image belongs to just one category, and this is why the query-by-category results of the proposed system are poor. To address this problem, the retrieval system performance was evaluated subjectively. The next section shows the subjective results for the same dataset with the same set of queries.

Fig. 10 The image on the left belongs to category “Sky_Cloud” and the image on the right belongs to category “Field”

4.3 Subjective evaluation

Subjective evaluation is carried out based on the perception of human observers [18]. The central problem of retrieval system evaluation is relevance, which is a subjective notion. For a complete evaluation of the system, users’ expectations are of vital importance. The ranking of retrieved images varies between users depending on the particular content that a user’s attention is currently focused on.

Feedback from 300 observers, 55% male and 45% female, was recorded in the digital systems laboratory of the Computer Engineering Department, University of Engineering and Technology, Taxila (UETT), Pakistan. 280 participants were in the first age group (19–40 years) and their qualification was intermediate, BSc, MSc or PhD. The remaining 20, in the second age group (30–45 years), were faculty members at UETT. A maximum of three query-result pairs of the two retrieval systems (i.e., proposed and reference) were shown to each user with a retrieval size of 15, as shown in Fig. 11. The results were shown in random order, i.e., the user did not know which retrieval system was under evaluation. Each query was evaluated by five users in order to ensure that the results are not biased by a specific user's scores. The feedback process took almost three months to complete. The mean opinion score (MOS) in terms of normalized discounted cumulative gain (NDCG) and the mean overall score (O) were recorded from users’ feedback. MOS [29, 36] is a commonly used metric in which each retrieved image is evaluated by selecting a score ranging from 0 to 5, defined as follows:

$$ MOS=\begin{cases} 0, & \text{when an irrelevant image is retrieved} \\ 1, & \text{when a slightly relevant image is retrieved} \\ 2, & \text{when a somewhat relevant image is retrieved} \\ 3, & \text{when a relevant image is retrieved} \\ 4, & \text{when a very relevant image is retrieved} \\ 5, & \text{when a highly relevant image is retrieved} \end{cases} $$
(9)

where 0 ≤ MOS ≤ 5 and MOS is taken from a user against each retrieved image. NDCG is defined as:

$$ NDCG=\frac{1}{Q}\sum_{q=1}^{Q} DCG_q, $$
(10)

where Q is the total number of queries and DCGq is the discounted cumulative gain of the qth query, defined as:

$$ DCG_q=\frac{1}{U}\sum_{i=1}^{U}\sum_{j=1}^{S}\frac{MOS_{ij}}{\log_2 j}, $$
(11)

where MOSij is the relevancy score of the jth retrieved image assigned by the ith user, U is the total number of users, and S is the retrieval size. Similarly, the mean overall score O associated with each query is defined as:

$$ O=\frac{1}{U}\sum_{i=1}^{U} u_i, $$
(12)

where ui is the overall score of the retrieval result given by the ith user.
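A minimal sketch of Eqs. (10)–(12) follows; note that the printed discount log2 j is zero at j = 1, so the sketch adopts the usual DCG convention of counting the first retrieved image at full weight (our assumption):

import math
from typing import List

def dcg(mos: List[List[float]]) -> float:
    """Eq. (11): DCG of one query. mos[i][j-1] is MOS_ij, the 0-5 score
    the i-th user gave the j-th retrieved image."""
    total = 0.0
    for user_scores in mos:
        total += sum(s / max(1.0, math.log2(j))
                     for j, s in enumerate(user_scores, start=1))
    return total / len(mos)

def ndcg(per_query_mos: List[List[List[float]]]) -> float:
    """Eq. (10): mean DCG over the Q queries."""
    return sum(dcg(q) for q in per_query_mos) / len(per_query_mos)

def mean_overall_score(u: List[float]) -> float:
    """Eq. (12): mean of the per-user overall scores u_i for one query."""
    return sum(u) / len(u)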

Fig. 11 Feedback form for subjective evaluation containing 15 retrieved images for the query “Sand”

NDCG measures the performance of a system based on graded relevance that varies from 0 to 5 (i.e., 0 = not relevant and 5 = highly relevant). The usefulness of a retrieved image is measured based on its position in the retrieved list; a higher NDCG value indicates that highly relevant images are retrieved at the top of the list. From Table 4, it is evident that the proposed system outperforms the reference system when evaluated subjectively. The proposed system shows a higher mean overall score for all queries, except when a query contains 01 × category, where the reference model shows a slightly higher value with a difference of 0.0293, which is tolerable. Similarly, the NDCG values are higher for the proposed system for all queries, except when a query contains 02 × categories, where the reference model shows a slightly higher value with a difference of 0.0213, which is acceptable.

Table 4 Comparison in terms of Mean overall score and NDCG of the proposed and reference system

Figure 12 shows the mean overall score of five users for the 122 queries-by-concept (i.e., 10 queries contain 01 × concept, 41 queries contain 02 × concepts and the remaining 71 queries contain 03 × concepts). From the plot, it is evident that users are satisfied with the retrieved results of the proposed system, as the mean overall score for any query lies in the range from 2 to 4.8, whereas for the reference system it ranges from 0 to 4.

Fig. 12 Comparison of the proposed system with the reference system [22] for different queries by concept in terms of mean overall score

Figure 13 shows the mean overall score of five users for 10 queries-by-category (i.e., 4 queries contain 01 × category and 6 queries contain 02 × categories), whereas Fig. 14 shows the mean overall score of five users for the 34 queries-by-concept & category for the proposed and the reference system. It is evident from the plots that the proposed system performs better in most of the queries as compared to the reference system.

Fig. 13 Comparison of the proposed system with the reference system [22] for different queries by category in terms of mean overall score

Fig. 14 Comparison of the proposed system with the reference system [22] for different queries by concept & category in terms of mean overall score

5 Conclusion and future work

In this paper, a fuzzy ontology based system has been proposed for improving the performance of image retrieval. First, the fuzzy ontology was constructed by utilizing the concepts and categories associated with images. Concepts describe the objects that an image contains, and a category depicts a scene based on the frequency of concepts inside the image. Concepts, categories and images are linked among themselves with fuzzy values in the ontology. Users are then provided with an interface to input keywords that may consist of concepts, categories or both. Retrieved results are ranked based on the relevancy between the keywords of the query and the images. The advantages of the proposed model are (i) the relationships between an image and concepts, and between an image and categories, are fuzzy values that resolve the problem of binary annotation and retrieval, and (ii) an image is allowed to belong to different categories with different degrees of membership based on its content. With the help of reasoning through the ontology, a query asking for either concepts or categories can be expanded with their respective categories and concepts respectively to improve the results.

For evaluating the performance of the proposed system, both objective and subjective measures were used. Objective evaluation results show better performance for query-by-concept and query-by-concept & category whereas for query-by-category the reference system shows slightly better performance. To investigate the reason, we have subjectively evaluated the same set of queries with 300 observers of different age groups and qualifications. The experimental results show that the proposed system achieves higher values for MOS in terms of normalized discounted cumulative gain and mean overall score as compared to the reference system for all sets of queries.

Currently, we are improving the ranking of the retrieved results using fuzzy relations in the ontology for queries in which user requirements include multiple concepts and categories.