1 Introduction

Indexing and searching images by content-based image retrieval (CBIR) remain open problems with important potential applications in various domains such as art collections, crime prevention, photograph archives, and geographical information and remote sensing systems. These problems stem mainly from the complexity or ambiguity of image content, such as pictures containing various objects with different colors. It is therefore necessary to extract useful features that depict the content of images precisely and then synthesize them into a descriptor (or a set of descriptors). The second objective is the search technique, which should be both precise and fast.

Since the Beta wavelet [1] is a powerful tool in various domains such as image compression [2], face recognition [3, 4], 3D face recognition [5], image classification [6, 7], phoneme recognition [8], speech recognition [9] and in particular Arabic word recognition [10] and hand tracking and recognition [11], this study uses Fast Beta Wavelet Network (FBWN) modeling to propose a new approach to CBIR.

In this paper, we present a high-level approach for CBIR by proposing a parallel feature aggregation based on a Fuzzy Decision Support System (FDSS). First, the shape descriptor is obtained, after an analysis by a FBWN, by calculating the geometric Hu moments. The texture descriptor is based on the calculation of the energy at each of the first five levels of the Beta wavelet decomposition. The color descriptor is based on the indexed color map of the image approximated by the FBWN. Then, in the retrieval phase, the descriptors of the Query Image (QI) and those of all Reference Images (RI) are compared using a FDSS.

The paper is organized as follows: Sect. 2 presents a survey of different CBIR techniques. In Sect. 3, we present the main idea of the proposed approach. Section 4 describes the methodology used to extract each of the three descriptors: the shape, texture and fuzzy indexed color descriptors. Section 5 explains the proposed method for similarity search using a FDSS. We then test the efficiency of our approach in Sect. 6 and end with a conclusion.

2 Related works

Nowadays, many approaches focus on the CBIR domain, which has given rise to significant applications for many professional groups. These can be categorized into two main families: approaches based on low-level content (shape, texture, color) and approaches based on high-level content, which take the semantics of the image into consideration, for example approaches with relevance feedback, learning and optimized algorithms (neural networks, Bayesian networks, wavelet networks, ant algorithms, etc.).

  • Low-level content approaches

    Low-level content approaches aim to extract low-level features like shape, texture or color, or a combination of them, in order to measure the dissimilarity between a query image (QI) and a set of reference images (RI). Many techniques based on low-level features can be found in the literature. Sunkari [12] discussed the use of two retrieval methods, query by texture (QBT) and query by color (QBC). The first is done on the basis of a color histogram in RGB space, and the second by using an invariant histogram. Sunkari showed that QBT outperforms QBC. Also in this context, Howarth [13] showed that texture features perform better than color features, but that their combination can boost the retrieval performance. Other approaches were proposed for feature selection and dimension reduction by using principal component analysis (PCA), wavelets, Ripplets and their derivations [14–18]. Others combined the visual content in order to increase robustness and efficiency [19–23]. Also, Lande et al. [24] presented an effective approach which combines color, texture and shape features based on the extraction of the dominant color of each block, the gray-level co-occurrence matrix (GLCM) and Fourier descriptors, respectively. Hiremath and Pujari [25] used conditional co-occurrence histograms between image tiles and the corresponding complement tiles, as local descriptors of color and texture, and combined them with Gradient Vector Flow fields and invariant moments for shape feature extraction.

    Furthermore, different works can be cited based on color feature extraction, such as color histograms [26], HSV color histograms [27], dominant colors [23] and weighted invariant color features [28]. Others are based on texture features such as Local Binary Patterns (LBP) [29], GLCM [30] and the Motif Co-occurrence Matrix (MCM) [31]. Shape feature extraction also plays an important role in CBIR, for instance Pseudo-Zernike moments [32] and shape-adaptive DCT [33]. A comparative survey of low-level feature extraction methodologies was presented in [34]. In addition, many fusion techniques for descriptor merging can be found in the literature, such as CombSUM [35, 36], which sums the results from different ranking lists, Borda Count combined with CombSUM [36], Z-score median combined with CombSUM [36], and Inverse Rank Position (IRP) [36, 37].

  • High-level content approaches

    High-level content approaches aim to extract high-level features and analyze the semantic content of the image in order to reduce the gap between the information that can be extracted from the visual data and the interpretation that the same data have for a user in a given situation.

    The Bag-of-Words (BoW) model can be considered one of the most popular approaches for such a task, based on color SIFT [38] or HSV color [39]. Also, Liang et al. showed how the integration of SIFT visual words and binary features such as the Color Name (CN) descriptor [40, 41] can enhance the precision of visual matching and reduce the impact of false positive matches [42]. A comparative study of large-scale categorization can be found in [43–49].

    On the other side, Philippe and Matthieu [50] introduced in their paper the best-known active learning methods for image retrieval such as Bayes classification, k-Nearest Neighbors [51], neural networks [52, 53], wavelet networks [54, 55], lattice trees [56–58], Gaussian mixtures and support vector machines. Ekta and Hardeep [59] proposed the use of a Bayesian algorithm, as a supervised learning and statistical classification method, by reducing the noise from images. Some approaches combined low-level content and genetic algorithms for feature optimization [60, 61]. Others integrated user intervention: the user selects and marks images as relevant/irrelevant and the system updates the results. These are known as relevance feedback approaches, and many techniques have focused on them [62–65]. In addition, Khadidja and Sihem presented a comparative analysis and major challenges of relevance feedback techniques [66].

  • Techniques to measure similarity

    In order to establish the similarity between images, many approaches have been presented in the literature. Most of them used a simple Euclidean, Mahalanobis, Manhattan or Minkowski distance [67]. However, in the case of CBIR systems combining several descriptors, these distances cannot distinguish the importance of each descriptor, so the CBIR system can return mistaken results. To remedy this problem, many works have concentrated on this subject, for example by using weighted distances. Others proposed a fuzzy neural approach for the interpretation and fusion of features [68, 69]. Syam and Sharon [70] proposed a genetic algorithm in order to merge descriptors and give more flexibility to the system. Bahareh et al. [71] proposed the combination of two Short Term Learning (STL) methods, the JR-N method [72] and the SVM method [73, 74], in order to retrieve similar images.

The CBIR technology boils down to two intrinsic objectives: (a) propose robust descriptor(s) which represent useful information and describe the semantic content of the image well; (b) propose an efficient matching algorithm to measure the similarity between the descriptor(s) of the QI and those of the RI. The latter should discriminate irrelevant images and be fast enough to satisfy users' expectations.

3 Overview of the proposed approach

3.1 Fast Beta Wavelet transform (FBWT)

The FBWT aims to provide a simple way to exploit multiresolution analysis, i.e., computing the connection weights is simpler and easier than with techniques based on the projection onto dual bases. It was demonstrated [6] that the approximation of a signal f on \(V_{j+1}\) can be calculated from its projection on \(V_j\) with the following equation:

$$\begin{aligned} v_n^{j+1}=\sum _{k}{h[k]v_k^j} \end{aligned}$$
(1)

The h coefficients are known as the low-pass filter or scaling function filter. The detail coefficients at scale \((j+1)\) can be calculated from the approximation coefficients at scale (j):

$$\begin{aligned} \omega _n^{j+1}=\sum _{k}{g[k]v_k^j} \end{aligned}$$
(2)

The g filter is called the high-pass filter or wavelet filter. Here, we applied multiresolution analysis using only the h and g filters and their dual filters for the reconstruction and decomposition stages, respectively. To accelerate the calculation of the approximation and detail coefficients, a fast algorithm for wavelet decomposition and reconstruction using filter banks was introduced. Known as the FBWT, this algorithm considerably reduces the time consumed by the decomposition and reconstruction steps. To calculate the approximation of f at scale \(j+1\), the approximation \(v^j\) at scale j is convolved with the dual filter \(\tilde{h}\), and the resulting signal is decimated by 2 to obtain the approximation coefficients. The same steps are repeated using the dual filter \(\tilde{g}\) rather than \(\tilde{h}\) to obtain the detail coefficients \(\omega ^{j+1}\).

As mentioned above, the approximation signal at scale \(j+1\) may itself be analyzed, so the same algorithm is applied to obtain the \(v^{j+2}\) and \(\omega ^{j+2}\) coefficients. The process can be iterated to analyze the signal at coarser scales. More details on the FBWT algorithm can be found in [6].
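The filter-bank step above (convolve with the dual filter, then decimate by 2) can be sketched as follows. Haar filters are used here as a hypothetical stand-in for the Beta wavelet filter bank; the algorithm, not the filters, is what the sketch illustrates.

```python
import numpy as np

def fwt_step(v, h_dual, g_dual):
    """One analysis level of the fast wavelet transform (Eqs. 1-2):
    convolve the approximation signal with the dual filters, then
    decimate by 2."""
    approx = np.convolve(v, h_dual)[1::2]   # v^{j+1}, Eq. (1)
    detail = np.convolve(v, g_dual)[1::2]   # w^{j+1}, Eq. (2)
    return approx, detail

# Haar filters as illustrative stand-ins for the Beta wavelet filters.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)

v0 = np.array([4.0, 6.0, 10.0, 12.0])
a1, d1 = fwt_step(v0, h, g)   # scaled pairwise averages and differences
```

Iterating `fwt_step` on `a1` yields the coefficients at the next scale, exactly as the text describes.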

3.2 Proposed approach

The proposed approach consists of two stages: the indexing phase, which extracts three descriptors of the image (shape, texture and color), and the online phase, in which we propose a new decision-making method based on a fuzzy logic system with three fuzzy sets.

The proposed approach is illustrated in Fig. 1. As shown in this figure, the QI is normalized. After that, the three descriptors (shape, texture and color) are extracted and compared with the descriptors of all the reference images. Finally, the decision is made using the proposed FDSS with three fuzzy sets.

One of the main goals of a CBIR application is to quickly select adequate features that describe the useful information. In this work, we exploited 2D FBWN analysis in order to extract fast and appropriate shape, texture and color descriptors.

The specificity of the FBWN architecture is that it is a neural network whose hidden layer is composed of wavelet and scaling functions [1, 2, 6]. It computes only the coefficients of the wavelet functions that contribute most to the reconstruction of the signal (image). The idea of this algorithm is that, at each reconstruction step, we choose optimal wavelet functions (horizontal (\(\psi _{Hi}\)), diagonal (\(\psi _{Di}\)) and vertical (\(\psi _{Vi}\)) wavelets) and an optimal scaling function (\(v_{i}\)) to compose the hidden layer of the wavelet network.

Thanks to its learning capability, the wavelet network can model unseen visual content (shape, texture and color), which reduces and compresses the network size enormously.

It is very important to determine features that are invariant to translation, scale and rotation while keeping the complexity low; the wavelet network is an effective solution to this problem.

Fig. 1
figure 1

Overview of the proposed approach

So, after applying FBWN on the QI, we used wavelet coefficients in order to compute the shape descriptor. The scaling coefficients were used to extract the texture descriptor, and the color descriptor was calculated using the approximated image as shown in Fig. 2.

Fig. 2
figure 2

2D FBWN modeling-based features extraction

Fig. 3
figure 3

Shape detection using 2D FBWN modeling

4 Feature extraction

4.1 Shape feature extraction

Shape is one of the most important features and contains the most attractive visual information for human perception [41]. The term shape refers to the information that can be deduced directly from images and that cannot be represented by color or texture; as such, shape defines a space complementary to color and texture. Shape representation techniques used in similarity retrieval are generally either region based or boundary based. A new 2D FBWN modeling-based shape descriptor is presented in this paper. The shape descriptor is based on three steps: image filtering, shape detection using 2D FBWN modeling, and Hu moment calculation.

  • Step1 (Image filtering): this step reduces the noise of the image. It counts the number of different colors contained in the matrix of the indexed image L. If this number is greater than a predetermined threshold (indicating the presence of noise), the colors of the entire image are eliminated.

  • Step2 (Shape detection with 2D FBWN modeling): the image is projected onto the 2D FBWN. The detail wavelet coefficients with the best contributions to the reconstruction of the image are summed in order to obtain the shape.

    The two steps above are illustrated in Fig. 3.

  • Step3 (Calculation of Hu moments): once the shape is detected with 2D FBWN modeling, we calculate the seven geometric Hu moments, dividing the resulting binary image into four equal sub-parts.

    The moments are calculated by applying Eq. (3):

    $$\begin{aligned} M_{p,q}=\sum _{x}{\sum _{y}{x^p y^q I(x,y)}}. \end{aligned}$$
    (3)

    where (p, q) is the moment order and I(x, y) is the pixel value at position (x, y).
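A minimal sketch of the raw moment of Eq. (3); the toy binary image and the centroid computation are illustrative only (Hu's seven invariants are then built from normalized central moments derived from these raw moments).

```python
import numpy as np

def raw_moment(img, p, q):
    """Raw geometric moment M_pq = sum_x sum_y x^p * y^q * I(x, y), Eq. (3)."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]  # pixel coordinate grids
    return float(np.sum((x ** p) * (y ** q) * img))

# Toy binary shape (hypothetical); in the paper the input is the binary
# image produced by the 2D FBWN shape detection step.
I = np.array([[1, 0],
              [1, 1]], dtype=float)

m00 = raw_moment(I, 0, 0)          # area of the shape
cx = raw_moment(I, 1, 0) / m00     # centroid x
cy = raw_moment(I, 0, 1) / m00     # centroid y
```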

Fig. 4
figure 4

Extraction steps of texture descriptor

4.2 Texture feature extraction

Texture is one of the major defining features of an image. In image classification, texture provides important information, as it does in many real-world images. It can be characterized by energy, entropy, contrast, or homogeneity.

In this study, the texture descriptor is essentially based on combining wavelet and statistical techniques for more precision. Two steps are performed. The first decomposes the image using the Beta wavelet down to the fifth level; since all images are resized to 256 × 256, five levels are enough to extract the important coefficients of the image. The second computes the energy of the approximation coefficients at each level, as shown in the following equation:

$$\begin{aligned} E=\frac{1}{\frac{m}{2^k}\,\frac{n}{2^k}}\sum _{i=1}^{\frac{m}{2^k}}{\sum _{j=1}^{\frac{n}{2^k}}{\left| {X(i,j)}\right| }}; \quad k \le 5 \end{aligned}$$
(4)

m and n are the dimensions of the image, X(i, j) is the value of coefficient (i, j) of the approximation X, and k is the decomposition level.

In total, we obtain a texture vector of six components: the energies of the approximation coefficients at each of the five decomposition levels, plus the energy of the initial image as the sixth component.
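The construction of this six-component vector can be sketched as below. A 2×2 Haar-style block average is used here as a simple, hypothetical stand-in for the Beta wavelet low-pass analysis; only the descriptor structure (energy per level) follows the paper.

```python
import numpy as np

def energy(X):
    """Mean absolute value of the coefficients (Eq. 4)."""
    return float(np.mean(np.abs(X)))

def approx_level(X):
    """One-level low-pass approximation via 2x2 Haar averaging; a simple
    stand-in for the Beta wavelet approximation used in the paper."""
    return (X[0::2, 0::2] + X[0::2, 1::2] + X[1::2, 0::2] + X[1::2, 1::2]) / 2.0

def texture_descriptor(img, levels=5):
    """Six-component vector: energy of the original image followed by
    the energies of the first `levels` approximation images."""
    feats = [energy(img)]
    a = img
    for _ in range(levels):
        a = approx_level(a)
        feats.append(energy(a))
    return np.array(feats)

img = np.random.default_rng(0).random((256, 256))
vec = texture_descriptor(img)   # shape (6,)
```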

The steps of texture extraction of an image are presented in Fig. 4.

4.3 Fuzzy indexed color (FIC) descriptor

Color is a significant visual characteristic for both human vision and computer processing. An efficient color descriptor must capture two important attributes: an indication of the color content of an image and information about the spatial distribution of the colors. In this paper, we present a new color descriptor, based on indexed matrix analysis and a FDSS, which we call the Fuzzy Indexed Color (FIC) descriptor.

The algorithm steps of the proposed FIC descriptor can be divided into two sub-algorithms: the color feature extraction algorithm and the matching algorithm.

4.3.1 Color feature extraction algorithm

  1. 1.

    All images (RGB images) are resized to 256 × 256 and projected onto a 2D FBWN.

  2. 2.

    The image can be defined as a function I mapping each image position (x, y) to an RGB triplet: \(I(x,y)\mapsto (R(x,y),G(x,y),B(x,y))\). The image is then converted to an indexed image with a color map that contains 256 different colors.

  3. 3.

    We count the unique colors, storing for each one the corresponding color pixel values (\(v_r,v_g ,v_b\)) and its spatial position (i, j) in the corresponding indexed matrix.

So, we note:

  • \(UC_k\) as the \(k\mathrm{th}\) unique color of an image

  • \(\omega _k\) as the \(UC_k\) weight, which measures the importance of this color in the image and can be obtained by applying the following equation:

    $$\begin{aligned} \omega _k=\frac{Nbre\_ UC_k}{\mathrm{Total}\_ UC \_\mathrm{image}} \end{aligned}$$
    (5)
  • \(CP_{i,j}\) , as the color pixel of \(UC_k\) characterized by three values \((v_r,v_g ,v_b)\) and

  • \(Pos_{i,j}\) as its corresponding position in the indexed matrix

Therefore, each image can be defined by four parameters: \(UC_k\), \(Pos_{i,j}\), \(\omega _k\), \(CP_{i,j}(v_r, v_g , v_b)\).
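The extraction of these parameters from an indexed image can be sketched as follows; the 2×2 indexed matrix and 3-color map are toy values, and in the paper the indexed image comes from the FBWN-approximated image.

```python
import numpy as np

def color_features(indexed, cmap):
    """For each unique color UC_k of an indexed image: its weight w_k
    (Eq. 5), its triplet (v_r, v_g, v_b) from the color map, and the
    positions Pos_{i,j} where it occurs."""
    total = indexed.size
    feats = {}
    for k in np.unique(indexed):
        pos = np.argwhere(indexed == k)              # Pos_{i,j}
        feats[int(k)] = {
            "weight": len(pos) / total,              # w_k, Eq. (5)
            "rgb": tuple(int(c) for c in cmap[k]),   # CP (v_r, v_g, v_b)
            "positions": pos,
        }
    return feats

# Toy 2x2 indexed image with a 3-color map (hypothetical values).
indexed = np.array([[0, 0],
                    [1, 2]])
cmap = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]])
feats = color_features(indexed, cmap)
```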

4.3.2 Fuzzy matching algorithm for color descriptor

The objective of this fuzzy matching algorithm is to select the RIs similar to the QI. The algorithm is composed of two steps: similarity measurement and a distance fusion process.

The first step consists in calculating the weighted Euclidean distance, for each color channel (R, G, B), between the QI and each RI by applying the three equations below:

$$\begin{aligned} \left\{ \begin{array}{r c l} \mathrm{d}_r=\sum \limits _{i=1}^N{\sum \limits _{j=1}^M{\left| {Q\omega r_{i,j}-R\omega r_{i,j}}\right| } *\sqrt{(Qv_r-Rv_r)^2}} \\ \mathrm{d}_g=\sum \limits _{i=1}^N{\sum \limits _{j=1}^M{\left| {Q\omega g_{i,j}-R\omega g_{i,j}}\right| } *\sqrt{(Qv_g-Rv_g)^2}} \\ \mathrm{d}_b=\sum \limits _{i=1}^N{\sum \limits _{j=1}^M{\left| {Q\omega b_{i,j}-R\omega b_{i,j}}\right| } *\sqrt{(Qv_b-Rv_b)^2}} \end{array} \right. \end{aligned}$$
(6)

Each color pixel \(CP_{i,j}\)(QI) in the query image is compared with the corresponding color pixel \(CP_{i,j}\)(RI) in the reference image. If the distance between them is less than a fixed threshold \(\varDelta \), the two color pixels are considered similar; otherwise, the pixel is compared with its eight neighboring pixels.
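One component of Eq. (6) can be sketched as below; the 2×2 weight maps and channel values are hypothetical, and the same function would be applied to the G and B channels to obtain \(d_g\) and \(d_b\).

```python
import numpy as np

def channel_distance(Qw, Rw, Qv, Rv):
    """One channel of Eq. (6): the per-pixel weight gap |Qw - Rw|
    scaled by the channel value gap sqrt((Qv - Rv)^2), summed over
    all positions (i, j)."""
    return float(np.sum(np.abs(Qw - Rw) * np.sqrt((Qv - Rv) ** 2)))

# Hypothetical 2x2 weight maps and channel values for the R channel.
Qw = np.array([[0.50, 0.25], [0.25, 0.00]])
Rw = np.array([[0.25, 0.25], [0.25, 0.25]])
Qv = np.full((2, 2), 200.0)
Rv = np.full((2, 2), 100.0)

d_r = channel_distance(Qw, Rw, Qv, Rv)   # repeat for G and B channels
```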

Once the three outputs \(find_r\), \(find_g\) and \(find_b\), corresponding to the three color channels R, G and B respectively, are obtained, we can move to the second step.

The latter merges these three outputs using a FDSS in order to obtain a global distance with more smoothness and flexibility.

The fusion process is described in detail in the next section.

5 Fuzzy decision support system for image retrieval

The FDSS requires a normalization of the input data, i.e., all inputs must share the same range of values. This normalization guarantees stable convergence of the weights and biases.

5.1 Data normalization

The idea is to measure the similarity between the query image descriptor D, with n components, and all images of the database with descriptors \(D_i\), \(i \in [1 \ldots nt]\), where nt is the total number of images in the database. The similarity distances (shape and texture) of image i are calculated using the Euclidean distance.

If we denote, for example, by \(\mathrm{Min}_{DS}\) and \(\mathrm{Max}_{DS}\) the minimum and maximum values of the shape similarity distances over the images of the dataset, the normalized value \(\textit{ND}_{Si}\) is calculated as follows:

$$\begin{aligned} \textit{ND}_{Si}=\frac{D_{Si}-\mathrm{Min}_{DS}}{\mathrm{Max}_{DS}-\mathrm{Min}_{DS}}. \end{aligned}$$
(7)
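Equation (7) is a standard min-max normalization; a minimal sketch over a toy list of distances:

```python
def min_max_normalize(distances):
    """Eq. (7): map every similarity distance into [0, 1] using the
    minimum and maximum distances over the dataset."""
    lo, hi = min(distances), max(distances)
    return [(d - lo) / (hi - lo) for d in distances]

nd = min_max_normalize([2.0, 4.0, 6.0])   # -> [0.0, 0.5, 1.0]
```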

5.2 Fuzzy decision support system process

In order to make a final decision, we used a Fuzzy Decision Support System. Like any fuzzy system, the FDSS goes through three reasoning stages: fuzzification, inference and defuzzification [45]. In this paper, the system is treated with three fuzzy sets that contribute to the formulation of the similarity degree (SD) decision.

Fig. 5
figure 5

Proposed functional diagram of the Fuzzy Decision Support System

Table 1 Confusion matrix for the clusters of WANG database

Figure 5 shows the diagrammatic representation of such system.

  1. 1.

    Fuzzification

    This step defines the membership functions for the different variables, especially the input variables. It converts real magnitudes into linguistic (fuzzy) variables that can then be processed by the inference stage. In this work, we retained a triangular membership function and three sets for each normalized similarity distance.

  2. 2.

    Membership function

    The membership functions are chosen to be triangular and defined over the variation domain [0, 1]. This choice was made for its simplicity: only a small amount of data is needed to define the functions, and their parameters are easy to modify, on the basis of the measured input values, in order to adjust the output of the system.

    • The membership functions for the inputs are characterized by the numbers \(NC_j\):

      $$\begin{aligned} NC_j=\frac{j-1}{2} ; j=1,2,3. \end{aligned}$$
      (8)
    • The membership functions for the output are characterized by the numbers \(ss_i\):

      $$\begin{aligned} ss_i=\frac{i-1}{6} ; i=1,2,\ldots ,7. \end{aligned}$$
      (9)
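A minimal sketch of triangular fuzzification with the input centers of Eq. (8). The half-width of 0.5 is an assumption chosen so that adjacent sets overlap; the paper does not state the exact support of each triangle.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Input set centers NC_j = (j - 1) / 2, j = 1..3 (Eq. 8): 0, 0.5, 1,
# i.e. "low", "medium", "high".
centers = [(j - 1) / 2 for j in range(1, 4)]

def memberships(x, half_width=0.5):
    """Degrees of membership of a normalized distance x in the three sets."""
    return [triangular(x, c - half_width, c, c + half_width) for c in centers]

mu = memberships(0.25)   # partly "low", partly "medium", not "high"
```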
  3. 3.

    Rules of inference

    Expressed in linguistic terms, the fuzzy rules determine the output Similarity Degree (SD) from the input fuzzy sets \(DN_S\), \(DN_T\) and \(DN_C\). Figure 6 shows the rules in the case where three fuzzy sets are considered for the inputs. Accordingly, seven fuzzy sets are required for the output of the system (excellent, very good, good, medium, acceptable, low, very low).

    For example, the adopted rule R2 in this case is as follows (see Fig. 6):

    If \(DN_S\) is low

    AND \(DN_T\) is low

    AND \(DN_C\) is medium

    OR

    If \(DN_S\) is low

    AND if \(DN_T\) is medium

    AND if \(DN_C\) is low

    OR

    If \(DN_S\) is medium

    AND if \(DN_T\) is low

    AND if \(DN_C\) is low

    THEN, SD is excellent.

    By mapping the logical AND to a minimum and the logical OR to a maximum, the value of the output (SD) in this case is determined by the following equation:

    $$\begin{aligned} D_i\!=\!\mathrm{Max} \left\{ \mathrm{Min}\left[ \mu _{FS_{j1}}(DN_S);\mu _{FS_{j2}}(DN_T);\mu _{FS_{j3}}(DN_C) \!\right] \!\right\} \end{aligned}$$
    (10)

    With:

    $$\begin{aligned} j_1,j_2,j_3 \in [1,2N+1];i \in [1,6N+1]. \end{aligned}$$
    (11)
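The max-min composition of Eq. (10), applied to rule R2 above, can be sketched as follows; the membership degrees are hypothetical values.

```python
def rule_activation(clauses, mu_S, mu_T, mu_C):
    """Eq. (10): AND is a min over the three inputs, OR a max over the
    alternative clauses of a rule.  A clause (j1, j2, j3) picks one
    fuzzy set per input DN_S, DN_T, DN_C."""
    return max(min(mu_S[j1], mu_T[j2], mu_C[j3]) for j1, j2, j3 in clauses)

# Rule R2 from the paper: (low, low, medium) OR (low, medium, low)
# OR (medium, low, low) -> SD is excellent.
LOW, MEDIUM = 0, 1
R2 = [(LOW, LOW, MEDIUM), (LOW, MEDIUM, LOW), (MEDIUM, LOW, LOW)]

# Hypothetical membership degrees of the three normalized distances.
mu_S, mu_T, mu_C = [0.8, 0.2, 0.0], [0.7, 0.3, 0.0], [0.1, 0.9, 0.0]
D2 = rule_activation(R2, mu_S, mu_T, mu_C)
```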
  4. 4.

    Defuzzification

    Defuzzification converts the fuzzy magnitude to a numeric value through the resulting membership function for the output. Several methods can be used for this transformation: the maximum method (which returns the abscissa of the maximum of the resulting membership function), the method of maxima (when several subsets share the same maximum height, their mean is computed) and the center of gravity method (the output is the abscissa of the center of gravity of the surface under the resulting membership function).

    In this paper, we used the method of center of gravity because it is the most commonly used. The principle of this method is to determine the abscissa of the center of gravity of the fuzzy space of the resulting membership function in order to set the output of the system.

    In this case, knowing the fuzzy sets \(DN_S\), \(DN_T\) and \(DN_C\) at each sampling period limits the possible values of the output and introduces new \(ESS_i\) fuzzy sets (the sets of possible values for \(ss_i\) if rule \(R_i\) is applied). The normalized output is computed with the barycenter method as the barycenter of the \(ss_i\) weighted by the coefficients \(D_i\):

    $$\begin{aligned} \mathrm{SS}=\frac{\sum _{i=1}^{7}\mathrm{ss}_iD_i}{\sum _{i=1}^{7}D_i}. \end{aligned}$$
    (12)

    The value SS, representing the final decision of the system, is a normalized value between 0 and 1.
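The barycenter of Eq. (12) can be sketched as follows; the rule activations \(D_i\) are hypothetical values, and the output centers follow Eq. (9).

```python
def defuzzify(ss, D):
    """Barycenter defuzzification (Eq. 12): weighted mean of the output
    centers ss_i, weighted by the rule activations D_i."""
    return sum(s * d for s, d in zip(ss, D)) / sum(D)

ss = [(i - 1) / 6 for i in range(1, 8)]       # output centers, Eq. (9)
D = [0.0, 0.7, 0.3, 0.0, 0.0, 0.0, 0.0]      # hypothetical activations
SD = defuzzify(ss, D)                        # similarity degree in [0, 1]
```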

Fig. 6
figure 6

Table of proposed rules of a FDSS with three fuzzy sets and their corresponding outputs (SD)

Fig. 7
figure 7

Examples of Wang clusters

Fig. 8
figure 8

Samples of INRIA holidays database

Fig. 9
figure 9

Samples of UKBench database

Fig. 10
figure 10

The 16 selected categories from ImageNet database

Fig. 11
figure 11

Comparison of precision rates between the proposed FIC descriptor and others color descriptors

Fig. 12
figure 12

Comparison of performance between the proposed texture descriptor and other ones

Fig. 13
figure 13

Comparison of precision between the proposed shape descriptor and shape based on Pseudo-Zernike moments

Fig. 14
figure 14

Impact of the combination of the different proposed descriptors on the precision rates

Fig. 15
figure 15

Wang Precision comparison between the proposed approach and other approaches

6 Results

To evaluate the performance of the proposed approach for image retrieval, we performed some experiments on some most popular databases:

  • Wang dataset:Footnote 1 contains 1000 images classified into ten clusters (100 images per cluster): buses, dinosaurs, elephants, food, flowers, African people, mountains, beaches and horses, as shown in Fig. 7. From each cluster, 50 images were randomly selected as query images.

  • INRIA Holidays dataset:Footnote 2 mainly composed of 5063 personal holiday photos, partitioned into 500 groups covering a variety of subjects (people, nature, water, houses, etc.) (see Fig. 8).

  • University of Kentucky benchmark (UKBench) dataset:Footnote 3 contains 10,200 images, corresponding to 2550 groups of 4 images each. All images are of the same size, 640 \(\times \) 480; samples are presented in Fig. 9. The usual performance score is the mean number of relevant images ranked in the first 4 positions (N-S score).

  • ImageNet dataset:Footnote 4 this database contains over 15 million labeled high-resolution images belonging to about 22,000 categories. In this study, we used only the 16 most popular Synsets cited in the ImageNet website:Footnote 5 Animal (fish, bird, mammal, invertebrate), Plant (tree, flower, vegetable), Activity (sport), Material (fabric), Instrumentation (utensil, appliance, tool, musical instrument), Scene (room, geological formation), Food (beverage). For each synset, we used 100 images. Some samples of the database are given in Fig. 10.

The performance of retrieval of the system can be measured in terms of its recall and precision. Recall measures the ability of the system to retrieve all the models that are relevant, while precision measures the ability of the system to retrieve only the models that are relevant.

They are defined as:

$$\begin{aligned} \mathrm{Precision}= & {} \frac{\mathrm{Number}\;\mathrm{of}\;\mathrm{relevant}\;\mathrm{images}\;\mathrm{retrieved}}{\mathrm{Total}\;\mathrm{number}\;\mathrm{of}\;\mathrm{images}\;\mathrm{retrieved}}\nonumber \\= & {} \frac{X}{X+Y}. \end{aligned}$$
(13)
$$\begin{aligned} \mathrm{Recall}= & {} \frac{\mathrm{Number}\;\mathrm{of}\;\mathrm{relevant}\;\mathrm{images}\;\mathrm{retrieved}}{\mathrm{Total}\;\mathrm{number}\;\mathrm{of}\;\mathrm{relevant}\;\mathrm{images}}\nonumber \\= & {} \frac{X}{X+Z}. \end{aligned}$$
(14)

where X represents the number of relevant images that are retrieved, Y the number of irrelevant images retrieved, and Z the number of relevant images that were not retrieved. In this case, the number of relevant items retrieved is the number of returned images that are similar to the query image.

The total number of items retrieved is the number of images that are returned by the search engine.
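Equations (13) and (14) can be sketched directly over retrieved and relevant result sets; the image identifiers are toy values.

```python
def precision_recall(retrieved, relevant):
    """Eqs. (13)-(14): X = relevant images retrieved,
    X + Y = all images retrieved, X + Z = all relevant images."""
    x = len(set(retrieved) & set(relevant))
    return x / len(retrieved), x / len(relevant)

# 4 images returned, 3 relevant in the dataset, 2 of them retrieved.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```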

To evaluate the performance of the proposed descriptors (color, shape and texture), many experiments were conducted in order to measure the efficiency of each descriptor.

Beginning with the experiments on the Wang dataset, Fig. 11 compares the proposed FIC with other color descriptors such as the color histogram, HSV color histogram, dominant color and weighted invariant color features. We can clearly see that our FIC descriptor gives better results because, in addition to color distribution information, it captures spatial pixel relationships. Moreover, the integration of a FDSS to merge the data yields a more flexible and robust result.

Also, some texture and shape feature extraction methods are compared with our texture and shape descriptors in Figs. 12 and 13, respectively.

These experiments show the robustness of our three descriptors, which can be explained by the use of wavelets and the FBWT based on multiresolution analysis, which at each scale analyzes the image to extract finer details. An image can be described by three primordial types of visual content (shape, texture and color), so merging these three contents improves the results compared to treating each content separately.

An evaluation of the different descriptor fusion scenarios (color & shape, color & texture, shape & texture, color & shape & texture) is given in Fig. 14.

The recall and precision measures of our approach can be summarized in the following confusion matrix (see Table 1).

Table 2 Comparison of Mean Average Precision (MAP) between the proposed fusion method and other fusion techniques
Table 3 Comparison of Mean Average Precision (MAP) between the proposed method and other techniques
Fig. 16
figure 16

Precision rates of each of the 16 ImageNet categories

Table 4 Performance of the proposed approach with the corresponding query time
Fig. 17
figure 17

Example of retrieved images from Wang database with confusion between classes

Fig. 18
figure 18

Example of retrieved images from ImageNet database

For Wang database, we compared our approach with the three approaches presented in [19, 20] and [21] and the results reported are very promising and provide better performance (see Fig. 15).

In order to evaluate the robustness of the proposed Fuzzy Decision Support System (FDSS), four further fusion methods were tested (see [36] for more details):

  • CombSUM: This is a simple method which can be described as the addition of all normalized scores per image. The normalization of a score (Nscore) is performed by applying the min-max normalization procedure given by Eq. 15.

    $$\begin{aligned} Nscore=\frac{Value-MinValue}{MaxValue-MinValue} \end{aligned}$$
    (15)

    where Value is the score of image i in ranked list j before normalization, and MaxValue and MinValue correspond to the maximum and minimum scores in the ranked list, respectively. CombSUM(i) is then calculated as follows:

    $$\begin{aligned} CombSUM(i)=\sum _{j=1}^{N_j}Nscore_j(i) \end{aligned}$$
    (16)

    where \(N_j\) refers to the number of result sets to be merged.
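A minimal sketch of Eqs. (15)-(16) over two hypothetical score lists for the same three images:

```python
def comb_sum(score_lists):
    """CombSUM (Eq. 16): per image, sum the min-max normalized scores
    (Eq. 15) over all ranked lists."""
    fused = {}
    for scores in score_lists:
        lo, hi = min(scores.values()), max(scores.values())
        for img, s in scores.items():
            fused[img] = fused.get(img, 0.0) + (s - lo) / (hi - lo)
    return fused

# Two hypothetical ranked lists of raw scores.
lists = [{"a": 0.9, "b": 0.5, "c": 0.1},
         {"a": 0.2, "b": 0.8, "c": 0.5}]
fused = comb_sum(lists)   # "b" wins: 0.5 + 1.0
```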

  • Borda Count+CombSUM: The Borda Count assigns the maximum number of points to the most relevant image in each ranked list \(L_j\); each subsequent image gets one point less. The Borda Count points of image i (\(BC_j(i)\)) in list \(L_j\) are calculated using the following equation:

    $$\begin{aligned} BC_j(i)=N-Rank_j(i) \end{aligned}$$
    (17)

    where \(Rank_j(i)\) takes integer values in \([0, N-1]\).

    Then, the CombSUM is applied to these \(BC_j\) for each image, as shown in Eq. 18.

    $$\begin{aligned} BC(i)=\sum _{j=1}^{N_j}BC_j(i) \end{aligned}$$
    (18)

    Finally, the images are sorted according to their BC points in descending order.
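Equations (17)-(18) and the final sort can be sketched as follows; the two ranked lists are hypothetical.

```python
def borda_fuse(ranked_lists):
    """Borda Count + CombSUM (Eqs. 17-18): an image at rank r (0-based)
    in a list of N items gets N - r points; points are summed across
    lists and images sorted by total points, descending."""
    points = {}
    for lst in ranked_lists:
        n = len(lst)
        for rank, img in enumerate(lst):
            points[img] = points.get(img, 0) + (n - rank)
    return sorted(points, key=points.get, reverse=True), points

order, pts = borda_fuse([["a", "b", "c"], ["b", "a", "c"]])
```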

  • Z-SCORE with Median+CombSUM: The idea here is to combine CombSUM with the Z-score with median to improve the results. The Z-score with median is a linear normalization which indicates how much a score deviates from the median of the distribution. Using the median instead of the mean minimizes the effect of extreme (very high or very low) results.

    Z-SCORE with Median is performed using the Eq. 19.

    $$\begin{aligned} \mathrm{Z-SCORE}_\mathrm{Median}=\frac{MeasuredValue-MedianValue}{StandardDeviation} \end{aligned}$$
    (19)
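A minimal sketch of Eq. (19); the choice of the population standard deviation (rather than the sample one) is an assumption, as the paper does not specify which is used.

```python
import statistics

def z_score_median(values):
    """Eq. (19): deviation of each score from the median of the
    distribution, in units of the (population) standard deviation."""
    med = statistics.median(values)
    sd = statistics.pstdev(values)
    return [(v - med) / sd for v in values]

z = z_score_median([1.0, 2.0, 3.0])   # middle score maps to 0.0
```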

  • IRP: The Inverse Rank Position merges ranked lists in decreasing order of the inverse of the sum of the inverses of the individual ranks.

    Equation 20 indicates how to compute the IRP distance IRP(i) for each image.

    $$\begin{aligned} \mathrm{IRP}(i)=\frac{1}{\sum _{j=1}^{N_j} {\frac{1}{Rank_j(i)}}} \end{aligned}$$
    (20)

    where \(Rank_j(i)\) \(\in \) \( \left[ 1, N \right] \). Then, the images are ranked based on their IRP distances.
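Equation (20) and the final ranking can be sketched as below; the two rank maps are hypothetical.

```python
def irp(rank_maps):
    """Inverse Rank Position (Eq. 20): IRP(i) = 1 / sum_j 1/Rank_j(i),
    with ranks starting at 1; images are then sorted by ascending IRP."""
    return {img: 1.0 / sum(1.0 / ranks[img] for ranks in rank_maps)
            for img in rank_maps[0]}

# Hypothetical ranks of two images in two result lists.
scores = irp([{"a": 1, "b": 2},
              {"a": 3, "b": 1}])
best_first = sorted(scores, key=scores.get)   # lowest IRP distance first
```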

Although these measures can be computed very efficiently, they lack semantic information. The proposed fusion method (FDSS), in contrast, is based on fuzzy decision-making with more flexibility, which leads to more reasonable decisions and demonstrates its superiority (see Table 2).

Furthermore, we validated our approach on the Holidays and UKBench databases, and the reported results are promising (see Table 3), especially for UKBench, because the wavelet is a powerful tool that is invariant to rotation, translation and dilation.

Also, we used some samples from ImageNet database, which can be grouped into 16 different categories of images. Figure 16 presents the precision rates obtained for each category.

Table 4 summarizes the performance of the proposed approach with the corresponding query times.

We observe that merging color, shape and texture features considerably improves the performance of the proposed content-based image retrieval system, with a respectable search time. Some examples of retrieved images are presented in Figs. 17 and 18.

7 Conclusion

In this paper, we presented a semantic approach to content-based image retrieval. This approach consists of two stages: the first extracts features (shape, texture and color descriptors) and the second performs data fusion using a Fuzzy Decision Support System with three fuzzy sets.

The shape descriptor was based on FBWN modeling combined with the seven Hu moments. The texture descriptor was based on the energies of the first five levels of the Beta wavelet decomposition. Finally, we presented a new fuzzy color descriptor based on the indexed color map. In the second stage, we merged the distances between the query image and the reference images using a FDSS in order to measure their similarity. The results obtained are satisfactory and confirm the robustness of the proposed approach.

An index with a huge number of features increases the computational cost of the algorithm. In future work, we therefore aim to decrease this cost by creating one wavelet network per class instead of one wavelet network per image, which should further improve the results.