1 Introduction

With the seamless integration of devices and the Internet in the Internet of Things (IoT), millions of images are accessed through social networks. The enormous volume of data produced by digital media, smartphones, and similar sources results in 'big data'. Since these data are uploaded continuously, the traditional "compute and storage" approach is unable to handle such massive amounts of data. In addition, most of the data are unstructured, although they are often accompanied by explicit structured metadata (e.g. geo-location and timestamps). This trend leads a large number of people to access large image databases. A major part of these visual data becomes publicly available from blogs, social networking websites, image and video sharing websites, and wikis, often accompanied by explicit structured and unstructured text annotations.

Furthermore, crowdsensing is the concept of a group of individuals with mobile devices collectively monitoring, sharing, and utilizing sensory information about their environment out of common interest [1]. The huge number of mobile users and the pervasive distribution of humans expand the potential monitoring area to a global scale. However, since crowdsensing participants change frequently, the sensory information comes in various forms (images, audio, video, etc.) and with varying accuracy. Extracting the appropriate information from such a huge amount of data is the main challenge here. How, then, can one locate specific data, such as an image, in a sea of images? This difficult problem can be addressed by content-based image retrieval (CBIR), which is an active and demanding research area. The most popular implementation of content-based image retrieval is Google image search, which takes an image rather than a text query as input.

The need for CBIR has received extensive attention, and a large number of solutions have been proposed [2,3,4,5,6,7,8]. Malik and Baharudin [2] presented a technique that uses statistical texture information extracted from discrete cosine transform (DCT) blocks based on the direct current (DC) and the first three alternating current (AC) coefficients; for effective matching, various distance measures are used to calculate the similarities. Mohamed et al. [3] used texture and color information to retrieve related images with an average precision of 65%. Thawari and Janwe [4] explored color histograms, HSV features, and texture moments and achieved 53% precision. Duan et al. [5] presented a technique that combines shape, color, and texture features for the retrieval process. Singh et al. [6] proposed a technique that selects the most suitable attribute (feature) to evaluate newly received images in order to increase retrieval efficiency and accuracy; for similarity assessment between the query and the database images, the scheme calculates the feature vectors after segmentation. For approximate similarity retrieval, Tang et al. [7] presented a neighborhood-discriminant-hashing (NDH) technique that exploits local discriminative information. Li et al. [8] described a subspace-learning algorithm for image retrieval; the scheme learns an appropriate representation of the data through an integrated framework of image feature learning and understanding, and subspace learning is used to reduce the semantic gap between low-level features and high-level semantics.

Image quality estimation plays a significant role in the area of image processing. Image quality is the property of an image that is usually assessed by comparison with an ideal or perfect image. Processes such as compression and acquisition may affect the quality of an image. Therefore, in many image-based applications, precise evaluation of image fidelity is an important step. Objective image fidelity evaluation is essential in multimedia applications, since it eliminates or reduces the need for extensive subjective assessment. In the present paper, we use the mean-structural-similarity-index measure (MSSIM) and the feature-similarity-index measure (FSIM), as both techniques are based on the human visual system (HVS), are widely used and cited in the literature, and have low computational cost [9].

In this paper, a content-based image retrieval technique is presented using full-reference objective image quality metrics. The major contributions of this paper are as follows:

  1. Selection of suitable IQA models that have low computational cost, are easy to compute, and have low execution time.

  2. Combination of these models to obtain better precision and recall values for CBIR.

To the best of our knowledge, this use of IQA has not yet been explored by researchers for CBIR applications.

The organization of the article is as follows: Sect. 2 presents MSSIM for IQA and Sect. 3 discusses FSIM for IQA. Section 4 illustrates how MSSIM and FSIM are used jointly in CBIR for efficient feature selection. Section 5 presents the experimental results and discussion. Finally, conclusions and the scope of future work are presented in Sect. 6.

2 Preliminary on MSSIM for IQA

The MSSIM algorithm [10] is based on the assumption that the HVS is highly adapted to extracting structural information from an image signal. Therefore, the method attempts to measure the structural information of an image signal. The scheme splits the task of similarity calculation into three assessments, i.e. luminance, contrast, and structure. Let \({{\varvec{a}}}=\left\{ a_j |j= 1, 2,\ldots M\right\} \) and \({{\varvec{b}}}= \left\{ {b_j|j=1,2,\ldots M} \right\} \) be the un-distorted and distorted image signals, respectively. First, the luminance of each image signal (original and distorted) is evaluated. This is estimated as the average intensity, i.e.

$$\begin{aligned} \mu _a = \frac{1}{M}\sum \limits _{j = 1}^M {a_j}. \end{aligned}$$
(1)

So, the luminance similarity measurement is a function of \(\mu _a\) and \(\mu _b\), represented by L(a, b). The technique then removes the average intensity from the image signal. The resulting signal (\({{\varvec{a}}}-\mu _a\)) corresponds to the projection of the vector \({{\varvec{a}}}\) onto the hyper-plane defined by:

$$\begin{aligned} \sum \limits _{j = 1}^M {a_j} = 0. \end{aligned}$$
(2)

The algorithm estimates the signal contrast as the standard deviation, whose discrete form is:

$$\begin{aligned} \sigma _a = \left( {\frac{1}{{M - 1}}\sum \limits _{j = 1}^M {\left( {a_j - \mu _a } \right) ^2 } } \right)^{1/2}. \end{aligned}$$
(3)

The contrast estimation is the similarity measurement of \(\sigma _a\) and \(\sigma _b\), represented by C(a, b). Third, the image signal is divided by its own standard deviation. The structural similarity measurement S(a, b) is calculated on these normalized image signals \(({{\varvec{a}}}-\mu _a)/\sigma _a\) and \(({{\varvec{b}}}-\mu _b)/\sigma _b\). Finally, the three elements are pooled to form an overall similarity assessment,

$$\begin{aligned} {{\varvec{Sf}}}({{\varvec{a}}},{{\varvec{b}}})=F\left( L\left( {{\varvec{a}}},{{\varvec{b}}}\right) ,C\left( {{\varvec{a}}},{{\varvec{b}}}\right) ,S\left( {{\varvec{a}}},{{\varvec{b}}}\right) \right) . \end{aligned}$$
(4)

A significant point is that the three elements are relatively independent. Luminance comparison is calculated as:

$$\begin{aligned} L({{\varvec{a}}},{{\varvec{b}}}) = \frac{{2\mu _a \mu _b + k_1 }}{{\mu _a^2 + \mu _b^2 + k_1 }}, \end{aligned}$$
(5)

The constant \(k_1\) is incorporated to avoid instability in the above equation, which occurs when \(\mu _a^2+\mu _b^2\) is very close to zero. The contrast similarity is calculated as:

$$\begin{aligned} C({{\varvec{a}}},{{\varvec{b}}}) = \frac{{2\sigma _a \sigma _b + k_2 }}{{\sigma _a^2 + \sigma _b^2 + k_2 }}. \end{aligned}$$
(6)

The structural comparison is performed after luminance subtraction and variance normalization, and is represented as:

$$\begin{aligned} S({{\varvec{a}}},{{\varvec{b}}}) = \frac{{\sigma _{ab} + k_3 }}{{\sigma _a \sigma _b + k_3 }}. \end{aligned}$$
(7)

The component \(\sigma _{ab}\) is represented as:

$$\begin{aligned} \sigma _{ab} = \frac{1}{{M - 1}}\sum \limits _{j = 1}^M {\left( {a_j - \mu _a} \right) } \left( {b_j - \mu _b } \right) . \end{aligned}$$
(8)

Finally, the three comparisons of (5), (6), and (7) are pooled to estimate the SSIM index between image signals a and b:

$$\begin{aligned} \mathrm{SSIM}({{\varvec{a}}}, {{\varvec{b}}}) = \left[ L\left( {{\varvec{a}}}, {{\varvec{b}}}\right) \right] ^{p1} \cdot \left[ C({{\varvec{a}}}, {{\varvec{b}}}) \right] ^{p2} \left[ S({{\varvec{a}}}, {{\varvec{b}}})\right] ^{p3}, \end{aligned}$$
(9)

where \(p1>0,\,p2>0,\,p3>0\). These factors are used to regulate the relative significance of the three components. In order to simplify the expression, we set \(p1=p2=p3=1\) and \(k_3=k_2/2\). This results in the explicit form of SSIM:

$$\begin{aligned} {\mathrm{SSIM}}({{\varvec{a}}}, {{\varvec{b}}}) = \frac{{\left( {2\mu _a \mu _b + k_1} \right) \left( {2\sigma _{ab} + k_2 } \right) }}{{\left( {\mu _a^2 + \mu _b^2 + k_1 } \right) \left( {\sigma _a^2 + \sigma _b^2 + k_2 } \right) }}. \end{aligned}$$
(10)

Finally, MSSIM is given as:

$$\begin{aligned} {\mathrm{MSSIM}}({{\varvec{a}}}, {{\varvec{b}}}) = \frac{\mathrm{1}}{{\mathrm{B}^{\prime}}}\sum \limits _{i = 1}^{\mathrm{B}^{\prime}} {{\mathrm{SSIM}}({{\varvec{a}}}_i, {{\varvec{b}}}_i )}, \end{aligned}$$
(11)

where \({{\varvec{a}}}_i\) and \({{\varvec{b}}}_i\) denote the image content at the ith local patch, i.e. the block, and \(\mathrm{B}^{\prime}\) is the number of local patches in the image.
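
As a concrete illustration, the following NumPy sketch computes the block-wise SSIM of (10) and the MSSIM pooling of (11). The 8×8 non-overlapping blocks, the assumption that intensities are scaled to [0, 1], and the constants \(k_1=0.01^2\) and \(k_2=0.03^2\) are illustrative choices for this sketch, not values prescribed by [10].

```python
import numpy as np

def ssim_block(a, b, k1=0.01 ** 2, k2=0.03 ** 2):
    """SSIM of one pair of blocks, Eq. (10), with p1 = p2 = p3 = 1 and k3 = k2/2."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(ddof=1), b.var(ddof=1)   # squared form of Eq. (3)
    cov_ab = np.cov(a, b)[0, 1]                   # Eq. (8)
    num = (2 * mu_a * mu_b + k1) * (2 * cov_ab + k2)
    den = (mu_a ** 2 + mu_b ** 2 + k1) * (var_a + var_b + k2)
    return num / den

def mssim(img_a, img_b, block=8):
    """Mean SSIM over non-overlapping local blocks, Eq. (11)."""
    h, w = img_a.shape
    scores = [ssim_block(img_a[i:i + block, j:j + block],
                         img_b[i:i + block, j:j + block])
              for i in range(0, h - block + 1, block)
              for j in range(0, w - block + 1, block)]
    return float(np.mean(scores))
```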

3 Preliminary on FSIM for IQA

The FSIM algorithm is based on the assumption that the HVS perceives an image mainly according to its low-level features. In this scheme, phase congruency (PC) is used as the primary feature, while gradient magnitude (GM) is used as the secondary feature. The calculation of the FSIM index consists of two stages [11]: the local similarity map is calculated first, and the algorithm then pools the similarity map into a single similarity index.

Let the original and the distorted image signals be represented by a and b, respectively. The PC maps calculated from a and b are \(PC_a\) and \(PC_b\), respectively, and \(G_a\) and \(G_b\) are the corresponding GM maps. The algorithm then separates the feature similarity calculation between a(x) and b(x) into two components, one for PC and one for GM. For a given location x, the similarity between \(PC_a\)(x) and \(PC_b\)(x) is defined as:

$$\begin{aligned} Sim_{\mathrm{PC}} (x) = \frac{{2PC_\mathrm{a} (x) \cdot PC_\mathrm{b} (x) + c_1 }}{{PC_\mathrm{a}^2 (x) + PC_\mathrm{b}^2 (x) + c_1}}, \end{aligned}$$
(12)

where the symbol \(c_1\) is a positive constant. In the same way, the GM values \(G_a({{\varvec{x}}})\) and \(G_b({{\varvec{x}}})\) are compared, and the similarity is calculated as:

$$\begin{aligned} Sim_{\mathrm{G}} (x) = \frac{{2G_{\mathrm{a}} (x) \cdot G_{\mathrm{b}} (x) + c_2 }}{{G_{\mathrm{a}}^2 (x) + G_{\mathrm{b}}^2 (x) + c_2 }}, \end{aligned}$$
(13)

where the symbol \(c_2\) is a positive constant. Then, \(Sim_{PC} ({{\varvec{x}}})\) and \(Sim_G({{\varvec{x}}})\) are pooled together to obtain the similarity \(Sim_L({{\varvec{x}}})\) as:

$$\begin{aligned} Sim_\mathrm{L} ({{\varvec{x}}}) = \left[ {Sim_{PC} ({{\varvec{x}}})} \right] ^\alpha \left[ {Sim_G ({{\varvec{x}}})} \right] ^\beta , \end{aligned}$$
(14)

where the symbols \(\alpha \) and \(\beta \) are used to regulate the relative significance of the PC and GM features, respectively. For a given location x, if a(x) or b(x) has a large PC value, this location has a higher impact on the HVS. Therefore, \(PC_{max}({{\varvec{x}}}) = {\mathrm{max}} (PC_a({{\varvec{x}}}), PC_b({{\varvec{x}}}))\) is used to weight the contribution of \(Sim_L({{\varvec{x}}})\) in the overall comparison between a and b. Finally, the FSIM index is calculated as:

$$\begin{aligned} FSIM = \frac{{\sum \nolimits _{{{\varvec{x}}} \in {\varOmega } } {Sim_\mathrm{L} ({{\varvec{x}}})} PC_{max} ({{\varvec{x}}})}}{{\sum \nolimits _{{{\varvec{x}}} \in {\varOmega } } {PC_{max} ({{\varvec{x}}})} }}, \end{aligned}$$
(15)

where the symbol \({\varOmega }\) represents the spatial domain of image signal.
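
A minimal sketch of the pooling in (12)–(15) is given below. It assumes that the phase-congruency maps pc_a and pc_b have already been computed by an external routine (computing PC itself is beyond the scope of this sketch), approximates the gradient with np.gradient rather than the Scharr operator adopted in [11], and uses placeholder values for the constants c1 and c2 and the exponents alpha and beta.

```python
import numpy as np

def gradient_magnitude(img):
    """Simple central-difference gradient-magnitude map (a stand-in for the
    Scharr operator used in [11])."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy)

def fsim_index(a, b, pc_a, pc_b, c1=0.85, c2=160.0, alpha=1.0, beta=1.0):
    """FSIM of Eq. (15) from precomputed PC maps and internally computed GM maps."""
    g_a, g_b = gradient_magnitude(a), gradient_magnitude(b)
    sim_pc = (2 * pc_a * pc_b + c1) / (pc_a ** 2 + pc_b ** 2 + c1)   # Eq. (12)
    sim_g = (2 * g_a * g_b + c2) / (g_a ** 2 + g_b ** 2 + c2)        # Eq. (13)
    sim_l = (sim_pc ** alpha) * (sim_g ** beta)                      # Eq. (14)
    pc_max = np.maximum(pc_a, pc_b)
    return float(np.sum(sim_l * pc_max) / np.sum(pc_max))            # Eq. (15)
```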

3.1 Color image quality assessment

The FSIM can also be applied to color images [11]. First, the RGB (red, green, blue) color image is transformed into the YIQ color model, in which Y corresponds to the luminance and I and Q represent the chrominance. Let \(I_a\) (\(I_b\)) and \(Q_a\) (\(Q_b\)) be the I and Q chromatic components of the images a (b), respectively. The similarities between the chromatic features are defined as:

$$\begin{aligned} Sim_I ({{\varvec{x}}})= \frac{{2 I_{\mathrm{a}} (x)\cdot I_{\mathrm{b}} (x) + c_3 }}{{I_a^2 (x) + I_b^2 (x) + c_3 }}, \end{aligned}$$
(16)
$$\begin{aligned} Sim_Q ({{\varvec{x}}})= \frac{{2 Q_\mathrm{a} (x)\cdot Q_\mathrm{b} (x) + c_4 }}{{Q_a^2 (x) + Q_b^2 (x) + c_4 }}, \end{aligned}$$
(17)

where the symbols \(c_3\) and \(c_4\) are positive constants. Then \(Sim_I({{\varvec{x}}})\) and \(Sim_Q({{\varvec{x}}})\) are pooled to obtain the chrominance comparison measure, expressed as:

$$\begin{aligned} Sim_c ({{\varvec{x}}}) = Sim_I ({{\varvec{x}}})\cdot Sim_Q ({{\varvec{x}}}). \end{aligned}$$
(18)

Finally, the FSIM is extended to \(FSIM_c\) by integrating the chromatic information as:

$$\begin{aligned} FSIM_c = \frac{{\sum \nolimits _{x \in {\varOmega } } {Sim_\mathrm{L} (x)} \cdot \left[ {Sim_c (x)} \right] ^\lambda \cdot PC_{max} (x)}}{{\sum \nolimits _{x \in {\varOmega }} {PC_{max} (x)} }}, \end{aligned}$$
(19)

where the symbol \(\lambda >0\) is used to adjust the significance of the chromatic components.
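
For reference, the chromatic part of (16)–(18) can be sketched as follows. The RGB-to-YIQ matrix is the standard NTSC transform; the default constants c3 and c4 are small placeholder stabilizers chosen for intensities scaled to [0, 1], not values prescribed by [11].

```python
import numpy as np

# Standard NTSC RGB -> YIQ transform (rows give Y, I, Q)
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])

def rgb_to_yiq(rgb):
    """rgb: H x W x 3 array with values in [0, 1]; returns the Y, I, Q channels."""
    yiq = rgb @ RGB2YIQ.T
    return yiq[..., 0], yiq[..., 1], yiq[..., 2]

def chroma_similarity(i_a, q_a, i_b, q_b, c3=0.03, c4=0.03):
    """Pointwise chrominance similarity map, Eqs. (16)-(18)."""
    sim_i = (2 * i_a * i_b + c3) / (i_a ** 2 + i_b ** 2 + c3)   # Eq. (16)
    sim_q = (2 * q_a * q_b + c4) / (q_a ** 2 + q_b ** 2 + c4)   # Eq. (17)
    return sim_i * sim_q                                        # Eq. (18)
```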

4 Image feature extraction and similarity measure for the proposed CBIR technique

As already mentioned, the MSSIM algorithm relies on the fact that the HVS is highly adapted to extracting the structural information of a scene. However, this scheme does not deal with the color information of the image, even though color is one of the most significant visual properties of an image. In the present CBIR scheme, both structural and color features are incorporated for image retrieval. The color feature comparison is measured by

$$\begin{aligned} S_{Color} ({{\varvec{x}}}) = \left[ {Sim_I ({{\varvec{x}}})\cdot Sim_Q ({{\varvec{x}}})} \right] ^\lambda . \end{aligned}$$
(20)

In the present scheme, \(\lambda =0.03\). Finally, the similarity of two images is measured as:

$$\begin{aligned} F_{Sim} = \alpha \times L({{\varvec{a}}},{{\varvec{b}}}) + \beta \times C({{\varvec{a}}},{{\varvec{b}}}) + \gamma \times S({{\varvec{a}}},{{\varvec{b}}}) + \delta \times S_{Color}, \end{aligned}$$
(21)

where the symbols \(\alpha ,\,\beta ,\,\gamma \), and \(\delta \) are positive weight factors that express the relative importance of the individual features in CBIR, subject to \(\alpha +\beta +\gamma +\delta =1.0\); for example, if all features are equally important, each weight is 1/4.
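
A minimal sketch of how (20) and (21) could be combined is shown below. It reuses the rgb_to_yiq and chroma_similarity helpers sketched in Sect. 3.1, evaluates L, C, and S globally on the Y channel for brevity (the scheme of Sect. 2 is block-wise), pools the pointwise color map of (20) by a simple mean before applying the exponent, and defaults the weights to the experimental values reported in Sect. 5. All of these choices are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def f_sim(a_rgb, b_rgb, alpha=0.148, beta=0.148, gamma=0.038, delta=0.666,
          k1=0.01 ** 2, k2=0.03 ** 2, lam=0.03):
    """Overall similarity of Eq. (21) between two RGB images scaled to [0, 1]."""
    y_a, i_a, q_a = rgb_to_yiq(a_rgb)
    y_b, i_b, q_b = rgb_to_yiq(b_rgb)
    k3 = k2 / 2
    mu_a, mu_b = y_a.mean(), y_b.mean()
    sd_a, sd_b = y_a.std(ddof=1), y_b.std(ddof=1)
    cov_ab = np.mean((y_a - mu_a) * (y_b - mu_b))
    lum = (2 * mu_a * mu_b + k1) / (mu_a ** 2 + mu_b ** 2 + k1)        # Eq. (5)
    con = (2 * sd_a * sd_b + k2) / (sd_a ** 2 + sd_b ** 2 + k2)        # Eq. (6)
    struct = (cov_ab + k3) / (sd_a * sd_b + k3)                        # Eq. (7)
    s_color = np.mean(chroma_similarity(i_a, q_a, i_b, q_b)) ** lam    # Eq. (20), mean-pooled
    return alpha * lum + beta * con + gamma * struct + delta * s_color  # Eq. (21)
```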

The proposed technique consists of two modules, namely image feature extraction and similarity matching. In the first stage, the images are acquired one after another from the image database and feature extraction is performed. The extracted image features form a feature vector, which is stored in a feature database. The luminance, contrast, and structural information are extracted from the 'Y' channel of the image, while the color information is extracted from the 'I' and 'Q' components, as described in Sects. 2 and 3, respectively. In the second stage, the user supplies the query image for which related images are to be retrieved from the image database. The feature vector of the query image is calculated and compared with the feature vectors in the database by computing the similarity in (21). The images most similar to the query image are then displayed, as sketched below.
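
Under the same assumptions, the retrieval stage could be sketched as follows, using the f_sim function above and a hypothetical in-memory list of database images; in practice the per-image statistics would be precomputed and cached in the feature database.

```python
def retrieve(query_rgb, database_images, top_k=10):
    """Rank the database images by F_Sim against the query, Eq. (21),
    and return the indices of the top_k most similar images."""
    scores = [f_sim(query_rgb, img) for img in database_images]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]
```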

5 Results and discussion

In this section, we report a large number of tests to demonstrate the effectiveness of the proposed technique for image retrieval. We perform experiments on a widely available benchmark image database, the Corel dataset [12]. In the present scheme, \(\alpha =0.148,\,\beta =0.148,\,\gamma =0.038\), and \(\delta =0.666\). The weight factors were obtained experimentally over a large number of images with diverse characteristics (features) in the database. All the images in the database are in red–green–blue (RGB) color format, as depicted in Fig. 1.

Fig. 1 Benchmark images. a People, b buses, c dinosaurs, d elephants, e roses, f horses, g foods, h beaches, i buildings, j mountains

In an image retrieval technique, the precision (Pr) value is defined as the ratio of the number of relevant retrieved images for the query image to the total number of retrieved images. Pr is expressed as:

$$\begin{aligned} Pr = \frac{M}{N}, \end{aligned}$$
(22)

where 'M' represents the number of relevant retrieved images and 'N' is the total number of images retrieved from the image database. The recall (Re) in CBIR is expressed as the ratio of the number of relevant retrieved images to the total number of relevant images in the image database. Re is represented as:

$$\begin{aligned} Re = \frac{M}{O}, \end{aligned}$$
(23)

where 'M' denotes the number of relevant retrieved images and 'O' is the total number of relevant images in the image database. The 'Pr' and 'Re' values measure the correctness and efficacy of image retrieval with respect to the database and query images. However, neither of these two measures can be considered individually as an absolute indicator of efficient image retrieval. Consequently, the two measurements can be pooled into a single value called the F-score, which is defined as:

$$\begin{aligned} F = 2 \times \frac{{\left( {Pr \times Re} \right) }}{{\left( {Pr + Re} \right) }}. \end{aligned}$$
(24)
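
For completeness, the evaluation measures of (22)–(24) can be computed with a small helper; the argument names mirror the symbols M, N, and O of the text, and the numbers in the usage line are only an illustration.

```python
def precision_recall_f(m_relevant_retrieved, n_retrieved, o_relevant_in_db):
    """Pr = M/N, Re = M/O, F = 2*Pr*Re/(Pr+Re), Eqs. (22)-(24)."""
    pr = m_relevant_retrieved / n_retrieved
    re = m_relevant_retrieved / o_relevant_in_db
    f = 2 * pr * re / (pr + re) if (pr + re) > 0 else 0.0
    return pr, re, f

# e.g. 7 relevant images among 10 retrieved, with 100 relevant images in the
# database, gives Pr = 0.70, Re = 0.07, F ~= 0.127
print(precision_recall_f(7, 10, 100))
```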

In this paragraph, we report the quantitative assessment for the benchmark images. Tables 1, 2 and 3 present the experimental results, in terms of 'Pr', 'Re' and 'F', for all categories of images. From Tables 1, 2 and 3 it is clear that the proposed technique offers attractive results using IQA, and that MSSIM and FSIM, when used individually, perform worse than the proposed scheme. The experimental results in Tables 1 and 2 are obtained as the mean value of one hundred independent tests carried out on a large number of images in the database. Figures 2, 3, 4, 5, 6 and 7 show the retrieval results for different images.

Table 1 Average precision (Pr) value for the proposed technique
Table 2 Average recall (Re) factor for the proposed technique
Table 3 Average F-score (F) value for the proposed technique
Fig. 2 Experimental outcome of the 'Roses' image

Fig. 3 Experimental outcome of the 'Horses' image

Fig. 4 Experimental outcome of the 'Elephants' image

Fig. 5 Experimental outcome of the 'Beaches' image

Fig. 6 Experimental outcome of the 'Dinosaurs' image

Fig. 7 Experimental outcome of the 'Buildings' image

To provide a comparative evaluation of the proposed similarity measure, we compare it with related techniques available in the literature. Table 4 shows that the proposed technique provides improved results compared with the related techniques. This is because the present technique uses both structural and color information for retrieval. All the above experiments were carried out on a Pentium 4 with 512 MB RAM and a 2.80 GHz processor in MATLAB 7. The average execution times (in s) are 10.45, 116.99, and 14.14 for MSSIM, FSIM, and the proposed technique, respectively.

Table 4 Comparative performance measure (Precision in %)

6 Conclusions and scope of future work

This paper presents a CBIR scheme using IQA models. In this technique, a combination of four image features, i.e. luminance, contrast, structure, and color information, is used for the similarity measure. Performance is assessed using standard benchmarks such as the Pr, Re, and F values. The proposed technique is tested only on image-based queries for similarity calculation. Future work includes a semantic annotation-based method to improve retrieval performance for text-based queries. Moreover, optimal selection of the weight factors is another open research issue.