1 Introduction

Over roughly the last two decades, the amount of multimedia data being generated has increased enormously due to technological advances and the widespread availability of consumer electronics, e.g., smartphones. This type of data is often shared with other people or on various social media platforms. In one year, approximately 250 billion and 40 billion images were shared on Facebook and Instagram, respectively [1], which reflects the growth in multimedia data usage. With data at this scale, it is challenging to retrieve specific images from a massive repository. Recently, cloud services have offered users a convenient and cost-effective way to store and share images. This involves transferring data to an unknown third party, the cloud server, to store, handle, and retrieve it in the future.

Content-based image retrieval (CBIR) services typically incur high storage and computation costs. At the same time, the cloud can match the demand for heavy computation and large-scale storage of a CBIR system, making data outsourcing practicable. By outsourcing CBIR functionality to the cloud, the data owner can delegate the duties of maintaining a local image database and satisfying user/application requirements. Despite their many benefits, cloud platforms pose security concerns related to image storage, transmission, and retrieval [2]. Images often contain personal or business-sensitive information that must be handled with care. A reliable cloud service is expected to offer users secure means to transfer, share, and access data. For instance, a patient may want to share his/her medical images only with their doctor. Additionally, a cloud service provider (CSP) may act maliciously, or a security breach may compromise its systems. Hence, there is a risk that users' data may be leaked or accessed by an unauthorized entity. A common practice to protect against data breaches is to encrypt users' images before uploading them to a cloud server. These stored images are usually retrieved using queries. In the healthcare scenario, a hospital hosts a large number of medical images, often on remote cloud server(s). A doctor can review past cases of similar medical conditions for better analysis and treatment of a patient, and may issue a query to retrieve similar cases based on medical features. If the CSP's CBIR service is not effective enough, it can return wrong images, leading to inaccurate diagnoses.

Hence, to provide security, users encrypt their images before transferring them to the cloud, and image retrieval is carried out on the encrypted features of these images. Early on, images were retrieved using annotations attached to them, which depend on visual perception and may vary from person to person for the same image. With a large number of images, however, it is impractical to annotate each image manually. CBIR [3, 4] presents a feasible solution for retrieving similar images from a large collection. In a CBIR system, the image features play a vital role in the retrieval process: there must be a close association between the primitive visual image features and the image's actual content. These primitive features include color [5], texture [6], and shape [7]. During the initial phase of CBIR research, systems tended to incorporate only one feature and derive results from it alone. Nowadays, various combinations of these primitive visual features are fused to form the final feature vector, which yields better retrieval performance. In CBIR, image features are extracted in two ways. In the first, global image features are extracted from the entire image. However, with the increase in image quality and the large variation that can occur within a small portion of an image, global features do not efficiently capture the locality and spatial structure of an image, which degrades overall system performance. Hence, feature extraction can instead emphasize local visual features computed from non-overlapping blocks before merging them together; local image features correlate more closely with the actual content. A minor change in image orientation or size can create a big difference in the retrieved results; to overcome this issue, various invariant features [8] are incorporated. Another critical aspect of a conventional CBIR system is the similarity distance, e.g., Euclidean, Manhattan, Jaccard, or Minkowski. Based on the selected similarity measure, the top-ranked images form the final retrieval results. When the feature vectors are encrypted, conventional CBIR similarity measures cannot be applied. In a conventional CBIR system, all operations take place at one end; in a secure image retrieval scenario, however, CBIR operates at three different ends, as depicted in Fig. 1. Image features are extracted from the original images at the owner's end, then these images and feature vectors are encrypted and deployed on the cloud server. An authorized user issues a query based on the feature vector of a query image; this query feature vector is also encrypted before being sent to the cloud. The cloud server is then responsible for computing the indices of similar images and returning the results.

This article focuses on secure image retrieval from cloud servers. The aim is to achieve acceptable overall image retrieval performance while maintaining data security and privacy. To this end, both the transferred images and their extracted feature vectors are encrypted before being sent to the cloud server. This is essential, as feature vectors may reveal information to a malicious entity or eavesdropper. Because the feature vectors are also encrypted, a similarity measurement based on asymmetric scalar-product-preserving encryption (ASPE) is used to retrieve image indexes.

In the literature, existing methods of secure image retrieval assume that either the cloud server is honest-but-curious or the registered user is fully trusted. However, these assumptions are not always met in real-world environments. Additionally, many systems encrypt their data with multiplicative group-based schemes, which incur a high computational cost. Therefore, in this article, we propose a scheme that securely transmits images from an untrusted cloud to a designated user. A key management center (KMC) is incorporated as a fully trusted third party to keep track of key exchanges and to prevent forgery from the user side. The main contributions of the proposed CBIR system are as follows:

  • Image retrieval performance is improved by fusing color, shape, and texture image features.

  • It is capable of handling image rotation by incorporating rotation-invariant features.

  • A database owner can store images and their associated features securely using the proposed image encryption scheme and ASPE, respectively.

  • It is based on four entities, specifically, the database owner, CSP, authorized user, and a fully trusted KMC that establishes a trusted environment between the aforementioned entities.

The rest of the paper is organized as follows. In Sect. 2, similar works in the literature are critically reviewed. In Sect. 3, the overall system structure is discussed. In Sect. 4, we briefly present the techniques used throughout this article. In Sect. 5, the details of the feature vector construction are presented. In Sect. 6, the security measures embedded in the proposed CBIR system are described. In Sect. 7, the design objectives of the proposed secure CBIR system are presented. In Sect. 8, a security analysis of the proposed scheme is given. Section 9 discusses the adopted updating mechanism. Section 10 presents the performance evaluation results. Section 11 concludes the article and discusses avenues for future work.

Fig. 1 Component diagram of a basic CBIR system

2 Related work

The past decade witnessed unprecedented growth in the amount of image data generated [9, 10]. To meet the demand for image data processing, cloud services emerged as a suitable technology for effective image storage, sharing, and retrieval. Data are often hosted on the cloud in encrypted form, which requires an efficient searchable encryption system to transfer images to a legitimate person in a secure way [11]. A searchable encryption system allows the user to retrieve similar items from encrypted data. Originally, this approach was applied to text documents; with the increased availability of multimedia data such as images, searchable encryption has been applied to image retrieval as well. Song et al. [12] proposed one of the first schemes to provide fast retrieval from a large database; however, their study did not cover statistical attacks. Chang et al. [13] then improved the efficiency of [12] with encrypted hash tables used to create image indexes. In that work, image pixels are encrypted using a suitable stream cipher, and Markov features are later extracted directly from the retrieved encrypted images. Curtmola et al. [14] designed another searchable encryption scheme that incorporates additional security features and achieves optimal search time. All of these early works on searchable encryption support only Boolean searching, mainly for documents, i.e., they indicate whether a particular encrypted document is present in a database or not. In [15], a cloud-based image retrieval method is proposed; however, this method may reveal all the content to the cloud service provider. The work in [16] constructs probably the first scheme that performs image retrieval on encrypted images. Image features are first extracted to form visual words, and the Jaccard similarity between the query feature vector and the image feature database is then used to rank similar images; a min-hash algorithm and order-preserving encryption are employed to protect the visual content of the images. The work in [17] proposed another scheme, named SEISA, which uses secure k-means along with access control to update an image dataset on a cloud server. In [18], feature vectors are constructed to represent each image, and a hash table combined with locality-sensitive hashing is used to increase the efficiency of the system; a kNN-based algorithm protects the feature vector. Similarly, the work proposed in [19] supports multiple owners such that each owner has their own keys, so that no one can access another's content. In that work, the registered user is assumed to be always trusted, which is not always feasible. Xia et al. [20] presented a secure CBIR system in which local visual image features are extracted to form the image feature vector and the earth mover's distance is used as the similarity measure. In [21], image encryption is applied based on encrypted indexes; this method improves system efficiency by introducing automatic database updating in the cloud. Several works use homomorphic encryption-based searchable encryption, which involves multiplicative group-based encryption and similarity computation. For example, Bellafqira et al. [22] proposed a homomorphic encryption-based system that incurs high computation and does not achieve accurate retrieval results.
The work proposed by Weng et al. [23] provides two layers of security. At the first level, the query image content and its features are secured by means of robust hash values. The scheme allows the registered client to exclude some bits from the hash values, which increases the uncertainty at the server's end and makes it very hard for the cloud server to learn the client's information. The client then searches to obtain the best available result. In this work, the privacy of both the cloud server and the client is preserved from each other, since only hash values are exchanged. Zhou et al. [24] presented a secure scheme for e-healthcare in a cloud-assisted environment that efficiently preserves the privacy of medical data and the corresponding extracted visual words; privacy is preserved using homomorphic encryption. The scheme also supports outsourced disease modeling and early-stage intervention, handled by an efficient privacy-preserving function correlation matching derived from dynamic medical text mining and a secure image feature extraction mechanism for medical images. Shengshan et al. [25] proposed an efficient privacy-preserving scheme based on the extraction of secure scale-invariant feature transform (SIFT) features over a large volume of images. They first point out that previous similar works lack security, practicality, or an efficient overall system design, and that none of them optimizes the use of secure SIFT features in terms of robustness and distinctiveness. They then design a new efficient and secure scheme to meet practicality: the input image is split, two sub-protocols for secure multiplication and secure comparison are incorporated, and the image feature extraction is algorithmically spread across two cloud servers. Very few works in the literature consider the case in which a trusted user may turn unfaithful over time in exchange for some favor. To deal with this scenario, Zhihua et al. [26] presented a solution in which the feature vector is constructed using EDH, the color layout descriptor, the color structure descriptor, and the scalable color histogram, and kNN-based security is then imposed; this scheme is complex and incurs a high computational cost. Based on this literature review, there is a need to design a system that can deal with an unfaithful registered user efficiently. To address this gap, we present a solution in which local invariant features are extracted to form the feature vector, and ASPE-based security is applied. A fully trusted third party, the KMC, is introduced to generate image encryption keys and feature vector encryption keys, store them, and transfer results. In this work, a registered user can access only the legitimately allowed number of images from the cloud in a secure environment. The overall system structure is presented in the following section.

3 An overview of the system structure

Our proposed system comprises four entities, namely the image owner, the untrusted cloud server, the authorized user, and the KMC. The overall structure of the proposed system is depicted in Fig. 2.

Fig. 2 Overall structure of the proposed system

An image owner could be an individual or an organization that wants to use a cloud service and owns an image database \({\mathrm {DB}}_I\). The main role of the image owner is to extract the visual image features based on color, texture, and shape and combine them to form the feature vector \({\mathrm {FV}}_i\) of each image i such that \(i\in {\mathrm {DB}}_I\). Afterward, all the feature vectors are combined to form the feature vector database \({\mathrm {DB}}_{{\mathrm {FV}}}\). Next, the image owner runs a key exchange protocol (KEP) [27] with the KMC to obtain M and \(K_i\). The image owner then encrypts the entire image database using the keys \(K_{i}\) received from the KMC to get \(E_{{\mathrm {DB}}_I}\) and encrypts \({\mathrm {DB}}_{{\mathrm {FV}}}\) using M to obtain \(E_{{\mathrm {DB}}_{{\mathrm {FV}}}}\). Finally, the image owner sends \(E_{{\mathrm {DB}}_I}\) and \(E_{{\mathrm {DB}}_{{\mathrm {FV}}}}\) to the cloud server for storage and future retrieval of similar images.

Fig. 3 Flowcharts illustrating data flow from both the data owner and authorized user ends

An authorized data user is a legitimate entity that issues an image query and wants to retrieve similar images. An authorized user can be illustrated as a doctor working in a hospital who is permitted to access patient data stored on the cloud server. The user first runs the same feature extraction process on the query image to form the query feature vector \(Q_{{\mathrm {FV}}}\). Then, to retrieve similar images from the cloud, the user runs a KEP with the KMC to obtain a query key \(K_{Q}\). The user encrypts \(Q_{{\mathrm {FV}}}\) using \(K_{Q}\) and sends it to the KMC. The KMC retrieves similar results from the cloud and sends them back to the user along with the keys of only those images. The roles of the image owner and the authorized users are illustrated using the flowcharts in Fig. 3.

The cloud server offers massive storage capacity. Its functions are to store \(E_{{\mathrm {DB}}_I}\) and \(E_{{\mathrm {DB}}_{{\mathrm {FV}}}}\). When the cloud server receives an encrypted query feature vector \(E_{Q_{{\mathrm {FV}}}}\) from the KMC, it performs similarity matching between \(E_{Q_{{\mathrm {FV}}}}\) and \(E_{{\mathrm {DB}}_{{\mathrm {FV}}}}\) and generates the required indexes. In this work, for the retrieval part only, the cloud server is assumed to be honest-but-curious, i.e., it performs this task faithfully and produces correct results. The cloud server then sends the corresponding encrypted images along with their indices back to the requesting KMC. Another duty of the cloud server is to update the image database or the feature vector database when directed by the image owner.

Fig. 4 Flowcharts illustrating data flow from both the CSP and the KMC ends

The KMC is assumed to be a fully trusted third party. Its duties include generating and storing the different keys, collaborating with the cloud server to retrieve images, and providing the desired results to the user. When the KEP is set up between the KMC and the image owner, the KMC provides a feature vector encryption key, i.e., the invertible matrix M, and image encryption keys \(K_i\) to the image owner to encrypt the feature database and each image. Another KEP runs between the KMC and the user to provide a random key \(K_Q\) with which the user encrypts the generated \(Q_{{\mathrm {FV}}}\) to obtain \((E_{Q_{{\mathrm {FV}}}})_{K_{Q}}\) before sending it to the KMC. The KMC decrypts \(Q_{{\mathrm {FV}}}\) and re-encrypts it with \(M^{-1}\). The KMC then transfers this query feature vector to the cloud to find similar encrypted images. When the KMC receives the encrypted images from the cloud, it transfers them to the concerned user together with the keys for only those images. Various operations of the KMC and CSP are illustrated in Fig. 4. In the following, we summarize the main steps of the proposed scheme; a high-level sketch of this flow is given after the list:

  1. The image owner extracts various visual image features and combines them to form the final feature vector. Then, a KEP is set up between the KMC and the image owner so that the image owner receives the feature vector encryption key and the image encryption keys. The image owner provides the KMC with the list of registered users, and encrypts the image database and feature vector database before transferring them to the cloud server for later retrieval of similar images. When a new user is registered, the image owner sends an updated list so that the KMC can overwrite the older list.

  2. The registered user extracts the same features from the query image and forms the query feature vector. The user runs a KEP with the KMC to obtain a one-time key with which to encrypt the query feature vector, and then sends the encrypted query feature vector to the KMC.

  3. After receiving the encrypted query feature vector from the user, the KMC decrypts it, re-encrypts it using \(M^{-1}\), and sends the result to the cloud server.

  4. When the cloud server receives the request from the KMC, it runs the designated similarity matching algorithm and finds the matching indexes and encrypted images. The cloud server then transmits these retrieved results to the KMC.

  5. The KMC sends the retrieved images and their respective keys to the user, who decrypts them to obtain the final retrieval results.
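The end-to-end flow of these five steps can be sketched as below. Every name here is hypothetical: the XOR-based image cipher, the feature extractor, and the key handling are trivial stand-ins whose only purpose is to make the message flow between the owner, KMC, CSP, and user concrete; the actual scheme uses the image encryption and ASPE constructions described later in the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

# --- trivial stand-ins (NOT the paper's algorithms) -------------------------
def extract_features(image):                      # placeholder feature extractor
    return image.astype(float).mean(axis=(0, 1))  # one value per channel

def xor_encrypt(image, key):                      # placeholder image cipher
    return image ^ key

def aspe_encrypt_db(fv, M):
    return M.T @ np.append(fv, -0.5 * fv @ fv)

def aspe_encrypt_query(fv, M):
    return np.linalg.inv(M) @ (rng.uniform(0.5, 2.0) * np.append(fv, 1.0))

# --- Step 1: owner extracts features, gets keys from the KMC, uploads -------
d = 3
images = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(5)]
M = rng.standard_normal((d + 1, d + 1))                      # KMC: feature key
image_keys = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in images]
enc_images = [xor_encrypt(im, k) for im, k in zip(images, image_keys)]
enc_fvs = np.array([aspe_encrypt_db(extract_features(im), M) for im in images])
# enc_images and enc_fvs go to the CSP; M and image_keys stay with the KMC

# --- Steps 2-3: user builds and encrypts the query; KMC re-encrypts with M^-1
query = images[2]                                            # user's query image
q_enc = aspe_encrypt_query(extract_features(query), M)       # done via the KMC

# --- Step 4: CSP ranks encrypted vectors without seeing the plaintext -------
scores = enc_fvs @ q_enc                  # larger score => nearer (ASPE property)
top = np.argsort(scores)[::-1][:3]

# --- Step 5: KMC returns the images plus only the matching keys -------------
results = [xor_encrypt(enc_images[i], image_keys[i]) for i in top]  # user decrypts
print("retrieved indexes:", top.tolist())
```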

Various communication steps take place during the operation of the proposed CBIR system, from feature extraction at the owner's end to the registered user obtaining the desired images. All four system entities and their interactions are depicted in Fig. 5.

Fig. 5 Complete overview of communication between the image owner, CSP, KMC, and authorized user in a cloud-assisted environment

4 Preliminaries

In this section, we briefly explain various techniques that are used by our proposed CBIR system.

4.1 Quaternion moments

According to Hamilton [28], a quaternion Q is a generalization of a complex number such that \(Q= r+s{{\hat{x}}}+t{{\hat{y}}}+u{{\hat{z}}}\), where r is the real part and the remaining terms form the imaginary part. Here \(r,s,t,u \in R\) and \({\hat{x}}\), \({\hat{y}}\), \({\hat{z}}\) are the imaginary units such that

$$\begin{aligned} {\hat{x}}^2&={\hat{y}}^2={\hat{z}}^2={\hat{x}} \times {\hat{y}} \times {\hat{z}} = -1,\\ {\hat{x}} \times {\hat{y}}&= - {\hat{y}} \times {\hat{x}}= {\hat{z}}, \\ {\hat{y}} \times {\hat{z}}&= - {\hat{z}} \times {\hat{y}}= {\hat{x}}, \\ {\hat{z}} \times {\hat{x}}&= - {\hat{x}} \times {\hat{z}}= {\hat{y}}. \end{aligned}$$

Here, a quaternion is called pure when its real part is zero \((r=0)\). The conjugate and modulus of Q are defined as:

$$\begin{aligned} Q^c&= r-s{{\hat{x}}}-t{{\hat{y}}}-u{{\hat{z}}},\\ |Q|&= \sqrt{r^2+s^2+t^2+u^2}. \end{aligned}$$

To the best of our knowledge, Ell and Sangwine [29] were the first to describe an RGB image as a quaternion. Let IM(x, y) be an RGB color image; it can be represented through its three primitive channels as a pure quaternion, \(IM(x,y)= {IM}_R(x,y){{\hat{x}}} + {IM}_G(x,y){{\hat{y}}}+ {IM}_B(x,y){{\hat{z}}}\), where \({IM}_R(x,y)\), \({IM}_G(x,y)\), and \({IM}_B(x,y)\) are the red, green, and blue values of the pixel at (x, y).
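As a minimal illustration of these definitions, the following Python sketch implements quaternion multiplication from the rules above and represents an RGB pixel as a pure quaternion; the array layout (r, s, t, u) and the function names are our own choices, not part of the original paper.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions given as arrays [r, s, t, u]."""
    r1, s1, t1, u1 = p
    r2, s2, t2, u2 = q
    return np.array([
        r1*r2 - s1*s2 - t1*t2 - u1*u2,   # real part
        r1*s2 + s1*r2 + t1*u2 - u1*t2,   # x-hat component
        r1*t2 - s1*u2 + t1*r2 + u1*s2,   # y-hat component
        r1*u2 + s1*t2 - t1*s2 + u1*r2,   # z-hat component
    ])

def conjugate(q):
    r, s, t, u = q
    return np.array([r, -s, -t, -u])

def modulus(q):
    return np.sqrt(np.sum(np.asarray(q, dtype=float) ** 2))

def rgb_pixel_to_pure_quaternion(red, green, blue):
    """Pure quaternion IM = R*x_hat + G*y_hat + B*z_hat (real part is zero)."""
    return np.array([0.0, red, green, blue])

# Sanity checks against the algebra in Sect. 4.1
x_hat = np.array([0, 1, 0, 0])
y_hat = np.array([0, 0, 1, 0])
z_hat = np.array([0, 0, 0, 1])
assert np.allclose(qmul(x_hat, y_hat), z_hat)             # x_hat * y_hat = z_hat
assert np.allclose(qmul(x_hat, x_hat), [-1, 0, 0, 0])     # x_hat^2 = -1

q = rgb_pixel_to_pure_quaternion(120, 64, 255)
print(modulus(q), conjugate(q))
```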

In this paper, we use six different orthogonal moments [30], namely Quaternion Zernike Moments, Quaternion Pseudo Zernike Moments, Quaternion Legendre-Fourier Moments, Quaternion Exponent Moments, Quaternion Radial Harmonic Fourier Transform Moments, and Quaternion Radial Substituted Chebyshev Moments.

4.2 Tchebichef moments

The scaled Tchebichef polynomials are defined as

$$\begin{aligned} {\bar{t}}_n (x) = \frac{t_n (x)}{\beta (n,N)} \end{aligned}$$
(1)

where \(t_n(x)\) is the discrete Tchebichef polynomial of degree n, and \(\beta (n,N)\) is a suitable constant that is independent of x. Under this transformation, the squared norm of the scaled polynomials is modified according to the following formula

$$\begin{aligned} {\bar{\rho }} (n,N) = {{\rho (n,N)}\over {\beta (n,N)^2 }} \end{aligned}$$
(2)

We now define the Tchebichef moments [31] as

$$\begin{aligned} T_{pq}=\frac{1}{{\bar{\rho }}(p,N)\,{\bar{\rho }}(q,N)}\sum _{x=0}^{N-1}\sum _{y=0}^{N-1}{\bar{t}}_p(x)\, {\bar{t}}_q(y)\, f(x,y) \end{aligned}$$
(3)

where \(x,y= 0,1,2,3,\ldots , N-1\) and f(x, y) is the intensity of the \(N\times N\) image at pixel (x, y).
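For concreteness, the sketch below evaluates Eq. (3) numerically. It builds the scaled polynomials using the standard three-term recurrence for discrete Tchebichef polynomials with the common choice \(\beta (n,N)=N^n\); this recurrence and the closed form for \({\bar{\rho }}(n,N)\) come from the moment literature rather than from this paper, so treat the snippet as an illustrative implementation.

```python
import numpy as np

def scaled_tchebichef_polys(order, N):
    """Rows t[n, x] = scaled discrete Tchebichef polynomial, with beta(n, N) = N**n."""
    x = np.arange(N, dtype=float)
    t = np.zeros((order + 1, N))
    t[0] = 1.0
    if order >= 1:
        t[1] = (2.0 * x + 1.0 - N) / N
    for n in range(2, order + 1):
        t[n] = ((2 * n - 1) * t[1] * t[n - 1]
                - (n - 1) * (1.0 - (n - 1) ** 2 / N ** 2) * t[n - 2]) / n
    return t

def scaled_squared_norm(n, N):
    """rho-bar(n, N) = N * (1 - 1/N^2)(1 - 4/N^2)...(1 - n^2/N^2) / (2n + 1)."""
    factors = [1.0 - (k * k) / (N * N) for k in range(1, n + 1)]
    return N * np.prod(factors) / (2 * n + 1)

def tchebichef_moments(image, max_order):
    """Compute T_pq of Eq. (3) for p, q = 0..max_order on an N x N image."""
    N = image.shape[0]
    t = scaled_tchebichef_polys(max_order, N)
    rho = np.array([scaled_squared_norm(n, N) for n in range(max_order + 1)])
    # T_pq = (1 / (rho_p * rho_q)) * sum_x sum_y t_p(x) t_q(y) f(x, y)
    T = t @ image @ t.T
    return T / np.outer(rho, rho)

# Example: moments of a random 64 x 64 gray-scale block
rng = np.random.default_rng(0)
f = rng.random((64, 64))
print(tchebichef_moments(f, max_order=4))
```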

5 Feature vector construction

The features extracted by a reliable CBIR system must reflect the actual content of the input image. Since local image features are more closely associated with an image's actual content, our proposed CBIR system incorporates local color, shape, and texture visual features to improve the accuracy of the returned results. We also take into account that scaling an input image to a smaller or larger size does not change its content, yet scaling can lead to different outputs for differently scaled versions of the same image. Hence, in this work, we fuse not just local image features but local image features that are invariant in different respects.

In this work, the feature extraction process is divided into two stages. The first stage evaluates the color and shape features, producing a color feature vector and a shape feature vector. The second stage evaluates the texture visual features, producing a texture feature vector. All three feature vectors are combined to form the final image feature vector. For the color features, we adopt quaternion moments, which provide not only local but also rotation-, scale-, and translation-invariant color features. In the same stage, we deploy Tchebichef moments for the shape features, which provide scale- and translation-invariant shape features. The result of this stage is the set of color and shape features. These processes are described in Algorithm 1.

Algorithm 1

In the second stage, texture information is evaluated by converting the input RGB image to its YCbCr counterpart and decomposing the Y-component. Texture information is then extracted in two parts. In the first part, a 2-D DT-CWT [32] is applied directly to the Y-component, and several statistical parameters are evaluated. In the second part, the Hilbert transform [33] of the decomposed Y-component is first taken to obtain additional directional texture information; a 2-D DT-CWT is then applied to this transformed Y-component, followed by evaluation of the same statistical parameters. Finally, both parts of the texture information are merged to form the texture-based feature vector. The texture feature extraction process is summarized in Algorithm 2.

Algorithm 2
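As a rough, simplified sketch of this stage: since a 2-D DT-CWT implementation is not assumed to be available here, the snippet below substitutes a standard separable wavelet decomposition (PyWavelets) for the DT-CWT and uses SciPy's 2-D analytic signal for the Hilbert step; the chosen statistics (mean and standard deviation of subband magnitudes) are our illustrative choice, not necessarily the exact parameters of Algorithm 2.

```python
import numpy as np
import pywt
from scipy.signal import hilbert2

def rgb_to_y(rgb):
    """Luma (Y) component of an RGB image in the 0-255 range (BT.601 weights)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def subband_stats(channel, wavelet="db4", levels=2):
    """Mean and std of the magnitude of each detail subband."""
    coeffs = pywt.wavedec2(channel, wavelet, level=levels)
    stats = []
    for detail in coeffs[1:]:                  # skip the approximation coefficients
        for band in detail:                    # (horizontal, vertical, diagonal)
            mag = np.abs(band)
            stats.extend([mag.mean(), mag.std()])
    return stats

def texture_feature_vector(rgb):
    y = rgb_to_y(np.asarray(rgb, dtype=float))
    part1 = subband_stats(y)                       # wavelet stats of Y
    analytic = np.abs(hilbert2(y))                 # 2-D analytic-signal magnitude
    part2 = subband_stats(analytic)                # wavelet stats of Hilbert-transformed Y
    return np.array(part1 + part2)

# Example on a random image
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(128, 128, 3))
print(texture_feature_vector(img).shape)
```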

To complete the feature vector construction, the color, shape, and texture feature vectors are fused. The pictorial representation of the feature extraction process is shown in Fig. 6.

Fig. 6 Feature vector generation process

Algorithm 3

5.1 Secure content-based image retrieval

Before retrieving results, an authorized user first sets up a key exchange with the KMC to obtain a query key \(K_Q\). The registered user extracts the shared image features from the query image to form the query feature vector \(Q_{{\mathrm {FV}}}\). The user then encrypts \(Q_{{\mathrm {FV}}}\) using \(K_Q\) to obtain \(({E_{Q_{{\mathrm {FV}}}}})_{K_Q}\) before sending it to the KMC. Since the key exchange is secret, no other entity can access the actual feature vector. On receiving \(({E_{Q_{{\mathrm {FV}}}}})_{K_Q}\), the KMC decrypts it and re-encrypts it using the \(M^{-1}\) of the corresponding owner to obtain \(({E_{Q_{{\mathrm {FV}}}}})_{M^{-1}}\). The KMC transfers \(({E_{Q_{{\mathrm {FV}}}}})_{M^{-1}}\) to the CSP. Because the feature vector is encrypted, traditional similarity measures cannot be applied here.

In this work, a kNN-style similarity measurement over the encrypted feature vectors is used to find similar indices at the CSP end. After retrieving the indices, the CSP transfers them, along with the corresponding encrypted images, to the KMC. The KMC forwards those images to the user with only the keys required to decrypt them. After receiving the encrypted images and the associated keys, the authorized user decrypts the images and obtains the final set of plain images.

6 Embedded security measures

In the proposed CBIR scheme, we implement security measures on both the images and the extracted feature vectors. A feature vector can reveal information to a fraudulent person; e.g., if an image is dominated by blue, the corresponding color histogram bins will have high values, which could indicate that the image belongs to the beach, mountain, or sky categories. Hence, it is necessary to protect not only the images but also their feature vectors. In this section, we give the details of the different security measures implemented in our CBIR system.

6.1 Image encryption

Figure 7 depicts a conventional cryptographic image encryption scheme based on the confusion–diffusion paradigm. Initially, a color image is decomposed into its principal components. Each component then passes through several rounds of confusion to scramble its pixels, and the image goes through several rounds of the overall confusion–diffusion process so that its statistical attributes are modified. The diffusion process can be executed for multiple rounds to achieve better secrecy. However, this process operates at the pixel level, which means that each and every pixel is involved in the encryption, making it computationally expensive. In our proposed CBIR system, we instead use image encryption based on bitplanes. First, each decomposed color channel is divided into its respective bitplanes, and these bitplanes are randomly shuffled. The first eight shuffled bitplanes are treated as the red channel, the next eight as the green channel, and the last eight as the blue channel; the three channels are combined to form the initial shuffled image. Next, a logistic map is used whose seed value is treated as a cryptographic key, and the sequence obtained from the logistic map is mapped to 24-bit values. A bit-wise XOR operation with this sequence then yields the final encrypted image. This process is carried out for N rounds to increase the encryption security (see Fig. 8). The algorithmic steps of the image encryption process are presented in Algorithm 4.

Fig. 7 Conventional confusion–diffusion structure of a color image encryption scheme

Algorithm 4
Fig. 8 Our proposed image encryption scheme
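A minimal sketch of this bitplane-based design is given below, assuming a concrete and simplified instantiation: the bitplane permutation is derived from one logistic-map key and the XOR keystream from another, and only a single round is shown. The exact key schedule, parameter values, and round structure of Algorithm 4 may differ.

```python
import numpy as np

def logistic_sequence(seed, length, r=3.99, burn_in=100):
    """Chaotic logistic-map sequence x_{n+1} = r * x_n * (1 - x_n), seed in (0, 1)."""
    x = seed
    out = np.empty(length)
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    for i in range(length):
        x = r * x * (1.0 - x)
        out[i] = x
    return out

def encrypt_image(rgb, perm_seed, stream_seed):
    """One confusion (bitplane shuffle) + diffusion (XOR keystream) round."""
    rgb = np.asarray(rgb, dtype=np.uint8)
    h, w, _ = rgb.shape
    # 1. Split the three channels into 24 bitplanes.
    planes = [((rgb[..., c] >> b) & 1) for c in range(3) for b in range(8)]
    # 2. Shuffle the bitplanes with a permutation derived from the logistic map.
    perm = np.argsort(logistic_sequence(perm_seed, 24))
    planes = [planes[p] for p in perm]
    # 3. Reassemble: first 8 planes -> channel 0, next 8 -> channel 1, last 8 -> channel 2.
    shuffled = np.zeros_like(rgb)
    for c in range(3):
        for b in range(8):
            shuffled[..., c] |= (planes[8 * c + b] << b).astype(np.uint8)
    # 4. XOR each pixel with an 8-bit keystream per channel (24 bits per pixel in total).
    stream = (logistic_sequence(stream_seed, h * w * 3) * 256).astype(np.uint8)
    return shuffled ^ stream.reshape(h, w, 3), perm

# Decryption reverses the XOR and applies the inverse bitplane permutation.
rng = np.random.default_rng(2)
plain = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
cipher, perm = encrypt_image(plain, perm_seed=0.3141, stream_seed=0.2718)
print(cipher.shape, cipher.dtype)
```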

6.2 Security analysis of the image encryption scheme

The ciphertext produced by a robust image encryption scheme should not reveal any visual or statistical information about the encrypted data. The pixels in an image are often redundant and highly correlated, which preserves statistical patterns; such patterns could reveal information about the content of the image. Hence, pixel correlation must be minimized to reduce the risk of statistical attacks. In this paper, we analyze our algorithm using histogram analysis, entropy, the peak signal-to-noise ratio (PSNR), and the correlation coefficient.

6.2.1 Histogram analysis

The histogram of an image gives the number of pixels at each pixel intensity. For an 8-bit image, the pixel intensities range from 0 to 255. In the different channels of a plain RGB image, some bin values are much higher than others. For a good cipher image, the histogram should be nearly uniform, i.e., all bin values should be similar. In Fig. 9, we display the histograms of the individual plain-image channels as well as the histograms of the corresponding cipher-image channels.

6.2.2 Entropy-based analysis

Shannon [34] proposed entropy as a measure of the information held by an image in terms of its pixel intensities. The entropy of an image can be calculated using Eq. 4.

$$\begin{aligned} {\mathrm {ENT}}=-\sum _{i=0}^{2^h-1}h_i\log _{2}(h_i) \end{aligned}$$
(4)

where \(h_i\) is the probability of pixel intensity i and h is the bit depth. In the original image, the pixel intensities are unevenly distributed; Fig. 9 shows that the entropies of the red, green, and blue color channels are 7.8133, 7.4219, and 7.5184, respectively. In the cipher image, all intensities occur with almost the same frequency, so the entropies should be close to 8. For our proposed scheme, the computed entropies are 7.9981, 7.9983, and 7.9994, which are very close to 8. This demonstrates that our proposed scheme produces a highly random cipher image.
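Eq. (4) can be evaluated directly from the channel histogram, as in the short sketch below (written for an 8-bit channel, i.e., h = 8).

```python
import numpy as np

def channel_entropy(channel, bit_depth=8):
    """Shannon entropy of a single image channel, Eq. (4)."""
    levels = 2 ** bit_depth
    hist, _ = np.histogram(channel, bins=levels, range=(0, levels))
    p = hist / hist.sum()            # h_i: probability of intensity i
    p = p[p > 0]                     # skip empty bins (0 * log 0 := 0)
    return -np.sum(p * np.log2(p))

# Example: a nearly uniform channel has entropy close to 8 bits
rng = np.random.default_rng(3)
cipher_channel = rng.integers(0, 256, size=(256, 256))
print(round(channel_entropy(cipher_channel), 4))
```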

Fig. 9 Analysis of the proposed image encryption scheme

6.2.3 PSNR based analysis

The performance of a cryptographic system can also be measured in terms of the PSNR [35]. Let I(x, y) be the original image, which is encrypted using a key to obtain \(C_I\). \(C_I\) undergoes noise attacks to become \(C_I^{'}\), which is then decrypted to obtain \(I^{'}(x,y)\). If the dimensions of the image are \(l \times k\) and the pixel depth is h, then the PSNR can be evaluated using Eq. 5.

$$\begin{aligned} {\mathrm {PSNR}}=10 \times \log _{10}\frac{(2^h-1)^2}{\frac{1}{l \times k}\sum _{x=1}^{l}\sum _{y=1}^{k}\{I^{'}(x,y)-I(x,y)\}^2} \end{aligned}$$
(5)

For a good cipher image, the PSNR value should be less than 10 dB. The proposed scheme produces PSNR values of 7.94 dB, 7.19 dB, and 7.40 dB for the red, green, and blue components, respectively, which shows that the proposed scheme produces a good cipher image.
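Eq. (5) translates directly into a few lines of code; the sketch below computes the PSNR between an original channel and its counterpart for an 8-bit pixel depth.

```python
import numpy as np

def psnr(original, distorted, bit_depth=8):
    """PSNR in dB between two images of equal size, Eq. (5)."""
    original = np.asarray(original, dtype=float)
    distorted = np.asarray(distorted, dtype=float)
    mse = np.mean((distorted - original) ** 2)   # (1 / (l*k)) * sum of squared errors
    peak = (2 ** bit_depth - 1) ** 2
    return 10.0 * np.log10(peak / mse)

# A plain image compared against an unrelated cipher-like image gives a low PSNR (< 10 dB)
rng = np.random.default_rng(4)
plain = rng.integers(0, 256, size=(128, 128))
cipher = rng.integers(0, 256, size=(128, 128))
print(round(psnr(plain, cipher), 2))
```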

6.2.4 Correlation coefficient-based analysis

In a plain image, neighboring pixel intensities are very similar, i.e., highly correlated. A cipher image should therefore distribute pixel values randomly so as to minimize this correlation. As visualized in Fig. 10, the correlation coefficients of the plain color components in each direction are close to 1, whereas the values for the cipher images are close to 0, which reflects that neighboring cipher pixels are essentially uncorrelated.
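The adjacent-pixel correlation reported in Fig. 10 can be estimated as below; the sketch samples horizontal neighbor pairs from one channel and computes their Pearson correlation coefficient (the vertical and diagonal directions follow the same pattern with different offsets).

```python
import numpy as np

def adjacent_pixel_correlation(channel, n_pairs=5000, seed=0):
    """Pearson correlation between randomly sampled horizontally adjacent pixels."""
    channel = np.asarray(channel, dtype=float)
    h, w = channel.shape
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, h, size=n_pairs)
    cols = rng.integers(0, w - 1, size=n_pairs)       # leave room for the right neighbor
    x = channel[rows, cols]
    y = channel[rows, cols + 1]
    return np.corrcoef(x, y)[0, 1]

# A smooth gradient image is highly correlated; a random "cipher" channel is not
gradient = np.tile(np.arange(256, dtype=float), (256, 1))
noise = np.random.default_rng(5).integers(0, 256, size=(256, 256))
print(round(adjacent_pixel_correlation(gradient), 3),
      round(adjacent_pixel_correlation(noise), 3))
```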

Fig. 10 Correlation analysis of the proposed image encryption scheme

6.3 Feature vector encryption and decryption

Before the image feature database is transmitted to the cloud server, each image feature vector is encrypted. Here, we deploy an encryption scheme based on ASPE, which preserves scalar products so that distance comparisons can still be performed to find the neighbors of a query feature vector. If the norm \(||f_{v}||\) of an image feature vector is known to an attacker, the attacker learns that \(f_{v}\) lies on the hypersphere centered at the origin with radius \(||f_{v}||\); even though the exact location of \(f_{v}\) remains unknown, this already leaks partial information. In our proposed CBIR system, this information is hidden by encrypting both the database feature vectors \(f_{v}\) and the query feature vector \(f_{vq}\), and kNN is applied on the encrypted data. First, we compute \(||f_{v}||^{2}\) and extend each d-dimensional database feature vector \(f_{v}\) to a \((d+1)\)-dimensional vector \({\hat{f}}_{v}\): the first d dimensions of \({\hat{f}}_{v}\) are the same as \(f_{v}\), and the \((d+1)\)-st dimension is set to \(-0.5||f_{v}||^{2}\). The extended database feature vectors are then transformed using ASPE. Likewise, before applying ASPE to the query, the query feature vector is extended to \((d+1)\) dimensions as \({\hat{f}}_{vq}\), with the \((d+1)\)-st dimension assigned the value 1.

A shortcoming of this basic technique is that the extended query feature vectors \({\hat{f}}_{vq}\) all lie on a d-dimensional hyperplane whose normal is the unit vector in the \((d+1)\)-st dimension. Since ASPE is a linear transformation, the encrypted feature vectors also all lie on a d-dimensional hyperplane in the transformed space, and an attacker can determine the normal of that hyperplane. By relating the normal in the original space to the normal in the transformed space, the attacker gains additional information. To prevent this, we introduce a random factor: for each query \(f_{vq}\), we generate a random number \(r > 0\) and scale \({\hat{f}}_{vq}\) by r, i.e., \({\hat{f}}_{vq}=r(f_{vq}^{T},1)^{T}\). Theorem 2 shows that this scaling does not affect the correctness of the distance comparison operation. The ASPE operations are summarized in the following steps, and a sketch in code follows the list:

  • Key: a \((d+1)\times (d+1)\) invertible matrix M.

  • Feature vector encryption function \(E_{T}(\cdot )\): Consider a database feature vector \(f_{v}\). First, create the extended vector \({\hat{f}}_{v}=(f_{v}^{T},-0.5||f_{v}||^{2})^{T}\) of dimension \((d+1)\). Second, compute the encrypted feature vector \(f_{v}^{'}= M^{T}{\hat{f}}_{v}\).

  • Query image feature vector encryption function \(E_{q}(\cdot )\): Consider a query image feature vector \(f_{vq}\). Generate a random number \(r>0\). Then, create the extended vector \({\hat{f}}_{vq}=r(f_{vq}^{T}, 1)^{T}\) of dimension \((d+1)\). Finally, compute the encrypted query feature vector \(f_{vq}^{'}= M^{-1}{\hat{f}}_{vq}\).

  • Distance comparison operator: Let \(f_{v1}^{'}\) and \(f_{v2}^{'}\) be the encrypted feature vectors of \(f_{v1}\) and \(f_{v2}\), respectively. To determine whether \(f_{v1}\) is nearer to the query image feature vector \(f_{vq}\) than \(f_{v2}\), we check whether \((f_{v1}^{'}- f_{v2}^{'})\cdot f_{vq}^{'} >0\), where \(f_{vq}^{'}\) is the encryption of \(f_{vq}\).

  • Decryption function: Consider an encrypted feature vector \(f_{v}^{'}\). The feature vector is recovered as \(f_{v}= \pi _{d}(M^{T})^{-1}f_{v}^{'}\), where \(\pi _{d}=(I_{d},0)\) is a \(d\times (d+1)\) matrix and \(I_{d}\) is the \(d \times d\) identity matrix.
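The following numpy sketch implements these five operations and checks, on random data, that the encrypted-domain comparison agrees with the plain Euclidean ranking; the variable names are ours, and drawing a random Gaussian matrix is only one possible way to obtain an invertible key M.

```python
import numpy as np

rng = np.random.default_rng(42)

def keygen(d):
    """Random invertible (d+1) x (d+1) key matrix M."""
    while True:
        M = rng.standard_normal((d + 1, d + 1))
        if abs(np.linalg.det(M)) > 1e-6:
            return M

def encrypt_db_vector(fv, M):
    """E_T: extend with -0.5*||fv||^2 and multiply by M^T."""
    fv_hat = np.append(fv, -0.5 * np.dot(fv, fv))
    return M.T @ fv_hat

def encrypt_query_vector(fvq, M):
    """E_q: extend with 1, scale by random r > 0, multiply by M^{-1}."""
    r = rng.uniform(0.5, 2.0)
    fvq_hat = r * np.append(fvq, 1.0)
    return np.linalg.inv(M) @ fvq_hat

def nearer(fv1_enc, fv2_enc, fvq_enc):
    """True if the vector behind fv1_enc is nearer to the query than fv2_enc."""
    return (fv1_enc - fv2_enc) @ fvq_enc > 0

def decrypt_db_vector(fv_enc, M, d):
    """Recover fv = pi_d (M^T)^{-1} fv_enc (drop the extra dimension)."""
    return (np.linalg.inv(M.T) @ fv_enc)[:d]

# Consistency check against plain Euclidean distances
d = 16
M = keygen(d)
db = rng.standard_normal((50, d))
q = rng.standard_normal(d)
db_enc = np.array([encrypt_db_vector(v, M) for v in db])
q_enc = encrypt_query_vector(q, M)

plain_order = np.argsort(np.linalg.norm(db - q, axis=1))
best_plain = plain_order[0]
assert all(nearer(db_enc[best_plain], db_enc[j], q_enc)
           for j in plain_order[1:])                       # encrypted comparison agrees
assert np.allclose(decrypt_db_vector(db_enc[0], M, d), db[0])
print("encrypted ranking matches plain Euclidean ranking")
```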

7 Design objectives of the proposed CBIR system

In this section, we discuss the need for embedding security at the different levels of a CBIR system. Then, the other functional requirements are presented.

7.1 Eavesdropping

The objective is to ensure that images retrieved from the cloud reach only a legitimate person; this information must be kept secret from any other unauthorized entity. To this end, all images and feature vectors are encrypted with different keys, so an eavesdropper capturing data from the communication channel will not be able to learn the content of the transmitted messages.

7.2 Untrusted CSP

The cloud server cannot be fully trusted; therefore, users must secure their data before transferring it to the cloud. In our proposed CBIR system, the cloud server is assumed to be honest-but-curious for the retrieval part only, i.e., it performs the retrieval task according to the designated algorithm and returns the genuinely generated results to the KMC without interference. All images and image feature vectors are encrypted, and image similarity is computed on the encrypted feature vectors. This prevents the cloud from learning the content of its hosted data.

7.3 Authorized registered user

Most solutions in the literature assume that the registered user is a fully trusted entity. However, in real-life deployments, insider threats present a serious challenge. In our proposed CBIR system, a registered user is provided only with the visual feature extraction algorithm. The KMC sends the user a temporary key to encrypt the query image feature vector, and this encrypted feature vector is transmitted back to the KMC. The KMC re-encrypts it using \(M^{-1}\) and transfers it to the cloud. When the cloud sends the images back to the KMC, the KMC forwards them to the user together with keys that are valid only for those images.

7.4 Image and feature vector privacy

The proposed CBIR system provides image privacy from the CSP, from unauthorized entities, and in some cases even from authorized entities. Only the data owner holds all the original images and feature vector content. Content stored on the cloud is encrypted with keys that are kept secret at the KMC, and similarity matching is performed on the encrypted feature vectors only.

7.5 Retrieval accuracy

Image feature vectors are constructed using local invariant features which combine color, texture, and shape information. The proposed CBIR system exploits the strong association between the actual content and local image features.

7.6 Overall system efficiency

Real-time applications require low computational delay. Hence, the proposed CBIR system does not incorporate any multiplicative group-based scheme, and it reduces the number of steps required to transmit images from the cloud to an authorized person.

8 Security analysis

Theorem 1

Scalar-product-preserving encryption is distance-recoverable.

Proof

Let \(f_{v1}^{'}\) and \(f_{v2}^{'}\) be the encrypted points of \(f_{v1}\) and \(f_{v2}\), respectively, in a database. A function f can be defined as

$$\begin{aligned} f(f_{v1}^{'}, f_{v2}^{'})= \sqrt{f_{v1}^{'}.f_{v1}^{'}-2(f_{v1}^{'}.f_{v2}^{'})+ f_{v2}^{'}.f_{v2}^{'}} \end{aligned}$$

Since the encryption process conserves the scalar product, we can say that

$$\begin{aligned} RHS= \sqrt{f_{v1}.f_{v1}-2(f_{v1}.f_{v2})+ f_{v2}.f_{v2}} \end{aligned}$$

Let E be an encryption function and \(E(f_{v},K)\) the encrypted value of a feature vector \(f_{v}\) under the key K. E is an ASPE if it preserves the scalar product between a database feature vector and a query feature vector but not between two database feature vectors, i.e.,

[i] \(f_{vi}\cdot f_{vq}=E(f_{vi},K)\cdot E(f_{vq},K)\) for any \(f_{vi}\) in DB and any query feature vector \(f_{vq}\);

[ii] \(f_{vi}\cdot f_{vj}\ne E(f_{vi},K)\cdot E(f_{vj},K)\) for any \(f_{vi}\) and \(f_{vj}\) in DB.

By this definition, the encrypted value of a query feature vector \(f_{vq}\) should differ from that of any point \(f_{vj}\) in DB, even when \(f_{vq}= f_{vj}\). This requires that query feature vectors and database feature vectors be encrypted with different functions, which is why the encryption functions \(E_{T}(\cdot)\) and \(E_{q}(\cdot)\) in the scheme are different.

The scalar product of \(f_{v}\) and \(f_{vq}\) (written as column vectors) is \(f_{v}^{T}If_{vq}\), where \(f_{v}^{T}\) is the transpose of \(f_{v}\) and I is the \(d\times d\) identity matrix. I can be decomposed as \(MM^{-1}\) for any invertible matrix M, i.e., \(f_{v}^{T}f_{vq}= (f_{v}^{T}M)( M^{-1} f_{vq})\).

If we set \(f_{v}^{'}= E_{T}(f_{v},K)= M^{T} f_{v}\) and \(f_{vq}^{'}=E_{q}(f_{vq},K)= M^{-1}f_{vq}\), then recovering \(f_{v}\) and \(f_{vq}\) from \(f_{v}^{'}\) and \(f_{vq}^{'}\) is not possible without knowing M. Moreover, \(f_{v}^{'T}f_{vq}^{'}= f_{v}^{T}MM^{-1}f_{vq}= f_{v}^{T}f_{vq}\), i.e., the scalar product is preserved. If \(f_{v1}^{'}\) and \(f_{v2}^{'}\) are the encrypted feature vectors of two database images, then \(f_{v1}^{'T}f_{v2}^{'}= f_{v1}^{T}MM^{T}f_{v2}\), which in general differs from \(f_{v1}^{T}f_{v2}\). Therefore, ASPE is realized by using M and \(M^{-1}\) as the transformations for the database image feature vectors and the query image feature vectors, respectively. \(\square\)

Theorem 2

Let \(f_{v1}^{'}\) and \(f_{v2}^{'}\) be the encrypted feature vectors of \(f_{v1}\) and \(f_{v2}\), respectively. The feature vector \(f_{v1}\) is nearer to a query image feature vector \(f_{vq}\) than \(f_{v2}\) if and only if \((f_{v1}^{'}- f_{v2}^{'})\cdot f_{vq}^{'} >0\), where \(f_{vq}^{'}\) is the encrypted feature vector of \(f_{vq}\).

Proof

Consider that

$$\begin{aligned} (f_{v1}^{'}- f_{v2}^{'})\cdot f_{vq}^{'}&=(f_{v1}^{'}- f_{v2}^{'})^{T}f_{vq}^{'}\\&=(M^{T}{\hat{f}}_{v1}-M^{T}{\hat{f}}_{v2})^{T}M^{-1}{\hat{f}}_{vq}\\&=({\hat{f}}_{v1}-{\hat{f}}_{v2})^{T}{\hat{f}}_{vq}. \end{aligned}$$

Expanding this scalar product of the two \((d+1)\)-dimensional feature vectors gives

$$\begin{aligned} ({\hat{f}}_{v1}-{\hat{f}}_{v2})^{T}{\hat{f}}_{vq}&=(f_{v1}- f_{v2})^{T}(rf_{vq})+\bigl(-0.5||f_{v1}||^{2}+0.5||f_{v2}||^{2}\bigr)r\\&= 0.5r\bigl(||f_{v2}||^{2}-||f_{v1}||^{2}+2(f_{v1}-f_{v2})^{T}f_{vq}\bigr) \\&=0.5r\bigl(||f_{v2}-f_{vq}||^{2}-||f_{v1}-f_{vq}||^{2}\bigr). \end{aligned}$$

Since \(r>0\), this quantity is positive if and only if \(||f_{v2}-f_{vq}||>||f_{v1}-f_{vq}||\), i.e., if and only if \(f_{v1}\) is nearer to \(f_{vq}\) than \(f_{v2}\); hence the random scaling by r does not affect the comparison.

\(\square\)

9 Updating images and indexes

At times, the image owner may want to update either the stored images or the encrypted feature vectors to improve the retrieval accuracy. When the number of images changes, this change must also be reflected in their indexes. Our proposed CBIR scheme allows the image owner to change the number of images in a particular database and to modify the stored encrypted feature vectors.

9.1 Insert new images

When the image owner wants to insert new images, he extracts the feature vector of each new image and encrypts it using the key received from the KMC. The owner requests the KMC to generate new image encryption keys and transmit them back to him, and then encrypts the new images individually. Finally, the owner sends the new encrypted images and their respective feature vectors to the CSP. When the CSP receives them, it stores them in the respective database and increases its indexes by the number of new images. After adding the images to the database, the CSP informs the KMC about the update along with the list of new indexes.

9.2 Remove images

If the image owner wants to delete some of his images, he sends their indexes to the CSP. The CSP removes the indexed images and updates the indexes of the rest of the database. The image owner then communicates those indexes to the KMC so that the KMC can also delete them from its list.

9.3 Updating the stored data

The image owner may sometimes want to replace stale images or update the feature extraction method. When the feature extraction method is updated, the new visual image features are communicated to authorized users so that they can use the new information when forming query feature vectors. The image owner then performs feature extraction on all database images and sends the results to the cloud after encrypting them with M, and the CSP replaces the old stored data with the new data. If the data owner also wants to update the indexes of the stored images, the owner must communicate the updated list to both the CSP and the KMC.

10 Performance evaluation

In this section, we evaluate the performance of the proposed CBIR system. The experiments were performed using MATLAB on a system with an Intel(R) Core(TM) i7-4770 CPU @ 3.4 GHz and 4 GB of RAM.

The performance of the proposed system is measured using the precision, recall, and F-score metrics. Precision is the ratio of relevant images retrieved to the total number of images retrieved. Let \(T_I\) be the set of images of the queried category present in the database and \(T_R\) the set of retrieved images, which includes relevant as well as non-relevant images; then

$$\begin{aligned} {\mathrm {precision}}=\dfrac{|T_I\cap T_R|}{|T_R|} \end{aligned}$$

Recall is the ratio of the relevant images retrieved to the total number of images of the same category present in the database,

$$\begin{aligned} {\mathrm {recall}}=\dfrac{|T_I\cap T_R|}{|D_I|} \end{aligned}$$

where \(D_I\) is the set of all images of that category in the database.

The F-measure or F-score is the harmonic mean of precision and recall and serves as a single-valued measure of the overall system performance:

$$\begin{aligned} F\_{\mathrm {Score}}=\dfrac{2\times {\mathrm {precision}} \times {\mathrm {recall}}}{{\mathrm {precision}} + {\mathrm {recall}}} \end{aligned}$$
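These three metrics can be computed per query as in the sketch below, where the database and the retrieved results are represented simply as sets of image identifiers.

```python
def retrieval_metrics(relevant_ids, retrieved_ids):
    """Precision, recall, and F-score for one query.

    relevant_ids: database images of the query's category (T_I / D_I)
    retrieved_ids: images returned by the system (T_R)
    """
    relevant_ids, retrieved_ids = set(relevant_ids), set(retrieved_ids)
    hits = len(relevant_ids & retrieved_ids)
    precision = hits / len(retrieved_ids) if retrieved_ids else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall > 0 else 0.0)
    return precision, recall, f_score

# Example: 100 relevant images in the database, 20 retrieved, 15 of them relevant
relevant = set(range(100))
retrieved = set(range(15)) | {900, 901, 902, 903, 904}
print(retrieval_metrics(relevant, retrieved))   # precision 0.75, recall 0.15, F-score 0.25
```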

In our experiments, we use the WANG image dataset. This dataset [36] contains 1000 Corel images classified into ten categories, namely people, beaches, buildings, buses, dinosaurs, elephants, flowers, horses, mountains, and foods, with 100 images per category. Figure 11 shows the results for each category for different numbers of retrieved images. Figure 12 shows the results of the proposed CBIR system using the original feature vectors and the Euclidean distance as the similarity measure. Since similar images are retrieved in the encrypted domain using the encrypted feature vector-based similarity, Fig. 13 shows the corresponding results for different numbers of output images. It can be observed that the encrypted similarity-based results are somewhat lower than the Euclidean-based results.

We also compare our proposed CBIR system with existing schemes using the encrypted feature vector-based similarity measure. Compared with Xu et al. [37], Xu et al. [38], Majhi et al. [39], and Kundu et al. [40], our solution improves the average precision by \(21.41\%\), \(26.16\%\), \(3.12\%\), and \(8.92\%\), respectively. The comparison based on average precision, recall, and F-score is displayed in Fig. 14.

Fig. 11 a Precision, b recall, c F-score for different numbers of retrieved images for the WANG dataset using encrypted similarity

Fig. 12 Mean average precision, recall, and F-score for different numbers of outcome images of the WANG database using Euclidean similarity measurement

Fig. 13 Mean average precision, recall, and F-score for different numbers of outcome images of the WANG database using encrypted similarity measurement

10.1 Result comparison based on different machine learning algorithms

In the recent literature, researchers have started combining classification with retrieval, first classifying the data and then retrieving results within the predicted category. In this work, we evaluate our proposed solution on a similar basis. We classify our database with four well-known techniques, namely decision tree [41], random forest [42], support vector machine (SVM) [43], and multilayer perceptron (MLP) [44]. The database is first randomly divided in an 80:20 ratio, i.e., \(80\%\) training images and \(20\%\) testing images, and the four techniques are then applied in turn. Their classification efficiencies are \(58.23\%\), \(68.23\%\), \(70.29\%\), and \(57.35\%\), respectively. Since the SVM provides the best efficiency, it is selected for the retrieval process. We report results for the Top-10, Top-15, Top-20, Top-25, Top-30, Top-35, Top-40, Top-45, and Top-50 retrieved images; the respective precision values in percentage are 68.4, 68.66, 67.75, 69.14, 71.67, 72.02, 72.5, 76, and 78.21.
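A comparison of this kind can be reproduced with scikit-learn as sketched below; the feature matrix X and label vector y stand in for the fused feature vectors and category labels of any of the datasets used here, and the hyperparameters are library defaults rather than the ones used in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Placeholder data: 1000 feature vectors of dimension 128, 10 categories
rng = np.random.default_rng(6)
X = rng.standard_normal((1000, 128))
y = rng.integers(0, 10, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)   # 80:20 split

classifiers = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: {100 * acc:.2f}% classification efficiency")
```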

Fig. 14 Comparison results using the WANG dataset

10.2 Rotation invariance

In our proposed CBIR system, the extracted features are rotation invariant, so rotating an image does not affect the retrieved results. When a user uploads an image to the cloud server, its orientation cannot be guaranteed. A conventional CBIR system cannot deal with this situation, since different orientations produce different retrieval results, which degrades the overall system performance. Hence, in this paper, we have chosen image features that produce similar results for different orientations of the query image. Figure 15 shows a beach image and its retrieved results under various rotations; a similar experiment on a dinosaur image is shown in Fig. 16.

Fig. 15 Results of a beach image with various rotations

Fig. 16 Results of a dinosaur image with various rotations

10.3 COIL dataset based result analysis

The Columbia Object Image Library (COIL-100) [45] has 7200 object images divided into 100 different categories such that each category has 72 images. Figure 17 gives the average precision, recall, and F-score values for various numbers of output images using the Euclidean distance on original feature vectors. In Fig. 18, our average results based on the encrypted similarity over secure trapdoors are displayed.

Fig. 17 Mean average precision, recall, and F-score for different numbers of outcome images of the COIL-100 database using the Euclidean similarity measurement

Fig. 18 Mean average precision, recall, and F-score for different numbers of outcome images of the COIL-100 database using the encrypted similarity measurement

10.3.1 Result comparison based on different machine learning algorithms

The classification efficiencies obtained with the decision tree, random forest, SVM, and MLP classifiers are \(83.79\%\), \(98.73\%\), \(98.76\%\), and \(97.79\%\), respectively, with the data divided into 80:20 training and testing portions. Since the SVM exhibits the best performance among the studied mechanisms, we report the retrieval results for that method only. The retrieved precision values for the Top-10, Top-15, Top-20, Top-25, Top-30, Top-35, Top-40, Top-45, and Top-50 images are, respectively, \(98.61\%\), \(98.52\%\), \(98.75\%\), \(98.53\%\), \(98.79\%\), \(98.61\%\), \(98.23\%\), \(98.08\%\), and \(98.78\%\).

10.4 Color medical image dataset-based result analysis

This color medical image dataset contains a total of 10,000 images from four categories: skin cancer, retina, endoscopy, and whole slide image (WSI), with 2500 images per category. The dataset was formed by combining different semantically similar datasets. To make the WSI data workable in this environment, regions of interest (ROI) are extracted from the whole slide images and decomposed into 2500 unique \(2014\times 2014\) ROIs. Figure 19 shows the average precision, recall, and F-score values for various numbers of output images using the Euclidean distance on the original feature vectors. Figure 20 shows the average results based on the encrypted similarity over secure trapdoors.

Fig. 19 Mean average precision, recall, and F-score for different numbers of outcome images of the color medical image database using Euclidean similarity measurement

Fig. 20 Mean average precision, recall, and F-score for different numbers of outcome images of the color medical image database using encrypted similarity measurement

10.4.1 Result comparison based on different machine learning algorithms

We repeat the experiment with the different classification mechanisms. The classification efficiencies obtained are \(94.63\%\), \(97.56\%\), \(99.45\%\), and \(99.02\%\) for the decision tree, random forest, SVM, and MLP, respectively, with the data divided into 80:20 training and testing portions. Since the SVM performs best among the studied methods, it is used for the retrieval results. The retrieved precision values for the Top-5, Top-10, Top-15, and Top-20 images are \(100\%\), \(100\%\), \(100\%\), and \(99.9\%\), respectively.

From the above experimental results analysis, we conclude that even after applying classifications, the retrieval results are not significantly improved. Implementing classification increases the computational overhead with no significant benefits.

10.5 Produce-1400 dataset based result analysis

Produce-1400 [46] has 1400 images divided into 14 different categories, with 100 images per category. In Fig. 21, we plot the average precision, recall, and F-score values for various numbers of output images using the Euclidean distance on the original feature vectors. Figure 22 shows the average results based on the encrypted similarity over secure trapdoors.

Fig. 21 Mean average precision, recall, and F-score for different numbers of outcome images of the PRODUCE-1400 database using Euclidean similarity measurement

Fig. 22 Mean average precision, recall, and F-score for different numbers of outcome images of the PRODUCE-1400 database using encrypted similarity measurement

10.5.1 Result comparison based on different machine learning algorithms

Using the different classification mechanisms, we obtain classification efficiencies of \(84.24\%\), \(95.21\%\), \(96.32\%\), and \(95.38\%\) for the decision tree, random forest, SVM, and MLP, respectively, with the data divided into 80:20 training and testing portions. Only the top-performing SVM is used for the retrieval results. The retrieved average precision values for the Top-5, Top-10, Top-15, and Top-20 images are \(99.45\%\), \(98.63\%\), \(97.12\%\), and \(90.67\%\), respectively. From this analysis, we again conclude that even after applying classification, the retrieval results are not significantly enhanced; implementing classification only increases the computational overhead.

11 Conclusion

In this article, a secure CBIR system has been proposed to retrieve color image data securely. The proposed solution comprises four entities: the image owner, the registered user, the CSP, and the KMC. The data owner can secure the transfer of his images to the cloud using the proposed image encryption technique. This encryption method is practical for real-time applications, as the number of confusion–diffusion rounds is lower than in conventional confusion–diffusion color image encryption techniques. Additionally, the image owner transmits the encrypted feature vectors obtained via ASPE, which prevents any malicious entity from learning information from the encrypted feature vectors. The proposed CBIR system works with both natural and medical images. The experimental results are promising and indicate that the scheme can be used to secure medical data. A future direction is to further enhance the retrieval performance while increasing the security level.