1 Introduction

With the advancement of cloud storage and social media on the Internet, massive numbers of digital images containing sensitive information are created and shared daily. Image encryption protects raw images from being accessed by an adversary who tries to intercept sensitive information. Compared with textual messages, digital images are characterized by high redundancy and bulk data. Popular block ciphers such as 3-DES and AES are designed to encrypt textual information, and they are not efficient enough to encrypt images [1]. Recently, applying chaos theory to image encryption has gained great attention due to the intrinsic properties of chaos, such as extreme sensitivity to initial conditions and pseudo-random behavior. Existing algorithms [2,3,4,5,6,7,8] encrypt an entire image without considering the common case in which only a certain region of the image is sensitive. We employ the notion of region of interest (ROI) to represent the sensitive region to be encrypted in an image. The ROI coordinates are referred to as ROI auxiliary information. In this paper, our research focuses on ROI-based encryption.

ROI-based encryption algorithms perform encryption on particular regions that contain the detected objects. Two main problems in detection are (1) how to accurately locate the objects and identify their categories from a vast range of categories; and (2) how to accomplish the detection process efficiently. Many object detection algorithms aim to solve these two problems; they can be classified into approaches based on geometric representations, statistical classifiers, handcrafted feature descriptors, and discriminative classifiers. In 2012, a deep learning model called AlexNet [9] achieved a qualitative leap in classification accuracy in the Large Scale Visual Recognition Challenge (ILSVRC) [10]. Since then, many follow-up works on object detection have been proposed [11,12,13,14,15,16]. These models are regarded as promising tools for ROI encryption [17,18,19,20,21,22,23].

Fig. 1

An example of leaking edge area of the detected object

1.1 Explicit motivation

Most existing ROI encryption methods focus mainly on the design of the ROI encryption strategy. Despite their success, the current usage of object detection algorithms overlooks the security flaws that the detectors themselves introduce. (a) From a security perspective, the output bounding box does not contain the entire object in most cases, so some edge areas of the object are missed. Under this circumstance, the undetected edge area of the ROI still leaks some confidential information, as shown in Fig. 1. Therefore, we need to modify existing object detection algorithms so that the bounding box contains as much of the entire object as possible. (b) The second flaw appears in the decryption process. To decrypt the ciphered ROI correctly, the ROI auxiliary information needs to be sent to the receiver side. Almost all existing ROI-based encryption algorithms send the ROI auxiliary information to the receiver directly, so any eavesdropper can easily learn the ROI positions of a targeted image. We need to encrypt the ROI auxiliary information before sending it. (c) The third concern is that most ROI encryption algorithms are designed to encrypt one object at a time. These schemes cannot be easily extended to a multi-object scenario, since protecting multiple objects in one image brings the extra difficulty of processing overlapping areas.

1.2 Our solution and contribution

To these ends, we propose a multi-object encryption algorithm based on chaos and coordinates hiding. The latest object detection algorithm, YOLOv4 [24], which has high accuracy and speed, is employed as our building block. It is modified so that each bounding box contains the entire object of an ROI. Notably, our method is generic and can be applied to other DCNN-based object detection models. In the encryption process, all the pixels in all ROI are marked, thus enabling both overlapping and non-overlapping areas to be encrypted with the same number of rounds. This step guarantees the same encryption strength for all ROI. Then, we encrypt the ROI auxiliary information and employ difference expansion to embed the ciphered coordinates into the entire image. The embedding positions are controlled by the data hiding key. Both the encryption key and the data hiding key are required to decrypt the ciphered image correctly at the receiver side. The above three parts are integrated into a hybrid-secure ROI solution. To the best of our knowledge, we are the first to analyze object detection from a security perspective and to propose a solution that protects both the ROI and its auxiliary information.

Summary of our contribution.

  • From a security perspective, the bounding boxes output by our modified YOLOv4 can contain all areas of objects.

  • We provide hybrid-protection to protect both sensitive ROI and its auxiliary information.

  • Our encryption protects multiple objects in a single pass and guarantees the same encryption strength for all of them to resist various security attacks.

  • Embedding ROI auxiliary information removes the trouble of separately distributing the image and its ROI auxiliary information.

  • For completeness in both the practical and theoretical parts, we conduct experimental evaluation and security analyses, and the results show that all ROI are well protected.

The rest of this paper is organized as follows. Section 2 reviews related works on ROI encryption, YOLO’s evolution, and reversible data hiding. In Sect. 3, we explain the security concerns caused by current DCNN-based object detection algorithms. Then, we introduce our algorithm in detail. Experimental results and security analyses are reported in Sect. 4. Section 5 concludes this paper.

2 Background and related works

This section comprises three parts: (1) Review of ROI encryption; (2) evolution of YOLO-based object detection and our choice; (3) introduction to reversible data hiding and our selected algorithm.

2.1 The ROI-based encryption algorithms

In [17], the target regions are detected by a geometric active contour model. In [18], the authors divide the medical image into several blocks and use a statistical measure on each block to determine whether it is ROI or not. One problem is the over-broad identification of ROI caused by involving meaningless regions, such as dates and tags. ROI with irregular shapes are chosen and detected arbitrarily in [19]. Yet, we need an automatic detection tool that saves time and effort. In [20], the ROI is detected by [25]. After encryption, steganography is utilized to protect the significant bits of encrypted pixels. In the decryption process, a receiver obtains an encrypted image and its corresponding ROI boundary from the sender. The concern is that the ROI boundaries are not protected during distribution. In [21, 22], the ROI is detected via a Gaussian mixture model and HOG feature extraction. In [23], YOLOv3 [26] and UNet [27] are used for ROI detection. However, the security flaws mentioned in Sect. 1 still remain.

For the ROI encryption, the security requirement should not be limited to the design of encryption algorithm. Otherwise, the encryption scheme is incomplete. Whether the detected ROI contains the entire object and the protection of ROI auxiliary information should also be considered. In this paper, our goal is to design a complete ROI encryption algorithm. The specific details of the proposed algorithms are stated in Sect. 3.1.

2.2 The evolution of YOLO-based object detection

YOLO treats object detection as a regression problem. The features of the entire image are used to make predictions. YOLO provides high accuracy and good generalization ability. Building on YOLO, many improved algorithms have been proposed, such as YOLOv2 [28], YOLOv3, YOLO3D [29], and YOLO-LITE [30].

Unlike the above algorithms, which either consume substantial hardware resources or offer unsatisfactory accuracy, YOLOv4 achieves a good trade-off between accuracy and speed with only a 1080Ti GPU. Thus, we employ it as our tool for fast detection. Its output bounding boxes are modified so that the detected ROI contains as much of the entire object as possible. Detailed operations are explained in Sect. 3.1.

2.3 The reversible data hiding

Reversible data hiding means that we can embed/extract some confidential messages losslessly into/from cover media. In recent years, many excellent reversible data hiding algorithms have been proposed [31,32,33,34,35,36,37,38]. These works aim to embed as much data as possible in the plain domain or the encrypted domain. Yet, our encryption algorithm outputs a partially encrypted image, in which only the ciphered ROI auxiliary information needs to be embedded. If we embedded it only into the plain regions or only into the encrypted regions of the image, both embedding and extraction would require knowing the boundary information of the ROI. However, the receiver cannot locate these regions before extraction, because the ROI boundaries are precisely the information being embedded.

Our goal is to design a data hiding algorithm that can embed/extract the secret information into/from any region with low computational complexity. As the amount of data to be embedded is small, we divide it into many parts and embed them at random positions across the whole image, with the embedding positions controlled by the data hiding key. Reversible data hiding using difference expansion (DE) [39] meets these requirements. The method uses the redundancy between pixel pairs to embed data. As long as the embedding condition is satisfied (no overflow or underflow after embedding a bit into the difference value of two adjacent pixels), we can embed the data regardless of the position of the pixel pair. The detailed embedding process is described in Sect. 3.3.

3 The proposed encryption algorithm

To the best of our knowledge, almost no ROI-based encryption algorithm provides security by protecting the ROI and its auxiliary information at the same time. Current schemes are designed for encrypting only one object and are not easy to extend to multi-object settings. To solve these problems, we design a scheme for a multi-object setting without leaking any ROI information.

The encryption pipeline is shown in Fig. 2a. We first use our modified YOLOv4 to get the ROI auxiliary information. Then we use \({Key_{ROI}}\) and \({Key_{eninfo}}\) to encrypt the ROI and its corresponding auxiliary information, respectively. At last, we use \({Key_{embed}}\) to embed the ciphered ROI auxiliary information into the cipher-image for obtaining the marked cipher-image. The decryption process is roughly the reverse of encryption process as shown in Fig. 2b.

This section is organized as follows. We first analyze the security flaws in existing algorithms and describe how we solve them. In the second part, the multi-object-oriented encryption algorithm is introduced. Next, we explain how we embed the ciphered ROI auxiliary information into the cipher-image. The last part is the decryption process.

Fig. 2

The working pipeline of proposed algorithm

3.1 The modified YOLOv4

Fig. 3

The results of object detection. a The person image, b The detection result of YOLOv3, c The detection result of YOLOv4, d The detection result of our modified YOLOv4

Take Fig. 3 as an example. Figure 3a–d shows the plain-image, YOLOv3’s output, YOLOv4’s output, and our modified YOLOv4’s output, respectively. The outputs of a YOLO-based object detection model here are three bounding boxes, each of which carries five predictions: the center point coordinates, the width and height of the bounding box, and a confidence score. Here, the confidence score indicates the probability that a box contains an object.

The three bounding boxes are selected from all the bounding boxes predicted by the model. The selection uses Non-Maximum Suppression (NMS). Suppose that there are nc object classes, \(C=\{c_0, c_1, ..., c_{{nc}-1}\}\), and nb bounding boxes, \(B = \{{box}_0, {box}_1, ..., {box}_{{nb}-1}\}\). For object class \(c_0\), the model sorts these boxes in descending order according to the probability that the object class contained in each box is \(c_0\). Then, we get \({BSorted} = \{{boxSorted}_0, {boxSorted}_1, ..., {boxSorted}_{{nb}-1}\}\). Next, we calculate the IOU between \({boxSorted}_0\) and each of the other boxes. If the IOU is larger than a user-defined threshold, the model determines that the two boxes predict the same object at the same location. The smaller of the two predicted probabilities is set to 0, which means the corresponding box is deprecated. For the rest of the boxes, \(\{{boxSorted}_1, {boxSorted}_2, ..., {boxSorted}_{{nb}-1}\}\), the above process is repeated to select the best box among those that predict the same object. For the other classes, \(\{c_{1}, c_2, ..., c_{{nc}-1}\}\), the above process is repeated to get the best boxes that predict a specific object with a specific class. As shown in Fig. 3b and c, the three bounding boxes selected by YOLOv3 and YOLOv4 are the best boxes that contain the dog, person, and horse.

The detection results of Fig. 3b and c are acceptable in terms of accuracy and speed. However, there is a risk of leaking information in the edge areas. The reason is that some bounding boxes covering the edge areas of the object are deprecated when performing NMS. To solve this security problem, the NMS process is modified to obtain a bounding box that contains the entire object. We call this kind of bounding box a greedy box.

The detailed process is described in Alg. 1. Here, nc, nb, thresh, and detboxes represent the number of object classes, the number of detected bounding boxes, the value of the user-defined threshold, and the array of detected bounding boxes, respectively. Each element of detboxes is a structure with two members: a bounding box (x, y, w, h) and the probabilities of the detected object belonging to each class. The process in lines 2-7 is the same as that described in the second paragraph of this subsection. The core idea is to integrate the bounding boxes that predict the same object in the same position. For a specific object in a specific position, gB is the greedy box. At first, gB is initialized to the bounding box that predicts the object best. Then we traverse the other bounding boxes. When gB meets another bounding box, b, it compares its boundaries with those of b and keeps the smaller left boundary, the larger right boundary, the smaller top boundary, and the larger bottom boundary. Note that coordinates start at 0 and increase from left to right and from top to bottom, so the coordinate values on the left and top are smaller. According to the new boundary values, gB calculates its new values of (x, y, w, h). Then, b is deprecated. After the traversal is over, the model outputs the greedy bounding box, gB, that contains the entire object. The detection result of our greedy detection algorithm is shown in Fig. 3d. Compared with YOLOv3 and YOLOv4, our detection algorithm successfully contains the entire object, so the experimental result is acceptable from a security perspective.
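The greedy-box idea can be illustrated with a minimal Python sketch. It is not a transcription of Alg. 1: the names `Detection`, `to_edges`, `iou`, and `greedy_nms` are hypothetical, boxes are assumed to be center-based (x, y, w, h) tuples, and we assume that a box is merged into gB whenever its IOU with the best-scoring box exceeds the threshold.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h); (x, y) is the box center


@dataclass
class Detection:
    box: Box
    probs: List[float]  # one probability per object class


def to_edges(box: Box):
    """Convert (x, y, w, h) to (left, top, right, bottom)."""
    x, y, w, h = box
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


def iou(a: Box, b: Box) -> float:
    al, at, ar, ab = to_edges(a)
    bl, bt, br, bb = to_edges(b)
    iw = max(0.0, min(ar, br) - max(al, bl))
    ih = max(0.0, min(ab, bb) - max(at, bt))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets: List[Detection], nc: int, thresh: float):
    """Return (class index, greedy box) pairs.

    Instead of discarding boxes that overlap the best box, their extents
    are merged into the greedy box before they are deprecated.
    """
    results = []
    for c in range(nc):
        # Sort box indices by descending probability of class c.
        order = sorted(range(len(dets)), key=lambda i: dets[i].probs[c], reverse=True)
        alive = [dets[i].probs[c] > 0 for i in order]
        for p, i in enumerate(order):
            if not alive[p]:
                continue
            left, top, right, bottom = to_edges(dets[i].box)  # gB starts as the best box
            for q in range(p + 1, len(order)):
                j = order[q]
                if alive[q] and iou(dets[i].box, dets[j].box) > thresh:
                    jl, jt, jr, jb = to_edges(dets[j].box)
                    left, top = min(left, jl), min(top, jt)
                    right, bottom = max(right, jr), max(bottom, jb)
                    alive[q] = False  # b is deprecated after being merged
            results.append((c, ((left + right) / 2, (top + bottom) / 2,
                                right - left, bottom - top)))
    return results
```

With two overlapping detections of the same object, the returned box spans the union of their extents rather than only the higher-scoring box, which is the property the security argument relies on.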

3.2 The encryption of multiple objects

The encryption process comprises two stages, permutation and diffusion. In previous works, most encryption structures are designed for encrypting a single object. If we utilize such an algorithm to encrypt multiple objects, the encryption influence of each object is confined within itself and does not spread to the other objects. As a result, the algorithm lacks the avalanche effect, that is, the property that tiny changes in the plaintext or the key greatly impact the ciphertext. Another problem is that if we encrypt objects one by one, the overlapping regions of multiple objects are encrypted repeatedly. Consequently, the encryption strength of overlapping regions differs from that of non-overlapping regions.

To solve the above problems, we design an encryption algorithm for protecting multiple objects. Our permutation strategy can swap the position of a pixel with that of another pixel in any region, achieving a total shuffling of the pixels in all bounding boxes. In the diffusion stage, the encryption influence of one region can be spread to the other regions.

figure a
Fig. 4

The encryption structure

The encryption scheme is depicted in Fig. 4, and the detailed encryption process is described in Alg. 2. Here numROI represents the number of ROI, roiBoxes. ImgDat is the 2-D image data, and \({(x_0, y_0, z_0, u_0)}\) is the initial condition of the Jia system. To simplify the calculation of coordinates during encryption, we first extract the pixels in each bounding box into a 1-D array, rdt. To avoid processing the pixels in overlapping regions repeatedly, a flag is set for each pixel to record whether it has already been read during extraction. This process corresponds to lines 2-11 of Alg. 2. During encryption, the Jia system [40] is iterated to generate the keystreams for permutation and diffusion. Mathematically, the system is defined by,

$$\begin{aligned} \left\{ \begin{aligned}&\frac{dx}{dt} = -a(x-y)+u, \\&\frac{dy}{dt} = -xz + rx - y, \\&\frac{dz}{dt} = xy - bz,\\&\frac{du}{dt} = -xz + du, \end{aligned} \right. \end{aligned}$$
(1)

where a, r, and b are the system parameters, and d is the control parameter. When \(a = 10, r = 28, b = 8/3\) and \(0.85< d < 1.3\), the system exhibits chaotic behavior. The fourth-order Runge–Kutta method is used to solve Eq. (1), with a step length of 0.0005.
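As a minimal sketch of how Eq. (1) can be iterated numerically, the following Python code applies the classical fourth-order Runge–Kutta step with step length 0.0005. The function names are hypothetical, and the control parameter value d = 1.0 is only an assumption chosen inside the chaotic range \(0.85< d < 1.3\).

```python
def jia_derivative(state, a=10.0, r=28.0, b=8.0 / 3.0, d=1.0):
    """Right-hand side of Eq. (1); d = 1.0 is an assumed value in (0.85, 1.3)."""
    x, y, z, u = state
    return (
        -a * (x - y) + u,
        -x * z + r * x - y,
        x * y - b * z,
        -x * z + d * u,
    )


def rk4_step(state, h=0.0005, **params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = jia_derivative(state, **params)
    k2 = jia_derivative(tuple(s + 0.5 * h * k for s, k in zip(state, k1)), **params)
    k3 = jia_derivative(tuple(s + 0.5 * h * k for s, k in zip(state, k2)), **params)
    k4 = jia_derivative(tuple(s + h * k for s, k in zip(state, k3)), **params)
    return tuple(
        s + h / 6.0 * (p + 2.0 * q + 2.0 * m + n)
        for s, p, q, m, n in zip(state, k1, k2, k3, k4)
    )


def iterate(state, steps, **params):
    """Apply rk4_step repeatedly and return the final state (x, y, z, u)."""
    for _ in range(steps):
        state = rk4_step(state, **params)
    return state
```

Calling `iterate(initial_state, T0)` corresponds to discarding the first \(T_0\) iterations; how the state variables are then quantized into keystream integers is not specified here and is left out of the sketch.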

The whole permutation process is shown in lines 12-20. The Jia system is pre-iterated \(T_0\) times to avoid the harmful effect of the transient process, where \(T_0\) is a user-defined value. Then, in the 1-D array rdt, we perform the pixel swapping strategy, in which each pixel swaps positions with a random pixel behind it. The permutation coordinates are extracted from the generated chaotic sequences. The diffusion process is shown in lines 21-28. \({cdt}_{-1}\) is a user-defined value, set to 128 here. L is the gray level; for a 24-bit RGB color image, \(L = 256\).
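The pixel swapping strategy can be sketched as follows. This is an illustration under assumptions rather than the exact permutation of Alg. 2: the keystream is taken as a list of non-negative integers (its derivation from the chaotic sequences is not reproduced), "behind" is read as "at a strictly larger index", and the function names are hypothetical.

```python
def permute(pixels, keystream):
    """Swap each pixel with a keystream-selected pixel at a larger index."""
    out = list(pixels)
    n = len(out)
    for i in range(n - 1):
        j = i + 1 + keystream[i] % (n - i - 1)  # target index in (i, n - 1]
        out[i], out[j] = out[j], out[i]
    return out


def unpermute(pixels, keystream):
    """Undo permute() by replaying the same swaps in reverse order."""
    out = list(pixels)
    n = len(out)
    for i in range(n - 2, -1, -1):
        j = i + 1 + keystream[i] % (n - i - 1)
        out[i], out[j] = out[j], out[i]
    return out
```

Because each swap is its own inverse, decryption only needs the same keystream and the reversed traversal order.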

The above process can be applied for several rounds to obtain a satisfactory encryption effect. The encryption effect is shown in Fig. 5, where it can be seen that all parts of the objects are protected. Figure 6 depicts more intuitively how our encryption algorithm spreads the encryption influence of a certain region to the other regions.
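The flag-based pixel extraction described earlier in this subsection, which lets overlapping and non-overlapping areas receive the same number of encryption rounds, can be sketched as below. The image is assumed to be a 2-D list of pixel values and each ROI is assumed to be given as (left, top, right, bottom) with exclusive right and bottom edges; both conventions and the function names are hypothetical.

```python
def extract_roi_pixels(img, roi_boxes):
    """Collect every ROI pixel exactly once into a 1-D list.

    Returns the pixel values (rdt) and their (row, col) coordinates,
    so that processed values can be written back later.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]  # one flag per pixel
    rdt, coords = [], []
    for left, top, right, bottom in roi_boxes:
        for row in range(max(0, top), min(h, bottom)):
            for col in range(max(0, left), min(w, right)):
                if not seen[row][col]:  # skip pixels already read via another box
                    seen[row][col] = True
                    rdt.append(img[row][col])
                    coords.append((row, col))
    return rdt, coords


def write_back(img, rdt, coords):
    """Return a copy of img with the values in rdt placed at coords."""
    out = [row[:] for row in img]
    for value, (row, col) in zip(rdt, coords):
        out[row][col] = value
    return out
```

Since all ROI pixels end up in one array, permutation and diffusion operate across box boundaries, and a pixel shared by two boxes is processed once per round.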

figure b
Fig. 5

The cipher-image with encrypted ROI

Fig. 6

The process of encrypting multiple objects

3.3 The protection of ROI auxiliary information using reversible data hiding

In this subsection, we first give the basic concept of the reversible data hiding using DE, and present the idea of using DE to embed/extract the ROI auxiliary information.

The core idea of difference expansion is the integer Haar wavelet transform. Given a pixel pair (x, y), we define the integer average, intAver, and the difference value, diffV, by

$$\begin{aligned} {intAver} = \lfloor \frac{x+y}{2}\rfloor , {diffV} = x - y. \end{aligned}$$
(2)

The inverse of Eq. (2) is

$$\begin{aligned} x = {intAver} + \lfloor \frac{{diffV}+1}{2}\rfloor , y = {intAver} - \lfloor \frac{{diffV}}{2}\rfloor . \end{aligned}$$
(3)

For an 8-bit gray image, the gray level is \(L = 256\) and the pixel values lie in [0, 255], so we require

$$\begin{aligned} \begin{matrix} 0 \le {intAver} + \lfloor \frac{{diffV}+1}{2}\rfloor \le 255,\\ 0 \le {intAver} - \lfloor \frac{{diffV}}{2}\rfloor \le 255. \end{matrix} \end{aligned}$$
(4)

And Eq. (4) is equivalent to

$$\begin{aligned} \left\{ \begin{aligned}&\left| {diffV}\right| \le 2(255 - {intAver}), {if}\, 128 \le {intAver} \le 255\\&\left| {diffV}\right| \le 2{intAver}+1, {if}\, 0 \le {intAver} \le 127.\\ \end{aligned} \right. \end{aligned}$$
(5)

Next we embed a bit b into diffV, and we get new \({diffV_{{new}}} = 2\times {diffV} + b\). According to Eq. (5), if

$$\begin{aligned} \left| {diffV_{{new}}} \right| \le {min}(2(255 - {intAver}), 2{intAver}+1), \end{aligned}$$

then we call such a diffV expandable, and we can embed the ciphered ROI auxiliary information into the difference values of such pixel pairs. The new values of the pixel pair are

$$\begin{aligned} x' = {intAver} + \lfloor \frac{{diffV_{{new}}}+1}{2}\rfloor , y' = {intAver} - \lfloor \frac{{diffV_{{new}}}}{2}\rfloor . \end{aligned}$$
(6)

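During the extraction process, we first recompute the integer average and the expanded difference from the marked pixel pair, \({intAver} = \lfloor \frac{x'+y'}{2}\rfloor\) and \({diffV_{{new}}} = x' - y'\); the integer average is unchanged by the embedding. The embedded bit is the least significant bit of \({diffV_{{new}}}\), and the original difference value is \({diffV} = \lfloor \frac{{diffV_{{new}}}}{2}\rfloor\). Then we can use Eq. (7), which has the same form as Eq. (3), to restore the original pixel values,

$$\begin{aligned} x = {intAver} + \lfloor \frac{{diffV}+1}{2}\rfloor , y = {intAver} - \lfloor \frac{{diffV}}{2}\rfloor . \end{aligned}$$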
(7)
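The embedding and extraction steps of difference expansion for a single pixel pair can be sketched in Python as follows. The function names are hypothetical; the arithmetic follows Eqs. (2), (3), and (6) together with the expandability condition, using Python's floor division for \(\lfloor \cdot \rfloor\) (which also floors negative values).

```python
def de_embed(x, y, bit):
    """Embed one bit into the pixel pair (x, y).

    Returns the marked pair (x', y'), or None if the difference value
    is not expandable (embedding would overflow or underflow).
    """
    avg = (x + y) // 2            # integer average, Eq. (2)
    diff = x - y                  # difference value, Eq. (2)
    new_diff = 2 * diff + bit     # expanded difference carrying the bit
    if abs(new_diff) > min(2 * (255 - avg), 2 * avg + 1):
        return None
    return avg + (new_diff + 1) // 2, avg - new_diff // 2  # Eq. (6)


def de_extract(x_marked, y_marked):
    """Recover (x, y, bit) from a marked pixel pair."""
    avg = (x_marked + y_marked) // 2   # unchanged by the embedding
    new_diff = x_marked - y_marked
    bit = new_diff & 1                 # least significant bit
    diff = new_diff // 2               # original difference value
    return avg + (diff + 1) // 2, avg - diff // 2, bit
```

For example, embedding the bit 1 into the pair (100, 98) yields the marked pair (102, 97), and extraction returns (100, 98, 1), so the cover pixels are restored exactly.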

The process of embedding the ciphered ROI auxiliary information is described in Alg. 3. Here, roiInfo represents the plain ROI auxiliary information, and cimgRDat is the cipher-image with encrypted ROI. H and W are the height and width of the image. The length of the image data is \({len}_{{img}}\), whose value is \(3\times H\times W\). \({lgx}_0\) is the initial value of the logistic map [41], which is defined by

$$\begin{aligned} x_{n+1} = \mu x_n(1 - x_n), x_n\in (0, 1) \end{aligned}$$
(8)

where \(x_n\) is the state variable, and \(\mu\) is control parameter whose range is (0, 4]. When \(\mu = 4\), the logistic map has the best pseudo-randomness.
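A minimal sketch of iterating the standard logistic map \(x_{n+1} = \mu x_n(1 - x_n)\) with \(\mu = 4\) is given below. The function name is hypothetical, and the way Alg. 3 converts the map's state into embedding positions is not reproduced here.

```python
def logistic_sequence(x0, n, mu=4.0):
    """Return n successive states of the logistic map started from x0."""
    seq = []
    x = x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        seq.append(x)
    return seq
```

Because the initial value acts as part of the data hiding key, a receiver that starts from a different initial value obtains a different state sequence and therefore different embedding positions.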

The data type of each ROI coordinate value is declared as integer, which needs 64 bits to represent. They are stored in roiInfo in bytes. Next, a one-bit bitmap of the image with ciphered ROI is generated. If the difference value of a pixel pair meets the requirement for embedding a bit, the value in its corresponding position of the bitmap is set to 1, otherwise 0. After the encryption of roiInfo, the ciphered data are embedded into the image bit by bit according to the bitmap. The embedding position is determined by the current values of logistic map and the embedding interval.

figure c

3.4 The decryption process

The decryption process is roughly the reverse of the encryption process, as shown in Fig. 2b. The decryption result is shown in Fig. 7. In particular, the reverse of the diffusion operation in line 27 of Alg. 2 is given by

$$\begin{aligned} \begin{matrix} {rsfdt}_i = \{{ksdf}_i\oplus {cdt}_i\oplus {cdt}_{i-1}+L- {ksdf}_i \} \mod L. \end{matrix} \end{aligned}$$
(9)
Fig. 7

The decryption result of our proposed algorithm
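Equation (9) can be checked with a small round-trip sketch. The forward diffusion rule used here, \({cdt}_i = {ksdf}_i \oplus (({sfdt}_i + {ksdf}_i) \bmod L) \oplus {cdt}_{i-1}\), is an assumption inferred from the form of Eq. (9), since the corresponding line of the algorithm listing is not reproduced in the text; the function names are hypothetical, and the keystream is assumed to hold integers in [0, 255].

```python
L = 256          # gray level
CDT_INIT = 128   # cdt_{-1}, the user-defined initial value


def diffuse(sfdt, ksdf):
    """Assumed forward diffusion: each cipher value depends on the previous one."""
    cdt, prev = [], CDT_INIT
    for pixel, key in zip(sfdt, ksdf):
        prev = key ^ ((pixel + key) % L) ^ prev
        cdt.append(prev)
    return cdt


def undiffuse(cdt, ksdf):
    """Inverse diffusion following Eq. (9)."""
    out, prev = [], CDT_INIT
    for cipher, key in zip(cdt, ksdf):
        out.append(((key ^ cipher ^ prev) + L - key) % L)
        prev = cipher
    return out
```

Under this assumed rule, changing one pixel alters every later cipher value in the array, which is how the influence of one region reaches the regions extracted after it.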

4 Experimental results and security analysis

4.1 Key space analysis

For the proposed algorithm, the key space comprises the four initial conditions of the Jia system. Each of them is declared as a double-precision value, whose significand carries 53 bits of precision. So the key space is \(2^{53\times 4} = 2^{212}\). It can be considered secure against brute-force attack, as the key space is larger than \(2^{100}\) [42].

4.2 Statistical attack

4.2.1 Histogram analysis

Fig. 8

The histograms of three plain-regions and their corresponding cipher-regions. a–c are the histograms of the plain-person, plain-horse and plain-dog, respectively. d–f are the histograms of the cipher-person, cipher-horse and cipher-dog, respectively

From a qualitative perspective, we carry out histogram analysis to evaluate the frequency distributions of pixel values in three plain-regions and their corresponding cipher-regions. Figure 8a–c and d–f depict the 3D histograms of the three plain-regions and their corresponding cipher-regions, respectively. We can see that, compared with the plain-regions, the frequency distributions of the cipher-regions are almost uniform. This means that our algorithm performs well in masking the pixel distribution.

4.2.2 Information entropy analysis

From a quantitative perspective, information entropy is used to measure the randomness and unpredictability of three plain-regions and their corresponding cipher-regions. Mathematically, it is defined by

$$\begin{aligned} H({inS}) = -\sum _{i=0}^{N-1} P({inS}_i)\log _2 P({inS}_i), \end{aligned}$$
(10)

where inS represents an information source which contains N possible values \(\{{inS}_0, {inS}_1,..., {inS}_{N-1}\}\) and the probability of \({inS}_i\) is \(P({inS}_i)\). If inS is a random information source, its information entropy is \(\log _2 N\). The gray level of the test image is 256, so the information entropy of the three cipher-regions should be close to 8. Table 1 lists the information entropy of the three plain-regions and their corresponding cipher-regions. From this table, we can see that the information entropy of each of the three cipher-regions is very close to 8, indicating that the pixel distributions of the three plain-regions are successfully hidden.
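The entropy in Eq. (10) can be computed from the empirical frequencies of the pixel values, as in the following sketch. The function name is hypothetical, and the region is assumed to be given as a flat sequence of values of one color channel.

```python
import math
from collections import Counter


def information_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A region in which all 256 gray values occur equally often attains the maximum of 8 bits, while a constant region has an entropy of 0.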

Table 1 The information entropy of the three plain-regions and their corresponding cipher-regions

4.2.3 Correlation of adjacent pixels analysis

Pixels usually have values similar to those of their neighbors, which is a sign of strong correlation among them. An effective encryption algorithm must eliminate this correlation; otherwise, an attacker can easily predict the pixel values of a certain region with a simple predictor, such as the MED predictor or the GAP predictor. The strength of correlation can be measured by calculating the correlation coefficient among adjacent pixels, which is defined by

$$\begin{aligned} r_{xy} = \frac{\frac{1}{N}\sum _{i=1}^N (x_i - {\bar{x}})(y_i-{\bar{y}})}{\sqrt{(\frac{1}{N}\sum _{i=1}^N (x_i - {\bar{x}})^2) (\frac{1}{N}\sum _{i=1}^N (y_i - {\bar{y}})^2)}}, \end{aligned}$$
(11)

where \(x_i, y_i\) represent the values of two adjacent pixels, \({\bar{x}} = \frac{1}{N} \sum _{i=1}^{N} x_i\), \({\bar{y}} = \frac{1}{N} \sum _{i=1}^{N} y_i\), and N is the number of sampled pixel pairs. In the three color channels of each plain-region and its corresponding cipher-region, 5000 pairs of neighboring pixels are sampled in the horizontal, vertical, and diagonal directions. Table 2 reports the test results. From this table, we can see that the correlation coefficients are close to 1 in each plain-region, while those of each cipher-region are close to 0. This implies that our encryption algorithm successfully removes the strong correlation present in each plain-region.
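Equation (11) can be evaluated with the short sketch below. For simplicity it uses every adjacent pair in a channel rather than 5000 randomly sampled pairs; that simplification, the function names, and the representation of a channel as a 2-D list are assumptions made for illustration.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient of paired samples, following Eq. (11)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


def adjacent_pairs(channel, dy, dx):
    """Pair each pixel with its neighbor at offset (dy, dx).

    (0, 1) is horizontal, (1, 0) is vertical, and (1, 1) is diagonal.
    """
    h, w = len(channel), len(channel[0])
    xs, ys = [], []
    for row in range(h - dy):
        for col in range(w - dx):
            xs.append(channel[row][col])
            ys.append(channel[row + dy][col + dx])
    return xs, ys
```

For a smooth gradient region the coefficient is 1 in every direction, whereas for a well-encrypted region it should be close to 0.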

Table 2 The correlation coefficients of three plain-regions and the corresponding cipher-regions

Scatter diagrams are commonly used to analyze the correlation among adjacent pixels from a qualitative perspective. The pixel pairs sampled from the red channel of each region are plotted in 3D scatter diagrams, as depicted in Fig. 9. On the X-axis of each scatter diagram, \(x = 0, 1, 2\) represent the horizontal, vertical, and diagonal directions, respectively. Then, each sampled pixel pair \((x_i, y_i)\) is plotted as a point in the Y-Z plane. The values of \(x_i\) and \(y_i\) determine the positions on the Y-axis and Z-axis, respectively. Figure 9a, c, and e depict the scatter diagrams of plain-person, plain-horse and plain-dog, respectively. We can see that, in each Y-Z plane, most points lie along the diagonal line, showing a strong correlation among neighboring pixels in plain-regions. Figure 9b, d, and f depict the scatter diagrams of their corresponding cipher-regions. We can see that these points are distributed evenly over the entire Y-Z plane. Similar results can be obtained for the other two color channels in each region. This phenomenon shows the weak correlation among adjacent pixels in the three cipher-regions.

Fig. 9

The scatter diagrams of three detected regions in red channel. a and b are the scatter diagrams of plain-person and cipher-person. c and d are the scatter diagrams of plain-horse and cipher-horse. e and f are the scatter diagrams of plain-dog and cipher-dog

Fig. 10

The key sensitivity analysis in decryption process. A–D are the decrypted results using modified keys

Table 3 The results of NPCR and UACI test
Table 4 Key sensitivity analysis of encryption process

The test results in Sects. 4.2.1–4.2.3 show that our ROI encryption algorithm can resist statistical attack.

4.3 Differential attack

To resist differential attack, if we input two plain-images with only 1 bit difference in one of three ROI, the corresponding cipher-regions in two output cipher-images should be completely different.

There are two criteria, NPCR (the number of pixels change rate) and UACI (the unified average changing intensity), for measuring the degree of difference between two images/ROI of the same size. NPCR is defined by

$$\begin{aligned} {NPCR} = \frac{\sum _{i=1}^{W}\sum _{j=1}^{H}{Dffe}(i, j)}{W\times H}\times 100\%, \end{aligned}$$
(12)

where

$$\begin{aligned} {Dffe}(i, j) =\left\{ \begin{matrix} 0\quad {if}\;R_1(i, j) = R_2(i, j),\\ 1\quad {if}\;R_1(i, j) \ne R_2(i, j). \end{matrix}\right. \end{aligned}$$
(13)

UACI is defined by

$$\begin{aligned} {UACI} = \frac{\left[ \sum _{i=1}^{W}\sum _{j=1}^{H}\frac{\left| R_1(i, j)-R_2(i, j)\right| }{L-1}\right] }{W\times H}\times 100\%. \end{aligned}$$
(14)

Here, \(R_1\) and \(R_2\) denote the two images/ROI being compared, each of size \(W\times H\). For two random images/ROI (gray level \(L=256\)), the theoretical values of NPCR and UACI are \(99.609\%\) and \(33.464\%\), respectively.
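The two criteria and their theoretical values can be reproduced with the following sketch. The function names are hypothetical, regions are assumed to be 2-D lists of 8-bit values from one channel, and UACI is normalized by the maximum pixel value 255. The expected values are obtained by averaging over all pairs of independent, uniformly distributed gray values.

```python
def npcr(r1, r2):
    """Percentage of positions at which the two regions differ."""
    total = sum(len(row) for row in r1)
    changed = sum(a != b for ra, rb in zip(r1, r2) for a, b in zip(ra, rb))
    return 100.0 * changed / total


def uaci(r1, r2, levels=256):
    """Average absolute difference, normalized by the maximum value levels - 1."""
    total = sum(len(row) for row in r1)
    diff = sum(abs(a - b) for ra, rb in zip(r1, r2) for a, b in zip(ra, rb))
    return 100.0 * diff / ((levels - 1) * total)


def expected_values(levels=256):
    """Expected NPCR and UACI for two independent uniformly random regions."""
    pairs = levels * levels
    exp_npcr = 100.0 * (1.0 - 1.0 / levels)
    abs_sum = sum(abs(i - j) for i in range(levels) for j in range(levels))
    exp_uaci = 100.0 * abs_sum / (pairs * (levels - 1))
    return exp_npcr, exp_uaci
```

For 256 gray levels, `expected_values()` gives approximately 99.609 and 33.464, matching the theoretical values quoted above.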

We first use the secret key to encrypt the three plain-regions and get their corresponding cipher-regions. Then, we randomly select a pixel, whose coordinates are (501, 70) in plain-person, and modify the value of its red channel from 129 to 130. Next, we use the same key to encrypt the three modified plain-regions and get their corresponding cipher-regions. Finally, we calculate NPCR and UACI between the two sets of cipher-regions. The test results are reported in Table 3, from which we can see that the values of the two criteria are very close to the theoretical values. This implies that the two sets of cipher-regions are completely different, demonstrating a strong ability to resist differential attack. Three rounds of encryption are used to achieve this encryption effect.

4.4 Key sensitivity analysis

A well-designed encryption algorithm should be extremely sensitive to the secret key. The key sensitivity is tested using the most extreme case. For each test case, only the least significant bit is changed in a key component, and the other three key components remain unchanged. Then, each modified secret key is used to encrypt/decrypt the plaintext/ciphertext produced by the original key. If the proposed algorithm has good key sensitivity in encryption process, the output cipher-regions corresponding to different keys should be completely different. And the corresponding case in decryption process is that the cipher-regions cannot be correctly decrypted with the wrong key.

4.4.1 Encryption key sensitivity analysis

The initial values of the encryption key are \((6.13455323257449, -6.76623087823196, 7.52223762673178, 6.22045403584687)\). We use each modified key to encrypt the three plain-regions and calculate NPCR and UACI between the output cipher-regions and the cipher-regions produced with the original key. The modified key values and the corresponding test results of NPCR and UACI are reported in Table 4. From this table, we can see that the values of NPCR and UACI are very close to the theoretical values, showing the strong encryption key sensitivity of our algorithm.

4.4.2 Decryption key sensitivity analysis

In the decryption process, the modified keys listed in Table 4 are used to decrypt the cipher-regions encrypted by the original key. The decryption results are shown in Fig. 10. We can see that the cipher-regions cannot be decrypted correctly, showing strong decryption key sensitivity.

It should be noted that the decryption process depends on the decryption key and data hiding key. The receiver cannot decrypt the cipher-region without the correct data hiding key.

4.5 Comparison of proposed work and state-of-the-art algorithms

Table 5 Comparison of proposed work and state-of-the-art algorithms

We compare our work with some similar state-of-the-art algorithms [18,19,20, 22, 23], which are reviewed in Sect. 2.1. The key points of comparison are automatic detection and the protection of objects, edge areas, and ROI information. Table 5 lists the comparison results. It can be seen that our algorithm outperforms the others in the protection of ROI coordinate information. Moreover, the object detection model is modified to protect all the areas of a detected object, instead of using existing models without change.

4.6 Limitation and discussion

Although the Jia system provides a large key space, its iteration is time consuming when many images need to be encrypted. Recently, some 2D discrete chaotic systems with simple structures and continuous chaotic ranges have been developed [43, 44]. For example, the authors of [44] proposed a 2D modular chaotification system and proved that it significantly improves the chaos complexity and enlarges the chaotic ranges of existing 2D chaotic maps. Such 2D chaotic systems would be convenient for encrypting a large number of images.

5 Conclusion and future work

In this paper, we protect the image ROI using chaos-based encryption and DCNN-based object detection. We first analyze the security problems in the object detection process. As the existing object detection algorithm fails to contain all parts of the detected objects, we modify the detection process so that the output bounding boxes contain the whole objects. Then we propose an encryption algorithm for protecting multiple detected objects, ensuring that the encryption effect of each cipher-region meets the security requirements. After that, we encrypt the ROI auxiliary information and embed it into the whole image using reversible data hiding. The encryption and embedding of the ROI auxiliary information can be seen as a second layer of ROI protection, and it also saves the trouble of distributing this information separately. The experimental results and security analyses show that our proposed encryption scheme is secure and well suited to ROI encryption.

In future work, we would explore the use of access control encryption (ACE) in object detection. ACE [45, 46] decides not only what users are allowed to read but also what users are allowed to write. It can be constructed by attribute-based encryption, in a way that senders with the correct knowledge (e.g., a secret key) can transmit data to a restricted recipient with particular attributes. Object detection supports picking regions of interest/importance/sensitivity that should be limited in both reading and writing. Extending ACE usage to practical life would be promising in ROI scenarios.