1 Introduction

Fragile watermarking aims to check integrity and authenticity of digital contents and to locate the areas replaced with fake information. While block-wise schemes have been developed to detect blocks containing fake contents [4, 5], some pixel-wise schemes can accurately locate the modified pixels when the tampered area is not too extensive [6]. If the embedded watermark data are derived both from pixels and blocks, a receiver can first identify the tampered blocks and then use the watermark hidden in the rest blocks to find the detailed modification pattern [7]. Since it takes advantages of both block-wise and pixel-wise techniques, the performance in locating tempered pixels is better than that of the method presented in [6]. Furthermore, in some self-embedding approaches, the embedded watermark is related to the host image content so that the original content in tampered area can be reconstructed. For example, the principal DCT coefficients or a low color depth version of the original contents can be embedded into the least significant bits (LSB), or into differences between pixels at different positions [1]. After locating malicious modification in a watermarked image, information extracted from the reserved regions is exploited to recover the principal content of the tampered areas. In [8], the embedded data are exclusive-OR between a pseudo-random sequence and the polarity sequence of DCT coefficients. Similarly, rough retrieval of the original content in the tampered areas can be obtained by iterative projections of the polarity information on a convex set. In [2], values of the first 11 DCT coefficients in each block are embedded into the LSB of another block and used to recover the content of tampered blocks.

In these self-embedding methods, the data representing the principal content in a region are always hidden in another region of the image. Thus restoration will fail when certain region and the region containing its original information are both tampered. We call it a tampering coincidence problem. If the content restoration is successful, the recovered content is derived from the data hidden in the corresponding region. That means the quality of restoration is the same no matter the tampered area is small or extensive.

We propose a new self-embedding watermarking scheme with flexible restoration quality. In this scheme, a block-wise mechanism for tampered area localization and a pixel-wise mechanism for original content recovery are integrated, and the reference data produced by exclusive-OR operation on the original MSB are embedded into the LSB planes. At authentication side, after identifying the tampered blocks, reference data extracted from other regions and the spatial correlation are exploited to recover the principal content in the tampered area in a pixel-by-pixel manner. Since the data representing the content of a block are scattered and embedded into the entire image, the tampering coincidence problem is avoided. Using this method, the smaller the tampered area, the more reference data can be extracted, leading to a higher the quality of the recovered content.

2 Watermark embedding procedure

In the scheme, the 5 most significant bits (MSB) of all pixels in the host image are kept unchanged, and the 3 least significant bits (LSB) of all pixels are replaced with watermark data. Here, the watermark data are determined by the MSBs and made up of two parts, which are respectively used to locate tampered blocks and to recover the original content.

The detailed steps are as follows:

  1. 1.

    Denote the numbers of rows and columns in an original image as N 1 and N 2, the total number of pixels as N (N = N 1 × N 2), and the gray pixel-values p n ∈ [0, 255], n = 1, 2, ... , N. Here, we assume that both N 1 and N 2 are multiples of 8. Each p n can be represented by 8 bits, b n,7, b n,6, ..., b n,0, where

$$ {b_{n,m}} = \left\lfloor {{{{p_n}} \mathord{\left/{\vphantom {{{p_n}} {{2^m}}}} \right.} {{2^m}}}} \right\rfloor \;\bmod \;2\;,\quad m = 0,\,1,\, \ldots \,,\,7 $$
(1)

We permute the N pixels, and the way of permutation is determined by a secret key. The permuted pixels are then divided into a series of pixel-pairs, each of which containing two pixels. Therefore the number of pixel-pairs is N/2. For each pixel-pair, denote its two pixels as p i and p j , and obtain 5 bits by executing exclusive-OR operations between the higher MSB of a pixel and the lower MSB of another one,

$$ {r_m} = {b_{i,7 - m}} \oplus {b_{j,m + 3}}\;,\quad m = 0,\,1, \ldots, \,4 $$
(2)

We call the calculated bits as reference-bits. So, we get a total of 5N/2 reference-bits from N/2 pixel-pairs. Then, we segment the N/2 pixel-pairs into N/64 sets, each of which containing 32 pixel-pairs. For each pixel-pair set, there are 160 produced reference-bits, and we call them as a reference-bit group. The number of reference-bit groups is also N/64.

  1. 2.

    Divide the original image into N/64 non-overlapped blocks sized 8 × 8. In each block, we pseudo-randomly select 160 positions from the 192 bits in the 3 LSB-layers according to the secret key. The total number of selected LSB is also 5N/2. In order to enhance security, the LSB selections in different blocks should be mutually different. After mapping the reference-bit-groups to the blocks in a one-to-one manner, replace the original bits at the 160 selected positions in each block with the reference-bits in the corresponding group.

  2. 3.

    Randomly generate 32 bits I(1), I(2), ..., I(32) for the host image and call them as image-ID-bits. For each block, we collect the 320 original bits in the 5 MSB-layers, and the 160 reference-bits used to replace the selected LSBs. Also, we convert the row index of the block i and the column index of the block j into 64 bits and call them position-bits (1 ≤ i ≤ N 1/8, 1 ≤ j ≤ N 2/8). Then, feed the 320 MSBs, 160 reference-bits and 64 position-bits into a hash function to generate 32 hash-bits, and denote them as h i,j(1), h i,j(2), ..., h i,j(32). Here, the hash function must have the property that any change on an input would result in a completely different output. Calculate 32 localization-bits for the block,

$$ {l_{i,j}}(m) = {h_{i,j}}(m) \oplus I(m)\;,\quad m = 1,\,2,\,\, \ldots, \,32 $$
(3)

Put the localization-bits into the 32 remaining LSB positions in the block. At last, combine the original MSBs and the substituted LSBs to produce a watermarked image.

In watermark embedding procedure, the pixel pairs are pseudo-randomly generated and the embedding positions of the reference bits of pixel pairs are also pseudo-randomly determined. That means the two pixels in a same pair perhaps come from different areas and their reference bits may be embedded into another area. To ensure security, a number of operations are dependent on the secret key. Actually, we may use a primary secret key to generate a pseudo-random sequence, and regard it as a series of pseudo-keys to directly control pixel permutation and position selection. For permuting N pixels, we take a portion of the pseudo-random sequence with length N, and sort it. Then, the sorting order can be used as a way of pixel permutation, and the numbers of possible ways of pixel permutation is 2N. With sufficiently large N, it is virtually impossible to perform a successful attack. Similarly, for selecting 160 positions from 192 LSB, we may take different portions with a length of 192 from the pseudo-random sequence for different blocks. The 160 smallest values indicate the selected positions, and the selected positions in different blocks are mutually different.

In the watermark embedding procedure, 5 MSBs of the cover image are preserved and 3 LSBs replaced with the reference-bits and localization-bits. Assuming that the original distribution of the 3 LSBs is uniform, the average energy of distortion caused by watermarking on each pixel is

$$ {E_{\rm{D}}} = \frac{1}{{64}} \cdot \sum\limits_{u = 0}^7 {\sum\limits_{v = 0}^7 {{{\left( {u - v} \right)}^2}} } $$
(4)

So, the approximate PSNR is

$$ {\hbox{PSNR}} \approx 10 \cdot {\log_{10}}\left( {{{{{255}^2}} \mathord{\left/{\vphantom {{{{255}^2}} {{E_{\rm{D}}}}}} \right.} {{E_{\rm{D}}}}}} \right) = 37.9\;{\hbox{dB}} $$
(5)

In general, the distortion caused by watermark embedding can not be detected by human visual system. Actually, the more the data in original image is kept, the more imperceptibility can be achieved, but the watermark capacity is lower. To make a reasonable tradeoff, we let the 5 MSB be preserved in the proposed scheme.

3 Procedure of content restoration

Suppose an adversary alters some content of the watermarked image without changing the image size. Having received a suspicious image, we first identify the tampered blocks, and then recover the MSBs in the tampered area by using the reference data extracted from other blocks and the spatial correlation of the natural image.

The first stage is to locate the tampered blocks. After dividing the received image into non-overlapped 8 × 8 blocks, we select 160 positions from the 3 LSB-layers in each block using the same secret key. For each block, feed its 320 bits in the 5 MSB-layers, 160 bits in 3 LSB-layers at the selected positions and 64 position-bits of the block into the hash function to obtain 32 hash-bits h i,j(1), h i,j(2), ..., h i,j(32), and extract the 32 bits in 3 LSB-layers at the rest positions l i,j(1), l i,j(2), ..., l i,j(32). Calculate the image-ID-bits for each block

$$ {I_{i,j}}(m) = {h_{i,j}}(m) \oplus {l_{i,j}}(m)\;,\quad m = 1,\,2,\, \ldots, \,32 $$
(6)

If the image has not been tampered, the calculated image-ID-bits of all blocks should be identical with the original image-ID-bits. Otherwise, if some blocks are modified or moved from another position/image as in [3], the calculated image-ID-bits will be changed. Although the original image-ID-bits is unknown at authentication side, we can compare the image-ID-bits calculated from different blocks, and judge the blocks possessing the identical calculated image-ID-bits as “not tampered”. For the other blocks, a “tampered” decision is made. Here, a block without any tampering must be judged as “not tampered”. Probability with which a block containing modified contents or moved from another position/image is falsely judged as “not tampered” is 2−32. False judgment is therefore virtually impossible. That means any modification on MSB or LSB will result in a “tampered” decision.

Denoting the ratio between the numbers of tampered blocks in an image and that of all blocks as α, we can recover the MSB of pixels in tampered blocks. We consider the following three cases.

  1. Case 1:

    For a pixel in the tampered area, if another pixel belonging to the same pair is located in “not tampered” area and the five reference-bits derived from the pair are also embedded into “not tampered” area, we can retrieve the original MSBs of the pixel without any error. Actually, probability for thus a case to occur is (1−α)2. In this case, the 5 MSBs of another pixel and the 5 reference-bits can be obtained from the received image. Denoting the two pixels in the pair as p i and p j , if p i is in tampered area and p j is another pixel, we calculate 5 MSBs of p i

$$ {b_{i,7 - m}} = {r_m} \oplus {b_{j,m + 3}},\quad m = 0,\,1,\, \ldots, \,4 $$
(7)

where r m is the 5 reference-bits, or calculate 5 MSBs of p j if it is in tampered area,

$$ {b_{j,m + 3}} = {r_m} \oplus {b_{i,7 - m}}\;,\quad m = 0,\,1,\, \ldots, \,4 $$
(8)
  1. Case 2:

    When both pixels in a pair are in the tampered area and the corresponding reference-bits are hidden in “not tampered” area, we can estimate their MSBs in the following way. For a pixel in tampered area, probability for this case to occur is (1−αα. After extracting the 5 reference-bits from the received image, a restriction to the MSBs of the two pixels is given. For example, if (r 0 r 1 r 2 r 3 r 4) = (10110), the MSBs of the two pixels, p i and p j , must satisfy

$$ {b_{i,7}} \ne {b_{j,3}},\quad {b_{i,6}} = {b_{j,4}},\quad {b_{i,5}} \ne {b_{j,5}},\quad {b_{i,4}} \ne {b_{j,6}},\quad {b_{i,3}} = {b_{j,7}} $$
(9)

Decimal values of the MSBs can be written as

$$ {v_i} = \sum\limits_{m = 3}^7 {{b_{i,m}} \cdot {2^m}}, \quad {v_j} = \sum\limits_{m = 3}^7 {{b_{j,m}} \cdot {2^m}} $$
(10)

There are all together 32 possible combinations of (v i , v j ) according to the restriction. Denote a set including the “not tampered” area and the pixels recovered in Case 1 as Set 1, the pixel closest to p i in Set 1 as q i , and the pixel closest to p j in Set 1 as q j . For each possible (v i , v j ), calculate

$$ D = {\left( {{v_i} - {u_i}} \right)^2} + {\left( {{v_j} - {u_j}} \right)^2} $$
(11)

where u i and u j are decimal values of the 5 MSBs of q i and q j . Because of the spatial correlation in a natural image, the value of D corresponding to the original (v i , v j ) would be small. Since the relationship between the higher MSB of a pixel and the lower MSB of another one is restricted by the reference-bits, for the other candidates of (v i , v j ), there must be some higher MSB different from the original, leading to the large values of D. Then, we find the minimal one among the 32 values of D, and use the corresponding (v i , v j ) as the recovered result of MSB of p i and p j . In other words, both the restriction condition derived from the reference-bits and the spatial correlation in the natural image are used to recover the original content in this case.

  1. Case 3:

    The reference-bits of a pair containing one or two pixels in the tampered area are stored in the tampered area. For a pixel in tampered area, probability of this case is α. The reference-bit cannot be extracted from the received image in this case. So, we exploit only the spatial correlation to recover the MSBs of missing pixels. Denote the missing pixel in the pair as p i or p j , a set including the “not tampered” area and the pixels recovered in Cases 1 and 2 as Set 2, the pixel closest to p i in Set 2 as g i , and the pixel closest to p j in Set 2 as g j . Then, the MSBs of g i and g j are used as the recovered MSBs of p i and p j , respectively.

In this scheme, if the tampered area is small, most pixels in the tampered area belong to the first case, and their MSBs can be correctly restored according to the extracted reference data. Moreover, the correctly-restored pixels can provide sufficient information to estimate the original MSBs of the other missing pixels by exploiting the spatial correlation. So, the recovered result is very close to the original content. On the other hand, if the tampered area is extensive, image recovery can still be performed with compromised quality.

4 Experimental results

Using two test images Lake and Crowd sized 512 × 512 as the covers, Fig. 1 gives their watermarked versions. PSNR values due to watermark embedding are respectively 37.9 dB and 37.8 dB, confirming the theoretical result in (5), and the distortion is imperceptible. We modified the watermarked images by replacing the original content with fake information. The tampered images are shown in Fig. 2, and the tampering rates are 28.7% and 45.4%, respectively. In tampered-block identification, all the blocks containing fake content were found. Figure 3 gives the results when the MSB of the pixels in the first case were recovered, and the other pixels in the tampered area are represented by extreme white. It can be seen the pixels in this case distribute in the entire tampered area. With the values of the retrieved pixels, the spatial correlation can be employed to recover their surrounding tampered pixels. The final restored images are shown in Fig. 4, and PSNR of the recovered content calculated only in the tampered area are 30.2 dB and 27.1 dB, respectively.

Fig. 1
figure 1

Watermarked images Lake and Crowd

Fig. 2
figure 2

Tampered versions

Fig. 3
figure 3

“Not tampered” content and recovered pixels in the first case

Fig. 4
figure 4

Restored images

If the adversary alters the original content of a watermarked image with different tampering rates, qualities of the recovered results in the tampered area vary. As mentioned above, the probabilities of the tampered pixels belonging to cases 1, 2 and 3 are (1−α)2, (1−αα and α, respectively. In fact, the differences between the actual proportions and the theoretical probabilities are less than 4%. Table 1 shows PSNR of the recovered content in the tampered area with respect to the tampering rates. Four gray images sized 512 × 512 were used as the covers. Experiment on other images provides similar results. It has been observed that, even if the tampering rate is greater than 50%, the recovered content has reasonable quality. As described previously, the spatial correlation of natural image is exploited to estimate the original MSB of missing pixels, so that the smoother content can be restored with better quality. The rank of PSNR values in Table 1 is consistent with the fluctuation magnitudes of the four test images.

Table 1 PSNR of recovered content in the tampered area with different tampering rates

Table 2 compares several fragile watermarking schemes with restoration capability. The methods in [1] and [8] suffer from the tampering coincidence problem and the method in [7] does not work with a large tampering rate. By using the proposed scheme, the original content in an extensive area can be recovered and the tampering coincidence problem is avoided. Also, the lower the tampering rate is, the better quality of restored content can be obtained. That means the proposed scheme is more flexible than the previous methods.

Table 2 Comparison of restoration capability among several fragile watermarking schemes

5 Conclusions

In the scheme proposed here, the LSBs of a host image are replaced with the reference data and localization data derived from the cover. In the authentication, the localization data are used to locate the blocks containing substitute information, and the reference data and the spatial correlation are used to recover the original content of tampered area. The smaller the tampered area, the MSBs of more pixels in the tampered area can be correctly recovered. For the other pixels in the tampered area, the MSBs are estimated according to their neighbors. In this way, since the reference data are scattered in the entire image, the tampering coincidence problem is overcome. And the principal content can still be recovered even if the tampered region is extensive, with the quality of recovered content decreasing with the increasing extent of tampering.