1 Introduction

The market for and research on stereo-pair images have grown rapidly because of the immersive experience provided by 3D content. A stereo-pair image is created by capturing a scene from two slightly shifted viewpoints, which yield the left and right images.

Two methods are used for obtaining stereo-pair images. The first method, termed depth-image-based rendering (DIBR), entails constructing a central image and a depth map. Based on the depth map, the central image is used to generate the left and right images to display the 3D content. The second method, termed stereo image recording (SIR), involves recording two captured images that can be directly displayed on a 3D monitor.

Copyright protection has become more crucial as applications for and research on stereo-pair images continue to grow. Kim et al. [5] proposed a watermarking method for DIBR 3D images. The central image is decomposed into nine subband pairs by using the three-level dual-tree complex wavelet transform [6]. Only four subband pairs are used to embed the watermark. Every row in each pair of embeddable subbands is used to represent one watermark bit. All coefficients in the row of one subband are quantized to represent 0, and all coefficients in the row of the other subband are quantized to represent 1. Although this method is robust against malicious attacks such as JPEG compression, noise addition, median filtering, scaling, and rotation, quantizing the wavelet coefficients reduces the image quality.

Campisi [1] proposed an object-oriented watermarking method for SIR stereo-pair images. Both the left and right images are decomposed into seven subbands by using the two-level discrete wavelet transform, and the 2LL subbands of the two images are used to derive a disparity map. The right image is segmented into several objects based on the disparity map. The watermark is embedded into each 2LH, 2HL, and 2HH subband of the right image based on the position of each object. Embedding is performed by quantizing the coefficients. Although this method is robust against compression attacks such as JPEG and JPEG2000, quantizing the coefficients results in substantial distortion.

Wu et al. [12] proposed a relationship-modulation-based blind watermarking method for stereo-pair images. The left and right images are first divided into several nonoverlapping blocks, and the discrete cosine transform (DCT) is applied to each block. To embed one bit in each pair of corresponding blocks in the left and right images, an AC coefficient (ac) is selected, and the average of some of the preceding AC coefficients (pa) in the same block is calculated. The sign of (ac − pa) is then obtained for each block: the pair represents 1 if the two blocks exhibit different signs and 0 if they exhibit the same sign. Thus, if the watermark bit is 0 (1) and the pair of blocks exhibit different signs (the same sign), the sign of (ac − pa) is altered by increasing or reducing the ac value. This method enables the watermark to be extracted blindly. It is robust against JPEG compression and produces less distortion than the method proposed by Campisi [1] because the watermark is embedded in the quantized AC coefficients. However, a malicious attack on one or both images can change the sign relationship between the left and right images, causing the watermark to be extracted incorrectly.

To increase the robustness and retain the blind verification property, this paper presents a novel method for watermarking stereo-pair images. The proposed method includes watermark creation, embedding, and verification. First, a feature map is extracted from the stereo-pair image. The feature map is then encrypted with the owner’s secret key to create a watermark. Subsequently, the created watermark is embedded into the left image. To embed the watermark, the left image is divided into several blocks, and one bit is embedded in each block by using the pair of quantized AC coefficients that exhibits the smallest swapping error.

Owners can prove their ownership through a verification process. The proposed verification process involves first extracting the embedded watermark from the left image and then decrypting the extracted watermark to obtain an estimated feature map (EFM). In addition, a feature map (FM’) can be extracted from the stereo-pair image in question. The ownership of the stereo-pair image can be verified by comparing the EFM with FM’. Experimental results indicate that the proposed method is robust against malicious attacks such as JPEG compression, convolution, median filtering, affine transform, rescaling, and rotation. Furthermore, this method produces less distortion than do the methods proposed by Campisi [1] and Wu et al. [12] because each watermark bit is embedded by using a pair of quantized AC coefficients that exhibits the smallest swapping error.

The rest of this paper is organized as follows. Section 2 details the proposed method. Section 3 presents experiments performed to evaluate the robustness of the proposed method. The final section presents the conclusion.

2 The proposed watermarking method

The proposed method includes watermark creation, embedding, and verification. A robust watermark is created based on the stereo-pair image itself, and the following subsections detail each part of the method.

2.1 Watermark creation

Figure 1 shows a block diagram of the watermark creation process. First, a stereo image feature extractor is used to extract a feature map, which records the positions of the unmatched blocks between the left and right images. A robust watermark is then created based on the feature map. The stereo image feature extractor consists of two parts, preprocessing and feature extraction, which are detailed in Sections 2.1.1 and 2.1.2, respectively.

Fig. 1 Block diagram of the watermark creation process

2.1.1 Preprocessing

The stereo-pair image is preprocessed to increase the robustness of the created watermark, and Fig. 2 shows a block diagram of the preprocessing operation.

Fig. 2 Block diagram of the preprocessing operation

First, the left image is divided into 8 × 8 blocks. The DCT is then applied to each block, and the coefficients of each block are quantized. Based on the zig-zag scanning order, the first 16 quantized AC coefficients (ac_0, ac_1, …, ac_15) are randomly grouped into eight pairs by using a secret key. Let the randomly grouped AC pairs be denoted as (ac_{p(1,0)}, ac_{p(1,1)}), (ac_{p(2,0)}, ac_{p(2,1)}), …, (ac_{p(8,0)}, ac_{p(8,1)}), where (ac_{p(m,0)}, ac_{p(m,1)}) represents the m-th pair of AC coefficients, and let the corresponding quantization steps be denoted as (q_{p(1,0)}, q_{p(1,1)}), (q_{p(2,0)}, q_{p(2,1)}), …, (q_{p(8,0)}, q_{p(8,1)}). One of the eight AC pairs is selected based on their swapping distortions. The swapping distortion of the m-th pair is defined as

$$ dist(m) = \left| ac_{p(m,1)} \times q_{p(m,0)} - ac_{p(m,0)} \times q_{p(m,0)} \right| + \left| ac_{p(m,0)} \times q_{p(m,1)} - ac_{p(m,1)} \times q_{p(m,1)} \right|, $$
(1)

where m = 1,2,…, 8. Let

$$ m^* = \underset{m}{\arg\min}\, dist(m), $$
(2)

then the m*-th pair, which has the minimal swapping distortion, is used to embed a watermark bit.

Figure 3 illustrates an example of the AC pair selection process. Figure 3a shows the AC coefficients in the zig-zag scanning order, Fig. 3b shows the quantization steps, and Fig. 3c and d show the quantized AC coefficients and the randomly grouped AC pairs, respectively. In this example, (ac_{p(1,0)}, ac_{p(1,1)}) = (ac_5, ac_6) is the first pair, and its swapping distortion dist(1) is 3; (ac_{p(2,0)}, ac_{p(2,1)}) = (ac_1, ac_8) is the second pair, and its swapping distortion dist(2) is 36. The swapping distortions of the remaining six pairs are 14, 285, 16, 20, 40, and 8, respectively. The first pair is used to embed the watermark because it exhibits the minimal swapping distortion. Swapping the selected AC coefficients to embed a watermark bit does not alter dist(m); this is explained further in Section 2.2.
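For concreteness, the following C sketch (C being the language of the authors' reported implementation) shows one way to evaluate eqs. (1) and (2); the function name, the representation of the key-dependent grouping as an index table, and the integer types are illustrative assumptions rather than details taken from the paper.

```c
#include <stdlib.h>

/* Sketch of the AC pair selection of eqs. (1)-(2).
 * qac[16]   : first 16 quantized AC coefficients in zig-zag order
 * qstep[16] : corresponding quantization steps
 * pair[8][2]: key-dependent random grouping of the indices 0..15 into 8 pairs
 * Returns m* (0-based), the pair with the minimal swapping distortion. */
int select_ac_pair(const int qac[16], const int qstep[16], const int pair[8][2])
{
    int best_m = 0;
    long best_dist = -1;

    for (int m = 0; m < 8; ++m) {
        int i0 = pair[m][0], i1 = pair[m][1];
        /* Eq. (1): distortion of the dequantized values if the two
         * quantized coefficients were swapped. */
        long dist = labs((long)(qac[i1] - qac[i0]) * qstep[i0])
                  + labs((long)(qac[i0] - qac[i1]) * qstep[i1]);
        if (best_dist < 0 || dist < best_dist) {   /* eq. (2): arg min */
            best_dist = dist;
            best_m = m;
        }
    }
    return best_m;
}
```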

Fig. 3 Example of AC pair selection. (a) AC coefficients in the zig-zag scanning order. (b) Quantization steps. (c) Quantized DCT coefficients. (d) Eight randomly grouped AC pairs

Because the coefficients of the selected pair may be changed during the watermark embedding process, they are set to 0 before the left image is downsampled; this ensures that the encoder and the verifier obtain the same feature. All blocks are then dequantized and transformed back to the spatial domain by using the inverse DCT. The resulting left image and the original right image are downsampled to 1/4 of their original sizes (1/2 height × 1/2 width). The downsampling reduces spatial noise, thus reducing the errors in the extracted feature map caused by malicious attacks.
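The paper does not specify the downsampling filter; the sketch below assumes simple 2 × 2 block averaging, which halves each dimension and suppresses spatial noise as described above. The function name and the 8-bit row-major image layout are assumptions.

```c
/* Hypothetical 2x downsampling by 2x2 averaging (the exact filter is not
 * stated in the paper). src is w x h, 8-bit, row-major; dst is (w/2) x (h/2). */
void downsample_half(const unsigned char *src, int w, int h, unsigned char *dst)
{
    int dw = w / 2, dh = h / 2;
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            int sum = src[(2 * y) * w + 2 * x] + src[(2 * y) * w + 2 * x + 1]
                    + src[(2 * y + 1) * w + 2 * x] + src[(2 * y + 1) * w + 2 * x + 1];
            dst[y * dw + x] = (unsigned char)((sum + 2) / 4);  /* rounded mean */
        }
    }
}
```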

2.1.2 Feature extractor

The feature extractor extracts a feature map from the downsampled stereo-pair image. Figure 4 shows a block diagram of the feature extraction process. The downsampled left image (DL) obtained from the preprocessing operation is divided into 8 × 8 blocks. Let B_{k,l} denote block (k, l); the first pixel (the top-left pixel) in B_{k,l} is at position (k × 8, l × 8).

Fig. 4 Block diagram of the feature extraction process

For each B_{k,l} in the DL, a similar-area matching method is applied to the downsampled right image (DR) to determine the most similar 8 × 8 area based on the minimal block matching error. The search area in the DR is centered at position (k × 8, l × 8) and has size (2r_h, 2r_v). The minimal block matching error S_{k,l} is calculated as follows:

$$ S_{k,l} = \min_{\substack{i \in \left[-r_h,\, r_h\right) \\ j \in \left[-r_v,\, r_v\right)}} \left( \frac{1}{8} \sqrt{\sum_{x=0}^{7} \sum_{y=0}^{7} \left( D_L\left(8k+x,\, 8l+y\right) - D_R\left(8k+x+i,\, 8l+y+j\right) \right)^2} \right) $$
(3)
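The block matching of eq. (3) can be sketched in C as follows; treating the first image index as the row, skipping candidate positions that fall outside the DR, and using double precision are assumptions made only for this sketch.

```c
#include <math.h>

/* Sketch of eq. (3): minimal block matching error S_{k,l} for the 8x8 block
 * of DL whose top-left pixel is (8k, 8l), searched in DR over offsets
 * i in [-rh, rh) and j in [-rv, rv). Both images are h rows x w columns,
 * 8-bit, row-major. Returns -1.0 if no candidate lies inside DR. */
double block_matching_error(const unsigned char *DL, const unsigned char *DR,
                            int w, int h, int k, int l, int rh, int rv)
{
    double best = -1.0;
    for (int i = -rh; i < rh; ++i) {
        for (int j = -rv; j < rv; ++j) {
            int r0 = 8 * k + i, c0 = 8 * l + j;   /* top-left of the candidate */
            if (r0 < 0 || c0 < 0 || r0 + 8 > h || c0 + 8 > w)
                continue;                          /* candidate outside DR */
            double sum = 0.0;
            for (int x = 0; x < 8; ++x)
                for (int y = 0; y < 8; ++y) {
                    double d = (double)DL[(8 * k + x) * w + (8 * l + y)]
                             - (double)DR[(r0 + x) * w + (c0 + y)];
                    sum += d * d;
                }
            double err = sqrt(sum) / 8.0;          /* (1/8) * sqrt of the SSD */
            if (best < 0.0 || err < best)
                best = err;
        }
    }
    return best;
}
```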

After all S k,l values are obtained, the distribution function (DSF) of S k,l can be calculated as follows:

$$ DSF(t) = \frac{\text{Number of blocks with } S_{k,l} \le t}{\text{Total number of blocks}}, $$
(4)

where t is a threshold value and DSF(t) represents the percentage of blocks with S_{k,l} ≤ t. Let t_p be the p-percentile of the DSF; that is, DSF(t_p) = p/100. The p-percentile is used as a threshold, and each B_{k,l} with S_{k,l} ≥ t_p is regarded as an unmatched block. The unmatched blocks in the DL are recorded in a bitmap called the feature map (FM), in which 0 represents an unmatched block and 1 represents a matched block.
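A possible C realization of eq. (4) and of the unmatched-block decision is sketched below; the function names and the way t_p is taken from the sorted error list (nearest rank, no interpolation) are assumptions.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sketch of the feature-map construction.
 * S[n]  : minimal block matching errors of the n blocks of DL
 * p     : percentile used as the threshold (e.g., 90 or 95)
 * FM[n] : output bitmap, 1 = matched block, 0 = unmatched block. */
void build_feature_map(const double *S, int n, int p, unsigned char *FM)
{
    double *sorted = (double *)malloc(n * sizeof(double));
    memcpy(sorted, S, n * sizeof(double));
    qsort(sorted, n, sizeof(double), cmp_double);

    int idx = (p * n) / 100;                /* position of the p-percentile */
    if (idx >= n) idx = n - 1;
    double tp = sorted[idx];                /* t_p such that DSF(t_p) ~ p/100 */

    for (int i = 0; i < n; ++i)
        FM[i] = (S[i] >= tp) ? 0 : 1;       /* S_{k,l} >= t_p -> unmatched (0) */

    free(sorted);
}
```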

To increase the robustness, FM is duplicated four times (Fig. 5), yielding a duplicated feature map (DFM). The size of the DFM equals the number of blocks in the left image. A random bitmap (RBM) of the same size as the DFM is generated by using a secret key, and the exclusive-or operation is applied to the DFM and the RBM to obtain a randomized DFM, called the watermark (w).
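The duplication and randomization step can be sketched as follows; the 2 × 2 tiling layout (consistent with Fig. 5), the one-bit-per-byte storage, and the fact that the RBM has been produced elsewhere from the secret key are assumptions of this sketch.

```c
/* Sketch of watermark creation: the feature map FM (fw x fh bits) is tiled
 * 2x2 into the duplicated feature map and XORed with the key-generated
 * random bitmap RBM (2*fw x 2*fh bits) to give the watermark w. */
void create_watermark(const unsigned char *FM, int fw, int fh,
                      const unsigned char *RBM, unsigned char *w)
{
    int dw = 2 * fw;                                       /* watermark width */
    for (int y = 0; y < 2 * fh; ++y)
        for (int x = 0; x < dw; ++x) {
            unsigned char dfm = FM[(y % fh) * fw + (x % fw)];  /* duplication   */
            w[y * dw + x] = dfm ^ RBM[y * dw + x];             /* randomization */
        }
}
```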

Fig. 5 Duplicated feature map

2.1.3 Robustness of the created watermark

The robustness of the created watermark depends on the determination of the unmatched blocks. The proposed method regards each B_{k,l} with S_{k,l} ≥ t_p as an unmatched block; the reason is explained in the following paragraphs.

Figure 6a and b depict the left and right images, respectively, of an original stereo-pair image. All S_{k,l} values for the blocks in the left image were calculated using the proposed method and then sorted. Excluding the values greater than the 90-percentile, the sorted S_{k,l} values were divided into nine sublists, each comprising 10 % of the values; the top 10 % of the values were divided into two sublists, each containing 5 % of the values. Table 1 lists the averages and standard deviations of some of these sublists.

Fig. 6 Stereo-pair image with and without watermarking and attacking. (a) Left image. (b) Right image. (c) Watermarked left image with the 90-percentile as the threshold. (d) Left watermarked image attacked using JPEG compression with Q = 50. (e) Magnified part of (a). (f) Magnified part of (c). (g) Magnified part of (d)

Table 1 Averages and standard deviations of some sublists of the sorted minimal block matching errors

Table 1 indicates that the averages and standard deviations of the top 20 % sublists (i.e., 80–100 %) were greater than 12 and 1.1, respectively. Thus, for each B_{k,l} in the DL with S_{k,l} ≥ t_{80}, the average difference between corresponding pixels of B_{k,l} and its matched block in the DR was greater than 10, implying that each block in the top 20 % sublists differed from its matching block.

Before a suitable threshold value t_p is determined, the correct bit rate of the feature map extracted from an attacked watermarked stereo-pair image is defined. Let FM be the feature map extracted from the original stereo-pair image and FM’ be the feature map extracted from the attacked watermarked stereo-pair image. The correct bit rate (CBR) is defined as

$$ CBR = \frac{\left| FM \cap FM' \right|}{\left| FM \right|}, $$
(5)

where |FM| represents the number of bits in FM, and |FM ∩ FM’| represents the number of bits that have the same values in FM and FM’.

To determine a suitable threshold value t_p, various threshold values ranging from the 80-percentile to the 99-percentile were applied to 30 randomly selected stereo-pair images to generate watermarked stereo-pair images. All watermarked images (with different threshold values) were attacked using JPEG compression with a quality factor of 50, and Fig. 6 illustrates one example. Figure 6c shows the watermarked left image with a threshold of t_{90}. Figure 6d depicts the attacked result of Fig. 6c, and Fig. 6e–g depict the magnified parts of Fig. 6a, c, and d, respectively.

Figure 7 shows the average CBRs of the 30 randomly selected images and the CBRs of five selected stereo-pair test images at various threshold values. The average CBRs of the 30 stereo-pair images are greater than 0.96 at all tested thresholds; thus, any threshold between the 80-percentile and the 99-percentile is suitable. In this study, the 90-percentile and the 95-percentile were used as thresholds in the experiments.

Fig. 7 Average correct bit rates of 30 stereo-pair test images and correct bit rates from five selected stereo-pair test images for the extracted feature maps (FM’) at various threshold values

2.2 Watermark embedding

Figure 8 shows a block diagram of the watermark embedding process. First, the left image is divided into several 8 × 8 blocks. The DCT is applied to each block, and all coefficients are quantized. Based on the secret key, an AC pair is selected using the method applied in the preprocessing operation described in Subsection 2.1.1. Let the indices of the chosen pair be (p(m*,0), p(m*,1)). If the sign of (p(m*,0) − p(m*,1)) is identical to the sign of (ac_{p(m*,0)} − ac_{p(m*,1)}), the pair represents 0; otherwise, it represents 1. If the watermark bit to be embedded differs from the bit represented by the pair, (ac_{p(m*,0)}, ac_{p(m*,1)}) is swapped to match the watermark bit.
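The per-block embedding rule can be sketched as below; the handling of the case in which the two coefficients are equal (so that a swap cannot change the represented bit) is not discussed in the paper and is merely flagged in a comment.

```c
/* Sketch of embedding one watermark bit into the selected quantized AC pair.
 * qac  : the quantized AC coefficients of the block (zig-zag order)
 * idx0, idx1 : the selected indices p(m*,0) and p(m*,1)
 * bit  : the watermark bit to embed (0 or 1)
 * The pair represents 0 when sign(idx0 - idx1) == sign(qac[idx0] - qac[idx1]),
 * and 1 otherwise; the coefficients are swapped when the represented bit
 * differs from the bit to embed. */
void embed_bit(int *qac, int idx0, int idx1, int bit)
{
    int sign_idx  = (idx0 - idx1 > 0) - (idx0 - idx1 < 0);
    int diff      = qac[idx0] - qac[idx1];
    int sign_coef = (diff > 0) - (diff < 0);
    int represented = (sign_idx == sign_coef) ? 0 : 1;

    if (represented != bit) {          /* swap to match the watermark bit */
        int tmp = qac[idx0];
        qac[idx0] = qac[idx1];
        qac[idx1] = tmp;
        /* Note: if qac[idx0] == qac[idx1], swapping cannot change the
         * represented bit; how this case is handled is not specified. */
    }
}
```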

Fig. 8 Block diagram of the watermark embedding process

Figure 3 also provides an example of the embedding process. The first pair, (ac_5, ac_6), is used to embed a watermark bit. The quantized AC coefficient pair (0, −1) in Fig. 3c is not swapped if the watermark bit is 1. However, if the watermark bit is 0, the quantized coefficients are swapped, and the resulting (ac_5, ac_6) is (−1, 0).

2.3 Ownership verification

Figure 9 shows a block diagram of the proposed ownership verification procedure, which comprises three parts: the stereo image feature extractor, embedded watermark extraction, and comparison. The stereo image feature extractor, which is the same as that discussed in Section 2.1, is used to extract the feature map (FM’).

Fig. 9 Block diagram of the proposed ownership verification process

The embedded watermark is extracted from the watermarked left image, and converted to an estimated feature map (EFM), which is compared with the extracted feature map (FM’) in the verification process.

Figure 10 shows a block diagram of the embedded watermark extraction process. First, the watermarked left image is divided into several 8 × 8 blocks, and the DCT is applied to each block. The DCT coefficients of each block are then quantized. Subsequently, for each block, the quantized AC pair (ac_{p(m*,0)}, ac_{p(m*,1)}) with minimal swapping distortion is selected using the AC pair selection method described in Subsection 2.1.1. The embedded watermark bit is extracted by comparing the signs of (p(m*,0) − p(m*,1)) and (ac_{p(m*,0)} − ac_{p(m*,1)}). The embedded watermark (w’) is formed by grouping all extracted watermark bits.
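Extraction mirrors embedding: because swapping does not change the swapping distortion, the verifier re-selects the same pair and reads the bit from the sign relationship. A minimal C sketch, with illustrative names:

```c
/* Sketch of extracting one watermark bit from a block of the watermarked
 * left image, given the re-selected pair indices idx0 = p(m*,0) and
 * idx1 = p(m*,1) and the quantized AC coefficients qac of the block. */
int extract_bit(const int *qac, int idx0, int idx1)
{
    int sign_idx  = (idx0 - idx1 > 0) - (idx0 - idx1 < 0);
    int diff      = qac[idx0] - qac[idx1];
    int sign_coef = (diff > 0) - (diff < 0);
    return (sign_idx == sign_coef) ? 0 : 1;   /* 0: same sign, 1: otherwise */
}
```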

Fig. 10 Block diagram of the proposed watermark extraction process

Feature conversion is conducted after w’ is obtained. A random bitmap (RBM) is generated by using the secret key, and an embedded duplicated feature map (EDFM) is obtained by applying the exclusive-or operation to w’ and the RBM. Because the EDFM is four times the size of the embedded feature map, the embedded feature map can be estimated by applying a voting scheme to the four duplicated feature bitmaps. Let b_i^c ∈ {0,1} represent the i-th bit in the c-th copy, where c ∈ {1,2,3,4}. According to the voting scheme, the i-th element of the estimated embedded feature map, EFM(i), is set as

$$ EFM(i) = \begin{cases} 0 & \text{if } \sum_{c=1}^{4} b_i^c < 2, \\ 1 & \text{if } \sum_{c=1}^{4} b_i^c > 2, \\ -1 & \text{if } \sum_{c=1}^{4} b_i^c = 2. \end{cases} $$
(6)
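A C sketch of the voting rule in eq. (6) is given below; the 2 × 2 tiling of the four copies in the EDFM and the one-bit-per-byte storage follow the assumptions used in the watermark-creation sketch above.

```c
/* Sketch of eq. (6): majority vote over the four duplicated copies of bit
 * (ix, iy) of the embedded feature map; a 2-2 tie yields -1.
 * EDFM is (2*fw) x (2*fh) bits, holding the four copies as 2x2 tiles. */
int vote_bit(const unsigned char *EDFM, int fw, int fh, int ix, int iy)
{
    int dw = 2 * fw, sum = 0;
    for (int cy = 0; cy < 2; ++cy)
        for (int cx = 0; cx < 2; ++cx)
            sum += EDFM[(iy + cy * fh) * dw + (ix + cx * fw)];

    if (sum < 2) return 0;
    if (sum > 2) return 1;
    return -1;                       /* tie: excluded from the error rate */
}
```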

The error rate between FM’ and EFM is calculated to prove ownership. The error rate is defined as

$$ ErrorRate = \frac{DifferentBits}{TotalBits - TieBits}, $$
(7)

where DifferentBits is the number of bits with EFM(i) ≠ -1 and EFM(i) ≠ FM’(i), TotalBits is the size of the FM’, and TieBits is the number of bits with EFM(i) = -1. Ownership is proved when the error rate is less than a predefined threshold value.

2.4 Example

This subsection presents an example illustrating the entire watermarking method. First, the pair of AC coefficients with the smallest swapping error in each block of the left image is selected and set to 0 based on the owner’s secret key. Both the left and right images are then downsampled to 1/4 of their original sizes and used to calculate the FM (Fig. 11c).

Fig. 11 Example of the proposed watermarking method. (a) Original left image. (b) Original right image. (c) Extracted feature map (FM). (d) Duplicated feature map (DFM). (e) Watermark (w). (f) Watermarked left image

FM is then duplicated four times and encrypted by using the owner’s secret key to create the watermark (Fig. 11d and e). The watermark is then embedded into the left image (Fig. 11f).

Two feature maps are extracted when ownership must be verified. The first map (FM’, Fig. 12a), which approximates the original feature map, is extracted from the downsampled stereo pair illustrated in Fig. 11f and b by using the stereo image feature extractor. The second map (EFM, Fig. 12b), which represents the estimated embedded feature map, is extracted directly from the watermarked left image by using the watermark extractor. Subsequently, FM’ and EFM are compared to calculate the error rate and verify ownership.

Fig. 12 Example of the proposed verification method. (a) Extracted feature map (FM’). (b) Estimated feature map (EFM)

3 Experimental results

A total of 150 stereo image pairs were obtained from four sets. The first set contained 38 image pairs from the database established by Scharstein et al. [4, 9–11]. The remaining three sets were produced in this study. The second set contained 27 nature image pairs captured using a Fujifilm FinePix Real 3D W1 camera. The third set contained 70 nature image pairs captured using an HTC Evo 3D camera. The final set contained 15 nature image pairs captured using a Nintendo 3DS camera. Figure 13 illustrates image pairs from the various image sets.

Fig. 13 Image pairs from the various sets. (a) From the database established by Scharstein et al. (b) Captured using a Fujifilm FinePix Real 3D W1 camera. (c) Captured using an HTC Evo 3D camera. (d) Captured using a Nintendo 3DS camera

Attacks from the Stirmark benchmark program [7, 8] were applied to evaluate the robustness of the proposed method. These attacks included convolution filtering, JPEG compression, median filtering, affine transform, rescaling, rotation, and cropping. Table 2 lists the corresponding parameters.

Table 2 Attacks with various testing parameters

The 3 × 3 convolution filter mask coefficients [2] are defined as

$$ \left[ \begin{array}{ccc} C_a & C_b & C_c \\ C_d & C_e & C_f \\ C_g & C_h & C_i \end{array} \right], $$
(8)

and the affine transform matrix coefficients [3] are defined as

$$ \left[ \begin{array}{ccc} A_a & A_b & 0 \\ A_c & A_d & 0 \\ 0 & 0 & 1 \end{array} \right]. $$
(9)

The attacks were classified into three groups. The first group comprised convolution filtering, JPEG compression, and median filtering; images attacked by these operations can be verified blindly, that is, without any extra information. The second group comprised affine transform, rescaling, and rotation; the size of the original watermarked image is required to verify images attacked by these operations. The third group contained cropping; the original FM is required to verify images attacked by cropping. In the experiments, each attack was applied to both the left and right images of each stereo pair. Figure 14 depicts selected attacked left images.

Fig. 14 Selected attacked results. (a) Convolution Filtering Test 1. (b) Affine Transform Test 4. (c) JPEG compression with quality factor 30. (d) Magnified part of (b). (e) 5 × 5 median filtering. (f) Magnified part of (e). (g) Rotation at 30°. (h) Cropping with retain ratio = 0.71

3.1 Errors in feature maps

The error rates between FM and FM’, which were extracted from the original images and the attacked watermarked images, respectively, were calculated. The error rate was defined as

$$ ErrorRate1 = \frac{DifferentFeatureBits}{TotalBits}, $$
(10)

where DifferentFeatureBits is the number of bits with FM(i) ≠ FM’(i). Figure 15 shows the error rates caused by various attacks; the Y axis represents the percentage of images, and the X axis represents ErrorRate1. For example, Fig. 15a shows that 120 images exhibited ErrorRate1 = 0 after the JPEG compression attack with a quality factor of 50; the corresponding percentage of images is therefore 80 % (120/150).

Fig. 15 Rates of error in the extracted feature maps caused by various attacks. (a) Convolution and JPEG compression attacks. (b) Median filtering and affine transform attacks. (c) Rescaling and rotation attacks

Figure 15 indicates that only the images attacked using convolution exhibited error rates greater than 1 % (and less than 10 %); the error rates of all other attacked images were less than 1 %. These results demonstrate the robustness of the proposed feature map extractor against most malicious attacks.

3.2 Errors in estimated feature maps

The errors in the estimated feature maps were caused by the loss of embedded information after the attacks. This type of error corresponds to the difference between EFM and FM, and its rate was defined as

$$ ErrorRate2 = \frac{DiffEstimatedBits}{TotalBits - TieBits}, $$
(11)

where DiffEstimatedBits is the number of bits with EFM(i) ≠ −1 and EFM(i) ≠ FM(i). Figure 16 shows the rates of error in the EFM caused by various attacks. The average ErrorRate1 and ErrorRate2 values were also calculated to demonstrate the robustness, as shown in Table 3.

Fig. 16 Rates of error in estimated feature maps caused by various attacks. (a) Convolution and JPEG compression attacks. (b) Median filtering and affine transform attacks. (c) Rescaling and rotation attacks

Table 3 Average ErrorRate1 and ErrorRate2 of 150 images after various attacks

The rates of error caused by JPEG compression attacks (Fig. 16a) decreased as the quality factor increased. Convolution attacks and JPEG compression attacks with a quality factor of 30 or 50 produced similar error rates. Figure 16b and c indicate that some error rates were close to 0.5, causing the final verification to fail.

3.3 Verification results and comparison with other methods

Because a random guess results in an error rate of approximately 0.5, a verification error rate of less than 0.4 indicates that the watermark has been preserved. A threshold value of 0.35 was therefore used to compare the proposed method with the methods proposed by Campisi [1] and Wu et al. [12].

Verification fails when the error rate exceeds this threshold. Based on this criterion, Tables 4 and 5 list the percentages of images for which verification failed after various attacks.

Table 4 Percentages of 150 images for which verification failed after various attacks (Group 1)
Table 5 Percentages of 150 images for which verification failed after various attacks (Group 2)

In the proposed method, the 95-percentile and 90-percentile are used as thresholds for creating the feature map. All methods are robust against the cropping attack; therefore, the results of this attack are not described.

Table 4 lists the results of the attacks belonging to the first group; the watermarks were extracted from the attacked images blindly. Most verifications performed using the proposed method succeeded after the convolution attacks, whereas the methods proposed by Campisi [1] and Wu et al. [12] failed in almost all verifications.

The methods proposed by Campisi [1] and Wu et al. [12] were robust against JPEG compression attacks with quality factors > 40 and > 90, respectively. The proposed method was robust against a JPEG compression attack with a quality factor ≥ 30. The proposed method was robust against a 3 × 3 median filter attack, but the methods proposed by Campisi [1] and Wu et al. [12] did not withstand median filter attacks.

Table 5 lists the results of the attacks belonging to the second group, for which the size of the original image is required for the verification process. The method proposed by Wu et al. [12] could not withstand the attacks in this group, and the method proposed by Campisi [1] withstood only rotation attacks with a rotation angle of less than 1°; the proposed method withstood more of these attacks.

3.4 Quality of the watermarked images

In this study, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were used to evaluate the image quality after the watermark had been embedded.

The PSNR is defined as

$$ PSNR = 10 \times \log_{10} \left( \frac{W \times H \times 255^2}{\sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \left[ I(i,j) - I'(i,j) \right]^2} \right), $$
(12)

where I(i,j) and I’(i,j) are the gray values of pixel (i,j) in the original left image and the watermarked left image, respectively, and W and H are the image width and height, respectively. Figure 17 illustrates the PSNR results of images watermarked using the different embedding methods, indicating that the proposed method produces higher PSNR values than do the methods proposed by Campisi [1] and Wu et al. [12].
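For reference, eq. (12) corresponds to the following C routine; the 8-bit row-major image layout and the handling of identical images are assumptions of this sketch.

```c
#include <math.h>

/* Sketch of the PSNR in eq. (12) between the original left image I and the
 * watermarked left image Iw, both W x H, 8-bit, row-major. */
double psnr(const unsigned char *I, const unsigned char *Iw, int W, int H)
{
    double sse = 0.0;
    for (long i = 0; i < (long)W * H; ++i) {
        double d = (double)I[i] - (double)Iw[i];
        sse += d * d;
    }
    if (sse == 0.0)
        return INFINITY;             /* identical images */
    return 10.0 * log10((double)W * H * 255.0 * 255.0 / sse);
}
```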

Fig. 17 PSNR results of various embedding methods

The SSIM is defined as

$$ SSIM(I, I') = \frac{\left( 2 \mu_I \mu_{I'} + c_1 \right) \left( 2 \sigma_{I,I'} + c_2 \right)}{\left( \mu_I^2 + \mu_{I'}^2 + c_1 \right) \left( \sigma_I^2 + \sigma_{I'}^2 + c_2 \right)}, $$
(13)

where μ_I and μ_{I'} are the averages of images I and I’, respectively; σ_I^2 and σ_{I'}^2 are the variances of I and I’, respectively; σ_{I,I'} is the covariance of I and I’; and c_1 = (0.01 × 255)^2 and c_2 = (0.03 × 255)^2 are two constants. Figure 18 shows the SSIM results of images watermarked using the different embedding methods. The SSIM results of the proposed method and the method proposed by Wu et al. [12] were similar; however, most of the images watermarked using the proposed method exhibited higher SSIM values.
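Eq. (13) is stated with a single set of image-wide statistics; the following C sketch computes it in that global form (rather than over local windows), which is an interpretation of the formula as written rather than a detail confirmed by the paper.

```c
/* Sketch of the global SSIM in eq. (13) for two 8-bit images of n pixels,
 * with c1 = (0.01*255)^2 and c2 = (0.03*255)^2. */
double ssim_global(const unsigned char *I, const unsigned char *Iw, long n)
{
    const double c1 = (0.01 * 255.0) * (0.01 * 255.0);
    const double c2 = (0.03 * 255.0) * (0.03 * 255.0);

    double mu1 = 0.0, mu2 = 0.0;
    for (long i = 0; i < n; ++i) { mu1 += I[i]; mu2 += Iw[i]; }
    mu1 /= n; mu2 /= n;

    double var1 = 0.0, var2 = 0.0, cov = 0.0;
    for (long i = 0; i < n; ++i) {
        double d1 = I[i] - mu1, d2 = Iw[i] - mu2;
        var1 += d1 * d1; var2 += d2 * d2; cov += d1 * d2;
    }
    var1 /= n; var2 /= n; cov /= n;

    return ((2.0 * mu1 * mu2 + c1) * (2.0 * cov + c2))
         / ((mu1 * mu1 + mu2 * mu2 + c1) * (var1 + var2 + c2));
}
```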

Fig. 18 SSIM results of various embedding methods

3.5 Computational analysis

In the proposed method, feature extraction is a time-consuming process because a full search is applied in block matching; it requires \( \frac{W \times H \times r_h \times r_v}{4} \) block matchings. To demonstrate the feasibility of the proposed method, a program was written in C under the Microsoft Visual Studio 2010 environment and executed on a personal computer with an Intel i5-2500 3.30 GHz processor and 8.0 GB of memory. Assuming an image size of 1024 × 768 and r_v = 3, Table 6 shows the computational time for various values of r_h.

Table 6 Computational time for various values of r_h

According to this table, even when the search area covers half of the downsampled image, the proposed method requires only a few seconds for embedding and verification.

4 Conclusion

This paper proposes a robust watermarking method for protecting copyrights of stereo-pair images. A feature map is extracted based on the similar properties of stereo image pairs, and a watermark is created and embedded in the left image. In addition, a verification process is conducted to ensure that the copyright is protected. The feature map records the positions of the unmatched blocks and is robust against malicious attacks. Furthermore, the embedding method entails selecting from the randomly grouped AC pairs the pair with the smallest swapping distortion to embed a watermark bit. This process increases the PSNR and SSIM values. The experimental results indicate that the proposed method is robust against various malicious attacks such as JPEG compression, filtering, affine transform, rescaling, rotation, and cropping. The experimental results also indicate that the proposed method protects stereo images and minimizes distortion more efficiently than other stereo image watermarking methods do.