1 Introduction

Document images are used as proof for authentication and business transactions. Traditionally, digital watermarking has been the primary technique for copyright protection and integrity management of document images [1,2,3]. A document image contains information with various levels of sensitivity. For instance, in a cheque image, the signature and amount change with each cheque and thus possess the highest level of sensitivity. The bank name, logo and cheque number contain regeneratable information and hence constitute a lower level of sensitivity. There are also many empty areas in a cheque, which can be classified as insensitive. Each sensitivity level needs a different type of protection. Therefore, there is a need to use multiple watermarking techniques on different areas of the same document image. Multiple watermarking schemes have a twofold objective: to improve the perceptual quality of the watermarked image by reducing the embedding capacity, and to perform tamper detection and recovery with better accuracy.

This paper is organized as follows: Sect. 2 reviews existing work on intelligent and multiple watermarking schemes. The proposed model is described in Sect. 3. Section 4 presents experimental results of the proposed multiple watermarking scheme, and Sect. 5 analyzes its performance, robustness and fragility. Conclusions are summarized in Sect. 6.

2 Literature Review

Digital watermarking is classified as robust, fragile or semi-fragile based on its robustness to incidental and intentional attacks [4]. Detailed surveys of robust, fragile and semi-fragile watermarking techniques can be found in [5,6,7,8,9,10]. Most past efforts apply a single type of watermarking technique to the entire document image. Houmansadr et al. [11] proposed a watermarking technique based on the entropy masking feature of the Human Visual System (HVS). Kankanhalli and Ramakrishnan [12] developed a watermarking technique that embeds just noticeable watermarks. Radharani et al. [13] designed a content based watermarking scheme in which the watermark is generated using Independent Component Analysis (ICA) for each block of the input image. The works in [14,15,16] segment the image into objects using image statistics and then apply robust watermarking schemes to each object. Shieh et al. [17] proposed the use of a genetic algorithm (GA) [18] to compute the optimal frequency bands for watermark embedding in a Discrete Cosine Transform (DCT) based watermarking system, which can simultaneously improve the security, robustness and image quality of the watermarked image. A novel idea was put forward in [19] to embed multiple watermarks with different compression domains into the same source: Lu et al. [19] developed an algorithm for embedding multiple watermarks in the Vector Quantization (VQ) domain, and for hiding the secret keys associated with the watermarks in the transform domain to enhance the robustness of the watermarked image. Sheppard et al. [20] discussed different approaches to multiple watermarking, such as rewatermarking, segmented watermarking and composite watermarking. They explored different attack scenarios [21, 22] and the level of robustness that each category of multiple watermarking could provide.

The literature review on content based multiple watermarking techniques reveals that most existing works lack intelligent classification of the information content of a document image based on its sensitivity to attacks. Existing techniques apply multiple watermarks of the same type to each block of the document image, without regard to whether the watermarking is appropriate for the information content present in the block. In addition, the existing schemes incur the tradeoff between robustness and fragility multiple times. These issues motivate an intelligent classification of the different areas of a document image and the application of different types of watermarking schemes appropriate to the sensitivity requirement of each area. In this paper, a new model for intelligent multiple watermarking is designed that automatically determines the desired type of watermarking for each block of the document image.

3 Proposed Model

The proposed model for the intelligent multiple watermarking system consists of two processes, namely Embedding and Extraction. The Embedding process divides the input document image into blocks and intelligently determines the type of watermarking to be applied to each block. The watermarking algorithm depends on the information content of the image. This content is primarily available through the energy component, and hence the luminance component of the color transformation is used. The image is converted back to color after watermarking to produce the watermarked image. The embedding technique depends on the type of watermarking. Robust watermarking is implemented using integer wavelet embedding [23] and fragile watermarking is accomplished using contourlet based embedding [24]. The Extraction process is carried out on the blocks of the watermarked image. The result of watermark extraction depends on the type of watermarking. Robust watermark extraction on a block of the watermarked image yields content authentication of the block. Fragile watermark extraction on a block yields tamper detection and recovery of the information content in the block. The following subsections describe the embedding and extraction processes in detail.

3.1 Multiple Watermark Embedding

The multiple watermark embedding process is shown in Fig. 1. It is an intelligent and adaptive embedding scheme that depends on the information content of the document image. Experiments have been conducted on all document images in the corpus to analyze the effect of block size on the accuracy of block type identification and on processing time. For each block, a gradient binarized version of the information content is obtained. The sensitivity level of each block and the type of watermarking required are then determined automatically. Subsequently, the appropriate watermark embedding algorithm is applied to each block.

Fig. 1. Multiple watermark embedding process
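The first step of the embedding process, dividing the image into blocks, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `iter_blocks`, the representation of the luminance image as a list of pixel rows, and the handling of edge blocks (they are simply returned smaller when the image size is not a multiple of the block size) are all assumptions.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]  # luminance image as rows of pixel values (assumed layout)


def iter_blocks(image: Image, block_size: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (top, left, block) for non-overlapping block_size x block_size tiles.

    Assumption: tiles on the right and bottom edges may be smaller than
    block_size when the image dimensions are not exact multiples of it.
    """
    height, width = len(image), len(image[0])
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            rows = image[top:top + block_size]
            block = [row[left:left + block_size] for row in rows]
            yield top, left, block


# Usage: a 256 x 384 image split into 128 x 128 blocks gives 2 x 3 = 6 blocks.
image = [[0] * 384 for _ in range(256)]
print(len(list(iter_blocks(image, 128))))  # 6
```

Each yielded block would then be binarized, classified by sensitivity level and watermarked accordingly.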

The gradient binarized version of the information content in the block is computed using the following algorithm:

figure a

In this algorithm, the values of the weights \( w1 \) and \( w2 \) are empirically set to 0.5. The number of iterations required for termination of this algorithm depends on the distribution of the information content in the block. The outcome of this algorithm is a binary version of the block that segments the foreground and background information content in the block.

figure b

Experiments have been conducted on the document image corpus to decide on appropriate ranges of relative energy distribution (RED) and homogeneity (HM) values for determining the sensitivity levels of the blocks. The average \( RED_{b} \) and \( HM_{b} \) values for different types of information content in these document images are calculated and recorded in Table 1. It can be observed from Table 1 that RED values for blocks containing dynamically changing information content are in the range 0.7–0.85 and HM values lie between 0.29 and 0.50. Thus, the sensitivity level of a block with HM less than 0.5 and RED above 0.7 is set to 0. Similarly, it can be seen in Table 1 that blocks containing preprinted information content have RED values above 0.3 and HM values between 0.5 and 0.85. Therefore, the sensitivity level of these blocks is set to 1. For all other blocks, the sensitivity level is set to 2.
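The threshold rule described above can be written as a small decision function. This is a sketch under stated assumptions: the function name `sensitivity_level` is hypothetical, and treating the HM bounds 0.5 and 0.85 as inclusive is an assumption, since the text does not say how boundary values are handled.

```python
def sensitivity_level(red: float, hm: float) -> int:
    """Map a block's RED and HM values to a sensitivity level (0, 1 or 2)."""
    # Level 0: dynamically changing content (RED above 0.7, HM below 0.5).
    if red > 0.7 and hm < 0.5:
        return 0
    # Level 1: preprinted content (RED above 0.3, HM between 0.5 and 0.85).
    # Inclusive bounds are an assumption; the text leaves them unspecified.
    if red > 0.3 and 0.5 <= hm <= 0.85:
        return 1
    # Level 2: all other blocks.
    return 2


# Usage with illustrative (not measured) RED and HM values.
print(sensitivity_level(0.80, 0.35))  # 0
print(sensitivity_level(0.45, 0.70))  # 1
print(sensitivity_level(0.10, 0.95))  # 2
```

Checking the level-0 condition first means a block is only considered for level 1 when it does not already qualify as highly sensitive.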

Table 1. Computation of RED and HM values for different classes of document images in the corpus

The type of watermarking used depends on the sensitivity level of the information content in the block. Highly sensitive blocks are protected using a fragile watermarking technique; in this paper, an effective fragile watermarking technique based on contourlets [24] is used. Partially sensitive blocks are protected using a robust watermarking technique [23]. The block size is decided based on two factors: effectiveness in identifying the sensitivity of the block and the processing time for identification. Experiments have been conducted on all the document images to measure the impact of block size on the accuracy of identifying the sensitivity level of a block. The average number of blocks expected for each sensitivity level and the number of blocks accurately identified are recorded in Table 2. It can be observed from the average identification accuracy values that smaller blocks exhibit higher accuracy.

Table 2. Impact of size of the blocks of a document image on accuracy in identification of its sensitivity level and processing time for identification

Here, EB is the expected number of blocks as evaluated manually by an expert, IB is the number of blocks identified by the proposed approach, and IA is the identification accuracy, calculated as the ratio of the sum of IB over all block types to the sum of EB over all block types. Considering both the accuracy of identifying the sensitivity level of a block and the processing time incurred, the block size is set to 128 × 128.
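As a worked illustration of the IA definition, the ratio can be computed from per-type block counts. The function name and the counts below are invented for illustration only and are not taken from Table 2.

```python
from typing import Dict


def identification_accuracy(expected: Dict[int, int], identified: Dict[int, int]) -> float:
    """IA = (sum of IB over all block types) / (sum of EB over all block types)."""
    total_expected = sum(expected.values())
    if total_expected == 0:
        raise ValueError("at least one expected block is required")
    return sum(identified.values()) / total_expected


# Hypothetical counts per block type (0, 1, 2); not values from the paper.
eb = {0: 4, 1: 10, 2: 6}   # EB: blocks expected by the expert
ib = {0: 3, 1: 9, 2: 6}    # IB: blocks identified by the approach
print(identification_accuracy(eb, ib))  # 0.9
```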

3.2 Multiple Watermark Extraction

Multiple watermark extraction has a twofold objective, depending on the type of watermark extraction involved. Robust watermark extraction aims at content authentication of the block and is implemented using the robust watermarking scheme [23]. Fragile watermark extraction involves tamper detection and recovery of the information content of a document image and is performed using contourlets [24]. Multiple watermark extraction follows the same steps as the multiple watermark embedding process discussed in Sect. 3.1 up to the identification of the type of the gradient binarized block. Subsequently, the extracted and generated block types are compared; if there is a mismatch, the corresponding block of the document image is declared “inauthentic”. If there is a match, watermark extraction is carried out based on the type of the block. During fragile watermark extraction, the extracted and generated watermarks are compared for similarity using the Feature Similarity Index [24], and tamper detection of the block is decided based on the comparison. If the block is tampered, its information content is recovered by extracting the watermark embedded at robust locations [24]. During robust watermark extraction, the watermark is extracted from the LL-band of the integer wavelet transform of the block. The extracted watermark is decoded using the binary block coding technique [23]. The decoded watermark is compared with the original watermark to decide on content authentication of the block [23].
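The per-block decision flow described above can be sketched as follows. This is only an outline of the control flow, not the authors' implementation: the function name, the verdict strings other than "inauthentic", the single `similarity` score standing in for the watermark comparison, and the `threshold` value are all hypothetical, since no decision threshold is stated in the text.

```python
FRAGILE, ROBUST, UNMARKED = 0, 1, 2  # block types, matching sensitivity levels 0, 1, 2


def verify_block(extracted_type: int, generated_type: int,
                 similarity: float, threshold: float = 0.9) -> str:
    """Return a verdict for one block of the watermarked image.

    similarity: score in [0, 1] from comparing the two watermarks.
    threshold:  hypothetical cut-off; the text does not state one.
    """
    # Step 1: the extracted block type must match the type regenerated
    # from the gradient binarized block.
    if extracted_type != generated_type:
        return "inauthentic"
    # Step 2: the comparison is interpreted according to the block type.
    if generated_type == FRAGILE:
        return "intact" if similarity >= threshold else "tampered"
    if generated_type == ROBUST:
        return "authentic" if similarity >= threshold else "inauthentic"
    # Assumption: insensitive blocks carry no watermark to verify.
    return "not watermarked"


print(verify_block(FRAGILE, ROBUST, 0.99))   # inauthentic
print(verify_block(FRAGILE, FRAGILE, 0.50))  # tampered
print(verify_block(ROBUST, ROBUST, 0.95))    # authentic
```

A block reported as "tampered" would then go through the recovery step, which is not modeled here.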

4 Results

We have created a corpus of scanned document images. The classes of document images considered are Cheques, Bills, Identity Cards, Marks cards and Certificates, and each class consists of 30 images. The results of the identification of the type of the blocks of a sample document image in the corpus are shown in Fig. 2.

Fig. 2. Results of identification of the type of the block of a sample Cheque image

It can be observed in Fig. 2 that there are three types of blocks in the sample Cheque image. The blocks with a dashed border are Type-0 blocks: highly sensitive blocks containing large variations in the information content and its distribution. The blocks with a dotted border are Type-1 blocks, i.e. partially sensitive blocks that contain preprinted information and have moderate homogeneity in the distribution of the information content. The remaining blocks are insensitive (Type-2) blocks, which contain less energy and higher homogeneity of information. We have tested the accuracy of the identification for all the classes of document images in the corpus.

Once the blocks are identified, the appropriate type of watermarking is applied based on the type of each block to obtain the watermarked image. Subsequently, incidental and intentional attacks have been applied to the watermarked image and multiple watermark extraction has been performed on it. The results of multiple watermark embedding and extraction are shown in Fig. 3. Figure 3 shows that the watermarked image is perceptually similar to the source document image in the corpus. An example of an incidental attack on a partially sensitive block and an intentional attack on a highly sensitive block of the watermarked image is also demonstrated in Fig. 3. Further, a high degree of accuracy can be observed in tamper detection and recovery of the highly sensitive block.

Fig. 3. Results of the proposed multiple watermarking system: (a) source document image, (b) watermarked image, (c) original watermark for robust watermarking, (d) zoomed-in partially sensitive block with salt and pepper noise attack, (e) zoomed-in highly sensitive tampered block, (f) extracted robust watermark, (g) tamper detection result, (h) tamper recovery result

5 Analysis

The performance of the proposed watermarking system is measured in terms of the following: (i) performance analysis using Peak Signal to Noise Ratio (PSNR), (ii) robustness analysis using Normalized Correlation Coefficient (NCC), and (iii) fragility analysis using the accuracy of tamper detection and recovery.

5.1 Performance Analysis

The performance of the proposed multiple watermarking scheme is evaluated in terms of PSNR. The perceptual quality of the watermarked image of size \( N \times N \) is measured using Peak Signal to Noise Ratio (PSNR) [25]. A graph of PSNR values for different classes of document images is depicted in Fig. 4. The graph reveals that the PSNR values of the multiple watermarking scheme are better than those of the robust and fragile watermarking schemes applied separately. This increase in PSNR, and consequently in the perceptual quality of the watermarked image, is due to the fact that not all blocks of the document image are watermarked. The quantity of watermark to be embedded depends on the type of the block. Hence, the noise induced by watermarking is reduced to some extent, which results in better fidelity of the watermarked image.

Fig. 4. Effect of watermarking schemes on PSNR values of different classes of document images in the corpus
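For reference, PSNR can be computed from the mean squared error between the source and watermarked images. The sketch below uses the standard definition for 8-bit images (peak value 255); that the paper uses exactly this form and peak value is an assumption, since the formula from [25] is not reproduced in the text. The function name and list-of-rows image layout are also illustrative.

```python
import math
from typing import List

Image = List[List[float]]  # image as rows of pixel values (assumed layout)


def psnr(original: Image, watermarked: Image, peak: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(peak^2 / MSE). Returns inf for identical images."""
    if len(original) != len(watermarked):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    n_pixels = 0
    for row_o, row_w in zip(original, watermarked):
        if len(row_o) != len(row_w):
            raise ValueError("image rows must have the same length")
        for p, q in zip(row_o, row_w):
            squared_error += (p - q) ** 2
            n_pixels += 1
    mse = squared_error / n_pixels
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Usage: a tiny 2 x 2 example in which two pixels differ by one grey level.
source = [[100, 100], [100, 100]]
marked = [[101, 100], [100, 99]]
print(round(psnr(source, marked), 2))
```

Because PSNR falls as the mean squared error grows, leaving some blocks unmodified lowers the error averaged over the whole image and raises the PSNR.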

5.2 Robustness Analysis

The robustness of the proposed multiple watermarking scheme is tested by applying various attacks such as horizontal cropping, vertical cropping, resizing, noise and JPEG compression to all the document images in the corpus. The degree of robustness obtained is evaluated in terms of NCC [26]. The NCC values obtained by applying the proposed watermarking scheme only to partially sensitive blocks and by applying the robust watermarking scheme to the entire document image are recorded in Table 3. The NCC values in Table 3 show a slight improvement in the robustness of the watermarked image. The increase in robustness is due to the localization of robustness to the blocks that are partially sensitive.

Table 3. Average NCC values for different incidental attacks
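The NCC formula itself is not reproduced in the text. The sketch below uses one commonly used form, the inner product of the original and extracted watermarks divided by the product of their norms; whether [26] uses exactly this normalization is an assumption, and the function name is illustrative.

```python
import math
from typing import Sequence


def ncc(original: Sequence[float], extracted: Sequence[float]) -> float:
    """Normalized correlation between two equal-length watermark sequences.

    Assumed form: sum(w * w') / sqrt(sum(w^2) * sum(w'^2)).
    """
    if len(original) != len(extracted):
        raise ValueError("watermarks must have the same length")
    cross = sum(a * b for a, b in zip(original, extracted))
    energy = sum(a * a for a in original) * sum(b * b for b in extracted)
    if energy == 0:
        raise ValueError("NCC is undefined for an all-zero watermark")
    return cross / math.sqrt(energy)


# Usage: an extracted watermark identical to the original gives NCC = 1.
print(ncc([1, 0, 1, 1], [1, 0, 1, 1]))  # 1.0
```

Under this form, values closer to 1 indicate that the extracted watermark is closer to the original after an attack.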

5.3 Fragility Analysis

The fragility of a watermarking scheme is evaluated in terms of the accuracy of tamper detection (TDA) and the accuracy of tamper recovery (TRA). The accuracy of tamper detection is evaluated as follows:

$$ TDA = 1 - \frac{\sum_{i = 1}^{n} \left( ta_{i} \oplus td_{i} \right)}{n} $$
(7)

where \( n \) is the total number of bits in the fragile watermarked blocks, \( ta_{i} \) is the \( i \)-th tampered bit and \( td_{i} \) is the \( i \)-th tamper detection bit. The average values of TDA and TRA are computed for all document images in the corpus under different intentional attacks, for the proposed fragile watermarking scheme and the contourlet based scheme [24] separately. These values are tabulated in Table 4. It can be observed that the proposed multiple watermarking scheme shows a slight improvement in detecting and recovering from tampering of the information content of a document image.

Table 4. Average TDA and TRA values for different intentional attacks
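Eq. (7) translates directly into code: count the positions where the actual tampering bit and the detection bit disagree (XOR) and subtract that fraction from 1. The function name and the bit sequences below are illustrative and are not taken from the experiments.

```python
from typing import Sequence


def tamper_detection_accuracy(tampered: Sequence[int], detected: Sequence[int]) -> float:
    """TDA = 1 - (sum over i of ta_i XOR td_i) / n, following Eq. (7)."""
    if len(tampered) != len(detected):
        raise ValueError("bit sequences must have the same length")
    n = len(tampered)
    if n == 0:
        raise ValueError("at least one bit is required")
    mismatches = sum(ta ^ td for ta, td in zip(tampered, detected))
    return 1 - mismatches / n


# Usage: one of four bits is misclassified, so TDA = 1 - 1/4.
ta = [1, 1, 0, 0]  # 1 = bit was actually tampered (illustrative)
td = [1, 0, 0, 0]  # 1 = bit was flagged as tampered (illustrative)
print(tamper_detection_accuracy(ta, td))  # 0.75
```

The XOR counts both missed tampering and false alarms as errors, so a TDA of 1 means every bit was classified correctly.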

6 Conclusions

A novel intelligent multiple watermarking scheme is proposed in this paper. The blocks of a document image are automatically classified into various sensitivity levels. The performance analysis of the proposed approach reveals an improvement in the perceptual quality of the watermarked image. The proposed scheme also shows a slight improvement over the existing methods [23, 24] in robustness, tamper detection and recovery capabilities. Improving the accuracy of block type identification is taken up as future work.