1 Introduction

Nowadays, information security has become a primary concern, especially with the rapid development of digital transmission techniques. Among the most important techniques for protecting information are digital watermarking, cryptography, fingerprinting and steganography.

Digital watermarking is an art of concealment [4] that consists of hiding a message (image, text, etc.) inside a digital medium (image, text, video, audio, PDF, etc.) for copyright protection; the cover work itself is therefore of high importance. The main idea behind this technique is that even if a careful user detects the presence of the hidden message, he should be unable to remove it without strongly altering the watermarked document.

Portable Document Format (PDF) [6] is a page description language created by Adobe Systems. It is considered an evolution of the PostScript format, and its distinguishing property is that it preserves the formatting of a file.

Several steganography and digital watermarking methods for PDF and text documents have been proposed. In [9], a steganographic approach is presented that hides information using inter-word and inter-paragraph spacing in a text. The main disadvantage of this method is that the hidden message can be destroyed simply by deleting some spaces between the words of the stego text. In [1], two algorithms are proposed as alternatives to the original TJ operator method; the TJ operator displays a text string in a PDF document, allows individual character positioning, and uses the character and word spacing parameters of the text state. The alternative method has a lower embedding capacity than the original one. In [8], an encryption technique is proposed that combines information hiding in PDF documents with quadratic residues, and applies the result to copyright protection and digital learning. The main drawback of this method is that the hidden message can be easily removed. In [7], an embedding method for source programs using invisible ASCII codes is proposed. This method is very easy to detect: one simply extracts the modified text from the document, converts it to hexadecimal, extracts all the inserted invisible ASCII characters, and then decodes the embedded message. In [11], data hiding in PDF files through imperceptible modifications of PDF object parameters is proposed. The method hides data by slightly modifying the values of various PDF object parameters such as the media box and the text matrices. It is considered sufficiently transparent, but its main drawback is its very low embedding capacity.

Substitutive Quantization Index Modulation (QIM) methods were introduced by Chen and Wornell [3]. Spread Transform Dither Modulation (STDM) is an implementation of this scheme and has been shown to be robust under various watermarking attacks [2, 5, 10].

In this paper, the goal is to present a blind digital watermarking scheme for PDF documents based on a variant of the Quantization Index Modulation method called Spread Transform Dither Modulation (STDM). The main difficulty with PDF documents is to find a significant watermarking space in which to embed the secret message with a sufficient transparency-robustness tradeoff. Our contribution consists in using the x-coordinates of a group of characters to embed each bit of the secret message, while choosing the mean distortion value that offers the best tradeoff between transparency and robustness.

The remainder of this paper is organised as follows. In Section 2, the PDF file structure is briefly summarized. Then, in Section 3, a brief explanation on STDM concept is presented. The proposed embedding method is presented in Section 4. Experimental results are shown in Section 5. Finally, Section 6 gives concluding remarks and some directions for future work.

2 PDF file structure

All PDF files share a common structure composed of four components (see, e.g., [6]), as shown in Fig. 1. Here we give a very simple example in order to illustrate how a string is encoded in a PDF file.

Fig. 1
figure 1

PDF document and file structure

Header

Contains the PDF file version. It also enables an application to identify the file as a PDF.

Body

Contains the series of objects, such as Page, Font, etc., that collectively represent a PDF document. A PDF body supports eight types of objects: Boolean, Integer, String, Name, Array, Dictionary, Stream and Null. The 1 0 obj is the Root object, having 1 as identifier and 0 as generation number. It is a Catalog object (/Catalog) of type dictionary (<< >>). It contains the key version (/version) with value 1.4 (/1.4); notice that version and 1.4 are two objects of type Name, since they are preceded by a slash (/). It also contains a key named Pages (/Pages) that holds a reference (R) to object number 2 (Fig. 2).

Fig. 2
figure 2

Body example of the PDF shown in Fig. 1

The 2 0 obj is a Pages object (/Pages) of type Dictionary. It contains the key Count (/Count) with value 1, because there is only one page in the document; the key Count is an object of type Name, while 1 is of type Numeric. The object also contains a reference to object number 3 (/Kids [3 0 R]) that describes the page in more detail. The 3 0 obj is a Page object (/Page) of type Dictionary. It contains the page dimensions (/MediaBox), a reference to its parent object number 2, and a reference to object 4 (4 0 obj), which in turn references the Font object (6 0 obj). Object 3 also contains a reference to object 5 (5 0 obj). Object 5 references object 7 (7 0 obj), which includes the length of the string and all the information about the stream, such as the font and size (Tf operator), the positioning of the string (Td operator), and the text showing (Tj operator). In this example, “15 385 Td” gives the offset of the beginning of the current line “Steganography in the PDF documents” in the document (Td operator: move to the start of the next line, offset from the start of the current line by (tx, ty) [6]). Therefore, 15 and 385 are the x and y coordinates of the first character ‘S’, respectively. The other characters take their corresponding x-coordinate values depending on the horizontal character spacing, defined by the Tc operator, which is zero by default (Tc = 0). Notice that BT and ET mark the Begin Text and End Text, respectively. Finally, object 6 (6 0 obj) contains a reference to object 8 (8 0 obj), which specifies the font used (Helvetica) and the applied encoding (WinAnsiEncoding).
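To make the operator syntax concrete, the Td operands can be recovered from a content stream with a short sketch. This is illustrative Python (the paper's actual implementation is in Java), and the sample stream is a hypothetical fragment modeled on the example of Fig. 2:

```python
import re

# Hypothetical content stream modeled on the example of Fig. 2.
stream = b"""BT
/F1 12 Tf
15 385 Td
(Steganography in the PDF documents) Tj
ET"""

# A Td operator is preceded by its two operands: tx ty Td.
td = re.compile(rb"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td")
coords = [(float(tx), float(ty)) for tx, ty in td.findall(stream)]
print(coords)  # [(15.0, 385.0)]: x and y of the first character 'S'
```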

As a result, all these objects are organized as a linked list where each node represents an object as shown in Fig. 3.

Fig. 3
figure 3

Body linked list

Cross-Reference Table

Each Cross-Reference table begins with a line containing the keyword xref, and every following line is exactly 20 bytes long, including the end-of-line marker, as shown on the left of Fig. 4. The first number after xref indicates that the list starts at object 0. A “0 0 obj” does not actually exist in the PDF file: it is a special entry that represents the head of a linked list of free objects, which is why the first line of the list ends with an “f”. The second number after xref is the count of objects in this Cross-Reference Table. The lines ending with “n” refer to objects existing in the body section. Each indirect object therefore has its own line in the Cross-Reference table, which records the location (byte offset) at which the object can be accessed in the body.
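The fixed 20-byte layout makes these entries easy to parse by position. A minimal sketch (the entry value is hypothetical):

```python
# Each xref line is 20 bytes: a 10-digit byte offset, a space, a 5-digit
# generation number, a space, 'n' (in use) or 'f' (free), and a 2-byte EOL.
entry = "0000000017 00000 n\r\n"

offset = int(entry[0:10])       # byte position of the object in the file
generation = int(entry[11:16])  # generation number
in_use = entry[17] == "n"       # 'n' entries point to objects in the body
print(offset, generation, in_use)
```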

Fig. 4
figure 4

Cross-reference table and trailer structures

Trailer

The trailer is used to find the Cross-Reference table, which in turn makes it possible to locate specific objects within the body of the file, as shown on the right of Fig. 4. The trailer is a dictionary containing a link to the Root object, the total number of objects (/Size 9), the keyword startxref, the offset of the Cross-Reference table so that it can be accessed and, finally, the End-Of-File marker (EOF).

A PDF file is therefore processed in the following order: Header, Trailer, Cross-Reference Table, Body.

3 Spread Transform Dither Modulation

In order to present the QIM-based method, consider a message bit m ∈ {0,1} to be embedded in a host signal x. According to the value of the embedded bit m, one of two dither quantizers is used. To embed the bit m = 0, the dither quantizer Q 0 is used:

$$ Q_{0}(x, {\Delta}) = \left\lfloor\frac{(x-d_{0})}{\Delta}\right\rfloor{\Delta} +d_{0} $$
(1)

while Q 1 is used to embed the bit m = 1:

$$ Q_{1}(x, {\Delta}) = \left\lfloor\frac{(x-d_{1})}{\Delta}\right\rfloor{\Delta} +d_{1} $$
(2)

where Δ is the quantization step size, also called the quantization factor, and ⌊·⌋ denotes the floor operation. The real values d 0 and d 1 represent the dither levels:

$$ d_{0}=-\frac{\Delta}{4} \quad \text{and} \quad d_{1}=\frac{\Delta}{4} $$
(3)

Notice that d 0 can also be chosen pseudo randomly from a uniform distribution over [−Δ/2,Δ/2]. In such a situation, according to the sign of d 0, Δ/2 can be either added or subtracted from d 0 to form d 1.

$$d_{1} = \left\{\begin{array}{ll} d_{0} + {\Delta}/2, & \quad\text{if}~ ~d_{0}<0 \\ d_{0} - {\Delta}/2, & \quad\text{otherwise} \end{array}\right.$$
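In code, the two quantizers of (1)-(2) reduce to a single function parameterized by the dither level. A minimal Python sketch (the paper's implementation is in Java; the host value 5.3 and step Δ = 2 are illustrative):

```python
import math

def dither_quantize(x, delta, d):
    """Q(x, delta) with dither level d, as in (1) and (2)."""
    return math.floor((x - d) / delta) * delta + d

delta = 2.0
d0, d1 = -delta / 4, delta / 4   # dither levels from (3)

# Embedding bit m = 0 uses Q0; embedding bit m = 1 uses Q1.
print(dither_quantize(5.3, delta, d0))  # 3.5
print(dither_quantize(5.3, delta, d1))  # 4.5
```

The two lattices are offset by Δ/2 from each other, which is what the decoder later exploits.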

In the STDM method, each bit of the message is inserted into a sample vector x of length L of the host signal and the quantization occurs entirely in the projection of the host signal using projection vector p. The most important advantage of this method is that the embedding-induced distortion is spread into all the groups of samples instead of into one sample only. That is why this type of dither modulation is called Spread Transform Dither Modulation.

The quantized signal is given by:

$$ x^{\prime} = x+(Q_{m}(x^{T} p, {\Delta}) - x^{T} p ) p \quad m \in \{ 0, 1\} $$
(4)

The equation (4) can be re-written as:

$$ x^{\prime} = x+ \left( \left( \left\lfloor\left( \frac{(x^{T} p) -d_{m}}{\Delta}\right)\right\rfloor{\Delta} +d_{m} \right) - x^{T} p\right)p $$
(5)

The embedded message can be extracted using a minimum-distance decoder of the form:

$$ \textit{ExtMessage} = \mathit{arg}\min_{m \in \{0, 1\}} \mid x^{\prime T} p - Q_{m} (x^{\prime T} p, {\Delta}) \mid $$
(6)

The average expected distortion [11] is:

$$ D_{s} = {\Delta}^{2} / (12L) $$
(7)
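The embedding rule (5) and the decoder (6) can be sketched end to end in a few lines of Python (illustrative only; the host vector and the unit-norm projection vector are assumptions, and the decoder snaps the projection to the nearest point of each dither lattice):

```python
import math

def stdm_embed(x, p, m, delta):
    """Embed bit m into host vector x along projection p, as in (4)-(5)."""
    d = -delta / 4 if m == 0 else delta / 4
    proj = sum(xi * pi for xi, pi in zip(x, p))        # x^T p
    q = math.floor((proj - d) / delta) * delta + d     # Q_m(x^T p, delta)
    return [xi + (q - proj) * pi for xi, pi in zip(x, p)]

def stdm_extract(x_marked, p, delta):
    """Minimum-distance decoder of (6): pick the lattice closest to x'^T p."""
    proj = sum(xi * pi for xi, pi in zip(x_marked, p))
    dist = lambda d: abs(proj - (round((proj - d) / delta) * delta + d))
    return 0 if dist(-delta / 4) <= dist(delta / 4) else 1

L = 4
p = [1 / math.sqrt(L)] * L            # unit-norm projection vector
x = [15.0, 21.7, 28.3, 35.1]          # hypothetical character x-coordinates
delta = 1.0

x_marked = stdm_embed(x, p, 1, delta)
print(stdm_extract(x_marked, p, delta))  # 1
```

Note that the embedding distortion is spread over all L samples, as (7) predicts, rather than concentrated in one.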

4 Proposed method

4.1 Embedding concept

The embedding process can be divided into 6 steps:

  1. Step 1

    The message is ciphered by applying a XOR operation between the binary message and a random secret key. Any other cryptographic algorithm can be used.

  2. Step 2

    The original document is read, and all the necessary resources (x-coordinate, y-coordinate, width, height, etc.) of each character in the document are extracted. Let k be the length of the binary ciphered message. The algorithm thus requires k×L resources to embed the whole secret message.

  3. Step 3

    The host signal is created; it corresponds to the x-coordinates of all the characters selected to be modified (quantized).

  4. Step 4

    Each bit of the encoded message is embedded into L different values (L ≥ 1) of the host signal created in Step 3, corresponding to the x-coordinates of the characters to be modified. The embedding function is applied as shown in (4).

    Assume that m 0 and m 1 shown in Fig. 5 are two bits of the secret message to be embedded in the x-coordinate values, where L = 8. Thus, to embed m 0, both the quantizer Q 0 and the dither level d 0 are used, while Q 1 and d 1 are used to embed m 1.

    As a result, to embed a message of k bits into the document, where each bit is embedded into L samples, we need k×L characters to modify. In other words, each character in the document has its own coordinates (x, y); thus, if L is chosen to be 8, each bit of the encoded message is inserted into 8 values that correspond to the x-coordinates of the 8 characters (x 0, y 0), (x 1, y 1), (x 2, y 2), (x 3, y 3), (x 4, y 4), (x 5, y 5), (x 6, y 6), (x 7, y 7).

    After the embedding process, the 8 characters become:

    \((x^{\prime }_{0}, y_{0}), (x^{\prime }_{1}, y_{1}), (x^{\prime }_{2}, y_{2}), (x^{\prime }_{3}, y_{3}), (x^{\prime }_{4}, y_{4})\), \((x^{\prime }_{5}, y_{5}), (x^{\prime }_{6}, y_{6}), (x^{\prime }_{7}, y_{7})\) where \(x^{\prime }_{i}\) is calculated as in (5).

  5. Step 5

    After the embedding process, each character a i takes its corresponding modified coordinates \((x^{\prime }_{i}, y_{i})\) and is re-written separately in the document, as shown in Fig. 6b.

  6. Step 6

    Finally, the embedded message can be extracted by applying (6).
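The six steps above can be sketched end to end as follows. This is a minimal Python sketch (the paper's implementation is in Java); the function names, the XOR key and the sample coordinates are illustrative assumptions:

```python
import math

def quantize(x, delta, d):
    return math.floor((x - d) / delta) * delta + d

def embed_message(x_coords, msg_bits, key_bits, delta, L):
    """Steps 1-4: XOR-cipher the bits, then embed each cipher bit into a
    group of L x-coordinates with STDM, as in (5)."""
    bits = [b ^ k for b, k in zip(msg_bits, key_bits)]   # Step 1: XOR cipher
    p = [1 / math.sqrt(L)] * L                           # projection vector
    out = list(x_coords)                                 # Steps 2-3: host signal
    for i, m in enumerate(bits):                         # Step 4: one bit per group
        grp = out[i * L:(i + 1) * L]
        d = -delta / 4 if m == 0 else delta / 4
        proj = sum(g * pi for g, pi in zip(grp, p))
        q = quantize(proj, delta, d)
        out[i * L:(i + 1) * L] = [g + (q - proj) * pi for g, pi in zip(grp, p)]
    return out                                           # Step 5: rewrite characters

def extract_message(x_coords, key_bits, delta, L):
    """Step 6: minimum-distance decoding of (6), then undo the XOR cipher."""
    p = [1 / math.sqrt(L)] * L
    bits = []
    for i in range(len(key_bits)):
        proj = sum(g * pi for g, pi in zip(x_coords[i * L:(i + 1) * L], p))
        dist = lambda d: abs(proj - (round((proj - d) / delta) * delta + d))
        bits.append(0 if dist(-delta / 4) <= dist(delta / 4) else 1)
    return [b ^ k for b, k in zip(bits, key_bits)]
```

A round trip with, e.g., L = 2 and Δ = 1 recovers the original message bits from the modified x-coordinates.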

Fig. 5
figure 5

Basic embedding example using 2 bits and L = 8

Fig. 6
figure 6

a Original document, b Watermarked document

4.2 Problem discussion

Equation (7) shows that the distortion is quadratic in Δ for a given L. This function is represented in Fig. 7 for 0 ≤ Δ ≤ 3 and 0 ≤ L ≤ 100.

Fig. 7
figure 7

3D representation of the distortion D s

The user is then left to choose a D s value that leads to sufficient robustness with sufficient transparency. In the proposed method, some distortions are considered acceptable whereas others are not, and the remaining question is: what makes a distortion acceptable? In other words, for which value of D s does the method show sufficient robustness with sufficient transparency? Transparency and robustness are, however, two opposing objectives. In our method, there are basically two threshold levels to consider, namely a and b. The transparency threshold level a is computed from the transparency experiments, while the robustness threshold level b is computed from the robustness experiments.

If the distortion D s is below a, transparency is sufficient. Conversely, if D s is greater than b, the method ensures sufficient robustness (but possibly weak transparency). There are thus two cases to consider. The first is a < b: in this situation, any value of D s is either below b (and robustness is not established) or above a (and transparency is weak), so no acceptable value exists. The second is b ≤ a: in this situation, the interval b ≤ D s ≤ a corresponds to the acceptable distortion values that show sufficient robustness with sufficient transparency. Consider an example covering both cases. If b = 0.5 and a = 0.2, then b > a, and D s is either greater than or equal to 0.5 (sufficient robustness with weak transparency) or below 0.5; in the latter case, D s either belongs to the interval [0.2, 0.5) (weak robustness with weak transparency) or is below 0.2 (weak robustness with sufficient transparency). If b = 0.2 and a = 0.5, then b ≤ a, and D s is either greater than 0.5 (sufficient robustness with weak transparency) or at most 0.5; in the latter case, D s either belongs to the interval [0.2, 0.5] (sufficient robustness with sufficient transparency) or is below 0.2 (weak robustness with sufficient transparency).
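The case analysis above can be captured in a small helper (an illustrative sketch, not part of the paper's scheme; the boundary handling follows the interval b ≤ D s ≤ a):

```python
def classify(ds, a, b):
    """Classify a distortion Ds against transparency threshold a and
    robustness threshold b."""
    transparent = ds <= a    # sufficient transparency
    robust = ds >= b         # sufficient robustness
    if transparent and robust:
        return "acceptable"              # only reachable when b <= a
    if robust:
        return "robust, weak transparency"
    if transparent:
        return "transparent, weak robustness"
    return "neither"

# b = 0.5, a = 0.2 (b > a): no Ds is acceptable.
print(classify(0.3, 0.2, 0.5))   # neither
# b = 0.2, a = 0.5 (b <= a): the window [0.2, 0.5] is acceptable.
print(classify(0.3, 0.5, 0.2))   # acceptable
```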

5 Experiments

Several transparency and robustness experiments are performed in order to deduce good approximations of the thresholds a and b. All experiments are evaluated as a function of Δ. Three cases are considered:

  • Case 1: a balance between the number of characters (length) in the document and the message to be embedded.

  • Case 2: the number of characters in the document is increased in order to have a large document while keeping the same message length used in case 1.

  • Case 3: the length of the message is shortened while keeping the same length of the document used in case 1.

The threshold values a and b are thus deduced from case 1, since they are always accepted by cases 2 and 3 as well. This can be explained by the fact that cases 2 and 3 both offer a better transparency-robustness tradeoff than case 1. To support our approach, we present a brief example of case 1 on a PDF document and message. The proposed method has been implemented in Java using the NetBeans IDE. Consider the original document Violin.pdf, shown in the top-left of Fig. 8, and the message to be embedded: UFC. The violin document contains n = 947 characters. Each character of the message is encoded into 8 bits, for a total of k = 24 bits. Each message bit is then embedded into L = ⌊n/k⌋ = ⌊39.458⌋ = 39 characters' x-coordinates, extracted during Step 2 of Section 4. Therefore, a total of k×L = 936 characters of the document are used to embed the whole 24-bit message.
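The capacity arithmetic of this example is straightforward to check:

```python
n = 947                 # characters in Violin.pdf
k = 3 * 8               # "UFC": 3 characters at 8 bits each -> 24 bits
L = n // k              # samples (x-coordinates) per embedded bit
used = k * L            # characters actually modified

print(L, used)  # 39 936
```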

Fig. 8
figure 8

Perceptual PDF difference– violin.pdf and modified_violin.pdf using Δ=0.1,Δ=0.5,Δ=1,Δ=1.5,Δ=2,Δ=2.5,Δ=3,Δ=5 and Δ=10, respectively. The document shown in the top-left hand side is the original document

5.1 Tests of transparency (Violin.pdf, UFC)

Three kinds of experiments (error measurements, perceptual PDF differences and distortion plots) are presented in order to test the transparency of the proposed method under several values of Δ: 0.1, 0.5, 1, 1.5, 2, 2.5, 3, 5 and 10. Table 1 presents error measurements between the original and the modified documents after watermarking, using three metrics: Mean Square Error (MSE), Root Square Error (RSE) and Mean Absolute Error (MAE). The results show that the error values increase as Δ increases. Figure 8 exhibits the perceptual difference between the original and modified documents: the characters' positions are only slightly modified when Δ is small, and notably modified when Δ is high (equal to 5 or 10, for example). Figure 9 clearly exhibits how the positioning of some characters is affected by watermarking, by comparing the deviation of the x marks relative to the center of the o marks; the x marks are exactly centered in the o marks when the distortion is very low.

Table 1 Error computations between the original and modified violin documents in terms of their x-coordinate values
Fig. 9
figure 9

Distortion plots—For L = 39, the distortion is computed using some of the characters using Δ=0.1, Δ=0.5, Δ=1, Δ=1.5, Δ=2, Δ=2.5, Δ=3, Δ=5 and Δ=10. The x marks are exactly centered into the o marks when the distortion is very low

All the transparency experiments in Table 1 and Figs. 8 and 9 show that the higher the value of Δ, the lower the transparency. Based on these experiments, we assume that for a distortion D s greater than 0.01335, a perceptual difference between the original and the watermarked document can be noticed. Thus the transparency threshold level a is equal to 0.01335. The distortion values selected for good transparency are shown in bold in Table 1.

5.2 Tests of robustness

Experiments are performed on the Violin PDF document shown in the top-left of Fig. 8, with the embedded message “UFC” and L = 39. Two watermarking attacks, Gaussian and Salt&Pepper noise, are applied to the x-coordinates of the characters in the watermarked document; only the digits after the decimal point are modified. After each attack, the extracted message is compared to the original message by computing Pearson's linear correlation coefficient (corr), the Mean Square Error (MSE) and the Bit Error Rate (BER). The simulations were repeated 500 times. Tables 2 and 3 report the averages of all the robustness results (corr, MSE and BER); from their values, we notice that the higher the value of Δ, the higher the robustness. Since two noise attacks (Gaussian and Salt&Pepper) under two densities (0.1 and 0.25) are applied, we obtain four robustness threshold levels: b 1, b 2, b 3 and b 4. b 1 and b 3 are computed from the Gaussian and Salt&Pepper experiments, respectively, under a density of 0.1, while b 2 and b 4 are computed under a density of 0.25. The robustness threshold level b is then derived as the one ensuring the best robustness under all the watermarking attacks. In our experiments, we consider that a BER of 12.5 % can be tolerated when deducing the values of b 1, b 2, b 3 and b 4. This is motivated by the fact that this percentage of BER, which corresponds in our experiments to 4 erroneous bits out of k = 24, can be corrected by the majority of error-correcting codes. The robustness threshold level for each noise attack and density value is thus the distortion D s from which all the bit errors (at most 4) can be corrected.

Table 2 Tests of robustness under gaussian attack
Table 3 Tests of robustness under Salt&Pepper attack

Tables 2 and 3 present the tests of robustness under the Gaussian and Salt&Pepper noise attacks, respectively, with two density values: 0.1 and 0.25. For a density of 0.1, we notice from Table 2 that for D s ≥ 0.00547 the average BER is less than or equal to 3.8080, while from Table 3 it is at most 3.3620 for D s ≥ 0.00308. Therefore b 1 and b 3 are equal to 0.00547 and 0.00308, respectively. For a density of 0.25, Table 2 shows that the average BER that can be entirely corrected is less than or equal to 3.6060 for D s ≥ 0.01034, while from Table 3 it is at most 3.6200 for D s ≥ 0.00692. Therefore b 2 and b 4 are equal to 0.01034 and 0.00692, respectively. The four threshold levels are shown in bold in Tables 2 and 3.

Figures 10 and 11 plot the BER and correlation results of Tables 2 and 3 as a function of Δ, respectively. The figures compare the Gaussian and Salt&Pepper noise attacks under the two density values 0.1 and 0.25, and exhibit four curves: the dashed and dash-dotted curves represent the Gaussian noise under densities 0.1 and 0.25, respectively, while the dotted and dash-pattern curves represent the Salt&Pepper noise under densities 0.1 and 0.25, respectively. The plotted curves show that the method is consistently more robust against the Salt&Pepper attack than against the Gaussian attack, under both density values.

Fig. 10
figure 10

Gaussian and Salt&Pepper comparisons in terms of BER

Fig. 11
figure 11

Gaussian and Salt&pepper comparisons in terms of correlation

5.3 Robustness with transparency

We have found 0.00547 ≤ D s ≤ 0.01335 for 1.6 ≤ Δ ≤ 2.5 under the Gaussian attack and 0.00308 ≤ D s ≤ 0.01335 for 1.2 ≤ Δ ≤ 2.5 under the Salt&Pepper attack with density = 0.1, and 0.01034 ≤ D s ≤ 0.01335 for 2.2 ≤ Δ ≤ 2.5 under the Gaussian attack and 0.00692 ≤ D s ≤ 0.01335 for 1.8 ≤ Δ ≤ 2.5 under the Salt&Pepper attack with density = 0.25.

The final acceptable distortion interval, showing sufficient robustness and transparency under all the watermarking attacks at the same time, is [0.01034, 0.01335]; it is represented by the dash-dotted curve in Fig. 12. The dashed, dotted, dash-pattern and dash-dotted curves refer to Gaussian (density = 0.1), Salt&Pepper (density = 0.1), Salt&Pepper (density = 0.25) and Gaussian (density = 0.25), respectively.

Fig. 12
figure 12

Robustness correlations of all possible acceptable distortion under all the watermarking attacks

5.4 Our method vs related work

Our method is a new watermarking scheme that embeds the secret message under a sufficient transparency-robustness tradeoff. In contrast to what has been proposed in [1] and [11], our method offers better transparency and higher embedding capacity. For example, in [11] the message was embedded by slightly modifying the decimal values of the media box and text matrices, which means that increasing the number of characters in the document does not increase the embedding capacity of that method. That is why we exploited the characters themselves for the embedding. More specifically, we exploited the x-coordinates of the characters and used each group of them to embed one message bit by taking advantage of the STDM concept. Our method thus shows sufficient transparency and sufficient robustness at the same time, and the embedded message becomes hard to remove, in contrast to what is deduced in [8]. It also provides an efficient answer to [9] and [7] by making the message harder to detect. The y-coordinate values were not used because they are constant for the characters of a same line, which would increase detectability.

6 Conclusion and future work

In this work, we have described in detail the four components of the PDF file structure: Header, Body, Cross-Reference Table and Trailer. This structure has been exploited to build an efficient blind digital watermarking scheme in terms of the transparency-robustness tradeoff. The proposed scheme is based on a variant of the Quantization Index Modulation (QIM) method called Spread Transform Dither Modulation (STDM). Since the x-coordinate values of the characters in the document are non-constant, even for characters belonging to the same line, they have been exploited to embed each bit of the secret message.

The main contribution of this work is to achieve sufficient resistance against very high-density noise attacks while preserving sufficient transparency. One of the biggest difficulties was to perform multiple transparency and robustness evaluations in order to estimate the distortion value that leads to sufficient robustness with sufficient transparency. That is why this work relies on two distinct threshold levels, a and b, computed from the transparency and robustness experiments, respectively. The distortion value D s leading to sufficient robustness with sufficient transparency should be neither greater than a nor smaller than b; a value satisfying this condition is called an “acceptable distortion”.

As for future enhancements, we plan to extend this work in both practical and theoretical directions. On the practical side, we plan to evaluate how robust the approach is against JPEG compression; this challenging task has direct applications in, for instance, newspaper watermarking. On the theoretical side, we plan to study how secure the STDM-based approach is, i.e., how many bits are sufficient to recover the encoding key, as in a classical cryptanalytic setting.