1 Introduction

With the Internet becoming part and parcel of our lives, vulnerability to copyright infringement and piracy has increased as well. Technological methods designed to protect digital content such as images, video and databases are called “Technological Protection Measures” (TPMs). They include technologies that control access to copyrighted digital content or prevent users from copying such protected content. Digital watermarking is one such TPM that has emerged as an effective means to protect shared and outsourced databases from infringement. Watermarking protects digital content by embedding digital marks in it and is thus considered a viable and cost-effective TPM. For web databases, watermarking algorithms introduce small changes in the data being protected without altering the database in any significant way. Watermarking provides a means to establish the true identity of the owner and deters attempts to plagiarize or distort the data [1]. The prior state of the art includes several works on protecting relational databases (RDBs). However, object-relational databases (ORDBs), although now implemented successfully in various domains, have so far escaped the attention of researchers. As an ORDB works on objects rather than simple values, existing techniques cannot be applied to it directly. In this research, we propose a watermarking model that taps the unique characteristics of ORDBs to resolve two important security issues: ownership proof and tamper detection. We briefly explain the concept of ORDBs in Section 2.

Unlike encryption and hashing techniques, watermarking does not attempt to hide the data, but instead infuses a kind of ownership proof into the data. Encryption and hashing protect the content by making the information indecipherable to an attacker. Digital watermarking, in contrast, modifies the content in such a manner that the usability of the data is fully retained. An attacker or observer has no way to tell that a watermark is actually present in the database. Nevertheless, the watermark remains inseparably within the content, providing a proof of ownership, a signature to detect tampering, or a means to track people who obtained the content legally and are redistributing it illegally [17]. The main contributions of this paper are summarized below:

  1. (i)

    Simultaneous treatment of both ownership proof and tamper detection in ORDBs by sequentially applying two different watermarks for ownership proof and tamper protection, and by using the services of a trusted third party.

  2. (ii)

    The exploitation of two unique features of ORDB to enhance the security capabilities of our watermarking scheme: Firstly, the unique owner-accessible Object IDentifier (OID) of each row object is utilized for concealing the watermark, thereby making the scheme primary key independent. Secondly, the watermark is concealed in object type attributes giving us more bandwidth to accommodate watermark bits profusely, thus increasing the degree of robustness.

  3. (iii)

    Development of WORD, an extensible software system based on the proposed ORDB watermarking model.

  4. (iv)

    Use of design-pattern principles to make the software flexible and scalable, so that new types of attributes can be incorporated as a database evolves.

The rest of the paper is organized as follows. Section 1 has introduced digital watermarking and the need for it in securing ORDBs. Section 2 describes the features and applications of ORDBs. Section 3 presents the prior state of the art. Section 4 describes the architecture of the proposed ORDB watermarking model. Section 5 analyses the security features incorporated in the proposed model. Section 6 illustrates the implementation details. Section 7 describes the experiments performed and reports the results on the robustness and fragility of the experimental database. Finally, Section 8 concludes the work.

2 Background of object-relational databases (ORDBs)

ORDBs are relatively new on the horizon and have gained a lot of popularity due to their advantages over conventional relational databases [32]. An ORDB models object-oriented components such as objects, classes and inheritance relationships in a relational database schema [9]. It supports multimedia objects such as audio, video, image and graphics in the form of attributes. ORDBs differ significantly from RDBs in the following respects:

  1. 1.

    The data types in ORDBs differ significantly from those of RDBs. ORDBs are aimed at application domains where complex data plays a significant role.

  2. 2.

    The number of potential locations for embedding a watermark varies significantly.

  3. 3.

    In RDBs, a primary key is used to identify records uniquely, while in ORDBs there is an object identifier (OID) that refers to the inherited data.

Due to these differences, the technique used to watermark ORDBs differs from that used for RDBs. Several real-world applications are proof of the growing popularity of ORDBs. Barrodale Computing Services Ltd. has used an ORDBMS to provide solutions for various applications involving the management of complex data types [3]. Applications that use ORDBs include weather modelling and prediction software, commercial shipping information systems, life sciences databases and ocean observatories.

An ORDB consists of groups of object tables. An object table is a special kind of table that holds objects called row objects. It gives a relational view of the attributes of those objects. The allowable data types in row objects are dependent or independent complex data types and primitive data types. A complex data type may be an object type, or a collection of data elements of similar or dissimilar objects [25]. Objects that are not referenced by other objects are termed independent objects. In the proposed work, we conceal a watermark in an independent object type attribute of the ORDB. Figure 1 illustrates an example of an object type and an object. Emp_Detail is created as an object type that contains the details of an employee. Emp_Detail comprises various attributes that make up a structured data unit. These attributes can be of a primitive data type, another object type, or an array of similar elements. Emp_Detail includes Emp_ID and Name as non-numeric attributes; we refer to this data type as string. The D_O_J attribute maintains the date of joining of the employee. It is an object type that comprises a date, i.e. day, month and year. The Tot_Exp attribute stores the total experience of the employee in years. We refer to this data type as float, indicating that it contains fractional values. The PrevEmpDetails attribute is an object type describing the previous employment details of the employee. It comprises attributes of float and string types described in the P_Detail object type. An object represents an instance of the object type Emp_Detail. It shows the values of all the attributes in Emp_Detail.
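As a rough illustration of the object type described above, the nesting of Emp_Detail can be mirrored with Python dataclasses. This is only a sketch of the data structure, not ORDB DDL: the field names inside `P_Detail` (`employer`, `years`) and all sample values are hypothetical, since the text only states that P_Detail contains string and float attributes.

```python
from dataclasses import dataclass


@dataclass
class Date:
    """Object type behind D_O_J: day, month and year."""
    day: int
    month: int
    year: int


@dataclass
class P_Detail:
    """Previous employment details (field names are hypothetical)."""
    employer: str   # string attribute
    years: float    # float attribute


@dataclass
class Emp_Detail:
    """Object type holding the details of an employee."""
    Emp_ID: str               # string
    Name: str                 # string
    D_O_J: Date               # nested object type
    Tot_Exp: float            # float, fractional years of experience
    PrevEmpDetails: P_Detail  # nested object type


# One object (instance) of the object type, with made-up values.
emp = Emp_Detail("E101", "A. Kumar", Date(12, 7, 2010), 8.5,
                 P_Detail("Acme Corp", 3.25))
```

The float attributes at any nesting depth (here `Tot_Exp` and `PrevEmpDetails.years`) are the kind of values that can later carry watermark bits.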

Fig. 1 Illustration of object type and object through an example

3 Literature survey

We now analyze the state of the art in the domain of watermarking for digital databases. Prior work focused primarily on relational databases (RDBs). Khanduja et al. [14, 18] and Halder et al. [8] have presented detailed literature surveys on the digital watermarking of databases. We classify database watermarking techniques into two categories: robust watermarking techniques and fragile watermarking techniques.

  1. i.

    Robust Watermarking Techniques: Robust techniques are applied to resolve ownership issues. The watermark is embedded into a database redundantly so that it can be extracted successfully even after alterations introduced by an attacker. The main objective is to make the watermark robust against various attacks.

  2. ii.

    Fragile Watermarking Techniques: Fragile watermarking techniques aim at tamper detection. The watermark usually acts as a signature of a database. Even the slightest alterations made to a database, deliberately or otherwise, will immediately alter its signature and hence its watermark. The tampering effect can therefore be detected. The main objective here is to make the watermark fragile to various attacks.

Table 1 presents the landmarks in the state of the art of robust watermarking techniques for RDBs. The watermark can be created from a random bit stream [2, 30], a specific identifier [7], an image [33] or a biometric trait such as voice [16]. The target attribute into which the watermark is inserted may be numeric [2, 7, 13, 16, 20, 30, 33, 35] or categorical [13, 20]. The data type column in Table 1 gives the data type of the target attribute.

Table 1 Summary of robust watermarking techniques

Surprisingly, relatively few attempts have been made to develop schemes that protect the integrity of relational databases with fragile watermarking. Such schemes work on the principle that the slightest change to the data immediately destroys the watermark, thereby revealing the attempt. Table 2 enumerates the work of researchers on protecting the integrity of databases. To the best of our knowledge, no prior work has been reported on watermarking ORDBs, which is the focus of the present research. The proposed technique exploits two unique features of ORDBs to enhance the security capabilities of our watermarking scheme. Firstly, the unique owner-accessible Object IDentifier (OID) of each row object is utilized for concealing the watermark, thereby making the scheme independent of the primary key. Secondly, the watermark is concealed in object type attributes, giving us more bandwidth to accommodate watermark bits profusely and thus increasing the degree of robustness.

Table 2 Summary of fragile watermarking techniques

4 Architecture of WORD

We propose a watermarking model that provides ownership protection and detects any tampering done maliciously or inadvertently to an ORDB. A database \( \mathcal{O} \) is built upon the object-relational model. The ORDB \( \mathcal{O} \) consists of a group of object tables comprising Nt row objects. We assume that the number of columns in a row object is fixed and denote it by Na. To keep the text simple, we ignore methods within an object, as they do not play a significant role in the watermarking process. The owner chooses a set of secret parameters: the secret key \( {\mathcal{K}}_s \), the row object selection parameter γ, the number of candidate attributes β, the number α of permissible Least Significant Bits (LSBs) within an attribute in which a watermark bit may be embedded, and the number of partitions Np. These parameters are decided by the owner and are kept private.

Figure 2 depicts the generic architecture of WORD. It consists of five processes. In the first process, we prepare a watermark \( {\mathcal{W}}_o \) for ownership protection, and in the second process we profusely conceal it within the database \( \mathcal{O} \). In the third process, we prepare the signature \( {\mathcal{W}}_s \) of the database for tamper detection, and in the fourth process we register it with a trusted third party (TTP). Lastly, the Generate Decision process decides who the true owner is and/or whether the database has been tampered with. Table 3 lists the symbols used in this manuscript. We now discuss each of these steps in detail.

Fig. 2 The overall process of the proposed watermarking model WORD

Table 3 Parameter table

4.1 Prepare ownership watermark \( {\mathcal{W}}_o \)

This is the first process of WORD, which securely prepares a watermark as shown in Fig. 2. It converts the identity of the database’s copyright holder into a watermark, thus ensuring proof of ownership. We prepare the ownership watermark \( {\mathcal{W}}_o \) using the owner-selected watermark w and the secret key \( {\mathcal{K}}_s. \)

We follow Kerckhoffs’s principle, which says that a system must remain secure by virtue of its secret key \( {\mathcal{K}}_s \) even if the algorithm used is publicly known [12]. If the owner’s identity were used directly as the watermark, an attacker who knows the owner of a database could guess the watermark. The use of the secret key \( {\mathcal{K}}_s \) makes this process secure. \( {\mathcal{W}}_o \) is calculated using Eq. 1 as:

$$ {\mathcal{W}}_o= hash\left(w\ \Big\Vert {\mathcal{K}}_s\right) $$
(1)

Where Ks is the secret key, || is the concatenation operator, and hash(.) is a cryptographic hash function. A cryptographic hash function is a procedure that takes an arbitrary block of data and returns a fixed-size hash value such that any change to the data, accidental or intentional, changes its hash value [28]. The hash function gives the owner the liberty to select his own watermark. The owner can select a large file such as an image, audio or video as a watermark, which is reduced to a fixed size by applying the hash algorithm.

Many cryptographic hash algorithms exist in the literature, e.g. RIPE-MD, MD5, SHA-2, SHA-3, HAVAL and SNEFRU. We employ the SHA-2 algorithm as the cryptographic hash function, which yields a 256-bit hash value, owing to its improved resilience against attacks [26, 36]. The final watermark comprises Nw binary bits that are embedded following the secure procedure explained in the next step.
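Eq. 1 can be sketched in a few lines with Python's standard `hashlib`. The function name, the byte encoding of w and \( {\mathcal{K}}_s \), and the sample inputs are assumptions made for illustration; the source only fixes the form hash(w || Ks) and a 256-bit SHA-2 digest.

```python
import hashlib


def prepare_ownership_watermark(w: bytes, secret_key: bytes) -> str:
    """Return W_o = SHA-256(w || K_s) as a string of 256 binary digits."""
    digest = hashlib.sha256(w + secret_key).digest()
    return "".join(f"{byte:08b}" for byte in digest)


# Hypothetical inputs: any byte string (text, image file contents, ...) can
# serve as the owner-selected watermark w.
w_o = prepare_ownership_watermark(b"Alice's database", b"example-secret-key")
print(len(w_o))  # 256
```

Because the key is mixed into the hash input, two owners choosing the same w still obtain unrelated watermarks.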

4.2 Embed ownership watermark

This is the second process of WORD, as shown in Fig. 2, where the watermark \( {\mathcal{W}}_o \) is embedded repeatedly into the object type attribute of the database.

An object type attribute can be of a primitive data type, a multimedia data type, another object type or an array of similar elements. Thus, a single object type attribute has the potential to hide multiple instances of a watermark bit within the varied attributes present in it, thereby incorporating a higher degree of robustness. We now discuss the various sub-processes involved in concealing \( {\mathcal{W}}_o \).

  1. A.

    Select target positions: Figure 3 shows the pseudo-code to calculate the potential locations for embedding \( {\mathcal{W}}_o \). The pseudo-code commences by calculating row object’s hash \( {\mathcal{H}}_o(r) \) using Eq. 2.

$$ {\mathcal{H}}_o(r)= Hash\left( OID(r)\ \Vert\ {\mathcal{K}}_s\right) $$
(2)

Where Hash(.) represents a cryptographic hash function, \( {\mathcal{K}}_s \) is the secret key decided by the owner and OID(r) is the unique identifier of the row object r. Every row object is assigned a logical, unique Object IDentifier (OID). No modification made to a row object ever alters its OID. Moreover, access to the OID is restricted by privileges so that it is visible only to the owner of the database. These features make the OID more attractive than the primary key used for RDBs.

Fig. 3 Pseudo-code for selecting target positions to embed watermark

The algorithm uses two different portions of the row object hash \( {\mathcal{H}}_o(r) \) (line 3) for making the following decisions:

  1. (i)

    Sf(r), the first 20 bits \( {\mathcal{H}}_o(r)\left[1:20\right] \), decides whether the row object r is selected for embedding a watermark bit or not (line 4). This process serves two purposes. First, it limits the amount of distortion by limiting the number of watermarked row objects. Second, the use of a secure hash function and secret parameters enhances security by concealing the identity of the watermarked row objects from an intruder.

  2. (ii)

    Se(r), the first 30 bits \( {\mathcal{H}}_o(r)\left[1:30\right] \), chooses the target attribute for embedding if r is selected for this purpose (line 6). Thus, for any given row object, one out of the β candidate attributes is chosen pseudo-randomly for embedding. This process adds another level of obfuscation by ensuring that the same attribute is not always chosen across different row objects.

It may be noted that different numbers of bits of \( {\mathcal{H}}_o \) are used for selecting row objects and attributes, to eliminate any common pattern in their selection. Further, to select the bit position at which a watermark bit is concealed within an attribute of the selected object type attribute, Sf is concatenated with \( {\mathcal{K}}_s \) and the resulting value is reduced modulo α (line 7).

A different watermark bit is selected for every earmarked row object. However, within a particular row object, the same watermark bit WIndex(r) is concealed repeatedly in the different attributes of the selected object type attribute. This adds redundancy and thereby enhances the robustness of \( {\mathcal{W}}_o \).
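The selection logic described above can be sketched as follows. Since the pseudo-code of Fig. 3 is not reproduced in the text, several details are assumptions and are marked as such in the comments: SHA-256 as Hash(.), a byte-string OID, the rule "Sf mod (1/γ) = 0" for selecting a row object, "Se mod β" for picking the attribute, and "Se mod Nw" for picking the watermark bit index. Only the use of the first 20 and 30 bits of \( {\mathcal{H}}_o(r) \) and the "Sf || Ks mod α" rule come from the description.

```python
import hashlib
from typing import Optional, Tuple


def select_target(oid: bytes, key: bytes, gamma_inv: int, beta: int,
                  alpha: int, n_w: int) -> Optional[Tuple[int, int, int]]:
    """Return (attribute index, bit position, watermark bit index) for a row
    object, or None if the row object is not selected for watermarking."""
    # H_o(r) = Hash(OID(r) || K_s), Eq. 2 (SHA-256 is an assumption).
    h = int.from_bytes(hashlib.sha256(oid + key).digest(), "big")
    s_f = h >> (256 - 20)   # S_f(r): first 20 bits of the hash
    s_e = h >> (256 - 30)   # S_e(r): first 30 bits of the hash

    # ASSUMPTION: select roughly a fraction gamma = 1/gamma_inv of row objects.
    if s_f % gamma_inv != 0:
        return None

    # ASSUMPTION: one of the beta candidate attributes, chosen via S_e.
    attr_index = s_e % beta

    # From the text: S_f concatenated with K_s, reduced modulo alpha.
    bit_pos = int.from_bytes(s_f.to_bytes(3, "big") + key, "big") % alpha

    # ASSUMPTION: which of the N_w watermark bits this row object carries.
    w_index = s_e % n_w
    return attr_index, bit_pos, w_index
```

Because every quantity is derived from the OID and the secret key, the owner can recompute the same positions at extraction time without storing them.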

  1. B.

    Insert Watermark: We embed the watermark in independent object type attributes to yield the watermarked database \( {\mathcal{O}}_{\mathrm{w}} \). The owner specifies the attribute set Aβ comprising attributes that can tolerate a small amount of perturbation without violating usability constraints. Usability constraints are pre-defined for every attribute by the owner of the database, depending upon its application.

We conceal watermark into fractional part of numeric type attributes present within the selected object. We refer to them as float attributes. However, the technique can be easily extended for varied multimedia data types supported by ORDB such as image, audio, video, etc. by following any of the existing techniques [4, 6, 22, 31].

Figure 4 shows the recursive sub-routine that embeds a watermark bit in the fractional part of a numeric attribute so as to introduce minimum distortion. The row objects whose W_Status(r) is 1 participate in the watermarking process. Within a selected row object, the attribute AAIndex is chosen from Aβ. All the attributes within the selected object type attribute AAIndex are scanned. For every selected object type attribute, the selected watermark bit \( \mathcal{W}\left[ WIndex\left(\ r\right)\right] \) is embedded using the pseudo-code in Fig. 4. A selected attribute of float type is passed as a parameter to Embed_W(.). Each watermark bit is embedded repeatedly within the selected attribute by recursively calling the Embed_W(.) subroutine.

Fig. 4 Pseudo-code to embed watermark into selected attribute

It may be noted that use of OIDs to select secret target locations for embedding watermark and Object type attributes for embedding watermark enhances the security and robustness of the proposed scheme.
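A minimal sketch of the recursive embedding idea follows. It is not the Embed_W(.) routine of Fig. 4, which is not reproduced in the text. The sketch assumes that an object type attribute is represented as a nested Python dict, that float attributes are non-negative, and that the fractional part is marked by fixing `DECIMALS` decimal digits and overwriting one low-order bit of the scaled integer; `DECIMALS`, `set_bit`, `get_bit`, `embed_w` and `extract_w` are hypothetical names.

```python
from typing import Any, Dict, List

DECIMALS = 4  # ASSUMPTION: fractional digits that may be perturbed


def set_bit(value: float, bit: int, pos: int) -> float:
    """Overwrite bit `pos` of the value scaled to an integer (value >= 0)."""
    scaled = round(value * 10 ** DECIMALS)
    scaled = (scaled & ~(1 << pos)) | (bit << pos)
    return scaled / 10 ** DECIMALS


def get_bit(value: float, pos: int) -> int:
    """Read back bit `pos` of the scaled value."""
    return (round(value * 10 ** DECIMALS) >> pos) & 1


def embed_w(obj: Dict[str, Any], bit: int, pos: int) -> None:
    """Embed the same bit in every float attribute, recursing into nested
    object types; other attribute types are left untouched."""
    for name, value in obj.items():
        if isinstance(value, dict):
            embed_w(value, bit, pos)
        elif isinstance(value, float):
            obj[name] = set_bit(value, bit, pos)


def extract_w(obj: Dict[str, Any], pos: int) -> List[int]:
    """Collect the bit stored in every float attribute, in the same order."""
    bits: List[int] = []
    for value in obj.values():
        if isinstance(value, dict):
            bits.extend(extract_w(value, pos))
        elif isinstance(value, float):
            bits.append(get_bit(value, pos))
    return bits


row = {"Tot_Exp": 7.5, "PrevEmpDetails": {"years": 3.25, "employer": "Acme"}}
embed_w(row, bit=1, pos=2)
print(extract_w(row, pos=2))  # [1, 1]
```

With these settings, the perturbation of any single value is bounded by 2^(pos+1) / 10^DECIMALS, which is how a usability constraint on the attribute could be respected.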

4.3 Prepare watermark for tamper detection

The third step of WORD is to prepare the signature \( {\mathcal{W}}_s \) of the watermarked database for tamper detection, as depicted in Fig. 2. We take the signature of the database \( {\mathcal{O}}_{\mathrm{w}} \), in which the ownership watermark has already been concealed, to create a fragile watermark. The process of creating \( {\mathcal{W}}_s \) comprises the following sub-processes:

  1. A.

    Partition Database: All the row objects in \( {\mathcal{O}}_{\mathrm{w}} \) are divided into Np virtual partitions. We define the partition set \( P=\left\{{P}_1,\dots, {P}_{I_d},\dots, {P}_{N_p}\right\} \) as the set consisting of Np virtual partitions such that for any two partitions Pi ∩ Pj = ∅ if i ≠ j. A partition number is assigned to every row object \( r \in {\mathcal{O}}_{\mathrm{w}} \) using Eq. 3.

$$ {I}_d(r)=\operatorname{mod}\left(\mathrm{OID}(r),{N}_p\right)+1 $$
(3)

Where OID(r) is the OID of the row object r and Np is the number of partitions.

  1. B.

    Create Watermark \( {\mathcal{W}}_{\mathrm{s}} \): Figure 5 illustrates the process of creating a signature watermark for each of the Np virtual partitions by utilizing all the row objects assigned to the partition. It returns the composite watermark, which is registered with a TTP.

Fig. 5 Pseudo-code for creating the watermark \( {\mathcal{W}}_{\mathrm{s}} \)

While preparing a partition watermark, a secure watermark hash \( {\mathcal{H}}_w(r) \), which acts as the signature of a row object, is generated from the attribute values of that row object (line 3). Attributes in an ORDB are of varied data types. Each attribute of a row object is processed and converted to a real number. These values are used to generate the watermark hash \( {\mathcal{H}}_w(r) \) of each row object as shown in Eq. 4.

$$ {\mathcal{H}}_w(r)= Hash\left( Aval\left({A}_1(r)\right)\ \Vert\ \dots\ \Vert\ Aval\left({A}_{AIndex}(r)\right)\ \Vert\ \dots\ \Vert\ Aval\left({A}_{N_a}(r)\right)\right) $$
(4)

Where Aval(Ai(r)) denotes the value of attribute Ai for the row object r, with 1 ≤ i ≤ Na. Next, each row object’s watermark wP[r][1 : Nwp] is obtained by taking \( {\mathcal{H}}_w(r) \) modulo \( {2}^{N_{wp}} \) and converting the result into binary form (line 4). Nwp is the length of the watermark in each partition and is decided by the owner. Finally, the partition watermark \( {\mathcal{W}}_P \) of length Nwp is calculated by taking the XOR of the corresponding watermark bits of all row objects assigned to the partition (line 6):

$$ {\mathcal{W}}_P\left[1:{N}_{wp}\right]= XOR\left({w}_P\left[{r}_1\right]\left[1:{N}_{wp}\right],{w}_P\left[{r}_2\right]\left[1:{N}_{wp}\right],\dots \right) $$
(5)

Where wP[r][1 : Nwp] denotes the Nwp bits of the watermark created for the row object r of partition P. The watermarks of all the partitions are concatenated to obtain the final watermark \( {\mathcal{W}}_{\mathrm{s}}={\mathcal{W}}_1\ \Vert\ \dots\ \Vert\ {\mathcal{W}}_P\ \Vert\ \dots\ \Vert\ {\mathcal{W}}_{N_p} \), comprising Nwp ∗ Np bits.
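Eqs. 3 to 5 can be put together in a short sketch. The following is an illustration under stated assumptions rather than the Create_W(.) routine of Fig. 5: OIDs are taken to be integers, SHA-256 stands in for Hash(.), and attribute values are serialized with `repr` instead of being converted to real numbers. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Sequence


def partition_id(oid: int, n_p: int) -> int:
    """Eq. 3: I_d(r) = mod(OID(r), N_p) + 1, giving partitions 1..N_p."""
    return oid % n_p + 1


def row_watermark(values: Sequence[object], n_wp: int) -> int:
    """Eq. 4 followed by reduction modulo 2^N_wp."""
    # ASSUMPTION: attribute values are concatenated via their repr().
    data = "||".join(repr(v) for v in values).encode()
    h = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return h % (2 ** n_wp)


def create_w(rows: Dict[int, Sequence[object]], n_p: int,
             n_wp: int) -> List[str]:
    """Return the partition watermarks W_1..W_Np as N_wp-bit strings."""
    parts: Dict[int, int] = defaultdict(int)
    for oid, values in rows.items():
        # Eq. 5: XOR of the row watermarks of all rows in the partition.
        parts[partition_id(oid, n_p)] ^= row_watermark(values, n_wp)
    return [format(parts[p], f"0{n_wp}b") for p in range(1, n_p + 1)]


# Toy database: OID -> attribute values of the row object.
rows = {oid: (f"E{oid}", oid * 1.5) for oid in range(1, 21)}
w_s = "".join(create_w(rows, n_p=4, n_wp=64))  # composite signature W_s
print(len(w_s))  # 256 = N_wp * N_p
```

Since XOR is order-independent, the signature does not depend on the order in which row objects are scanned, only on their contents and partition assignment.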

4.4 Register with TTP

In the fourth step, the owner registers the watermark \( {\mathcal{W}}_{\mathrm{s}} \) created in the previous step with a trusted third party (TTP) for tamper detection. The TTP provides transparent security to the watermarking process.

4.5 Generate decision

This is the last step of the proposed watermarking model as depicted in Fig. 2. A database may be used for varied purposes depending on its application domain. Thus, the need of database protection varies accordingly. Depending on whether ownership rights or integrity of the database is violated, this phase generates the decision. We now discuss how a decision is generated in each of following cases.

  1. A.

    Ownership Proof: An attacker might try to destroy the watermark embedded in a database and then claim ownership of the pirated copy. The Generate Decision process is responsible for proving ownership rights over a suspected database \( \mathcal{O}^{\prime } \). The watermark \( {\mathcal{W}}_x \) is extracted from the suspected database \( {\mathcal{O}}^{\prime } \) using the pseudo-code given in Fig. 6. The process starts by calculating the target locations (lines 4-7). It then extracts the watermark bits by calling the recursive subroutine Extract_W(.) given in Fig. 7.

Fig. 6 Pseudo-code for extracting the ownership watermark \( {\mathcal{W}}_x \)

Fig. 7 Recursive algorithm to extract the watermark from selected attribute

We apply majority voting to decide the watermark bits of \( {\mathcal{W}}_x \) (lines 11-18). Majority voting is a decision rule that selects the value which has occurred the maximum number of times. For each watermark bit position b, we count the number of 1s (Count[b][1]) and 0s (Count[b][0]) extracted. If the number of ones is greater than the number of zeros for a particular bit b, then we assign \( {\mathcal{W}}_x\left[b\right] \) = 1; if it is smaller, we assign \( {\mathcal{W}}_x\left[b\right] \) = 0. If both counts are equal, we consider that bit invalid.

The bits of the watermark \( {\mathcal{W}}_x \) extracted from a suspected database are matched with the corresponding bits of the original watermark \( \mathcal{W} \). The original watermark \( \mathcal{W} \) is either stored or can be re-generated using the process described in Section 4.1. If the number of matches Nm is large enough, we suspect piracy. To make this precise, we define τw as the threshold that decides database piracy. It gives the minimum fraction of bits of \( {\mathcal{W}}_x \) that must match \( \mathcal{W} \) in order to claim ownership of the disputed database. This value is decided by the owner depending on the sensitivity of the data. Equation 6 shows the condition used to detect piracy.

$$ \text{If } \left({N}_m\ge {\tau}_w\ast {N}_w\right), \text{ then the database is pirated.} $$
(6)

It may be noted that the proposed watermark extraction technique is blind as it does not require the original database for extracting the watermark from \( {\mathcal{O}}^{\prime }. \)
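The majority-voting rule and the decision of Eq. 6 can be sketched as below. The function names and the representation of an invalid bit as `None` are assumptions for illustration; the counts follow the Count[b][0] / Count[b][1] convention used in the text.

```python
from typing import List, Optional, Sequence


def majority_vote(count: Sequence[Sequence[int]]) -> List[Optional[int]]:
    """count[b][0] and count[b][1] hold the number of 0s and 1s extracted
    for watermark bit b. A tie yields None (invalid bit)."""
    bits: List[Optional[int]] = []
    for zeros, ones in count:
        if ones > zeros:
            bits.append(1)
        elif zeros > ones:
            bits.append(0)
        else:
            bits.append(None)
    return bits


def is_pirated(w_x: Sequence[Optional[int]], w: Sequence[int],
               tau_w: float) -> bool:
    """Eq. 6: the database is declared pirated if N_m >= tau_w * N_w."""
    n_m = sum(1 for x, o in zip(w_x, w) if x is not None and x == o)
    return n_m >= tau_w * len(w)


w_x = majority_vote([(1, 3), (4, 0), (2, 2), (0, 5)])
print(w_x)                                  # [1, 0, None, 1]
print(is_pirated(w_x, [1, 0, 1, 1], 0.6))   # True: 3 of 4 bits match
```

Invalid bits simply never count as matches, so ties reduce N_m but do not otherwise affect the decision.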

  1. B.

    Tamper Detection: Whenever a database is transmitted across a network, the receiver checks for any perturbations introduced during transmission. In order to detect any attempt to tamper with the contents of the database, we regenerate the watermark \( {\mathcal{W}}_g \) from the suspected database \( {\mathcal{O}}^{\prime }. \) We partition the suspected database using Eq. 3 to get P′, and then invoke the subroutine Create_W(.) given in Fig. 5 to get \( {\mathcal{W}}_g \). The inputs to Create_W(.) are \( {\mathcal{O}}^{\prime } \) and P′. The watermark is created from the attribute values of all row objects in the suspected database. Hence, any modification of the database results in a different watermark. The owner retrieves the registered watermark \( {\mathcal{W}}_s \) from the TTP, and the two watermarks are compared. If \( {\mathcal{W}}_g\ne {\mathcal{W}}_s \), the database has been tampered with. Because a separate watermark is created for every partition and the partition watermarks are concatenated to yield the final one, our tamper detection scheme localizes tampering up to the partition level.

It may be noted that tamper detection process does not require any of secret parameters used in ownership proof; thereby maintaining security of ownership process.
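Partition-level localization amounts to comparing the registered and regenerated signatures block by block. A minimal sketch follows, assuming both signatures are given as bit strings of length Nwp ∗ Np; the function name `tampered_partitions` is hypothetical.

```python
from typing import List


def tampered_partitions(w_s: str, w_g: str, n_wp: int) -> List[int]:
    """Compare the registered signature W_s with the regenerated signature
    W_g in blocks of N_wp bits and return the 1-based numbers of the
    partitions whose watermarks differ."""
    if len(w_s) != len(w_g) or len(w_s) % n_wp != 0:
        raise ValueError("signatures must have equal length N_wp * N_p")
    return [i // n_wp + 1
            for i in range(0, len(w_s), n_wp)
            if w_s[i:i + n_wp] != w_g[i:i + n_wp]]


# Four partitions with 2-bit watermarks; only partition 2 differs.
print(tampered_partitions("00011011", "00111011", n_wp=2))  # [2]
```

An empty result corresponds to \( {\mathcal{W}}_g={\mathcal{W}}_s \), i.e. no tampering detected.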

5 Security analysis

Assume that Alice is the owner of the database \( \mathcal{O} \) and has embedded watermark \( \mathcal{W} \) to generate a watermarked data set \( {\mathcal{O}}_{\mathrm{w}} \). Mallory is an attacker who has no access to the original database \( \mathcal{O} \) and does not know any of the secret parameters: object identifier OID, secret key Ks, number of partitions Np, fraction of watermarked row objects γ and the number of candidate attributes β. Mallory may try to destroy the watermark to claim the database to be hers. Under such circumstances, we justify the security of our model with respect to the following parameters:

  1. i.

    Probability to identify watermarked positions

WORD is a secure mechanism that makes it extremely difficult for Mallory to find the positions of the embedded bits. WORD embeds the watermark into a fraction γ of the total Nt row objects. Further, it selects one out of β candidate attributes for watermarking within each selected row object. Without knowing the secret parameters γ and β, an attacker will find it practically impossible to correctly identify the watermarked row objects. We now calculate the probability of correctly identifying the watermarked object types.

Probability of correctly choosing single watermarked row object is:

$$ {P}_{r1}=\gamma \ast {N}_t/{N}_t $$
(7)

Probability to correctly identify all the γ ∗ Nt watermarked row objects is given as:

$$ {\displaystyle \begin{array}{c}{P}_r=\frac{\gamma \ast {N}_t}{N_t}\ast \frac{\left(\gamma \ast {N}_t\right)-1}{\left({N}_t-1\right)}\ast \frac{\left(\gamma \ast {N}_t\right)-2}{\left({N}_t-2\right)}\ast \dots \ast \frac{1}{\left({N}_t-\gamma \ast {N}_t+1\right)}\\ {}=\frac{1}{{}^{N_t}{C}_{\gamma \ast {N}_t}}\end{array}} $$
(8)

Where γ is the fraction of row objects selected for watermarking, Nt is the total number of row objects and C denotes the number of combinations of Nt row objects taken γ ∗ Nt at a time without repetition. Without knowing Aβ, Mallory will select AF, the set of all object type attributes that could be in the candidate attribute set, such that Aβ ⊆ AF. Let βF denote the number of attributes in AF. An earmarked object type attribute is then chosen correctly with probability 1/βF.

Thus, probability of correctly selecting an object type per row object is given as:

$$ {P}_{or}={P}_{r1}\ast \frac{1}{\beta_F}=\frac{\gamma \ast {N}_t}{N_t\ast {\beta}_F} $$
(9)

Probability to correctly identify targeted locations with respect to watermarked object type of all the γ ∗ Nt selected row objects is given as:

$$ {\displaystyle \begin{array}{c}{P}_l=\frac{\gamma \ast {N}_t}{N_t\ast {\beta}_F}\ast \frac{\left(\gamma \ast {N}_t\right)-1}{\left({N}_t-1\right)\ast {\beta}_F}\ast \frac{\left(\gamma \ast {N}_t\right)-2}{\left({N}_t-2\right)\ast {\beta}_F}\ast \dots \ast \frac{1}{\left({N}_t-\gamma \ast {N}_t+1\right)\ast {\beta}_F}\\ {}=\frac{1}{{}^{N_t}{C}_{\gamma \ast {N}_t}}\ast {\left(\frac{1}{\beta_F}\right)}^{\gamma \ast {N}_t}\end{array}} $$
(10)

Where C denotes the number of combinations of Nt row objects taken γ ∗ Nt at a time without repetition. Substituting γ = 1/3, βF = 10 and Nt = 6000 in Eq. 10, we get Pl = 0.22 e−3656, which is negligible. Hence, it is practically impossible for Mallory to correctly identify the watermarked positions. This probability decreases further depending on the number of float attributes each watermarked object type contains. Tables 4 and 5 show the effect of βF on Pl at different values of γ and Nt. As the number of candidate attributes increases, the probability decreases further. Similarly, the probability decreases as γ and Nt increase.
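Because Pl is far below the smallest representable floating-point number, Eq. 10 is most conveniently evaluated in log space. The sketch below uses `math.lgamma` to obtain the logarithm of the binomial coefficient; the function names are hypothetical and γ ∗ Nt is rounded to the nearest integer.

```python
import math


def log10_comb(n: int, k: int) -> float:
    """log10 of the binomial coefficient C(n, k), via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(10)


def log10_p_l(gamma: float, n_t: int, beta_f: int) -> float:
    """log10 of P_l from Eq. 10: 1 / C(N_t, gamma*N_t) * (1/beta_F)^(gamma*N_t)."""
    k = round(gamma * n_t)
    return -log10_comb(n_t, k) - k * math.log10(beta_f)


# Parameters used in the text: gamma = 1/3, beta_F = 10, N_t = 6000.
print(log10_p_l(1 / 3, 6000, 10))  # about -3656.6, i.e. P_l ~ 0.2 x 10^-3656
```

Increasing βF, γ (up to 1/2) or Nt makes the logarithm more negative, in line with the trends stated above.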

Table 4 Effect of βF on Pl (∗e−2000) at different values of γ
Table 5 Effect of βF on Pl(∗e−2000) at different values of Nt
  1. ii.

    False Hit Rate

PFH is the probability of extracting a valid watermark from a non-watermarked database. Our robustness analysis is grounded upon the approach given in [18]. Let a watermark bit bi be embedded Nr times in a partition. For simplicity, we calculate Nr assuming one watermark bit per row object as \( {N}_r=\left\lfloor \frac{\gamma \ast {N}_t}{N_w}\right\rfloor \). Each bit bi extracted from a non-watermarked relation has the same probability of 1/2 of matching or not matching the originally embedded watermark bit. The final watermark bit is decided by majority voting. We define Pbit as the probability of correctly extracting one watermark bit by sheer chance after applying majority voting. Equivalently, Pbit is the probability that more than half of the Nr bits are detected from a non-watermarked relation by sheer chance. We now calculate Pbit using the binomial distribution, considering Nr independent trials [23], as:

$$ {P}_{bit}=\sum_{j={N}_r/2+1}^{N_r}b\left(j;{N}_r,\frac{1}{2}\right)=B\left(\left(0.5\ast {N}_r+1\right);{N}_r,0.5\right) $$
(11)

Where b(k; n, p) is the probability of obtaining k successes in n Bernoulli trials with probability p for success and 1 − p for failure, and is given in Eq. 12.

$$ b\left(k;n,p\right)={}^n{C}_k\ast {p}^k\ast {\left(1-p\right)}^{n-k} $$
(12)

We use B(k; n, p), referred to as the cumulative binomial probability, to represent the probability of obtaining at least k successes in n Bernoulli trials [23]; it is given as:

$$ B\left(k;n,p\right)={\sum}_{i=k}^nb\left(i;n,p\right) $$
(13)

For a watermark of length Nw, we calculate false hit rate PFH referring Eq. 6 as:

$$ {P}_{FH}=B\left({\tau}_w\ast {N}_w;{N}_w,{P}_{bit}\right) $$
(14)

Where τw is the watermark threshold, Nw is the length of the watermark in bits and Pbit is as given in Eq. 11. We took Nt = 6000 and Nw = 50 and plotted the false hit rate by varying τw. Figure 8 illustrates that the false hit rate decreases monotonically as the threshold τw increases. The false hit rate PFH is low for the nominal value of τw = 0.6, which means that our method is robust and secure. We also observed the effect of γ on PFH and plotted the values in the same graph. PFH increases with γ: as more row objects are earmarked, PFH increases.
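Eqs. 11 to 14 can be evaluated directly with `math.comb`. The sketch below is an illustration with hypothetical function names; for odd Nr it interprets the lower summation limit Nr/2 + 1 as ⌊Nr/2⌋ + 1, and it rounds τw ∗ Nw up to the next integer, both of which are assumptions.

```python
import math


def b(k: int, n: int, p: float) -> float:
    """Eq. 12: probability of exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def B(k: int, n: int, p: float) -> float:
    """Eq. 13: probability of at least k successes in n Bernoulli trials."""
    return sum(b(i, n, p) for i in range(k, n + 1))


def p_bit(gamma: float, n_t: int, n_w: int) -> float:
    """Eq. 11 with N_r = floor(gamma * N_t / N_w)."""
    n_r = math.floor(gamma * n_t / n_w + 1e-9)  # guard against float error
    return B(n_r // 2 + 1, n_r, 0.5)


def p_fh(tau_w: float, gamma: float, n_t: int, n_w: int) -> float:
    """Eq. 14: false hit rate B(tau_w * N_w; N_w, P_bit)."""
    k = math.ceil(tau_w * n_w - 1e-9)
    return B(k, n_w, p_bit(gamma, n_t, n_w))


# Parameters from the text: N_t = 6000, N_w = 50, gamma = 1/3.
for tau in (0.5, 0.6, 0.7, 0.8):
    print(tau, p_fh(tau, 1 / 3, 6000, 50))
```

With these parameters Nr = 40 and Pbit is slightly below 1/2, because ties in the majority vote are not counted as correct extractions.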

Fig. 8 Change in false hit rate by varying watermark threshold τw and γ

  1. iii.

    False Miss Rate

We define the false miss rate PFM as the probability of not detecting the watermark in a watermarked database. Mallory will try to destroy the watermark through various attacks. If Mallory successfully deletes (τw ∗ Nw + 1) watermark bits out of the Nw bits, the watermark is destroyed. We define PFM as given in Eq. 15.

$$ {P}_{FM}={\prod}_{i=1}^{\tau_w\ast {N}_w+1} Prob(i) $$
(15)

where Prob(i) is the probability of successfully destroying the ith watermark bit. We now analyse the false miss rate against subset deletion, addition and alteration attacks.

A. Subset Deletion Attack: In this attack, Mallory deletes ∮ row objects from the watermarked database \( {\mathcal{O}}_w \) with the intention of deleting earmarked row objects. Each watermark bit is embedded Nr times in the database. To destroy the ith watermark bit, the ∮ random deletions must include all Nr marked row objects that carry it. The probability that the ith watermark bit is completely destroyed by ∮ random deletions is given by the hypergeometric distribution [10] as:

$$ Prob(i)=\frac{{}^{N_r}C_{N_r}\ast {}^{N_t-{N}_r}C_{\oint -{N}_r}}{{}^{N_t}C_{\oint }} $$
(16)

Eq. 16 is an instance of the hypergeometric probability with N = Nt, k = Nr, n = ∮ and x = Nr. In general, suppose a population consists of N items, k of which are successes, and a random sample of n items drawn from that population contains x successes. Then the hypergeometric probability is:

$$ h\left(x;N,n,k\right)=\frac{{}^{k}C_{x}\ast {}^{N-k}C_{n-x}}{{}^{N}C_{n}} $$
(17)

Taking ∮ = 1000, Nt = 6000, γ = 1/3, τw = 0.6 and Nw = 50, we calculated PFM(Subset Deletion) = 3.2e−413 by substituting Eq. 16 in Eq. 15. Thus PFM is negligible for the subset deletion attack. Figure 9 plots Prob(i) against ∮ for different values of γ. The results show that the probability increases with ∮ but remains very low even for ∮ = 4000. If an attacker deletes more than this, the database becomes useless. It is also observed that the probability decreases as γ increases. At γ = 1/3, the probability is very low, making our scheme secure. As more row objects are involved in watermark insertion, PFM(Subset Deletion) decreases further.
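The per-bit probability in Eq. 16 can be evaluated exactly with integer binomial coefficients. The sketch below is illustrative: the function names are hypothetical, the attack size ∮ is passed as `phi`, and Nr is derived as ⌊γ ∗ Nt / Nw⌋ as in the false hit analysis. It evaluates only Prob(i) for a single watermark bit.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x: int, N: int, n: int, k: int) -> Fraction:
    """Eq. 17: x successes in a sample of n from N items with k successes."""
    if x < 0 or x > k or n - x < 0 or n - x > N - k:
        return Fraction(0)
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))


def prob_bit_deleted(n_t: int, n_r: int, phi: int) -> Fraction:
    """Eq. 16: all n_r copies of one bit fall among phi random deletions."""
    return hypergeom_pmf(n_r, n_t, phi, n_r)


if __name__ == "__main__":
    n_t, n_w = 6000, 50
    n_r = (n_t // 3) // n_w  # gamma = 1/3, one watermark bit per row object
    for phi in (1000, 2000, 3000, 4000):
        print(phi, float(prob_bit_deleted(n_t, n_r, phi)))
```

Working with `Fraction` keeps the result exact until the final conversion to a float, which matters because the individual binomial coefficients are far too large for floating-point arithmetic.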

B. Subset Addition Attack: In this attack, Mallory tries to fudge the embedded watermark by adding ∮ fake row objects to the watermarked database \( {\mathcal{O}}_w \). Recall that she has no knowledge of any of the secret parameters. The additional row objects do not contain watermark bits but introduce noise. Consider the worst-case scenario in which every added row object is selected during extraction and contains inverted watermark bits. To completely obliterate the ith watermark bit, Mallory has to add at least as many such row objects as there are earmarked row objects carrying that bit, i.e. Nr. The probability that the ith watermark bit is completely destroyed by ∮ random additions of row objects is the same as calculated in Eq. 16, which is negligible. Hence, the proposed technique is robust against this attack.

C. Subset Alteration Attack: In this attack, Mallory modifies ∮ row objects at random to alter the embedded watermark. Consider the worst-case scenario in which every altered row object ends up containing inverted watermark bits. To completely obliterate the ith watermark bit, Mallory has to alter one more than half of the earmarked row objects carrying that bit, i.e. (Nr/2) + 1. The probability that the ith watermark bit is completely destroyed by ∮ random alterations of row objects is given by the hypergeometric distribution [10] as:

$$ Prob(i)=\frac{{}^{N_r}C_{{N}_r/2+1}\ast {}^{N_t-{N}_r}C_{\oint -{N}_r/2-1}}{{}^{N_t}C_{\oint }} $$
(18)

Taking ∮ = 1000, Nt = 6000, γ = 1/3, τw = 0.6 and Nw = 50, we calculated PFM(Subset Alteration) = 1.2e−88 by substituting Eq. 18 in Eq. 15; thus PFM is negligible. Figure 10 plots Prob(i) against ∮ for different values of γ. The results show that the probability increases significantly with ∮ up to ∮ = 3000. Beyond this, there is a marginal decrease in PFM. Further, it is observed that the probability decreases as γ increases. At γ = 1/3, we get a low PFM, making our scheme robust against this attack.
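To see how the alteration probability in Eq. 18 behaves as ∮ grows, the sketch below sweeps ∮ for the parameter values used above. It is an illustration only: the helper name is hypothetical, Nr/2 + 1 is taken as ⌊Nr/2⌋ + 1, and Eq. 18 is read as the probability that exactly that many carriers of the bit are among the altered row objects.

```python
from fractions import Fraction
from math import comb


def prob_bit_altered(n_t: int, n_r: int, phi: int) -> Fraction:
    """Eq. 18: exactly floor(n_r / 2) + 1 of the n_r carriers of one bit
    are among phi randomly altered row objects (hypergeometric term)."""
    hit = n_r // 2 + 1
    miss = phi - hit
    if miss < 0 or miss > n_t - n_r:
        return Fraction(0)
    return Fraction(comb(n_r, hit) * comb(n_t - n_r, miss), comb(n_t, phi))


if __name__ == "__main__":
    n_t, n_r = 6000, 40  # Nt = 6000, gamma = 1/3, Nw = 50 gives Nr = 40
    for phi in range(1000, 6000, 1000):
        print(phi, float(prob_bit_altered(n_t, n_r, phi)))
```

On this grid the largest value occurs at ∮ = 3000, in line with the rise and subsequent fall described for Fig. 10.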

Fig. 9 Graph showing change in probability to destroy a watermark bit against random deletions by varying γ

Fig. 10 Graph showing change in probability to destroy a watermark bit against random alterations by varying γ

iv. Integrity Analysis

Another objective of WORD is to check the integrity of a suspected database. For tamper detection, the watermark should be fragile enough to detect even the slightest change made to a database. In the following analysis, we assume single-attribute alterations per row object and analyse the probability of not detecting tampering after different attacks.

A. Subset Alteration: The procedure creates a signature of the entire database, so any modification to the database is bound to alter its signature. Therefore, if the attacker changes earmarked or non-earmarked attributes in the database, recreating the watermark from the suspected database will always produce a watermark different from the registered one, and hence the tampering will be detected.

B. Subset Addition: When an attacker adds a random row object, its Nw-bit watermark is XORed with the watermarks of the other row objects. Only one combination, all zeros, retains the original watermark. Thus the probability of not detecting tampering through the addition of a single row object is:

$$ {P}_{Addition}=\frac{1}{2^{N_w}} $$
(19)

For Nw = 50, PAddition ≈ 0.89e−15, which is negligible. As the length of the watermark and the number of row objects added increase, this probability decreases further.

C. Subset Deletion: When a row object is deleted, the tampering goes undetected only if the deleted row object’s watermark contained all zeros. Such a watermark is produced with probability \( 1/\left({2}^{N_w}\right) \). Furthermore, the probability that the attacker deleted this particular row object is 1/(total row objects) = 1/Nt. Thus, the probability of not detecting tampering by subset deletion is:

$$ {P}_{Deletion}=\frac{1}{2^{N_w}}\ast \frac{1}{N_t} $$
(20)

For Nw = 50, PDeletion = 0.15e−17. As the length of the watermark and/or the number of row objects increases, this probability decreases further.
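The addition and deletion arguments above rely only on the fact that the database signature is an XOR of per-row Nw-bit values. The sketch below illustrates that property. How WORD derives each row object's value is not restated here, so the keyed SHA-256 hash, the key and the function names are assumptions made for illustration.

```python
import hashlib
import hmac
from functools import reduce

N_W = 50  # watermark length in bits, as in the analysis above


def row_mark(row: bytes, key: bytes) -> int:
    """Hypothetical per-row N_W-bit value (HMAC-SHA-256, truncated)."""
    digest = hmac.new(key, row, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & ((1 << N_W) - 1)


def signature(rows, key: bytes) -> int:
    """XOR of all per-row values; independent of row order."""
    return reduce(lambda acc, r: acc ^ row_mark(r, key), rows, 0)


def p_addition(n_w: int) -> float:
    """Eq. 19: the added row's value must be all zeros."""
    return 1 / 2**n_w


def p_deletion(n_w: int, n_t: int) -> float:
    """Eq. 20: the deleted row is the one whose value is all zeros."""
    return p_addition(n_w) / n_t


if __name__ == "__main__":
    key = b"secret"
    rows = [f"row-{i}".encode() for i in range(1000)]
    sig = signature(rows, key)
    # Adding a row changes the signature unless its value is all zeros.
    print(sig != signature(rows + [b"fake"], key))
    print(p_addition(N_W), p_deletion(N_W, 6000))
```

Because XOR is its own inverse, the signatures before and after a single addition or deletion differ by exactly the value of the affected row, which is why only an all-zero value goes unnoticed.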

6 Implementation of WORD

We have modelled the system using design patterns to incorporate flexibility and scalability. Since design patterns give a higher-level perspective on the problem and on the process of design and object orientation, WORD is developed using the strategy pattern, which illustrates the “design-with-change-in-mind” approach [29]. Figure 11 shows the class diagram of WORD. In consonance with the philosophy of design patterns, each class has a pre-designated responsibility. Associations between classes are represented with different arrows following Unified Modelling Language (UML) notation: inheritance is represented by an arrowhead, aggregation by an open diamond, and dependency by a dashed line with an arrow. Data members of classes are protected or private, while member functions are public.

Fig. 11 Class diagram showing implementation details of WORD

The Watermark_Manager (WM) is a singleton class that initiates the watermarking process by accepting watermarking requests and initializing the set of secret parameters required for watermarking. It collaborates with the Watermark_Embedder (WE) and Security_Manager (SM) classes to realize the core functions of the watermarking process. The class WE is specialized into the ObjectType_WE class, which in turn is extended by the Numeric_WE, Image_WE, Sound_WE and Video_WE classes. These classes implement specific strategies for preparing and embedding bits of the ownership watermark for various data types. The strategy pattern allows any new embedding approach to be incorporated as and when the need arises. WE uses the Watermark_Creation (WC) class for preparing \( {\mathcal{W}}_o \) and \( {\mathcal{W}}_s \). WC has a Partition_Assigner (PA) class that creates Np partitions of a database.

The Security_Manager (SM) class addresses two important security goals: ownership proof and tamper detection. Watermarking techniques can also address other security concerns such as data provenance [27], fingerprinting [21] and their combination. However, the security concerns for a database can change over time. It is therefore judicious to adopt a strategy pattern that abstracts security as a conceptual concern and uses specialized methods in inherited classes to address specific security issues. This leaves room for adding new security functionality. In Fig. 11, we have shown Tamper_Detection and Ownership_Proof as specialized strategies of SM, since our work focuses on these issues. Ownership_Proof has a Watermark_Extractor (WEx) that extracts the embedded watermark through the specialized abstract class ObjectType_WE. Depending on the data type of the attribute used for embedding, the appropriate data type class overrides the Extract_W(.) function of the ObjectType_WE class. For ease of understanding, we have included only the important data members and member functions in their respective classes.
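The strategy arrangement described above can be sketched as follows. Only the class roles (Watermark_Manager as a singleton, ObjectType_WE as the abstract strategy, Numeric_WE as one concrete strategy) and the Extract_W operation come from the text. The Python naming, the `embed_w` method, the registry and the least-significant-bit rule in the numeric strategy are placeholders assumed for illustration, not the embedding rule of WORD.

```python
from abc import ABC, abstractmethod


class ObjectTypeWE(ABC):
    """Abstract embedding strategy (ObjectType_WE in the class diagram)."""

    @abstractmethod
    def embed_w(self, value, bit: int):
        """Return a copy of value carrying one watermark bit."""

    @abstractmethod
    def extract_w(self, value) -> int:
        """Return the watermark bit carried by value."""


class NumericWE(ObjectTypeWE):
    """Concrete strategy for integers; the LSB rule is an assumption."""

    def embed_w(self, value: int, bit: int) -> int:
        return (value & ~1) | (bit & 1)

    def extract_w(self, value: int) -> int:
        return value & 1


class WatermarkManager:
    """Singleton entry point that dispatches to a per-type strategy."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._strategies = {}
        return cls._instance

    def register(self, data_type: type, strategy: ObjectTypeWE) -> None:
        self._strategies[data_type] = strategy

    def strategy_for(self, value) -> ObjectTypeWE:
        return self._strategies[type(value)]


if __name__ == "__main__":
    wm = WatermarkManager()
    wm.register(int, NumericWE())
    marked = wm.strategy_for(42).embed_w(42, 1)
    print(marked, wm.strategy_for(marked).extract_w(marked))
```

Supporting a further data type, such as images, would then mean adding one more `ObjectTypeWE` subclass and registering it, without changing the manager.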

7 Experimental results and analysis

We conducted our experiments on an Intel Core™ i7 2.30 GHz system in MATLAB 7.8.0. We applied our technique to a synthetic object-oriented database generated using PostgreSQL 9.2.1, an open-source object-relational database management system. The data set comprises 50,000 row objects. The experiments performed on this data set to test robustness and integrity are discussed in the following subsections.

7.1 Robustness analysis

When a database is freely distributed over a network, an attacker, Mallory, could perform different types of attacks in an attempt to perturb or even delete the embedded watermark. Note that Mallory faces the dilemma of trying to destroy the watermark without affecting the usability of the data. We assume that all perturbations, whether made by the owner to embed the watermark or by an attacker to destroy it, are within the usability constraints. We now simulate the effects of different types of attacks on the database to demonstrate the robustness of the scheme.

A. Subset Deletion Attack: In this attack, Mallory deletes ɸ row objects from the watermarked database \( {\mathcal{O}}_w \) with the intention of deleting earmarked row objects. The experimental results in Fig. 12 show that even after 90% of the row objects are deleted, the watermark is extracted with 100% accuracy. This is possible owing to the increased redundancy of watermark bits and the majority voting mechanism. It verifies the theoretical analysis in Section 5 and demonstrates the resilience of our technique against this attack.

B. Subset Addition Attack: In this attack, Mallory tries to fudge the embedded watermark by adding ɸ random row objects to the watermarked database \( {\mathcal{O}}_w \). Assume that she has no knowledge of any of the secret parameters. The additional row objects do not contain watermark bits but introduce noise. However, each added row object is equally likely to be selected and to contain the same or the inverted watermark bit. As long as the number of added row objects Nn is less than the total number of row objects Nt, the correct watermark bit is extracted. If Nn > Nt, the extracted bit may differ. However, this amounts to adding more than 100% noise to the data set, thereby destroying its usability. The experimental results in Fig. 12 show that even after adding 100% new row objects we are able to extract the watermark, verifying the theoretical analysis in Section 5.

C. Subset Alteration Attack: In this attack, Mallory modifies ɸ row objects at random to alter the embedded watermark. We assume that the attacker alters a few attributes per row object at random. The experimental results in Fig. 12 verify the robustness of the proposed scheme. At 90% alteration, WORD could recover the complete watermark by tapping the inherent redundancy within a row object using majority voting.

D. Row Object and Attribute Reordering: In this attack, Mallory reorders the row objects or attributes in the database with the intention of destroying the embedded watermark. To counter attribute reordering, we suggest that attributes be sorted first and their indices then selected for concealing the watermark. In our technique, the position of a row object in a database holds no significance while creating, embedding or extracting a watermark. We use the OID hash, which cannot be altered by simply repositioning a row object or any of its attributes. Experimental results reveal 100% resilience against this attack.

E. Attribute Deletion Attack: In this attack, Mallory deletes attributes of a database, hoping to delete those in which watermarks are embedded. We embed a watermark in an attribute chosen at random from the list of candidate attributes, and the choice differs for each row object. The graph in Fig. 13 shows the percentage of the watermark extracted correctly after deleting candidate attributes. Even after deleting nearly 50% of the candidate attributes, the watermark is extracted accurately. With further deletion, accuracy degrades gradually.

F. Watermark Synchronisation Error: In this attack, the watermark is shifted by a few positions relative to its originally embedded position. In our technique, the watermark bit index is calculated from the OID and hence depends only on the row object. Every selected row object determines the index of the watermark bit embedded in it. Even if certain earmarked row objects are deleted, the watermark bit positions are not affected.
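The behaviour reported for deletion, reordering and synchronisation follows from two properties: the watermark bit index depends only on the OID, and extraction uses majority voting. The toy simulation below illustrates them. It is not the WORD embedding algorithm: the HMAC-based row selection (about one row in three, mimicking γ = 1/3), the least-significant-bit embedding in a single integer attribute, the key and all function names are assumptions made for this sketch.

```python
import hashlib
import hmac
import random


def _keyed(key: bytes, oid: int, tag: str) -> int:
    """Keyed hash of an OID; the tag separates the two uses below."""
    msg = f"{tag}:{oid}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def is_marked(key: bytes, oid: int) -> bool:
    return _keyed(key, oid, "select") % 3 == 0  # roughly gamma = 1/3


def bit_index(key: bytes, oid: int, n_w: int) -> int:
    return _keyed(key, oid, "index") % n_w


def embed(db: dict, wm: list, key: bytes) -> dict:
    """Write wm[bit_index] into the LSB of every earmarked row's value."""
    out = {}
    for oid, value in db.items():
        if is_marked(key, oid):
            value = (value & ~1) | wm[bit_index(key, oid, len(wm))]
        out[oid] = value
    return out


def extract(db: dict, n_w: int, key: bytes) -> list:
    """Majority vote over all surviving copies of each watermark bit."""
    votes = [[0, 0] for _ in range(n_w)]
    for oid, value in db.items():
        if is_marked(key, oid):
            votes[bit_index(key, oid, n_w)][value & 1] += 1
    return [1 if ones > zeros else 0 for zeros, ones in votes]


if __name__ == "__main__":
    rng = random.Random(7)
    key, n_w = b"owner-secret", 50
    wm = [rng.randint(0, 1) for _ in range(n_w)]
    db = {oid: rng.randrange(10**6) for oid in range(50_000)}
    marked = embed(db, wm, key)
    # Subset deletion: keep a random 10% of the row objects.
    survivors = dict(rng.sample(sorted(marked.items()), 5_000))
    print(extract(survivors, n_w, key) == wm)
```

Since neither `embed` nor `extract` looks at the position of a row, shuffling the rows or removing some of them leaves the bit index of every surviving row unchanged.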

Fig. 12 Percentage of watermark extracted correctly by subset deletion, insertion and alteration attack

Fig. 13 Percentage of watermark extracted in attribute deletion attack

7.2 Integrity analysis

We performed experiments on the subset attacks. Figure 14 shows the percentage match between the recreated watermark and the original registered one when the database is subjected to subset addition, deletion and modification attacks. With a 5% addition, deletion or alteration of the database, there is as much as 50% mismatch. This indicates that even a slight modification changes the recreated watermark significantly, thus showing unambiguously that the database has been tampered with.

Fig. 14 Integrity analysis against subset addition, alteration and deletion attack

7.3 Imperceptibility

To analyse the distortion introduced by watermark embedding, we use mean and variance as usability constraints. Taking Nt = 6000 and γ = 1/3, the changes in mean and variance are calculated for different attributes and recorded in Table 6. The results indicate that the error introduced is very low. Hence, the watermark can be safely embedded without significant distortion to the database.

Table 6 Effect of watermarking on mean and variance of data
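A check of this kind can be reproduced in a few lines. The sketch below compares mean and variance before and after embedding on synthetic data. The data, the choice of every third row as earmarked (γ = 1/3) and the least-significant-bit change are assumptions for illustration and do not reproduce the values in Table 6.

```python
import random
from statistics import fmean, pvariance


def embed_lsb(values, bits):
    """Set the LSB of every third value to the next watermark bit."""
    out = list(values)
    for n, i in enumerate(range(0, len(out), 3)):
        out[i] = (out[i] & ~1) | bits[n % len(bits)]
    return out


def usability_change(before, after):
    """Absolute change in mean and in population variance."""
    return (abs(fmean(after) - fmean(before)),
            abs(pvariance(after) - pvariance(before)))


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.randrange(1_000, 100_000) for _ in range(6000)]  # Nt = 6000
    bits = [rng.randint(0, 1) for _ in range(50)]                # Nw = 50
    d_mean, d_var = usability_change(data, embed_lsb(data, bits))
    print(f"change in mean: {d_mean:.4f}, change in variance: {d_var:.4f}")
```

Under these assumptions each earmarked value moves by at most 1, so the change in the mean can never exceed the earmarked fraction, here 1/3.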

8 Conclusion and future work

WORD provides a generic framework for the protection of ORDBs. It is robust when extracting the embedded watermark and fragile when re-creating the watermark from a suspected database. The sequential application of two different watermarks, using the services of a trusted third party, provides ownership proof and tamper detection. The watermark is concealed within an object type attribute using the OID of each row object. This increases the security level, makes the watermarking technique independent of the primary key and provides more bandwidth to accommodate watermark bits, which in turn increases robustness. To deal with security requirements that change over time, the strategy pattern can be adopted while implementing the watermarking model WORD. The use of design patterns incorporates flexibility and scalability in our model.

Prior research mainly focussed on RDBs. We have designed a generic scheme for ownership proof and tamper detection in ORDBs. Future work points towards: (i) extending the scheme from ORDBs to OODBs and other emerging databases, such as NoSQL databases flourishing on the web; and (ii) extending the scheme to address additional security concerns such as fingerprinting and data provenance.

Thus, there is a need to design suitable watermarking techniques targeting emerging web databases to provide protection against various security aspects.