1 Introduction

Nowadays, there are more and more multimedia content sharing networks, e.g., P2P-based file sharing websites, user-generated content (UGC) sharing websites, and video-on-demand or live TV systems. Because of properties such as anonymous sharing and free uploading/consuming, content sharing networks are often filled with diverse content. As a result, copyrighted content (images, videos, music, Flash animations, etc.) is often found illegally spread over the Internet. This happens when a customer illegally redistributes received multimedia content, e.g., to other customers who have no permission to consume it, or to public networks. Thus, it is urgent to detect content with copyright issues and even to identify the content's illegal distributor [25].

So far, various techniques can be used to detect or protect the copyright of multimedia content, e.g., digital watermarking, digital fingerprinting, and copy detection. The watermarking technique [13] protects the ownership of multimedia content by embedding ownership information (e.g., the producer's name or ID) into the content through slight modifications. Later, the ownership information can be extracted and used for authentication. Generally, invisible watermarking, which embeds the ownership information imperceptibly, is used for ownership protection. Digital fingerprinting [39] is used to detect illegal redistributors. It embeds different information, such as a customer ID, into the multimedia content, produces a unique copy, and sends the copy to the corresponding customer. If a copy is spread to unauthorized customers, the unique information in the copy can be detected and used to trace the illegal redistributor. Recently, content-based copy detection (CBCD) [18] has been proposed as an alternative means of identifying illegal media copies. Given an image registered by its owner, the system can determine whether near-replicas of the image are available on the Internet or through an unauthorized third party. If an image is found to be registered (i.e., it belongs to a content owner) but the user does not have the right to use it, the image is deemed an illegal copy.

As can be seen, the existing techniques, i.e., digital watermarking, digital fingerprinting, and copy detection, realize different functionalities. To realize copyright authentication (detecting illegal media content and identifying the illegal distributor), these techniques should be combined [23].

Some existing schemes consider both copyright detection and media indexing. For example, the work in [6] extracts features from the key frames of a video and uses them as a key to embed the watermark. However, the features are used only for security, not for robust watermark embedding or extraction. The work in [9] presents image indexing and watermarking methods in the wavelet domain, but treats them separately: there is no relation between the watermarking and indexing operations.

Because images are often subjected to compression, rotation, shearing, scaling, translation, etc., detecting the watermark in processed images is difficult. To implement robust media copyright protection, the features of the media content, which can be obtained from the feature database used for content indexing, may be exploited. So far, no work has combined copyright protection with content indexing.

This paper proposes a copyright authentication scheme based on both digital watermarking/fingerprinting and copy detection/media indexing. The media content is marked before distribution, and a robust feature is extracted from the marked content and registered in a feature database. To detect whether an image has been released over the Internet, the robust feature is computed from candidate images and compared with the registered ones. Additionally, the feature is used to emend (correct) the image in order to improve the watermark/fingerprint detection rate. Here, the emendation operations can recover the image from geometric attacks such as shifting, rotation, and resizing. The main innovation of this paper is an algorithm that combines digital watermarking/fingerprinting with copy detection/media indexing.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 presents the architecture of the proposed copyright authentication system and the detailed algorithms. Section 4 presents in detail an example based on feature points and additive watermarking. Section 5 gives the experimental results and analysis. Finally, Section 6 draws conclusions.

2 Related work

2.1 Digital watermarking

A good watermarking algorithm satisfies performance metrics such as imperceptibility, robustness, capacity, security, and oblivious (blind) detection. During the past decades, many watermarking algorithms have been reported, and they can be classified in different ways. According to the embedding domain, a watermark can be embedded in the temporal domain, the spatial domain, or the frequency domain. Taking video watermarking as an example, the watermark can be embedded in the frame pixels, the motion vectors, or the DCT coefficients, each yielding different performance. Spatial domain watermarking embeds information in pixels directly, e.g., the LSB method [33] and perceptual model based methods [30]. These methods are computationally efficient but are generally not robust to signal processing or attacks. Frequency domain watermarking embeds information in a transform domain, such as the DCT [24] or the wavelet transform [35]. Compared with spatial domain watermarking, frequency domain watermarking offers better robustness and imperceptibility. Additionally, the embedding can be done during compression, which makes it compatible with international data compression standards. Temporal domain watermarking embeds information in temporal properties. For example, in audio, echoes are used to hide information, which is known as echo hiding [15]. In video, the temporal sequence is partitioned into a static component and a motion component, and the information is embedded into the motion component [19]. Because human eyes are more sensitive to the static component than to the motion component, embedding in the motion component can often achieve higher robustness. However, error accumulation or drift blurs the watermarked video to some extent, which should be mitigated by error compensation.

Additionally, synchronization algorithms have been proposed to improve robustness against various operations. Typical ones are based on robust features. For example, corners can be used to identify an object in a robust manner [1]: the method first detects the corners, then forms a shape matrix through a neighborhood operation on the detected corners, and uses the shape matrix to match the expected objects (such as a house). Another method uses robust regions to realize synchronization [36]. It extracts disks in the image as features, which are stored, and then embeds the watermark in the image's FFT domain. The stored disks are later used to restore a rotated image.

2.2 Digital fingerprinting

The most serious threat to watermarking-based fingerprinting is the collusion attack, which fabricates a new copy by combining several copies in order to evade tracing. Generally, five kinds of collusion attacks are considered, i.e., the averaging attack, the linear combinatorial collusion attack (LCCA) [37], the min-max attack, the negative-correlation attack, and the zero-correlation attack. Over the past decade, finding new solutions that resist collusion attacks has attracted more and more researchers. Existing fingerprinting algorithms can be classified into three categories, i.e., orthogonal fingerprinting, coded fingerprinting, and warping-based fingerprinting. In orthogonal fingerprinting [22, 34], the unique information (also called the fingerprint) embedded for different customers consists of mutually independent, approximately orthogonal vectors. For example, each fingerprint can be a pseudorandom sequence, with different customers assigned different sequences. Owing to this orthogonality, orthogonal fingerprints can resist most of the proposed collusion attacks. Fingerprints can also be carefully designed as codewords, which is called coded fingerprinting [7] and can detect the colluders partially or completely. Two encoding methods are frequently cited, i.e., the Boneh-Shaw scheme [7] and combinatorial design based codes [34]. Compared with orthogonal fingerprinting, coded fingerprinting has some advantages. Firstly, the embedding method is not limited to additive embedding; other existing embedding methods can also be used. Secondly, the correct detection rate does not depend on the number of colluders. However, coded fingerprinting is less robust against LCCA. In warping-based fingerprinting [26], the multimedia content (e.g., an image or video) is imperceptibly desynchronized with geometric operations in order to make each copy different from the others. This kind of fingerprinting aims to make collusion impractical while preserving imperceptibility. However, in this scheme, the compression ratio is often changed because of the pre-warping operations. Additionally, it is challenging to support a large number of customers while warping the content imperceptibly.
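To make the orthogonal-fingerprinting idea and the averaging attack concrete, the following Python sketch is an illustration only, not part of the proposed scheme. It assumes independent pseudorandom ±1 sequences as approximately orthogonal fingerprints and non-blind detection in which the host signal has already been subtracted; the function names `make_fingerprints`, `average_collusion`, and `accuse` and the threshold value are hypothetical.

```python
import random
from typing import List


def make_fingerprints(users: int, length: int, seed: int) -> List[List[float]]:
    """One independent pseudorandom +/-1 sequence per user; for long sequences
    their normalized cross-correlations are close to zero (near-orthogonal)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(length)] for _ in range(users)]


def average_collusion(copies: List[List[float]]) -> List[float]:
    """Averaging attack: the colluders' fingerprinted signals are averaged element-wise."""
    return [sum(values) / len(copies) for values in zip(*copies)]


def accuse(signal: List[float], fingerprints: List[List[float]], threshold: float) -> List[int]:
    """Correlation detector: flag every user whose normalized correlation with the
    (host-subtracted) signal exceeds the threshold."""
    length = len(signal)
    return [user for user, fp in enumerate(fingerprints)
            if sum(s * x for s, x in zip(signal, fp)) / length > threshold]
```

Under averaging, each of c colluders retains only about 1/c of the normalized correlation, which is why detection of orthogonal fingerprints weakens as the number of colluders grows.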

2.3 Copy detection

An image copy detector searches for all copies of a query image, whereas content-based image retrieval (CBIR) [2] searches for similar images. Thus, it is usually not feasible to apply existing CBIR techniques to CBCD, because they may cause a considerable number of false alarms. For CBCD, the key challenge is to extract suitable features that achieve a good tradeoff between discriminability and robustness. Discriminability denotes the ability to distinguish different media contents. Robustness refers to the ability to survive operations such as cropping, noising, contrast changing, zooming, and insertion. Generally, the extracted features are compared with the registered ones, and the distance between them indicates whether the content is a copy.

According to how features are extracted, CBCD algorithms can be classified into two types: global feature-based algorithms and local feature-based algorithms. For example, the algorithm in [8] extracts global features from wavelet transform coefficients and color space, the one in [20] extracts the ordinal measure of DCT coefficients from the whole image, the one in [38] uses an elliptical track division strategy to extract features from all the elliptical track blocks, and the one in [40] uses a sliding window to extract each block's relationship with its neighboring blocks. These global feature-based algorithms often achieve good discriminability but poor robustness; for example, they are not robust to block cropping. In contrast, local feature-based algorithms are more robust. For example, the algorithm in [5] computes many descriptors for each image, each corresponding to one image block, and the algorithm in [17] extracts key points from each image part. They can still identify the content even when it has been heavily tampered with (e.g., cropped or modified). Their disadvantage is high computational complexity, and a research challenge is how to determine the block size.

2.4 Drawbacks of prior arts

Digital watermarking/fingerprinting is often used in copyright identification: it embeds copyright information into media content imperceptibly and uses it to authenticate the copyright. For example, in the watermarking-based image protection method [32], ownership information is embedded into images before they are distributed. By detecting whether the watermark exists in an image, the copyright of images spreading over the Internet can be authenticated. However, because images are often subjected to compression, rotation, shearing, scaling, translation, etc., detecting the watermark in processed images is difficult. To implement robust media copyright protection, some extra information should be used to resist these operations. Copy detection or indexing aims to find content matching the target. For example, the content index method [31] provides image indexing based on image feature extraction and matching: features are extracted from an image and compared with those stored in a database, and one or a group of matching images is found. Generally, copy detection/indexing is unrelated to digital watermarking/fingerprinting. In fact, the features extracted for copy detection/indexing can also help detect the watermark/fingerprint. In this paper, we propose a scheme combining them, give an example based on corner features and additive watermarking, and present experiments showing its practicability.

3 Architecture of the proposed content distribution and copyright authentication scheme

The content distribution and copyright authentication system, shown in Fig. 1, consists of several steps. Firstly, the content provider/producer embeds the watermarking/fingerprinting information W (the content provider's information or a customer ID) into the media content P before distributing it, producing the marked media content C. Secondly, the robust feature F is extracted from C and registered in the feature database. Thirdly, the content provider distributes C to customers. Fourthly, suspicious media content on the Internet is searched for by copy detection or media indexing. Fifthly, for each suspicious media content C′, the copyright information is detected by watermark extraction and authentication.

Table 1 Data structure of the feature database

The system involves five key algorithms, i.e., media watermark embedding, robust feature extraction and registering, media distribution, copy detection/media index, and watermark extraction and authentication. They are described in detail below.

3.1 Media watermark embedding

The watermark W is embedded into the media P under the control of a key K, producing the watermarked media C. Here, W represents the ownership information or a customer ID. Any existing watermarking algorithm [11, 12] that is robust to general attacks [10] may be used. Such algorithms embed watermarks into the spatial or frequency domain of images, videos, or audio, and are typically robust to general attacks such as noise addition, compression, A/D or D/A conversion, and filtering. The key K controls the embedding positions or parameters.

3.2 Robust feature extraction and registering

The feature F is extracted from the watermarked media C and stored in the feature database D of size M. The data structure of the feature database is shown in Table 1. The feature F should satisfy two requirements. Firstly, F is robust to watermarking attacks [4], such as general attacks, rotation, shearing, translation, and scaling. Secondly, F can be used for media indexing [27, 42] and is much smaller in volume than the media itself. Features such as corner points, boundaries, edges, and histograms [27, 42] can be used.

3.3 Media distribution

The watermarked media is distributed to customers by means such as broadcasting, multicasting, or unicasting. Some customers may redistribute the received media over the Internet directly, or after operations such as recompression, scaling, or translation. This media, denoted C′, then spreads freely from one person to another.

3.4 Copy detection/Media index

Copy detection/media indexing finds media content C′ similar to the content registered in the feature database D. For each image, a robust feature F′ is extracted and matched against the feature database D. The matching result gives the n (1 ≤ n < M) best-matching features \( (F_0, F_1, \ldots, F_{n-1}) \), and the n media contents corresponding to these features are the matching results.

3.5 Watermark extraction and authentication

To authenticate media found on the Internet, the watermark extraction and authentication process shown in Fig. 2 is followed. It consists of two sub-steps.

Fig. 1 Architecture of the proposed media copyright authentication system

Fig. 2 Watermark extraction and authentication process

Firstly, a watermark W′ is extracted from the suspicious media C′ and compared with the original watermark W. Here, the watermark extraction method is the counterpart of the watermark embedding method [11, 12]. If |W′−W| < T (T is an extraction threshold determined beforehand), the watermark exists, the media is copyright protected, and the authentication process is finished. Otherwise, the process continues to the second step.

Secondly, a robust feature F′ is extracted from the media C′ and matched against the feature database D. The matching result gives the n (1 ≤ n < M) best-matching features \( (F_0, F_1, \ldots, F_{n-1}) \). Matching is based on comparing \( |F' - F_i| \) (i = 0, 1, …, n−1) with a threshold \( T_F \) determined beforehand: if \( |F' - F_i| < T_F \), the feature \( F_i \) is a matched one; otherwise, it is not. The n media contents corresponding to the n features are the index results. For each pair \( (F', F_i) \) (i = 0, 1, …, n−1), the change caused by attack operations is computed, e.g., by corner matching [28], and is then used to emend the media from C′ to C″. For example, if \( F_i \) differs from F′ by a rotation angle, C′ is inversely rotated by the same angle. From the emended media C″, a watermark W″ is extracted and compared with the original watermark W. If |W″−W| < T, the watermark exists, the media is copyright protected, and the authentication process is finished. Otherwise, the next pair is tried until i = n or the watermark is detected.
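The two sub-steps can be summarized as a control flow. The following Python sketch is a minimal, hedged rendering of that flow, assuming the scheme-specific operations (watermark test, feature extraction, feature ranking, change estimation, emendation) are supplied as callables; `authenticate` and all of its parameter names are hypothetical placeholders rather than an interface defined in this paper.

```python
from typing import Any, Callable, List, Optional, Tuple


def authenticate(
    suspect: Any,
    watermark_present: Callable[[Any], bool],    # True if |W' - W| < T
    extract_feature: Callable[[Any], Any],       # C' -> F'
    best_matches: Callable[[Any], List[Any]],    # F' -> [F_0, ..., F_{n-1}]
    estimate_change: Callable[[Any, Any], Any],  # (F_i, F') -> change, e.g. rotation
    emend: Callable[[Any, Any], Any],            # (C', change) -> C''
) -> Tuple[bool, Optional[int]]:
    """Return (True, None) if the watermark is found in C' directly,
    (True, i) if it is found after emending with the i-th matched feature,
    and (False, None) if no matched feature reveals it."""
    # Sub-step 1: direct watermark extraction from the suspicious media.
    if watermark_present(suspect):
        return True, None
    # Sub-step 2: index the feature database, then emend and re-test per match.
    f_prime = extract_feature(suspect)
    for i, f_i in enumerate(best_matches(f_prime)):
        change = estimate_change(f_i, f_prime)
        emended = emend(suspect, change)
        if watermark_present(emended):
            return True, i
    return False, None
```

Passing the operations in as callables keeps the control flow independent of the particular watermark and feature, mirroring the generic architecture of this section.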

4 The example based on feature points and additive watermarking

Using corner points as the feature, an image copyright authentication method is shown in Figs. 3 and 4: Fig. 3 shows the watermark embedding and feature extraction process, and Fig. 4 shows the watermark extraction and authentication process.

Fig. 3 Watermark embedding and feature extraction

Fig. 4 Watermark extraction and authentication process

4.1 Watermark embedding and feature extraction

Firstly, the original image P, the watermark W, and the key K are initialized. Here, \( W = [w_0, w_1, \ldots, w_{m-1}] \) is a Gaussian sequence.

Secondly, W is embedded into the wavelet domain of P under the control of K, producing the watermarked media C. Here, K controls the permutation of W; a permutation method based on pseudorandom numbers or a chaotic map can be used. During embedding, P is first decomposed by the wavelet transform into coefficients in different subbands. For example, if P is decomposed into 4 resolution levels, the resulting subbands are denoted by \( I^{r,s} \), where r ∈ {0, 1, 2, 3} is the resolution level and s ∈ {LL, LH, HL, HH} is the orientation. Then, the watermark is embedded into the subbands according to

$$ {I_i}^{\prime r,s}\left( {i = 0,1, \cdots, m - 1} \right) = \left\{ {\begin{array}{*{20}{c}} {{I_i}^{r,s}\left( {1 + \alpha {w_i}} \right),} \hfill & {s \ne LL} \hfill \\{{I_i}^{r,s},} \hfill & {otherwise} \hfill \\\end{array} } \right.. $$
(1)

To maintain robustness, r ∈ {2, 3} is preferred. The embedding strength α (0 < α ≤ 1, with \( 1 + \alpha w_i > 0 \)) can be computed from a human visual system (HVS) model [21] or set as a constant. After embedding, the inverse wavelet transform is applied to the subbands, producing the watermarked image C.
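As a rough illustration of the first two steps, the Python sketch below generates a Gaussian watermark, permutes it with a key, and applies Eq. (1) to the coefficients of a single subband. Several choices here are assumptions: a shuffle seeded by an integer key stands in for the pseudorandom or chaotic-map permutation, rejection sampling is used to enforce \( 1 + \alpha w_i > 0 \), and the wavelet decomposition itself is omitted (the subband is given as a flat list). The helper names are hypothetical.

```python
import random
from typing import List


def gaussian_watermark(m: int, seed: int, alpha: float) -> List[float]:
    """Draw W = [w_0, ..., w_{m-1}] from N(0, 1), redrawing any sample that would
    violate the constraint 1 + alpha * w_i > 0 (assumed rejection sampling)."""
    rng = random.Random(seed)
    w: List[float] = []
    while len(w) < m:
        x = rng.gauss(0.0, 1.0)
        if 1.0 + alpha * x > 0.0:
            w.append(x)
    return w


def permute(w: List[float], key: int) -> List[float]:
    """Key-controlled permutation of W (assumed: a shuffle seeded by the integer key K)."""
    order = list(range(len(w)))
    random.Random(key).shuffle(order)
    return [w[i] for i in order]


def embed(coeffs: List[float], subband: str, w: List[float], alpha: float) -> List[float]:
    """Apply Eq. (1) to the first len(w) coefficients of one subband I^{r,s}."""
    if subband == "LL":
        return list(coeffs)  # approximation subband is left unchanged ("otherwise" case)
    if len(coeffs) < len(w):
        raise ValueError("subband has fewer coefficients than the watermark")
    marked = [c * (1.0 + alpha * wi) for c, wi in zip(coeffs, w)]
    return marked + list(coeffs[len(w):])
```

Because each detail coefficient is scaled by a positive factor, its sign is preserved and only its magnitude carries the watermark, which is what the correlation of Eq. (2) later measures.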

Thirdly, the corners \( F = [f_0, f_1, \ldots, f_{R-1}] \) are extracted from C with the method proposed in [16]. Here, \( f_i = (x_i, y_i) \) (i = 0, 1, …, R−1) are the coordinates of a corner point. Corner points are generally robust to geometric operations such as rotation, shearing, scaling, and translation. Because the number of extracted corner points generally differs from image to image, only R points are selected as the feature for consistency. The selection can be random.

Fourthly, the watermarked image C is registered; that is, the corners F and the corresponding image name are stored in the feature database D. The data structure of the database is shown in Table 2.

Table 2 Structure of the corner point based feature database

Fifthly, the watermarked and registered image C is distributed to customers.

4.2 Watermark extraction and authentication

Firstly, the received image C′, the feature database D, the watermark W, the key K, and the feature matching threshold \( T_F \) are initialized. Here, C′ is a processed copy of C, which may have been attacked by operations such as noise addition, compression, rotation, shearing, scaling, and translation. T is the threshold for deciding whether the watermark exists, and \( T_F \) is the threshold for deciding whether features match.

Secondly, the watermark W is permuted under the control of K, the subbands \( {I^{\prime r,s}} \) of C′ are obtained by the wavelet transform, and the correlation value

$$ \rho = \frac{1}{m}\sum\limits_{i = 1}^m {\left| {I_i^{\prime r,s}} \right|} {w_i} $$
(2)

and the threshold

$$ T = \frac{6}{N}\sqrt {{\sum\limits_{i = 1}^N {{{\left( {I_i^{\prime r,s}} \right)}^2}} }} $$
(3)

are computed. If ρ≥T, then the watermark exists, and the authentication process is finished. Otherwise, continue to the following steps.
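Eqs. (2) and (3) amount to a short computation. A minimal Python sketch follows; it works on one subband supplied as a flat list of coefficients (the wavelet transform of C′ is not shown), assumes the seeded-shuffle permutation used in the embedding sketch, and assumes that N in Eq. (3) equals the number m of marked coefficients. Under that assumption, for unmarked coefficients and a unit-variance watermark, the standard deviation of ρ is \( \frac{1}{m}\sqrt{\sum_i (I_i'^{r,s})^2} \), so T sits six standard deviations above zero. Function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def permute(w: List[float], key: int) -> List[float]:
    """Reproduce the key-controlled permutation of W used at embedding time
    (assumed here to be a shuffle seeded by the integer key K)."""
    order = list(range(len(w)))
    random.Random(key).shuffle(order)
    return [w[i] for i in order]


def correlation_and_threshold(coeffs: List[float], w: List[float]) -> Tuple[float, float]:
    """Eq. (2): rho = (1/m) * sum(|I'_i| * w_i) over the m marked coefficients.
    Eq. (3): T = (6/N) * sqrt(sum(I'_i ** 2)), assuming N = m."""
    m = len(w)
    if len(coeffs) < m:
        raise ValueError("subband has fewer coefficients than the watermark")
    used = coeffs[:m]
    rho = sum(abs(c) * wi for c, wi in zip(used, w)) / m
    threshold = 6.0 / m * math.sqrt(sum(c * c for c in used))
    return rho, threshold


def watermark_present(coeffs: List[float], w: List[float]) -> bool:
    """Decision rule of the second step: the watermark exists if rho >= T."""
    rho, threshold = correlation_and_threshold(coeffs, w)
    return rho >= threshold
```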

Thirdly, a feature \( F' = [f'_0, f'_1, \ldots, f'_{R-1}] \) composed of R corner points is extracted from the received image C′. The n best-matching features \( SF = \{F_0, F_1, \ldots, F_{n-1}\} \) are obtained by computing the distance \( Dist(F_i, F') \) between F′ and each registered feature and comparing it with the threshold \( T_F \). Here,

$$ Dist\left( {{F_i},F\prime } \right) = \frac{1}{R}\sum\limits_{j = 0}^{R - 1} {\sqrt {{{{\left( {{x_{ij}} - x_j^\prime } \right)}^2} + {{\left( {{y_{ij}} - y_j^\prime } \right)}^2}}} } . $$
(4)

If \( Dist\left( {{F_i},F\prime } \right) \leqslant {T_F} \), then the feature Fi is the matched feature. Otherwise, Fi is not the matched one. The n images corresponding to the n features in SF are the indexed images.
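Eq. (4) and the threshold test can be sketched in Python as follows. The sketch assumes, as Eq. (4) implicitly does, that the j-th corner of F′ corresponds to the j-th corner of each registered \( F_i \); the dictionary-based database (image name mapped to its corner list, mirroring Table 2) and the function names are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def dist(f_i: Sequence[Point], f_prime: Sequence[Point]) -> float:
    """Eq. (4): mean Euclidean distance between index-aligned corner points."""
    if len(f_i) != len(f_prime):
        raise ValueError("features must contain the same number R of corners")
    total = sum(math.hypot(x - xp, y - yp) for (x, y), (xp, yp) in zip(f_i, f_prime))
    return total / len(f_i)


def best_matches(db: Dict[str, List[Point]], f_prime: Sequence[Point],
                 t_f: float, n: int) -> List[Tuple[str, float]]:
    """Return up to n registered images with Dist <= T_F, nearest first."""
    scored = [(name, dist(f, f_prime)) for name, f in db.items()]
    matched = [item for item in scored if item[1] <= t_f]
    matched.sort(key=lambda item: item[1])
    return matched[:n]
```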

Fourthly, set k = 0 and perform the following operations:

  i) If k = n, there is no watermark in the image, the image is deemed not copyright protected, and the authentication process is finished. Otherwise, continue.

  ii) Match the two features \( (F_k, F') \) by computing the parameters (a, b, c, d, e, f) of the affine transformation according to

    $$ \left( {\begin{array}{*{20}{c}} {x\prime } \\{y\prime } \\\end{array} } \right) = \left( {\begin{array}{*{20}{c}} a & b & c \\d & e & f \\\end{array} } \right)\left( {\begin{array}{*{20}{c}} x \\y \\1 \\\end{array} } \right) = H\left( {\begin{array}{*{20}{c}} x \\y \\1 \\\end{array} } \right). $$
    (5)

    Here, (x′, y′) and (x, y) are the coordinates of corresponding corner points in F′ and \( F_k \), respectively. With at least three point pairs, the parameters in H can be computed, e.g., by the least squares method [41] or by coarse matching followed by the random sample consensus (RANSAC) method [14].

  iii) Use the computed parameters (a, b, c, d, e, f) to emend the image from C′ to \( C'_k \). That is, each pixel position (x′, y′) in C′ is mapped according to

    $$ \left\{ {\begin{array}{*{20}{c}} {x{{_k^\prime }} = \frac{{ex\prime - by\prime - ec + bf}}{{ea - bd}}} \hfill \\{y_k^\prime = \frac{{dx\prime - ay\prime - cd + af}}{{bd - ae}}} \hfill \\\end{array} } \right.. $$
    (6)

    Here, \( (x'_k, y'_k) \) is the pixel position in \( C'_k \).

  iv) From the emended image \( C'_k \), the watermark correlation ρ and the watermark threshold T are computed with the method used in the second step. If ρ ≥ T, the watermark exists (the image is copyright protected), and the authentication process is finished. Otherwise, continue.

  v) Set k = k + 1 and go to i).
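Steps ii) and iii) can be sketched as follows: a least-squares fit of the six affine parameters in Eq. (5) from matched corner pairs (one of the options mentioned above; RANSAC is not shown), followed by the inverse mapping of Eq. (6). This is a minimal pure-Python sketch assuming the corner correspondences are already known; resampling pixel values at the emended positions (interpolation) is omitted, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Params = Tuple[float, float, float, float, float, float]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration (collinear points?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def estimate_affine(src: Sequence[Point], dst: Sequence[Point]) -> Params:
    """Least-squares fit of Eq. (5): x' = a x + b y + c, y' = d x + e y + f,
    where src holds corners (x, y) of F_k and dst the matched corners (x', y') of F'."""
    if len(src) < 3 or len(src) != len(dst):
        raise ValueError("need at least three matched point pairs")
    # Normal equations (A^T A) p = A^T t, one per output coordinate; rows of A are [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (xp, yp) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atx[i] += row[i] * xp
            aty[i] += row[i] * yp
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    a, b, c = _solve3(ata, atx)
    d, e, f = _solve3(ata, aty)
    return a, b, c, d, e, f


def emend_point(params: Params, xp: float, yp: float) -> Point:
    """Eq. (6): map a pixel position (x', y') of C' through the inverse affine map."""
    a, b, c, d, e, f = params
    return ((e * xp - b * yp - e * c + b * f) / (e * a - b * d),
            (d * xp - a * yp - c * d + a * f) / (b * d - a * e))
```

With exact correspondences the fit recovers H up to rounding; with noisy corners, least squares averages the error over all pairs, whereas RANSAC would additionally reject mismatched pairs.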

5 Experiments and performance analysis

The experiments are conducted to show the proposed scheme's authentication performance. An image library of 2000 natural images is used to simulate images on the Internet. Among them, 30 images are registered in the feature database, and 270 images are processed versions of these 30 images. The operations include rotation, scaling, cropping, shearing, resizing, filtering, noising, etc., as shown in Table 3. Here, m = 100, R = 80, TF = 20, n = 10, and 500 images are tested, including Baboon, Plane, Lena, Elaine, and Barbara. Fig. 5 shows the corner detection results for the original and processed images; as can be seen, the corners of these images are highly similar. Fig. 6 shows the result of corner point matching, from which the rotation direction and angle can be estimated. Fig. 7 shows the result of watermark detection: a high correlation value is detected when the watermark exists in the processed media content.

Table 3 Robustness test under the condition of various operations

Fig. 5 Corner detection in the original and operated images (from left to right, from top to bottom: original, cropping, rotation, scaling, shearing, translation)

Fig. 6 Corner point matching of the rotated image

Fig. 7 Watermark detection with correlation and threshold

Table 3 shows the robustness test under various operations. As can be seen, the scheme survives most desynchronization attacks based on global geometric transforms, as well as general signal processing attacks. For example, rotation or rotation-scaling can be detected if the rotation angle is no larger than 10°. However, after Stirmark attacks [29], the watermark is difficult to detect, because the combined operations in Stirmark make both corner detection and watermark detection harder.

Table 4 shows the detection rates of both copy detection/media indexing and watermark detection. Here, the detection rate is the ratio of the number of correctly detected images to the number of tested images. Keeping other parameters unchanged, TF = 10, 20, and 30 are tested. As Table 4 shows, the smaller TF is, the more likely processed image copies are missed; the larger TF is, the more likely unrelated images are detected by mistake. Thus, there is a tradeoff for copy detection/media indexing, and TF = 20 gives a suitable tradeoff. Additionally, for watermark detection, the feature point based emendation greatly increases the detection rate regardless of the value of TF. Furthermore, the larger TF is, the more matched images are detected, and the higher the probability of watermark detection.

Table 4 Detection rate of different algorithms

In this scheme, copy detection/media indexing is introduced to select suspicious media copies, and watermark detection or image emendation is then applied. To evaluate the computational cost of the proposed scheme, we compare it with the watermarking-only scheme (without copy detection and emendation) and the watermarking-based scheme (without emendation). Let TW be the cost of watermark detection in one image, TM the cost of matching the features of two images, TE the cost of emending one image, and T1, T2, T3 the costs of authenticating n images with the watermarking-only scheme, the watermarking-based scheme, and the proposed scheme, respectively. Then, the costs satisfy

$$ \left\{ {\begin{array}{*{20}{c}} {{T_1} = n{T_W}} \hfill \\{{T_2} = mn{T_M} + {n_1}{T_W}{\kern 1pt} {\kern 1pt} \left( {n \geqslant {n_1}} \right)} \hfill \\{{T_3} = mn{T_M} + {n_1}{T_W} + {n_2}\left( {{T_E} + {T_W}} \right)\left( {n \geqslant {n_1} \geqslant {n_2}} \right)} \hfill \\\end{array} } \right. $$
(7)

where m is the size of the registered image database, n is the total number of images to be authenticated, n1 is the number of suspicious images detected by feature matching, and n2 is the number of suspicious images that are not authenticated by direct watermark detection. Experiments are conducted to measure the time cost, with m = 100, R = 80, TF = 20, n = 10, and various numbers of test images. The schemes are implemented in C and run on a computer with a 1.20 GHz CPU and 1.49 GB RAM. The results (in seconds) are shown in Table 5. As can be seen, the proposed scheme costs the most time and the watermarking-only scheme the least, because feature matching and image emendation generally cost more time than watermark detection. To reduce the time cost, typical techniques can be adopted, e.g., the LSH-based search method [3], which will be investigated in future work.
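As a worked illustration of Eq. (7), the small Python function below evaluates the three cost models; `authentication_costs` is a hypothetical name, and the numbers in the check are made-up placeholders, not measurements from Table 5.

```python
from typing import Tuple


def authentication_costs(m: int, n: int, n1: int, n2: int,
                         t_w: float, t_m: float, t_e: float) -> Tuple[float, float, float]:
    """Evaluate Eq. (7): costs of the watermarking-only (T1), watermarking-based (T2),
    and proposed (T3) schemes for authenticating n images."""
    if not n >= n1 >= n2 >= 0:
        raise ValueError("Eq. (7) assumes n >= n1 >= n2")
    t1 = n * t_w                       # watermark detection on every image
    t2 = m * n * t_m + n1 * t_w        # feature matching, then detection on suspects
    t3 = t2 + n2 * (t_e + t_w)         # plus emendation and re-detection
    return t1, t2, t3
```

Since \( T_3 - T_2 = n_2(T_E + T_W) \ge 0 \), the proposed scheme is never cheaper than the watermarking-based one, and \( T_3 < T_1 \) holds only if \( mnT_M + n_2(T_E + T_W) < (n - n_1)T_W \), which is consistent with the ordering observed in Table 5.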

Table 5 Time cost of different copyright authentication schemes

6 Conclusions

This paper presents a content distribution and copyright authentication scheme based on both copy detection/media indexing and digital watermarking. Firstly, the architecture of the scheme is presented. Secondly, each algorithm in the architecture is described in detail. Thirdly, an example based on feature points and additive watermarking is presented, together with experiments and analysis. The scheme uses copy detection/media indexing to obtain matching media contents and digital watermarking to detect the copyright information. Furthermore, the features extracted for copy detection/media indexing are also used to emend the media contents. Experimental results show that the emendation operation greatly improves the watermark detection rate. Although the example is based on images, the scheme can also be applied to video or audio content. Because feature point extraction in the example is computationally expensive, better features may be investigated to reduce the computational cost. Additionally, features robust against attacks such as Stirmark or combined attacks need to be studied. Furthermore, to identify illegal distributors, the detection of fingerprinting codes under collusion attacks will be considered in future work.