1 Introduction

In the information era, multimedia data plays an important role in our daily life. To ensure trustworthiness, multimedia authentication techniques have emerged to verify content integrity and prevent forgery [4, 21].

Various image hash schemes have been proposed in literatures for image authentication. Swaminathan’s hashing scheme [19] incorporates pseudo randomization into Fourier-Mellin transform to achieve better robustness to geometric operations. However, it suffers from some classical signal processing operations such as noising. Kozat proposed to use low-rank matrix approximations obtained via the well-known singular value decomposition (SVD) for image hashing [10]. While the SVD-based hashing scheme exhibits good geometric attack robustness, it does so at the expense of significantly increasing misclassification. Monga introduced nonnegative matrix factorization (NMF) into their hashing algorithm [15]. The NMF hashing possesses excellent robustness under a large class of perceptually insignificant attacks, while it significantly reduces misclassification for perceptually distinct images. Other image hashing schemes [79, 14, 17] have also contributed to the development of image hashing.

In recent years, a new theory Compressive Sensing (CS) has been proposed as a more efficient sampling scheme. The theoretical framework of CS was developed by Candes et al. [1] and Donoho [5]. The CS principle claims that a sparse signal can be recovered from a small number of random linear measurements. The CS has been used to design secure digital image encryption schemes [23, 24]. In [23, 24], two new hybrid image compression–encryption algorithms based on compressive sensing are proposed. The proposed algorithms with sensitive keys and nice image compression ability can resist various attacks. From the perspective of image content integrity authentication, it is promising to use CS to generate image hash due to its properties of sensitivity, uniqueness, simple calculation, one-way and a small amount of data. In [20], an image authentication scheme based on CS and distributed source coding (DSC) was proposed, where the image hash is derived from the DSC-encoded quantized random projection coefficients of an image. This scheme has the capability of tampering recovery. In [18], an robust image hashing scheme based on CS and Fourier-Mellin transform was proposed. This scheme yields better identification performances under geometric attacks such as rotation attacks and brightness changes. This scheme does not have the capability of tampering recovery. In [22], we proposed a reversible image authentication scheme based on CS, which has a short hash for image authentication and a long hash for tamper localization and recovery.

In some areas, such as medical or military applications, we not only need to authenticate the image using the image hash, but also need to recover the original image from tampering. The state-of-the-art image hash methods do not have the capability of tampering recovery, except the image hashing scheme [20]. However, the CS hashing scheme [20] has not shown strong ability to distinguish content-preserving operations from tampering. In the situation of the tampering introduced in the image does not have a sparse representation in any basis, image hashing scheme [20] has limited tampering recovery performance. In order to solve these two problems, we proposed the LRRCS hashing method.

The state-of-the-art image hash methods do not consider the problem of robust feature extraction from manipulations. Our proposed hashing scheme is based on the robust feature extraction capability of LRR, which can recover subspace structures from corruptions and errors. We use LRR to extract the primary feature of images in the cases of manipulations. Then we use CS to recover primary feature. At last we use LRR to recover the image from tampering. Experiments reveal that our hashing scheme is robust to content preserving modifications and has better image recovery performance compared with existing hashing schemes.

The rest of this paper is organized as follows. We first introduce the theoretical background of Low-Rank Representation and Compressive Sensing in Section 2. We propose the hashing scheme for image authentication and tampering recovery in Section 3. Experimental results are exhibited in Section 4. The conclusion is given in Section 5.

2 Theoretical background

2.1 Low-rank representation

In real applications, our observations are often noisy, or even grossly corrupted, and observations may be missing. In order to recover the low-rank matrix X 0 from the given observation matrix X corrupted by errors E, it is reasonable to consider the following regularized rank minimization problem [1113]:

$$ \begin{array}{l}\underset{Z,E}{ \min }\ {\left\Vert Z\right\Vert}_{*}+\lambda {\left\Vert E\right\Vert}_{2,1},\\ {}s.t.\kern0.5em X=XZ+E,\end{array} $$
(1)

where \( {\left\Vert E\right\Vert}_{2,1}={\displaystyle \sum_{j=1}^n\sqrt{{\displaystyle \sum_{i=1}^n{\left({\left[E\right]}_{ij}\right)}^2}}} \) is called as the l 2,1 -norm, and the parameter λ > 0 is used to balance the effects of the two parts, which could be chosen according to properties of the two norms, or tuned empirically. After obtaining an optimal solution (Z , E ), we could recover the original data by using X − E (or XZ ). In order to solve Problem (1), we convert it to the following equivalent problem:

$$ \begin{array}{l}\underset{Z,E,J}{ \min }\ {\left\Vert J\right\Vert}_{\ast }+\lambda {\left\Vert E\right\Vert}_{2,1,}\\ {}s.t.,\kern0.5em X=XZ+E,\\ {}\kern5.25em \mathrm{Z}=\mathrm{J},\end{array} $$
(2)

which can be solved by solving the following Augmented Lagrange Multiplier (ALM) problem:

$$ \begin{array}{l}\underset{Z,E,J,{Y}_1,{Y}_2}{ \min }\ {\left\Vert J\right\Vert}_{\ast }+\lambda {\left\Vert E\right\Vert}_{2,1}+\\ {}tr\left[{Y}_1^t\left(X-XZ-E\right)\right]+tr\left[{Y}_2^t\left(Z-J\right)\right]+\\ {}\frac{\mu }{2}\left({\left\Vert X-XZ-E\right\Vert}_F^2+{\left\Vert Z-J\right\Vert}_F^2\right),\end{array} $$
(3)

where Y 1 and Y 2 are Lagrange multipliers and μ > 0 is a penalty parameter. The above problem can by solved by inexact ALM algorithms [11]. Its convergence properties could be proved.

2.2 Compressive sensing

Compressive sensing theory asserts that it is possible to perfectly recover a signal from a limited number of incoherent nonadaptive linear measurements, provided that the signal can be represented by a small number of nonzero coefficients in some basis expansion.

Let x ∈ R n denote the signal of interest and y ∈ R m, m < n, be a number of linear random projections (measurements) obtained as y = Φx. The measurement matrix must be chosen in such a way that it satisfies a restricted isometry property (RIP) of order k [2], which says that all subsets of k columns taken from Φ are in fact nearly orthogonal or, equivalently, that linear measurements taken with Φ approximately preserve the Euclidean length of k sparse signals. The entries of Φ ∈ R m × n, the measurement matrix, can be random samples from a given statistical distribution, e.g., Gaussian or Bernoulli. At first, let us assume that x is k sparse, i.e., there are exactly k ≪ n nonzero components. The goal is to reconstruct x given the measurements y and the knowledge that x is sparse. The recent results of compressive sensing have shown that, if x is sufficiently sparse, an approximation of it can be recovered by solving the following minimization problem:

$$ \min {\left\Vert x\right\Vert}_1\kern0.5em \mathrm{s}.\mathrm{t}.\ \mathrm{y}=\varPhi \mathrm{x} $$
(4)

which can be immediately translated to a linear program. The solution of (4) is obtained provided that the number of measurements satisfies m ≥ Ck log(n/k), where C is some small positive constant.

These results also hold when the signal is not sparse, but it has a sparse representation in some orthonormal basis. Let Ψ ∈ R n × n denote an orthonormal matrix, whose columns are the basis vectors. Let us assume that we can write x = Ψθ, where θ is k sparse. Given the measurements y = Φx, the signal can be reconstructed by solving the following problem:

$$ \min {\left\Vert \theta \right\Vert}_1\kern0.5em \mathrm{s}.\mathrm{t}.\ \mathrm{y}=\varPhi \varPsi \theta $$
(5)

For the case of noisy measurements, the signal model can be expressed as y = Φx + z, where the noise amplitude is assumed to be bounded, i.e., ‖z2 ≤ ε. An approximation of the signal can be obtained by solving the following problem:

$$ \min {\left\Vert \theta \right\Vert}_1\kern0.5em \mathrm{s}.\mathrm{t}.\ {\left\Vert \mathrm{y}\hbox{-} \varPhi \varPsi \theta \right\Vert}_2\le \varepsilon $$
(6)

In this work, we adopt the GPSR algorithm [6] to find a solution to (6).

3 Image hashing via low-rank and sparse representation

In this section, we propose an image hashing scheme via Low-Rank and Sparse Representation (LRRCS hashing scheme). It composes of two stages: image hash generation, image authentication and image recovery from tampering.

3.1 Generation of image hash

In the stage of image hash generation, the original image owner generates the image hash and stores it in the image authentication server as follows:

  1. (1)

    Image Preprocessing. Let the original image X undergo pre-processing, including image re-sizing and color space conversion. Since the luminance plane contains most of the geometric and visually significant information, for a color image we only consider the luminance component.

  2. (2)

    Apply Low-Rank Representation (LRR) to image X to obtain the low-rank feature matrix Z and error matrix E. The LRR operation to image X is defined in (1) and can be solved via ALM algorithm [11].

    $$ \begin{array}{l}\underset{Z,E}{ \min }{\left\Vert Z\right\Vert}_{*}+\lambda {\left\Vert E\right\Vert}_{2,1},\\ {}s.t.\kern0.5em X=XZ+E,\end{array} $$
    (1)

    where ‖ ⋅ ‖* denotes the matrix nuclear norm (sum of the singular values of a matrix) [11], which is a convex relaxation of the rank function, the parameter λ > 0 is used to balance the effects of the two parts, and ‖ ⋅ ‖2,1 is the l 2,1 norm defined as the sum of l 2 norms of the column of matrix E. Through LRR operation, the image X is decomposed into two parts: the low-rank feature matrix Z and the error matrix E. The purpose of this step is to take the advantage of LRR operation to obtain robust feature Z to generate image hash, other than directly use the raw image X.

  3. (3)

    Apply Discrete Wavelet Transform to feature matrix Z to get feature vector w. Z = Ψw, w ∈ R n. The feature vector w is sparse and satisfied to CS requirement.

  4. (4)

    Use Compressive Sensing to encrypt and compress the feature vector w. A number of linear random projections y ∈ R m, m < n is produced as

    $$ y=\varPhi w $$
    (7)

    The entries of the matrix Φ ∈ R m × n are sampled from a Gaussian distribution, generated using a random seed S.

  5. (5)

    Post Processing. We quantize the resulting vector y and apply gray coding to obtain the binary hash sequence H(X), which is stored in the authentication server for later on image authentication and tampering recovery.

3.2 Image authentication and image recovery from tampering

Image authentication and Image recovery works as follows:

  1. (1)

    On the received image X′, the image user follows the hash generation steps (1)-(3) in Section 3.1 to obtain the feature vector w′.

  2. (2)

    Use Compressive Sensing to encrypt and compress the feature vector w′. A number of linear random projections y′ ∈ R m, m < n are produced as

    $$ {y}^{\prime }=\varPhi {w}^{\prime } $$
    (8)

    The entries of the matrix Φ ∈ R m × n are sampled from a Gaussian distribution, generated using the same random seed S as the hash generation stage. We quantize the resultant vector y′ to obtain a quantized version \( {\overline{y}}^{\prime } \).

  3. (3)

    The image user requests the hash H(X) to the authentication server. Upon the received H(X), gray decoding is applied and a quantized version \( \overline{y} \) is obtained.

  4. (4)

    Image authentication. An estimate of the distortion in terms of the mean square error (MSE) between the original and the received image is computed by

    $$ MSE\left(X,{X}^{\prime}\right)\approx \frac{1}{m}{{\left\Vert {\overline{y}}^{\prime }-\overline{y}\right\Vert}_2}^2=\frac{1}{m}{{\left\Vert \varPhi \left({w}^{\prime }-w\right)\right\Vert}_2}^2 $$
    (9)

    If the actual distortion between the original and the received image is smaller than the maximum distortion threshold expected by the original image owner, the images are declared to be the same and tampering recovery can be provided. Otherwise, the images are declared to be different.

  5. (5)

    Image recovery from tampering. We use CS to recover the primary feature vector Z, and then use LRR to recover the tampering. With the knowledge of \( {\overline{y}}^{\prime } \) and \( \overline{y} \), the image user can obtain

    $$ \varPhi \left({w}^{\prime }-w\right)={\overline{y}}^{\prime }-\overline{y}=b $$
    (10)

Compressive Sensing has shown that if e′ = w′ − w is sufficiently sparse, an approximation of the tampering e′ = w′ − w can be recovered by solving the following l 1 minimization problem:

$$ \begin{array}{l}e^{\prime }= \min {\left\Vert \left({w}^{\prime }-w\right)\right\Vert}_1\ \\ {}s.t.\ {\left\Vert b-\varPhi \left({w}^{\prime }-w\right)\right\Vert}_2\le \varepsilon \end{array} $$
(11)

If a sparse solution to the problem (11) can be found, the tampering vector e′ = w′ − w is obtained. Then through inverse wavelet transform, we obtain the primary feature vector

$$ Z=\varPsi w=\varPsi \left({w}^{\prime }-{e}^{\prime}\right) $$
(12)

At last, the original image and tampering signal can be recovered through LRR operation defined as

$$ X={X}^{\prime }Z $$
(13)
$$ e={X}^{\prime }-{X}^{\prime }Z $$
(14)

4 Experimental results and discussion

4.1 Robustness of the proposed hashing scheme

To examine the robustness properties, we consider the performance of our proposed hashing scheme to different content preserving manipulations. The manipulations considered are: (a) rotation; (b) cropping; (c) additive noise contamination; (d) Filtering; and (e) JPEG compression. For each image of size 512 × 512, we generate a hash by computing random projections m = 450 and quantize them with a step size Δ = 10. The LRR parameter is chosen as λ = 0.14. We use hamming distance as the performance metric to measure the robustness against content preserving manipulations defined as

$$ HD={\displaystyle \sum_{i=1}^n\left|{h}_i\left({s}_1\right)-{h}_i\left({s}_2\right)\right|} $$
(15)

where H(s i ) = {h 1(s i ), h 2(s i ), …, h n (s i )} means the corresponding hash vector with length n of the image s i .

Figure 1a–e plots the hamming distance between the hash vectors of the standard image and each of the five manipulated images, respectively. We observe that the proposed hashing scheme perform very well for these distortions. We further note that the hamming distance between the hashes of the noisy image and the original image is very small. We observe that, except for some rare cases, the values of hamming distance HD are less than 200. This indicates that the image hashing scheme is robust against rotation, cropping, additive noise contamination, filtering and JPEG compression. This illustrates the advantage of taking LRR to obtain robust feature Z, other than directly use the raw image X′ to generate image hash.

Fig. 1
figure 1

a Hamming distance of the proposed hashing scheme under rotation. b Hamming distance of the proposed hashing scheme under cropping. c Hamming distance of the proposed hashing scheme under noise. d Hamming distance of the proposed hashing scheme under filter. e Hamming distance of the proposed hashing scheme under JPEG compression. ae Hamming distance of proposed hashing scheme under rotation, cropping, noise, filter and JPEG compression

4.2 Comparison of hash performance

A comparison among the proposed method and [14, 15] and [20] is given in Table 1. The NMF-NMF hash method [15] is based on pseudo-randomly selected subimages, which is changed after rotation so that it is not robust against rotation. The method [15] does not have the ability to locate tampering. The SCH hash method [14] is based on feature points, which is changed after rotation so that it is not robust against rotation. The CS hash method [20] is not robust against rotation and cropping. It is remarkably mentioned that CS [20] and the proposed method have the ability to recover tampering.

Table 1 Comparison of hash performance

We evaluate hash performance in distinguishing manipulation operations from tampering. It is based on experiments on 100 images taken from the original image database, 100 images processed with content-preserving operations, and 100 images from the CASIA tampered image detection evaluation database [3]. The receiver operating curve (ROC) is a plot of the probability of false positive P FP versus the probability of false negative P FN as the threshold is varied. The error probabilities are defined as

$$ {P}_{FP}=\frac{Number\ \mathrm{of}\ \mathrm{natural}\ \mathrm{images}\ \mathrm{detected}\ \mathrm{a}\mathrm{s}\ \mathrm{tampered}\ \mathrm{images}}{Tota\operatorname{l}\ \mathrm{number}\ \mathrm{of}\ \mathrm{natural}\ \mathrm{images}} $$
(16)
$$ {P}_{FN}=\frac{Number\ \mathrm{of}\ \mathrm{tampered}\ \mathrm{images}\ \mathrm{detected}\ \mathrm{a}\mathrm{s}\ \mathrm{natural}\ \mathrm{images}}{Tota\operatorname{l}\ \mathrm{number}\ \mathrm{of}\ \mathrm{tampered}\ \mathrm{images}} $$
(17)

Figure 2a–c show the ROC curves of tampering detection under noise, rotation and cropping manipulations. The proposed method has shown stronger ability to distinguish content-preserving operations from tampering than the other three methods [15], [14] and [20]. There is a trade-off between robustness and tamper detection capability. For example, in Fig. 2a the P FP of proposed method is kept at 0.05, while the corresponding P FN is 0.01, which is reasonably low.

Fig. 2
figure 2

a ROC curves of tampering detection under standard deviation of the Gaussian noise σ = 0.5. b ROC curves of tampering detection under rotation of 10° angle. c ROC curves of tampering detection under cropping of 5 percentage of image. ac ROC curves of tampering detection under noise, rotation and cropping manipulations

4.3 Image recovery from tampering

We evaluate the capability of image recovery from tampering. For each image of size 512 × 512, we generates a hash by computing random projections m = 450 and quantizes it with a step size Δ = 10. The LRR parameter is chosen as λ = 0.14.

We adopt the Peak Signal to Noise Ratio (PSNR) to assess the quality of image recovery, which is calculated as

$$ PSNR=10\times { \log}_{10}\left(\frac{255^2\times h\times w}{{\displaystyle \sum_{i=0}^{h-1}{\displaystyle \sum_{j=0}^{w-1}{\left({p}_{i,j}-{q}_{i,j}\right)}^2}}}\right) $$
(18)

where h, w are the height and width of the image signal, p i,j and q i,j are the pixel values of the original image signal and recovered image signal.

Figure 3a shows the image recovery performance under the attack of one logo added to the original Lena image (PSNR = 27.82 dB). Figure 3b shows the image recovered using the scheme of CS hashing [20] (PSNR = 32.85 dB). Figure 3c shows the XZ′ and E′ through Low-Rank Representation operation to the attacked image X′. Figure 3d shows the image recovered using our proposed LRRCS hashing scheme (PSNR = 59.47 dB).

Fig. 3
figure 3

Image recovery performance under the attack of one logo insertion. a Tampered image (27.8 dB). b Image recovered using CS hashing [20] (PSNR = 32.85 dB). c The corrected data (XZ′) and the error (E′) after applying LRR to attacked X′. d Image recovered using proposed hashing scheme (PSNR = 59.47 dB)

Figure 4 shows the image recovery performances under the attacks of several logos are added to the original Lena image. When more than four logos are added to Lena image, the PSNR becomes much smaller for the scheme of CS hashing [20] (PSNR = 22.93 dB). This is because when more logos are added to the image, the sparse of (X′ − X) becomes small which influences the recovery performance of Compressive Sensing. However, our proposed scheme adopts Low-Rank Representation operation which separates the errors caused by tampering to E and makes the sparse of feature vector (Z′ − Z) little changed during attack, which results in better recovery performance. When more than six logos are added to the image, the recovery performance of LRR decreases, so the PSNR of our proposed scheme becomes small (PSNR = 47.35 dB).

Fig. 4
figure 4

Image recovery performance under the attacks of logos insertion.

Figure 5a shows the image recovery performance under the attack of brightness adjustment to the original Pepper image (PSNR = 23.57 dB). Figure 5b shows the image recovered in the Haar domain using the scheme of CS hashing [20] (PSNR = 28.94 dB). Figure 5c shows the XZ′ and E′ through LRR operation to attacked image X′. Figure 5d shows the image recovered using our proposed scheme (PSNR = 48.63 dB).

Fig. 5
figure 5

Image recovery performance under the attack of brightness adjustment. a Tampered image (23.5 dB). b Image recovered using the CS hashing [20] (PSNR = 28.94 dB). c The corrected data (XZ′) and the error (E′) after applying LRR to attacked X′. d Image recovered using proposed hashing scheme (PSNR = 48.63 dB)

Figure 6 shows the image recovery performance under the attacks of cropping the original Lena image. The image recovery performance of our proposed hashing scheme is better than the CS hashing scheme [20]. When the percentage of cropping is less than 10 %, the recover performance in CS hashing scheme [20] is good (PSNR = 30.76 dB). However, when the percentage of cropping is more than 30 %, the attack becomes not sparse enough, which influences the recover performance of CS hashing scheme [20] (PSNR = 20.18 dB). When the percentage of cropping is more than 40 %, the capability of LRR’s error correction decreases, which results in PSNR = 28.51 dB.

Fig. 6
figure 6

The image recovery performance under the attacks of cropping to the Lena image

Figure 7 shows the image recovery performance under the attacks of noise contamination to the original Lena image. The CS hashing scheme [20] has very good performance under noise attacks due to CS’s capability of recovery from noise (PSNR = 30.75 dB when the standard deviation of the Gaussian noise is 2). Our proposed hashing scheme has a little better recover performance due to the error correction capability of LRR (PSNR = 36.58 dB when the standard deviation of the Gaussian noise is 2).

Fig. 7
figure 7

The image recovery performance under the attacks of noise contamination to the Lena image

Figure 8 shows the image recovery performance of our proposed hashing scheme under different JPEG compression factor. It is worth noting that we don not present the CS hashing scheme [20] here, because the JPEG compression introduced in the image does not have a sparse representation in any basis, it can not be recovered by the CS hashing scheme [20].

Fig. 8
figure 8

The image recovery performance under the attacks of JPEG compression

4.4 Computation complexity and security analysis

4.4.1 Computation complexity

We consider average time consumed in calculating image hashes on a desktop computer with Dual Core 2.6-GHz CPU and 2GB RAM, running Matlab10a. The average time of the proposed method is 5.83 s. The major computation load of our proposed method is in LRR operations to extract the primary feature. When considering the cost time of image recovery, the average time of the proposed method is 10.74 s. The average time of the method of [20] is 1.58 s. The average time of the method of [15] and [14] are 1.42 s and 3.96 respectively. Compared with the other three methods, the proposed method needs more time.

4.4.2 Security analysis

Most of the image hashing methods proposed in the literatures, for example [14, 15], use the secret key to randomly select the features. In this paper, we adopt a different approach. Instead of using the secret key to randomly select features, we use the secret key (measurement matrix) to transform the feature space into the Compressive Sensing domain, which increases the entropy of the feature space and increases the security of the hash.

The CS has proven to be computational secure [16]. Without the knowledge of the key, the attacker can not obtain the content of original image. Furthermore, the sensitivity of the secret key on the randomness of the features makes the proposed method have strong security. The little change in the secret key significantly changes the hash. Experiment shows that the hash of Lena image is generated and compared with 100 hashes of the same image generated with 100 randomly generated keys, 98 % of images for different keys are detected as tampered.

We do the anti-collision tests to evaluate the security of the proposed method. If two different images have a hash distance less than a given threshold T, collision occurs. We generate hashes of 100 different images. The threshold T is set to 50. The probability density functions (PDF) of these hash distances are identified as the normal distribution with its mean and standard deviation being μ = 120.9 and σ = 11.7. Then the collision probability is computed as 6.81 × 10− 10.

5 Conclusion

We propose an image hashing scheme based on Low-Rank and Sparse Representation for image authentication and tampering recovery. Low-Rank Representation is applied to the attacked image to obtain image feature matrix and error matrix. Then we use CS to recover the primary feature and furthermore use LRR to recover the image from tampering. Experiments reveal that our proposed hashing scheme is robust to content preserving manipulations and has better image recovery performance compared with existing hashing schemes.