
1 Introduction

With the rapid development of the industrial Internet and smart cities, more and more wireless sensors are used to collect and exchange multimedia and video data. In many industries, the transmission and exchange of video data are unavoidable, and many methods for processing multimedia and video data have been proposed [1,2,3]. However, the rich temporal redundancy of video data is easily exploited by forgers, which exposes digital video to the threat of forgery. In wireless sensor networks in particular, limited bandwidth forces nodes to discard some of the collected video frames during transmission, and frame rate up-conversion (FRUC) [4] is then applied at the receiver to restore the video. To safeguard users' right to know whether the data are intact and to prevent inappropriate post-processing, a digital forensics method is needed to identify FRUC forgery [5].

Frame replication (FR) is the simplest FRUC method and, because of its simplicity and ease of use, is used by some commercial video editing software (e.g., ImTOO, Video Edit Magic). FR forgery leaves traces that can be found by analyzing the similarity between adjacent frames. For example, Bian et al. [6] quantify inter-frame similarity with the structural similarity (SSIM) index [7], and Yang et al. [8] extract a feature from each frame and measure inter-frame similarity as the Euclidean distance between features. Periodicity of the similarity index is strong evidence of FR forgery. Because FR ignores the motion between frames, the resulting high-frame-rate video often exhibits flicker and jerkiness. Forgers therefore favor a more advanced method, motion-compensated FRUC (MC-FRUC) [9], which interpolates pixels along the motion trajectory and produces smooth, clear, visually pleasing video, making it a popular way to increase the frame rate. However, these high-quality forged videos show no periodic inter-frame similarity, which renders FR forgery detectors such as [6] ineffective. Bestagini et al. [10] were the first to detect MC-FRUC forgery. First, the suspicious video is down-sampled, and a frame-averaging operator is used to increase its frame rate again, yielding a detection video. Then, the variance is calculated frame by frame for both the detection video and the suspicious video. Finally, evidence of forgery is obtained by analyzing the periodicity of the variance. Bestagini's algorithm is effective to some extent, but it requires the up-conversion factor to be known, a condition that cannot be met in most cases. Yang et al. [11,12,13] also explore the connection between several statistical features and frame interpolation, improving the detection accuracy for MC-FRUC forgery.
Since MC-FRUC can be implemented with different techniques [14,15,16], the detection accuracy of the above MC-FRUC detection algorithms degrades greatly when they encounter complex, high-precision forgery methods (e.g., multi-hypothesis motion estimation [16] or post-texture rendering [17]). The situation gets worse once the forger applies compression, de-noising, or other post-processing. Ensuring the robustness of MC-FRUC forensics algorithms therefore remains a major challenge.

To improve the detection accuracy for MC-FRUC forgery, this chapter proposes measuring the statistical variation of video data through an edge feature. First, the Sobel operator is used to detect the edges of each video frame. Then, the edges are quantified to obtain the edge complexity of each frame. Finally, the periodicity of the edge complexity along the time axis is detected, and FRUC forgery is identified automatically by a hard-threshold decision. Experimental results show that the proposed forensics algorithm can effectively identify video sequences forged by different MC-FRUC methods and maintains high detection accuracy even after de-noising and compression attacks.

The rest of this chapter is organized as follows. Section 2 briefly describes the forensics of FR forgery and MC-FRUC. Section 3 presents the proposed forensics algorithm. Section 4 discusses the experimental results and analysis. We conclude this chapter in Section 5.

2 Background Knowledge

2.1 Forensics of FR Forgery

FR increases the frame rate of a video by directly copying adjacent frames and inserting the copies at the new time instants. That is, if the t-th frame is an interpolated frame, then

$$f\left[t\right]=f\left[t+1\right]$$
(1)

where f[t] is the interpolated frame and f[t + 1] is the adjacent original frame. As shown in Fig. 1, the frame rate of the original video is increased by periodically inserting interpolated frames, so every interpolated frame has a neighbor with identical content. Therefore, finding interpolated frames in the temporal neighborhood of some frame is sufficient to show that the video has undergone FR forgery. Existing work adopts two detection methods, namely residual detection [10] and similarity detection [6], which are briefly described below.

Fig. 1

Example of FR forgery

1. Residual detection

The residual detection method calculates the inter-frame residual energy of the t-th frame and its adjacent (t + 1)-th frame as follows:

$$v\left[t\right]={\left\Vert f\left[t\right]-f\left[t+1\right]\right\Vert}_2^2$$
(2)

where \(\Vert\cdot\Vert_2\) denotes the \(\ell_2\) norm. If f[t] is an interpolated frame, v[t] is 0; otherwise, v[t] is nonzero. When interpolated frames appear periodically, the residual energy periodically drops to 0, so detecting such periodic drops to 0 proves that FR forgery has occurred. In general, after the frame rate of the original video has been increased by FR, a video coding standard (e.g., H.264 or HEVC) is used to compress the forged video lossily in order to reduce the data volume as much as possible, which introduces an error between each compressed frame and the corresponding original frame. That is,

$$\widehat{f}\left[t\right]=f\left[t\right]+e\left[t\right]$$
(3)
$$\widehat{f}\left[t+1\right]=f\left[t+1\right]+e\left[t+1\right]$$
(4)

where \(\widehat{f}\left[t\right]\) and \(\widehat{f}\left[t+1\right]\) are the compressed versions of the t-th and (t + 1)-th frames, and e[t] and e[t + 1] are the errors between \(\widehat{f}\left[t\right]\), \(\widehat{f}\left[t+1\right]\) and their original frames, respectively. The residual energy between the compressed frames is calculated as follows:

$$\widehat{v}\left[t\right]={\left\Vert \widehat{f}\left[t\right]-\widehat{f}\left[t+1\right]\right\Vert}_2^2\approx v\left[t\right]+{\left\Vert e\left[t\right]-e\left[t+1\right]\right\Vert}_2^2$$
(5)

where the cross term \(2\left\langle f\left[t\right]-f\left[t+1\right],e\left[t\right]-e\left[t+1\right]\right\rangle\) has been neglected; it vanishes exactly at interpolated frames, where f[t] = f[t + 1]. When the forged video undergoes lossy compression, Eq. (5) shows that the residual energy of the compressed frames contains not only the residual energy of the original frames but also the error energy. Because each compressed frame suffers a different loss of quality, the error term varies randomly over time. Lossy compression therefore disturbs the periodic variation of the residual energy and reduces detection performance.
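To make the residual argument concrete, here is a small Python/NumPy sketch of Eqs. (1), (2), and (5) on synthetic frames. It is an illustration under stated assumptions, not the implementation used in [10]: frames are grayscale arrays of shape (T, M, N), the up-conversion factor is 2, the helper names `frame_replication` and `residual_energy` are hypothetical, and independent Gaussian noise stands in for the compression error e[t].

```python
import numpy as np

def frame_replication(frames, factor=2):
    """FR up-conversion (Eq. (1)): each original frame is repeated `factor` times,
    so every inserted copy f[t] equals its neighbor f[t + 1]."""
    return np.repeat(np.asarray(frames, dtype=np.float64), factor, axis=0)

def residual_energy(video):
    """Eq. (2): v[t] = ||f[t] - f[t + 1]||_2^2 for t = 0, ..., T - 2."""
    diff = video[:-1] - video[1:]
    return np.sum(diff ** 2, axis=(1, 2))

rng = np.random.default_rng(0)
original = rng.uniform(0, 255, size=(5, 8, 8))    # 5 synthetic 8x8 frames
forged = frame_replication(original)               # 10 frames, copies at even t
v = residual_energy(forged)
print(np.round(v, 1))                              # exact zeros at t = 0, 2, 4, ...

# Stand-in for the compression error e[t]: independent noise per frame (assumption).
compressed = forged + rng.normal(0, 2.0, size=forged.shape)
print(np.round(residual_energy(compressed), 1))    # the periodic zeros disappear
```

Because the stand-in error differs from frame to frame, the exact zeros at even t become small positive values, which is the effect described by Eq. (5).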

2. Similarity detection

The similarity detection method calculates the SSIM value of the t-th frame and its adjacent (t + 1)-th frame as follows:

$$s\left[t\right]=\mathrm{SSIM}\left(f\left[t\right],f\left[t+1\right]\right)=\frac{\left(2{\mu}_t{\mu}_{t+1}+{c}_1\right)\left(2{\lambda}_{t,t+1}+{c}_2\right)}{\left({\mu}_t^2+{\mu}_{t+1}^2+{c}_1\right)\left({\lambda}_t^2+{\lambda}_{t+1}^2+{c}_2\right)}$$
(6)

where \(\mu_t\) and \(\mu_{t+1}\) denote the mean values of f[t] and f[t + 1], \(\lambda_t\) and \(\lambda_{t+1}\) denote their standard deviations, \(\lambda_{t,t+1}\) denotes their covariance, and \(c_1\) and \(c_2\) are two small positive constants that stabilize the division. The SSIM value is at most 1 (for natural frames it usually lies between 0 and 1), and it equals 1 only when the two frames are identical; the smaller the value, the less similar the two frames. When interpolated frames are inserted periodically, the SSIM value periodically reaches 1, so detecting that the SSIM value periodically rises to 1 proves that FR forgery has occurred. In practice, noise is inevitably mixed into the forged video during transmission. That is,

$$\overline{f}\left[t\right]=f\left[t\right]+n\left[t\right]$$
(7)
$$\overline{f}\left[t+1\right]=f\left[t+1\right]+n\left[t+1\right]$$
(8)

where \(\overline{f}\left[t\right]\) and \(\overline{f}\left[t+1\right]\) denote the noisy versions of the t-th and (t + 1)-th frames, and n[t] and n[t + 1] are the corresponding noise terms. The SSIM value of the noisy frames is calculated as follows:

$$\overline{s}\left[t\right]=\mathrm{SSIM}\left(\overline{f}\left[t\right],\overline{f}\left[t+1\right]\right)=s\left[t\right]+\delta \left(n\left[t\right],n\left[t+1\right]\right)$$
(9)

where δ(n[t], n[t + 1]) is the deviation caused by the noise terms n[t] and n[t + 1]. When the forged video is subjected to noise, Eq. (9) shows that the SSIM value of the noisy frames contains not only the SSIM value of the original frames but also this interference term. When the added noise is strong, it inevitably disturbs the periodic variation of the SSIM value and reduces detection performance.
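The sketch below evaluates Eq. (6) as a single global window over each frame, which is how the equation is written here; the widely used SSIM of [7] instead averages the same expression over local windows. The constants \(c_1=(0.01\cdot 255)^2\) and \(c_2=(0.03\cdot 255)^2\) are the conventional choices for 8-bit data and are an assumption, as are the hypothetical helper names and the synthetic noise level used to mimic Eqs. (7) and (8).

```python
import numpy as np

def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM of Eq. (6), computed over the whole frame."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2           # conventional constants (assumption)
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()         # lambda_t^2 and lambda_{t+1}^2
    cov = np.mean((a - mu_a) * (b - mu_b))  # lambda_{t,t+1}
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def ssim_curve(video):
    """s[t] = SSIM(f[t], f[t + 1]) for t = 0, ..., T - 2."""
    return np.array([global_ssim(video[t], video[t + 1]) for t in range(len(video) - 1)])

rng = np.random.default_rng(1)
original = rng.uniform(0, 255, size=(4, 16, 16))
forged = np.repeat(original, 2, axis=0)                   # FR forgery, factor 2
noisy = forged + rng.normal(0, 10.0, size=forged.shape)   # Eqs. (7)-(8), synthetic noise
s, s_noisy = ssim_curve(forged), ssim_curve(noisy)
print(np.round(s, 3))        # 1.0 at every even t
print(np.round(s_noisy, 3))  # the peaks fall below 1 once noise is added
```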

2.2 MC-FRUC

MC-FRUC is a video post-processing technology that interpolates new frames based on the estimated motion between existing frames. It effectively enhances motion continuity and improves the fluency of video. Because it exploits the abundant original data in the video and the high correlation along motion trajectories, MC-FRUC has attracted wide attention from industry and academia. As shown in Fig. 2, MC-FRUC mainly consists of motion estimation, motion vector smoothing, motion vector mapping, and motion-compensated interpolation. The first three parts use the reference frames \(f_{t-1}\) and \(f_{t+1}\) to generate the motion vector field \(\mathbf{V}_t\) of the current frame \(f_t\), and the last part forms the estimate \({\widehat{f}}_t\) of the current frame from \(\mathbf{V}_t\). First, motion estimation generates the motion vector field \(\mathbf{V}_{t-1}\) between \(f_{t-1}\) and \(f_{t+1}\). Then, motion vector smoothing corrects the abnormal motion in \(\mathbf{V}_{t-1}\) to obtain a smooth motion vector field \({\overline{\mathbf{V}}}_{t-1}\), which motion vector mapping maps to the motion vector field \(\mathbf{V}_t\) of the current frame. Finally, according to \(\mathbf{V}_t\), the matching pixels of every pixel of the current frame are found in the reference frames, and the current frame is interpolated as follows:

$${\widehat{f}}_t\left(i,j\right)=\frac{1}{2}\left[{f}_{t-1}\left(i+{V}_{t,1}^{\left(i,j\right)},j+{V}_{t,2}^{\left(i,j\right)}\right)+{f}_{t+1}\left(i-{V}_{t,1}^{\left(i,j\right)},j-{V}_{t,2}^{\left(i,j\right)}\right)\right]$$
(10)

where (i, j) is the pixel position and \(\left({V}_{t,1}^{\left(i,j\right)},{V}_{t,2}^{\left(i,j\right)}\right)\) is the motion vector of \(\mathbf{V}_t\) at (i, j). The interpolation accuracy of MC-FRUC depends mainly on the accuracy of \(\mathbf{V}_t\), so much work has focused on improving motion estimation, motion vector smoothing, and motion vector mapping. For example, [14] adopts 3-D recursive search (3DRS) for motion estimation, reducing abnormal motion through implicit smoothness constraints; [15] performs motion vector mapping under a bidirectional motion symmetry assumption to avoid overlaps and holes; and [16] adopts multi-hypothesis motion estimation, combining motion vector fields of different densities into a more accurate field. Different motion estimation strategies yield motion vector fields of different precision, so Eq. (10) generates interpolated frames of different visual quality. Fig. 3 shows the 98th frame of the Foreman sequence in CIF format restored with the MC-FRUC methods of [14,15,16]: the multi-hypothesis motion estimation of [16] gives the best interpolation quality, and the algorithms of [14, 15] come second. Because forgers want the fake to look real, high-quality MC-FRUC methods are more popular and are widely used in many video applications.
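A minimal Python sketch of the interpolation step in Eq. (10). It assumes integer motion vectors given per pixel as an array of shape (M, N, 2) and clamps displaced coordinates to the frame border; both are simplifications (practical MC-FRUC methods such as [14,15,16] use sub-pixel vectors and explicit occlusion handling), and `mc_interpolate` is a hypothetical name. The usage example moves a bright bar four columns between the reference frames, so a constant vector of (0, -2) should place it halfway.

```python
import numpy as np

def mc_interpolate(prev, nxt, mv):
    """Bidirectional motion-compensated interpolation following Eq. (10).

    prev, nxt: reference frames f_{t-1} and f_{t+1}, shape (M, N).
    mv: integer motion vectors (V_{t,1}, V_{t,2}) per pixel, shape (M, N, 2).
    """
    prev = np.asarray(prev, dtype=np.float64)
    nxt = np.asarray(nxt, dtype=np.float64)
    M, N = prev.shape
    i, j = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    v1 = np.asarray(mv[..., 0], dtype=int)
    v2 = np.asarray(mv[..., 1], dtype=int)
    # Sample f_{t-1} at (i + V1, j + V2) and f_{t+1} at (i - V1, j - V2),
    # clamping to the frame border (simplifying assumption).
    a = prev[np.clip(i + v1, 0, M - 1), np.clip(j + v2, 0, N - 1)]
    b = nxt[np.clip(i - v1, 0, M - 1), np.clip(j - v2, 0, N - 1)]
    return 0.5 * (a + b)

prev = np.zeros((8, 10)); prev[:, 2:4] = 1.0    # bar at columns 2-3
nxt = np.zeros((8, 10)); nxt[:, 6:8] = 1.0      # bar moved to columns 6-7
mv = np.full((8, 10, 2), [0, -2])               # constant vector (0, -2)
mid = mc_interpolate(prev, nxt, mv)
print(np.nonzero(mid[0])[0])                    # [4 5]: bar halfway between
```

With a zero motion field the same function degenerates to plain frame averaging, which is the kind of operator Bestagini et al. [10] use to build their detection video.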

Fig. 2

Framework of MC-FRUC

Fig. 3

Subjective visual quality comparison of the 98th frame of the Foreman sequence interpolated with different MC-FRUC strategies

3 Proposed Forensics Algorithm

The flow of the proposed forensics algorithm is shown in Fig. 4. First, the original video is up-converted by MC-FRUC forgery, and attacks such as de-noising and compression may then be applied to it. Next, the Sobel operator is used to extract the edge feature of each video frame. Finally, the periodic variation of the edge feature of the suspicious video over time is analyzed to determine whether MC-FRUC forgery has occurred. The core of the proposed algorithm consists of edge feature extraction and periodicity detection, which are introduced in detail below.

Fig. 4

Flow of the proposed forensics algorithm

3.1 Edge Feature Extraction

The Sobel operator is a discrete differential operator that combines Gaussian smoothing with differentiation and is often used to compute an approximate image gradient [18]. An image edge corresponds to an abrupt change in pixel values, and the gradient measures the size of that change: a large gradient value indicates a sharp change in pixel value and thus reflects the distribution of edges in the image. Let the original video sequence consist of L video frames of size M × N. For the t-th frame \(f_t\), the Sobel operator performs edge detection in the horizontal and vertical directions as follows:

$${\boldsymbol{g}}_h=\left[\begin{array}{lll} -1& 0& +1\\ -2& 0& +2\\ -1& 0& +1\end{array}\right]\ast {\boldsymbol{f}}_t$$
(11)
$${\boldsymbol{g}}_v=\left[\begin{array}{lll}-1& -2& -1\\ 0& 0& 0\\ +1& +2& +1\end{array}\right]\ast {\boldsymbol{f}}_t$$
(12)

where \(\ast\) denotes convolution. The horizontal and vertical responses are then combined pixel by pixel into the gradient magnitude:

$${\boldsymbol{g}}_t=\sqrt{{\boldsymbol{g}}_h^2+{\boldsymbol{g}}_v^2}$$
(13)

where \({\boldsymbol{g}}_t\) is the edge map of \(f_t\). Because pixel values in a video sequence vary rapidly over time, each frame has a different edge complexity. The edge complexity of \({\boldsymbol{g}}_t\) is measured as follows:

$${\sigma}_t=\frac{M\times N}{\sum \limits_{m=1}^M\sum \limits_{n=1}^N{g}_t\left(m,n\right)}$$
(14)

According to Eq. (14), \(\sigma_t\) is the reciprocal of the mean gradient magnitude: the stronger and more abundant the edges within a frame, the smaller \(\sigma_t\); the smoother the frame, the larger \(\sigma_t\).
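Eqs. (11)-(14) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: borders are handled by replicating edge pixels (an assumption), and the loop computes a correlation rather than a true convolution, which only flips the signs of \(g_h\) and \(g_v\) and therefore leaves Eq. (13) unchanged. The function names are hypothetical.

```python
import numpy as np

KH = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)  # Eq. (11)
KV = KH.T                                                              # Eq. (12)

def sobel_edge_map(frame):
    """Edge map g_t of Eq. (13) for one grayscale frame of shape (M, N)."""
    f = np.asarray(frame, dtype=np.float64)
    M, N = f.shape
    p = np.pad(f, 1, mode="edge")              # replicate border pixels (assumption)
    gh = np.zeros_like(f)
    gv = np.zeros_like(f)
    for di in range(3):
        for dj in range(3):
            window = p[di:di + M, dj:dj + N]   # f shifted by (di - 1, dj - 1)
            gh += KH[di, dj] * window
            gv += KV[di, dj] * window
    return np.sqrt(gh ** 2 + gv ** 2)

def edge_complexity(frame):
    """Eq. (14): M*N divided by the sum of g_t (reciprocal of the mean gradient)."""
    g = sobel_edge_map(frame)
    return g.size / g.sum()                    # a perfectly flat frame gives inf (with a warning)

def edge_complexity_curve(video):
    """sigma_t for every frame of a (T, M, N) video."""
    return np.array([edge_complexity(frame) for frame in video])

ramp = np.tile(np.arange(5.0), (4, 1))         # 4 x 5 frame, +1 per column
print(edge_complexity(ramp))                   # 0.15625 = 20 / 128
```

For this ramp, interior pixels have gradient magnitude 8 and the two border columns have magnitude 4 under edge replication, so the 20 pixels sum to 128; scaling the ramp up strengthens the edges and lowers \(\sigma_t\), as stated above.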

Because MC-FRUC cannot perfectly reproduce the motion trajectory between adjacent frames, MC-FRUC forgery always introduces some motion abnormalities. Moreover, the forged frames always contain artificial traces that strongly affect the detected edges. The edge complexity of a forged video therefore exhibits periodic abrupt changes. Fig. 5 shows the edge complexity of the unforged and forged Foreman video, computed from edge maps extracted with the Sobel operator: the edge complexity curve of the unforged video varies smoothly and slowly, whereas that of the forged video varies rapidly and periodically. This is why the periodicity of the edge complexity curve can be regarded as strong evidence of MC-FRUC forgery.

Fig. 5

Edge complexity curves of the unforged and forged Foreman video sequence (Note: The unforged video is the original 30 fps video, and the forged video is up-converted to 30 fps by the method proposed in [15].)

3.2 Periodicity Detection

Spectral analysis of the edge complexity curve reveals its periodicity and thereby allows MC-FRUC forgery to be identified automatically. The fast Fourier transform (FFT) is used to calculate the spectrum \(F_k\) of the edge complexity curve \(\sigma_t\) as follows:

$${F}_k=\mathrm{FFT}\left({\sigma}_t\right),k=0,1,\cdots, L$$
(15)

where FFT(•) denotes the FFT operator and L is the length of the edge complexity curve (i.e., the number of frames). A high-pass filter is used to suppress the direct current (DC) and low-frequency (LF) components of the spectrum \(F_k\), as follows:

$${F}_k^h=\mathrm{HFP}\left({F}_k,d\right)=\left\{\begin{array}{cc}{F}_k& d<k\le L-d\\ {}0& 0\le k\le d,L-d<k\le L\end{array}\right.$$
(16)

where HFP(•) denotes the high-pass filter operator, \({F}_k^h\) denotes the high-frequency component, and d is the cutoff frequency. To highlight spectral peaks, amplitude enhancement is performed on \({F}_k^h\) as follows:

$${S}_k=\sum \limits_{l=d+1}^{L-d}\left|{F}_k^h-{F}_l^h\right|,k=d+1,\cdots, L-d$$
(17)
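The spectral steps of Eqs. (15)-(17) are sketched below with NumPy. Two points are assumptions: indices are 0-based (k = 0, ..., L - 1, as returned by `numpy.fft.fft`), and the moduli in Eq. (17) are taken of complex differences, exactly as the equation is written. The test signal alternates between two values, mimicking an edge complexity curve in which every second frame is interpolated; its energy then sits at k = L/2, the center of the spectrum. The function name is hypothetical.

```python
import numpy as np

def enhanced_spectrum(sigma, d):
    """Eqs. (15)-(17) with 0-based indices; requires 1 <= d < L / 2."""
    sigma = np.asarray(sigma, dtype=np.float64)
    L = sigma.size
    F = np.fft.fft(sigma)                               # Eq. (15)
    k = np.arange(L)
    Fh = np.where((k > d) & (k <= L - d), F, 0)         # Eq. (16): drop DC/LF bins
    band = np.arange(d + 1, L - d + 1)                  # k = d + 1, ..., L - d
    # Eq. (17): S_k = sum over l of |F_k^h - F_l^h| within the retained band.
    diffs = np.abs(Fh[band][:, None] - Fh[band][None, :])
    S = np.zeros(L)
    S[band] = diffs.sum(axis=1)
    return S

t = np.arange(64)
sigma = 1.0 + 0.5 * (-1.0) ** t                         # period-2 edge complexity curve
S = enhanced_spectrum(sigma, d=4)
print(int(np.argmax(S)))                                # 32 = L / 2, the spectrum center
```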

As shown in Fig. 6, after the processing of Eqs. (16) and (17), the spectrum of the unforged video shows many small, densely spaced peaks, whereas the spectrum of the forged video shows a large peak at its center whose amplitude is much higher than that of the surrounding peaks. The appearance of this large peak shows that the edge complexity curve is periodic. However, Fig. 6b shows that some small peaks remain around the large peak. To filter out these small peaks, \(S_k\) is processed as follows:

1. Initialization: after the maximum value of the spectrum, \(S_k\) is retained, and the remaining components are set to 0; the result is assigned to \(P_k^{(0)}\), and the iteration variable is set to n = 1.

2. Calculate the mean value \(E^{(n-1)}\) of \(P_k^{(n-1)}\).

3. Apply hard-threshold shrinkage to \(P_k^{(n-1)}\) using \(E^{(n-1)}\), as follows:

$${P}_k^{(n)}=\left\{\begin{array}{cc}{P}_k^{\left(n-1\right)}& {P}_k^{\left(n-1\right)}\ge 0.8{E}^{\left(n-1\right)}\\ {}0& {P}_k^{\left(n-1\right)}<0.8{E}^{\left(n-1\right)}\end{array}\right.$$
(18)

4. If

$$\left\{{P}_k^{(n)}\left|k=0,1,\cdots, L\right.\right\}\cap \left\{{P}_k^{\left(n-1\right)}\left|k=0,1,\cdots, L\right.\right\}=\varnothing$$
(19)

is satisfied, stop the iteration and output \(P_k=P_k^{(n)}\); otherwise, set n = n + 1 and return to step 2.

Fig. 6

The spectrum of the edge complexity curve for the unforged and the forged video
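The iteration of steps 1-4 can be sketched as follows. Because the text leaves several details open, this sketch fixes them by assumption: \(P_k^{(0)}\) is taken to be \(S_k\) itself (zero outside the retained band), \(E^{(n-1)}\) is the mean over the components that are still nonzero, and the loop stops when a pass removes nothing, which is one reading of the stopping rule in Eq. (19). The function name is hypothetical and the input arrays are toy values, not measurements.

```python
import numpy as np

def suppress_small_peaks(S, ratio=0.8, max_iter=100):
    """Iterative hard-threshold shrinkage of Eq. (18).

    Assumptions (not fixed by the text): P^(0) = S, E^(n-1) is the mean of
    the components that are still nonzero, and iteration stops as soon as a
    pass removes nothing.
    """
    P = np.asarray(S, dtype=np.float64).copy()
    for _ in range(max_iter):
        survivors = P[P > 0]
        if survivors.size == 0:
            break
        E = survivors.mean()
        P_next = np.where(P >= ratio * E, P, 0.0)
        if np.array_equal(P_next, P):   # nothing removed: converged
            break
        P = P_next
    return P

S_forged = np.array([0, 0, 1.0, 1.2, 0.9, 10.0, 1.0, 0])
S_unforged = np.array([1.0, 1.1, 0.95, 1.05])
print(suppress_small_peaks(S_forged))     # only the peak 10.0 survives
print(suppress_small_peaks(S_unforged))   # all four comparable peaks survive
```

In the first case the mean of the nonzero components is 2.82, so the threshold 0.8 × 2.82 removes every component except the dominant peak; in the second case all components already exceed 0.8 times their mean, so nothing changes.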

As shown in Fig. 7, after the above steps, several small peaks remain in the spectrum of the unforged video, while only a single large peak remains in that of the forged video. Therefore, if an abnormally large peak is detected for a suspicious video, the video has undergone MC-FRUC forgery. To detect this automatically, the two spectral states must be quantified, so the forgery level value (FV) is defined as follows:

$$\mathrm{FV}=\frac{\operatorname{MAX}\left\{{P}_k\left|k=0,1,\cdots, L\right.\right\}}{J\times {E}^{(0)}}$$
(20)

where MAX{•} returns the maximum value of the input set, J is the number of peaks in \(P_k\), and \(E^{(0)}\) is the mean value of \(P_k^{(0)}\). As Eq. (20) shows, for a forged video the peak value of \(P_k\) is abnormally large and the number of peaks is very small, so its FV value is large. Conversely, for an unforged video the peak value of \(P_k\) is smaller and there are many more peaks, so its FV value is small. The FV value can thus serve as a quantitative indicator of MC-FRUC forgery, and automatic detection can be achieved by setting an appropriate threshold, that is,

$$\mathrm{MC}\hbox{-} \mathrm{FRUC}=\left\{\begin{array}{cc}\mathrm{on}& \mathrm{FV}\ge \mathrm{Thr}\\ {}\mathrm{off}& \mathrm{FV}<\mathrm{Thr}\end{array}\right.$$
(21)

where Thr is the decision threshold, "on" indicates that MC-FRUC has been applied, and "off" indicates that it has not.
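Eqs. (20) and (21) then reduce to a ratio and a comparison. In this sketch, J counts the nonzero components of the final \(P_k\), and \(E^{(0)}\) is taken as the mean of the nonzero components of \(P_k^{(0)}\); the latter is an assumption, since the text does not say whether zero components enter the mean. The default threshold of 2.4 is the value selected in Section 4.2. The function names are hypothetical and the arrays are toy values.

```python
import numpy as np

def forgery_value(P, P0):
    """Eq. (20): FV = max(P) / (J * E^(0)).

    J counts the nonzero components of the final P; E^(0) is taken as the
    mean of the nonzero components of P^(0) (assumption).
    """
    P = np.asarray(P, dtype=np.float64)
    P0 = np.asarray(P0, dtype=np.float64)
    J = np.count_nonzero(P)
    E0 = P0[P0 > 0].mean()
    return P.max() / (J * E0)

def mc_fruc_decision(fv, thr=2.4):
    """Eq. (21): 'on' if FV >= Thr, else 'off'."""
    return "on" if fv >= thr else "off"

P0_forged = np.array([0, 0, 1.0, 1.2, 0.9, 10.0, 1.0, 0])
P_forged = np.array([0, 0, 0, 0, 0, 10.0, 0, 0])
fv = forgery_value(P_forged, P0_forged)      # 10 / (1 * 2.82), about 3.55
print(round(fv, 2), mc_fruc_decision(fv))    # 3.55 on
```

For an unforged-like input in which four comparable peaks survive, J = 4 and FV falls well below 1, so the decision is "off".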

Fig. 7

The spectrum of edge complexity curve after filtering out the small peak interference for the unforged and the forged video

4 Experimental Results and Analysis

The proposed algorithm is evaluated on 23 test video sequences in CIF format at 30 fps, from which a negative set (NS) and a positive set (PS) are constructed. For the NS, Gaussian white noise with standard deviations of 0, 3, 5, 7, 9, and 11 is added directly to each of the 23 sequences, giving 138 test sequences in total. For the PS, the NS sequences are first down-converted to 15 fps and then up-converted back to 30 fps with the MC-FRUC methods proposed in [14,15,16], giving 552 test sequences. First, the proposed algorithm is applied to obtain the distribution of FV values of the test videos and to determine the ranges of FV values for unforged and forged videos. Then, the proposed algorithm is used to detect forgery in the NS and PS, and its performance is evaluated. Finally, we evaluate the ability of the proposed algorithm to resist de-noising and compression attacks. Performance is measured by the false-positive rate (FPR) and the false-negative rate (FNR), defined as follows:

$$\mathrm{FPR}={R}_{\mathrm{NS}}/{N}_{\mathrm{NS}}$$
(22)
$$\mathrm{FNR}={R}_{\mathrm{PS}}/{N}_{\mathrm{PS}}$$
(23)

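where \(R_{\mathrm{NS}}\) and \(R_{\mathrm{PS}}\) are the numbers of misclassified videos in the NS and PS (i.e., unforged videos detected as forged and forged videos detected as unforged), respectively, and \(N_{\mathrm{NS}}\) and \(N_{\mathrm{PS}}\) are the numbers of test videos in the NS and PS, respectively. Moreover, the detection accuracy (DA) is defined as follows: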

$$\mathrm{DA}=1-\frac{\mathrm{FNR}+\mathrm{FPR}}{2}$$
(24)
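As a quick check of Eqs. (22)-(24), the helper below (a hypothetical name) turns misclassification counts into FPR, FNR, and DA. With no false positives and an FNR of 0.16, DA = 1 - 0.16/2 = 0.92, which matches the figures quoted for the compression attack on [14] in Section 4.2; the counts themselves are made up for illustration.

```python
def detection_metrics(r_ns, n_ns, r_ps, n_ps):
    """Eqs. (22)-(24): FPR, FNR, and DA from misclassification counts."""
    fpr = r_ns / n_ns                 # unforged videos flagged as forged
    fnr = r_ps / n_ps                 # forged videos that were missed
    da = 1 - (fnr + fpr) / 2
    return fpr, fnr, da

# Hypothetical counts: no false positives, 4 misses out of 25 forged videos.
fpr, fnr, da = detection_metrics(0, 23, 4, 25)
print(fpr, fnr, round(da, 2))         # FPR = 0.0, FNR = 0.16, DA = 0.92
```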

4.1 Performance Analysis

Figure 8 shows the average FV values of the NS and PS under different noise standard deviations. The FV values of the NS and PS differ significantly, so MC-FRUC can be identified through a hard-threshold decision. Figure 9 shows the effects of compression and de-noising attacks on the FV values of the PS. After either attack, the FV value decreases, and the decrease is especially large for the de-noising attack. The lower the noise level, the smaller the FV value; at higher noise levels, the FV value remains comparatively high. Therefore, the proposed forensics algorithm is reasonably robust to compression and de-noising attacks.

Fig. 8

Average FV values of the NS and PS under different noise standard deviations

Fig. 9

Effects of compression and de-noising attacks on FV value

4.2 Detection Results

The threshold Thr is the critical FV value that decides whether a suspicious video contains MC-FRUC forgery, so it is an important parameter for ensuring a high DA. To set Thr appropriately, 23 training video sequences in CIF format at 30 fps, different from the test sequences, are selected to form a training set of the same size as the test set, with the NS and PS constructed as described above. Figure 10 shows the probability distributions of the FV values of the NS and PS in the training set. Most NS samples have FV values below 3, whereas the FV values of the PS samples are more evenly spread, with about 90% of them above 3. An appropriate threshold Thr should therefore not exceed 3. Based on this analysis, we take candidate thresholds in the range [0.05, 3] with a step size of 0.05 and select the most appropriate one by cross-validation. First, the NS and PS of the training set are each randomly divided into two non-overlapping subsets of equal size, one for training and the other for testing. On the training subset, every candidate threshold is applied to all samples, and the threshold with the highest average DA is selected. On the testing subset, the average DA is calculated with the optimal threshold output by the training subset. This cross-validation is repeated 10 times, and the resulting thresholds and average DA values are shown in Fig. 11. The selected optimal threshold keeps the average DA of the testing subset above 0.92 in every round, and the average DA reaches its maximum when Thr is 2.4. Based on these cross-validation results, Thr is set to 2.4.
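The threshold search can be sketched as repeated half/half cross-validation over the candidate grid {0.05, 0.10, ..., 3.00}. The sketch assumes that "average DA" means the DA of Eq. (24) on a subset, breaks ties in favor of the smallest threshold, and uses a fixed random seed; the function names are hypothetical, and the FV values in the usage example are synthetic, not the chapter's training data.

```python
import numpy as np

def detection_accuracy(fv_ns, fv_ps, thr):
    """DA of Eq. (24) when videos with FV >= thr are flagged as forged (Eq. (21))."""
    fpr = np.mean(np.asarray(fv_ns) >= thr)   # unforged videos flagged as forged
    fnr = np.mean(np.asarray(fv_ps) < thr)    # forged videos missed
    return 1.0 - (fpr + fnr) / 2.0

def cross_validate_threshold(fv_ns, fv_ps, rounds=10, seed=0):
    """Repeated half/half split: pick Thr on one half, score it on the other."""
    candidates = 0.05 * np.arange(1, 61)      # 0.05, 0.10, ..., 3.00
    fv_ns = np.asarray(fv_ns, dtype=np.float64)
    fv_ps = np.asarray(fv_ps, dtype=np.float64)
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(rounds):
        ns = rng.permutation(fv_ns)           # shuffled copies
        ps = rng.permutation(fv_ps)
        ns_train, ns_test = ns[: ns.size // 2], ns[ns.size // 2:]
        ps_train, ps_test = ps[: ps.size // 2], ps[ps.size // 2:]
        train_da = [detection_accuracy(ns_train, ps_train, t) for t in candidates]
        best = float(candidates[int(np.argmax(train_da))])  # first best on ties
        results.append((best, detection_accuracy(ns_test, ps_test, best)))
    return results

# Synthetic FV values for illustration only.
gen = np.random.default_rng(42)
fv_ns_demo = gen.uniform(0.1, 2.0, size=138)
fv_ps_demo = gen.uniform(2.5, 8.0, size=552)
for thr, da in cross_validate_threshold(fv_ns_demo, fv_ps_demo):
    print(f"Thr = {thr:.2f}, test DA = {da:.3f}")
```

The final choice across rounds (2.4 in the chapter) would then be the threshold whose test DA is highest on average; the sketch only reports the per-round results.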

Fig. 10

FV distribution of the NS and PS in training video sequence set

Fig. 11

Cross-validation results

Table 1 shows the average DA for video sequences generated by the different MC-FRUC forgery methods. When the suspicious video is not subjected to any post-processing attack, the average DA reaches 100% for every noise standard deviation α, which demonstrates that edge complexity is an effective cue for identifying MC-FRUC forgery. After a compression attack on the PS test videos, detection errors occur in the noise-free case; for example, for the forgery method in [14], the FNR is 0.16 and the DA is 0.92. However, the average DA returns to 100% once noise is added, which indicates that for noisy videos the edge complexity resists the adverse effect of compression on detection. The de-noising attack effectively suppresses the Gaussian noise, which to some extent affects the proposed algorithm, since it is built on edge complexity. In the noise-free case, the FNR is as high as 0.17 and the average DA is only 0.915. The average DA improves once noise is added, which indicates that for noisy videos the edge complexity resists the de-noising attack to a certain extent.

Table 1 Average detection accuracy of the proposed algorithm for different MCFI forgery methods

5 Conclusions

This chapter proposes a forensics algorithm that identifies MC-FRUC forgery by quantifying the variation of edge features in video frames. The proposed algorithm first uses the Sobel operator to detect the edges of each video frame and then quantifies the edges to obtain the edge complexity of each frame. Finally, the periodicity of the edge complexity along the time axis is detected, and MC-FRUC forgery is identified automatically by a hard-threshold decision. Experimental results show that the FV values of unforged videos are much lower than those of forged videos. With the decision threshold set to 2.4, the proposed algorithm performs well on the NS of 138 video sequences and the PS of 552 video sequences. When no post-processing attack is applied to the test videos, the average DA reaches 100%, and the algorithm still maintains stable detection accuracy after de-noising and compression attacks.