1 Introduction

Technologies such as digital storage and multimedia data compression have developed rapidly in recent years, and individuals can obtain the multimedia information they need more conveniently and efficiently [9, 37]. Along with the application of multimedia, video services are increasingly becoming an essential part of human life. Nevertheless, because of the openness and sharing characteristics of the network, video data are susceptible to attacks such as illegal copying and malicious access in the transmission process. Hence, setting up a high-efficiency video content protection scheme is essential for the sustainable growth of video commercialization.

Encrypting video data is an essential method for protecting its privacy and security [8, 12, 31]. The encryption methods used in video encryption fall into two categories: traditional encryption algorithms and chaotic encryption algorithms. Although traditional encryption algorithms can ensure the security of video ciphertext in practical video communication applications, they are computationally demanding and consume a large amount of computing power in hardware implementations. For example, the RSA (Rivest-Shamir-Adleman) [3] algorithm performs encryption mainly through modular exponentiation and multiplication. When a large amount of video data needs to be encrypted in real time, the throughput of the RSA algorithm can be very low. Ting [24] uses the AES (Advanced Encryption Standard) cipher to encrypt the data part of the video bit stream, which does not destroy the video data format but increases the video bit rate. Therefore, it is difficult for traditional encryption algorithms to meet the real-time requirements of confidential video transmission.

Unlike traditional algorithms, chaotic cryptographic algorithms can generate real-time pseudo-random sequences. As a nonlinear deterministic system with high degrees of freedom, chaotic systems are characterized by initial value sensitivity, ergodicity, unpredictability and low computational complexity. Therefore, many researchers have proposed a close connection between chaos and cryptography [25], and chaos-based encryption algorithms have received significant attention [34, 35].

Qayyum [30] proposes a new image encryption algorithm based on two-dimensional Henon, Ikeda chaos mapping, and S-box (substitution box) transformation using chaos theory and dynamic substitution technique. The algorithm achieves encryption by selecting a random S-box and replacing image pixels through Henon. Hafsa [11] uses Henon chaos mapping to generate key streams and proposes an improved AES algorithm that improves obfuscation and diffusion in encrypted videos. El-Mowafy [7] proposes two encryption algorithms. The first algorithm is a robust video encryption algorithm based on chaotic mapping and random keys, which relies on the use of chaotic mapping to generate randomization in the original video I-frames. The second approach is a hybrid algorithm that combines steganography and cryptographic processes. In the first process, the original video's I-frames are hidden in the overlay video's I-frames. In contrast, the latter process is implemented by applying chaos mapping to the overlay I-frames generated by the steganography process.

The CCML (Cross-Coupled Map Lattice) model is a classic example of a spatio-temporal chaotic system [17], which can generate multiple sets of pseudo-random sequences simultaneously with excellent diffusion effects. The model displays excellent performance in the aspects of the Lyapunov exponent, difference properties, information entropy and randomness. In this paper, the chaotic pseudorandom sequences generated by an improved model of CCML, the integer dynamic cross-coupling tent mapping model, are applied to video encryption for generating the keystreams required for encryption.

Since the amount of unencoded video data is tremendous, direct indiscriminate encryption of the entire video would be time-consuming and would severely damage the correlation between the video contents. The scheme in [19] performs a full encryption operation on each frame of the original RGB video before encoding, thus preserving the format compatibility of the encrypted video. However, this scheme severely alters the correlation between video pixels, which leads to a sharp increase in the bit rate of the compressed video and makes real-time transmission harder. Conversely, traditional data encryption of the encoded video file destroys format compatibility, so the decoder cannot decode the ciphertext video properly. Xu et al. [46] used a chaotic stream cipher to encrypt H.264 code streams completely. The system has good real-time performance because the data volume of the H.264 video is much smaller than that of the original video, which significantly reduces the encryption time. However, the scheme's indiscriminate encryption of H.264 streams destroys the format of the encrypted video, so it cannot meet the format compatibility requirement of confidential video transmission.

Selective video encryption encrypts only part of the critical information of the encoded video, so that the encryption does not affect the encoded video format. Su et al. [36] proposed a random sign-flipping encryption method for MVD signs, using the property that two numbers of opposite sign with the same absolute value have the same code word length after Exp-Golomb encoding; this does not affect the bit rate and has a good encryption effect. Di et al. [6] not only encrypted the IPM (Intra Prediction Mode) and MVD but also randomly scrambled the signs of the non-zero DCT coefficients, keeping the code stream length constant and producing a better visual encryption effect. Compared with full encryption methods, selective encryption schemes are a more effective way to achieve security, real-time performance and format compatibility in video encryption systems.

In this paper, we propose to generate chaotic pseudo-random sequences from an integer dynamic cross-coupled tent mapping model and use them to encrypt H.264/AVC video syntax elements. Considering both the effectiveness and the security of encryption, we selectively encrypt a combination of syntax elements: the intra-frame prediction mode, the signs of the trailing coefficients, the magnitudes of the non-zero coefficients, the signs of the motion vector differences, and the M-bit information field in Exp-Golomb coding. The experimental results show that the encryption algorithm has good real-time performance and high efficiency, which meets the requirements of video encryption.

The main contributions of this paper are as follows.

  1) The integer dynamic cross-coupled tent map lattice model is introduced, and its cross-coupling iteration is briefly analyzed. A pseudo-random sequence generator is designed based on this model to quickly generate key streams with high complexity and security.

  2) The coding structure of the H.264/AVC video coding protocol is analyzed in intra-frame prediction, inter-frame prediction, entropy coding, and other stages, and multiple key syntax elements are selected as encryption targets to realize multi-stage encryption of the video coding process. On the one hand, this method avoids the impact of encryption on video compression efficiency and format compatibility. On the other hand, it solves the problem of poor security of single-stage selective encryption.

  3) The encryption effect and security are analyzed to test the proposed video encryption algorithm. Comparing the performance parameters of the proposed algorithm with prior results, we find that the proposed algorithm has a better visual encryption effect, higher encryption efficiency, and higher security.

Section 2 of this paper introduces the chaotic system used to encrypt video data and the chaotic key generation method. Section 3 describes the proposed selective encryption scheme in detail. Section 4 provides an experimental analysis of the encryption algorithm from various perspectives, including security and encryption performance. The last section summarizes the paper and discusses future prospects.

2 Key generation algorithm

2.1 Integer dynamic cross-coupled tent map lattice model

2.1.1 Improved cross-coupled tent map lattice model

The CCML model is a typical example of spatiotemporal chaos. The iterative method of this model combines diffusion and partial reaction processes with excellent diffusion effects. The mathematical expression of CCML is as follows.

$$x_{n+1}(i)=\left\{\begin{array}{ll}\frac{1}{1+\varepsilon}f\left[x_n(i)\right]+\frac{\varepsilon}{2\left(1+\varepsilon\right)}\left[x_n\left(i-1\right)+x_n\left(i+1\right)\right],&i\;\text{is even}\\\frac{1}{1+\varepsilon}f\left[x_n(i)\right]+\frac{\varepsilon}{2\left(1+\varepsilon\right)}\left[x_{n+1}\left(i-1\right)+x_{n+1}\left(i+1\right)\right],&i\;\text{is odd}\end{array}\right.$$
(1)

In Eq. (1), the value range of n is [1, N], where N represents the time dimension. The discrete lattice point coordinate i takes integer values in the range [1, L], where L is the discrete system dimension. The coupling coefficient ε needs to satisfy 0 ≤ ε ≤ 1. The nonlinear mapping function f chosen for the CCML model is the tent mapping. The boundary conditions are x_n(0) = x_n(L) and x_n(L + 1) = x_n(L). The initial values are randomly selected in the interval [0, 1].

The tent mapping [20] is defined as

$$F_\alpha:x_i=\left\{\begin{array}{lc}\frac{x_{i-1}}\alpha,&0\leq x_{i-1}\leq\alpha\\\frac{1-x_{i-1}}{1-\alpha},&\alpha<x_{i-1}\leq1\end{array}\right.,$$
(2)

where α is the mapping parameter. The CCML model combined with tent mapping enhances the diffusion and confusion rates during iteration, and the generated chaotic sequences have good statistical properties. However, the model metrics are strongly influenced by the parameters and the model needs to be optimized.
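As a rough illustration of Eq. (2), the following Python sketch iterates the real-valued tent mapping. The function names `tent_map` and `tent_orbit` and the restriction 0 < α < 1 are assumptions made for this example and are not taken from the original model description.

```python
def tent_map(x: float, alpha: float) -> float:
    """Tent mapping F_alpha of Eq. (2) on [0, 1]; this sketch assumes 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x <= alpha:
        return x / alpha                    # rising branch
    return (1.0 - x) / (1.0 - alpha)        # falling branch


def tent_orbit(x0: float, alpha: float, steps: int) -> list:
    """Return x0 followed by `steps` successive iterates of the tent mapping."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(tent_map(orbit[-1], alpha))
    return orbit
```

Calling `tent_orbit(0.3, 0.4, 20)` returns 21 values in [0, 1]. With finite-precision floats such orbits eventually degrade, which is one motivation for the integer model introduced later.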

Eq. (3) takes Eq. (1) as a prototype, discarding the coupling coefficient ε while introducing a constant term k. The tent mapping is chosen as the nonlinear function, and the lattice point values are restricted to the range [0, 1) using the modulo-1 operation, which yields the ICCTML (Improved Cross-Coupled Tent Map Lattice) model [4], as shown in Eq. (3).

$$x_{n+1}(i)=\left\{\begin{array}{ll}\left[f\left(x_n(i)\right)+f\left(x_n\left(i-1\right)\right)+f\left(x_n\left(i+1\right)\right)+k_i\right]\;\operatorname{mod}\;1,&i\;\text{is even}\\\left[f\left(x_n(i)\right)+f\left(x_{n+1}\left(i-1\right)\right)+f\left(x_{n+1}\left(i+1\right)\right)+k_i\right]\;\operatorname{mod}\;1,&i\;\text{is odd}\end{array}\right.$$
(3)

The meanings of n, i, L, and f in the above equations are the same as those in Eq. (1), and k is a constant term with the initial values obtained randomly in the interval of [0,1]. The ICCTML model generates sequences with high complexity and excellent security.

2.1.2 Integer dynamic cross-coupled tent mapping lattice model

The above models can only be implemented in the real number domain, and constructing a chaotic model in the real number domain raises three problems. First, because computer storage is limited, precision is finite, and the dynamics of the discretized chaotic system are not as good as those of the continuous system. Second, under finite precision, the results generated by the chaotic system depend on the type of machine and the programming language used. Third, real-number arithmetic reduces implementation speed. It is also known from [4, 42] that variations in the parameters of the CCML model in the real number domain significantly affect its cryptographic performance and can even cause complete failure of the cryptosystem. We therefore construct the IDCCTML (Integer Dynamic Cross-Coupling Tent Mapping Lattice) model using Eq. (3) as a prototype.

$$x_{n+1}(i)=\left\{\begin{array}{ll}\left[f\left(x_n(i)\right)+f\left(x_n\left(i-1\right)\right)+f\left(x_n\left(i+1\right)\right)+k_i\right]\;\operatorname{mod}\;2^a,&i\;\text{is even}\\\left[f\left(x_n(i)\right)+f\left(x_{n+1}\left(i-1\right)\right)+f\left(x_{n+1}\left(i+1\right)\right)+k_i\right]\;\operatorname{mod}\;2^a,&i\;\text{is odd}\end{array}\right.$$
(4)

In Eq. (4), x_{n+1}(i) represents the state value of the discrete system at time step n + 1 and lattice position i. The modulo operation keeps the state value in the range [0, 2^a − 1], where a is the number of system bits. The function f adopts the IDTM (Integer Dynamic Tent Mapping) model. The IDTM model not only inherits the nonlinearity, uniform distribution, and mapping irreversibility of the tent mapping model but also overcomes the short-period problem of the integer tent mapping and avoids conversion from the real to the integer domain. It can also quickly generate high-quality integer chaotic pseudo-random sequences [22]. The mathematical form of the IDTM model is shown in Eq. (5) and Eq. (6) [23].

$$f\left[x_n(i)\right]=\left\{\begin{array}{lc}2g_n(i)+1,&g_n(i)\in\left[0,2^{a-1}\right)\\2\left(2^a-1-g_n(i)\right),&g_n(i)\in\left[2^{a-1},2^a\right)\end{array}\right.$$
(5)
$$g_n(i)=\left[x_n(i)+k_n(i)\right]\;\operatorname{mod}\;2^a$$
(6)

In Eq. (6), x_n(i) ∈ [0, 2^a − 1]; k_i denotes the horizontal shift distance of the integer dynamic tent mapping, and k_n(i) represents the dynamic parameters.
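The IDTM mapping of Eqs. (5) and (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the second branch of Eq. (5) is read as 2(2^a − 1 − g), the dynamic parameter is passed in as a plain integer `k`, and the function name `idtm` is hypothetical.

```python
def idtm(x: int, k: int, a: int = 32) -> int:
    """Integer dynamic tent mapping, a sketch of Eqs. (5)-(6).

    x is the current state in [0, 2^a - 1], k the (assumed) dynamic shift
    parameter, and a the system bit width.
    """
    modulus = 1 << a
    g = (x + k) % modulus              # Eq. (6): shifted state
    if g < modulus >> 1:               # g in [0, 2^(a-1))
        return 2 * g + 1
    return 2 * (modulus - 1 - g)       # g in [2^(a-1), 2^a)
```

Only integer additions, shifts and a modulo are involved, so the result does not depend on floating-point behaviour of the platform. For a small width such as a = 4, every output stays in [0, 15].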

Table 1 shows the specific meanings of all variables from Eq. (1) to Eq. (6). The IDCCTML model is iterated by cross-coupling, in which a slight change in one lattice point affects the whole sequence as time increases. The IDCCTML model uses two-way diffusion in the time and space dimensions for iteration, and the specific cross-coupling of the model is shown in Fig. 1. During the iteration, the values of the lattice points at even positions are generated from the previous iteration. The values of the lattice points at odd positions are generated from the previous iteration and from the even lattice points of the current iteration. This iterative approach lets the model achieve confusion and diffusion within a minimal number of time steps, so the IDCCTML model can generate pseudo-random sequences rapidly and in parallel. Furthermore, the diffusion effect of lattice point coupling in space and the confusion effect of the nonlinear function increase the complexity of the sequences as well as the security of the system.

Table 1 Specific meanings of each variable
Fig. 1 IDCCTML model iteration method

2.2 Key generation

The IDCCTML model is highly secure and can quickly generate integer chaotic pseudo-random sequences, which makes it well suited to H.264 video encryption. The process of producing integer chaotic pseudo-random sequences with the IDCCTML model is shown in Fig. 2 and proceeds as follows. In the first step, the model parameters are initialized by taking the number of system lattice points L = 30, the number of system bits a = 32, and the length of the sequence stream n = 2000000, and 30 random initial values M1 are generated.

$${M}_1=\left\{{x}_1(1),{x}_1(2),\cdots, {x}_1(30)\right\}$$
(7)
Fig. 2 Key generation process

In the second step, the initial values are substituted into Eq. (6) and Eq. (5) and iterated through the IDTM model, with the boundary conditions x_n(0) = x_n(L) and x_n(L + 1) = x_n(L). The sequence F, which has good independence and uniform distribution properties, is then quickly generated.

$$F=\left\{{f}_2(1),{f}_2(2),\cdots, {f}_2(30)\right\}$$
(8)

In the third step, the sequence F is brought into Eq. (4), the IDCCTML model. Odd iterations are performed if i is odd, and even iterations are performed if i is even, generating in parallel the sequence M2 with high independence.

$${M}_2=\left\{{x}_2(1),{x}_2(2),\cdots, {x}_2(30)\right\}$$
(9)

In the fourth step, all sequences are generated by repeated iteration in the time and space dimensions, and five chaotic pseudo-random sequences Si are randomly selected as keys to encrypt the corresponding positions.

$${S}_i=\left\{{S}_1,{S}_2,{S}_3,{S}_4,{S}_5\right\}$$
(10)

Finally, decryption is the inverse of the encryption process.
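The four key generation steps can be put together in a short Python sketch. Several details are assumptions made only for illustration: a single shift parameter `k` is shared by all lattice points and held fixed over time, the same `k` is used as the additive term of Eq. (4), the lattice size is even so that every odd site has already-updated even neighbours, and Python's `random` module merely stands in for the secret initial values (it is not a secure source). The names `idtm`, `idcctml_step` and `generate_keystreams` are hypothetical.

```python
import random

A = 32            # system bit width a, as stated in the text
MOD = 1 << A      # states live in [0, 2^a - 1]


def idtm(x, k):
    """Integer dynamic tent mapping of Eqs. (5)-(6) with shift parameter k."""
    g = (x + k) % MOD
    return 2 * g + 1 if g < MOD // 2 else 2 * (MOD - 1 - g)


def idcctml_step(x, k):
    """One time step of Eq. (4) on a lattice with an even number of sites."""
    size = len(x)
    if size < 2 or size % 2:
        raise ValueError("this sketch assumes an even lattice size")

    def wrap(i):
        # Boundary rule from the text (1-based): x(0) = x(L), x(L+1) = x(L).
        return size if i < 1 or i > size else i

    f_old = [idtm(v, k) for v in x]
    new = [0] * size
    # Even sites depend only on the previous time step.
    for i in range(2, size + 1, 2):
        total = f_old[i - 1] + f_old[wrap(i - 1) - 1] + f_old[wrap(i + 1) - 1] + k
        new[i - 1] = total % MOD
    # Odd sites couple to the even neighbours that were just updated.
    for i in range(1, size, 2):
        left = idtm(new[wrap(i - 1) - 1], k)
        right = idtm(new[wrap(i + 1) - 1], k)
        new[i - 1] = (f_old[i - 1] + left + right + k) % MOD
    return new


def generate_keystreams(seed, lattice_size=30, steps=1000, picks=5):
    """Iterate the lattice and return `picks` site sequences as key streams."""
    rng = random.Random(seed)                               # stand-in for key material
    x = [rng.randrange(MOD) for _ in range(lattice_size)]   # M_1 of Eq. (7)
    k = rng.randrange(MOD)
    sites = sorted(rng.sample(range(lattice_size), picks))  # five selected sequences
    streams = {site: [] for site in sites}
    for _ in range(steps):
        x = idcctml_step(x, k)
        for site in sites:
            streams[site].append(x[site])
    return streams
```

Because the generator is deterministic, a receiver holding the same initial values and parameter reproduces the same five sequences, which is what makes XOR-based decryption possible.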

3 Video encryption solutions

3.1 H.264/AVC coding standard

H.264/AVC [21] is one of the most popular video coding standards and is widely used in video applications because of its low bit rate and high compression rate [1, 18]. H.264/AVC specifies several profiles. The common ones are the baseline, main, and extended profiles, each supporting specific coding tools and application areas. Considering the real-time nature of the encryption system, the baseline profile [14] is chosen in this scheme. The baseline profile supports only inter-frame and intra-frame prediction and uses CAVLC (Context-Adaptive Variable-Length Coding) and Exp-Golomb coding to encode the H.264 bitstream.

Each bit of data in the H.264 bitstream [27] is tightly connected to the video encoding and decoding process. Fig. 3 illustrates the codestream hierarchy principle of H.264 [48]. In the H.264 structure, video-encoded data is referred to as a frame. A frame comprises a slice or picture group, a slice comprises one or more macroblocks, and a macroblock comprises 16×16 YUV data. In the baseline profile, the frame type only consists of I-frames and P-frames, which have the following characteristics:

  1) An I slice contains only I macroblocks.

  2) A P slice can contain P and I macroblocks.

  3) The I macroblock uses the pixels already decoded from the current slice as a reference for intra-frame prediction.

  4) The P macroblock uses the previously encoded image as a reference for inter-frame prediction.

Fig. 3 Schematic diagram of the code flow hierarchy

In the input codestream of the H.264 decoder, the basic unit of data is the syntax element, and the codestream is made up of syntax elements concatenated in sequence. Each syntax element consists of a series of bits representing a particular physical meaning. Different types of macroblocks carry different syntax elements. In this paper, the intra-frame prediction mode (IPM), the signs of the trailing coefficients (T1), the amplitudes of the nonzero coefficients (Levels), the signs of the motion vector differences (MVD), and the M-bit information field in Exp-Golomb coding (INFO) are selected for selective encryption.

3.2 Encryption algorithms

The video coding encryption process for the H.264/AVC baseline profile is shown in Fig. 4, and the video encryption procedure is given by Algorithm 1. The key generator produces the pseudo-random sequences Si. S1 encrypts the intra-frame prediction mode (IPM), and S2 encrypts the sign of the motion vector difference (MVD) in inter-frame prediction. After the residual data are obtained by intra-frame or inter-frame prediction during encoding, they are encoded using CAVLC. The remaining encoding parameters are encoded using Exp-Golomb coding. During encoding, key S3 encrypts the trailing coefficient sign bits (T1) of CAVLC encoding, key S4 encrypts the magnitudes of the non-zero coefficients (Levels) of CAVLC encoding, and key S5 encrypts the information bits (INFO) of Exp-Golomb encoding. Finally, the encrypted H.264 video stream is formed, and the encrypted video sequence is obtained by decoding this stream directly.

Algorithm 1 Video encryption algorithm

Fig. 4 Encryption process

3.3 Specific encryption process

3.3.1 Intra prediction mode encryption

The IPM is an essential syntax element in the video compression process. Encrypting the IPM prevents the decoder from reconstructing the original image macroblock, thus masking the video plaintext information. In the H.264/AVC coding standard, an intra 4×4 macroblock is further divided into 16 prediction sub-blocks of size 4×4. If the IPM of the current prediction sub-block is equal to the minimum of the IPMs of its upper and left adjacent prediction sub-blocks, it only needs to be signalled with the 1-bit field prev_intra4x4_pred_mode_flag. Otherwise, an additional 3-bit field rem_intra4x4_pred_mode is required to represent it. Therefore, the IPM of the macroblock is encrypted by Eq. (11).

$${IPM}_{new}={IPM}_{org}\oplus {S}_1$$
(11)

In Eq. (11), ⊕ is the XOR operation, and IPMorg and IPMnew denote the rem_intra4x4_pred_mode syntax elements of the macroblock before and after encryption, respectively. S1 is any of the pseudo-random sequences generated by the model. This method effectively destroys the visual structure of the video frame. It does not affect the video bit rate, since the length of the rem_intra4x4_pred_mode syntax element is unchanged by the XOR operation.
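A minimal sketch of Eq. (11) in Python is given below. It assumes that each 3-bit rem_intra4x4_pred_mode value is XORed with the three least significant bits of one key stream word; how the scheme actually draws bits from S1 is not specified here, and the function name `encrypt_rem_modes` is hypothetical.

```python
def encrypt_rem_modes(rem_modes, key_words):
    """XOR each 3-bit rem_intra4x4_pred_mode with 3 key bits (sketch of Eq. (11)).

    Applying the function twice with the same key words restores the input,
    so the same routine serves for decryption.
    """
    if len(key_words) < len(rem_modes):
        raise ValueError("key stream is shorter than the list of modes")
    encrypted = []
    for mode, word in zip(rem_modes, key_words):
        if not 0 <= mode <= 7:
            raise ValueError("rem_intra4x4_pred_mode is a 3-bit value")
        encrypted.append(mode ^ (word & 0b111))   # result is still 3 bits wide
    return encrypted
```

Since the output is always a value between 0 and 7, the field keeps its fixed 3-bit length, which is why this step leaves the bit rate unchanged.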

3.3.2 MVD symbolic encryption

The motion vector is an essential syntax element for inter prediction. Encrypting the sign bits of the MVD corrupts the motion estimation and motion compensation of the video, and thus the reconstruction of inter-predicted images. It effectively prevents attackers from recovering the P-frames and B-frames, and because it does not affect the code stream structure, the MVD sign is a suitable encryption target. The encryption is as follows.

$$MVD\_sign_{new}=MVD\_sign_{org}\oplus S_2$$
(12)

MVD_sign_org and MVD_sign_new are the sign bits of the motion vector difference before and after encryption, respectively, and S2 is any of the pseudo-random sequences generated by the model.
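The effect of Eq. (12) on the code length can be checked with a small Python sketch. It assumes that MVD components are coded as signed Exp-Golomb values, where a positive value v maps to code number 2v − 1 and a non-positive value maps to −2v, and that one key bit is consumed per component. The helper names are hypothetical.

```python
def se_code_num(v: int) -> int:
    """Code number of a signed Exp-Golomb value (positive -> 2v-1, else -2v)."""
    return 2 * v - 1 if v > 0 else -2 * v


def ue_length(code_num: int) -> int:
    """Bit length of the Exp-Golomb code word for a code number."""
    m = (code_num + 1).bit_length() - 1    # floor(log2(code_num + 1))
    return 2 * m + 1


def encrypt_mvd_signs(mvds, key_bits):
    """Flip the sign of each MVD component whose key bit is 1 (sketch of Eq. (12))."""
    if len(key_bits) < len(mvds):
        raise ValueError("key stream is shorter than the MVD list")
    return [-v if bit & 1 else v for v, bit in zip(mvds, key_bits)]
```

For example, `encrypt_mvd_signs([3, -2, 0, 5], [1, 1, 1, 0])` gives `[-3, 2, 0, 5]`, and every component keeps the same code word length because v and −v map to adjacent code numbers with the same prefix length.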

3.3.3 CAVLC encryption

In H.264, Context-Adaptive Variable-Length Coding (CAVLC) dynamically selects the code table used in coding according to the syntax elements already encoded. It also updates the length of the trailing coefficient suffix during coding to obtain a very high compression ratio, which makes it an important stage of the coding process. CAVLC is used in the H.264 standard to encode the luminance and chrominance residual data of 4 × 4 blocks with the following encoding process.

  1) Encode the total number of nonzero coefficients (Total_Coeffs) and the number of trailing coefficients (Trailing_Ones).

  2) Encode the sign of each trailing coefficient (Trailing_ones_sign_flag).

  3) Encode the amplitudes of the nonzero coefficients (Level_prefix and Level_suffix) other than the trailing coefficients.

  4) Encode the number of zeros before the final nonzero coefficient (TotalZeros).

  5) Encode the number of zeros before each nonzero coefficient (Run_Before).

Encryption of syntax elements in the CAVLC encoding stage does not change the compression efficiency of the video. During the encoding process, the high-frequency part of the residual information of the luminance and chrominance macroblocks is represented by the Trailing_ones_sign_flag [33]. Therefore, encryption of Trailing_ones_sign_flag can directly affect the texture information of the video. However, it is not secure to encrypt only the Trailing_ones_sign_flag. Moreover, encrypting the magnitudes of non-zero coefficients (Levels) can destroy the correlation between adjacent macroblocks. Therefore, we choose to encrypt the above two syntax elements in the CAVLC stage so that the attacker cannot get the correct information from the ciphertext. More importantly, Trailing_ones_sign_flag and Levels are used to maintain the normality of the bitstream during the encoding process, so encrypting the above two syntax elements will not break the format specification of H.264/AVC encoding. The specific encryption method is as follows:

$$T{1}_{new}=T{1}_{org}\oplus {S}_3$$
(13)
$${Levels}_{new}=Levels_{org}\oplus S_4$$
(14)

Eq. (13) and Eq. (14) give the encryption of T1 and Levels, respectively, where T1org and T1new are the signs of the trailing coefficients before and after encryption, Levelsorg and Levelsnew are the amplitudes of the nonzero coefficients before and after encryption, and S3 and S4 are any of the pseudo-random sequences generated by the model.

3.3.4 Exp-Golomb coding encryption

The standard structure of an Exp-Golomb code word is [M zeros] 1 [INFO], which consists of three parts: [M zeros] is a prefix of M zero bits, the middle "1" is the marker bit, and INFO is the information field, whose length is also M bits. The structure of the code word is shown in Table 2.

Table 2 Structure of Exp-Golomb code words

As can be seen from Table 2, the following two equations are available for the input data Code_Num.

$$M=floor\left(\log_2\left(Code\_Num+1\right)\right)$$
(15)
$$INFO=Code\_Num+1-2^M$$
(16)

According to the above two formulas, the information bit length M and the information bits INFO of the bit string can be calculated, respectively. Here floor() denotes rounding down to the nearest integer. The M information bits are then encrypted with the key as shown below.

$$INFO_{new}=INFO_{org}\oplus S_5$$
(17)

In Eq. (17), INFOorg and INFOnew are the encoded information bits before and after encryption, and S5 is any one pseudo-random sequence generated by the model. For INFO encryption, the encryption process is shown in Fig. 5.

Fig. 5 Exp-Golomb encoding encryption process
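The relations for M and INFO and the XOR of Eq. (17) can be illustrated with the following Python sketch. It assumes that INFO equals Code_Num + 1 − 2^M (the standard Exp-Golomb relation) and that the M least significant bits of one key stream word serve as the mask; the function names are hypothetical.

```python
def exp_golomb_parts(code_num: int):
    """Return (M, INFO) for a non-negative code number, following Eqs. (15)-(16)."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)
    return m, info


def codeword(m: int, info: int) -> str:
    """Assemble the bit string [M zeros] 1 [INFO]."""
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def encrypt_codeword(code_num: int, key_word: int) -> str:
    """XOR the INFO field with M key bits (sketch of Eq. (17))."""
    m, info = exp_golomb_parts(code_num)
    mask = key_word & ((1 << m) - 1)         # keep exactly M key bits
    return codeword(m, info ^ mask)
```

For Code_Num = 3 the plain code word is `00100`; with key bits `11` the encrypted code word is `00111`. Prefix and marker bit are untouched, so the decoder still parses a valid code word of the same length.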

4 Experimental results and performance analysis

The software test platform is JM 8.6. Eight reference video sequences with QCIF (176 × 144) resolution are selected to evaluate the proposed selective encryption scheme. These videos contain different scenes, subjects, textures, colors (grayscale), resolutions, etc., which allows the advantages and disadvantages of the encryption algorithm to be tested from multiple perspectives. The video sequences are coded with the structure IPPP..., and only the first 40 frames of each video sequence are selected for testing. The quantization parameter QP is 28 by default, the entropy coding mode is CAVLC, and the YUV sampling format of the video is 4:2:0.

4.1 Video encryption performance analysis

4.1.1 Analysis of the effect of subjective encryption

The subjective test method of video encryption effectiveness [38] is measured by scoring the video sequence with the observer's naked eye. Frame 10 of Foreman, frame 15 of Football, and frame 20 of City were selected as test images. Among them, the Foreman test sequence has a simple texture structure with local solid motion properties in the foreground and background; Football has fast-moving objects, and City has extremely complex texture information. Fig. 6 a, b and c show the original unencrypted images, Fig. 6 d shows the test results after the encryption scheme in this paper encrypts the video sequences, and Fig. 6 e shows the test results after the video sequences are encrypted and then decrypted. From the comparison graphs of the encryption results of these three video sequences, it can be seen that all the test video sequences become blurred after encryption, the pictures are distorted, and the viewers cannot understand the content of the encrypted images. Therefore, the encrypted images satisfy the subjective test security requirement of the video encryption effect.

Fig. 6 Comparison of subjective effects

4.1.2 PSNR comparative analysis

The peak signal-to-noise ratio (PSNR) [44] is widely used to evaluate video quality objectively and is defined as

$$PSNR=10\log_{10}\frac{\left(2^n-1\right)^2}{MSE}$$
(18)
$$MSE=\frac{\sum\limits_{i=1}^N\sum\limits_{j=1}^M{\left({I}_0\left(i,j\right)-{I}_r\left(i,j\right)\right)}^2}{N\cdot M}$$
(19)

where MSE is the mean square error between the pixel values of plaintext frames and ciphertext frames, which is used to measure the quality of the video after encryption. n is the number of pixel bits of the video frame. In Eq. (19), I0(i,  j) and Ir(i, j) are the video frames at (i,  j) before and after encryption. Moreover, N and M refer to the width and height of the video frames, respectively. The image size is N·M.
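Eq. (18) and Eq. (19) translate directly into code. The sketch below works on frames stored as nested lists of pixel values; the function names `mse` and `psnr`, the default of 8 bits per pixel, and the convention of returning infinity for identical frames are choices made for this example.

```python
import math


def mse(frame_a, frame_b) -> float:
    """Mean square error between two equally sized frames, Eq. (19)."""
    rows, cols = len(frame_a), len(frame_a[0])
    total = sum(
        (frame_a[i][j] - frame_b[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    return total / (rows * cols)


def psnr(frame_a, frame_b, bits: int = 8) -> float:
    """Peak signal-to-noise ratio in dB, Eq. (18); infinite for identical frames."""
    error = mse(frame_a, frame_b)
    if error == 0:
        return math.inf
    peak = (1 << bits) - 1               # 2^n - 1, e.g. 255 for 8-bit pixels
    return 10 * math.log10(peak * peak / error)
```

A larger MSE between the plaintext and ciphertext frames gives a lower PSNR, which is the desired direction for an encryption scheme.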

A video is generally considered visually well concealed when its PSNR is below 15 dB. Table 3 compares the PSNR values of the original images of eight different video sequences with the PSNR values of the Y component of the images encrypted using the encryption algorithm in this paper. Fig. 7 shows the comparative line graph of PSNR values [16]. The results show that the average PSNR using the encryption method in this paper is 11.67 dB, which is far below that of the original video and lower than that of the two encryption methods proposed in [18] and [29]. This indicates that the proposed encryption method is effective: the encrypted video is heavily distorted and well protected.

Table 3 PSNR experimental results
Fig. 7 PSNR results

4.1.3 SSIM analysis

The structural similarity index (SSIM) [28] is a measure of the structural similarity of an image, and its value can be used to measure video quality [32]. SSIM takes values in the range [0, 1] and approaches 1 when the two images are almost identical. The SSIM is calculated as follows:

$$SSIM\left(x,y\right)=\frac{\left(2\mu_x\mu_y+c_1\right)\left(2\sigma_{xy}+c_2\right)}{\left(\mu_x^2+\mu_y^2+c_1\right)\left(\sigma_x^2+\sigma_y^2+c_2\right)}$$
(20)

where x and y are the original video image and the encrypted video image, respectively; μx is the mean value of x; μy is the mean value of y; \({\sigma}_x^2\) is the variance of x; \({\sigma}_y^2\) is the variance of y; σxy is the covariance of x and y; c1 and c2 are two constants used to maintain stability.
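Eq. (20) can be sketched in Python as a single-window (global) computation over two flat lists of pixel values. The constants c1 = (0.01 · 255)^2 and c2 = (0.03 · 255)^2 are the commonly used defaults and are an assumption here, since the values used in the experiments are not stated; practical SSIM implementations usually average the index over local windows instead of computing it once per frame. The name `ssim_global` is hypothetical.

```python
def ssim_global(x, y, bits: int = 8) -> float:
    """Global SSIM of two equally long pixel lists, following Eq. (20)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    peak = (1 << bits) - 1
    c1 = (0.01 * peak) ** 2              # assumed stabilising constants
    c2 = (0.03 * peak) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) * (v - mu_x) for v in x) / n
    var_y = sum((v - mu_y) * (v - mu_y) for v in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Comparing an image with itself yields a value of 1, and any structural change between the two inputs lowers the index.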

Table 4 and Fig. 8 give the SSIM values of the eight test video sequences and compare them with Li's methods. The data in the line chart show that the SSIM values of the videos before encryption are all greater than 0.9, and the SSIM after encryption is close to 0.2. The encrypted videos are far from the original videos in terms of subjective perception, indicating that the video images have been severely distorted and confused.

Table 4 SSIM experimental results
Fig. 8 SSIM results

4.1.4 Compression ratio analysis

The encryption scheme in this paper selects the IPM, T1, Levels, MVD, and INFO for selective encryption. The encryption process uses only a simple XOR operation, which improves encryption speed and does not affect the video coding format or code length. Thus, the encryption method in this paper maintains the bit stream length after encryption and does not affect the compression ratio of the video.

4.1.5 Time costs analysis

In the process of encrypting video, the proportion of encrypted data, the complexity of the encryption method, and the rate at which the sequence generator generates keys all influence the efficiency of real-time video transmission. Therefore, when designing encryption methods, every effort should be made to avoid increasing the encoding time. Our encryption scheme reduces the time overhead that encryption adds to video encoding by encrypting only a small amount of data with a simple XOR operation and by using the integer dynamic chaos model as the pseudo-random sequence generator, while preserving the effectiveness of video encryption and video format compatibility. The encoding time consumption of the eight tested sequences is shown in Table 5 and Fig. 9.

Table 5 Time consumption analysis
Fig. 9 Time consumption

As shown in Table 5 and Fig. 9, the coding time consumption varies considerably from sequence to sequence. However, the coding time with and without encryption does not differ significantly, with an average increase of 1.171%. This variation is caused by the different compositions of macroblock prediction modes in the frames of particular sequences. The number and length of the model-generated pseudo-random sequences significantly affect the system's security and timeliness. Because the time to generate the pseudo-random sequences is fixed, its share of the total encoding time decreases as the number of encoded frames increases.

4.2 Video encryption security analysis

4.2.1 Analysis of key stream security

(1) Information Entropy Analysis

Information theory provides the notion of information entropy [2], which reflects the extent of disorder in a sequence: the higher the information entropy, the more chaotic the sequence. Suppose the random variable X takes the values {x1, x2, ..., xn} with probabilities pi = p(xi) (i = 1, 2, ..., n). The corresponding information entropy of X is then calculated by Eq. (21).

$$H(X)=-\sum\limits_{i=1}^np_i\;\log_2\;p_i$$
(21)

Taking 2^8 as the length of the system, the information entropy is maximal when pi = 1/2^8. In this case, H(X) = 8 is the theoretical maximum of the sequence information entropy. The information entropy of the dynamic integer logistic mapping model and the IDCCTM model with different numbers of lattice points is calculated and shown in Table 6. The information entropy of both models is close to the theoretical maximum for the same number of lattice points, which indicates that both models produce highly disordered sequences. The IDCCTM model has a slightly larger information entropy than the integer dynamic logistic mapping model, so it exhibits a better degree of chaos, and the pseudo-random sequence flow it generates has good randomness.
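As an illustration of Eq. (21), the following minimal sketch estimates the information entropy of an 8-bit sequence from its empirical symbol frequencies. The function name and the two test sequences are assumptions chosen for illustration; they are not the sequences generated by the models compared in Table 6.

```python
import math
from collections import Counter


def shannon_entropy(seq) -> float:
    """Empirical Shannon entropy in bits: H = -sum(p_i * log2(p_i))."""
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Every 8-bit value appears equally often: entropy reaches the maximum of 8.
uniform = list(range(256)) * 4
# A constant sequence carries no uncertainty: entropy is 0.
constant = [7] * 1024

h_uniform = shannon_entropy(uniform)
h_constant = shannon_entropy(constant)
```

A keystream whose measured entropy is close to 8 bits per byte is close to the uniform case, which is the sense in which the values in Table 6 are compared against the theoretical maximum.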

Table 6 Contrast of the information entropy
(2) Key Space Analysis

The key space comprises every possible key value. The larger the key space, the greater its resistance to exhaustive attacks. It is currently accepted in the cryptographic community that a key space smaller than 2^100 ≈ 10^30 [10] is insecure. In the encryption algorithm designed in this paper, {x1(1), x1(2), ..., x1(30), ki} are used as keys, for a total of 31 keys. Each key is 32 bits long, so the key space is 2^992. Table 7 compares this key space with those reported in [5, 41, 45]. Our method has the largest key space, indicating that this paper's encryption method can resist ciphertext-only attacks [26].
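The key-space arithmetic can be checked with a short sketch. It assumes 31 key components of 32 bits each, which is the component size that yields a 992-bit key; the function name and constants are illustrative only.

```python
import math


def key_space_bits(num_keys: int, bits_per_key: int) -> int:
    """Total key length in bits when independent keys are concatenated."""
    return num_keys * bits_per_key


SECURITY_THRESHOLD_BITS = 100          # key spaces below 2^100 are considered insecure

# Assumption: 31 key components of 32 bits each.
total_bits = key_space_bits(31, 32)

# 2^100 expressed as a power of ten: 100 * log10(2) is roughly 30.1,
# so 2^100 is on the order of 10^30.
threshold_decimal_exponent = SECURITY_THRESHOLD_BITS * math.log10(2)

exceeds_threshold = total_bits > SECURITY_THRESHOLD_BITS
```

Under this assumption the total key length is far above the 100-bit threshold, so an exhaustive search over the key space is impractical.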

Table 7 Key Space Comparison

4.2.2 Key sensitivity analysis

Key sensitivity means that the encryption result changes substantially even when the key changes only slightly. It is one measure of the security of a video encryption algorithm and is mainly used to analyze the algorithm's ability to resist exhaustive attacks. Suppose the original key S is used to encrypt the video test sequence. S is then changed by 1 bit, and the changed key S' is used to decrypt the video stream.

The experiment uses a Soccer video test sequence with resolution 176 × 144 and the keys S = 3214567890 and S' = 3214567892; the results are shown in Fig. 10. Fig. 10a shows the Bus ciphertext, Fig. 10b shows the image obtained by decrypting the ciphertext with the correct key, and Fig. 10c shows the image obtained by decrypting the ciphertext with the changed key S'. When decryption is performed using key S', the decrypted video is still blurred and unintelligible. This shows that our encryption algorithm is highly sensitive to key changes and resists exhaustive attacks.

Fig. 10 Key sensitivity analysis

4.2.3 Replacement attacks

The replacement attack is a ciphertext-only attack, performed by replacing the encrypted data bits with constant bits and then observing whether the encrypted video can be decrypted correctly. This experiment tests the replacement attack on the encrypted Bus sequence. All encrypted bits are replaced with 0 and with 1, respectively, and decryption of the resulting video is tested. The results are shown in Fig. 11. The plaintext is not leaked in either case.

Fig. 11 Replacement attacks analysis

The frames under the replacement attack remain blurry and unintelligible, demonstrating that our encryption scheme can effectively resist the replacement attack.

4.2.4 Edge detection protection analysis

Edge detection can quantitatively analyze the effect of video encryption. First, the color image of the ciphertext to be examined is converted into a grayscale image. Then, the edge information in the image is calculated with the selected detection algorithm to find the coordinates where the gray value changes significantly. An edge is the boundary of a region in the image across which the gray value changes significantly. If the encryption scheme does not sufficiently mask the plaintext information, edge detection may recover edge information from the ciphertext, exposing the contour and texture features of the plaintext.

We use the Laplacian of Gaussian (LoG) operator for edge detection of images and calculate the edge difference ratio (EDR) according to the method in [39] to quantitatively analyze the resistance of video encryption algorithms to edge detection attacks. The EDR is calculated as shown in Eq. (22):

$$EDR=\frac{\sum_{i,j=1}^{N}\left|P(i,j)-P^{\prime}(i,j)\right|}{\sum_{i,j=1}^{N}\left|P(i,j)+P^{\prime}(i,j)\right|}$$
(22)

where P(i, j) and P'(i, j) are the edge detection pixel values of the plaintext and ciphertext, respectively. The maximum value of EDR is 1. An EDR value close to 1 indicates that the edge information of the plaintext and ciphertext videos differs widely and that the edges of the ciphertext video are blurred. This paper calculates the EDR values of eight test video sequences. The experimental data in Table 8 show that the average EDR value of the eight test videos is about 0.8802, which indicates that the edge information of the plaintext and ciphertext videos is vastly different.
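The EDR of Eq. (22) can be sketched as follows. The sketch assumes that the edge maps of the plaintext and ciphertext frames have already been computed (for example with a Laplacian of Gaussian filter) and are given as equally sized nested lists of non-negative values. The function name, the tiny 2 × 2 maps, and the convention of returning 0 when both maps are empty of edges are illustrative assumptions.

```python
def edge_difference_ratio(plain_edges, cipher_edges) -> float:
    """EDR = sum|P - P'| / sum|P + P'| over all pixel positions."""
    if len(plain_edges) != len(cipher_edges):
        raise ValueError("edge maps must have the same number of rows")
    numerator = 0.0
    denominator = 0.0
    for row_p, row_c in zip(plain_edges, cipher_edges):
        if len(row_p) != len(row_c):
            raise ValueError("edge maps must have the same number of columns")
        for p, c in zip(row_p, row_c):
            numerator += abs(p - c)
            denominator += abs(p + c)
    # Assumed convention: two maps without any edges are treated as identical.
    return numerator / denominator if denominator else 0.0


# Hypothetical binary edge maps (1 = edge pixel, 0 = no edge).
plain = [[1, 0], [0, 1]]
disjoint_cipher = [[0, 1], [1, 0]]   # no edge pixel in common with `plain`

edr_disjoint = edge_difference_ratio(plain, disjoint_cipher)
edr_identical = edge_difference_ratio(plain, plain)
```

When the two edge maps share no edge pixels the ratio reaches its maximum of 1, and when they coincide it is 0, which is why values near 1 indicate that the ciphertext edges reveal little about the plaintext edges.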

Table 8 Numerical analysis of EDR

Meanwhile, Fig. 12 visualizes the plaintext and ciphertext edge information of the Soccer test video. It shows that the edges of the ciphertext video are disordered and blurred and cannot be correlated with the basic contour and texture features of the plaintext image. Therefore, our video encryption scheme can effectively hide the edge information of the encrypted video.

Fig. 12 EDR analysis

4.2.5 Histogram analysis

The histogram reflects the distribution of pixel values of an image [47]. For a good encryption algorithm, the histograms of the original and encrypted video frames should differ in order to resist differential and statistical attacks. The histograms of the original and encrypted frames of the Bus test sequence are shown in Figs. 13 and 14. The histograms of the original and encrypted frames are entirely different and provide no clues that differential or statistical attacks could exploit. The experimental results show that the method performs well against histogram attacks.

Fig. 13 Bus plaintext histogram

Fig. 14 Bus ciphertext histogram
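The histogram comparison can be sketched in a few lines. The normalized L1 distance used here is an illustrative measure introduced only for this sketch, not a metric reported in this paper, and the pixel lists are hypothetical stand-ins for the luminance samples of a plaintext and a ciphertext frame.

```python
from collections import Counter


def histogram(pixels, levels: int = 256):
    """Count how many pixels take each gray level 0..levels-1."""
    counts = Counter(pixels)
    return [counts.get(level, 0) for level in range(levels)]


def histogram_distance(h1, h2) -> float:
    """Normalized L1 distance: 0 for identical histograms, 1 for disjoint ones."""
    total = sum(h1) + sum(h2)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(h1, h2)) / total


# Hypothetical frames: the plaintext concentrates on two gray levels,
# while the ciphertext is spread over many levels.
plain_pixels = [10] * 50 + [200] * 50
cipher_pixels = list(range(100))

h_plain = histogram(plain_pixels)
h_cipher = histogram(cipher_pixels)
distance = histogram_distance(h_plain, h_cipher)
```

A distance close to 1 means the two histograms share almost no mass, which corresponds to the visual observation that the plaintext and ciphertext histograms are entirely different.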

4.2.6 Comparative analysis

In order to evaluate the proposed encryption scheme, we compare this scheme with four other recent video encryption schemes. The four other encryption schemes use different algorithms to encrypt different parameters. The comparison is focused on encrypted syntactic elements, format compatibility, calculation complexity, and bit increase. The comparison results are summarized in Table 9.

Table 9 Comparative analysis

Table 9 shows that some schemes incur code rate increases, and the key length is another significant factor affecting the key space. Obviously, our scheme has advantages in terms of code stream increase and key length. In particular, the encryption algorithm proposed in this paper does not change the bit rate, has low computational complexity, and conforms to the format specification of video coding. The pseudo-random sequence generated by the IDCCTML model has the advantages of ample key space and high randomness, which can resist brute force attacks with high security.

5 Conclusion

This paper proposes an efficient video encryption algorithm for the H.264/AVC video codec. The algorithm is divided into two parts: generation of pseudo-random sequences and selective encryption of syntax elements. Considering the compatibility of coding formats and the efficiency of codecs, we generate pseudo-random sequences using the IDCCTML model and select IPM, T1, Levels, MVD, and INFO as the encryption domains. As the experiments show, the advantages of this selective encryption algorithm include ample key space, resistance to exhaustive attacks, good key sensitivity, and no effect on the video code rate. The scheme performs well in subjective analysis, objective analysis, and security analysis.

The encryption scheme proposed in this paper can hide all the information in the video. However, some application scenarios require encrypting only certain information or specific video regions. Therefore, subsequent work will combine the scheme with video target tracking technology [15] to design intelligent encryption algorithms that protect specific content. Video encryption can prevent unauthorized users from accessing clear video content, but once the video is decrypted, it may face problems such as piracy, content tampering, and illegal copying. Therefore, designing a protection mechanism that combines video encryption and watermarking is also a focus of future research.