1 Introduction

Although the Internet is a well-known venue for users to access desired data, it has also opened a new gateway for attackers to obtain other users' valuable and intellectual information with little effort [11]. Steganography serves a complementary role by providing a protection mechanism that hides ongoing communication between an authorized transmitter and its recipient from eavesdroppers [23]. Steganography is defined as the art of concealing secret information in specific carrier data, establishing covert communication channels between legitimate parties [105]. Consequently, a stego object (steganogram) should appear identical to the original data and share its statistical features. Carrier data is also referred to as cover or host data [61, 84]. Carriers can take various forms such as text, audio, image, and video. A hidden message can likewise take any form of data such as text, audio, image, and video [24, 59]. The primary objective of steganography is to eliminate any suspicion that hidden messages are being transmitted and to provide security and anonymity for legitimate parties. A simple way to assess the visual quality of a steganogram is to inspect it with the human visual system (HVS). The HVS cannot identify slight distortions in a steganogram, so such distortions do not raise suspicion [97]. However, if the hidden message is large relative to the size of the carrier object, then the steganogram's degradation will be visible to the human eye, resulting in a failed steganographic method [34]. Figure 1 represents the general model of a steganographic method.

Fig. 1 General diagram of the steganography method

Embedding efficiency, hiding capacity, and robustness are the three major requirements of any successful steganographic method [19, 26]. First, embedding efficiency can be determined by answering the following questions [68, 83]: 1) How safely does the steganographic method conceal the hidden information inside the carrier object? 2) How well is the quality of the steganograms preserved after the hiding procedure? and 3) Is the secret message undetectable in the steganogram? In other words, a steganographic method is highly efficient if it provides encryption, imperceptibility, and undetectability. A highly efficient algorithm applies encoding and encryption techniques to the covert information prior to the embedding process in order to enhance the security of the algorithm [22, 90].

Steganograms with a low alteration rate and high quality do not draw the hacker's attention and thus avoid any suspicion that covert information is being sent. The more effective the steganography method, the more challenging it is for steganalytical detectors to detect the hidden message [21, 89].

The hiding capacity is the second fundamental requirement; it allows a steganography method to increase the size of the hidden message while taking the visual quality of the steganograms into account. The hiding capacity is the amount of covert message data that can be inserted inside the carrier object. In ordinary steganographic methods, hiding capacity and embedding efficiency conflict with each other [13, 90]: if the hiding capacity is increased, the quality of the steganograms diminishes, which decreases the algorithm's efficiency. The embedding efficiency of a steganographic method is therefore directly affected by its embedding payload. To increase the hiding capacity with the minimum alteration rate of the carrier object, many steganographic methods have been presented using different strategies. These methods utilize linear block codes and matrix encoding principles, including Bose, Chaudhuri, and Hocquenghem (BCH) codes, Hamming codes, cyclic codes, Reed-Solomon codes, and Reed-Muller codes [20, 113].

Robustness is the third requirement; it measures the steganographic method's strength against attacks and signal processing operations [42]. These operations include geometrical transformation, compression, cropping, and filtering. A steganographic method is robust when the recipient obtains the hidden information accurately, without any flaws. Highly efficient steganography methods withstand both adaptive noise and signal processing operations [69, 110]. Recently, a large number of video steganography techniques have been proposed in the literature. Unfortunately, the literature lacks survey articles on video steganography. This motivates an extensive study that covers the video steganography techniques of the past decade. Unlike other work, this paper provides a comprehensive survey and analysis of the state-of-the-art video steganography methods in both compressed and raw domains. In addition, this survey not only investigates the existing video steganography techniques but also provides recommendations and future directions to enhance those methods. The remaining parts of the paper are organized as follows: Section 2 explains steganography versus cryptography and watermarking. A comprehensive study and analysis of the state-of-the-art video steganography methods in compressed and raw domains is given in Section 3. Section 4 presents some well-known performance assessment metrics. Section 5 summarizes the key findings of this survey, offers recommendations to improve the existing methods, and suggests future research directions.

2 Steganography versus cryptography and watermarking

The common objective of both steganography and cryptography is to provide confidentiality and protection of data. Steganography ("protected writing") establishes a covert communication channel between legitimate parties, while cryptography ("secret writing") creates an overt communication channel [1]. In cryptography, the presence of the secret data is recognizable; however, its content is unintelligible to illegitimate parties. To add further levels of security, steganography and cryptography can operate together in one system [14, 63].

Digital watermarking techniques use a preservation mechanism to protect the copyright ownership information from unauthorized users. This process is accomplished by concealing the watermark information into overt carrier data [40]. Like steganography, watermarking can be used in many different applications such as content authentication, digital fingerprints, broadcast monitoring, copyright protection, and intellectual property protection [28, 29, 40, 49, 87]. Different watermarking techniques can be found in the literature [4, 8, 33, 41, 48, 50, 51, 85, 88]. Table 1 shows the general similarities and differences between steganography, cryptography, and watermarking techniques.

Table 1 Comparison of steganography, cryptography, and watermarking techniques

3 Video steganography techniques

Due to the advancement of Internet and multimedia technologies, digital videos have become a popular choice for data hiding. Video data contains a massive amount of redundancy that can be utilized for embedding secret data. Video steganography techniques have recently found many useful applications, such as video error correction [47, 70, 71, 86, 109], military services [81], bandwidth saving [67, 96], video surveillance [62, 72, 112], and medical video security [73, 74, 76]. Video steganography techniques are classified into compressed and uncompressed domains. Figure 2 clarifies the hierarchy of overall system security, including video steganography, which is the main focus of this survey.

Fig. 2 Disciplines of overall system protection. The red color indicates the focus of this study

3.1 Video steganography techniques in compressed domain

The H.264 standard offers higher video compression efficiency than previous standards. Some new features of the H.264 video codec include flexible macroblock ordering, quarter-pixel interpolation, intra prediction in intra frames, deblocking filtering as post-processing, and multiple reference frame capability [52, 94, 106, 108]. Usually, an H.264 video stream comprises a number of groups of pictures (GOPs). Every GOP includes three types of frames: intra (I) frames, predicted (P) frames, and bidirectional (B) frames. During the video compression process, the motion estimation and compensation processes minimize the temporal redundancy. Since the video stream is a sequence of correlated still images, a frame can be predicted from one or more reference frames based on the motion estimation and compensation techniques. First, frames are divided into 16 × 16 macroblocks (MBs), each of which contains blocks that may be as small as 4 × 4. Using a search algorithm, block C in the current frame is compared, one at a time, with candidate blocks \( \overset{\sim }{R} \) in the reference frame \( \overset{\sim }{F} \) in order to find the block that best matches C. The prediction error between two blocks (C and \( \overset{\sim }{R} \)) of size b can be measured using the sum of absolute differences (SAD).

$$ e=\mathrm{SAD}\left(C,\tilde{R}\right)=\sum_{1\le i,j\le b}\left|{c}_{i,j}-{\tilde{r}}_{i,j}\right| $$
(1)

where \( {c}_{i,j} \) and \( {\overset{\sim }{r}}_{i,j} \) refer to the values of blocks C and \( \overset{\sim }{R} \), respectively. The best-matched block is the one with the minimum SAD; it serves as the prediction of C and is denoted by \( \overset{\sim }{P} \). The motion vector (MV) and the differential error \( D=C-\overset{\sim }{P} \) are required for the coding process. Video steganography techniques in the compressed domain are categorized according to the video coding stage used as a venue for data hiding: intra frame prediction, inter frame prediction, motion vectors, transformed and quantized coefficients, and entropy coding. Figure 3 illustrates the H.264 video codec standard, indicating some venues for information hiding.

Fig. 3 H.264 hybrid video codec standard shows venues for data hiding
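The block-matching search that minimizes the SAD of Eq. 1 can be sketched as follows. This is a minimal illustration rather than the search used by any particular encoder: it assumes an exhaustive (full) search over a small window, and the function names, the block size `b=4`, and the search `radius` are hypothetical choices made for the example.

```python
import numpy as np

def sad(c, r):
    """Sum of absolute differences between two equally sized blocks (Eq. 1)."""
    # Cast to a signed type so that uint8 subtraction cannot wrap around.
    return int(np.abs(c.astype(np.int64) - r.astype(np.int64)).sum())

def full_search(cur, ref, top, left, b=4, radius=2):
    """Exhaustive block matching (an assumed search strategy).

    Returns (motion_vector, min_sad) for the b x b block of `cur`
    whose top-left corner is at (top, left).
    """
    c = cur[top:top + b, left:left + b]
    best_mv, best_err = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + b > ref.shape[0] or x + b > ref.shape[1]:
                continue
            err = sad(c, ref[y:y + b, x:x + b])
            if best_err is None or err < best_err:
                best_mv, best_err = (dy, dx), err
    return best_mv, best_err

# Toy example: the current frame is the reference shifted down 1 and right 2,
# so the best match for an interior block lies at displacement (-1, -2).
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
cur = np.roll(ref, shift=(1, 2), axis=(0, 1))
mv, err = full_search(cur, ref, top=6, left=6)
```

The returned displacement plays the role of the motion vector, and the residual between the current block and its best match corresponds to the differential error D.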

3.1.1 Video steganography techniques in intra frame prediction

During the video compression process, the macroblocks are encoded using a number of intra prediction modes. The H.264 codec provides nine intra prediction modes for 4 × 4 blocks and four for 16 × 16 blocks, which are illustrated in Fig. 4 and Fig. 5, respectively. The high efficiency video coding (HEVC) codec supports up to 35 intra prediction modes for each of the 64 × 64, 32 × 32, 16 × 16, 8 × 8, and 4 × 4 block sizes, as shown in Fig. 6. For data concealing purposes, these modes can be mapped to one or more secret message bits. Liu et al. [53] presented a secure data hiding technique that operates entirely in the compressed domain. The framework of this algorithm consists of four stages. First, in the video sequence parser stage, the video sequences are coded and the discrete cosine transform (DCT) coefficients are obtained, along with the motion vectors and the intra coded macroblocks. In the second stage, scene detection is performed on consecutive intra frames to identify fluctuation scenes. A fluctuation scene is identified using the histogram variation of the DC coefficients among the intra frame DCT coefficients. In the third stage, the embedding process is carried out using only the intra frames of fluctuation scenes. The last stage is video steganalysis, in which the security level of the stego video is statistically measured to determine whether it is high or low. If the stego video does not pass the steganalysis, the scale factor is adjusted to strengthen it. The algorithm introduced by Liu et al. has limited capacity for hidden data because only the intra frames of fluctuation scenes are used for data embedding.

Fig. 4 H.264 intra prediction modes for 4 × 4 blocks

Fig. 5 H.264 intra prediction modes for 16 × 16 blocks

Fig. 6 The 35 HEVC intra prediction modes [111]

Chang et al. [9] presented a data concealing algorithm for HEVC that utilizes both DCT and discrete sine transform (DST) methods. In this scheme, HEVC intra frames are used to conceal the hidden message without propagating the distortion drift error to the adjacent blocks. Blocks of quantized DCT and DST coefficients are selected for embedding the secret data according to specific intra prediction modes. The combinations of modes of adjacent blocks produce three directional patterns of error propagation for data hiding: vertical, horizontal, and diagonal. Each of the error propagation patterns has a range of intra prediction modes that protects a group of pixels in a particular direction. The modes range from 0 to 34. Chang et al.'s algorithm has a limited embedding payload because the blocks selected for the embedding process must meet certain conditions. Similarly, both Hu et al. [31] and Zhu et al. [115] presented data hiding methods using intra prediction modes for H.264/AVC. During the intra frame coding process, the secret message is embedded into the 4 × 4 luminance blocks. These algorithms utilize the 4 × 4 intra prediction modes to hide one bit of secret information per block. The 4 × 4 intra prediction modes are divided into two subsets based on predefined mapping rules between the secret message and the intra prediction modes, so that selecting a mode from one subset or the other embeds a 0 or a 1. Table 2 illustrates the mapping rule of the 4 × 4 intra prediction modes in the Hu et al. method, which shows how each most probable mode and its candidate modes are mapped to 0 or 1. Both methods achieve a negligible degradation of video quality as well as a small increase in the bit rate. In general, the steganographic techniques that use intra frame prediction as a venue for data hiding have low capacities for embedding secret messages.

Table 2 Mapping rules of 4 × 4 intra prediction modes [31]
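The idea of splitting the nine 4 × 4 intra prediction modes into two subsets, one per bit value, can be sketched as follows. The partition used here (even modes carry 0, odd modes carry 1) is a hypothetical stand-in and is not the mapping of Table 2, which depends on the most probable mode of each block. The cost values and function names are likewise assumptions made only for illustration.

```python
def embed_bit_in_mode(mode_costs, bit):
    """Pick the cheapest 4x4 intra mode whose subset matches the secret bit.

    mode_costs: dict mapping a mode index (0-8) to an encoder cost.
    Assumed partition: even modes encode 0, odd modes encode 1.
    """
    candidates = {m: c for m, c in mode_costs.items() if m % 2 == bit}
    return min(candidates, key=candidates.get)

def extract_bit_from_mode(mode):
    """The decoder reads the bit back from the subset of the decoded mode."""
    return mode % 2

# Hypothetical costs for one block; a real encoder would compute these.
costs = {0: 120.0, 1: 95.0, 2: 101.0, 3: 140.0, 4: 133.0,
         5: 150.0, 6: 99.0, 7: 160.0, 8: 127.0}
chosen = [embed_bit_in_mode(costs, b) for b in (0, 1, 1, 0)]
bits = [extract_bit_from_mode(m) for m in chosen]
```

Because the encoder is restricted to one subset per block, it may have to pick a mode that is slightly worse than the unconstrained optimum, which is consistent with the small bit rate increase reported for these methods.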

3.1.2 Video steganography techniques in inter frame prediction

In many video steganography methods, the seven block sizes of H.264 inter frame prediction (16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, and 4 × 4) are commonly utilized as a venue to embed the secret message by mapping each block type to a number of secret bits. Kapotas et al. [36] proposed a data concealing algorithm for scene change detection in H.264 coding. This method uses four different block sizes, each of which is mapped to one pair of secret message bits. In this algorithm, the secret message consists of information about scene change frames, which is embedded into the encoded videos. This embedded information helps the scene change detection algorithm operate in real time on the H.264 video stream. However, data hiding methods based on inter frame prediction, like those based on intra frame prediction, have a very limited embedding capacity. For example, let "NY" be the secret information that must be embedded into the inter frame prediction blocks of the H.264 codec. The embedding goal can be achieved by using mapping rules for the different block sizes. Figure 7 illustrates the embedding process using mapping rules.

Fig. 7 Using mapping rules for prediction block type to conceal "NY" characters
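The mapping of block types to bit pairs can be sketched as follows for the "NY" example. The particular assignment of the four block sizes to the pairs 00, 01, 10, and 11 is an assumption made for illustration and may differ from the rule shown in Fig. 7. The sketch also only produces the sequence of block types that an encoder would be forced to choose; it does not perform any actual video coding.

```python
# Hypothetical assignment of four inter prediction block types to bit pairs.
BLOCK_TYPES = ["16x16", "16x8", "8x16", "8x8"]
PAIR_TO_BLOCK = {format(i, "02b"): t for i, t in enumerate(BLOCK_TYPES)}
BLOCK_TO_PAIR = {t: p for p, t in PAIR_TO_BLOCK.items()}

def text_to_block_types(text):
    """Turn ASCII text into the block types that carry it, two bits per block."""
    bits = "".join(format(b, "08b") for b in text.encode("ascii"))
    return [PAIR_TO_BLOCK[bits[i:i + 2]] for i in range(0, len(bits), 2)]

def block_types_to_text(types):
    """Recover the text from the sequence of decoded block types."""
    bits = "".join(BLOCK_TO_PAIR[t] for t in types)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")

stego_types = text_to_block_types("NY")    # 16 bits -> 8 block types
recovered = block_types_to_text(stego_types)
```

Each macroblock carries only two bits under this scheme, which illustrates why the embedding capacity of this family of methods is limited.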

3.1.3 Video steganography techniques in motion vectors

Motion vector characteristics such as horizontal and vertical components, amplitudes, and phase angles are utilized for embedding secret information. Xu et al. [107] proposed a steganography method for compressed video streams. In this scheme, the embedding process relies on I, P, and B frames. First, the hidden data is concealed in the motion vectors of both P and B frames. Only the motion vectors that have a high magnitude are chosen. Here, each macroblock has a motion vector; however, only the macroblocks that move rapidly are selected. Second, the control information is embedded into the I frames. This control information includes the capacity payload and segment range of each GOP. Each GOP contains one I frame, which carries the control information necessary for the data extraction phase. In addition, each GOP has a number of P and B frames that contain secret messages in their high-magnitude motion vectors. Xu et al.'s method has a low embedding payload because it uses only the motion vectors with a high magnitude. Pan et al. [78] presented a steganography method for the H.264 video standard based on motion vectors and linear block codes. The embedding process uses the motion vectors of inter frame macroblocks and then discards the surrounding macroblocks. Using a predefined threshold, a group of motion vectors is selected in each video inter frame. The values (0 or 1) of the selected motion vectors (\( M{V}_r \)) are obtained by calculating the phase angles (\( \varphi \)) illustrated in Fig. 8. By definition, the phase angle is the arctangent of the ratio of the vertical (\( M{V}_v \)) to the horizontal (\( M{V}_h \)) motion vector component, as given in Eq. 2.

$$ \varphi =\arctan \left(\frac{M{V}_v}{M{V}_h}\right),\quad {0}^{\circ}\le \varphi <{360}^{\circ } $$
(2)

Fig. 8 Motion vector representation in [78]
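The phase angle of Eq. 2 can be computed as in the sketch below. Because a plain arctangent only covers a 180° range, the sketch assumes the two-argument arctangent (`math.atan2`) to reach the full range from 0° to 360°. The selection helper further assumes that the predefined threshold is applied to the motion vector magnitude, which the description above does not state explicitly. Both function names are hypothetical.

```python
import math

def phase_angle_deg(mv_h, mv_v):
    """Phase angle of a motion vector in degrees, wrapped to [0, 360)."""
    return math.degrees(math.atan2(mv_v, mv_h)) % 360.0

def select_motion_vectors(mvs, threshold):
    """Keep the (mv_h, mv_v) pairs whose magnitude exceeds the threshold
    (assumed selection criterion)."""
    return [mv for mv in mvs if math.hypot(mv[0], mv[1]) > threshold]

mvs = [(1, 1), (0, -1), (-6, 0), (5, 12)]
selected = select_motion_vectors(mvs, threshold=4.0)
angles = [phase_angle_deg(h, v) for h, v in selected]
```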

Once the \( M{V}_r \) values are obtained, the hidden information is concealed in the motion vector array utilizing the linear block code principle. The linear block codes are used to minimize the motion vectors' alteration rate and to increase the embedding capacity. The results of this algorithm demonstrate that 4 bits of the secret data can be hidden in every 6 bits of the motion vector array. The peak signal-to-noise ratio (PSNR) of the obtained stego videos is 37.45 dB, which is attributed to the reduced alteration rate of the motion vectors. However, this method has a limited hiding capacity because it depends on the number of motion vectors. The data concealing and extracting phases of the Pan et al. method are given in Eqs. 3–6 as follows:

$$ SY=M{V}_r{H}^T $$
(3)
$$ b=SY\oplus S $$
(4)
$$ M{V}_r^w=M{V}_r\oplus {E}_b $$
(5)
$$ S'=M{V}_r{H}^T\oplus {E}_b{H}^T $$
(6)

where \( S \) and \( S' \) are the embedded and extracted messages, respectively; \( M{V}_r \) and \( M{V}_r^w \) are the original and stego selected motion vectors; and \( SY \), \( {E}_b \), and \( {H}^T \) are the syndrome, the coset leader of \( b \), and the transpose of the parity check matrix, respectively [78]. Comparatively, Bin et al. [7] presented a data concealing algorithm using motion vectors and matrix encoding. The naked eye can notice a change when an object moves slowly, whereas if the object moves rapidly, the change is unnoticeable. The motion vectors with large amplitudes are produced by the macroblocks that move quickly, and these sizable motion vectors are utilized for concealing the hidden message. The motion vectors selected for data embedding must satisfy two conditions: 1) the motion vector's amplitude must be greater than the predefined threshold T; and 2) the vertical and horizontal motion vector components must not be equal. Moreover, the best component (\( M{V}_w \)) of the vertical (\( M{V}_v \)) and horizontal (\( M{V}_h \)) motion vector components is chosen based on the phase angle (θ). Then, the secret message is hidden using matrix encoding, which reduces the modification rate of the selected motion vectors. The least significant bit (LSB) of the selected motion vector component (\( M{V}_{w\_LSB} \)) is utilized for embedding secret bits. The average PSNR of the stego videos is 38.18 dB [7]. However, this algorithm has a low embedding capacity because of the restrictive conditions on the selected motion vectors. The embedding stage of the algorithm introduced by Bin et al. can be carried out as follows:

$$ M{V}_w=\left\{\ \begin{array}{c}\hfill M{V}_h\kern0.5em ;0\le \uptheta <\uppi /4\hfill \\ {}\hfill M{V}_v\kern0.5em ;\uppi /4\le \uptheta <\uppi /2\ \hfill \end{array}\right. $$
(7)
$$ \uptheta = \arctan \left|M{V}_v/M{V}_h\right| $$
(8)
$$ M{V}_{w\_LSB}=\left\{\ \begin{array}{c}\hfill unchanged\kern0.5em ; if\ M{V}_w=0\ \hfill \\ {}\hfill 1\kern0.5em ; if\ M{V}_{w\_LSB}=0\ and\ M{V}_w\ne 0\hfill \\ {}\hfill 0\kern0.5em ; if\ M{V}_{w\_LSB}=1\ and\ M{V}_w\ne 0\hfill \end{array}\right. $$
(9)
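Both Pan et al. and Bin et al. rely on matrix (syndrome) encoding to reduce the number of modified motion vectors. The relations of Eqs. 3–6 can be sketched as follows for the case of 4 secret bits per 6 motion vector bits. The parity check matrix `H` below is a hypothetical 4 × 6 example chosen only so that every 4-bit syndrome is reachable; it is not the matrix used in [78], and the coset leaders are found by brute force.

```python
from itertools import product
import numpy as np

# Hypothetical 4 x 6 parity check matrix over GF(2) (not taken from [78]).
H = np.array([[1, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 1, 1],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 1, 1, 1]], dtype=np.uint8)

def syndrome(v):
    """Compute v * H^T over GF(2) and return it as a tuple of ints."""
    s = (np.asarray(v, dtype=np.uint8) @ H.T) % 2
    return tuple(int(x) for x in s)

# Coset leaders: for each syndrome, one minimum-weight 6-bit pattern.
# Patterns are visited in order of increasing weight, so the first hit wins.
COSET_LEADER = {}
for e in sorted(product((0, 1), repeat=6), key=sum):
    COSET_LEADER.setdefault(syndrome(e), e)

def embed(mv_bits, secret):
    sy = syndrome(mv_bits)                              # Eq. 3
    b = tuple(x ^ y for x, y in zip(sy, secret))        # Eq. 4
    e_b = COSET_LEADER[b]                               # coset leader of b
    return tuple(x ^ y for x, y in zip(mv_bits, e_b))   # Eq. 5

def extract(stego_bits):
    # Eq. 6: MV_r H^T xor E_b H^T equals the syndrome of the stego bits.
    return syndrome(stego_bits)

cover = (1, 0, 1, 1, 0, 0)
secret = (1, 0, 0, 1)
stego = embed(cover, secret)
recovered = extract(stego)
```

Choosing the minimum-weight pattern for each syndrome is what keeps the number of flipped motion vector bits small.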

In a different work, Jue et al. [35] designed an algorithm for H.264/AVC video steganography using motion vectors as cover data. In this scheme, the luminance macroblocks of inter frames (P and B) are used. Using a predefined threshold, the motion vectors with a large magnitude are selected, while the motion vectors of slow objects are discarded. Then, the hidden data bits are concealed in the difference between the horizontal and vertical components of the selected motion vectors. This algorithm improves the utilization ratio and the embedding efficiency. The modified motion vector feature (\( {\widehat{P}}_i \)) that carries the secret message can be calculated as follows:

$$ {\widehat{P}}_i=\begin{cases} \operatorname{mod}\left[\left|{V}_{dx}\right|-\left|{V}_{dy}\right|,2\right] & \text{if } {P}_i={S}_i \\ \operatorname{mod}\left[\left|{V}_{dx}+0.25\right|-\left|{V}_{dy}\right|,2\right] & \text{if } {P}_i\ne {S}_i \text{ and } \left|{V}_{dx}\right|-\left|{V}_{dy}\right|\ge 0 \\ \operatorname{mod}\left[\left|{V}_{dx}\right|-\left|{V}_{dy}+0.25\right|,2\right] & \text{if } {P}_i\ne {S}_i \text{ and } \left|{V}_{dx}\right|-\left|{V}_{dy}\right|<0 \end{cases} $$
(10)

\( {P}_i \) and \( {S}_i \) are the motion vector features and the secret message bits, respectively. \( {V}_{dx} \) and \( {V}_{dy} \) are the horizontal and vertical motion vector components, respectively. However, the embedding payload of Jue et al.'s scheme is limited by the high value of the predefined threshold. In general, steganographic techniques that utilize motion vectors as carrier objects to hide secret messages have low embedding capacities. Moreover, a high modification rate of the motion vectors negatively influences the quality of the stego videos.
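The parity rule of Eq. 10 can be sketched as follows. The sketch assumes that the motion vector components are stored as integers in quarter-pixel units, so that the 0.25 pixel adjustment of Eq. 10 becomes an increment of one unit. That representation, and the function names, are assumptions made for illustration and are not stated in [35].

```python
def embed_bit(vx, vy, s):
    """Embed bit s into a motion vector given in integer quarter-pel units.

    The feature is the parity of |vx| - |vy|. When it already equals s the
    vector is left alone; otherwise one component is moved by one unit
    (0.25 pixel), which flips the parity.
    """
    p = (abs(vx) - abs(vy)) % 2
    if p != s:
        if abs(vx) - abs(vy) >= 0:
            vx += 1   # adjust the horizontal component
        else:
            vy += 1   # adjust the vertical component
    return vx, vy

def extract_bit(vx, vy):
    """Read the bit back as the parity of |vx| - |vy|."""
    return (abs(vx) - abs(vy)) % 2

stego_mv = embed_bit(9, 4, 0)   # |9| - |4| = 5 is odd, so vx is adjusted
```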

3.1.4 Video steganography techniques in transform coefficients (DCT, QDCT, and DWT)

The DCT, quantized discrete cosine transform (QDCT), and discrete wavelet transform (DWT) coefficients of the luminance component are also good candidates for concealing the secret message because their low, middle, and high frequency coefficients are available for data embedding. Huang et al. [32] presented reliable information bit hiding using the DCT and communication theory. In order to enhance the robustness of this method, BCH codes and soft-decision decoding were used. Moreover, the robustness was verified by testing against both common signal processing operations and a StirMark attack. The secret data is hidden in the DCT coefficients, specifically in the DC coefficient, which has the highest energy, and in the low-frequency AC coefficients. Barni et al. [5] presented a watermarking technique for MPEG-4 video coding based on video object planes. This scheme hides the watermark information in the selected inter and intra macroblocks of each video object. Depending on the computed frequency mask, the DCT coefficients that are greater than the predefined threshold are chosen for the embedding process. Barni et al.'s scheme is flexible and easy to use for many applications. Moreover, it is robust against some common signal processing operations. Additionally, Shahid et al. [93] proposed information embedding within the reconstruction loop for the intra and inter frames of the H.264/AVC video codec. This method embeds the secret message in the LSBs of the QDCT coefficients. Only non-zero QDCT coefficients are chosen for the data hiding process, utilizing a predefined threshold that directly depends on the size of the secret information. Edge, texture, and motion regions of intra and inter frames are utilized in the concealing process. Shahid et al.'s algorithm extracts the hidden message easily and maintains the efficiency of the compressed domain. On the other hand, Thiesse et al. [100–102] presented a method that hides motion data in the chrominance and luminance components of video frames. In order to control the change in the total bitrate of the H.264 codec, the motion vector indices are embedded in the selected DCT coefficients of both luminance and chrominance components. In addition, the hidden indices minimize the propagation of distortion drift from the prediction process to the next frames by utilizing rate-distortion optimization. The summation of the selected QDCT coefficients (\( {S}_i^w \)) is modified as follows:

$$ {S}_i^w=\begin{cases} {S}_i & \text{if } \left|{S}_i\right| \bmod 2={I}_i \\ {S}_i+{m}_i & \text{if } \left|{S}_i\right| \bmod 2\ne {I}_i \end{cases} $$
(11)
$$ {S}_i={\displaystyle \sum_{n=1}^N}{a}_n $$
(12)

where \( {a}_n \) represents the quantized coefficients, and \( {S}_i \) represents the summation of the quantized coefficients of the \( i \)th block. \( {I}_i \) is the prediction index and \( {m}_i \) represents the shifted coefficients. Meuel et al. [64] proposed information concealing in the H.264 codec for lossless reconstruction of the region of interest (ROI). This method protects the facial features in a video stream by embedding facial regions in the DCT coefficients. Two LSBs of the non-zero QDCT coefficients are utilized to embed the facial information. Only the skip mode is used during inter coded prediction of the ROI. The DC and AC DCT coefficients of the ROI macroblocks are set to 1 and 0, respectively, in order to guarantee that the original ROI macroblocks can be predicted during the decoding process. Facial pixels are classified as skin pixels if the Euclidean distance is lower than the predefined threshold value d according to the following formula:

$$ \sqrt{{\left({P}_u-Re{f}_u\right)}^2+{\left({P}_v-Re{f}_v\right)}^2} < d $$
(13)

where \( {P}_u \) and \( Re{f}_u \) are the Cb component and its reference value, respectively, and \( {P}_v \) and \( Re{f}_v \) are the Cr component and its reference value, respectively. The method of Meuel et al. achieves a high quality of the region of interest based on the lossless reconstruction. In a different work, Yilmaz et al. [109] proposed error concealment of video sequences by steganography. In the first stage, this method detects the location of the error, that is, the macroblocks that have been damaged. In the second stage, once the error location has been found, the distortion drift direction is reversed to avoid error propagation from other macroblocks. In the third stage, the values of the damaged macroblocks are reconstructed to complete the error concealment. In Yilmaz et al.'s algorithm, the edge information is hidden in the non-zero QDCT coefficients while the bit rate and channel utilization are maintained and the video quality is improved. Later, Li et al. [44] proposed recoverable privacy protection for video content distribution. This method utilizes the DWT sub-bands of the region of interest in order to generate both a hidden message and a carrier. The middle and high frequency DWT coefficients are considered as carrier data, while the low frequency DWT sub-band is considered as secret information. The embedding process is applied only to the luminance component. Additionally, Stanescu et al. [96] presented a video steganography algorithm called "StegoStream", which embeds subtitle messages in MPEG-2 video streams without using extra bandwidth. In MPEG-2 compressed videos, the intra frames are self-contained frames. Only the intra frames are used in the embedding process to hide video subtitles as secret messages. After the quantization process, the blocks that reach the predefined threshold T are selected for data hiding. The LSBs of the non-zero DCT coefficients of the selected blocks are altered when they do not match the hidden information bits; otherwise, they remain unchanged. The video subtitle must appear at a certain time. However, choosing an unsuitable threshold will cause the video subtitle to appear on the screen incorrectly. Moreover, since common MPEG-2 videos have 4–5 intra frames per second, the video subtitles will not repeat continuously. Li et al. [45] also proposed an algorithm for H.264 video steganography. During the video coding process, the quantized coefficients in each 4 × 4 luminance block of the inter frame macroblocks are used for embedding the secret message. The majority of the zero-valued quantized coefficients are located in the bottom-right corner because it is a high frequency region. Conversely, the majority of the non-zero quantized coefficients belong to the low frequency band and are located in the top-left corner. An inverse zigzag scan array covering the 16 quantized coefficients of each block is produced in order to locate the last non-zero coefficient more efficiently. Using a predefined threshold T (0–15) on the scan position, the last non-zero coefficient is selected in every macroblock. The secret message is concealed at a rate of 1 bit per block according to the odd or even parity of the coefficient. If the hidden bit is 1, then the selected DCT coefficient (V) is modified as follows:

$$ \widehat{V}=\begin{cases} V & \text{if } V \bmod 2=1 \\ V-1 & \text{if } V \bmod 2=0 \text{ and } V>0 \\ V+1 & \text{if } V \bmod 2=0 \text{ and } V<0 \end{cases} $$
(14)

Otherwise, when the hidden bit is 0, the selected DCT coefficient (V) is modified as follows:

$$ \widehat{V}=\begin{cases} V & \text{if } V \bmod 2=0 \\ V+1 & \text{if } V \bmod 2=1 \text{ and } V>0 \\ V-1 & \text{if } V \bmod 2=1 \text{ and } V<0 \end{cases} $$
(15)
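The parity rules of Eqs. 14 and 15 can be combined into a single routine, sketched below. The function names are hypothetical, and the extraction rule (reading the parity of the coefficient) is inferred from the embedding rule rather than quoted from [45].

```python
def embed_bit(v, bit):
    """Force the parity of a non-zero quantized coefficient to equal `bit`.

    bit == 1 (Eq. 14): an even coefficient moves one step toward zero.
    bit == 0 (Eq. 15): an odd coefficient moves one step away from zero.
    """
    if v == 0:
        raise ValueError("the selected coefficient must be non-zero")
    if v % 2 == bit:
        return v                      # parity already matches, no change
    step = -1 if bit == 1 else 1      # change applied to the magnitude
    return v + step if v > 0 else v - step

def extract_bit(v):
    """Assumed extraction: the bit is the parity of the coefficient."""
    return v % 2

stego = [embed_bit(v, b) for v, b in [(4, 1), (-4, 1), (3, 0), (-3, 0), (5, 1)]]
```

Since an even non-zero coefficient has a magnitude of at least 2, moving it one step toward zero never produces a zero coefficient, so the position of the last non-zero coefficient in the block is preserved.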

Li et al.'s method has a limited data embedding payload because only one bit is embedded per selected 4 × 4 block. Correspondingly, both Ma et al. [60] and Liu et al. [54] presented video data hiding methods for H.264 coding that avoid error accumulation in intra video frames. In intra frame coding, the current block predicts its data from the encoded adjacent blocks, specifically from the boundary pixels of the upper and left blocks. Thus, any embedding that occurs in these blocks propagates distortion to the current block. In addition, the distortion drift increases toward the lower-right intra frame blocks. To prevent this distortion drift, the authors developed three conditions on the directions of the intra frame prediction modes. The 4 × 4 blocks have nine prediction modes (0–8) and the 16 × 16 blocks have four prediction modes (vertical, horizontal, DC, and plane). In the 4 × 4 block, the first condition is the right mode {0, 3, 7}; the second condition is both the under-left mode {0, 1, 2, 4, 5, 6, 8} and the under mode {1, 8}; and the third condition is the under-right mode {0, 1, 2, 3, 7, 8}. To select the 4 × 4 QDCT coefficients of the luminance component for data embedding, the three conditions must be satisfied together. However, the two methods have a low embedding payload because only the luminance intra frame blocks that meet the three conditions are selected for hiding data. Later, Liu et al. [55, 56] presented a robust data hiding method for the H.264/AVC codec, based on BCH codes, that avoids deformation accumulation in the intra frame. The deformation accumulation of the intra frame is prevented by using the directions of the intra frame prediction. Some blocks are chosen as the carrier object for concealing the covert message. This selection relies on the intra frame prediction modes of the adjacent blocks to prevent the deformation from propagating from the neighboring blocks. The authors applied BCH encoding to the hidden message before the embedding phase to enhance the method's performance. Then, the encoded information is concealed in the 4 × 4 QDCT coefficients using only the luminance plane of the intra frame. Liu et al. defined N as a positive integer and \( {\overset{\sim }{\mathrm{Y}}}_{ij} \) as the selected DCT coefficients (i, j = 0, 1, 2, 3). The embedding process of this method is carried out by the following steps:

  1. If \( \left|{\overset{\sim }{\mathrm{Y}}}_{ij}\right|=\mathrm{N}+1\ \mathrm{or}\ \left|{\overset{\sim }{\mathrm{Y}}}_{ij}\right|\ne \mathrm{N} \), then \( {\overset{\sim }{\mathrm{Y}}}_{ij} \) will be modified as follows:

$$ {\overset{\sim }{Y}}_{ij}=\left\{\begin{array}{c}{\overset{\sim }{Y}}_{ij}+1\kern1em \mathrm{if}\ {\overset{\sim }{Y}}_{ij}\ge 0\ and\ \left|{\overset{\sim }{Y}}_{ij}\right|=N+1\kern1em \\ {}{\overset{\sim }{Y}}_{ij}-1\kern1em \mathrm{if}\ {\overset{\sim }{Y}}_{ij}<0\ and\ \left|{\overset{\sim }{Y}}_{ij}\right|=N+1\kern1em \\ {}{\overset{\sim }{Y}}_{ij}\kern3em \mathrm{if}\ \left|{\overset{\sim }{Y}}_{ij}\right|\ne N+1\ \mathrm{or}\ \left|{\overset{\sim }{Y}}_{ij}\right|\ne N\kern1em \end{array}\right. $$
(16)
  2. If the secret bit is 1 and \( \left|{\overset{\sim }{Y}}_{ij}\right|=N \), then \( {\overset{\sim }{Y}}_{ij} \) will be changed as follows:

$$ {\overset{\sim }{Y}}_{ij}=\left\{\begin{array}{c}{\overset{\sim }{Y}}_{ij}+1\kern1em \mathrm{if}\ {\overset{\sim }{Y}}_{ij}\ge 0\ \mathrm{and}\ \left|{\overset{\sim }{Y}}_{ij}\right|=N\kern1em \\ {}{\overset{\sim }{Y}}_{ij}-1\kern1em \mathrm{if}\ {\overset{\sim }{Y}}_{ij}<0\ \mathrm{and}\ \left|{\overset{\sim }{Y}}_{ij}\right|=N\kern1em \end{array}\right. $$
(17)
  3. If the secret bit is 0 and \( \left|{\overset{\sim }{\mathrm{Y}}}_{ij}\right|=\mathrm{N} \), then \( {\overset{\sim }{\mathrm{Y}}}_{ij} \) will not be modified.
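
The three steps above can be sketched in code as follows. This is only a minimal illustration that reads Eqs. 16 and 17 literally. The function names, the decoding rule (magnitude N reads as 0, magnitude N + 1 reads as 1), and the example values are assumptions introduced here and are not taken from [55, 56].

```python
from typing import List, Tuple


def embed_coefficient(y: int, bit: int, n: int) -> int:
    """Apply steps 1-3 to one quantized coefficient y, for a positive integer n."""
    sign = 1 if y >= 0 else -1
    if abs(y) == n + 1:
        # Step 1 (Eq. 16): push magnitude N+1 outward so N+1 is free to mean "bit 1".
        return y + sign
    if abs(y) == n and bit == 1:
        # Step 2 (Eq. 17): a secret bit of 1 moves magnitude N to N+1.
        return y + sign
    # Step 3 (bit 0 at magnitude N) and every other coefficient: unchanged.
    return y


def embed_block(coeffs: List[int], bits: List[int], n: int) -> Tuple[List[int], int]:
    """Embed bits into coefficients of magnitude n; return (stego coefficients, bits used)."""
    stego, used = [], 0
    for y in coeffs:
        bit = 0
        if abs(y) == n and used < len(bits):
            bit = bits[used]
            used += 1
        stego.append(embed_coefficient(y, bit, n))
    return stego, used


def extract_bits(stego: List[int], n: int) -> List[int]:
    """Assumed decoder (not from the source): magnitude n -> 0, magnitude n+1 -> 1."""
    return [abs(y) - n for y in stego if abs(y) in (n, n + 1)]


stego, used = embed_block([3, -3, 4, 0, -4, 3], [1, 0, 1], n=3)
```

In this sketch only coefficients whose magnitude equals N carry data, which is consistent with the limited payload noted for coefficient-based methods.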

Overall, the previous methods that use DCT, QDCT, and DWT coefficients as venues to hide secret information are restricted by the limited number of coefficients available in the embedding phase. Moreover, many of the mentioned algorithms did not include the secret message and cover data preprocessing stages, which are necessary to improve the security and robustness of any steganographic method.

3.1.5 Video steganography techniques in entropy coding CAVLC and CABAC

During H.264 compression, context adaptive variable length coding (CAVLC) and context adaptive binary arithmetic coding (CABAC) entropy coding can be used as host data to carry secret messages in many video steganography techniques. Ke et al. [38] presented a video steganography method that relies on replacing bits in the H.264 stream. In this algorithm, CAVLC entropy coding is applied in the data concealing process. During video coding and after the quantization stage, the authors used the non-zero coefficients of the high frequency regions of the luminance component for the embedding process. Here, the non-zero coefficients in the high frequency bands are almost always “+1” or “-1”. The embedding phase is completed based on the trailing ones sign flag and the level of the codeword parity flag. The sign flag of the trailing ones changes if the embedding bit equals “0” and the parity of the codeword is even. Also, the sign flag changes if the embedding bit equals “1” and the parity of the codeword is odd. Otherwise, the sign flag of the trailing ones does not change. The trailing ones are modified as follows:

$$ Trailing\ Ones=\left\{\begin{array}{c}\hfill\ even\ codeword\kern0.5em ; if\ secret\ bit=0\ \hfill \\ {}\hfill odd\ codeword\kern0.5em ; if\ secret\ bit=1\ \hfill \end{array}\right. $$
(18)

The modification of high frequency coefficients does not have an impact on the video quality. However, the embedding capacity is low because Ke et al.'s method relies on the non-zero coefficients of the high frequencies, where the large majority of coefficients are zero. Similarly, Liao et al. [46] proposed real-time data concealing in the H.264/AVC codec. During the CAVLC process in 4 × 4 blocks, the trailing ones are utilized for embedding the secret data. This method achieves low computational complexity, negligible degradation of the video quality, and a nearly unchanged bit-stream size. It employs random sequences as secret data, which are embedded into the CAVLC trailing ones of the selected blocks as follows:

$$ {\widehat{T}}_{Ones}=\left\{\begin{array}{ll}2 & ;\ if\ w=0\ and\ Trailing\ Ones\ge 3\\ {}1 & ;\ if\ w=1\ and\ \left( Trailing\ Ones=2\ or\ Trailing\ Ones=0\right)\\ {}0 & ;\ if\ w=0\ and\ Trailing\ Ones=1\\ {} unchanged & ;\ otherwise\end{array}\right. $$
(19)

Where w represents the secret data hidden in the trailing ones codeword, whose value lies in the range 0 to 3, and \( {\hat{T}}_{Ones} \) represents the modified trailing ones. Additionally, Lu et al. [58] proposed a real-time, frame-dependent video watermarking method in VLC coding. To achieve real-time detection, the CAVLC encoder is applied in this algorithm. During video coding, the secret data is embedded into the run-level pairs of each frame's macroblocks while keeping the bit-rate almost unchanged. Table 3 illustrates the run-level pairs (r, l) and codewords of the CAVLC encoder. The diagram of the data hiding process introduced by Lu et al. is illustrated in Fig. 9.
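
Returning to Eq. 19, the trailing-ones mapping of Liao et al. can be sketched as a small lookup. This is a minimal illustration that assumes w is a single bit and that the trailing ones count lies in 0 to 3; the function name is hypothetical and does not come from [46].

```python
def modify_trailing_ones(trailing_ones: int, w: int) -> int:
    """Return the modified trailing-ones count according to Eq. 19."""
    if not 0 <= trailing_ones <= 3 or w not in (0, 1):
        raise ValueError("expected trailing_ones in 0..3 and w in {0, 1}")
    if w == 0 and trailing_ones >= 3:
        return 2
    if w == 1 and trailing_ones in (2, 0):
        return 1
    if w == 0 and trailing_ones == 1:
        return 0
    return trailing_ones  # the "unchanged; otherwise" branch


# Enumerate every (count, bit) combination for inspection.
table = {(t, w): modify_trailing_ones(t, w) for t in range(4) for w in (0, 1)}
```

Under this reading, the parity of the modified count always equals w, so a decoder could recover the bit as the count modulo 2. This observation follows from Eq. 19 but is not stated explicitly in the text above.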

Table 3 VLC table (s denotes the sign bit)
Fig. 9 Diagram of the hiding process in method [58]

Mobasseri et al. [65] proposed watermarking of the MPEG-2 standard in the compressed domain by utilizing CAVLC mapping. In the CAVLC encoder, some run-level pairs, called unused pairs, are systematically absent from intra frame blocks. The secret data is embedded into the codewords of these unused run-level pairs of the CAVLC entropy coding. This method achieved a low modification rate of the selected run-level pairs, which keeps the visual quality and bit-stream size of the watermarked video nearly unchanged. In a different work, Wang et al. [104] presented a real-time watermarking method in the H.264/AVC codec based on the CABAC features. The CABAC encoder uses a unary binarization, which is a process of concatenating all binary values of syntax elements. A certain number of motion vectors of both P and B frames are utilized for the data hiding process using the CABAC properties. The secret watermark is concealed by replacing the binary sequences of the selected syntax elements in order. This method achieves a low degradation of the video quality because the difference between the original code and the replacement code is very small (at most 1 bit is altered out of the 8 bits of the selected motion vector). This small difference is also the reason for the small bit-rate increase. The percentage of the increased bit-rate μ is calculated as follows:

$$ \upmu =\frac{m-u}{u}\times 100\% $$
(20)

where u and m indicate the bit-rate of the original and the watermarked videos respectively. The flowchart of this method is illustrated in Fig. 10. The diagram of the CABAC encoder is shown in Fig. 11. Generally, the previous methods that utilize CAVLC and CABAC entropy coding as venues to conceal secret messages are limited in capacity due to the restricted number of selected blocks in the embedding stage. Moreover, when using the entropy coding, the quality of the steganogram is severely distorted.
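
As a quick arithmetic check of Eq. 20, the bit-rate increase can be computed as below. The function name and the sample bit-rates are illustrative assumptions, not values reported in [104].

```python
def bitrate_increase_percent(u: float, m: float) -> float:
    """Eq. 20: percentage increase of the watermarked bit-rate m over the original u."""
    if u <= 0:
        raise ValueError("the original bit-rate u must be positive")
    return (m - u) / u * 100.0


# Hypothetical example: 1000 kbit/s original, 1010 kbit/s after watermarking.
mu = bitrate_increase_percent(1000.0, 1010.0)
```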

Fig. 10 The data embedding framework in [104]

Fig. 11 General block diagram of the CABAC encoder

Table 4 summarizes video steganography methods that operate in compressed domain, emphasizing each of embedding capacity, video quality, robustness against attackers, video preprocessing, and secret messages preprocessing. Table 5 clarifies the advantages and limitations of each venue for concealing secret messages in compressed domain. These venues include intra frame prediction, inter frame prediction, motion vectors, DCT coefficients, QDCT coefficients, DWT coefficients, CAVLC entropy coding, and CABAC entropy coding.

Table 4 Venues, embedding capacity, video quality, robustness, video and message preprocessing of the surveyed video steganography techniques that utilize compressed domain for data hiding
Table 5 Advantages and disadvantages of each venue for data concealing in compressed domain

3.2 Video steganography techniques in raw domain

Unlike the compressed video, raw video steganography techniques deal with the video as a sequence of frames with the same format. First, digital video is converted into frames as still images, and then each frame is individually used as carrier data to conceal the hidden information. After the embedding process, all frames are merged together to produce the stego video. Raw video steganography techniques operate in both spatial and transform domains [75].

3.2.1 Video steganography techniques in spatial domain

There are many steganographic techniques that rely on the spatial domain, such as LSB substitution, bit-plane complexity segmentation (BPCS), spread spectrum, ROI, histogram manipulation, matrix encoding, and mapping rule. Basically, these techniques utilize the pixel intensities to conceal the secret message. Zhang et al. [114] presented an efficient embedder utilizing BCH encoding for data hiding. This embedder hides the covert information in a block of the carrier object. The concealing phase is achieved by modifying different coefficients in the input block so that the syndrome values become null. This method improves the embedding payload and execution time compared to others. The error correcting code (ECC) and steganographic model of this method are shown in Fig. 12. Zhang et al.'s method reduces the complexity of the algorithm from exponential to linear. On the other hand, Diop et al. [16] presented an adaptive steganography method utilizing low-density parity-check codes. The method discusses how to reduce the influence of hidden information insertion by using these codes. This algorithm demonstrated that the low-density parity-check codes are better for encoding algorithms than other codes. The process of embedding and extraction can be expressed by Eq. 21 and Eq. 22.

$$ S= Embedding\ \left(I,m\right) $$
(21)
$$ m= Extraction\ (S)=HS $$
(22)
Fig. 12 ECC and steganographic method of [114]

Where I and S are the cover data and steganogram, respectively, m is the secret message (\( m\in {F}_2^m \)), and H is the parity-check matrix of the code. Additionally, Cheddad et al. [10] presented a skin tone data concealing method that depends on the YCbCr color space. YCbCr is utilized in different applications such as object detection and compression techniques. In YCbCr, the correlation between the RGB colors is removed by separating the luminance (Y) from the chrominance blue (Cb) and the chrominance red (Cr). Once the human skin areas are recognized, the Cr values of these areas are used for concealing the hidden information. Overall, the method has a limited embedding capacity because the hidden message is embedded only in the Cr plane of the skin region. Similarly, Sadek et al. [91] proposed a robust video steganography method based on the skin region of interest. The secret message is concealed into the wavelet coefficients of the skin regions for each of the blue and red components. This method is robust against MPEG compression. However, the comparison results demonstrated that Cheddad et al.'s method outperformed Sadek et al.'s algorithm in both imperceptibility and embedding capacity. Khupse et al. [43] presented an adaptive information hiding scheme using steganoflage, which is based on the ROI in a frame instead of utilizing an entire frame. This method utilized human skin tone as a carrier object for concealing the hidden data. The filling operation and morphological dilation techniques are applied for skin detection. Then, the YCbCr frame that has the lowest mean square error (MSE) is chosen for the hiding stage. Only the Cb part of that video frame is selected for concealing the hidden information. Khupse et al.'s method is restricted in embedding payload because it considers only a single frame for the data hiding stage.

Alavianmehr et al. [3] presented a robust uncompressed video steganography method utilizing the histogram distribution constrained (HDC) scheme. In this method, the Y component of every frame is segmented into non-overlapping blocks (C) of size m × n. Then, the secret message is concealed into these blocks based on a shifting process. The selected blocks are changed only when the secret message bits are “1”. The modified frame S of the kth block is calculated as follows:

$$ {\widehat{S}}^k\left(i,j\right)=\left\{\begin{array}{ll}{S}^k\left(i,j\right)+\gamma & ;\ if\ \alpha \in \left[0,T\right]\ and\ \operatorname{mod}\left(i,2\right)=\operatorname{mod}\left(j,2\right)\\ {}{S}^k\left(i,j\right)-\gamma & ;\ if\ \alpha \in \left[-T,0\right]\ and\ \operatorname{mod}\left(i,2\right)\ne \operatorname{mod}\left(j,2\right)\\ {}{S}^k\left(i,j\right) & ;\ otherwise\end{array}\right. $$
(23)
$$ \gamma =\frac{\left(G+T\right)\times 2}{m\times n} $$
(24)
$$ {\alpha}^k=\sum_{i=1}^m\sum_{j=1}^n{C}^k\left(i,j\right)\times N\left(i,j\right) $$
(25)
$$ N\left(i,j\right)=\left\{\begin{array}{ll}1 & ;\ if\ \operatorname{mod}\left(i,2\right)=\operatorname{mod}\left(j,2\right)\\ {}-1 & ;\ if\ \operatorname{mod}\left(i,2\right)\ne \operatorname{mod}\left(j,2\right)\end{array}\right. $$
(26)

Where γ, α, and N are the shift quantity, the arithmetic difference, and the computed matrix for each block, respectively. Also, T and G are two predefined thresholds used in this method. Alavianmehr et al.'s method withstands compression attacks. However, it utilizes only the Y plane for the data embedding process. Similarly, Moon et al. [66] presented a secure data hiding method using a computer forensic process. The hidden message is authenticated and ciphered by a secret key, and then it is concealed into the 4 LSBs of each pixel of the video frames. In order to transfer the authentication key to the receiver, it is concealed into one of the frames known to the sender and recipient. The goal of utilizing the computer forensic process is to verify the validity of the obtained stego videos. The method presented by Moon et al. is not robust against video processing operations because it operates in the spatial domain. In addition, Kelash et al. [39] presented a histogram variation-based video steganography method. Frames whose histogram variation averages exceed the histogram constant value (HCV) are chosen for the data concealing procedure utilizing the specified threshold. Then, these frames are segmented into blocks in order to compute the variation of the successive pixels. The hidden message is concealed into the 3 LSBs of each selected pixel. In terms of hiding capacity, Kelash et al.'s method is restricted because it is based only on the HCV value. Comparatively, Paul et al. [80] presented a new steganographic technique to conceal the hidden message inside the video stream. Once the frames with abrupt scene changes are detected, they are selected to host the hidden message. The histogram variation is utilized to determine whether each frame is a sudden scene change or not. The 3-3-2 LSBs of each pixel are used in the data concealing process to hide the covert information. The randomization of the pixel locations improves the security of this method.
However, the number of frames with sudden scene changes limits this method. On the other hand, Bhole et al. [6] presented a randomization byte data hiding method. The first video frame, called the index frame, is utilized to conceal the control data of the other frames. The remaining frames are used for the data concealing procedure utilizing the index frame information. Bhole et al. also used the LSB method. Bhole et al.'s algorithm lacks robustness because it operates in the spatial domain. In a different work, Hanafy et al. [25] presented a secure communication method based on video steganography. This scheme is applied to the spatial domain by using raw videos as cover data and any text, audio, image, or video as a secret message. In this instance, the message is segmented into non-overlapping blocks, and then these blocks are randomly concealed into the frames based on the secret key. The randomization of the secret message is dynamically changed in each video frame in order to prevent attackers from identifying the message's location. The data embedding process is accomplished by using the 2 LSBs of each color channel (RGB), which hides 6 bits of the secret message in each pixel. In this scheme, the secret message is protected using a secret key. However, because the scheme embeds the covert information in the spatial domain, it is not robust against video compression and noise. Similarly, Lou et al. [57] proposed an LSB steganography scheme using the reversible histogram transformation function. Here, the covert information is hidden into the LSBs of the cover data pixels. This algorithm is robust against two well-known statistical steganalysis schemes, namely the χ²-detection and regular-singular attacks. The average embedding rate of Lou et al.'s method is similar to that of the LSB technique. In addition, Tadiparthi et al. [99] proposed a steganographic method that utilizes animations as cover data.
This method conceals the secret message into the animation frames. Tadiparthi et al.'s algorithm achieves better results than the two existing algorithms it was compared with. However, the secret message distribution cannot be modified because the secret key relies on the probability distribution of the secret message. In addition, this method requires a longer time to execute, which makes it more complex than others. Eltahir et al. [18] proposed a high rate data concealing algorithm. In each frame, a 3-3-2 approach is applied to the LSBs of the three color channels (RGB). The 3-3-2 method means that 3 bits of red, 3 bits of green, and 2 bits of blue in each pixel are used to hide the covert data, as shown in Fig. 13. Later, Dasgupta et al. [15] optimized the method of [18] using a genetic algorithm in order to enhance both the security of the covert information and the visual quality of the steganogram. This improvement comes from an objective function that is based on the weights of different parameters such as MSE and HVS. However, the algorithms in [15, 18] are not robust against signal processing, noise, and video compression because they operate in the pixel domain.
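
The 3-3-2 LSB idea can be illustrated with a short sketch that hides one message byte in one RGB pixel. The order in which the byte's bits are assigned to the channels (the three most significant bits to red, the next three to green, the last two to blue) and the function names are assumptions made for illustration; [18] may order the bits differently.

```python
from typing import Tuple

Pixel = Tuple[int, int, int]


def embed_byte_332(pixel: Pixel, byte: int) -> Pixel:
    """Hide one byte in the 3 LSBs of R, the 3 LSBs of G, and the 2 LSBs of B."""
    r, g, b = pixel
    r = (r & 0b11111000) | (byte >> 5)            # top 3 bits of the byte
    g = (g & 0b11111000) | ((byte >> 2) & 0b111)  # middle 3 bits
    b = (b & 0b11111100) | (byte & 0b11)          # bottom 2 bits
    return r, g, b


def extract_byte_332(pixel: Pixel) -> int:
    """Recover the byte hidden by embed_byte_332."""
    r, g, b = pixel
    return ((r & 0b111) << 5) | ((g & 0b111) << 2) | (b & 0b11)


stego_pixel = embed_byte_332((200, 100, 50), 0b10110110)
recovered = extract_byte_332(stego_pixel)
```

Each channel changes by at most 7 (red, green) or 3 (blue) intensity levels, which is why such schemes offer high capacity but no protection once the pixel values are altered by compression or noise.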

Fig. 13 The hiding capacity in each RGB pixel [18]

Hu et al. [30] presented a novel data concealing method using non-uniform rectangular partition. The non-uniform rectangular partition procedure has three main factors. First, a suitable initial partition must be chosen to improve the results of the partition, so that a reconstructed frame can be obtained with a minimum number of partitions and codes. Second, a bivariate polynomial is utilized to approximate the pixel gray values in each specific sub-image (rectangle). By applying the optimal quadratic approximation to these gray values, the undetermined coefficients of the bivariate polynomial can be specified. The partition processing continues to divide the sub-image into four smaller parts if the original sub-image cannot be approximated by the determined bivariate polynomial within the required control error. Also, the process of approximation is repeated until the number of pixels in the sub-image is greater than or equal to the number of undetermined coefficients of the bivariate polynomial. The original image can be approximately reconstructed according to the codes obtained from the partitioning process. Third, the last factor of the non-uniform rectangular partition process is the control error. The control error is determined at the end of the partitioning process. It decides whether or not to continue dividing sub-images. The non-uniform rectangular partition is applied to each frame of the secret video in order to obtain partition codes that will be concealed into the cover frames. This steganographic algorithm is based on a concept called “Tangram” that is similar to a puzzle game. The algorithm has two main advantages: 1) the adaptability of the non-uniform rectangular partition and 2) the cover frame carries and records the partition code information [30].
If the secret frame is A and the carrier frame is B, then the process of embedding in this method can be accomplished by the following points:

  1. A suitable initial partition area is selected. Also, a control error E = 4 (ranging from 2 to 6) and a bivariate polynomial equation f(x, y) = ax + by + cxy + d are specified. By applying the non-uniform rectangular partition algorithm, the partition grids of frame A are obtained; and

  2. The partition grids of frame A are placed on frame B, and then \( {h}_1={z}_1-{\hat{z}}_1 \), \( {h}_2={z}_2-{\hat{z}}_2 \), \( {h}_3={z}_3-{\hat{z}}_3 \), and \( {h}_4={z}_4-{\hat{z}}_4 \) are calculated, where \( {z}_1,{z}_2,{z}_3,{z}_4 \) and \( {\hat{z}}_1,{\hat{z}}_2,{\hat{z}}_3,{\hat{z}}_4 \) are the gray values of the vertices of each rectangular sub-area for frames A and B, respectively; and

  3. All partition codes and their differences {h} are embedded into the 4 LSBs of the frame B gray values.

Hu et al.'s algorithm increases the capacity of the hidden data. However, it is not robust against video compression and temporal noise because it operates in the spatial domain. Moreover, the computational time is high due to the algorithm's complexity. In a different work, Kawaguchi et al. [37] proposed the principles and applications of BPCS steganography. In this method, the video frame is first converted into 8 bit-planes, and then each bit-plane is divided into informative (simple) and noise-like (complex) regions. The BPCS technique differs from the LSB technique in the number of bit-planes that are utilized for embedding the secret message. The BPCS technique uses all bit-planes (0–7) for data hiding, while the LSB technique uses only bit-plane 0 for the embedding process. Figure 14 shows how one of the video frames is converted into 8 bit-planes by applying the BPCS technique. In this method, the covert information is concealed into the complex regions to achieve a high embedding payload. Moreover, modifying the noise-like areas in each bit-plane for data hiding purposes has a minimal influence on the human visual system. The complexity level (α) is measured in each region to determine whether it is informative or complex, and α is defined as follows:

$$ \upalpha =\frac{\ k\ }{\ 2m\left(m-1\right)}, \left(0\le \alpha \le 1\right) $$
(27)
Fig. 14 The process of converting one of the Foreman video frames into 8 bit-planes using the BPCS technique

Where k equals the total length of the black-and-white border in the selected region, and 2m(m − 1) is the maximum possible border length that can be obtained from the selected region of size m × m. Figure 15 illustrates the complexity degree of the BPCS regions according to Kawaguchi et al.'s method.
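
The complexity measure of Eq. 27 can be sketched as follows for a square binary region, together with a helper that extracts one bit-plane from a gray-level block. The function names and the tiny sample blocks are assumptions for illustration, and the border length k is counted here as the number of horizontally or vertically adjacent pixel pairs with different values.

```python
from typing import List

Block = List[List[int]]


def bit_plane(block: Block, plane: int) -> Block:
    """Extract bit-plane `plane` (0 = least significant) from a block of gray values."""
    return [[(value >> plane) & 1 for value in row] for row in block]


def bpcs_complexity(region: Block) -> float:
    """Eq. 27: alpha = k / (2m(m - 1)) for an m x m binary region, with m >= 2."""
    m = len(region)
    if m < 2 or any(len(row) != m for row in region):
        raise ValueError("expected a square region with m >= 2")
    k = 0
    for i in range(m):
        for j in range(m):
            if j + 1 < m and region[i][j] != region[i][j + 1]:
                k += 1  # black-white change between horizontal neighbours
            if i + 1 < m and region[i][j] != region[i + 1][j]:
                k += 1  # black-white change between vertical neighbours
    return k / (2 * m * (m - 1))


checkerboard = [[(i + j) % 2 for j in range(8)] for i in range(8)]
flat = [[0] * 8 for _ in range(8)]
```

A checkerboard reaches the maximum complexity of 1, while a uniform region has complexity 0; a method would compare α against a chosen threshold to decide whether a region is noise-like enough to carry data.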

Fig. 15 BPCS complexity degree of different regions: left informative region and right noise-like region

Sun [98] proposed a new information hiding method based on improved BPCS steganography. The regular BPCS method computes the complexity of the selected region based on the total length of the black-and-white border. Sun's technique introduces a new way of identifying the noise-like regions, which is especially useful for periodical patterns. Canonical gray coding (CGC), run-length irregularity, and border noisiness are each utilized to measure the complexity level of the selected regions. Based on the complexity degree, the secret data is concealed into the noise-like areas. In order to expand the capacity of the covert information, the informative regions are converted into complex regions using the conjugation operation. If n is the length of the pixel sequence and h[i] is the frequency of runs of i black or white pixels, then the run-length irregularity of the binary pixels (\( {H}_s \)) in Sun's algorithm can be calculated as follows:

$$ {H}_s=-\sum_{i=1}^n h\left[i\right]\ \log_2 {P}_i $$
(28)
$$ {P}_i=\frac{h\left[i\right]}{\sum_{j=1}^n h\left[j\right]} $$
(29)
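
Eqs. 28 and 29 can be evaluated for a one-dimensional binary pixel sequence as sketched below. The sketch follows the equations as written above, skips run lengths that never occur (treating their contribution as zero, which is an assumption), and uses hypothetical function names and sample sequences.

```python
import math
from collections import Counter
from typing import List, Sequence


def run_lengths(bits: Sequence[int]) -> List[int]:
    """Lengths of the consecutive runs of equal values in a binary sequence."""
    if not bits:
        raise ValueError("expected a non-empty sequence")
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


def run_length_irregularity(bits: Sequence[int]) -> float:
    """H_s = -sum_i h[i] * log2(P_i), with P_i = h[i] / sum_j h[j] (Eqs. 28-29)."""
    h = Counter(run_lengths(bits))  # h[i] = number of runs of length i
    total = sum(h.values())
    return -sum(count * math.log2(count / total) for count in h.values())


periodic = [0, 0, 1, 1, 0, 0, 1, 1]   # every run has length 2
irregular = [0, 1, 1, 0, 0, 0, 1]     # runs of length 1, 2, 3, 1
```

The periodic sequence yields zero irregularity even though it has many black-and-white borders, which illustrates why this measure helps with periodical patterns that the border-length complexity alone would classify as noise-like.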

In conclusion, the steganographic methods that operate in spatial domain are simple and obtain a high payload of secret messages. However, these techniques are not robust against signal processing, noises, and compression. Moreover, most of the above-mentioned methods do not take advantage of the cover data and the secret message preprocessing stages which can enhance the robustness and security of the steganographic algorithms.

3.2.2 Video steganography techniques in transform domain

In video steganography methods that operate in the transform domain, each video frame is individually transformed into the frequency domain using DCT, DWT, or discrete Fourier transform (DFT), and the secret message is embedded utilizing the low, middle, or high frequencies of the transformed coefficients. Patel et al. [79] presented a new data hiding method using the lazy wavelet transform (LWT) technique, where each video frame is divided into four sub-bands, separating the odd and even coefficients. The secret information is then embedded into the RGB LWT coefficients. For accurate extraction of the embedded data, the length of the hidden data is concealed into the audio coefficients. The amount of hidden information is high, but this type of wavelet is not a real mathematical wavelet operation. Consequently, Patel et al.'s method will not protect the hidden information from attackers because it effectively operates in the spatial domain. On the other hand, Spaulding et al. [95] presented a BPCS steganography method using embedded zerotree wavelet (EZW) lossy compression. In this method, the DWT coefficients represent the original frame's pixels. Therefore, BPCS steganography can be applied to the DWT coefficient sub-bands, which contain different features. The features of the DWT sub-bands include correspondence, complexity, and resiliency against attacks. Each DWT sub-band is divided into bit-planes, and then the quantized coefficients are used for hiding the covert data. This method achieves a high embedding capacity of around a quarter of the size of the compressed frame. Figure 16 illustrates the data embedding process of Spaulding et al.'s method.

Fig. 16 A block diagram representing the data concealing phase of the method [95]

Similarly, Noda et al. [77] presented a video steganography technique utilizing BPCS and wavelet compressed video. The 3D set partitioning in hierarchical trees (SPIHT) and Motion-JPEG2000 are the two coding techniques that use the DWT domain. First, each bit-plane of the video frame and the secret message is segmented into 8 × 8 blocks. Then, the noise-like bit-plane blocks are selected using the threshold of the noise-like complexity measurement. The two wavelet compression techniques are applied to the selected blocks by using the BPCS method, hiding the secret data into the quantized DWT coefficients. The experimental results of Noda et al.'s algorithm demonstrated that the 3D SPIHT coding method has a higher embedding payload than the Motion-JPEG2000 coding method when using BPCS steganography. However, Noda et al.'s algorithm cannot guarantee that all types of cover videos contain enough noise-like bit-plane regions. Moreover, this method applies only to the wavelet-based compression domain. In general, the steganographic techniques based on the transform domains improve the robustness against signal processing, noise, and compression. However, these techniques are more complex than the spatial domain methods.

Table 6 summarizes video steganography methods that utilize raw domain for data hiding, highlighting each of embedding capacity, video quality, robustness against attacks, video preprocessing, and secret messages preprocessing.

Table 6 Venues, embedding capacity, video quality, robustness, video and message preprocessing of the discussed video steganography methods that operate in raw domain

4 Performance assessment metrics

The main purpose of steganography techniques is to conceal the secret information inside the cover video data; as a result, the quality of the cover data changes, ranging from a slight modification to a severe distortion. Different metrics have been utilized to evaluate statistically whether the distortion level is acceptable [2]. PSNR is a common metric utilized to calculate the difference between the carrier and stego data. The PSNR can be calculated as follows [92]:

$$ MSE=\frac{{\displaystyle {\sum}_{i=1}^m}{\displaystyle {\sum}_{j=1}^n}{\displaystyle {\sum}_{k=1}^h}{\left[C\left(i,j,k\right)-S\left(i,j,k\right)\right]}^2}{m\times n\times h} $$
(30)
$$ PSNR=10\times \log_{10}\left(\frac{MA{X_C}^2}{MSE}\right)\ (dB) $$
(31)

C and S represent the carrier and stego frames, and \( MA{X}_C \) is the maximum possible pixel value of the carrier frame (for example, 255 for 8-bit samples). Both m and n indicate the frame resolution, and h represents the RGB color channels (k = 1, 2, and 3). The PSNR-HVS (PSNRH) and PSNR-HVS-M (PSNRM) objective measurements are utilized to assess the quality of the steganograms in a way that accounts for the human visual system. The PSNRM is an upgraded form of the PSNRH. Both PSNRH and PSNRM rely on the DCT coefficients of the transform domain [17]. PSNRH and PSNRM can be calculated using Eq. 32 and Eq. 33 [82]:

$$ PSNRH=10\times \log_{10}\left(\frac{MA{X_C}^2}{MS{E}^{hvs}}\right)\ (dB) $$
(32)
$$ PSNRM=10\times \log_{10}\left(\frac{MA{X_C}^2}{MS{E}^{hvs-m}}\right)\ (dB) $$
(33)

\( MS{E}^{hvs} \) and \( MS{E}^{hvs-m} \) utilize the factor matrix and the 8 × 8 DCT coefficients of the carrier and stego frame blocks [17]. On the other hand, embedding capacity is a major performance factor that any steganographic method tries to increase while preserving the visual quality. According to [103], a steganographic method has a high hiding capacity if the hidden ratio exceeds 0.5 %. The embedding ratio is calculated by the following formula:

$$ Embedding\ ratio=\frac{\ Size\ of\ embedded\ message\ }{Cover\ video\ size}\times 100\% $$
(34)

To further evaluate the performance of any steganographic algorithm in terms of robustness, two objective metrics including bit error rate (BER) and similarity are used. These metrics are applied to determine whether the secret messages are retrieved from the stego videos successfully by comparing the concealed and extracted covert data. The BER and similarity are computed in the following formulas [27]:

$$ BER = \frac{{\displaystyle {\sum}_{i= 1}^a}{\displaystyle {\sum}_{j= 1}^b}\left[M\left(i,j\right)\oplus \widehat{M}\left(i,j\right)\right]}{a\times b}\times 100\% $$
(35)
$$ Similarity = \frac{{\displaystyle {\sum}_{i= 1}^a}{\displaystyle {\sum}_{j= 1}^b}\left[M\left(i,j\right)\times \widehat{M}\left(i,j\right)\right]}{\sqrt{{\displaystyle {\sum}_{i= 1}^a}{\displaystyle {\sum}_{j= 1}^b}M{\left(i,j\right)}^2}\times \sqrt{{\displaystyle {\sum}_{i= 1}^a}{\displaystyle {\sum}_{j= 1}^b}\widehat{M}{\left(i,j\right)}^2}} $$
(36)

Where \( \mathrm{M} \) and \( \widehat{\mathrm{M}} \) are the concealed and extracted hidden data, respectively, and a and b are the dimensions of the hidden data.
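
The metrics of Eqs. 30, 31, 34, 35, and 36 can be sketched in a few lines. For brevity this sketch assumes that the frames and messages are passed as flattened sequences of equal length (so the double and triple sums become a single sum), that the message entries are bits for the BER, and that the maximum pixel value defaults to 255; the function names are illustrative only.

```python
import math
from typing import Sequence


def mse(cover: Sequence[float], stego: Sequence[float]) -> float:
    """Eq. 30 over flattened frames: mean of the squared differences."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)


def psnr(cover: Sequence[float], stego: Sequence[float], max_value: float = 255.0) -> float:
    """Eq. 31 in dB; identical frames give infinity."""
    error = mse(cover, stego)
    return math.inf if error == 0 else 10 * math.log10(max_value ** 2 / error)


def embedding_ratio(message_size: float, cover_size: float) -> float:
    """Eq. 34: embedded message size as a percentage of the cover video size."""
    return message_size / cover_size * 100


def bit_error_rate(sent: Sequence[int], received: Sequence[int]) -> float:
    """Eq. 35: percentage of bits that differ (XOR) between sent and received."""
    if len(sent) != len(received) or not sent:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(a ^ b for a, b in zip(sent, received)) / len(sent) * 100


def similarity(sent: Sequence[float], received: Sequence[float]) -> float:
    """Eq. 36: normalized correlation between sent and received data."""
    numerator = sum(a * b for a, b in zip(sent, received))
    denominator = math.sqrt(sum(a * a for a in sent)) * math.sqrt(sum(b * b for b in received))
    return numerator / denominator
```

Note that the similarity is undefined when either message is all zeros, since the denominator of Eq. 36 vanishes; a practical implementation would need to handle that case explicitly.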

5 Conclusion and recommendations

In this paper, we have presented a comprehensive review and analysis of video steganography methods in both compressed and raw domains. In addition, the differences between steganography, cryptography, and watermarking techniques were clarified. First, compressed video steganography techniques were classified based on the video encoding stages used as venues for data embedding. Venues for concealing secret messages in the compressed domain include: 1) intra frame prediction, 2) inter frame prediction, 3) motion vectors, 4) DCT and QDCT coefficients, and 5) CAVLC and CABAC entropy coding. Second, the existing raw video steganography methods were categorized according to their domain of operation, including 1) spatial domain methods and 2) transform domain methods. Then, the techniques of each domain were discussed and their performance assessments, imperceptibility, embedding capacity, robustness against attacks, video preprocessing, and secret message preprocessing were highlighted. Furthermore, the characteristics and drawbacks of each steganographic method were mentioned. The following recommendations and future research trends are suggested for developing an appropriate data hiding method:

  1. Proposing a video steganography method that maintains a trade-off between video quality, hiding capacity, and robustness against attacks, which makes it more appropriate for real-time security methods.

  2. Suggesting a steganographic technique that combines steganography with other system protection methods such as cryptography and error correcting codes. Encrypting and encoding the hidden message prior to the embedding process provides an additional security level for the secret message and makes it more robust against attackers during transmission.

  3. Providing a video steganography algorithm that focuses on a portion of the video as the carrier for data hiding instead of using the entire video. Such a method will enhance the quality of the steganograms and the resistance against attacks. For instance, the secret message can be concealed into a region of interest such as human faces, human bodies, cars, or any other moving objects. Furthermore, it will be challenging for unauthorized users and intruders to locate the hidden data in each video frame because the hidden data is concealed into the ROI, which changes from frame to frame, hence maintaining the security of the hidden message.

  4. Introducing a video steganography method that utilizes the transform coefficients of the ROI rather than the actual pixel domain, since transform domain techniques are more robust against signal processing operations and compression.