1 Introduction

Since the rise of the Internet and wireless communications, protection of sensitive information against unwanted access or manipulation has been of prime concern. Cryptography methods are employed to maintain information security. In these methods, the message content is converted to a meaningless one using a key and then transmitted through the communications channel. The main issue in cryptography schemes is that the eavesdropper may easily suspect the existence of the confidential message as soon as meaningless content is observed. In other words, cryptography does not hide the communication; whereas, in some cases, we aim to conceal the communication from the eavesdropper. The technique employed to achieve this is named steganography. Indeed, contrary to cryptography, steganography is the science of concealing the confidential information inside a cover media so that only the transmitter and the legitimate recipient have been aware of that.

The quality of a steganography method is evaluated by applying its measurements. The main steganography measurements are imperceptibility, capacity, robustness, and security. Imperceptibility evaluates the ability of a steganography approach to avoid visual degradation in the host media during embedding. Capacity means the number of bits that can be embedded in a cover media using a steganography techniqueFootnote 1. Security and robustness of a steganography method are the measurements of its ability to resist the operations of passive and active wardens respectively. Security is calculated employing steganalysis attacks, and steganalysis is the procedure of detecting the presence of a hidden message in a suspicious media. Security against passive wardens’ operations is the primary requirement of steganography which is usually known as undetectability. Robustness refers to the capability of a steganography method to preserve secret information despite information eliminating efforts of the warden [3, 13, 31].

Besides the above measurements, there are other important criteria such as compression ratio and computational cost [2]. Nowadays, almost all types of digital media are compressed to reduce the cost of transmission and the space required for storage. The compression ratio evaluates the effect of embedding with a steganography method on increasing the output bitrate of the cover media.

Almost all types of digital media such as text, image, audio, video and network protocol packets can be used as steganography hosts. Due to the high capacity of video and redundancy of its information, this media is a suitable host for embedding high volume secret messages.

Video steganography approaches lie in the following two categories: 2D methods and 3D methods [41]. The methods which embed a secret message in video frames independently and regardless of correlations among frames such as embedding on intra-prediction modes [14, 24, 25, 32, 37, 50] are considered as 2D video steganography. These techniques have no advantage over image steganography techniques except capacity, and can be detected by image steganalysis attacks. Any method which uses the third dimension of video to embed confidential information belongs to the 3D category. These methods are more secure against steganalysis techniques compared to 2D methods. The information hidden by 3D methods cannot be detected by image steganography methods due to using inter-frame components of video [4, 8, 41, 44]. The 3D methods include but not limited to embedding on DCT coefficients [26, 30, 33, 57], variable-length codes [33], quantization parameters [40, 46], inter prediction modes [27, 55], and motion vectors [1, 7, 9, 10, 16, 36, 48, 51, 54, 56]. Among all of 3D video steganography strategies, video steganography based on motion vectors is set to be the most secure technique against steganalysis schemes for the following reasons:

  1. 1)

    By changing motion vectors, the intra-frame and inter-frame statistics of video change indirectly and randomly.

  2. 2)

    There exist much lower correlations among neighboring MVs than neighboring pixels (For more explanation, please refer to Fig. 1 and Fig. 2 in [44]). As a result, a video containing manipulated MVs is highly misleading.

  3. 3)

    A majority of video steganalysis methods consider the embedding procedure as an additive noise; these attacks cannot detect MV-based steganography schemes [7].

Moreover, manipulating motion vectors leads to negligible distortion in the visual characteristics of the output video thanks to the motion compensation phase in the video codec algorithm. Hence, much attention nowadays has been focused on motion-vector based video steganography.

In this study, we propose an information hiding scheme on motion vectors of video. The main objective of our research is increasing the security of MV-based video steganography against the most powerful steganalysis algorithms. We also aim to achieve a high level of imperceptibility, computational cost, and compression ratio at the same time. Our main contributions and paper organization are as follows. Section 2 includes the related work on video steganography and steganalysis. Section 3 presents the basic concepts which will be later used in our proposed method. Section 4 contains the proposed method and the following contributions:

  1. 1)

    For the first time, a cost function is proposed based on the original and modified MVs’ statistical differences in the side of steganalyzer and MVs’ spatial correlations.

  2. 2)

    A novel method for finding proper MVs for manipulation is introduced based on a comparison between posterior statistics of original and altered MVs.

  3. 3)

    A pseudo-randomization scheme is proposed to increase the efficiency of the syndrome-trellis coder.

  4. 4)

    The proposed method is the most secure video steganography method against the outstanding recent video steganalysis approaches.

In Section 5, the experimental setup is introduced and the experimental results are demonstrated. Finally, discussion and conclusion are presented in Sections 6 and 7.

2 Related work on video steganography and steganalysis

MV-based video steganography involves two major problems: how to find the appropriate MVs for embedding, and how to alter the selected MVs so as to leave as less embedding evidence as possible. Up to this point, the evolution of MV-based video steganography techniques can be separated into three fundamental stages [56]. In the early generation, appropriate motion vectors for embedding were chosen using a predetermined threshold and a particular criterion such as MVs magnitude [16, 48, 54], or MVs’ corresponding block prediction error [1]. Then, confidential information was embedded in the magnitude or the phase of the selected motion vectors. Because of improper MV selection and modification methods, these approaches failed to preserve secrecy against initial steganalysis techniques [8].

In order to enhance embedding efficiency (the number of confidential message bits embedded per unit modification [13]) and consequently acheive higher level of security, second-generation methods have applied error-correcting codes such as BCH codes [35], Wet Paper Codes (WPC) [7, 10, 19, 20] or Syndrome-Trellis Codes (STC) [9, 10, 17, 51]. The mentioned approaches consider the embedding procedure as a source coding problem. In wet paper coding, we assume the source is like a memory with some damaged cells [29]. From the steganography point of view, defective cells correspond to the cover elements which lead to easier detectability in the side of steganalyzer after modification. In [7], the cost of modifying each motion vector is defined as the lowest mean square error (MSE) correspond to a motion vector in the whole search window, which has different Least Significant Bit (LSB) from the original MV. Regarding the number of message bits that must be embedded, a wet vector is then formed using the evaluated costs. Afterward, wet paper coding is applied to specify the MVs to be modified, and selected MVs are replaced by MVs with the lowest costs.

Unlike wet paper codes, all of the MVs can be employed in syndrome-trellis codes, although, they differ greatly in their level of reliability. The more reliable a cover element is, the lower cost its manipulation imposes. These codes aim to minimize the total cost of embedding for a particular payload. Because of its efficiency in distortion minimization and consequently security improvement, syndrome-trellis codes caused scientists to turn their attention from uniform embedding to content-adaptive steganography. Therefore, designing the state-of-the-art steganography methods in these days is limited to two problems: (1) finding the most accurate manipulation cost (distortion function) with respect to statistical detectability (2) finding the best method to manipulate selected cover elements so as to seem as innocent-looking as possible [21].

It is claimed that altering motion vectors turn them from locally optimal to non-optimal [8, 45]. Accordingly, reversion based features in [8] and AoSO feature set in [45] are exploited for classification. Also, in [43, 44, 47], detectors are designed based on the correlation between each MV and its neighbors. In order to increase security against these attacks, Cao et al. [9] proposed a cost function for altering MVs based on their associated uncertainty. In [51], the authors proposed a distortion function for STC by defining Statistical Distribution Change (SDC) and Prediction Error Change (PEC) of MVs.

Attempting to preserve local optimality during embedding procedure, [10] and [56] have formed a new generation of MV-based video steganography methods. It is demonstrated in [10] that tampered MVs can be locally optimal in SAD (Sum of Absolute Differences) sense at the receiver’s side. So, the associated distortion scale of MVs is calculated according to their one-distant-locally-optimal neighbors. Next, a N1-bits message is embedded using STC. Following that, a wet paper channel is evaluated with modified MVs in the STC stage being regarded as wet cells; and a N2-bits message is embedded using WPC. In [56], the search area of MVs’ locally-optimal neighbors depends on the maximum acceptable computational cost.

Although [7, 9, 10, 51, 56] are the most successful MV-based steganography methods until now, they still have some weaknesses. In [9], the cost of altering MVs is evaluated using various Lagrangian multipliers for compressing each block. Since just one quantization scale is applied in the MV selection stage of video codec which is available on the steganalyser’s side, using other quantization scales in cost function seems to be useless. Moreover, the MV with minimum SAD is selected as modified MV, without taking the Lagrangian multiplier into consideration. In [51], due to lossy compression of associated prediction error block and using rate-distortion optimization in advanced video coding standards, there can be a nonlinear relationship between PEC and changes in steganalysis features. Therefore, a distortion function based on PEC is not rigorous. Furthermore, SDC cannot be computed in the case of variable-block-size video coding. On top of that, authors suggest accumulating the MVs of N successive frames and then perform embedding on them altogether, while this strategy has two major flawsFootnote 2. The main issue of [10] is the fact that we have to transmit either the rate N1/N2 or one of the numbers N1 or N2 for each P-frame as side information so that the message can be extracted in receiver’s side. Therefore, we should embed this information in a contractual location in the video which leads to a substantial increase in embedding rate. Otherwise, we have to send the information through a secure channel. In [56], even though the idea of finding locally-optimal neighbors is admirable, no alternative way is proposed when there is not any locally-optimal neighbor in the SAD sense for an MV. In Table 1, the percentage of MVs without a locally-optimal neighbor is represented that confirms the aforementioned approach is not always implementable. Besides, Since [7, 9, 56] use the whole search window of video compression algorithm to find the best modified MV, the computational costs of these methods are relatively high. In addition to taking more computational cost, selecting modified MVs from a big search range may negatively affect the compression ratio. Moreover, because of information loss stages such as DCT quantization, an originally compressed video can contain some non-locally-optimal MV. As shown in Table 2, usually with decreasing the quantization scale, the percentage of non-locally-optimal MVs in a clear video increases from the steganalyser’s perspective. Thus a locally-optimal MV is not always the most undetectable cloak.

Table 1 Detector reliability of MVRB against our proposed method and rival steganography approaches (SA) with different motion estimation algorithms (ME), embedding rates (ER), and quantization parameters (QP).
Table 2 The average percentage of non-locally-optimal motion vectors according to the lagrangian multiplier from the receiver’s point of view using H.264/AVC codec and different motion estimation algorithms (ME) and quantization parameters (QP) for QCIF sequences

3 General theory

3.1 Motion estimation and compensation

One part of the H.264 encoder for inter-frames is the selection of subdivisions using a cost function. In this stage, four motion estimation modes including full ME (one motion vector for the entire 16 × 16 macroblock), horizontal (two MVs, one per each 8 × 16 block), vertical (two MVs, one per each 16 × 8 block), and quaternary (four MVs, one per each 8 × 8 block) are considered for each 16 × 16 macroblock. The macroblock is divided based on the mode. Then for each block, the best MV (\(MV_{b_{k}}\)) is calculated applying (1) based on the rate-distortion-optimization (λ is Lagrangian multiplier, \(\lambda _{ME}=\sqrt {\lambda _{mode}}\), λmode = 0.85 × 2(QP− 12)/3, and Rmv is the number of bits required for transmitting the candidate MV). Finally, the Lagrangian cost of each mode is calculated and the best partitioning mode is selected using (2) (cfrmode is the final bitrate of coefficients using the existing mode). If the quaternary ME is selected, selection of subdivisions is performed again for each of the four blocks (Fig. 1).

$$ MV_{b_{k}}=\underset{mv}{\arg\min}[J_{b_{k},mv}]=\underset{mv}{\arg\min}[SAD_{b_{k},mv}+\lambda_{ME}\times R_{mv}] $$
(1)
$$ Mode=\underset{mode}{\arg\min}[SSD_{b_{k},MV,mode}+\lambda_{mode}\times cfr_{mode}] $$
(2)
$$ SAD_{b_{k},mv}=\sum\limits_{x=X(b_{k})}^{\begin{array}{ll} X(b_{k})\\ +BS_{x}(b_{k}) \end{array}}\sum\limits_{y=Y(b_{k})}^{\begin{array}{ll} Y(b_{k})\\ +BS_{y}(b_{k}) \end{array}}|F^{Org}_{x,y,t}-F^{Rec}_{x+mv_{x},y+mv_{y},t-1}| $$
(3)
$$ SSD_{b_{k},mv}=\sum\limits_{x=X(b_{k})}^{\begin{array}{ll} X(b_{k})\\ +BS_{x}(b_{k}) \end{array}}\sum\limits_{y=Y(b_{k})}^{\begin{array}{ll} Y(b_{k})\\ +BS_{y}(b_{k}) \end{array}}(F^{Org}_{x,y,t}-F^{Rec}_{x,y,t})^{2} $$
(4)

\(F^{Org}_{x,y,t}\) and \(F^{Rec}_{x,y,t}\) in (3) and (4) refer to quantity of the pixel in location (x, y) in the t th original and recounstructed P-frame respectively. X(bk) and Y (bk) are the corresponding row and colomn of the pixel in top-left corner of k th block, and BSx(bk) and BSy(bk) are the width and height of k th block (Fig. 1).

Fig. 1
figure 1

Levels of motion estimation in H.264/AVC codec [38]

3.2 Distortion minimization using syndrome-trellis codes

Assuming that we have N blocks in a P-frame, the MVs of these blocks are subjected to a LSB function to obtain \(p=\{LSB_{mv_{1}}, LSB_{mv_{2}}, ..., LSB_{mv_{N}}\}\) (\(LSB_{mv_{N}}\) is the least significant bit of sum of horizontal and vertical components of N th MV). The syndrome-trellis coding begins by having the LSB vector (p), cost of modifying the MV of each block (COST = {Cost(mv1), Cost(mv2), ... , Cost(mvN)}), the secret message bit stream (m), and a parity check matrix H ∈ {0, 1}rN×N being previously shared between the transmitter and the legitimate recipient (r is the embedding rate). First, all p vectors which satisfy (5) are obtained.

$$ {P'(m)}=\{p'\in\{0,1\}^{N}| p'H^{T}=m\} $$
(5)

Among all possible vectors p, the one that imposes the lowest cost of embedding is chosen as follows.

$$ \widetilde{p}=\underset{p'\in P'(m)}{\arg\min}[\sum\limits_{k=1}^{N} COST(k). (p(k)\oplus p'(k))] $$
(6)

Where ⊕ is the XOR (exclusive OR) function. With a simple comparison between p and \(\widetilde {p}\), we will realize which MVs must be altered. In the recipient side, after obtaining MVs and forming the LSB vector \(\widetilde {p}=\{LSB_{\tilde {mv_{1}}}, LSB_{\tilde {mv_{2}}}, ..., LSB_{\tilde {mv_{N}}}\}\), the message can be extracted by (7).

$$ m=\widetilde{p}H^{T} $$
(7)

The syndrome trellis coder can access utmost h × w motion vectors for embedding one bit of confidential message. Usually, 6 ≤ h ≤ 15, the quantity of which affects the speed and efficiency of the algorithm (The larger the h is, the slower the algorithm works); Because the time-and-space complexity for solving the trellis is \(\mathcal {O}(2^{h}n)\) (n is the number of MVs in each frame). Also, w is determined by Embedding Rate (1/(w + 1) ≤ ER ≤ 1/w). For example if h = 8 and embedding rate equals 4/10, so w = 2 and thus trellis coder has access to utmost 16 MVs (for more explanation, the reader is requested to refer to [17]). Since MVs are organized as the raster scan, accessible MVs for embedding each secret bit are close together. As it is depicted in Fig. 2, in overwhelming majority types of videos including videos with a fixed camera or a fixed background, or videos containing rigid moving objects (such as a moving car), MVs represent specific patterns. They are equal (in place of rigid objects), or equivalent to zero in large parts (in place of background). Therefore, their statistics including embedding costs are the same. So in these places, the performance of the Viterbi algorithm is likely to degrade.

Fig. 2
figure 2

MVs’ patterns in background and rigid objects (red circles are representatives of the big areas in which all the motion vectors are equal to zero)

Considering the above explanations, a novel arrangement of MVs is needed so as to increase the efficiency of syndrome-trellis coder.

3.3 Near-perfect steganalytic features

In [53], A set of steganalytic features is proposed supposing the original MVs of the received video should be locally optimal with respect to the Lagrangian multiplier. Since it can be evaluated by the steganalyzer, the Lagrangian multiplier can be used for feature extraction. After decompression, each reconstructed block is subjected to further motion estimation utilizing the obtained Lagrangian multiplier and eight MVs centering around the received MV. Then, four types of features are calculated based on the cost of the best MVs (MVs with the lowest cost) and received MVs. Type 1 feature set is based on the percentage of locally-optimal MVs in each of 9 positions; type 2 feature set is based on the cost difference between the received and the best MV in each position; type 3 and type 4 feature sets are such as type 1 and type 2 respectively, with one difference. The cost function in the first two types is based on SAD (Sum of Absolute Differences), whereas the cost function in the next two types is based on SATD (Sum of Absolute Transformed Differences).

Considering this successful strategy for MV-based steganography detection, the relationship between the Lagrangian costs of the original and altered MV should be as close as possible in the receiver’s side in order that less convincing evidence of steganography can be obtained by attackers.

4 Propose method

Overview

We aim to conceal the confidential information in MVs which cause the most similar spatio-temporal statistics of the resulting P-frame with that of the original one. In order to achieve the lowest possible output bitrate, the alternative MV is selected from the set of four nearest MVs to the original one. Because video coding is a lossy compression method, statistics of MVs differ before and after encoding. Steganalyzers can just access the encoded information available on the receiver’s side. This suggests that the modified MV should be the one with the closest statistics to that of the original MV after encoding. Thus in contrary to the state-of-the-art steganography methods, we exploit the posterior statistics of MVs in cost function and alternative MV selection procedure. Inspired by [53], we use steganalytic features partly based on local optimality to compute the most undetectable modified MVs and their corresponding modification costs. A syndrome-trellis coder is then used to improve embedding efficiency. The modification cost of each MV is composed of two costs: temporal and spatial cost. The temporal cost measures the difference between the steganalytic features of the alternative and original MV based on local optimality. This cost enforces the Syndrome-trellis-coder to select alternative MVs with the closest steganalytic features to that of the original ones. The spatial cost is used to ensure that the MVs belonging to the rigid objects or fixed backgrounds that have less indeterministic statistics will not be modified as far as possible. Figure 3 represents the block diagram of the proposed MV-based video steganography algorithm.

Fig. 3
figure 3

Block diagram of the proposed steganography method

Embedding Procedure

The process of embedding the secret message (encrypted information) and encoding is performed on each P-frame separately and in the order of occurrence.

  1. Step 1.

    Original MVs Computing: At this stage, MVs in each frame, the type of blocks, and motion compensation blocks are obtained during video coding.

  1. Step 2.

    Alternative MVs Computing: In this step, the best alternative MV is obtained using the following method:MVs which cause the best possible bitrate. In video codec standards, each motion vector is predicted by utilizing its previous encoded neighboring MVs; and the residual block is coded. Thus altering each MV affects the prediction of subsequent MVs and consequently bitrate. In order to prevent increasing the output bitrate, it is necessary to choose the embedded motion vector from a set of nearest possible MVs (according to the employed video coding standard, the smallest acceptable unit for MV alteration is 0.25). According to Fig. 4, among eight neighborhoods centered around the original MV, there are 4 MVs (shaded blocks) that can be used as the altered MV due to having different LSBs (LSBs of the sum of their horizontal and vertical components) from that of the original MV.

    MVs which have the closest temporal statistics to that of the original MV from the steganalyzer’s perspective. First, the block is compressed using each of these four MVs. Then, we put ourselves in the eavesdropper’s shoes; so in each of these five cases (original and altered MVs), we decode the block’s bitstream and compute the locally-optimal MV by having eight neighbors of the received MV, the Lagrangian cost function, and the reconstructed block. Afterwards, we compute the difference of Lagrangian costs (DoLC) of the received and locally-optimal MVs, and difference of that MVs (DoMV) from the encoded one applying (8) and (9). Among 4 candidate MVs, the MVs which have the closest DoMV to DoMVorg lie in the set S1 using (10). These are the MVs which have nearest statistics to the original MV from the steganalyzer’s point of view in case of local optimality.

    $$ DoLC_{i}=(J_{M{V_{i}^{R}}}-J_{MV^{Opt}_{i}})/J_{M{V^{R}_{i}}} $$
    (8)
    $$ DoMV_{i}=|MV^{Opt}_{ih}-MV^{R}_{ih}|+|MV^{Opt}_{iv}-MV^{R}_{iv}| $$
    (9)
    $$ S_{1}=\{MV_{j}||DoMV_{org}-DoMV_{j}|=M_{1}\} $$
    (10)
    $$ M_{1}=\min\{|DoMV_{org}-DoMV_{j}||j=2,4,5,7\} $$
    (11)

    In (8), Jx is the Lagrangian cost function associated with the motion vector x. In (9), \( MV^{R}_{ih}\) and \(MV^{R}_{iv}\) stand for the horizontal and vertical components of the decoded MV. Also, \(MV^{Opt}_{ih}\) and \(MV^{Opt}_{iv}\) stand for the horizontal and vertical components of locally-optimal MVs obtained by further motion estimation respectively.The most locally optimal MVs. Among all of MVs existing in the set S1, the MVs that seem to be more optimal in the receiver’s side (with respect to DoMV) are selected by applying (12).MV which imposes the nearest Lagrangian cost to that of the original MV. Finally, among MVs belonging to set S2, the MV which has the closest DoLC to DoLCorg is selected as the alternative MV (14).

    $$ S_{2}=\{MV_{j}\in S_{1}|(\frac{DoMV_{org}-DoMV_{j}}{|DoMV_{org}-DoMV_{j}|})=M_{2}\} $$
    (12)
    $$ M_{2}=\max\{(\frac{DoMV_{org}-DoMV_{j}}{|DoMV_{org}-DoMV_{j}|})|MV_{j}\in S_{1}\} $$
    (13)
    $$ MV_{Alternative}=\min_{MV_{i}}\{|DoLC_{j}-DoLC_{org}||MV_{i}\in S_{2}\} $$
    (14)
  2. Step 3.

    Embedding Costs Evaluating: In this step, temporal and spatial costs are evaluated which subsequently form the overall cost of modification for different MVs.Temporal cost function. For each block (bk), temporal cost of MV modification is obtained by (15). Indeed, the temporal cost of modifying each MV is equal to the difference between the original and alternative motion vector plus the absolute difference between exponentials of “difference of Lagrangian costs” of original and alternative MVs from the receiver’s perspective. In (15), exp stands for “exponential” and exp(x) equals ex. Exponential function is used to enlarge the difference between \(DoLC_{MV_{org}}(b_{k})\) and \(DoLC_{MV_{Modified}}(b_{k})\).

    $$ \begin{array}{@{}rcl@{}} {}Cost_{Temporal}(b_{k})& = &DoMV_{MV_{Modified}}(b_{k})\\&& + |exp\{DoLC_{MV_{org}}(b_{k})\} - exp\{DoLC_{MV_{Modified}}(b_{k})\}| \end{array} $$
    (15)

    Spatial cost function. As mentioned in [51], neighboring MVs may belong to a same rigid objects or fixed backgrounds. These MVs are not suitable cloaks for confidential information as their manipulation can leave statistical clues about embedding. Accordingly, we should consider the spatial correlations of neighboring MVs in the embedding cost function. The more similar the neighboring MVs of a MV are, the more spatial cost its modification imposes. Inspired by picture processing grammar [11] and XPG modeling [12], the spatial cost function is calculated as follows.

    First, the MVs of eight neighbors of each block are obtained. Assuming the situation of the top-left pixel of existing block is equal to (X(bk), Y (bk)), and the

    block’s size is (BSx(bk), BSy(bk)), the neighboring MVs are calculated as follows (Fig. 5).

    $$ \begin{array}{@{}rcl@{}} MV_{n1}(b_{k})&=&mv(X(b_{k})-1,Y(b_{k})-1) \\ MV_{n2}(b_{k})&=&mv(X(b_{k}), Y(b_{k})-1) \\ MV_{n3}(b_{k})&=&mv(X(b_{k})+BS_{x}(b_{k}),Y(b_{k})-1) \\ MV_{n4}(b_{k})&=&mv(X(b_{k})-1, Y(b_{k})) \\ MV_{n5}(b_{k})&=&mv(X(b_{k})+BS_{x}(b_{k}),Y(b_{k})) \\ MV_{n6}(b_{k})&=&mv(X(b_{k})-1,Y(b_{k})+BS_{y}(b_{k})) \\ MV_{n7}(b_{k})&=&mv(X(b_{k}),Y(b_{k})+BS_{y}(b_{k})) \\ MV_{n8}(b_{k})&=&mv(X(b_{k})+BS_{x}(b_{k}),Y(b_{k})+BS_{y}(b_{k})) \end{array} $$
    (16)

    Next, the spatial cost is calculated employing (17). In (18), the function Unique(x) counts the unique observations in the set x. \(S_{MV}^{h}(b_{k})\) and \(S_{MV}^{v}(b_{k})\) are the sets of horizontal and vertical components of eight neighboring MVs of the original MV in k th block respectively. Figure 6 shows three examples of neighboring MVs of a block and their corresponding spatial costs.

    $$ Cost_{spatial}(b_{k})=Cost_{spatial}^{h}(b_{k})+Cost_{spatial}^{v}(b_{k}) $$
    (17)
    $$ \begin{array}{@{}rcl@{}} Cost_{spatial}^{h}(b_{k})=\frac{9-Unique(S_{MV}^{h}(b_{k}))}{8}\\ Cost_{spatial}^{v}(b_{k})=\frac{9-Unique(S_{MV}^{v}(b_{k}))}{8} \end{array} $$
    (18)
    $$ \begin{array}{@{}rcl@{}} S_{MV}^{h}(b_{k})=\{MV^{h}_{ni}(b_{k})|i=1,2, ... , 8\}\\ S_{MV}^{v}(b_{k})=\{MV^{v}_{ni}(b_{k})|i=1,2,...,8\} \end{array} $$
    (19)
  3. 1)

    The main cost function: Finally, the cost of altering the k th MV of the existing P-frame is evaluated employing following function (α and β are adaptable parameters. In experiments, we have set α = 4 and β = 2). We have added 1 to each of the two cost functions in order that the value of both parentheses is not lower than 1 and consequently, multiplication of them is always greater than both values.

    $$ Cost(b_{k})=(\alpha Cost_{emporal}(b_{k})+1)^{\upbeta}\times (Cost_{spatial}(b_{k})+1) $$
    (20)
  4. Step 4.

    MVs Re-ordering: In videos with slow movements, fixed background or fixed camera, the order of scanning MVs must be pseudo-random. This can be performed by sharing a key between the transmitter and the legitimate receiver. This pseudo-random selection is not to improve the cryptographic security, but to improve security against steganalysis attacks. Without employing a pseudo-randomizer, all available motion vectors for embedding one bit are likely to be a part of the background or improper for embedding. Also, the MVs related to suitable regions of video such as deformable parts (e.g., a moving arm) having more appropriate statistical properties for altering will not be completely utilized.

    In this step, the order of MVs is modified by a pseudorandom-generator key which is obtained using (21) (In (21), n is the number of MVs in the frame, ER is the embedding rate, and h is the parameter of syndrome-trellis coder). As mentioned in Section 2, the syndrome-trellis coder has access to h × w motion vectors for embedding each bit. Hence, the pseudorandom-generator key is optimized with respect to the embedding rate and h. Actually, we partition all MVs of the frame into the maximum number of accessible MVs for Syndrome-trellis coder. This way, the syndrome-trellis coder can access different parts of the frame during embedding each bit. The transformation matrix is then formed as (22). Finally, the order of MVs is altered by applying (23) (MV and RMV are the vector of primary MVs and re-ordered MVs respectively).

    $$ K=\lfloor \frac{n\times ER}{h}\rfloor $$
    (21)
    $$ \mathbf{X} = \left( \begin{array}{ccccc} 0\times K+1 & 1\times K+1 & {\ldots} & ((h\times w)-1) \times K+1\\ 0\times K+2 & 1\times K+2 & {\ldots} & ((h\times w)-1) \times K+2\\ {\vdots} & {\vdots} & \ddots& \vdots\\ 0\times K+K & 1\times K+K & {\ldots} & ((h\times w)-1) \times K+K \end{array}\right) $$
    (22)
    $$ \begin{array}{@{}rcl@{}} RMV(:,i)= \left\{\begin{array}{lll} MV(:,f_{X}(i)) &\text{if}\quad 1\leq i\leq (h\times w)\times K\\ MV(:,i) & \text{else} \end{array}\right. \end{array} $$
    (23)
    $$ f_{X}(i)=X(\lceil\frac{i}{h\times w}\rceil,i-(\lceil\frac{i}{h\times w}\rceil-1)(h\times w)) $$
    (24)
    $$ \mathbf{MV} = \left( \begin{array}{llll} MV_{h}(1) & MV_{h}(2) & {\ldots} & MV_{h}(n)\\ MV_{v}(1) & MV_{v}(2) & {\ldots} & MV_{v}(n) \end{array}\right) $$
    (25)

    Figure 7 shows an example of MVs re-ordering and its effect on the arrangement of MVs that are fed into the Syndrome-trellis-coder. In this example, h × w is assumed to be equal to 4.

  5. Step 5.

    Syndrome-trellis coding: The binary vector p is formed using parity check function and RMV (26). The syndrome-trellis-coder takes vector p along with MVs’ modification costs and payload, calculates optimal MVs for embedding, and returns \(\widetilde {p}\) (6).

    $$ p(i)=LSB(RMV(1,i)+RMV(2,i)) $$
    (26)
  6. Step 6.

    Information Embedding: For each element of \(\widetilde {P}\) that is unequal to P, we replace its corresponding MV with the alternative MV obtained in the step 2 (14).

  7. Step 7.

    MVs Re-predicting: In order to transmit correct information to the decoder, it is necessary to perform MV prediction after embedding on the MVs of each frame (Please refer to the Appendix for more explanation).

  8. Step 8.

    P-frame Encoding: In this stage, the rest of the typical video compression algorithm (including variable-length coding) for the existing frame is applied using the information obtained in the previous stages.

Fig. 4
figure 4

Candidate alternative motion vectors

Fig. 5
figure 5

Motion vectors of the neighboring blocks

Fig. 6
figure 6

Examples of a block’s neighboring MVs and spatial cost of modifying its corresponding MV

Fig. 7
figure 7

Effect of MVs reordering on the arrangement of MVs that are fed into the Syndrome-trellis-coder. a blocks and their numbers in a P-frame. b the usual arrangement of MVs that are fed into the Syndrome-trellis-coder using raster scanning. c the arrangement of MVs after re-ordering

4.1 Extraction procedure

The extraction algorithm is also applied to P-frames one after another. The confidential message is then obtained by accumulating extracted information and subsequently decrypting it.

  1. Step 1.

    Reordering MVs: After decoding the bitstream of MVs [42], their order is altered by employing the common key and (21)-(23).

  2. Step 2.

    Syndrome-trellis decoding: Having embedding rate, the vector of reordered MVs and h, we apply syndrome-trellis decoding to obtain the confidential message.

5 Experiments

5.1 Experimental settings

  • Database: Figure (8) shows the first frame of 22 video sequences with PAL QCIF resolution and without prior lossy compressionFootnote 3 (192 × 144 pixels), being downloaded from [49] and used to construct the database. The database consists of a wide range of videos with respect to diversity in camera and objects movement. Since the video sequences contain a different number of frames, all sequences are trimmed into non-overlapping 60-frame subsequences, and utmost five 60-frame subsequences are applied for experiments. Totally 84 subsequences are used for experiments.

  • Video Compression Method: Without loss of generality, H.264’s baseline profile is employed for video compression [38]. Motion estimation methods are exhaustive (full) search and HEX search [52]; motion estimation resolution is equivalent to quarter-pixel; the search range is 8 pixels, and quantization parameter is equal to 17, 27 and 32.

  • Competitor Steganography Approaches: In order to evaluate the performance of the proposed method, we compare its results (denoted by Prop*) with the results of methods [9] (denoted by ALG1), [56] (denoted by ALG2), [10] (denoted by ALG3), and [7] (denoted by ALG4) Footnote 4.

  • Steganalysis Manners: In order to illustrate the security of the proposed method in comparison with rivals, we apply MVRB features [8], AoSO features [45], and NPELO features [53], which are the best video steganalysis methods against MV-based video steganography to date (The method [39] which exploits the entropy of blocks for feature extraction has proved to be a reliable detector, and its results are close to that of [53]). Meanwhile, features in [8] are extracted using macroblocks; while in H.264 video compression algorithm, each macroblock may contain some blocks. Therefore, in order to improve the performance of this steganalyzer, the features are extracted using blocks.

  • Training and Classification: For each steganography method, a random sequence with uniform distributionFootnote 5 is produced and embedded in all subsequences with rates 0.1, 0.2 and 0.3 of total MVs per P-frame. All embedded and compressed videos are separated into non-overlapping 12-frame subsequences, and a feature vector is extracted from each subsequence. Indeed, we have 420 embedded twelve-frame-subsequences and 420 clean twelve-frame-subsequences per each training and testing round (each number in Tables 34 and 5). Afterwards, 80% of feature vectors are randomly chosen for training, and the remaining feature vectors are exploited for testing the classifiers. We use MATLAB’s SVM toolbox for training each steganalyzer and classification. Also, Gaussian and Polynomial kernel functions are applied and the best answers are listed.

  • Evaluation Criteria: We compare the detector reliability (AUC) of steganalyzer approaches against proposed and rival methods to illustrate the fact that our scheme outperforms the best existing MV-based video steganography approaches in case of undetectability [5, 6, 15]. In order to clarify the effect of applying a pseudorandom-generator key on security improvement, we have shown the results of the proposed method without reordering MVs (denoted by Prop) and with reordering MVs (denoted by Prop*). Imperceptibility is measured by PSNR which can show the perception of quality degradation by human visual system better than other criteria [13, 31]. Also, mean output bitrate per P-frame is measured to compare the compression ratio (The results correspond to standard video compression are denoted by STD). Additionally, the percentage of increase in compression time has been shown to demonstrate the difference in the computational cost of the proposed method and rivals.

  • Other settings: In the competitor approaches [7, 9, 56], embedding is applied by employing full search. So the search range for finding embedded MVs is the same as the search range for motion estimation and equal to 8 pixels. On the contrary, embedded MVs in the proposed method are selected from the set of only four MVs around the original MVs. Also in [9], The set of Lagrangian multipliers which are used for MVs’ embedding cost function is λ = [0, 2, 4, 6, 8] and we set b = − 2 and α = 0.5 for the distortion function. The toolbox for syndrome-trellis code which is used to implement the proposed and rival methods is downloaded from [22].

    We set h = 8 for the syndrome-trellis coder used in the proposed and competitor algorithms. Also, we just use ER = 0.2 and QP = 27 for all criteria exclusively of security.

Fig. 8
figure 8

tree structured motion compensation for H.264 [38]

Table 3 Detector reliability of MVRB against our proposed method and rival steganography approaches (SA) with different motion estimation algorithms (ME), embedding rates (ER), and quantization parameters (QP)
Table 4 Detector reliability of AoSO against our proposed method and rival steganography approaches (SA) with different motion estimation algorithms (ME), embedding rates (ER), and quantization parameters (QP)
Table 5 Detector reliability of NPELO against our proposed method and rival steganography approaches (SA) with different motion estimation algorithms (ME), embedding rates (ER), and quantization parameters (QP)

5.2 Experimental results

  1. 1)

    Security: The security of the proposed method and ALG1-4 against three detectors are compared in Tables 35. Table 3 shows that ALG1 cannot resist against MVRB and the classes in some cases are completely distinctive by this feature set. Other algorithms have shown a fair level of security against this detector, specifically when it comes to HEX motion estimation algorithm. The figures for the proposed method, however, are much closer to 0.5 than that of rivals.

    According to Table 4, the detector AoSO is not reliable against the proposed method and competitors, except for ALG1. The resistance of competitors has increased when the fast search algorithm has been applied in video compression. The proposed method has surpassed all rivals in security against AoSO, the figures of which hover around 0.51.

    Only the proposed method is secure against NPELO (Table 5). ALG2 which showed acceptable resistance against previous detectors fails to resist against NPELO in high embedding rates in both HEX and FULL ME conditions. Furthermore, a sharp decrease in security with increasing the embedding rate can be observed in all rivals, whereas these changes in the performance of the proposed scheme are quite negligible.

    The positive impact of MVs reordering in the performance of the syndrome-trellis coder can be observed in Tables 3-5. In the case of FULL search ME algorithm and smaller quantization parameters where MVs are overwhelmingly locally-optimal and less information is lost during compression, the influence of reordering is more prominent.

    In summation, the records of various detectors’ reliability against the proposed method and rivals have proved that not only our method has surpassed all of the competitors in undetectability, but also it can highly resist against the best detectors so that their results are close to random guessing.

  2. 2)

    Imperceptibility: Table 6 compares PSNR between the original and embedded sequences. Looking first at the results of FULL search, it can be perceived that there is a slight difference between the PSNR correspond to the output of the standard compression algorithm and embedding algorithms, exclusively of ALG1. The reason why we see the average reduction of 0.18db in PSNR of ALG1 is improper cost function which does not reflect the characteristics of the alternative MV. Although the replaced MV is obtained using λME = 0 and consequently we may expect a better PSNR in comparison to the original video, the impact of this phase is negated by the inaccurate cost function. The average differences in PSNR for other algorithms is roughly equal to 0.001db that is quite acceptable.

    Moving on to the results of HEX search, a more perceptible average increase in PSNR of ALG2 and ALG4 (0.02db and 0.04db) can be seen. This phenomenon is because of their cost function which uses the whole allowed search window to find the replaced MV. However, an increase in PSNR cannot be regarded as a merit if it negatively affects the computational cost or output bitrate. There is a small increase in PSNR of the proposed method, being equal to 0.01db on average. Also, the results of ALG1 are similar to that in the FULL search.

  3. 3)

    Compression Ratio: Tables 7 and 8 display the average output bitrate (Kilobytes per P-frame) and the size of the embedded message (Bits per P-frame). Since the size of the confidential information in each sequence is approximately equal in the proposed and rival approaches, we can expect the output bitrates to be similar as well. The only method which has demonstrated weak performance in the compression ratio is ALG1, and the output bitrates of other methods including ours are close to that of the standard compression.

  4. 4)

    Computational Cost: In Table 9, the ratio of embedding time to standard compression time is indicated. As can be seen, the computational cost of the proposed method is far less than that of competitor schemes, being roughly equal to 11% and 12% in the case of FULL and HEX ME algorithms respectively. In sharp contrast with the proposed algorithm, ALG2, the security of which is the best among all rivals, has the highest computational cost. Indeed, the average computational cost of ALG2 is more than 7 times that of the proposed method. The computational cost that ALG3 imposes is the closest one to the proposed method, owing to the fact that both schemes consider only 4 MVs around each original MV as candidates of embedded MV.

Table 6 PSNR of video sequences produced by standard compression, our proposed method, and rival steganography approaches (SA) with different motion estimation algorithms (ME). Embedding rate for all steganography methods is set to 0.2, and the quantization parameter is equal to 27
Table 7 Average output bitrate of video sequences produced by standard compression, our proposed method, and algorithms 1-4 (motion estimation: FULL, embedding rate: 0.2, Quantization Parameter: 27)
Table 8 Average output bitrate of video sequences produced by standard compression, our proposed method, and algorithms 1-4 (motion estimation: HEX, embedding rate: 0.2, Quantization Parameter: 27)
Table 9 Computational cost of our proposed method and rivals (embedding rate: 0.2, quantization parameter: 27)

6 Discussion

The idea behind our proposed method is rooted in the fact that the outputs of all of the lossy compression algorithms are regarded as innocent media, notwithstanding the lost information which is not retrievable. Thus we can increase the security of a steganography method if the information is hidden in media so that statistical clues of manipulation can be lost through the procedure of lossy compression. In order to design a reliable steganography method, the following points should be noted:

  1. 1)

    With increasing the size of video frames, the necessity of reordering MVs and its impact on security improvement will increase. For example, in a video sequence with CIF resolution, because the size of the background is larger than that in QCIF resolution, more successive MVs sets can have improper statistics for altering. Therefore, the impact of pseudo randomization in undetectability would be remarkable.

  2. 2)

    Since the trellis encoder has access to just h × w motion vectors, with increasing the embedding rate, the number of MVs which are accessible to embed one bit would decrease; So the impact of reordering MVs would be more palpable. A simple comparison between the security degradation of the proposed method and rivals with increasing the embedding rate (Table 3-5) can prove the mentioned logical statement.

  3. 3)

    By reordering MVs, even using a smaller h in the trellis coder, we can reach higher security. Actually, the role of the pseudo-randomizer key is to disperse undetectable MVs.

  4. 4)

    A smaller change in motion vectors leads to a smaller change in the statistical characteristics of the video. Hence, efficient usage of compression techniques which support a smaller sub-pixel motion estimation can substantially contribute to improving the steganography performance.

  5. 5)

    Purposive selection of video compression settings can enhance steganographic security. As a concrete example, video compression settings which lead to more non-locally-optimal MVs with respect to the Lagrangian multiplier from the eavesdropper’s point of view increase the resistance of MV-based steganography against steganalysis attacks. Therefore, fast search algorithms producing more indeterministic components and consequently reducing the percentage of locally optimal MVs can provide more appropriate content for embedding.

  6. 6)

    It should not be left unmentioned that the obtained results are the average of results for different covers. It is amply clear that the selection of host media plays a prominent role in the security of steganography methods against steganalysis attacks. The existence of more deformable objects, less fixed background, faster-moving objects, and fast-moving cameras will result in a better performance of steganography.

7 Conclusion

This paper suggests an adaptive approach for hiding information in MVs of the video. According to the results, a dramatic increase in the security of the proposed method compared to current outstanding MV-based steganography methods is observed. Moreover, the proposed approach defeats the prominent steganalysis attacks. Indeed, due to unreliable evidence of modification, steganalyzers cannot distinguish a clear video from a stego one produced by the proposed algorithm. This improvement originates from three contributions:

  • Preserving temporal statistics of MVs by applying the statistical differences of the original and the best alternative MV: The most important MV-based temporal steganalytic features are extracted based on local optimality of the received MVs. MV-based steganography usually causes the MVs to shift from optimal to non-optimal. This problem can be addressed by selecting alternative MVs which have the closest local-optimality-related steganalytic features at the receiver’s side. All the information which are available in the steganalyzer’s side can be utilized in the steganographer side, whereas some information being available in the transmitter side may vanish during the lossy compression procedure. This superiority should be utilized against eavesdroppers. We have proposed a method to select the MVs with closest local-optimality-related steganalytic features to that of the original MVs. Moreover, the embedding cost function is designed so that alternative MVs impose lower embedding costs.

  • Preserving spatial statistics of MVs by applying their correlation to the cost function: MVs may have specific patterns in some areas of the video. The less the entropy of MVs around a particular MV is, the more perceptible changes its altering will impose on the statistical characteristics of the output video. For instance, if all of the neighbors of an MV are equal to zero, changing this MV may result in moving some feature vectors of the detector to a particular area. As a consequence, the margin between the two classes of clean and dirty media will increase, and the detector can reach more distinctive classes. By contrast, if all of the neighboring MVs are different together, manipulating the central MV leads to scattering the feature vectors to different areas and this sparse shift will cause an increase in security against steganalysis attacks. The designed embedding cost has an indirect relationship with the entropy of neighboring MVs. Thus the syndrome-trellis-coder is encouraged to choose MVs from deformable dynamic regions rather than static regions.

  • Improving the availability of proportionate MVs for syndrome-trellis coder using an adjustable pseudorandomization key: An optimized embedding cost function is the necessary condition for a secure steganography method, but not the sufficient condition. In fact, the distribution of proper hosts to be selected by the syndrome-trellis coder is as important as a perfect cost function. The syndrome-trellis coder sometimes does not have access to low-cost MVs for manipulation. The mentioned problem is supposed to be alleviated to a great extent by re-ordering MVs.

It is expected that the detector’s reliability will drop further in real-world conditions where the exact information about the embedding rate and ME algorithm are not available [28]. As a result, secure steganography with higher embedding rates is possible.

The proposed approach also causes a slight improvement in compression ratio and visual quality, as well as achieving the smallest computational cost in comparison with rivals.