1 Introduction

Network security is becoming increasingly important as more and more data is exchanged over the Internet. Data security and confidentiality are therefore essential to prevent unauthorized access, and the field of information hiding has grown considerably as a result. Information hiding techniques have been applied efficiently in different applications, such as military and radar, healthcare, anti-crime, and commercial applications [16, 21, 22]. Three methods can be employed to protect a hidden message: cryptography, watermarking, and steganography [16].

Cryptography converts a plain message into an unreadable one, called a cipher message. To decipher this unreadable message, a secret key is required, which is used by the algorithm that performs the encryption or decryption process [6, 39]. Watermarking and steganography are also used for data hiding, by hiding one medium inside another. In steganography, the hidden data cannot be seen or detected [3, 25, 28]. In watermarking, however, the information need not be completely hidden; it should be secure, and a watermark is embedded into the data to protect the privacy of the user. Watermarking is thus the procedure of hiding data in multimedia content so that it is not perceived by humans but can be clearly recovered by a detector or a computer. This makes watermarking suitable for many applications and services, such as authentication, fingerprinting, copy control, and copyright protection. The watermarking process involves an embedding process and an extraction process, and it has features such as robustness, fidelity, and tamper resistance [2, 30].

Steganography has become an essential process for identification and authentication applications. It can be applied to several types of data, such as audio, video, and images, and it can hide any kind of digital information. Steganography and steganalysis are two contending concepts [43]. Steganalysis, also called steganography detection, is the science of identifying data that has been concealed in digital multimedia content by a steganography process. It can help to avoid unfortunate security incidents. The detectability depends on the amount of hidden data relative to the size of the cover multimedia data. There are various steganalysis methods, such as carrier comparison, structural inspection, and statistical analysis. Steganalysis is divided into two main categories: statistical analysis and visual analysis [54].

Steganography techniques are mainly classified into spatial domain-based schemes, frequency domain-based schemes, and adaptive schemes. The spatial domain-based schemes are generally classified into Pixel Value Difference (PVD) schemes, Least Significant Bit (LSB) schemes, and machine learning-based schemes [13]. In the LSB methods, the hidden data is distributed among the LSBs of the pixels. The PVD methods are utilized to differentiate the smooth areas from the edge areas. Machine learning is a subset of artificial intelligence that provides the ability to learn without being explicitly programmed. Frequency domain-based techniques include the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT). They convert an image or a video frame from its original spatial-domain form to the frequency domain [23, 47]. These methods are slower and more difficult than spatial domain-based methods, since the cover image or cover video frame must be transformed into frequency-domain coefficients before the secret information is embedded. Each of these methods has its own advantages and disadvantages [1].

Both cryptography and steganography can be used to improve the confidentiality and privacy of data. Steganography differs from cryptography in that steganography concentrates on keeping the existence of the data secret, while cryptography concentrates on keeping the contents of the data secret [14, 40]. Many researchers have developed steganography algorithms that retain confidentiality by embedding the data in various carriers, such as video, image, audio, and text [10, 41]. Noda et al. [36] developed a steganography technique that embeds large amounts of data in the frames of lossy compressed video. The technique utilizes the Bit Plane Complexity Segmentation (BPCS) steganography method, and wavelet compression is utilized for compressing the video data. Specifically, the Motion-JPEG2000 and 3-D Set Partitioning In Hierarchical Trees (SPIHT) wavelet-based video coding approaches are utilized. The wavelet coefficients in the video frames are quantized into an arrangement of bit-planes, and hence BPCS steganography can be performed in the wavelet domain. Both the integration of Motion-JPEG2000 with BPCS steganography and the integration of 3-D SPIHT with BPCS steganography are presented and evaluated. The simulation results showed that 3-D SPIHT with BPCS steganography achieved better performance than Motion-JPEG2000 with BPCS steganography.

Mansouri et al. [29] suggested an adaptive steganography scheme based on spatial and temporal features of the human visual system and video signal features to hide secret information in a coded video sequence. The motion vectors of the P-VOP (Video Object Plane), B-VOP, and qualified-DCT coefficients of I-VOP are utilized for the spatial and temporal characteristics of the video, respectively. The extraction of the hidden data is performed without full decompression. The simulation outcomes proved that the suggested adaptive steganography method has high capacity and imperceptibility. Dasgupta et al. [4] developed a video steganography algorithm for hiding the secret data in video frames based on the LSB method. The proposed algorithm is improved by utilizing the genetic algorithm that is developed to obtain an optimum embedded message imperceptibility. Video quality and imperceptibility are the main factors to measure the efficiency and effectiveness of secret data hiding and extraction. The tests and findings proved that the suggested algorithm improved the image fidelity and PSNR.

Ramalingam et al. [41] offered a video steganography scheme based on a Markov model to enhance the embedding and extraction speed of the embedded message. They utilized the state transition dynamics and the conditional states among the video frames to hide and extract the secret message. The simulation results proved that the suggested algorithm improved the information embedding time by 3–50%, the extraction rate of secret message enhanced by 21–76%, with a processing computational cost of 21–90%, and the protection performance enhanced by 5–76% compared to the literature algorithms. Mihara et al. [32] presented a quantum steganography approach based on prior entanglement, where it depends on quantum physics. This approach is based on combining the quantum error-correcting codes and prior entanglement that allows the hidden message and cover message content to be created, separately. In several steganography methods, hiding a secret message in error-correcting codes may give rise to harm if the hidden region is corrupted. The proposed approach considered that the essential cover message form must not be changed due to the hidden message.

Sushmitha et al. [48] developed a video steganography system based on the DWT that hides a video, as the secret data, in another video stream, as the cover data. The authors also extended the system to embed two secret videos in one cover video. The experimental outcomes achieved acceptable PSNR with negligible degradation in video quality, and the differences among the original video, the stego video, and the reconstructed secret video are not noticeable. Khalil et al. [19] presented an efficient technique for embedding audio samples in the quaternion frequency domain of a digital image. Each sample of the audio signal is combined with the pixel values of the three color components of the cover image to yield a quaternion number. The absolute value of this quaternion number is communicated, and the original sample of the audio signal is then obtained at the receiver using simple quaternion mathematics. Ramadhan et al. [33] suggested a secure and robust HEVC steganography method based on error-correcting codes and a multiple object tracking (MOT) algorithm in the DCT and DWT domains. The secret message is first encoded and subsequently inserted into the DCT and DWT coefficients of the regions of interest of the host HEVC frames. The outcomes demonstrated that the suggested scheme increased the imperceptibility, embedding capacity, and security performance against different attacks.

Khalil et al. [20] suggested a steganography methodology based on the quaternion Fourier transform to conceal text information in a digital cover image. The secret information is transferred to the quaternion domain before it is embedded, and the digital cover image is also represented in the quaternion domain. The simulation results showed that the embedded message is extracted without changes from the original message and that the cover image can be transmitted without noticeable changes. Hashemzadeh et al. [10] suggested a video steganography scheme that exploits the weakness of the human visual system in perceiving modifications in dynamic scenes. The algorithm detects highly dynamic regions of the video scene. Then, it utilizes these regions to embed the data and computes the amount of data that will be hidden in the detected dynamic regions. The scene dynamics are determined from the motion clues of feature points, and the capacity of each hiding pixel is determined by statistical indicators based on the behavior of the feature points. The simulation outcomes demonstrated that the proposed scheme achieved acceptable performance compared with the existing algorithms.

Wang et al. [52] introduced an improved intra-prediction mode (IPM)-based HEVC steganography scheme. The security efficiency of stego HEVC frames is achieved by embedding secret messages in the coding and prediction units of the compressed video frames. Simulation results clarified that the presented scheme could be implemented easily, and it could maintain the video quality. Shuyang et al. [27] presented a robust secret sharing based HEVC steganography algorithm. In the proposed algorithm, three different intra-frame prediction classes are utilized to avoid the distortion drift of the HEVC intra-frames. In the proposed algorithm, the embedded data has been coded into various sub-secrets to enhance the robustness performance of the secret message. After that, the encoded data is inserted into the frequency coefficient values of the luminance blocks of the HEVC frames. The simulation outcomes proved the survival of the proposed algorithm against attacks compared to the literature algorithms. Dong et al. [5] suggested an efficient HEVC steganography scheme based on small-sized Prediction Blocks (PBs) and large-sized PBs. The proposed technique exploits the feature of multi-sized prediction modes in the HEVC coding to perform the steganography process. The advantages of the proposed technique are the enhancement of embedding capacity and the preservation of coding efficiency.

Zhe et al. [24] studied an efficient HEVC steganography methodology that transmits video frames with great privacy. The proposed steganography scheme depends on the DCT domain and chaotic logistic mapping to improve video privacy and protection. Mehmet et al. [22] presented an effective HEVC hiding technique that achieves higher levels of capacity and fidelity. The proposed data hiding scheme depends on a matrix coding scheme that embeds the secret data into the discrete sine frequency coefficients of the high-efficiency video encoded frames. The proposed technique also avoids the error propagation resulting from the embedding process, to achieve a minimum distortion level in the visual quality of the transmitted HEVC frames. Galiano et al. [7] suggested an HEVC hiding approach based on changing the luminance blocks of the video frames to embed and recover the secret data in HEVC frames. Simulation results showed that the proposed hiding scheme could recover the secret information, maintain good visual quality, and remain robust to most of the steganalysis attacks.

The main disadvantages of the related steganography techniques, which are avoided in our proposed QFFT-based HEVC steganography approach, are as follows:

  • Many of the related techniques embed the original secret message directly, without any pre-processing stages such as compression or encryption.

  • Many of the related techniques have not achieved significant results on evaluation metrics such as PSNR, SSIM, FSIM, and the correlation coefficient.

  • Many of the related techniques failed to recover the secret message at the receiver side.

  • Many of the related techniques employ only two or three test digital images or videos for assessment and evaluation.

  • Many of the related techniques have not studied a wide range of steganalysis attacks on the stego videos.

  • Many of the related techniques have not studied the effect of noise attacks on the stego videos to properly evaluate the performance of the presented algorithm.

  • Many of the related techniques incur a high computational cost, or do not consider the processing time of the steganography process.

  • Many of the related techniques have not carried out extensive comparisons with previous related works.

Due to the practical and academic importance of HEVC security in different multimedia applications, we are motivated to introduce an efficient HEVC steganography method. Considering the limitations of the algorithms in the literature, we propose a covert HEVC steganography scheme in this paper to study the possibility of hiding an encrypted secret audio message within the stego HEVC frames in a complicated and secure manner. The main contribution of the suggested HEVC steganography scheme is that it hides a large quantity of secret information while achieving higher robustness against multimedia steganalysis than the previous approaches. The suggested scheme also ensures that the secret information remains unobserved under intrusion attempts on the cover HEVC frames. Therefore, instead of directly hiding the plain audio message within the stego-media, the proposed approach applies two consecutive layers of encryption to the plain audio message before embedding it within the video signal. The first encryption layer is performed using random projection encryption based on the Legendre sequence in the DWT domain. In the second encryption layer, the resulting audio message is represented as quaternion numbers using the QFFT technique. In the embedding process, quaternion mathematics is employed to transfer the HEVC video frames to the quaternion format to hold the secret data, which is also represented in the quaternion domain.

Thus, the major novelty of this work is the introduction of an effective HEVC video steganography approach that hides a block of compressed and encrypted audio data within HEVC frames by using quaternion mathematics and the QFFT. First, the quaternion information is transferred to the frequency domain using the QFFT technique, and each cover video frame is converted to a quaternion form. The proposed approach can hide a large amount of secret audio information in the HEVC video frames while achieving high imperceptibility. At the receiver, the hidden secret audio information can be extracted using straightforward quaternion functions and mathematics. The suggested approach is more robust against multimedia steganalysis attempts than the previous approaches. This article is structured as follows. Section 2 presents the basics of the related concepts used in this paper. The proposed HEVC video steganography approach is introduced in section 3. Section 4 provides the simulation outcomes and comparison analyses to evaluate the performance of the suggested steganography scheme. Section 5 concludes the paper and recommends some future works.

2 Preliminaries

In this section, the basic concepts of the DWT, random projection process, Legendre sequences, DCT-based compression process, and the concepts of quaternion mathematics used in this paper are discussed.

2.1 Basics of the DWT

The DWT can be utilized for the multiresolution decomposition of a speech signal. It decomposes a given audio signal of length L into two sub-bands of different scales, each of length L/2, so that each scale can be investigated independently. The multiresolution output of the DWT consists of the detail and approximation coefficients, which contain the high-frequency and low-frequency components, respectively. The approximation coefficients are generated by passing the speech signal through a low-pass filter, while the detail coefficients are generated by passing it through a high-pass filter. These coefficients can be utilized to build a model of speech and audio signals [26]. The wavelet process is illustrated in Fig. 1. In this paper, the DWT is employed to convert the input speech signal from the time domain to a transform domain. Then, an encryption process, which is considered a diffusion process, is applied to the obtained coefficients. The encryption can be accomplished by changing the values of the coefficients (diffusion) or their positions (permutation) to provide more confidentiality.

Fig. 1 Concept of the wavelet analysis

The outputs of the two filters can be described as follows [46]:

$$ {y}_{low}\left[n\right]=\sum \limits_{k=-\infty}^{\infty }x\left[k\right]\ h\left[2n-k\right] $$
(1)
$$ {y}_{high}\left[n\right]=\sum \limits_{k=-\infty}^{\infty }x\left[k\right]\ g\left[2n-k\right] $$
(2)

where x[k] is the input speech signal, and g[n] and h[n] are the impulse responses of the high-pass filter and the low-pass filter, respectively.
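The filtering and downsampling in Eqs. (1) and (2) can be illustrated with a minimal sketch. The Haar filters used here are an assumption made for illustration, since the text does not fix the wavelet, and the function name `dwt_level` is hypothetical. The sketch keeps the odd-indexed samples of each convolution, which corresponds to Eqs. (1) and (2) with the downsampling grid shifted by one sample, so that an input of even length L yields exactly L/2 coefficients per sub-band.

```python
import numpy as np

# Haar analysis filters (assumed choice; the paper does not name the wavelet).
H_LOW = np.array([1.0, 1.0]) / np.sqrt(2.0)    # h[n], low-pass impulse response
G_HIGH = np.array([1.0, -1.0]) / np.sqrt(2.0)  # g[n], high-pass impulse response


def dwt_level(x):
    """One decomposition level: filter with h and g, then downsample by two."""
    x = np.asarray(x, dtype=float)
    # np.convolve(x, h)[m] = sum_k x[k] h[m - k]; keep every second sample.
    low = np.convolve(x, H_LOW)[1::2]    # approximation coefficients
    high = np.convolve(x, G_HIGH)[1::2]  # detail coefficients
    return low, high


signal = np.arange(1.0, 9.0)             # toy signal of length L = 8
approx, detail = dwt_level(signal)       # two sub-bands of length L/2 = 4
```

With orthonormal filters such as these, the energy of the two sub-bands together equals the energy of the input signal, which is a quick sanity check for any implementation.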

2.2 Basics of random projection process

It is known that the main objective of any data transformation using projection is to preserve as much information as possible between the original and the transformed data sets while achieving a better representation of the data in its new form. Random projection projects data points onto random directions that are independent of the dataset [31]. It can be considered a locality-sensitive hashing method that is used for data hiding and security applications [35, 37].

The random projection is obtained by projecting the original signal onto a random space [17, 38]. It is computed by multiplying the original signal by a random matrix, as expressed in (3):

$$ {\overset{\rightharpoonup }{\mathbf{Y}}}_{k\times N}={\mathbf{A}}_{k\times d}\ {\overset{\rightharpoonup }{\mathbf{X}}}_{d\times N} $$
(3)

where Y and X represent the output and input vectors, respectively, and A represents the random matrix.

It is worth mentioning that random projection preserves the distances between the original data points in the projected data with high probability. It is proved in [31] that the required distances in the random space can be obtained using Gaussian random matrices. In this paper, the random matrix is generated using Legendre sequences. Therefore, the proposed HEVC steganography system will be more robust against attacks.
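The distance-preserving behavior of Eq. (3) can be checked numerically with a short sketch. It uses a Gaussian random matrix, as in the result cited from [31], rather than the Legendre-based matrix of the proposed scheme; the function name `random_projection`, the scaling by 1/sqrt(k), and the dimensions are assumptions chosen for illustration.

```python
import numpy as np


def random_projection(X, k, seed=0):
    """Project the d x N data matrix X to k x N using Y = A X, as in Eq. (3).

    A is a Gaussian random matrix scaled by 1/sqrt(k) (an assumed, common
    choice) so that squared distances are preserved in expectation.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[0]
    A = rng.standard_normal((k, d)) / np.sqrt(k)
    return A @ X, A


# Toy data: N = 5 points in d = 1000 dimensions, projected down to k = 256.
data = np.random.default_rng(1).standard_normal((1000, 5))
projected, A = random_projection(data, k=256)

# Ratio of one pairwise distance after and before projection (close to 1).
ratio = (np.linalg.norm(projected[:, 0] - projected[:, 1])
         / np.linalg.norm(data[:, 0] - data[:, 1]))
```

The ratio approaches 1 as k grows, which is the sense in which the projection retains distances with high probability.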

2.3 Basics of Legendre sequences

Legendre sequences are pseudorandom sequences that are generated using a deterministic algorithm. Legendre and other pseudorandom sequences have broad applications in several disciplines, such as keystream sequences in stream ciphers, communication systems, computer simulation, cryptography, and steganography. Pseudorandom sequences are required to satisfy several randomness properties; otherwise, attacks can be launched based on the statistical deviation between the pseudorandom sequences and truly random sequences. In [8], the authors proposed three axioms for the appearance of binary periodic pseudorandom sequences: balance, the run property, and ideal autocorrelation. A pseudorandom sequence generator should have the following characteristics: good randomness of the output sequences, a long period, speed and efficiency, reproducibility, and ease of implementation.

For a prime p > 2, let Sn be the Legendre sequence that is defined as [8, 53]:

$$ {S}_n=\left\{\begin{array}{c}1,\kern2.75em \left(\frac{n}{p}\right)=-1,\\ {}0,\kern2.25em Otherwise,\end{array}\right.\kern3em n\ge 0 $$
(4)

where \( \left(\frac{n}{p}\right) \) denotes the Legendre symbol.
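Eq. (4) can be evaluated directly with a few lines of code. The sketch below computes the Legendre symbol with Euler's criterion, a standard identity that is not stated in the text; the function names are hypothetical.

```python
def legendre_symbol(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n, (p - 1) // 2, p)   # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


def legendre_sequence(p, length=None):
    """Binary Legendre sequence of Eq. (4): S_n = 1 iff (n/p) = -1."""
    length = p if length is None else length
    return [1 if legendre_symbol(n % p, p) == -1 else 0 for n in range(length)]


one_period = legendre_sequence(7)         # quadratic non-residues mod 7: 3, 5, 6
two_periods = legendre_sequence(7, 14)    # the sequence repeats with period p
```

For p = 7 the quadratic residues are 1, 2, and 4, so the ones in the sequence appear at positions 3, 5, and 6 of each period.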

The Legendre sequence is a binary sequence with many interesting randomness properties, such as ideal periodic and aperiodic autocorrelation functions and large linear complexity. The linear complexity can be determined with the Berlekamp-Massey algorithm [8]. The Legendre sequence also has a large linear k-error complexity, which does not decrease under alterations of its initial conditions. These significant features motivated us to employ the Legendre sequence for the encryption process in our proposed algorithm. The linear complexity of a Legendre sequence is determined in [53].

2.4 Basics of DCT-based compression algorithm

Data compression is an approach that is employed to decrease the data size required to represent the sampled digital data, and it therefore decreases the storage cost and the transmission rate. Compression is either lossless or lossy [42]. In lossless coding approaches, the original information can be completely retrieved from the encoded (compressed) data. In lossy compression approaches, the data retrieved from the compressed data is not completely identical to the original data. The most widely utilized lossy compression approaches are based on the DCT, which converts a signal into its elementary frequency components. The DCT is widely utilized in image compression applications, and it is a close relative of the Discrete Fourier Transform (DFT). For compression, the DCT achieves better performance than the DFT, because it concentrates the low frequencies, which carry the useful data, in the upper-left corner of the block. The DCT quantization is based on maintaining the low frequencies and zeroing the remaining entries. This considerable advantage of the DCT motivated us to employ it for the compression process in our proposed algorithm.

During the encoding process, different operations such as quantization, zig-zag scanning, run-length encoding, and variable-length encoding are applied after the DCT. This process flow is inverted during the decoding process. In this work, we employed the DCT-based compression algorithm to compress the audio signal, where the audio signal is converted to a square matrix before applying the compression process. The proposed DCT-based compression approach utilizes different quantization matrices for the DCT coefficients, and the level of the quantization matrix is chosen according to the standard deviation of the blocks of DCT coefficients.
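The core idea of keeping the low-frequency DCT coefficients can be sketched as follows. This is a simplified illustration, not the quantization-matrix scheme of the proposed approach: the audio signal is reshaped into a square matrix, a 2-D orthonormal DCT is applied, and only the upper-left `keep` x `keep` corner is retained. The function names and the `keep` parameter are hypothetical.

```python
import math
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C


def compress(signal, keep):
    """Reshape to a square matrix, apply the 2-D DCT, and keep the
    upper-left keep x keep block of low-frequency coefficients."""
    n = math.isqrt(len(signal))
    X = np.asarray(signal[: n * n], dtype=float).reshape(n, n)
    C = dct_matrix(n)
    coeffs = C @ X @ C.T
    mask = np.zeros((n, n))
    mask[:keep, :keep] = 1.0
    return coeffs * mask


def decompress(coeffs):
    """Inverse 2-D DCT followed by flattening back to a 1-D signal."""
    C = dct_matrix(coeffs.shape[0])
    return (C.T @ coeffs @ C).ravel()


t = np.arange(64) / 64.0
audio = np.sin(2 * np.pi * 3 * t)            # toy "audio" with 64 samples
lossless = decompress(compress(audio, 8))    # all coefficients kept
lossy = decompress(compress(audio, 4))       # only 16 of 64 coefficients kept
```

Keeping all coefficients reproduces the signal exactly, while smaller values of `keep` trade reconstruction accuracy for fewer stored coefficients.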

2.5 Basics of quaternion definition and algebras

Hamilton discovered a way to multiply in four dimensions, not three [9]. The resulting number is called a quaternion, which has four components: one real and three imaginary. The set of real quaternions \( \mathbb{H} \) is given as follows:

$$ \mathbb{H}=\left\{a+ bi+ cj+ dk:a,b,c,d\in \mathbb{R}\right\} $$
(5)

where \( \mathbb{H} \) is a 4-dimensional vector space over \( \mathbb{R} \), the field of real numbers, the set {1, i, j, k} is a natural basis for this vector space, and the following rules are imposed:

$$ {\displaystyle \begin{array}{c}{i}^2={j}^2={k}^2= ijk=-1\\ {} ij=k,\kern0.75em jk\kern0.5em =i,\kern0.75em ki=j\\ {} ji=-k,\kern0.5em kj\kern0.5em =-i, ik=-j\end{array}} $$
(6)

The rules for multiplying i, j, and k by each other can be remembered by placing them in alphabetical order around a circle, as explained in [12].

figure a

Products that follow the clockwise order get a positive sign, and products that go against the order get a negative sign, e.g., ki = j and ik = −j. For a quaternion,

$$ q=a+ bi+ cj+ dk\kern1em \in \mathbb{H} $$
(7)

its conjugate \( \overline{q} \) is defined as

$$ \overline{q}=a- bi- cj- dk\kern1em \in \mathbb{H} $$
(8)

which has the properties \( \overline{q_1+{q}_2}=\overline{q_1}+\overline{q_2},\kern0.5em \overline{q_1{q}_2}=\overline{q_2}\ \overline{q_1},\kern0.5em \overline{\overline{q}}=q \). The norm of q is given as follows.

$$ N(q)={a}^2+{b}^2+{c}^2+{d}^2 $$
(9)

Then, \( \overline{q}q=q\overline{q}=N(q) \) and \( N\left({q}_1{q}_2\right)={q}_1{q}_2\overline{q_1{q}_2}={q}_1{q}_2\overline{q_2}\ \overline{q_1}={q}_1N\left({q}_2\right)\ \overline{q_1}={q}_1\ \overline{q_1}N\left({q}_2\right)=N\left({q}_1\right)N\left({q}_2\right). \) The imaginary and real components of q are defined as \( \operatorname{Im}(q)= bi+ cj+ dk\in \mathbb{I} \) and \( \operatorname{Re}(q)=a\in \mathbb{R} \). If q ≠ 0, then the inverse of \( q\in \mathbb{H} \) is given as follows.

$$ {q}^{-1}=\frac{\overline{q}}{N(q)} $$
(10)

A pure quaternion is a quaternion with zero real part, while a unit quaternion is a quaternion with unit modulus. An example of a unit pure quaternion is

$$ q=\frac{i+j+k}{\sqrt{3}} $$
(11)
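The algebra in Eqs. (5) to (11) can be checked with a small sketch in which a quaternion a + bi + cj + dk is stored as the tuple (a, b, c, d). The function names are hypothetical, and the component formulas in `qmul` follow from expanding the product with the rules of Eq. (6).

```python
import math


def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def qconj(q):
    """Conjugate, Eq. (8)."""
    a, b, c, d = q
    return (a, -b, -c, -d)


def qnorm(q):
    """Norm N(q) = a^2 + b^2 + c^2 + d^2, Eq. (9)."""
    return sum(x * x for x in q)


def qinv(q):
    """Inverse of a nonzero quaternion: conjugate divided by the norm."""
    n = qnorm(q)
    return tuple(x / n for x in qconj(q))


i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
q1, q2 = (1, 2, 3, 4), (5, 6, 7, 8)
product = qmul(q1, q2)
mu = tuple(x / math.sqrt(3) for x in (0, 1, 1, 1))   # unit pure quaternion, Eq. (11)
```

Because multiplication is noncommutative, `qmul(i, j)` and `qmul(j, i)` differ in sign, and the conjugate of a product reverses the order of the factors.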

The quaternion imaginary quantity can be expressed as three components that can be represented graphically in a three-space vector. Therefore, the quaternion number q can be explained as a summation of two parts: a scalar part S(q) and a vector part V(q) as follows:

$$ q=S(q)+\mathbf{V}(q) $$
(12)

where V(q) is a composition of three imaginary components:

$$ \boldsymbol{V}(q)= bi+ cj+ dk $$
(13)

and S(q) is the real component (S(q) = a).

The Discrete Quaternion Fourier Transform (DQFT) has been defined in [12]. There are three different types of the DQFT: the two-sides DQFT, the left-side DQFT, and the right-side DQFT. These types arise because quaternion multiplication is noncommutative. They can be mathematically represented as follows [12]:

  1. The two-sides DQFT:

$$ {F}_{L-R}\left(u,v\right)=\frac{1}{\sqrt{MN}}\sum \limits_{x=0}^{M-1}\sum \limits_{y=0}^{N-1}{e}^{-\mu 2\pi \frac{xu}{M}}f\left(x,y\right){e}^{-\mu 2\pi \frac{vy}{N}} $$
(14)

  2. The left-side DQFT:

$$ {F}_L\left(u,v\right)=\frac{1}{\sqrt{MN}}\sum \limits_{x=0}^{M-1}\sum \limits_{y=0}^{N-1}{e}^{-\mu 2\pi \left(\frac{xu}{M}+\frac{vy}{N}\right)}f\left(x,y\right) $$
(15)

  3. The right-side DQFT:

$$ {F}_R\left(u,v\right)=\frac{1}{\sqrt{MN}}\sum \limits_{x=0}^{M-1}\sum \limits_{y=0}^{N-1}\ f\left(x,y\right)\kern0.5em {e}^{-\mu 2\pi \left(\frac{xu}{M}+\frac{vy}{N}\right)} $$
(16)

where μ is any unit pure quaternion.

$$ {F}_f^{-q}\left[{F}_f^q\right]\left(m,n\right)=f\left(m,n\right)=\frac{1}{MN}\sum \limits_{u=0}^{M-1}\sum \limits_{v=0}^{N-1}{F}_f^q\left(u,v\right){e}^{\mu 2\pi \left(\frac{mu}{M}+\frac{nv}{N}\right)} $$
(17)

The Inverse Discrete Quaternion Fourier Transform (IDQFT) can be represented mathematically as follows [12]:

  1. The two-sides IDQFT:

$$ f\left(x,y\right)=\frac{1}{\sqrt{MN}}\sum \limits_{u=0}^{M-1}\sum \limits_{v=0}^{N-1}{e}^{\mu 2\pi \frac{xu}{M}}{F}_{L-R}\left(u,v\right){e}^{\mu 2\pi \frac{vy}{N}} $$
(18)

  2. The left-side IDQFT:

$$ f\left(x,y\right)=\frac{1}{\sqrt{MN}}\sum \limits_{u=0}^{M-1}\sum \limits_{v=0}^{N-1}{e}^{\mu 2\pi \left(\frac{xu}{M}+\frac{vy}{N}\right)}{F}_L\left(u,v\right) $$
(19)

  3. The right-side IDQFT:

$$ f\left(x,y\right)=\frac{1}{\sqrt{MN}}\sum \limits_{u=0}^{M-1}\sum \limits_{v=0}^{N-1}\ {F}_R\left(u,v\right)\kern0.5em {e}^{\mu 2\pi \left(\frac{xu}{M}+\frac{vy}{N}\right)} $$
(20)
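The left-side transform pair of Eqs. (15) and (19) can be verified on a small array with a direct, unoptimized sketch. A quaternion array is stored with shape (M, N, 4), and the axis μ is assumed to be the unit pure quaternion of Eq. (11); the function names are hypothetical, and the nested loops are meant for checking the definitions rather than for speed.

```python
import numpy as np

# Assumed transform axis: mu = (i + j + k) / sqrt(3), as in Eq. (11).
MU = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)


def qmul(p, q):
    """Hamilton product of two quaternions stored as length-4 arrays."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])


def qexp_mu(theta):
    """exp(mu * theta) = cos(theta) + mu * sin(theta) as a length-4 array."""
    return np.concatenate([[np.cos(theta)], MU * np.sin(theta)])


def dqft_left(f, sign=-1.0):
    """Left-side DQFT of Eq. (15); sign=+1 gives the inverse of Eq. (19)."""
    M, N, _ = f.shape
    F = np.zeros((M, N, 4))
    for u in range(M):
        for v in range(N):
            for x in range(M):
                for y in range(N):
                    theta = sign * 2.0 * np.pi * (x * u / M + y * v / N)
                    F[u, v] += qmul(qexp_mu(theta), f[x, y])  # kernel on the left
    return F / np.sqrt(M * N)


def idqft_left(F):
    """Left-side IDQFT: same kernel with the opposite sign in the exponent."""
    return dqft_left(F, sign=+1.0)


f = np.random.default_rng(0).random((3, 4, 4))   # toy 3 x 4 quaternion array
recovered = idqft_left(dqft_left(f))
```

The round trip recovers the input because all kernels share the same axis μ and therefore commute with each other, even though general quaternions do not.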

A pixel of a color image or a color video frame can be represented as a pure quaternion when a quaternion form is required [18]. It has three components: red, green, and blue (RGB). The imaginary parts of a pure quaternion can be used to represent the RGB components. A pixel f(x, y) at image coordinates (x, y) in an RGB digital image can be expressed as follows:

$$ f\left(x,y\right)=0a+ Ri+ Gj+B\mathrm{k} $$
(21)

where R, G, and B are the red, green, and blue parts of a color digital image, respectively. More details and explanations of the above-mentioned equations and their terminology can be found in [9, 12, 18].
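The pure-quaternion representation of Eq. (21) amounts to prepending a zero real channel to the three color channels. A minimal sketch, assuming the frame is stored as an (H, W, 3) array in R, G, B order and using hypothetical function names:

```python
import numpy as np


def frame_to_quaternion(frame_rgb):
    """Map an (H, W, 3) RGB frame to an (H, W, 4) array (0, R, G, B), Eq. (21)."""
    frame = np.asarray(frame_rgb, dtype=float)
    real = np.zeros(frame.shape[:2] + (1,))      # zero real part
    return np.concatenate([real, frame], axis=-1)


def quaternion_to_frame(q):
    """Drop the real part and return the (H, W, 3) vector part."""
    return q[..., 1:]


toy_frame = np.arange(24).reshape(2, 4, 3)       # toy 2 x 4 RGB frame
q_frame = frame_to_quaternion(toy_frame)
```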

3 Proposed QFFT-based HEVC steganography approach

The proposed QFFT-based HEVC steganography approach comprises two main processes. The first hides a secret audio message within the stego video frame (the cover after embedding the secret information), and the second recovers the hidden audio message. The embedding procedure hides an encrypted secret audio message within the cover video frames. It is important to ensure that the secret information remains unnoticeable to intruders who intercept the cover HEVC frames. The extraction procedure reconstructs the hidden secret information at the receiver side with good quality and without distortion. To ensure that the proposed approach offers both high payload and high security, efficient cryptographic and compression techniques are employed as pre-processing steps on the secret audio message.

3.1 Proposed embedding procedure

The flow diagram of the whole embedding procedure of the proposed QFFT-based HEVC steganography approach is shown in Fig. 2. The embedding procedure requires two main inputs: the secret audio message and the cover HEVC frame. The cover HEVC video frames serve as the stego media that will contain the secret audio message. Each input video frame is resized into a square frame with height h and width w = h, and it is called a stego video frame. Hence, the length of the secret audio message that is embedded into a cover video frame must not exceed L = h × w = h². To embed a large amount of audio data in a stego cover, the transmitted audio message can be compressed prior to embedding. In this paper, the DCT-based compression method is employed for compressing the audio signal. The audio signal is converted to a square matrix before applying the DCT-based compression technique, which is based on various quantization matrices of the DCT coefficients.

Fig. 2 The flow diagram of the embedding procedure of the suggested steganography approach

To make the suggested steganography scheme more resistant to steganalysis, the compressed message (audio signal) is encrypted prior to embedding it into the cover video frame using a random projection algorithm based on the Legendre sequence. The schematic description of the proposed encryption process is illustrated in Fig. 3. It consists of two stages: the DWT analysis stage and the random projection encryption stage. In the DWT analysis stage, the secret audio information is decomposed into two sub-bands using the DWT, which converts the input audio signal from the time domain to the transform domain. The two sub-bands are the approximation and detail coefficients. The second stage applies the random projection to the wavelet coefficients. The random matrix used in the random projection process is generated using the Legendre sequence, which is a pseudo-random binary sequence used as a key for the encryption process. It is a semi-random sequence in the sense that it seems random within the sequence length, and it satisfies the randomness requirements. The Legendre matrix is produced from a set of prime numbers according to the number of coefficients in each sub-band. This matrix consists of a sequence that is repeated to create the required length. This sequence is used to encrypt the approximation and detail coefficients separately. The encrypted secret audio signal is generated by concatenating the two encrypted sub-bands; therefore, the secret audio message has the same length as the original audio message.

Fig. 3 The description of the suggested encryption algorithm

At the receiver, the audio message is recovered by partitioning the encrypted audio message into two parts and applying the inverse random projection to each part separately. Then, the inverse DWT is employed to reconstruct the compressed audio message. Finally, decompression is applied to this signal to recover the original audio signal.
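The encryption and decryption stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the single-level Haar wavelet, the function names, and the prime length p = 7 are assumptions made for illustration, and the random projection matrix is simplified to a diagonal ±1 matrix built from the repeated Legendre sequence, so that its inverse is simply itself.

```python
import numpy as np

def legendre_sequence(p: int) -> np.ndarray:
    """Return the +1/-1 Legendre sequence of odd prime length p."""
    seq = np.ones(p, dtype=int)  # index 0 is set to +1 by convention
    for n in range(1, p):
        # Euler's criterion: n^((p-1)/2) mod p equals 1 for quadratic residues
        seq[n] = 1 if pow(n, (p - 1) // 2, p) == 1 else -1
    return seq

def keyed_signs(length: int, p: int) -> np.ndarray:
    """Repeat the length-p Legendre sequence until it covers `length` coefficients."""
    reps = -(-length // p)  # ceiling division
    return np.tile(legendre_sequence(p), reps)[:length]

def haar_dwt(x):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def encrypt(x, p=7):
    """Encrypt each sub-band separately, then concatenate (input length must be even)."""
    a, d = haar_dwt(x)
    return np.concatenate([a * keyed_signs(len(a), p), d * keyed_signs(len(d), p)])

def decrypt(c, p=7):
    """Split into two parts, undo the sign projection on each, then apply the inverse DWT."""
    half = len(c) // 2
    a = c[:half] * keyed_signs(half, p)  # a +1/-1 diagonal matrix is its own inverse
    d = c[half:] * keyed_signs(half, p)
    return haar_idwt(a, d)
```

In this simplified form, the key is the prime p, and the encrypted signal has the same length as its input, which matches the length-preservation property stated above.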

The resized HEVC frame is separated into its R, G, and B (red, green, and blue) components, which are used to build a quaternion array QA, in the form of a square array with a zero real component:

$$ {Q}_A=0w+ Ri+ Gj+ Bk $$
(22)

Based on the mathematical background introduced in Section 2.5, a secret audio signal with a length ≤ L is obtained in the form of a string, and each character value (c) of this string is placed in the real component of a quaternion array QS:

$$ {Q}_S= cw+0i+0j+0k $$
(23)

The quaternion Fourier transform of QS is computed as:

$$ {Q}_B= QFFT\left(\ {Q}_S\right)= mw+ xi+ xj+ xk $$
(24)

The magnitudes of the three vector parts of the generated QB are identical. This follows from a known property of quaternions: when the three vector parts of a quaternion array are all zero, its quaternion Fourier transform takes the form shown in Eq. (24). This property is central to the embedding procedure, because only two elements of the quaternion array QB, namely m and x, need to be embedded.

The next step is to insert the secret audio signal, in its quaternion transform domain form (QB), into the video frame in its quaternion form (QA). To keep the secret audio signal imperceptible, a small proportion of QB is mixed with a large proportion of QA. For this purpose, two weighting factors γ and α are used, with α + γ ≅ 1. A small value of γ and a large value of α are employed to create the stego video frame as follows:

$$ {\displaystyle \begin{array}{c}{Q}_T=\alpha\ \left(0w+ Ri+ Gj+ Bk\right)+\gamma\ \left(0w+ mi+ xj+0k\right)\\ {}{Q}_T=0w+\left(\alpha R+\gamma m\right)i+\left(\alpha G+\gamma x\right)j+\alpha Bk\end{array}} $$
(25)

The array in Eq. (25) represents the resulting stego HEVC frame, which is then transmitted over a communication channel that may be secure or insecure. The pseudocode of the suggested embedding procedure is given in Algorithm (1).

Algorithm (1) The pseudocode description of the suggested embedding procedure.
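As an illustration of Eqs. (22) to (25), the following Python sketch embeds a real-valued secret matrix into an RGB frame. It is a sketch under stated assumptions rather than the authors' MATLAB code: the QFFT is assumed to use the transform axis μ = (i + j + k)/√3, for which the QFFT of a purely real array reduces to Re(F) + μ·Im(F), with F the ordinary 2-D FFT. The function names and the values α = 0.95 and γ = 0.05 are hypothetical choices that merely satisfy α + γ ≅ 1.

```python
import numpy as np

def qfft_real(secret):
    """QFFT of a purely real quaternion array (assumed axis mu = (i+j+k)/sqrt(3)).

    Returns (m, x): the real part and the common value of the i, j, k parts,
    matching the form m*w + x*i + x*j + x*k of Eq. (24).
    """
    F = np.fft.fft2(np.asarray(secret, dtype=float))
    return F.real, F.imag / np.sqrt(3)

def embed(cover_rgb, secret, alpha=0.95, gamma=0.05):
    """Mix a small share of the transformed secret into the cover frame, as in Eq. (25).

    cover_rgb: array of shape (h, h, 3); secret: array of shape (h, h).
    The result is returned as floats, without clipping or rounding.
    """
    R, G, B = (cover_rgb[..., c].astype(float) for c in range(3))
    m, x = qfft_real(secret)
    return np.stack([alpha * R + gamma * m,   # i part: alpha*R + gamma*m
                     alpha * G + gamma * x,   # j part: alpha*G + gamma*x
                     alpha * B],              # k part: alpha*B
                    axis=-1)
```

Only the red and green channels carry the secret in this form; the blue channel is just scaled by α.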


3.2 Proposed extraction procedure

The flow diagram of the extraction procedure is shown in Fig. 4. This procedure requires two inputs for recovering the secret audio message: the received stego HEVC video and the original HEVC video. As described above, the original HEVC frame must be resized and then represented as a quaternion array, as in Eq. (26).

$$ {Q}_A=0w+ Ri+ Gj+ Bk $$
(26)

Fig. 4 The flow diagram of the extraction procedure of the suggested steganography approach

The received stego video frame is resized and then represented as a quaternion array:

$$ {Q}_a=0w+r^{\prime }i+g^{\prime }j+b^{\prime }k $$
(27)

Given the values of γ and α, and using Eqs. (24) and (25), the secret audio signal information can be calculated as:

$$ m'=\left(r'-\alpha\ R\right)/\gamma $$
(28)

$$ x'=\left(g'-\alpha\ G\right)/\gamma $$
(29)

$$ {Q}_f=m'w+x'i+x'j+x'k $$
(30)

Applying the inverse QFFT (IQFFT) to Qf yields:

$$ {Q}_{c'}= IQFFT\ \left({Q}_f\right)=S'w+ 0\ i+ 0\ j+ 0\ k $$
(31)

where S′ is the reconstructed secret audio data. The pseudocode of the suggested extraction procedure is given in Algorithm (2).

Algorithm (2) The pseudocode description of the suggested extraction procedure.
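Eqs. (28) to (31) can be illustrated with the following Python sketch, which also runs a small round trip. As before, this is an illustration under assumptions, not the authors' code: the QFFT axis is assumed to be μ = (i + j + k)/√3, so that Qf corresponds to the complex spectrum m′ + j√3·x′, and the weights α = 0.95, γ = 0.05 and the 8 × 8 random test data are hypothetical.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)

def extract(stego_rgb, cover_rgb, alpha=0.95, gamma=0.05):
    """Recover the secret matrix from the stego frame and the original cover frame."""
    m = (stego_rgb[..., 0] - alpha * cover_rgb[..., 0]) / gamma  # Eq. (28)
    x = (stego_rgb[..., 1] - alpha * cover_rgb[..., 1]) / gamma  # Eq. (29)
    # Q_f = m' + x'(i+j+k) = m' + sqrt(3)*x'*mu, which behaves like the complex
    # spectrum m' + 1j*sqrt(3)*x' under the assumed axis; invert it with ifft2.
    return np.fft.ifft2(m + 1j * SQRT3 * x).real                # Eq. (31)

# Round trip on hypothetical data: embed as in Eq. (25), then extract.
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(8, 8, 3)).astype(float)
secret = rng.integers(0, 256, size=(8, 8)).astype(float)
F = np.fft.fft2(secret)
stego = np.stack([0.95 * cover[..., 0] + 0.05 * F.real,
                  0.95 * cover[..., 1] + 0.05 * F.imag / SQRT3,
                  0.95 * cover[..., 2]], axis=-1)
recovered = extract(stego, cover)
```

In this noise-free setting the recovery is exact up to floating-point error. Any rounding or clipping of the stego frame to 8-bit pixel values, or any channel attack, perturbs r′ and g′, and the error is amplified by the factor 1/γ.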


4 Simulation results and comparative analysis

In the previous section, an efficient approach was proposed for concealing secret data (an audio signal) within cover HEVC video frames. The same approach can be employed to conceal text or an image within cover HEVC frames. First, HEVC coding is used to compress the transmitted video sequences owing to its efficient coding and decoding performance. Numerous tests and analyses were carried out on various HEVC test sequences, e.g., Balloons, Basketball, Breakdancer, PoznanHall, and Uli [34]. These sequences have different spatial and temporal characteristics. The H.265/HEVC Test Model (HM) codec [11] is employed to generate the compressed HEVC frames of each tested stream. The simulations were run on an Intel Core i7-3770 @ 2.80 GHz machine with 16 GB RAM under 64-bit Windows 10, using MATLAB R2019b, to test and evaluate the proposed QFFT-based HEVC steganography approach.

The suggested HEVC hiding approach consists of two fundamental processes: embedding and extraction. The embedding process hides a secret audio signal inside a compressed cover HEVC video frame. The extraction process recovers the secret audio signal at the receiver side. MATLAB functions are employed to build the quaternion model that represents the compressed HEVC video frame and the encrypted, compressed secret audio message as quaternion arrays. The QFFT and IQFFT operations are used to transform from the spatial domain to the frequency domain and back. These operations produce real-valued components in quaternion form.

One of the most important properties of any steganography algorithm is its hiding capacity, that is, the largest amount of data that can be securely inserted into the cover medium without introducing statistically detectable artifacts. Another is robustness, which refers to how well the steganographic algorithm resists the extraction of hidden data. In the proposed QFFT-based approach, the number of audio samples that can be hidden depends on the number of pixels in a cover video frame. For a video frame of W × H pixels with H ≤ W, the number of 8-bit audio samples that can be hidden within one video frame is H². This is higher than the capacity of the traditional LSB scheme, in which data is embedded only in the least significant bit of each pixel component; for the same video frame, that capacity equals 3H × W/8 bytes.
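The two capacity figures can be compared numerically with a short Python sketch. The function names and the example frame sizes are hypothetical; the formulas are the H² and 3H × W/8 expressions given above, with both capacities expressed in bytes.

```python
def qfft_capacity_bytes(width: int, height: int) -> int:
    """One 8-bit audio sample per pixel of the square (H x H) resized frame."""
    side = min(width, height)
    return side * side

def lsb_capacity_bytes(width: int, height: int) -> int:
    """One bit per color component of each pixel (3 bits per pixel), in bytes."""
    return 3 * width * height // 8

for w, h in [(512, 512), (1024, 768), (1920, 1080)]:
    print(w, h, qfft_capacity_bytes(w, h), lsb_capacity_bytes(w, h))
```

For a square frame the ratio of the two capacities is 8/3. More generally, H² exceeds 3H × W/8 whenever H > 3W/8, which holds for the common 4:3 and 16:9 aspect ratios used in this example.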

To confirm the effectiveness and robustness of the proposed encryption scheme and steganography approach, the quality of the compressed, encrypted, and decrypted secret audio signals is investigated in the simulation tests. Fig. 5 (a and b) shows the time-domain waveforms and spectrograms of the two tested audio signals, one long and one short (160 kilobytes and 35 kilobytes, respectively). A spectrogram plots the intensity of a signal as a function of time and frequency: the horizontal axis is time (t), the vertical axis is frequency (f), and the amplitude (A) is shown on a grey scale. It is typically viewed alongside the waveform display. The DCT-based compressed audio signals and their spectrograms are presented in Fig. 6, while the corresponding ciphered signals and their spectrograms are shown in Fig. 7. These encrypted audio signals are embedded into the cover video frames to form the stego video frames. The spectrograms show that the ciphered audio signals resemble white noise, so the intelligibility of the audio signals is removed. At the receiver side, the audio signals are extracted and reconstructed. Fig. 8 shows the reconstructed (deciphered and decompressed) audio signals and their spectrograms for the long and short audio signals.

Fig. 5 The original audio signals and their spectrograms for the long and short signals

Fig. 6 The compressed audio signals and their spectrograms for the long and short signals

Fig. 7 The ciphered audio signals and their spectrograms for the long and short signals

Fig. 8 The reconstructed (deciphered and decompressed) audio signals and their spectrograms for the long and short signals

Residual intelligibility is used to assess the perceptual quality of the ciphered and deciphered audio signals. It is quantified with objective metrics in the time and spectral domains. The time-domain metrics are the Signal-to-Noise Ratio (SNR) and the Segmental SNR (SNRseg). The spectral-domain metric is the Spectral Distortion (SD). The SNR is the ratio of signal energy to noise energy, expressed in decibels (dB) and computed over N samples indexed by i [15]:

$$ SNR=10{\log}_{10}\frac{\sum \limits_{i=1}^N{x}^2(i)}{\sum \limits_{i=1}^N{\left(x(i)-y(i)\right)}^2} $$
(32)

where x(i) denotes the original audio samples and y(i) denotes the processed (ciphered or extracted) audio samples. The SNR is simple to calculate, but it is very sensitive to the time alignment of the original and processed audio signals. For a good cryptosystem, the SNR between the original and ciphered signals should be low, while the SNR between the original and deciphered signals should be high.
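A minimal Python sketch of Eq. (32) is given below. The function name is hypothetical, and both signals are assumed to be time-aligned and of equal length.

```python
import numpy as np

def snr_db(x, y):
    """SNR in dB between an original signal x and a processed signal y, as in Eq. (32)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    signal_energy = np.sum(x ** 2)
    noise_energy = np.sum((x - y) ** 2)  # must be non-zero, i.e. y != x
    return 10.0 * np.log10(signal_energy / noise_energy)
```

The value is negative whenever the error energy exceeds the signal energy, which is the behavior expected of a ciphered signal that bears no resemblance to the original.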

The SNRseg is the average of the SNR values calculated over M short frames of the audio signal, each of length N samples, where the frame duration is typically chosen between 15 and 20 ms. It can be estimated as follows [50]:

$$ {SNR}_{seg}=\frac{10}{M}\ \sum \limits_{m=0}^{M-1}{\log}_{10}\frac{\sum \limits_{n= Nm}^{Nm+N-1}{x}^2(n)}{\sum \limits_{n= Nm}^{Nm+N-1}{\left(x(n)-\hat{x}(n)\right)}^2} $$
(33)

where x(n) and \( \hat{x}(n) \) are the samples of the original and processed audio signals, respectively. As with the SNR, the SNRseg should be low between the original and ciphered signals and high between the original and deciphered signals.
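The segmental SNR can be sketched in Python as follows. The function name and the `frame_len` parameter are hypothetical; a trailing partial frame is dropped, and every frame is assumed to have non-zero signal and error energy.

```python
import numpy as np

def snr_seg_db(x, x_hat, frame_len):
    """Average per-frame SNR in dB over M = len(x) // frame_len frames of N = frame_len samples."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    n_frames = len(x) // frame_len
    frame_snrs = []
    for m in range(n_frames):
        seg = slice(m * frame_len, (m + 1) * frame_len)  # samples Nm .. Nm+N-1
        signal_energy = np.sum(x[seg] ** 2)
        noise_energy = np.sum((x[seg] - x_hat[seg]) ** 2)
        frame_snrs.append(10.0 * np.log10(signal_energy / noise_energy))
    return float(np.mean(frame_snrs))
```

For example, at a sampling rate of 8 kHz a 20 ms frame corresponds to frame_len = 160 samples.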

The SD measures how far the spectrum of the processed audio signal is from that of the original one in the frequency domain. This measure is preferred over the time-domain measures because it is less affected by possible time misalignments between the original and the processed audio signals. It can be calculated as follows [46]:

$$ SD=\frac{1}{M}\sum \limits_{m=0}^{M-1}\sum \limits_{n= Nm}^{Nm+N-1}\left|{V}_x(n)-{V}_y(n)\right| $$
(34)

where Vx(n) and Vy(n) denote the spectra of the original and processed audio signals, respectively. For good performance, the SD should be high between the original and ciphered signals and low between the original and deciphered signals.

To validate the effectiveness of the cryptosystem and the robustness of the proposed steganography approach, the correlation coefficient (rxy) between the original and processed signal samples is also measured. It is calculated as defined in [46]:

$$ {r}_{xy}=\frac{c_v\left(x,y\right)}{\sqrt{D(x)}\sqrt{D(y)}} $$
(35)

where cv(x, y) is the covariance between the original and processed audio signals, and D(x) and D(y) are the variances of the two audio signals x and y. The rxy should be low between the original and ciphered signals and high between the original and deciphered signals.
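Eq. (35) can be sketched in Python as follows. The function name is hypothetical; the inputs are flattened, so the same function applies to audio vectors and to video frames, and neither input may be constant.

```python
import numpy as np

def corr_coeff(x, y):
    """Correlation coefficient r_xy = cov(x, y) / (sqrt(D(x)) * sqrt(D(y)))."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return cov_xy / np.sqrt(x.var() * y.var())
```

A value close to one indicates that the processed signal closely follows the original, while a value near zero indicates no linear relationship between the two.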

Additional quality metrics are used to evaluate the effectiveness of the proposed cryptosystem and the robustness of the proposed steganography approach, namely the Number of Pixels Change Rate (NPCR), Percent Root-mean-square Difference (PRD), Unified Average Changing Intensity (UACI), Perceptual Evaluation of Speech Quality (PESQ), and Log-Likelihood Ratio (LLR). Details of these metrics can be found in [44, 45, 49]. Table 1 presents the SNR, SNRseg, SD, LLR, PRD, PESQ, NPCR, UACI, and rxy results for the two tested audio signals (long and short).

Table 1 The results of quality metrics of the ciphered long and short audio signals and the deciphered long and short audio signals

From Table 1, the SNR and SNRseg of the ciphered audio messages are negative, the rxy values are small (less than one), and the LLR, PRD, NPCR, UACI, and SD values are high, which reflects the effectiveness of the proposed audio cryptosystem. The reconstructed audio signal at the receiver has low values of LLR, PRD, NPCR, UACI, and SD and high values of PESQ, rxy, SNR, and SNRseg, which indicates high quality of the reconstructed signal. In particular, the rxy values are close to one, meaning that the decrypted audio signal closely matches the original one. Therefore, the simulation outcomes indicate that the ciphering process used in the proposed steganography approach is efficient and provides good privacy and robustness.

Referring to Eq. (25), two weights α and γ are used for hiding an audio message within video frames in the quaternion domain. The hiding and extraction processes are performed with varying α and γ to determine adequate values for the embedding process. Robustness is an essential characteristic of efficient video hiding methods. The quality of the suggested steganography approach is therefore evaluated using the PSNR, SSIM, FSIM, and correlation coefficient metrics. The PSNR is given by [28]:

$$ PSNR=10{\log}_{10}\frac{{\mathit{\max}}_v^2}{MSE} $$
(36)

where maxv is the highest possible pixel value of the video frame, and the mean square error (MSE) is calculated between the original and processed video frames.
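A minimal Python sketch of the PSNR computation follows. It uses the conventional form in which the peak pixel value is squared before dividing by the MSE; the function name and the default peak value of 255 (8-bit frames) are assumptions.

```python
import numpy as np

def psnr_db(original, processed, max_value=255.0):
    """PSNR in dB between two frames of equal shape (frames must differ, so MSE > 0)."""
    original = np.asarray(original, dtype=float)
    processed = np.asarray(processed, dtype=float)
    mse = np.mean((original - processed) ** 2)
    return 10.0 * np.log10(max_value ** 2 / mse)
```

With an 8-bit peak of 255, an MSE of 1 corresponds to roughly 48.13 dB, and the PSNR drops by 10 dB for every tenfold increase in the MSE.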

The SSIM measures the structural similarity between the original and stego video frames, denoted x and y, respectively [51]. It can be formulated as follows:

$$ SSIM=\frac{\left(2{\mu}_x{\mu}_y+{V}_1\right)\left(2{\sigma}_{xy}+{V}_2\right)}{\left({\mu}_x^2+{\mu}_y^2+{V}_1\right)\left({\sigma}_x^2+{\sigma}_y^2+{V}_2\right)} $$
(37)

where μx and σx denote the mean and the standard deviation of the pixels in the original frame x, μy and σy denote the mean and the standard deviation of the pixels in the stego frame y, σxy is the covariance between x and y, and V1 and V2 are constants with small values.
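The SSIM expression can be sketched in Python as a single global computation over the whole frame, with the covariance σxy between x and y in the numerator. This is a simplification: SSIM is usually evaluated over local windows and then averaged. The function name is hypothetical, and the constants V1 = (0.01 · 255)² and V2 = (0.03 · 255)² are the values commonly used for 8-bit images, assumed here because the text only states that they are small.

```python
import numpy as np

def ssim_global(x, y, v1=(0.01 * 255) ** 2, v2=(0.03 * 255) ** 2):
    """Single-window SSIM between two frames of equal shape."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()               # sigma_x^2 and sigma_y^2
    cov_xy = np.mean((x - mu_x) * (y - mu_y))     # sigma_xy
    numerator = (2 * mu_x * mu_y + v1) * (2 * cov_xy + v2)
    denominator = (mu_x ** 2 + mu_y ** 2 + v1) * (var_x + var_y + v2)
    return numerator / denominator
```

The function returns 1 for identical frames and decreases as the means, variances, or structure of the two frames diverge.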

The FSIM is also calculated to evaluate the quality and efficiency of the proposed approach. It measures the local similarity between the original frame and the stego frame as follows:

$$ FSIM=\frac{\sum_{x\in \varOmega }{S}_L(x)\cdot P{C}_m(x)}{\sum_{x\in \varOmega }P{C}_m(x)} $$
(38)

where PCm(x) is the estimated phase congruency value, Ω is the spatial domain of the video frame, and SL(x) is the overall estimated similarity between the two frames. A higher FSIM value indicates a stego frame that is closer to the original, and hence better steganography quality.

To further validate the performance of the proposed steganography approach, the correlation coefficient (rxy) between the original and stego frames is also measured. It is calculated as defined in Eq. (35), where in this case cv(x, y) is the covariance between the original and stego video frames, and D(x) and D(y) are the variances of the two frames x and y, respectively. For good steganography quality, the PSNR, SSIM, FSIM, and rxy between the stego and original HEVC frames should all be high.

In the simulations, different values of α and γ have been used to evaluate both the embedded and the extracted secret data. Moreover, the cover HEVC frame is compared with the stego frame to determine whether there is a noticeable difference between them. The amount of visible difference serves as an evaluation metric, and it sets the upper limit on the embedding capacity. To keep the carrier secure, so that the hidden message cannot be discovered by known statistical analysis methods, the length of the secret message should be less than this predefined upper bound. It has been found that when α = 15 and γ = 95, we obtain a very good quality of both audio and video frames.

Figures 9, 10, 11, 12, 13, 14, 15, 16, 17 and 18 present the graphical results of hiding a secret audio message within the five tested HEVC streams, using the long and the short audio signals as secret messages. Each figure shows the original and stego frames of the HEVC stream with their histograms, the difference between the stego and cover frames, and the extracted and reconstructed secret audio signal. Each figure also reports the PSNR, SSIM, FSIM, rxy, and processing time of the stego frames, together with the SNR, SNRseg, SD, LLR, PRD, PESQ, NPCR, UACI, and rxy of the reconstructed secret audio signal. Figs. 9, 11, 13, 15, and 17 show the results for the long audio signal, while Figs. 10, 12, 14, 16, and 18 show the results for the short audio signal.

Fig. 9 The graphical results of the Balloons stream with the values of PSNR = 31.25 dB, rxy = 0.9969, SSIM = 0.9511, FSIM = 0.9883, and processing time = 3.2 s in the case of using the long audio signal as a secret message

Fig. 10 The graphical results of the Balloons stream with the values of PSNR = 31.41 dB, rxy = 0.9974, SSIM = 0.9515, FSIM = 0.9889, and processing time = 2.4 s in the case of using the short audio signal as a secret message

Fig. 11 The graphical results of the Basketball stream with the values of PSNR = 34.80 dB, rxy = 0.9949, SSIM = 0.9476, FSIM = 0.9882, and processing time = 3.7 s in the case of using the long audio signal as a secret message

Fig. 12 The graphical results of the Basketball stream with the values of PSNR = 34.92 dB, rxy = 0.9956, SSIM = 0.9477, FSIM = 0.9890, and processing time = 2.45 s in the case of using the short audio signal as a secret message

Fig. 13 The graphical results of the Breakdancer stream with the values of PSNR = 32.04 dB, rxy = 0.9957, SSIM = 0.9359, FSIM = 0.9844, and processing time = 3.4 s in the case of using the long audio signal as a secret message

Fig. 14 The graphical results of the Breakdancer stream with the values of PSNR = 32.07 dB, rxy = 0.9968, SSIM = 0.9358, FSIM = 0.9863, and processing time = 2.28 s in the case of using the short audio signal as a secret message

Fig. 15 The graphical results of the PoznanHall stream with the values of PSNR = 31.32 dB, rxy = 0.9960, SSIM = 0.9199, FSIM = 0.9890, and processing time = 3.8 s in the case of using the long audio signal as a secret message

Fig. 16 The graphical results of the PoznanHall stream with the values of PSNR = 31.42 dB, rxy = 0.9973, SSIM = 0.9203, FSIM = 0.9905, and processing time = 2.73 s in the case of using the short audio signal as a secret message

Fig. 17 The graphical results of the Uli stream with the values of PSNR = 35.85 dB, rxy = 0.9961, SSIM = 0.9477, FSIM = 0.9875, and processing time = 3.58 s in the case of using the long audio signal as a secret message

Fig. 18 The graphical results of the Uli stream with the values of PSNR = 35.94 dB, rxy = 0.9966, SSIM = 0.9478, FSIM = 0.9885, and processing time = 2.84 s in the case of using the short audio signal as a secret message

The three basic requirements of a video steganography process are capacity, robustness, and imperceptibility. To achieve good imperceptibility, which is the most essential requirement, the transmitted video should retain high quality so that the presence of hidden data is not noticed. To verify that the proposed HEVC steganography approach achieves high imperceptibility, several HEVC streams with different dimensions have been used in the simulation tests to embed long and short audio messages using the maximum capacity of the video frames. The results presented in Figs. 9 to 18 demonstrate the efficacy of the suggested steganography approach in terms of capacity, robustness, and imperceptibility.

The results show that the stego frames are very similar to the original cover frames, which indicates the high imperceptibility of the proposed steganography approach. It is also possible to hide either a long or a short secret audio message within the HEVC frames while maintaining high quality, which indicates the high capacity of the suggested steganography scheme.

Furthermore, the results shown in Figs. 9 to 18 demonstrate that the secret audio signal can be extracted and reconstructed with high quality, which indicates the robustness of the proposed steganography approach. The proposed QFFT-based HEVC steganography approach therefore has a high level of imperceptibility, with PSNR values above 20 dB for all analyzed HEVC frames and for the different lengths of secret audio messages. Moreover, it achieves high SSIM, FSIM, and rxy values. The suggested approach also has a high level of robustness, since the original audio message is recovered with high PESQ, SNR, SNRseg, and rxy values and low LLR, PRD, NPCR, UACI, and SD values.

To further evaluate the efficiency of the proposed QFFT-based HEVC steganography approach, we tested its performance in the presence of attacks. Additional simulation tests were carried out under various kinds of channel attacks: rotation, Gaussian noise, blurring, JPEG compression, resizing, and cropping. Tables 2, 3, 4, 5 and 6 present the SNR, SNRseg, SD, LLR, PRD, PESQ, NPCR, UACI, and rxy values of the extracted and reconstructed long and short secret audio signals under these attacks. The results in Tables 2, 3, 4, 5 and 6 show that the suggested QFFT-based HEVC steganography approach has a high level of robustness, since the original long and short secret audio messages can be extracted and reconstructed with good values of these quality metrics.

Table 2 The results of quality metrics of the extracted and reconstructed long and short audio signals in the occurrence of various rotation attacks on the stego HEVC frames
Table 3 The results of quality metrics of the extracted and reconstructed long and short audio signals in the occurrence of various Gaussian noise attacks on the stego HEVC frames
Table 4 The results of quality metrics of the extracted and reconstructed long and short audio signals in the occurrence of various blurring attacks on the stego HEVC frames
Table 5 The results of quality metrics of the extracted and reconstructed long and short audio signals in the occurrence of various JPEG compression attacks on the stego HEVC frames
Table 6 The results of quality metrics of the extracted and reconstructed long and short audio signals in the occurrence of crop and resizing attacks on the stego HEVC frames

Furthermore, the proposed QFFT-based HEVC steganography scheme is compared with some recent HEVC steganography works from the literature. Table 7 presents the SSIM and PSNR comparison for the HEVC Basketball stream between the proposed HEVC steganography algorithm and the HEVC steganography algorithms in [5, 7, 22, 27]. The proposed steganography approach yields higher PSNR and SSIM values than the literature approaches for the tested Basketball stream, which indicates its high imperceptibility and efficiency.

Table 7 The PSNR and SSIM comparison outcomes of the suggested steganography algorithm and the literature algorithms in [5, 7, 22, 27]

5 Conclusions and future work

This article introduced a new QFFT-based HEVC steganography approach for embedding secret audio messages in cover video frames. The audio message is first compressed to make better use of the capacity of the cover video frames, and thus to maximize the size of the hidden message. The compressed secret message is then encrypted using a random projection encryption method in the DWT domain. The random matrix of the random projection is generated using the Legendre sequence to produce the encrypted form of the compressed secret data. The compressed HEVC cover frames and the secret data are represented in quaternion format, with the secret data in the QFFT domain, prior to the hiding process. The proposed approach has been tested on various standard HEVC streams and audio signals. Its performance has been evaluated with different quality metrics, with and without the presence of different multimedia attacks. The results showed that the cover video frames can be transmitted with no obvious variation and with higher imperceptibility than the literature algorithms. Furthermore, the findings showed that secret data of large size can be inserted, achieving a high embedding capacity that exploits the whole size of the cover video frame. In this paper, watermarking was not used alongside the employed encryption and steganography schemes. In the future, the performance of the HEVC steganography approach can be enhanced by integrating a watermarking process with the presented work to build a multi-level security system for HEVC transmission and storage. In addition, new deep learning-based security approaches can be used for HEVC transmission and storage to achieve covert and robust performance.