1 Introduction

With the development of the Internet, an enormous amount of data is transmitted every day in various forms. The rapid, uncontrolled growth and sharing of online video content raises problems of copyright violation detection, video search, and retrieval. Moreover, with the availability of powerful multimedia tools, data can be easily accessed, modified, and redistributed on the Internet. This fast digitalization of data poses new challenges for managing and securing information, so the protection of digital data is essential in the current digital society. Data hiding techniques embed secret data into a cover medium without the embedding being detectable, and they are one way to protect information. Various data hiding methods exist, including steganography and digital watermarking (Jia et al. 2019).

Watermarking schemes can be classified by their watermark extraction ability and by the fragility and sensitivity of the watermark. According to extraction ability, there are three kinds of schemes: blind, semi-blind and non-blind. A non-blind scheme requires the original host during extraction; it is of limited practical use because extra storage is needed to maintain the source image (Chang et al. 2007). A semi-blind scheme requires some information about the host and the watermark, whereas a blind scheme requires neither: the watermark is extracted without prior knowledge of the original information (Thakur et al. 2019; Thanki et al. 2019). According to fragility and sensitivity, schemes are divided into fragile and semi-fragile watermarking. Fragile watermarking is a verified solution akin to a standard digital signature scheme; it is used to verify the integrity of data by detecting every possible change in an image, such as the addition or removal of a feature. In contrast, semi-fragile watermarking is used when minor data modifications such as image compression and enhancement are expected (Lu and Guo 2017).

Watermarking approaches address problems of identity protection. Their main objective is to satisfy properties such as robustness of the watermark against various attacks, imperceptibility, security, and blindness with respect to the original data. To reveal the identity, the extraction of the watermark symbol should be unambiguous so that the authenticity of proprietary data can be justified (Kumar et al. 2016). A general watermarking approach consists of two phases, embedding and extraction, as shown in Fig. 1. During the embedding phase, a processed watermark symbol, share, or piece of copyright information is embedded in the primary signal; this step is restricted to the owner. The reverse of the embedding procedure is then performed to extract the copyright information so that the authenticity of the data can be recognized (Katzenbeisser and Petitcolas 2000). We use an entirely invisible watermark rather than a visible one, so no watermark symbol can be seen over the hosted videos. Moreover, most data hiding approaches can extract the watermark (embedded data) but damage the cover signal in the process. To overcome this issue, the proposed work uses a reversible data embedding approach that can extract the additional information and reconstruct the image without any distortion. Reversible data hiding can be performed in two domains, the encrypted domain and the plaintext domain. In this work, the host signal (video) is treated as plaintext while the embedded data (image) is encrypted. Reversible data hiding with a visual secret sharing scheme is implemented with firefly-based threshold optimization so that the most suitable distinct frames of a video are selected to embed the encrypted watermark share.

Fig. 1 A generalized approach of watermarking

The rest of the paper is organized as follows. Section 2 discusses related watermarking techniques, followed by the proposed method in Sect. 3. Section 4 discusses experimental results concerning the robustness of the proposed technique against various attacks and a comparison with other state-of-the-art techniques. Finally, conclusions and future directions are provided in the last section of this paper.

2 Literature survey

Digital watermarking for identity privacy is a dynamic domain for research. In general, the watermark symbol is inserted either by directly modifying the host signal or modifying the transform domain's coefficients. There exist various methods for performing watermarking on images as well as on digital videos. In this section, we analyzed and presented recent works implemented in this area.

Agilandeeswari and Ganesan (2016) proposed a plane-slicing-based watermarking algorithm that embeds colored watermark images in color video using hybrid transforms, namely the Contourlet Transform (CT), Discrete Wavelet Transform (DWT), and Singular Value Decomposition (SVD). As a result, a high level of robustness, good fidelity, and a high PSNR value are achieved. An eigenvector is also calculated at the receiver’s end to authenticate the watermarked image. Two new watermarking approaches (VW16E and VW8F) were given for achieving a high degree of imperceptibility and tamper detection. These approaches were compared with other approaches on a range of video samples, and the results show that they provide better detection capabilities and imperceptibility (Arab et al. 2016). Karmakar et al. (2016) proposed a DCT-based, rotation-attack-resistant video watermarking scheme. The algorithm was implemented in MATLAB, and results for three standard videos indicate that the method is robust against rotation and video attacks. Asikuzzamanave et al. (2016) proposed a digital watermarking method for depth-image-based 3D videos that can extract the watermark from the left, right, and center views of the videos. Experiments show that the approach is also effective against additive noise, baseline distance adjustment, and compression. Another non-blind color video watermarking scheme was proposed by Rasti et al. (2016). Their algorithm was tested against several signal processing attacks and compared with existing methods, and it was found to be robust on public image data sets. Himeur and Boukabou (2018a) designed a new watermarking system for uncompressed video sequences in which DWT and SVD were used for low visual distortion. This approach showed remarkable results against scaling, blurring, cropping, filtering, and H.264 compression. Zhao and Li (2018) proposed three-dimensional histogram shifting for reversible data hiding, which can hide data in any image or video file. The reported results show that the embedding efficiency of their scheme was much higher than that of existing schemes. Wang et al. (2018) suggested a watermarking scheme based on High Efficiency Video Coding that embeds the watermark in the quantized coefficients of the video files to remove cumulative errors. The results were obtained for different values of the quantization parameter, and the algorithm gave good results even when the quantization parameter values were greater than 40.

Shah et al. (2018) proposed a two-tier RDH-ED framework for 3D mesh models based on the homomorphic Paillier cryptosystem. The approach produces better results with high embedding capacity. Kulkarni and Kulkarni (2018) presented a video cryptography-based greyscale image watermarking scheme that uses two shares of the image and reported results for three greyscale images. Their approach satisfied the security, robustness, and blindness properties. Xu et al. (2018) implemented an efficient method for reversible data hiding in encrypted images based on two-dimensional histogram modification. The approach can be used in applications where high image quality and reversibility are required, such as cloud computing. Further, Bhardwaj et al. (2018) proposed a robust video watermarking approach based on coefficient differences using a frame selection method. The results were compared with existing approaches on different videos; robustness improved, but imperceptibility was compromised. An adaptive reversible data hiding method for encrypted images was implemented by Chuan et al. (2018), and it performs better in terms of rate-distortion. Rajkumar and Vasuki (2019) presented a method to improve the watermark embedding capacity and the perceptual quality of a reversible watermarking scheme. They applied a Gaussian filter as the first step of embedding and chose the peak points of the histogram for data hiding. The watermark is embedded by high-frequency modification at the pixel level, and a secret key secures the data, which improves the security of the images. Li et al. (2019) applied a histogram shifting approach to hide data in JPEG images. The data bits are embedded in the high-frequency coefficients, which are obtained from the histogram distribution, and the optimal DCT coefficient is selected to embed the secret messages. This method produced better results than other approaches in terms of embedding rate and visual quality of the images.

Malik and Reddlapalli (2019) proposed a watermarking scheme that integrates the concepts of histogram and entropy, which results in better imperceptibility and robustness. The experiment was performed on three well-known greyscale images. Senthilnathan et al. (2019) presented the SBRE-RDH crypto-security-based encryption algorithm. In their method, the host image is first divided into blocks and entropy is estimated for hiding the watermark; the histogram shape method is then used to hide the watermark information. Their results are better than those of other algorithms at recovering the hidden data and images without any error for large amounts of data, so the method can also be used in a cloud environment. Tang et al. (2019) implemented a reversible data hiding approach based on differential compression, in which Huffman coding was used to reduce the size of the embedding location maps. The algorithm performs well in terms of data hiding capacity and computational time, but it is not suitable for JPEG images. Nasrullah et al. (2019) used a Kd-tree to build a joint and separable RDH system that provides better data hiding in the compression and encryption domains. The cover signal is compressed with the lifting-based integer wavelet transform (IWT) and set partitioning in hierarchical trees (SPIHT) encoding. Multiple shift operations are then carried out to generate the SPIHT bit-stream. These streams are arranged into a binary square matrix and presented to the Kd-tree with random transformations to hide the data. Noor et al. (2019) proposed a watermarking scheme that combines DTT, R-PCA, and SVD. CAT mapping is used to scramble the watermark logo. The original data is decomposed into low-rank and sparse components using R-PCA, and DTT is applied as the transform in which the watermark data is embedded using SVD. Their technique is tested against various types of attacks to check its robustness. Kumar and Jung (2020) proposed a robust and reversible data hiding approach based on two-layer insertion of data with a reduced capacity-distortion trade-off. They first decomposed the image into two planes, a higher significant bit (HSB) plane and a least significant bit (LSB) plane. Prediction error expansion is then used to hide secret watermark bits in the HSB plane. The method gives better experimental results and is also tested against various attacks. Altay and Ulutas (2021) proposed a robust technique for copyright protection based on the Discrete Wavelet Transform (DWT) and Singular Value Decomposition (SVD). The Fibonacci-Lucas Transform (FLT) was also applied to the binary watermark to make the watermarking scheme more secure. A high level of robustness against attacks was achieved. Alotaibi (2020) introduced a three-level watermarking scheme that optimizes the hidden neurons in a deep belief network framework to enhance prediction accuracy. Ayubi et al. (2021) proposed a new two-dimensional secure video watermarking scheme, together with an efficient algorithm based on the IWT, DWT and CT transforms. The results show that the algorithm provides better visual quality in terms of PSNR and SSIM. Dogan (2016) proposed a new data hiding technique based on a genetic algorithm in which different chaotic maps provide the randomness of the genetic functions. A comparison of the results showed that the Gauss, logistic, and tent maps were faster than random functions.

The literature shows that video watermarking is fragile against various geometrical attacks. Most researchers worked with videos without suggesting powerful encryption strategies. Their techniques hide content uniformly, embedding watermark content in every frame. This is a major disadvantage, because data recovery is lossy when extraction is performed after attacks. Most techniques use binary images as the watermark, whereas we use various images of varying sizes. To fill these gaps, we propose a novel framework that achieves high data embedding capacity and robustness through the suggested optimization technique. In addition, the share of the encrypted watermark is embedded only in specific frames selected by a threshold, instead of in every frame of the host signal. No conversion of the video frames is performed, in order to maintain the quality of the host signal. The demonstrated results show the effectiveness of the proposed framework, as discussed in subsequent sections.

3 Methodology

The proposed methodology is divided into four sub-sections, and watermarking workflow is shown in Fig. 2. In the first sub-section, video frames are extracted based on a threshold estimated using the Firefly algorithm. Then, the watermark is processed using visual cryptography. The third part performs data embedding into video frames using a reversible data hiding procedure. The watermark extraction after performing various attacks is performed in the fourth part of the proposed methodology. The subsequent section discusses the working of the proposed method.

Fig. 2 Workflow of the suggested video watermarking system

The concept of reversible data hiding is used in this work. Data hiding techniques can be classified into irreversible and reversible data hiding (RDH) techniques (Hou et al. 2021). RDH completely recovers the original data after the hidden data has been correctly extracted. The main motive for using RDH in this work is to achieve a higher embedding rate with less distortion. RDH operates in two domains, spatial and encrypted. Spatial RDH algorithms embed data by exploiting the spatial correlation among pixels, and they aim for the highest possible visual quality at a given embedding capacity. Among the various RDH techniques, one type of approach focuses on finding new embedding methods that handle the prediction errors and reduce the embedding distortion, while another type focuses on improving prediction accuracy (Weng et al. 2021). Here, RDH is used to improve the watermark prediction accuracy after various attacks have been applied.

3.1 Extraction and processing video frames for data embedding

The robustness and practicality of a watermarking approach depend on the embedding and extraction processes. The first part of the proposed method extracts complex frames from an input video. The video is divided into frames, and the complexity of each processed frame is estimated. This frame complexity is then compared with the threshold value computed by the Firefly algorithm. If the complexity of a frame is greater than the threshold, the frame is taken as a complex frame; otherwise, the frame is ignored when hiding the watermark symbol. In the proposed algorithm, only complex frames are chosen to embed the watermark, rather than adding copyright information to every video frame.

The image frame complexity is measured using the black-and-white border length: a frame with a longer border is considered more complex. The border length (\(k_{i}\)) is estimated by summing the number of color changes along all the rows and columns of the image. The frame complexity (\(\overline{c}\)) is estimated using Eq. 1.

$$\overline{c} = k/m$$
(1)

In Eq. 1, \(k\) represents the total length of the black-and-white border, while \(m\) is computed as (size of image \(\times\) number of bit planes). \(k_{i}\) is estimated for every block of the extracted video frame.
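The complexity measure of Eq. 1 can be illustrated with a minimal sketch. It assumes that each frame has already been split into binary bit planes stored as nested lists; the function names `border_length`, `frame_complexity` and `is_complex` are hypothetical and are not taken from the original implementation.

```python
def border_length(plane):
    """Count 0/1 changes between horizontally and vertically adjacent pixels."""
    rows, cols = len(plane), len(plane[0])
    k = 0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols and plane[i][j] != plane[i][j + 1]:
                k += 1  # color change along the row
            if i + 1 < rows and plane[i][j] != plane[i + 1][j]:
                k += 1  # color change along the column
    return k


def frame_complexity(planes):
    """Eq. 1: c = k / m, with m = image size x number of bit planes."""
    k = sum(border_length(p) for p in planes)
    m = len(planes[0]) * len(planes[0][0]) * len(planes)
    return k / m


def is_complex(planes, threshold):
    """A frame is kept for embedding only if its complexity exceeds the threshold."""
    return frame_complexity(planes) > threshold
```

For a 2 × 2 checkerboard plane every adjacent pair differs, so the sketch returns a complexity of 1.0, while a uniform plane gives 0.0.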

3.1.1 Firefly algorithm (FA)

The Firefly algorithm was first proposed by Yang (2009), and its basic steps can be found in that original proposal. It is a swarm intelligence approach based on the flashing behavior of fireflies. The candidate solutions of an optimization problem are treated as fireflies whose brightness is proportional to the value of the objective function within the given problem space. The fireflies produce light, which helps them communicate and attract one another.

Optimization: An optimization problem depends on three factors: a function to optimize, a solution set from which values of the variables are picked, and an optimization rule (maximize or minimize). The function to be optimized is \(f\left( x \right)\), where \(x = \left( {x_{1} ,x_{2} ,x_{3} , \ldots ,x_{n} } \right)\). Three rules are followed in the proposed work.

  a. All the fireflies are unisexual, and therefore any firefly can attract any other firefly.

  b. Attractiveness is proportional to brightness. A firefly is attracted to the one with the highest intensity or brightness, and this attractiveness decreases as the distance between them increases.

  c. If no firefly is brighter than a given one, that firefly moves randomly.

For a maximization problem, the intensity or brightness is proportional to the objective function value. For a specific location \(x\), the intensity \(I\) of a firefly can be chosen as \(I\left( x \right) \propto f\left( x \right)\), where the problem is \(\max f\left( x \right)\) such that \(x \in S \subseteq R^{n}\). The solution is the member of the set \(S\) that yields the maximum value of the objective function among all elements of \(S\). Another parameter, the attractiveness β, is relative: it is judged by the other fireflies and therefore varies with the distance \(d_{ij}\) between fireflies \(i\) and \(j\). The FA implementation used in the proposed work follows Mishra et al. (2014), with a different objective value for the fireflies. For the proposed work, the objective is defined as shown below:

$$objective = PSNR + \emptyset *[BER\left( {w,w^{\prime}} \right) + \mathop \sum \limits_{i = 1}^{At} BER\left( {w,w_{i}^{^{\prime}} } \right)]$$

Here \(w\) and \(w^{\prime}\) are the original and extracted watermarks, and \(w_{i}^{\prime}\) is the watermark extracted after the \(i\)-th attack. \(At\) is the number of attacks applied to the watermarked video frame, and \(\emptyset\) is the weighting factor that balances the influence of PSNR, chosen following Huang et al. (2010).
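As an illustration, the objective above can be evaluated as in the following sketch. It assumes that watermarks are flat lists of bits, that BER is expressed as a percentage as in Eq. 14, and that the weighting factor takes the value 25 reported in Sect. 4; the function names are hypothetical.

```python
def ber(w, w_ext):
    """Bit error rate (percent) between two equal-length bit sequences."""
    errors = sum(a ^ b for a, b in zip(w, w_ext))
    return 100.0 * errors / len(w)


def objective(psnr, w, w_ext, attacked, phi=25.0):
    """PSNR plus phi times the summed BER terms, as written in the expression above.

    attacked holds one extracted watermark per attack (At entries in total).
    """
    total_ber = ber(w, w_ext) + sum(ber(w, w_i) for w_i in attacked)
    return psnr + phi * total_ber
```

With a PSNR of 40 dB, an error-free extraction without attack, and a single attacked extraction that flips one of four bits, the sketch evaluates to 40 + 25 × 25 = 665.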

FA is a stochastic method: it cannot guarantee the ideal solution in deterministic time, but it converges towards a good solution in reasonable time (Mustafa Bilgehan et al. 2017). It is a metaheuristic in which the trade-off between randomization and local search is controlled by parameters. The firefly heuristic is population based, and its randomized movement helps evade local optima. The search continues as long as the objective function improves. We use the method to optimize a single objective function for better parameter estimation. Specifically, the algorithm is chosen to find a suitable threshold value from the extracted (population of) video frames. Choosing such a value by hand is difficult, whereas the algorithm is simple to apply. It supports a better decision on which frames should carry the watermark symbol compared to other optimization techniques. The outcome of the algorithm is a feasible set of video frames in which data is embedded during further processing. The estimation of the optimized threshold value follows Algorithm 1.

Algorithm 1 Estimation of the optimized threshold value (figure a)

This generates a best threshold value which is used further with \(\overline{c}\) for finding complex frames.
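A minimal one-dimensional sketch of a firefly search for a threshold is given below. It is not a reproduction of Algorithm 1: it assumes the commonly used attractiveness form \(\beta_{0} e^{-\gamma r^{2}}\), takes the objective as a user-supplied function of the threshold, and uses the parameter values reported in Sect. 4 as defaults. The name `firefly_threshold`, the iteration count and the search bounds are hypothetical.

```python
import math
import random


def firefly_threshold(objective, lower, upper, n_fireflies=20, n_iter=50,
                      beta0=1.0, gamma=1.0, alpha=0.01, seed=0):
    """Search [lower, upper] for the threshold that maximizes objective."""
    rng = random.Random(seed)
    xs = [rng.uniform(lower, upper) for _ in range(n_fireflies)]
    light = [objective(x) for x in xs]  # brightness = objective value

    def clamp(x):
        return min(upper, max(lower, x))

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:  # rule (b): move towards brighter fireflies
                    r = abs(xs[i] - xs[j])
                    beta = beta0 * math.exp(-gamma * r * r)
                    step = beta * (xs[j] - xs[i]) + alpha * (rng.random() - 0.5)
                    xs[i] = clamp(xs[i] + step)
                    light[i] = objective(xs[i])
        # Rule (c): the brightest firefly takes a random step. As a design
        # choice of this sketch, the step is kept only if it is not worse.
        b = max(range(n_fireflies), key=light.__getitem__)
        candidate = clamp(xs[b] + alpha * (rng.random() - 0.5))
        candidate_light = objective(candidate)
        if candidate_light >= light[b]:
            xs[b], light[b] = candidate, candidate_light

    best = max(range(n_fireflies), key=light.__getitem__)
    return xs[best], light[best]
```

For example, `firefly_threshold(lambda t: -(t - 0.3) ** 2, 0.0, 1.0)` searches for the threshold closest to 0.3; in the watermarking setting the supplied function would instead evaluate the PSNR/BER objective for the frames selected by a candidate threshold.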

The identified complex frames are then transformed into the wavelet domain using a discrete wavelet transform (DWT) with the ‘Haar’ wavelet. The wavedec2 function with level = 1 is applied to the extracted frames. This yields the approximation (LL) band together with the horizontal, vertical and diagonal detail bands of a given frame. The LL sub-band is divided into blocks. This low-frequency band is used to embed the encoded watermark share because of its robustness against various attacks.
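The single-level Haar decomposition can be sketched in plain Python as follows. This is a stand-in for the MATLAB wavedec2 call, not a port of it: it assumes even frame dimensions and orthonormal scaling, and the sign and ordering conventions of the detail bands may differ from those of wavedec2. The function name `haar_dwt2_level1` is hypothetical.

```python
def haar_dwt2_level1(frame):
    """Single-level 2-D Haar transform of a frame given as a list of rows.

    Returns (LL, H, V, D): approximation, horizontal, vertical and diagonal bands.
    Assumes an even number of rows and columns.
    """
    ll, h, v, d = [], [], [], []
    for i in range(0, len(frame) - 1, 2):
        ll_row, h_row, v_row, d_row = [], [], [], []
        for j in range(0, len(frame[0]) - 1, 2):
            a, b = frame[i][j], frame[i][j + 1]
            c, e = frame[i + 1][j], frame[i + 1][j + 1]
            ll_row.append((a + b + c + e) / 2)  # low-pass in both directions
            h_row.append((a + b - c - e) / 2)   # difference between rows
            v_row.append((a - b + c - e) / 2)   # difference between columns
            d_row.append((a - b - c + e) / 2)   # diagonal difference
        ll.append(ll_row)
        h.append(h_row)
        v.append(v_row)
        d.append(d_row)
    return ll, h, v, d
```

For a constant frame all detail bands are zero and the whole signal energy sits in LL, which is consistent with the choice of the low-frequency band as the embedding location.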

3.2 Processing of watermarked symbol using visual cryptography

In the proposed method, the watermark image is taken in JPG format, in different sizes and with an RGB color map. Visual cryptography is performed to create shares from the input image, which yields an encoded watermark image (Failed 2013).

We perform visual cryptography based on a (2,2) VC scheme, in which each pixel is related to two pixels. Using this approach, two watermark shares \(W1\) and \(W2\) are generated. The user keeps one share as a key, with which the final watermarked video frame is generated, while the second share is embedded into the complex frames obtained in Sect. 3.1 by a merging operation based on histogram bit shifting. The encrypted share is generated using the following steps.

  1. The mean of the watermark image \(I^{\prime}\) is obtained after conversion from RGB to grayscale: \(Mean = mean \left( {Grayscale \left( {I^{\prime}} \right)} \right).\)

  2. A seed value is used to estimate a random map, denoted Seedmap.

    $$Seedmap = Rand \left( {Row, Col} \right) \times Seed$$

  3. The grayscale image is converted to a binary image, denoted \(BW\), using the function im2bw with a threshold of 0.5. Every pixel of the output image is set to black (0) or white (1) according to the threshold: \(BW = im2bw \left( {Grayscale \left( {I^{\prime}} \right), 0.5} \right)\).

  4. A new binary map, denoted \(BW1\), is set up with twice the size of the input image. The rules shown in Table 1 are used to generate the shares.

Table 1 Rules to generate the visual cryptographic shares

The image obtained after this process is used as the encoded secret share \(ESI\).
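The share generation can be illustrated with a textbook (2,2) visual cryptography construction. The sketch below is an assumption, since the rules of Table 1 are not reproduced here: each binary pixel (0 = black, 1 = white) is expanded into two subpixels, white pixels receive identical patterns in both shares, and black pixels receive complementary patterns. All function names are hypothetical.

```python
import random


def make_shares(bw, seed=1):
    """Split a binary image (list of rows, 0 = black, 1 = white) into two shares.

    Each share is twice as wide as the input (1 x 2 pixel expansion).
    """
    rng = random.Random(seed)
    share1, share2 = [], []
    for row in bw:
        r1, r2 = [], []
        for pixel in row:
            pattern = rng.choice(([1, 0], [0, 1]))
            r1.extend(pattern)
            if pixel == 1:
                r2.extend(pattern)                    # white: same pattern
            else:
                r2.extend([1 - p for p in pattern])   # black: complementary
        share1.append(r1)
        share2.append(r2)
    return share1, share2


def stack(share1, share2):
    """Overlay two shares: a subpixel stays white only if white in both."""
    return [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(share1, share2)]


def decode(stacked):
    """Map each subpixel pair back to one pixel (any white subpixel -> white)."""
    return [[1 if row[j] + row[j + 1] > 0 else 0
             for j in range(0, len(row), 2)] for row in stacked]
```

Each share on its own is a random pattern; only stacking both shares reveals the watermark, which matches the idea of keeping one share as a key.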

During data embedding, pixel underflow and overflow are controlled by modifying the pixel values. Pixels with values 0 and 255 are the ones at risk of underflow and overflow, respectively. A pixel value of 0 is therefore changed to 1 and a value of 255 is changed to 254, and a 1 is stored in the array \(l\) for each such change. Other pixels that become 0 or 255 during data embedding are likewise recorded as 1 in the same array. This bookkeeping minimizes data loss.
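The boundary handling can be sketched as follows. The sketch assumes a location map with one entry per pixel, which is a simplification; the exact layout of the array \(l\) is not specified above, and the function names are hypothetical.

```python
def adjust_boundaries(pixels):
    """Move 0 -> 1 and 255 -> 254, recording each change in a location map."""
    adjusted, location_map = [], []
    for p in pixels:
        if p == 0:
            adjusted.append(1)
            location_map.append(1)
        elif p == 255:
            adjusted.append(254)
            location_map.append(1)
        else:
            adjusted.append(p)
            location_map.append(0)
    return adjusted, location_map


def restore_boundaries(adjusted, location_map):
    """Undo adjust_boundaries using the location map."""
    restored = []
    for p, flag in zip(adjusted, location_map):
        if flag and p == 1:
            restored.append(0)
        elif flag and p == 254:
            restored.append(255)
        else:
            restored.append(p)
    return restored
```

The map is needed because a pixel that was originally 1 or 254 is indistinguishable from an adjusted boundary pixel without it.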

3.3 Data embedding procedure using histogram bit shifting

After the watermark image is encrypted, the encrypted bits are hidden in the host video. Histogram shifting is used to achieve reversibility. The process is divided into two steps: histogram generation, and modification of the histogram by adding the encrypted share through histogram shifting. These correspond to steps 3 and 4 of the embedding algorithm. The confidential data is hidden in the complex video frames. The input video frame is denoted \(CVF\), of size \(M \times N\), and the encrypted watermark share is denoted \(ESI\). The output of this step is the watermarked video frame \(WVF\). Finally, the complex watermarked frames are merged with the smooth frames that were not used for watermarking, and a watermarked video signal is generated. The whole process is explained in the steps below.

3.3.1 Embedding algorithm

  1. The CVF is divided into blocks \(CVF_{b}\) of size \(r \times c\). The number of blocks is estimated using Eq. 2.

    $$n = \frac{M \times N}{r \times c}$$
    (2)

  2. For each selected \(CVF_{b}\), the absolute difference frame block of size \(r \times \left( {c - 1} \right)\) is calculated using Eq. 3.

    $$FAD_{b} \left( {i,j} \right) = \left| {CVF_{b} \left( {i,j} \right) - CVF_{b} \left( {i,j + 1} \right)} \right| \quad \text{for } 0 \le i \le r - 1,\; 0 \le j \le c - 2,\; b \le n$$
    (3)

  3. For every \(FAD_{b}\), a histogram is generated with maxima \(\max_{b}\) and minima \(\min_{b}\) for each block \(b\). The minima is \(\min \left| {\max_{b} - \min_{b} } \right|\), and the maxima consist of all positive and negative maxima.

  4. In this step, the pixel values between \(\max_{b}\) and \(\min_{b}\) of the CVF block are incremented and decremented with respect to the neighboring pixels according to the histogram peaks. Under these conditions, \(FAD_{b}\) can be expressed as Eqs. 4 and 5, while the conditions on \(i\), \(j\) and \(b\) remain the same as in Eq. 3.

    $$FAD_{b}^{\prime} \left( {i,j} \right) = \begin{cases} FAD_{b} \left( {i,j} \right) + 1 & \text{if } FAD_{b} \left( {i,j} \right) > max_{b} \\ FAD_{b} \left( {i,j} \right) & \text{otherwise} \end{cases}$$
    (4)
    $$FAD_{b}^{\prime} \left( {i,j} \right) = \begin{cases} FAD_{b} \left( {i,j} \right) - 1 & \text{if } FAD_{b} \left( {i,j} \right) > max_{b} \\ FAD_{b} \left( {i,j} \right) & \text{otherwise} \end{cases}$$
    (5)

  5. The secret bits of \(ESI\) are embedded in the shifted histogram image blocks \(FAD_{b}^{\prime}\) by modifying \(max_{b}\). The new frame is denoted \(FAD_{b}^{\prime\prime}\) and is obtained using Eq. 6.

    $$FAD_{b}^{\prime\prime} \left( {i,j} \right) = \begin{cases} FAD_{b}^{\prime} \left( {i,j} \right) + m & \text{if } FAD_{b}^{\prime} \left( {i,j} \right) = max_{b} \\ \left| {FAD_{b}^{\prime} \left( {i,j} \right)} \right| + m1 & \text{if } FAD_{b}^{\prime} \left( {i,j} \right) = - max_{b} \\ FAD_{b}^{\prime} \left( {i,j} \right) & \text{otherwise} \end{cases}$$
    (6)

    Here \(m1 \in \left\{ {0, \ldots ,2^{l} } \right\}\) with \(l = \log_{2} \left( {d^{\prime}} \right)\), where \(d^{\prime}\) is the difference between the negative and positive maxima of the frame block.

  6. The watermarked video frame blocks \(WVF_{b}\) are finally generated from \(CVF\) and \(FAD_{b}^{\prime\prime}\) using Eqs. 7 and 8.

    $$WVF_{b} \left( {i,0} \right) = \begin{cases} CVF_{b} \left( {i,0} \right) & \text{if } CVF_{b} \left( {i,0} \right) < CVF_{b} \left( {i,1} \right) \\ CVF_{b} \left( {i,1} \right) + FAD_{b}^{\prime\prime} \left( {i,0} \right) & \text{otherwise} \end{cases}$$
    (7)
    $$WVF_{b} \left( {i,0} \right) = \begin{cases} CVF_{b} \left( {i,0} \right) + FAD_{b}^{\prime\prime} \left( {i,0} \right) & \text{if } CVF_{b} \left( {i,0} \right) < CVF_{b} \left( {i,1} \right) \\ CVF_{b} \left( {i,1} \right) & \text{otherwise} \end{cases}$$
    (8)

  7. For the remaining pixels in the watermarked video frame, \(FAD_{b}^{\prime\prime}\) is mapped onto \(WVF_{b}\) according to the neighboring pixel relationships.
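The core idea of the steps above, shifting the difference histogram and embedding bits at its peak, can be sketched for a single row of a block. The sketch is a simplified variant and not the authors' exact procedure: it takes the most frequent absolute difference as the peak, handles only the positive side of the histogram, and assumes pixel values have already been moved away from 0 and 255. The names `num_blocks`, `abs_differences` and `embed_row` are hypothetical.

```python
from collections import Counter


def num_blocks(M, N, r, c):
    """Eq. 2: number of r x c blocks in an M x N frame."""
    return (M * N) // (r * c)


def abs_differences(row):
    """Eq. 3 for one row: absolute differences of horizontally adjacent pixels."""
    return [abs(row[j] - row[j + 1]) for j in range(len(row) - 1)]


def embed_row(row, bits):
    """Embed bits into one row; returns (marked row, peak, number of bits used)."""
    peak = Counter(abs_differences(row)).most_common(1)[0][0]
    queue = list(bits)
    marked = [row[0]]  # the first pixel is kept as an unmodified reference
    for j in range(1, len(row)):
        d = abs(row[j] - row[j - 1])
        sign = 1 if row[j] >= row[j - 1] else -1
        if d > peak:
            marked.append(row[j] + sign)                 # shift to free the bin peak + 1
        elif d == peak and queue:
            marked.append(row[j] + sign * queue.pop(0))  # bit 0: keep, bit 1: move by one
        else:
            marked.append(row[j])
    return marked, peak, len(bits) - len(queue)
```

For the row `[100, 102, 104, 107, 109, 108, 110]` the peak difference is 2, so four bits can be carried; every pixel changes by at most one grey level, which is what keeps the distortion low.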

3.4 Data extraction algorithm (reversible approach)

In this section, the encrypted image is extracted from the watermarked video \(WVF\). The extracted \(BW1\) share of the encrypted image is then combined with \(BW2\) to restore the original watermark. The input to this algorithm is the set of unique watermarked video frames. Only complex frames are considered for recovering the watermark image, since these are the frames chosen for watermarking based on the estimated threshold. The extraction process is shown in Fig. 3, where the extraction of unique frames is carried out first. DWT is then applied together with the reversible histogram-based method, which yields one share of the watermark symbol. This share is merged with the other share obtained from visual cryptography to generate the final watermark. The procedure inverts the algorithm of Sect. 3.3.

Fig. 3 Block diagram for watermark extraction procedure

3.4.1 Extraction algorithm

  1. The watermarked video frames are divided into blocks as in step 1 of the embedding algorithm.

  2. The difference video frame block \(R_{e} FAD_{b}\) is estimated from the watermarked frame using Eq. 9, under the same conditions as Eq. 3. The values of \(max_{b}\) and \(min_{b}\) are also estimated in this step.

    $$R_{e} FAD_{b} \left( {i,j} \right) = \left| {WVF_{b} \left( {i,j} \right) - WVF_{b} \left( {i,j + 1} \right)} \right|$$
    (9)

  3. \(R_{e} FAD_{b}\) is scanned and the embedded bits of the encrypted share are extracted using Eq. 10.

    $$m = \begin{cases} 0 & \text{if } R_{e} FAD_{b} \left( {i,j} \right) = max_{b} \\ 1 & \text{if } R_{e} FAD_{b} \left( {i,j} \right) = max_{b} + 1 \end{cases}$$
    (10)

    The remaining bits are also extracted into the array \(l\), as in step 5 of the embedding procedure, and converted into binary. The positive maxima and negative minima are also estimated.

  4. The reverse of \(FAD_{b}^{\prime}\) is estimated as \(R_{e} FAD_{b}^{\prime}\) using Eq. 11.

    $$R_{e} FAD_{b}^{\prime} \left( {i,j} \right) = \begin{cases} R_{e} FAD_{b} \left( {i,j} \right) - 1 & \text{if } R_{e} FAD_{b} \left( {i,j} \right) = max_{b} + 1 \\ R_{e} FAD_{b} \left( {i,j} \right) & \text{if } R_{e} FAD_{b} \left( {i,j} \right) = max_{b} \end{cases}$$
    (11)

    The value of the negative maxima is estimated if the bits of \(R_{e} FAD_{b} \left( {i,j} \right)\) lie in the range \(- max_{b}\) to \(- max_{b - 1}\).

  5. In this step, the \(WVF\) is obtained by using reverse shifting operations.

Finally, watermark symbol is extracted from attacked video frames.
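The reversibility of the extraction can be illustrated with a sketch for one row of a block. It assumes a simplified embedding convention that is not necessarily the authors' exact one: the first pixel of the row is an unmodified reference, a difference equal to the peak carries bit 0, a difference equal to the peak plus one carries bit 1 (compare Eq. 10), and larger differences were shifted by one. The peak and the number of embedded bits are assumed to be known to the extractor; `extract_row` is a hypothetical name.

```python
def extract_row(marked, peak, n_bits):
    """Recover the embedded bits and the original row from a marked row."""
    restored = [marked[0]]
    bits = []
    for j in range(1, len(marked)):
        prev = restored[j - 1]  # differences are taken against recovered pixels
        d = abs(marked[j] - prev)
        sign = 1 if marked[j] >= prev else -1
        if d == peak:
            bits.append(0)                      # Eq. 10, first case
            restored.append(marked[j])
        elif d == peak + 1:
            bits.append(1)                      # Eq. 10, second case
            restored.append(marked[j] - sign)   # Eq. 11: undo the embedding
        elif d > peak + 1:
            restored.append(marked[j] - sign)   # undo the histogram shift
        else:
            restored.append(marked[j])
    return bits[:n_bits], restored
```

Applied to the marked row `[100, 103, 104, 108, 110, 108, 111]` with peak 2 and four embedded bits, the sketch returns the bits `[1, 0, 1, 1]` and the row `[100, 102, 104, 107, 109, 108, 110]`, so both the payload and the cover pixels are recovered exactly when no attack has been applied.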

4 Results

The suggested system is evaluated in MATLAB R2019a on an 8th-generation Intel i3 processor with 4 GB RAM. The experiment uses \(\beta_{0} = 1\), \(\alpha = 0.01\) and light absorption coefficient \(\gamma = 1.0\), with 20 initial fireflies. The value of \(\emptyset\) is 25, and \(At\) is 11 when computing the objective function. Convergence analysis of FA indicates that FA converges to a local position if \(0 < \beta_{0} < 1\): when this condition on \(\beta_{0}\) is satisfied, the algorithm converges to the averaged position \(p\) of its attractive neighbors. The performance of the method is tested against different types of attacks.

To generate the experimental results, images of varying sizes and videos are used, as shown in Table 2. The images and videos of the dataset can be accessed from “https://github.com/wssmanojkumar/Watermaking-Sample-datset” for testing purposes. Using this dataset, a total of 25 cases can be generated for evaluation. The quality and robustness of the approach are estimated using performance parameters such as peak signal-to-noise ratio (PSNR) and bit error rate (BER). PSNR evaluates the fidelity/quality of the watermarked video, and BER tests the robustness against various attacks. PSNR is estimated using Eq. 12, while MSE and BER are calculated using Eqs. 13 and 14, respectively.

$$PSNR = 10 \times log_{10} \frac{{255^{2} }}{MSE}$$
(12)
Table 2 Sample images and videos used for evaluating the proposed approach

The MSE in Eq. 12 is calculated using Eq. 13, where \(O_{ij}\) is the original pixel value and \(M_{ij}\) is the marked pixel value.

$$MSE = \frac{1}{512 \times 512}\mathop \sum \limits_{i = 1}^{512} \mathop \sum \limits_{j = 1}^{512} \left( {O_{ij} - M_{ij} } \right)^{2}$$
(13)
$$BER \left( {w,w^{\prime}} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{N} \mathop \sum \nolimits_{j = 1}^{M} w\left( {i,j} \right) \otimes w^{\prime}\left( {i,j} \right)}}{N \times M} \times 100\%$$
(14)

Here, \(w\) and \(w^{\prime}\) are the original and extracted watermarks, and \(N\) and \(M\) are the numbers of rows and columns of the watermark images.
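Eqs. 12 to 14 can be checked on small inputs with a short sketch such as the one below. It is a plain-Python illustration under stated assumptions, not the MATLAB code used in the experiments: the function names are hypothetical, the MSE is normalized by the actual frame dimensions rather than the fixed 512 × 512 of Eq. 13, and the \(\otimes\) operator of Eq. 14 is interpreted as a bitwise XOR on binary watermark pixels.

```python
import math


def mse(original, marked):
    """Mean squared error between two equally sized 2-D lists (Eq. 13)."""
    rows, cols = len(original), len(original[0])
    total = sum(
        (o - m) ** 2
        for row_o, row_m in zip(original, marked)
        for o, m in zip(row_o, row_m)
    )
    return total / (rows * cols)


def psnr(original, marked, peak=255):
    """Peak signal-to-noise ratio in dB (Eq. 12); infinite for identical frames."""
    err = mse(original, marked)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


def ber(w, w_extracted):
    """Bit error rate in percent (Eq. 14), assuming binary (0/1) watermarks."""
    n, m = len(w), len(w[0])
    errors = sum(
        a ^ b  # XOR is 1 exactly where the two bits differ
        for row_w, row_e in zip(w, w_extracted)
        for a, b in zip(row_w, row_e)
    )
    return errors / (n * m) * 100


frame = [[10, 20], [30, 40]]
marked_frame = [[11, 21], [31, 41]]          # every pixel off by one
frame_mse = mse(frame, marked_frame)
frame_psnr = psnr(frame, marked_frame)
wm_ber = ber([[1, 0], [1, 1]], [[1, 1], [1, 1]])  # one flipped bit out of four
```

A higher PSNR indicates a less visible watermark, while a lower BER indicates that the extracted watermark is closer to the original.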

The videos used for watermarking are in .wmv format, and the images are in .jpg and .png formats. The quality is measured based on sample rate and frame rate. The video used is 320 × 240 in size at 30 frames/s. Two channels are used to create the videos with an auto sample rate of 48.00 kHz. The images vary in size, ranging from 650 × 550 to 630 × 434. For .png images the bit depth is 32, while for .jpg images the resolution is 72 dpi.

A total of 11 attacks are performed on the blind watermarked signal. To evaluate the technique, five types of cases are discussed in this section. The watermark image is embedded in all the videos and the performance is estimated accordingly. A sample result is given in Table 3.

Table 3 Performance of projected technique with all cases as per Table 2 data

From this table, it is evident that the suggested system maintains good imperceptibility, as the PSNR values remain good after watermarking. The estimated mean error is also low for the watermarked video. A comparison of input MSE and output MSE is shown in Fig. 4. The figure shows that the difference between the initial MSE and the output MSE is small for all the cases. Test case 2 shows the minimum difference, 0.013, while the maximum, 2.735, is obtained in test case 3.

Fig. 4 A comparison between input MSE and estimated MSE

The effectiveness of our approach under various attacks is shown in Fig. 5, in which BER is estimated for all the cases used. The results show that even after various attacks on the watermarked video, the differences between the original BER and the estimated BER are negligible. The maximum distortion in bit error occurs under the quantization attack (0.33) and the minimum under frame blending (0.002). In the figure, the \(x\) axis shows the attacks performed and the \(y\) axis shows the BER values. Further, the differences in BER with and without the proposed visual cryptography are shown in Fig. 6.

Fig. 5 BER estimation under various attacks

Fig. 6 BER comparison using visual cryptography and without cryptography

The results in Fig. 6 are obtained under different attacks on the encrypted and non-encrypted data. The simulation shows that adding a layer of visual encryption further enhances security and results in lower data loss in terms of BER. This shows that the proposed method also enhances the security parameter. In this approach, the bit error between watermarked and non-watermarked frames is lower, i.e., the number of errors is lower with visual cryptography than without it. Therefore, the robustness of this method is good because of visual cryptography.

A sample of the running GUI of the proposed approach is shown in Fig. 7. The figure shows a video under attack and the extraction of the watermark from an image, together with the BER.

Fig. 7 Sample GUI showing (a) the attacks being performed and (b) the extraction of the watermark

GUI (a) in Fig. 7 is used to select various types of attacks on the input video, and it shows the results of all the estimated parameters for these attacks. GUI (b) shows the watermark extracted after all attacks, with the estimated bit error rate.

These results are adequate to demonstrate that the implemented video watermarking system is robust against geometrical attacks and keeps high imperceptibility and payload capacity after embedding the watermark image share. A comparison of various watermarking techniques on these two parameters is shown in Fig. 8. After embedding the watermark, no visible artifacts can be seen in the watermarked video frames. When the estimated PSNR of this video watermarking system is compared with that of other similar approaches, this method performs better and maintains high data imperceptibility, even though very few techniques in the literature use this kind of video watermarking approach. Therefore, this method brings better results and contributes substantially to the video watermarking domain.

Fig. 8 Performance of proposed approach using avg. PSNR and watermark bits embedding capacity

The results plotted in this section indicate that the proposed method can fill the research gap discussed above. Both high data-hiding capacity and robustness against various attacks are attained. The optimization and encryption used in the proposed work make it more secure and novel for data embedding, and the results show that attacks rarely affect the hidden watermark information. With the proposed system, the quality of the final watermarked video is also decent. Further, when the algorithm is applied to extract the keyframes (threshold-based), a frame is considered a true positive if it is selected as a keyframe by both the proposed method and the user, and a false positive if it is selected by the implemented method only. A false negative is a frame that is selected as a keyframe by the user but not by the approach. Using this information, we also computed Precision, Recall and F-measure. These are estimated using Eqs. 15–17.

$$Recall = \frac{{n_{TP} }}{{n_{TP} + n_{FN} }}$$
(15)
$$Precision = \frac{{n_{TP} }}{{n_{TP} + n_{FP} }}$$
(16)
$$F = 2\frac{Recall \times Precision}{{Recall + Precision}}$$
(17)
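The keyframe metrics of Eqs. 15 to 17 can be sketched as below. This is a minimal illustration with hypothetical names and made-up frame indices, not the evaluation code behind Table 4; keyframes are assumed to be identified by integer frame indices, and a metric is returned as 0 when its denominator is zero.

```python
def keyframe_scores(selected, ground_truth):
    """Return (recall, precision, f_measure) for keyframe selection.

    selected:     frame indices chosen by the method
    ground_truth: frame indices chosen by the user
    """
    selected, ground_truth = set(selected), set(ground_truth)
    n_tp = len(selected & ground_truth)   # chosen by both method and user
    n_fp = len(selected - ground_truth)   # chosen by the method only
    n_fn = len(ground_truth - selected)   # chosen by the user only

    recall = n_tp / (n_tp + n_fn) if n_tp + n_fn else 0.0       # Eq. 15
    precision = n_tp / (n_tp + n_fp) if n_tp + n_fp else 0.0    # Eq. 16
    denom = recall + precision
    f_measure = 2 * recall * precision / denom if denom else 0.0  # Eq. 17
    return recall, precision, f_measure


# Illustrative indices only: 3 true positives, 1 false positive, 2 false negatives.
recall, precision, f_measure = keyframe_scores([1, 2, 3, 4], [2, 3, 4, 5, 6])
```

Here the recall is 3/5 and the precision is 3/4, so the F-measure, as their harmonic mean, falls between the two.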

Using these equations, the output of various parameters for the proposed method and other approaches is shown in Table 4.

Table 4 Comparison between recall, precision and F-measure

These results show that the proposed technique performs acceptably on these metrics. All the metrics are estimated from all five cases of videos + watermark images, and the averaged values for the proposed method are shown in Table 4. Figure 9 compiles the CPU time for all the test cases using the proposed approach. The results indicate that the time required is low because of the convergence in finding an optimal video frame. The measured time runs from frame selection through the embedding process to returning the watermarked video. The longest run is case 3 (320 s), the fastest is case 5 (175 s), and the difference between test cases 2 and 4 is 40 s. This might be due to the number of frames estimated and the length of a particular video. In the proposed work, the reported time is not broken down into individual stages such as frame extraction or a specific embedding capacity; instead, it covers the whole process.

Fig. 9 CPU utilization for the test cases used

5 Conclusion

In this paper, a robust video watermarking system is proposed in which a watermark share is embedded in complex frames in a blind manner. These complex frames are extracted using a Firefly-based optimization method. Instead of embedding the watermark share in all the video frames, the proposed method chooses unique complex frames based on a threshold value. Further, an adaptive histogram bit-shifting method is suggested, which hides the watermark share in the LL band of the video frames. In addition, visual cryptography is used to encode the watermark before it is embedded in the DWT domain, which brings robustness and security to the approach. The experimental results are encouraging in terms of various parameters such as imperceptibility, payload capacity, MSE and BER. Furthermore, our approach is tested against 11 types of attacks and reports minimal bit errors. Finally, the results show that the proposed technique accomplishes digital video watermarking satisfactorily compared to other state-of-the-art techniques. In the future, this technique can also be tested against other geometrical attacks, and it can be improved to embed the watermark in the compressed video stream.