1 Introduction

The word “steganography” derives from the Greek word “steganos”, which literally means “covered”. It is a technique of camouflaging information in a medium so as to prevent the detection of the hidden data [15]. The medium in which the data is hidden is called the cover medium (e.g., an image, video or audio file). LSB (Least Significant Bit) based steganographic techniques [16, 20, 36] are a long-standing practice for hiding secret data, but their embedding capacity as well as embedding efficiency are very low. Sometimes security becomes a key issue for secret data transmission. Rather than directly flipping the LSBs of the pixel intensities, transformation functions may be used. For example, Lou et al. [23] have used a reversible histogram transformation in their LSB-based steganographic method, which can resist statistical steganalysis attacks. A steganography technique that works on binary images is presented in the work of Feng et al. [7]; it focuses on minimizing texture distortion. Since a binary image has only one bitplane, a binary image is not a preferred choice of cover medium. Acknowledging this fact, Lin et al. [22] have proposed a closed-loop iterative computing framework that optimizes picture quality.

A steganographic scheme based on a (2η + θ − 1)-ary notational system [18], where η denotes the size of each pixel group and θ characterizes the inherent trade-off between embedding rate and image quality, encodes a stream of bits with a cover pixel using codewords. The major drawback of this algorithm is its capacity: the average embedding capacity is less than 1 bpp, especially when a large number of pixels is considered in each non-overlapping set. Maya et al. [25] have presented an image steganography scheme based on Bitplane Complexity Segmentation and Integer Wavelet Transform.

Later, new techniques were developed that alter multiple bits (the LSB and higher-weighted bits) of the cover media. In recent times, researchers have incorporated cryptographic concepts to enhance security while passing the secret message bit stream. For example, Hashim et al. [10] have proposed a data hiding technique which is a combination of steganography and cryptography. Song et al. [39] have applied the LSB matching method and used Boolean functions from stream ciphers. Lou et al. [24] have proposed an edge-adaptive image steganography technique that relies on LSB matching. Socek et al. [38] have proposed a digital video steganography technique in which one can disguise a particular video with the help of another video. Most of these works embed the data sequentially, which is susceptible to various statistical attacks [34]. To avoid this, pixels for embedding are chosen randomly; keys or seeds are required for such selection. Many steganographic algorithms are therefore based on key sharing or block-code sharing. For example, Mstafa et al. [27] have proposed a secure video steganography method based on the concept of linear block codes, where nine uncompressed video sequences are used as cover data and a binary image logo as the secret message. The positions of the pixels of both the cover videos and the hidden message are randomly reordered with the help of a private key to improve the security of the system. Sharing of keys and/or the linear block code is an essential criterion for such algorithms, but establishing such a prior agreement is an overhead. Paul et al. [30] have described a steganographic approach that does not require any key for random selection of pixels; here the major concern, again, is low capacity. The multi-bit steganographic technique of Petitcolas et al. [31] is capable of hiding at most 4 bits per pixel depending on the energy value. A multi-bit steganography scheme has also been proposed by Mukherjee et al. [29]; it is applicable to audio steganography.

In the modern era, the Internet of Things, popularly known as IoT, has become very important. It is used in several application areas such as e-agriculture systems [40] and e-Health systems [6], where the security of data is critical. Though Location-Based Services (LBS) with IoT technology provide a lot of flexibility and convenience, there is a high probability that users lose their privacy. In particular, untrusted or malicious LBS servers holding all clients’ data may track the users in different ways or pass personal information to third parties. One solution to preserve a user’s location privacy is the dummy location privacy-preserving (DLP) algorithm [41] or the k-anonymity trajectory (KAT) algorithm [21], but personal data can be made more secure by imposing strong cryptographic/steganographic algorithms. Amin et al. [1] have proposed an architecture for a distributed cloud environment that supports an authentication protocol using smartcards, where registered users are empowered to access all private data securely from each private cloud server. Besides using the AVISPA (Automated Validation of Internet Security Protocols and Applications) tool and the BAN [3] (Burrows Abadi Needham) logic model, the authors have used cryptanalysis techniques to confirm that the protocol can withstand all probable security threats. It is quite clear that steganography can play a role in IoT applications, where both security and capacity are crucial.

It is observed that the capacity of LSB-based steganography is very low. To overcome this, multi-bit steganography comes into the picture. But most such schemes suffer from statistical as well as steganalysis attacks when the embedding is done sequentially, starting from the very first pixel of the image. This problem is addressed by selecting the pixels randomly from the image space, which introduces the overhead of sharing keys. In this paper, we propose a keyless multi-bit steganographic methodology, applicable to both image and audio, that can resist first-order statistical attacks as well as the state-of-the-art blind steganalysis algorithm known as sample pair analysis [5]. Besides increasing the capacity, the security issues have been taken care of. In Section 2, the secret text bits to be embedded are first encoded based on the coefficients of a bivariate symmetric polynomial, with all operations in GF(2⁸) (Section 2.2); the encoded bits are then concealed in the image pixels using a novel PVD (pixel value difference) technique, and in audio samples using an analogous SVD (sample value difference) technique (Section 2.3). As the number of bits embedded in a pixel or audio sample is not fixed (it depends on the neighboring pixel intensities or sample values), it is difficult for an intruder to detect the hidden message. The rest of the paper is structured as follows. Section 2 details the proposed methodology. In Section 3, the methodology is analyzed and theorems are framed. Experimental results are presented in Section 4, and Section 5 concludes the paper.

We summarize the contributions of this work in the following subsection.

1.1 Contributions

Below we list our contributions one by one.

  1. We propose novel steganographic algorithms for both image and audio based on PVD/SVD, after encoding the message using a bivariate symmetric polynomial over GF(2⁸).

  2. Our proposed algorithm is free from the key-sharing overhead between sender and receiver.

  3. Our method has high capacity: it hides up to 6 bits of message per pixel and 13 bits of message per audio sample.

  4. We theoretically express the embedding efficiency (Theorem 5) and compare the outcome of our proposed method with state-of-the-art works (Section 4.5).

  5. In order to ensure security, we have analyzed our algorithm through visual attacks, structural attacks and statistical attacks, using metrics including UIQI, PSNR, SNR and BER.

  6. We have shown that our algorithm withstands popular blind steganalysis attacks (Section 4.8).

  7. We have shown that our proposed algorithm is capable of withstanding the StirMark benchmark.

2 Proposed method

We propose a novel steganographic technique that provides two-layered security in message hiding. At the first level, the message to be hidden is encoded based on a bivariate symmetric polynomial whose operations are performed in GF(2⁸) (Section 2.2). At the next level, the encoded data is embedded by extending the idea of pixel value difference (PVD) or sample value difference (SVD) (Section 2.3). Note that we describe our embedding and extraction algorithms for one color plane only; for a color image, the same process can be applied to each color plane separately. Before detailing the proposed methodology, we briefly revisit some preliminaries (Section 2.1).

2.1 Preliminaries

We discuss here the concepts of bivariate polynomials and the Galois field GF(2⁸). A bivariate polynomial is a polynomial in two variables; it can be either symmetric or non-symmetric. Let us consider a bivariate polynomial

$$ G (x, y) =\sum\limits_{i,j = 0}^{k-1}C_{ij}x^{i}y^{j}. $$
(1)

The above polynomial is a bivariate symmetric polynomial if it satisfies the criterion stated in (2).

$$\begin{array}{@{}rcl@{}} G (x, y) &=& G(y,x) , \\ ~\text{i. e., }C_{ij}&=&C_{ji}. \end{array} $$
(2)

The finite field GF(2⁸) [35] is generated by any irreducible polynomial of degree 8. For our purpose, we use the polynomial

$$ g (x) = x^{8}+x^{4}+x^{3}+x + 1. $$
(3)

The coefficients of the symmetric bivariate polynomial are utilized in encoding the text, and all the operations are performed in GF(2⁸).
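To make the field arithmetic concrete, below is a minimal sketch (ours, not from the original work) of multiplication in GF(2⁸) under the irreducible polynomial of (3) (hexadecimal 0x11B); addition in this field is simply bitwise XOR.

```python
# Sketch: multiplication in GF(2^8) modulo g(x) = x^8 + x^4 + x^3 + x + 1 (0x11B).
# Addition/subtraction in GF(2^8) is plain XOR.
def gf256_mul(a: int, b: int) -> int:
    """Multiply two field elements (each in 0..255) in GF(2^8)."""
    result = 0
    while b:
        if b & 1:            # add a copy of a for the current bit of b
            result ^= a
        a <<= 1              # multiply a by x
        if a & 0x100:        # degree reached 8: reduce modulo g(x)
            a ^= 0x11B
        b >>= 1
    return result & 0xFF
```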

2.2 Message encoding

In the two-layer security process, we first encode the text. Consider a text message with S characters T1, T2, T3, … , TS and arrange it in column-major order, as shown in Table 1. Let Dk denote the ASCII value of the k-th character Tk. Considering the ASCII value of each character, the above matrix can be rewritten as in Table 2.

Table 1 Text message arrangement
Table 2 ASCII values of each character

Divide this matrix into 2 × 3 non-overlapping cells. Here, we represent each cell as shown in Table 3. Generate a symmetric bivariate polynomial term (BPT) chart (Table 4).

Table 3 Smallest message block
Table 4 BPT chart

Based on Table 4, construct the bivariate symmetric polynomial equation as shown in (1) and (2).

$$\begin{array}{@{}rcl@{}} p(x,y) &=& D_{i} + D_{i + 2} y + D_{i + 4} y^{2} + D_{i + 2}x + D_{i + 1} xy \\ &&+ D_{i + 3} xy^{2} + D_{i + 4}x^{2} + D_{i + 3} x^{2}y + D_{i + 5} x^{2}y^{2}. \end{array} $$
(4)

Choose the values of {xi : 0 < i ≤ 3} as follows.

$$ x_{i}= (H\times W +i\times F) \bmod~(2^{8}-1), \text{ if } 0<i\le 3. $$
(5)

For an image, H and W are the height and the width respectively. F can be taken as any non-zero constant and it must be known to the recipient. For an audio clip, it is first decomposed into frames of equal size. W stands for the frame size and H is taken as 1. Depending on the size of the message to be embedded, a sequence of images or audio frames is required. In such cases, F stands for the image/audio index number in the sequence.
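As an illustration, here is a one-line sketch of (5); the function name and the sample values in the comment are hypothetical.

```python
# Sketch: deriving x_1, x_2, x_3 per (5) from the cover dimensions H, W
# and the agreed non-zero constant F.
def derive_x(H: int, W: int, F: int) -> list:
    return [(H * W + i * F) % (2 ** 8 - 1) for i in (1, 2, 3)]

# e.g., for a 255 x 141 frame with a hypothetical F = 7:
# derive_x(141, 255, 7) -> [7, 14, 21]
```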

After substituting x by {xi : 0 < i ≤ 3} in (4), we get the three equations in (6).

$$\begin{array}{@{}rcl@{}} p (x_{1},y) &=& C_{1}+C_{2}y+C_{3}y^{2}; \\ p (x_{2}, y) &=& C_{4}+C_{5}y+C_{6}y^{2}; \\ p (x_{3}, y) &=& C_{7}+C_{8}y+C_{9}y^{2}. \end{array} $$
(6)

Note that from the first equation we take the coefficients of all powers of y, whereas in the second equation we skip the y² term and in the third equation we skip the y term. This makes the resulting equations (formed by the subset of selected coefficients) linearly independent, which enables unique decoding of the text at the time of retrieval; see the coefficient matrix \(D^{\prime}_{coeff}\) and Theorem 3 in Section 3.

The coefficients [C1, C2, C3, C4, C5, C7] are stored in an array. All calculations are performed in GF(2⁸) using the irreducible polynomial in (3). For every text block processed, the 6-dimensional coefficient vector \(\overrightarrow {C}_{j}\), j ∈ {1, 2, 3, 4, 5, 7}, is converted into a 48-bit binary coefficient vector, which is the encoded version of the text block. The entire text message is divided into such blocks and each block is encoded in this way. Instead of directly embedding the ASCII values of the text, this encoded bit stream is embedded into the images (or audio frames).
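Putting (4)–(6) together, the following sketch (our own illustration, reusing gf256_mul from Section 2.1) encodes one 2 × 3 block of ASCII values D_i, … , D_{i+5} into the 48-bit binary coefficient vector:

```python
# Sketch: encode one block D = [D_i, ..., D_{i+5}] into 48 bits, per (4)-(6).
def encode_block(D: list, x1: int, x2: int, x3: int) -> str:
    def sq(x): return gf256_mul(x, x)
    # Coefficients of y^0, y^1, y^2 in p(x, y), evaluated at a given x:
    def c1(x): return D[0] ^ gf256_mul(D[2], x) ^ gf256_mul(D[4], sq(x))
    def c2(x): return D[2] ^ gf256_mul(D[1], x) ^ gf256_mul(D[3], sq(x))
    def c3(x): return D[4] ^ gf256_mul(D[3], x) ^ gf256_mul(D[5], sq(x))
    coeffs = [c1(x1), c2(x1), c3(x1),   # C1, C2, C3 from p(x1, y)
              c1(x2), c2(x2),           # C4, C5 from p(x2, y); y^2 term skipped
              c1(x3)]                   # C7 from p(x3, y); y term skipped
    return "".join(format(c, "08b") for c in coeffs)   # 6 x 8 = 48 bits
```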

2.3 Embedding of encoded message

Once encoding is over, the next task is to embed the encoded stream into the cover (as the next layer of security). Here we describe the embedding process for a single image; for a sequence of images, the same process is repeated for the next portion of the encoded stream. A similar approach may be followed for the audio track.


The image is divided into blocks of size 3 × 3, indexed by β. The number of bits to be embedded in a pixel primarily depends on its intensity value and that of the pixel at the center of the corresponding block. Thus, it varies from pixel to pixel. Let \(\omega ^{(\beta )}_{i,j}\) be the intensity of the center pixel of the block β as shown in Table 5.

Table 5 3 × 3 cover image block

Pivot value for the block is calculated as in (7).

$$ P^{(\beta)}= \omega^{(\beta)}_{i,j}-(\omega^{(\beta)}_{i,j}\mod~4). $$
(7)

The embedding process is such that P(β) remains unaltered (proved in Section 3). Each pixel of the block is compared with the pivot value P(β) and their difference \(Q^{(\beta )}_{i+k_{1},j+k_{2}}\) is calculated as shown in (8).

$$ Q^{(\beta)}_{i+k_{1},j+k_{2}}= |\omega^{(\beta)}_{i+k_{1},j+k_{2}} - P^{(\beta)}|; $$
(8)

where k1 ∈ {− 1, 0, 1} and k2 ∈ {− 1, 0, 1}.

Based on \(Q^{(\beta )}_{i+k_{1}, j+k_{2}}\), the number of bits to be embedded in the pixel at (i + k1, j + k2) is determined. The intensity range [0, 255] is divided into a number of non-uniform slots (11 in our case), and \(R^{(l)}_{m}\), the lower intensity value of the m-th slot, is obtained as follows.

$$ R^{(l)}_{m} =\left\{\begin{array}{l} 0 \text{ , if }m = 0; \\ 4 \times {\sum}^{m}_{v = 1} 2^{\lfloor{\frac{m-v}{2}}\rfloor} \text{ , if }0<m<11. \end{array}\right. $$
(9)

Assume that \(Q^{(\beta )}_{i+k_{1},j+k_{2}}\) belongs to the mth slot for some m.

nm consecutive bits are then taken from the binary coefficient vector, where

$$ n_{m}= \log_{2} (R^{(l)}_{m + 1}-R^{(l)}_{m}). $$
(10)
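For reference, a small sketch (ours) that tabulates the slot lower bounds of (9) and the resulting bit counts of (10); the last slot (m = 10) has no upper bound in (9) and is therefore excluded from n:

```python
# Sketch: slot lower bounds R_m per (9) and embeddable bits n_m per (10).
from math import log2

def slot_lower(m: int) -> int:
    if m == 0:
        return 0
    return 4 * sum(2 ** ((m - v) // 2) for v in range(1, m + 1))

R = [slot_lower(m) for m in range(11)]
# R == [0, 4, 8, 16, 24, 40, 56, 88, 120, 184, 248]
n = [int(log2(R[m + 1] - R[m])) for m in range(10)]
# n == [2, 2, 3, 3, 4, 4, 5, 5, 6, 6]  -> at most 6 bits per pixel
```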

Let dm be the decimal equivalent of the bit pattern of length nm and it is the value to be embedded. The stego pixel value corresponding to the cover pixel ω(β) is computed as follows.

$$ t^{(\beta)}= \left\{\begin{array}{l} P^{(\beta)}+ b_{m} \text{ , if }\omega^{(\beta)} \ge P^{(\beta)};\\ P^{(\beta)}- b_{m} \text{ , otherwise.} \end{array}\right. $$
(11)

where bm stands for

$$ b_{m}=d_{m}+ R^{(l)}_{m}. $$
(12)

The higher the intensity difference between the center pixel (hence the pivot) and the pixel chosen for embedding, the more likely that pixel is to be mapped onto a slot that embeds a larger number of bits from the binary coefficient vector. Embedding a larger value at a relatively high intensity difference has less visual impact; moreover, the embedding is contrast-enhancing. It is worth mentioning that the embedding process also maintains the uniformity of smooth regions, as fewer bits are embedded there.

It may be noted that bm is obtained by adding dm to the lower limit of the intensity slot onto which the difference between the pixel intensity and the pivot value is mapped. As dm is not added to the actual intensity value of the pixel, the possibility of artifacts is reduced. nm is chosen in such a way that bm always lies within the range of the slot; bm thus incorporates the embedding effect on the difference between the pixel value and the pivot. Finally, to obtain the stego pixel value, bm is added to or subtracted from the pivot value. This may result in fall off.

Fall off is said to take place if t(β) exceeds 2⁸ − 1 or is less than 0. If fall off does not occur, t(β) is stored in place of ω(β) in the cover image. If fall off does take place, we use the following approach to counter it. For m = 0, t(β) is adjusted as shown in (13).

$$ t^{(\beta)}= \left\{\begin{array}{l} P^{(\beta)}- b_{m} \text{ , if }\omega^{(\beta)} \ge P^{(\beta)};\\ P^{(\beta)}+ b_{m} \text{ , otherwise.} \end{array}\right. $$
(13)

t(β) is then stored in place of ω(β) in the cover image. If m > 0, then nm is recalculated for the fall-off pixel as shown in (14).

$$ n_{m}= \log_{2}(R^{(l)}_{m}-R^{(l)}_{m-1}). $$
(14)

nm consecutive bits are taken from the binary coefficient vector and their decimal equivalent (dm) is calculated. bm is then obtained by (15).

$$ b_{m}=d_{m}+ R^{(l)}_{m-1}. $$
(15)

t(β) is also recalculated by using (16).

$$ t^{(\beta)}= \left\{\begin{array}{l} P^{(\beta)}+ b_{m} \text{ , if }\omega^{(\beta)} \ge P^{(\beta)};\\ P^{(\beta)}- b_{m} \text{ , otherwise.} \end{array}\right. $$
(16)

The above process to counter fall off is repeated until t(β) falls within the accepted limits, i.e., 0 to 2⁸ − 1.
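The per-pixel embedding step of (7), (8), (11) and (12) can be summarized by the following sketch (ours), which reuses the arrays R and n from the slot sketch above. For brevity it assumes the difference falls in a slot with a defined upper bound (m < 10) and that no fall off occurs, i.e., the corrective loop of (13)–(16) is omitted.

```python
# Sketch: embed the next n_m message bits into one cover pixel (no fall off).
def embed_pixel(omega: int, pivot: int, bits: str):
    """Return (stego value t, number of bits consumed)."""
    q = abs(omega - pivot)                           # Q, per (8)
    m = max(i for i in range(10) if R[i] <= q)       # slot containing Q
    d = int(bits[:n[m]], 2)                          # d_m: next n_m bits
    b = d + R[m]                                     # b_m, per (12)
    t = pivot + b if omega >= pivot else pivot - b   # per (11)
    return t, n[m]
```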

The embedding process can also be applied to an audio track. The samples of the cover audio are first grouped into 3 × 1 blocks; a sample block is shown in Table 6.

Table 6 3 × 1 audio cover block

The pivot value for the block is calculated using the following equation.

$$ P^{\prime (\beta)}= {\Omega}^{(\beta)}_{i}-({\Omega}^{(\beta)}_{i}\mod~4). $$
(17)

Generally, audio samples are 16-bit. Hence the total magnitude range is divided into non-uniform slots as follows.

$$ R^{\prime (l)}_{m} = \left\{\begin{array}{l} 0 \text{ , if }m = 0;\\ 64 \times {\sum}^{m}_{v = 1} 2^{\lfloor{\frac{m-v}{5}}\rfloor} \text{ , if }0<m<39;\\ 65472 \text{ , if }m = 39; \end{array}\right. $$
(18)

Subsequently, an embedding approach similar to that for the cover image is deployed.
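For completeness, a sketch (ours) of the audio slot bounds in (18); since each increment R′_{m+1} − R′_m equals 64 · 2^⌊m/5⌋, the per-sample bit count obtained as in (10) grows from 6 up to 13 bits, matching the capacity claimed for audio.

```python
# Sketch: audio slot lower bounds per (18) for 16-bit samples.
def audio_slot_lower(m: int) -> int:
    if m == 0:
        return 0
    if m == 39:
        return 65472
    return 64 * sum(2 ** ((m - v) // 5) for v in range(1, m + 1))
```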

2.4 Retrieval


The stego image is first broken into 3 × 3 blocks of pixel values.

The center pixel \(\omega ^{(\beta )^{*}}_{i,j}\) is then considered and the pivot value for Table 7 is calculated using (19).

$$ P^{(\beta)^{*}}= \omega^{(\beta)^{*}}_{i,j}- (\omega^{(\beta)^{*}}_{i,j}\mod~4). $$
(19)
Table 7 3 × 3 stego image block

Each pixel value of Table 7 is compared with \(P^{(\beta )^{*}}\) and their difference \(Q^{(\beta )^{*}}\) is calculated by using (20).

$$ Q^{(\beta)^{*}}_{i+k_{1},j+k_{2}}=|\omega^{(\beta)^{*}}_{i+k_{1},j+k_{2}} - P^{(\beta)^{*}}| , $$
(20)

where k1 ∈ {− 1, 0, 1} and k2 ∈ {− 1, 0, 1}.

Now the slot into which \(Q^{(\beta )^{*}}_{i+k_{1},j+k_{2}}\) falls is determined from (9), and the greatest value of \(R^{(l)^{*}}_{m}\) not exceeding \(Q^{(\beta )^{*}}_{i+k_{1},j+k_{2}}\) is noted.

The number of bits that were embedded is obtained from (21).

$$ n_{m}^{*}= \log_{2}(R^{(l)^{*}}_{m + 1} -R^{{(l)}^{*}}_{m}). $$
(21)

The decimal equivalent \(d_{m}^{*}\) of the embedded bit pattern is obtained by:

$$ d_{m}^{*}=Q^{(\beta)^{*}}_{i+k_{1},j+k_{2}}-R^{{(l)}^{*}}_{m}. $$
(22)

Finally, the binary sequence \(b^{*}_{m}\) is the binary equivalent (of length \(n^{*}_{m}\)) of \(d_{m}^{*}\). In this way the entire image is processed to recover the complete embedded binary sequence.
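The corresponding per-pixel extraction of (19)–(22), mirroring the embedding sketch and reusing its R and n arrays (again assuming m < 10), can be sketched as follows. A round trip of embed_pixel followed by extract_pixel returns the original bits, which is exactly the content of Theorem 4.

```python
# Sketch: recover the n_m message bits hidden in one stego pixel.
def extract_pixel(t: int, pivot: int) -> str:
    q = abs(t - pivot)                           # Q*, per (20)
    m = max(i for i in range(10) if R[i] <= q)   # slot containing Q*, per (9)
    d = q - R[m]                                 # d_m*, per (22)
    return format(d, "0{}b".format(n[m]))        # embedded bit pattern b_m*
```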

Then the audio component is similarly processed, albeit with 3 × 1 blocks of sample values as shown in Table 6.

The binary sequence obtained from the image is concatenated with that from the audio, as shown in Table 8. Eight consecutive, non-overlapping bits are then extracted from this array and converted into their decimal values to generate the coefficient matrix shown in Table 9.

Table 8 Retrieved binary sequence
Table 9 Coefficient matrix

Calculate x1, x2 and x3 in the same manner as during embedding. Six consecutive, non-overlapping coefficients \([C^{\prime }_{1}, C^{\prime }_{2}, \ldots, C^{\prime }_{5},C^{\prime }_{7}]\) are then taken from this matrix. These values are the same as the values of [C1, C2, C3, C4, C5, C7] obtained from (6). The decimal value of each character of the original text message is obtained as follows. For i = 1 and x = x1 in (4) we get (23) to (25); for i = 1 and x = x2 we get (26) and (27); finally, for i = 1 and x = x3 we get (28).

$$\begin{array}{@{}rcl@{}} D^{\prime}_{1} + D^{\prime}_{3}x_{1} +D^{\prime}_{5}{x_{1}^{2}} &=& C^{\prime}_{1}; \end{array} $$
(23)
$$\begin{array}{@{}rcl@{}} D^{\prime}_{3} + D^{\prime}_{2}x_{1} +D^{\prime}_{4}{x_{1}^{2}} &=& C^{\prime}_{2}; \end{array} $$
(24)
$$\begin{array}{@{}rcl@{}} D^{\prime}_{5} + D^{\prime}_{4}x_{1} +D^{\prime}_{6}{x_{1}^{2}} &=& C^{\prime}_{3}; \end{array} $$
(25)
$$\begin{array}{@{}rcl@{}} D^{\prime}_{1} + D^{\prime}_{3}x_{2} +D^{\prime}_{5}{x_{2}^{2}} &=& C^{\prime}_{4}; \end{array} $$
(26)
$$\begin{array}{@{}rcl@{}} D^{\prime}_{3} + D^{\prime}_{2}x_{2} +D^{\prime}_{4}{x_{2}^{2}} &=& C^{\prime}_{5}; \end{array} $$
(27)
$$\begin{array}{@{}rcl@{}} D^{\prime}_{1} + D^{\prime}_{3}x_{3} +D^{\prime}_{5}{x_{3}^{2}} &=& C^{\prime}_{7}. \end{array} $$
(28)

We call (23) to (28) the Retrieval System of Equations. These equations are solved to obtain \([D^{\prime }_{1}... D^{\prime }_{6}]\), whose decimal values are converted back to characters using ASCII to recover the text message.
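Since all arithmetic is in GF(2⁸), the Retrieval System of Equations can be solved by ordinary Gaussian elimination with field operations. The sketch below (ours, reusing gf256_mul from Section 2.1) computes inverses as a²⁵⁴ and solves A · D′ = C′; Theorem 3 guarantees a non-zero pivot exists at every step.

```python
# Sketch: solve the 6x6 Retrieval System of Equations (23)-(28) over GF(2^8).
def gf256_inv(a: int) -> int:
    """Multiplicative inverse via a^(2^8 - 2) = a^254 (a must be non-zero)."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf256_mul(r, a)
        a = gf256_mul(a, a)
        e >>= 1
    return r

def gf256_solve(A, c):
    """Gaussian elimination; A is a 6x6 matrix, c the right-hand side."""
    size = len(c)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]          # augmented matrix
    for col in range(size):
        piv = next(r for r in range(col, size) if M[r][col])  # non-zero pivot
        M[col], M[piv] = M[piv], M[col]
        inv = gf256_inv(M[col][col])
        M[col] = [gf256_mul(inv, v) for v in M[col]]          # normalize row
        for r in range(size):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [M[r][k] ^ gf256_mul(f, M[col][k]) for k in range(size + 1)]
    return [M[i][size] for i in range(size)]                  # D'_1 ... D'_6
```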

3 Theoretical analysis of proposed method

Here we analyze the proposed method theoretically through the following theorems.

3.1 Pivot invariance theorem

Theorem 1

The pivot value remains invariant before and after embedding data into the cover media.

Proof

From (7), we find that the pivot value is calculated as \(P^{(\beta )}= \omega ^{(\beta )}_{i,j}-(\omega ^{(\beta )}_{i,j}\mod ~4)\). Thus, P(β) is divisible by 4 and \(\omega ^{(\beta )}_{i,j} = P^{(\beta )} + k\), where k ∈ [0, 3]. Hence, two bits of data are embedded into the center pixel.

At the time of retrieving the message, the pivot value will be \( P^{(\beta )^{*}}= \omega ^{(\beta )^{*}}_{i,j}-(\omega ^{(\beta )^{*}}_{i,j}\mod ~ 4)\) = P(β) + k − ((P(β) + k) mod 4) = P(β) + k − k = P(β). □

3.2 Range invariance theorem

Positive distortion is defined as the increase in pixel value after embedding while negative distortion is defined as the decrease in pixel value after embedding.

Lemma 1

The maximum possible positive and negative distortions for embedding n bits of data in each pixel of the image are given below.

$$ max \delta_{i}^{+} = 2^{n} -1. $$
(29)
$$ max \delta_{i}^{-} =-2^{n} + 1. $$
(30)

Proof

\(max~\delta _{i}^{+} = 2^{n} - 1\) and \(max~\delta _{i}^{-} = -2^{n}+ 1\) because the maximum decimal value that can be generated by n bits is \(2^{n} - 1\). □

Theorem 2

(Range Invariance Theorem) The ranges of \(Q^{(\beta )}_{i+k,j+k}\) and \(Q^{(\beta )^{*}}_{i+k,j+k}\) remain invariant before and after embedding the secret data.

Proof

The maximum possible positive and negative distortions for embedding n bits of data in each pixel of the image are given in Lemma 1. \( R^{(l)}_{m}\) is defined as the lower bound of the slot containing \(Q^{(\beta )}_{i+k,j+k}\). Thus the maximum possible value of \(Q^{(\beta )}_{i+k,j+k}\) will be \(R^{(l)}_{m}+ max~\delta _{i}^{+}+ 1\). Now,

$$\begin{array}{@{}rcl@{}} R^{(l)}_{m}+ max~\delta_{i}^{+}+ 1&=& R^{(l)}_{m}+ 2^{n} - 1 + 1 \\ && (\text{from }(29))\\ &=& R^{(l)}_{m} + 2^{n}\\ &=& R^{(l)}_{m}+ 2^{\log_{2}(R^{(l)}_{m + 1}-R^{(l)}_{m})}\\ &=& R^{(l)}_{m}+ R^{(l)}_{m + 1} - R^{(l)}_{m}\\ &=& R^{(l)}_{m + 1}. \end{array} $$

Hence, the distorted difference value in any slot does not exceed the slot's upper bound \(R^{(l)}_{m + 1}\), even if maximum distortion occurs. □

3.3 Linear independence of retrieval system of equations

Next, we show that the system of (23) to (28) yields a unique solution.

Theorem 3

The System of (23) to (28) is consistent and has a unique solution in \(D^{\prime }_{i\in \{1,2,3,\cdots ,6\}}\).

Proof

The coefficient matrix \(D^{\prime }_{coeff}\) of the system (23) to (28) can be written as:

$$D^{\prime}_{coeff}= \left[\begin{array}{cccccc} 1 & 0 & x_{1} & 0 & {x_{1}^{2}} & 0 \\ 0 & x_{1} & 1 & {x_{1}^{2}} & 0 & 0 \\ 0 & 0 & 0 & x_{1} &1 & {x_{1}^{2}} \\ 1 & 0 & x_{2} & 0 & {x_{2}^{2}} & 0 \\ 0 & x_{2} & 1 & {x_{2}^{2}} & 0 & 0 \\ 1 & 0 & x_{3} & 0 & {x_{3}^{2}} & 0 \end{array}\right] $$

We can write,

$$D^{\prime}_{coeff}= \left[\begin{array}{c} D_{x_{1}} \\ D_{x_{2}} \\ D_{x_{3}} \end{array}\right], $$

where,

$$D_{x_{1}}= \left[\begin{array}{cccccc} 1 & 0 & x_{1} & 0 & {x_{1}^{2}} & 0 \\ 0 & x_{1} & 1 & {x_{1}^{2}} & 0 & 0 \\ 0 & 0 & 0 & x_{1} &1 & {x_{1}^{2}} \end{array}\right], $$
$$D_{x_{2}}= \left[\begin{array}{cccccc} 1 & 0 & x_{2} & 0 & {x_{2}^{2}} & 0 \\ 0 & x_{2} & 1 & {x_{2}^{2}} & 0 & 0 \end{array}\right], $$
$$D_{x_{3}}= \left[\begin{array}{cccccc} 1 & 0 & x_{3} & 0 & {x_{3}^{2}} & 0 \end{array}\right]. $$

From (5), we find that {xi | 0 < i ≤ 3} have been chosen in such a way that the xi ∈ {1, 2, … , 2⁸ − 1} are distinct. So any row of \(D_{x_{i}}\) is independent of any row of \(D_{x_{j}}\) for i ≠ j, i, j ∈ {1, 2, 3}. Thus, the rank of \(D^{\prime }_{coeff}\) is the sum of the ranks of \(D_{x_{1}}\), \(D_{x_{2}}\) and \(D_{x_{3}}\). Now \(D_{x_{1}}\) has 3 rows and hence its rank is at most 3; since the submatrix formed by the first three columns of \(D_{x_{1}}\) has rank 3, the rank of \(D_{x_{1}}\) is exactly 3. Similarly, the first two columns of \(D_{x_{2}}\) show that its rank is 2, and the rank of \(D_{x_{3}}\) is trivially 1. Hence, the rank of \(D^{\prime }_{coeff}\) is 3 + 2 + 1 = 6. Thus, the system is consistent and has a unique solution. □

3.4 Proof of correctness

One important issue regarding any steganographic algorithm is whether the embedded secret message can be correctly extracted at the receiver end. We have verified this both theoretically and experimentally. The experimental results are discussed in Section 4 and the theoretical proof is given below.

Theorem 4

The embedded message bits can be extracted without any bit loss.

Proof

We have to prove that \(d_{m}^{*}=d_{m}\). There are two possible cases.

Case 1: If \(t^{(\beta )} > P^{(\beta )}\), then

$$\begin{array}{@{}rcl@{}} d_{m}^{*} &=& Q^{(\beta)^{*}}-R^{(l)}_{m^{*}} = |\omega^{(\beta)^{*}}_{i+k,j+k} - P^{(\beta)^{*}}| - R^{(l)}_{m} \quad (\text{by the Range Invariance Theorem})\\ &=& (t^{(\beta)}- P^{(\beta)^{*}})-R^{(l)}_{m} \quad (\text{since the stego-pixel intensity is } t^{(\beta)}=\omega^{(\beta)^{*}}_{i+k,j+k})\\ &=& (P^{(\beta)}+ b_{m}- P^{(\beta)^{*}})-R^{(l)}_{m} \quad (\text{since } t^{(\beta)}>P^{(\beta)})\\ &=& b_{m}-R^{(l)}_{m} \quad (\text{since } P^{(\beta)^{*}}=P^{(\beta)})\\ &=& d_{m}+ R^{(l)}_{m}-R^{(l)}_{m} = d_{m}. \end{array}$$

Case 2: If \(t^{(\beta )} < P^{(\beta )}\), then similarly

$$\begin{array}{@{}rcl@{}} d_{m}^{*} &=& Q^{(\beta)^{*}}-R^{(l)}_{m^{*}} = |\omega^{(\beta)^{*}}_{i+k,j+k} - P^{(\beta)^{*}}| - R^{(l)}_{m} \quad (\text{by the Range Invariance Theorem})\\ &=& (P^{(\beta)^{*}} - t^{(\beta)})-R^{(l)}_{m} = (P^{(\beta)^{*}}-P^{(\beta)}+ b_{m})-R^{(l)}_{m} \quad (\text{since } t^{(\beta)} = P^{(\beta)} - b_{m})\\ &=& b_{m}-R^{(l)}_{m} = d_{m}+ R^{(l)}_{m}-R^{(l)}_{m} = d_{m}, \end{array}$$

and the result follows. □

3.5 Embedding efficiency

Embedding efficiency measures the strength of a steganographic algorithm against the distortion that occurs due to concealing message bits in the cover media. The definition of embedding efficiency [8] is as follows:

Definition 1

Embedding efficiency of any steganographic algorithm can be defined as the expected number of message bits concealed per embedding change.

Theorem 5

If n is the number of embedded bits per pixel of the cover media, then the maximum embedding efficiency of the proposed method is \(\frac {n\cdot 2^{n}}{2^{n}-1}\).

Proof

The possible distortion values for concealing an n-bit message in an 8-bit intensity value are {0, 1, ... , 2ⁿ − 1}, since the maximum decimal value that can be generated by n bits is 2ⁿ − 1. Assuming a uniform distribution, the probability of a change in pixel intensity is \(\frac {2^{n}-1}{2^{n}}\). Dividing n by this probability, we get the embedding efficiency of our technique. □

For any LSB-based steganographic algorithm, the embedding efficiency is \(\frac {1}{1/2}= 2\) [8]. The embedding efficiency of steganographic algorithms using random ternary symbols with uniform distribution in the media is 2.3774, and the embedding efficiencies of the works [8] and [28] are 4.4 and 5.33 respectively. Limiting n to 6 (image) and 13 (audio) in our technique, we get embedding efficiencies of 6.0952 and 13.0015 respectively, which are higher than those of the said techniques.
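As a quick numerical check of Theorem 5 for the image case (n = 6):

$$ \frac{6\cdot 2^{6}}{2^{6}-1}=\frac{384}{63}\approx 6.0952. $$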

4 Experimental results

Along with analyzing the visual quality of the cover and stego media (image or audio), we also analyze the strength of our method in terms of average embedding capacity, embedding efficiency, etc., and compare it with several other existing methods. The superiority of the proposed algorithm can be seen in the comparative study shown in Table 15.

Besides the theoretical outcomes, we have thoroughly analyzed the results obtained by applying the proposed steganographic algorithm to different types of images, audio and video. In our experimental setup, we have considered standard images like lena, baboon and fruits. Note that a sequence of images may be required when the message is very large; for such cases, we have considered sequences of image frames extracted from video clips of the categories Architectures, Cartoon and Wildlife. The number of frames depends on the size of the data to be embedded: starting from a given frame, consecutive frames are used as needed. In our experimental setup we have considered 100 video clips of each category, mostly having approximately 100 frames of size 255 × 141. To embed message bits in the audio clips, we have divided each clip into frames of 1024 samples; for audio, .wav files are considered. The image sequences are in uncompressed form; it is assumed that the message is not so long as to make the transmission of an uncompressed image sequence prohibitive.

4.1 Visual perceptibility analysis

Our proposed method has been tested widely over several images and audio clips. In our experimental setup we find an insignificant change between each cover and its respective stego version. Figures 1 and 2 show the original and corresponding stego versions of the standard images. Figures 3 and 4 show one sample image from every image sequence together with the corresponding stego version. For audio, sample waveforms from every audio sequence and the corresponding stego versions are given in Figs. 5 and 6.

Fig. 1 Cover version of standard images, viz., lena, baboon and fruits (left to right)

Fig. 2 Stego version of standard images, viz., lena, baboon and fruits (left to right)

Fig. 3 One cover frame from each of the architectural, cartoon and wildlife videos (left to right)

Fig. 4 One stego frame from each of the architectural, cartoon and wildlife videos (left to right)

Fig. 5 Waveforms of a sample cover audio frame (left) and stego audio frame (right) from the Architectural video

Fig. 6 Waveforms of a sample cover audio frame (left) and stego audio frame (right) from the Cartoon video

4.2 Analysis of MSE, PSNR and SNR

The Mean Square Error (MSE), defined in (31) (where I represents the cover and I* the stego image, both of size W × H), is shown in Tables 10 and 11 for the images of the four categories and the standard images respectively.

$$ MSE=\frac{1}{W \times H}\sum\limits_{i = 0}^{W-1}\sum\limits_{j = 0}^{H-1}[I(i,j)-I^{*}(i,j)]^{2}. $$
(31)
Table 10 MSE and PSNR for images of different sequence
Table 11 MSE and PSNR for standard images

The values of the Peak Signal-to-Noise Ratio (PSNR), defined in (32), are reported in the same tables.

$$ PSNR= 10 \cdot \log_{10}\left( \frac{255^{2}}{MSE}\right). $$
(32)
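For reference, a minimal sketch (ours) of the MSE and PSNR computations in (31) and (32), assuming cover and stego are 8-bit grayscale arrays of equal size:

```python
# Sketch: MSE per (31) and PSNR per (32) for 8-bit grayscale images.
import numpy as np

def mse(cover: np.ndarray, stego: np.ndarray) -> float:
    diff = cover.astype(np.float64) - stego.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(cover: np.ndarray, stego: np.ndarray) -> float:
    return 10.0 * np.log10(255.0 ** 2 / mse(cover, stego))
```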

Signal-to-noise ratio (SNR) is defined as the power ratio between a signal (meaningful information) and the background noise (unwanted signal), i.e.,

$$SNR = \frac{\text{signal power}}{\text{noise power}}.$$

The Signal-to-Noise-Ratio (SNR) for the audio frame is shown in Table 12.

Table 12 SNR for Audio of different sequences

4.3 Histogram analysis

A histogram is a graphical representation of the distribution of image intensity values. Figure 7 shows the histograms of both the cover and stego versions of a sample image; the differences are negligible. For the cover lena and stego lena images, the histograms show no visible difference.

Fig. 7 Histogram of cover (top) and stego (bottom)

4.4 Bitplane analysis

A bitplane of an image can be defined as the set of bits corresponding to a particular bit position in the binary numbers that represent the image; a bitplane analysis is shown in Fig. 8.

Fig. 8 Bitplane analysis of cover (lena.bmp) and stego (lenaStego.bmp)

4.5 Average embedding capacity

Definition 2

Average embedding capacity is defined as the number of embedded bits per pixel, i.e.,

$$AEC=\frac{\text{Number of Embedded Bits}}{\text{Total Number of Pixels}}.$$

Average embedding capacity for an image with x pixels can be formulated as,

$$\gamma = \frac{1}{x}\sum\limits_{i = 1}^{x} y_{i},$$

where \(y_{i}\) is the number of message bits embedded in the i-th pixel.

The average embedding capacity is one of the most important criteria for determining the efficiency of an algorithm. It can be expressed in several forms.

Bits per pixel (bpp) for the image component is the ratio of the number of embedded bits to the number of pixels in the image, while bits per sample (bps) for the audio frame is the ratio of the number of embedded bits to the number of samples in the audio.

Our method has a very high embedding efficiency which is not achieved at the cost of visual impairment. The average embedding capacity of our method on the different video categories is shown in Tables 13 and 14, and its superiority over other methods in terms of average embedding capacity can be understood from Table 15. Although the capacity of the method in [13] is higher than ours, our method yields a comparatively much higher PSNR value while still claiming a high embedding capacity.

Table 13 Image for different sequences
Table 14 Capacity for audio
Table 15 Comparison of capacity and PSNR with other methods

4.6 Color frequency test

The Chi-square test is used to check whether the color-frequency distribution in an image matches a distribution that demonstrates distortion from embedded hidden data. We determine the probability of embedding using (33).

$$ p={\int}_{0}^{\chi^{2}}{ \frac{t^{\frac{v}{2}-1}e^{-t/2} }{2^{\frac{v}{2}}{\Gamma (\frac{v}{2})}} }dt. $$
(33)

where v is the number of degrees of freedom in the Chi-square test and v + 1 is the number of distinct color categories. This probability is calculated over the pixel values of the whole image component of the video. For places in the video where no secret message is embedded, p obtained from (33) should be almost equal to 0. In the proposed technique, however, data is not directly embedded into pixels; instead an adjustment is made to the difference between the values of two neighboring pixels. Thus we can safely conclude that this test is incapable of detecting hidden messages in our novel method. Moreover, testing our algorithm on the different categories of videos has yielded the value p = 0.0191, which further proves the effectiveness of the method.
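Note that the integral in (33) is just the chi-square cumulative distribution function, so p can be computed directly. A sketch (ours) using SciPy, assuming the χ² statistic and the degrees of freedom v have already been computed from the color-frequency tallies:

```python
# Sketch: embedding probability p per (33) as the chi-square CDF.
from scipy.stats import chi2

def embedding_probability(chi_sq: float, v: int) -> float:
    return float(chi2.cdf(chi_sq, df=v))
```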

4.7 Sample pair method

Dumitrescu et al. [5] show that this scheme is based on the assumption given below. Let P be the set of all pixel-intensity pairs (r, s) such that |r − s| = 2u + 1, where 0 ≤ u ≤ 2⁷ − 1, and the even component is less than the odd component of the pair. Let Q be the set of all pixel-intensity pairs (r, s) such that |r − s| = 2u + 1, where 0 ≤ u ≤ 2⁷ − 1, and the even component is greater than the odd component of the pair. The assumption is that statistically |P| = |Q|.

In Table 13, we find that the AEC for the three image categories Cartoon, Wildlife and Architectural is 2.21 bpp, 2.22 bpp and 2.25 bpp respectively. But in Table 16, we find that sample pair analysis can detect only 0.98 bpp, 0.89 bpp and 1 bpp for the aforesaid image types, i.e., almost 60% of the message bits remain undetected by sample pair analysis. We conclude that our proposed algorithm withstands this test.

Table 16 Sample pair results for image sequences

4.8 ROC analysis

Among the various techniques for measuring performance against steganalysis tools, measuring the false positive rate and false negative rate in order to draw the receiver operating characteristic (ROC) curve is very popular. As shown in Table 17, the confusion matrix is expressed in terms of True Positive, True Negative, False Negative and False Positive as follows:

  • TP: Stego that is correctly classified as Stego

  • TN: Cover that is correctly classified as Cover

  • FP: Cover that is wrongly classified as Stego

  • FN: Stego that is wrongly classified as Cover

Table 17 Confusion matrix

Based on the TP, TN, FP and FN values, the True Positive Rate (TPR) and False Positive Rate (FPR) are determined as shown in (34) and (35).

$$\begin{array}{@{}rcl@{}} TPR &=& \frac{\#TP}{\#TP+\#FN} \end{array} $$
(34)
$$\begin{array}{@{}rcl@{}} FPR &=& \frac{\#FP}{\#TN+\#FP} \end{array} $$
(35)

Figure 9 shows the Receiver Operating Characteristic (ROC) curve for the proposed algorithm, using the TPR and FPR values obtained from (34) and (35) for different thresholds with the StegExpose tool [2], which combines RS analysis, Sample Pairs, etc. In Appendix A we show a table of sample images containing filename, classification (stego or cover), quantitative steganalysis (payload size in bytes), Primary Sets, χ², Sample Pairs, RS analysis and Fusion (mean) in CSV format. In our experimental setup, we test a set of 300 images and find the area under the ROC curve (AUC) to be 0.5503, barely above the 0.5 of random guessing, which indicates the strength of our proposed method.

Fig. 9 ROC curve for the proposed algorithm

In our experimental setup, we also perform the Bit Error Rate (BER), Normalized Cross-Correlation (NCC), Universal Image Quality Index (UIQI) and Structural Similarity Index Metric (SSIM) tests, as described below.

4.9 Bit error rate analysis

The Bit Error Rate (BER) (shown in Tables 18 and 19) is the ratio of the total number of distorted bits to the total number of bits in the image or audio frame [37].

Table 18 Avg. BER for image and audio sequences
Table 19 BER for standard images

4.10 Normalized cross-correlation (NCC)

The Normalized Cross-Correlation (NCC) measures the amount of deviation of the stego image or audio with respect to its cover version [37]. The NCC value is always 1 for a pair of identical images (or audio frames). The value of NCC is obtained from (36),

$$ NCC=\frac{{\sum}^{P}_{i = 1}{\sum}_{j = 1}^{Q}[\omega(i,j).\omega^{*}(i,j)]} {{\sum}_{i = 1}^{P}{\sum}_{j = 1}^{Q} \omega(i,j)^{2}} $$
(36)

where ω(i, j) and ω*(i, j) represent the pixel intensity values of the cover and stego image (or audio) respectively. The average NCC values of our method are shown in Tables 20 and 21; they are very close to 1, denoting the marginal distortion caused by embedding.

Table 20 Avg. NCC for image and audio sequences
Table 21 Avg. NCC for standard images
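A compact sketch (ours) of the NCC computation in (36), with cover and stego as equal-sized numeric arrays:

```python
# Sketch: NCC per (36).
import numpy as np

def ncc(cover: np.ndarray, stego: np.ndarray) -> float:
    c = cover.astype(np.float64)
    s = stego.astype(np.float64)
    return float(np.sum(c * s) / np.sum(c ** 2))
```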

4.11 Universal image quality index (UIQI)

The Universal Image Quality Index is an index which models any kind of distortion as a combination of three factors: loss of correlation (Q1), luminance distortion (Q2) and contrast distortion (Q3) [11].

Let \(x=\{x_{i}|i \in \mathbb {N}^{+}\}\) and \(y=\{y_{i}|i\in \mathbb {N}^{+}\}\) be the pixels of the original and the stego images respectively, and let N be the number of pixels in each image. Q1, Q2 and Q3 are given by the following equations:

$$ Q_{1}=\frac{\sigma_{xy}}{\sigma_{x}\sigma_{y}} $$
(37)
$$ Q_{2}=\frac{2\overline{x}\,\overline{y}}{((\overline{x})^{2}+(\overline{y})^{2})} $$
(38)
$$ Q_{3}=\frac{2\sigma_{x}\sigma_{y}}{({\sigma^{2}_{x}}+{\sigma^{2}_{y}})} $$
(39)

where,

$$\overline{x}=\frac{1}{N}\sum\limits^{N}_{i = 1}x_{i} $$
$$\overline{y}=\frac{1}{N}\sum\limits^{N}_{i = 1}y_{i} $$
$${\sigma^{2}_{x}} =\frac{1}{N-1}\sum\limits^{N}_{i = 1}(x_{i}-\overline{x})^{2} $$
$${\sigma^{2}_{y}} =\frac{1}{N-1}\sum\limits^{N}_{i = 1}(y_{i}-\overline{y})^{2} $$
$$\sigma_{xy} =\frac{1}{N-1}\sum\limits^{N}_{i = 1}(x_{i}-\overline{x})(y_{i}-\overline{y}) $$

Now, UIQI (say, Q) is determined by (40)

$$\begin{array}{@{}rcl@{}} Q &=& Q_{1}*Q_{2}*Q_{3} \\ &=&\frac{4\sigma_{xy}\overline{x}\overline{y}}{({\sigma^{2}_{x}}+{\sigma^{2}_{y}})[(\overline{x})^{2}+(\overline{y})^{2}]} \end{array} $$
(40)

\(\overline {x}\) is the average intensity of the original image and \(\overline {y}\) that of the corresponding stego image; σx and σy denote their standard deviations, and σxy their covariance. The UIQI values in our case are given in Tables 22 and 23. The values are very close to 1, which signifies that the distortion is minimal.

Table 22 Avg. UIQI for image and audio sequences
Table 23 Avg. UIQI for standard images
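A compact sketch (ours) of the single-window UIQI of (40), using the sample statistics defined above:

```python
# Sketch: UIQI per (40) over whole images (single window).
import numpy as np

def uiqi(x: np.ndarray, y: np.ndarray) -> float:
    x = x.astype(np.float64); y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(ddof=1), y.var(ddof=1)              # sigma_x^2, sigma_y^2
    cov = np.sum((x - mx) * (y - my)) / (x.size - 1)   # sigma_xy
    return float(4 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2)))
```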

4.12 Structural similarity index metric (SSIM)

Structural Similarity Index Metric (SSIM) can be defined as follows:

Definition 3

The Structural Similarity Index Metric (SSIM) [11] is an image degradation measure estimated using (41),

$$ SSIM\triangleq\frac{(2\times\overline{x}\times\overline{y}+A_{1})(2\times\sigma_{xy}+A_{2})}{({\sigma^{2}_{x}}+{\sigma^{2}_{y}}+A_{2}) \times((\overline{x})^{2}+(\overline{y})^{2}+A_{1})} $$
(41)

where A1 = (m1F)² and A2 = (m2F)² are two constants, F = 2^(number of bits per pixel) − 1 (i.e., 255 for 8-bit images), and m1 = 0.01 and m2 = 0.03 by default.

The value of SSIM lies between −1 and 1; for identical images the value is 1. In our case, the SSIM values are given in Tables 24 and 25. The values are very close to 1, which signifies that the distortion is minimal.

Table 24 Avg. SSIM for image and audio sequences
Table 25 Avg. SSIM for standard images
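Analogously, a sketch (ours) of the global (single-window) SSIM of (41) for 8-bit images, with F = 255 and the default m1, m2:

```python
# Sketch: global SSIM per (41), with A1 = (0.01*255)^2, A2 = (0.03*255)^2.
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray) -> float:
    x = x.astype(np.float64); y = y.astype(np.float64)
    A1, A2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(ddof=1), y.var(ddof=1)
    cov = np.sum((x - mx) * (y - my)) / (x.size - 1)
    return float((2 * mx * my + A1) * (2 * cov + A2)
                 / ((vx + vy + A2) * (mx ** 2 + my ** 2 + A1)))
```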

4.13 StirMark analysis

The StirMark 4.0 tool [33] is considered a benchmark for justifying the strength of any steganographic algorithm. It applies small geometrical alterations that cause a loss of synchronization between the analyzed images; the image may be sheared, stretched, bent, shifted or rotated by a small amount. The proposed algorithm exhibits satisfactory outcomes for the image: the very minute differences between the values of the cover and its stego version, shown in Table 26, are a clear indication that our technique withstands the benchmark.

Table 26 StirMark analysis of proposed technique on cover and stego version of image wildlife (size: 255 × 141)

4.14 Payload curve

Figure 10 shows payload curves of the distortion parameters vs. embedding capacity, where the distortion parameters include MSE, PSNR, BER, SNR, NCC, UIQI and SSIM. We vary the capacity from 20% to 100% in order to analyze its effect on the different quality metrics. Figure 10a plots MSE vs. capacity: MSE increases with capacity. Figure 10b plots PSNR vs. capacity: PSNR decreases with capacity. Figure 10c plots BER vs. capacity (for image): BER increases with capacity. Figure 10d plots NCC vs. capacity (for image): NCC increases with capacity. Figure 10e and f plot UIQI and SSIM vs. capacity (for image) respectively: both decrease with capacity. Figure 10g plots SNR vs. capacity: SNR decreases with capacity. Figure 10h plots BER vs. capacity (for audio): BER increases with capacity. Figure 10i plots NCC vs. capacity (for audio): NCC increases with capacity. Figure 10j plots SSIM vs. capacity (for audio): SSIM decreases with capacity.

Fig. 10 Distortion parameters vs. embedding capacity

The abbreviations used in this paper are provided in Table 27.

Table 27 Abbreviations used

5 Conclusion

In this paper, we have proposed a novel multi-bit steganographic technique that involves two phases. In the first phase, we encode the information to be embedded using Galois field (GF(2⁸)) arithmetic. In the second phase, we embed the encoded information in the cover multimedia in the spatial domain. For images (an uncompressed video can be considered a set of images as well), our algorithm is capable of hiding a maximum of 6 bits per pixel, and for audio it can hide a maximum of 13 bits per sample, without leaving any perceptible signature in the stego media. We have presented both theoretical arguments and experimental results to establish the high embedding capacity and security provided by our method. In this work, we have dealt with lossless images and audio only; the applicability of our method in the lossy domain remains an interesting open problem, which we plan to take up as future work.