1 Introduction

The concept of unconditional secrecy was presented in the seminal article by Shannon, where he also showed that a one-time pad (or Vernam cipher) is unconditionally secure [23]. In particular, unconditional secrecy means that a computationally unbounded adversary cannot obtain any information about an encrypted message without a key. It is clear that this property is highly desirable, but if the one-time pad is used to encrypt a message, the length of the key used must be at least the length of the message (or, more precisely, its Shannon entropy). This requirement is too strict for many applications, and there are many approaches for creating secure ciphers with short or low entropy keys, see [3, 5, 6, 11, 12, 15, 17, 18]. One such approach was suggested by Shannon in [23], who described ideal cipher systems where a computationally unbounded adversary “does not obtain a unique solution to the cipher but is left with many alternatives, all of reasonable probability”. He built a theory of ideal ciphers and described some of them for the case when the probability distribution of encrypted messages is known. Later, ideal systems were proposed for the case of unknown statistics, see [17].

An interesting new way of building secure systems with short keys is the so-called entropic security, proposed by Russell and Wang [15] and developed by Dodis and Smith [5]. Such a cipher exploits the entropy of the original message, so that the key length need only be roughly the difference between the message length and its min-entropy (the exact definition will be given below). (Note that the use of the message entropy to improve the strength of a cipher was also employed in [17, 18].)

The notion of an entropically-secure symmetric encryption scheme is extremely important for cryptography because such a scheme can be constructed with a key shorter than the length of the input. In a sense, this circumvents Shannon’s lower bound on the key length.

Another way to construct a short-key cipher is the so-called honey cipher, proposed by Juels and Ristenpart [12] and developed by Jaeger et al. [11]. In some ways, the honey cipher is like the ideal cipher: a computationally unlimited adversary is left with many possible highly probable decryptions. Li et al. [14] developed and combined the ideas of the honey cipher and entropically-secure ciphers to create a new class of easily implementable short-key codes. In a sense, their methods develop and widely use the idea of preprocessing the original message in order to increase its entropy.

Data compression and randomisation are two methods of preprocessing the original message that have been used for centuries in cryptography [10, 23]. Moreover, homophonic coding can be used to compress and randomise at the same time [10, 20]. The goal of both transformations is to make the probability distribution of the original messages closer to the uniform one (see an overview in [1]). Interestingly, both transformations have been successfully applied to some cryptographic problems: they were used to extract randomness [7, 21, 24] and to build an ideal steganographic system [22].

In this article, we combine entropically-secure encryption with the suggested compression and data randomization techniques and apply it to the following two cases: (i) the statistics of encrypted messages are known; and (ii) statistics are unknown, but messages are generated by a Markov chain with known memory (or connectivity). In the first case the key length is a constant, whereas in the second case it is \(O(\log n)\), where n is the length of the message. (But in both cases, the length of the secret key depends on the required security level). This makes it possible to apply an entropically-secure cipher so that the key length is independent of the entropy or the length of the message.

2 Definitions and preliminaries

We consider the problem of symmetric encryption, where there are two parties, Alice and Bob, and Alice wants to securely transmit a message M to Bob, where \(M \in \{0,1\}^n \), \(n\ge 1\), obeys a certain probability distribution p defined on the set \(\{0,1\}^n\). Alice and Bob have a shared secret key \( K = K_1 \ldots K_k \), which can be much shorter than the length of M, that is, \( k \ll n \). Alice encrypts M with K and possibly some random bits, and obtains the word cipher(M, K). Then she sends it to Bob, who decrypts the received message and obtains M. In addition, there is a computationally unlimited adversary Eve who does not know M and K, but knows the probability distribution p and wants to find some information about M based on the encrypted message.

Let \(\Lambda \) be a finite alphabet and \(\textbf{P}\) be a set of probability distributions defined on \(A=\Lambda ^n\), where \(n \ge 1\). By definition, the min-entropy of \(p \in \textbf{P}\) is as follows:

$$\begin{aligned} h_{min}(p) = - \log \, \, \max _{a \in A} \, p(a) \,. \end{aligned}$$
(1)

(Here and below \(\log = \log _2 \,\).)
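For illustration, the following minimal sketch (not part of the original construction) computes the min-entropy (1) of a distribution given as a table of probabilities:

```python
from math import log2

def min_entropy(p):
    """h_min(p) = -log2 max_a p(a), for p given as a dict mapping
    outcomes to their probabilities."""
    return -log2(max(p.values()))

# Example: a biased distribution on two-bit messages.
p = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
print(min_entropy(p))  # 1.0 bit, while the Shannon entropy is 1.75 bits
```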

In this article we will consider ciphers that can be applied to messages obeying not just one particular distribution, but any distribution from some family. We therefore define the following two families of probability distributions: \( \textbf{P}_{min}(h) = \{p: h_{min}(p) \ge h \} \) and \(\textbf{P}_{markov}(m) \), which contains stationary ergodic Markov chains with memory (or connectivity) \(m\), \(m \ge 0\). (The definitions can be found in [2, 19] and in the Appendix.)

Russell and Wang [15] suggested a definition of entropic security which was generalised by Dodis and Smith [5] and can be formulated as follows: a probabilistic map Y is said to hide all functions on \(\Lambda ^n\) with leakage \(\epsilon \) if, for every adversary B, there exists some adversary \({\hat{B}}\) (who does not know Y(M)) such that for any distribution from \(\textbf{P}\), all functions f, and any \(M \in \Lambda ^n\)

$$\begin{aligned} | \, Pr\{ B(Y(M) ) \, = f(M) \} \, - Pr\{ {\hat{B}}(\,)\, = f(M) \} \, | \,\le \, \epsilon . \end{aligned}$$
(2)

(Note that \({\hat{B}}\) does not know Y(M) and, in fact, she guesses the value of the function f(M).) In what follows the probabilistic map Y will be cipher(M, K) and f is a map \(f: \Lambda ^n \rightarrow \{0,1\}^*\).

Definition 1

The map Y() is called \(\epsilon \)-entropically secure for some family of probability distributions \(\textbf{P}\) if Y() hides all functions on \(\Lambda ^n\) with leakage of \(\epsilon \) for any \(p \in \textbf{P}\).

Note that, in a sense, Definition 1 is a generalisation of Shannon’s notion of perfect secrecy. Namely, if we take \(\epsilon = 0\), Y \(= cipher(M,K)\) and \(f(x) = x\), we obtain that for any M

$$\begin{aligned} | Pr\{{B}( cipher (M,K) ) = M \} - Pr \{ {\hat{B}} (\,\,) = M \} \,| \, = \, 0 \end{aligned}$$

(So, B and \({\hat{B}}\) obtain the same result, but B estimates the probability based on cipher(M, K), whereas \({\hat{B}}\) does it without knowledge of cipher(M, K).) Thus, entropic security (2) can be considered as a generalisation of Shannon’s perfect secrecy.

The following theorem of Dodis and Smith [5] is a generalisation of the results of Russell and Wang [15].

Theorem 1

[5] Let there be an alphabet \( \{0,1\}^n\), \(n >0\), with a probability distribution p whose min-entropy is not less than h (that is, \(p \in \textbf{P}_{min}(h) \) with \(\Lambda = \{0,1\}\)). Then there exists an efficient \(\epsilon \)-entropically secure cipher with a k-bit key, where

$$\begin{aligned} k = n - h + 2 \log (1/\epsilon ) +2. \end{aligned}$$
(3)

Let us denote this cipher by \(cipher_{rw-ds}\) (a description of one of these ciphers from [5] is given in the Appendix). It is important to note that one and the same cipher is applied for any distribution whose min-entropy is not less than h.
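As a numerical illustration of (3) with hypothetical parameters (not taken from [5]): for 1000-bit messages whose min-entropy is at least 900 bits and leakage \(\epsilon = 2^{-40}\), the key length is \(k = 1000 - 900 + 2 \cdot 40 + 2 = 182\) bits, far shorter than the 1000 bits required by a one-time pad.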

Another important notion is that of indistinguishability:

Definition 2

[5] A randomised map \(Y: \{0, 1\}^n \rightarrow \{0, 1\}^n\), \(n \ge 1 \), is \(\epsilon \)-indistinguishable for some family of distributions \(\textbf{P}\) if there is a probability distribution G on \(\{0, 1\}^n\) such that for every probability distribution \(p \in \textbf{P}\) we have

$$\begin{aligned} SD(Y (M), G) \le \epsilon , \end{aligned}$$

where, for two distributions A, B,

$$\begin{aligned} SD(A,B) = \frac{1}{2} \sum _{M \in \textbf{M}} | Pr\{A = M\} - Pr\{B = M\} | \,. \end{aligned}$$
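For concreteness, a small sketch (illustrative only) of the statistical distance computed from this formula:

```python
def statistical_distance(p, q):
    """SD(A, B) = (1/2) * sum over outcomes of |Pr{A=M} - Pr{B=M}|,
    for distributions given as dicts mapping outcomes to probabilities."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(m, 0.0) - q.get(m, 0.0)) for m in support)

# Example: a biased bit versus a uniform bit.
print(statistical_distance({"0": 0.75, "1": 0.25}, {"0": 0.5, "1": 0.5}))  # 0.25
```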

Dodis and Smith [5] showed that entropic security and indistinguishability are deeply connected for distributions with bounded min-entropy:

Theorem 2

[5] Let Y be a randomised map with inputs of length n bits. Then

  1. \(\epsilon \)-entropic security for \(\textbf{P}_{min}(t)\) implies \(4\epsilon \)-indistinguishability for \(\textbf{P}_{min}(t-1)\), and

  2. \(\epsilon \)-indistinguishability for \(\textbf{P}_{min}(t-2)\) implies \(\epsilon /8\)-entropic security for \(\textbf{P}_{min}(t-2)\), when \( 2 \log (1/\epsilon ) + 1 \ge t\).

It is worth noting that the same cipher (or random map Y) is \(\epsilon \)-entropically secure and \(\epsilon \)-indistinguishable, and both concepts are equivalent up to small parameter changes. Moreover, one cipher (and one Y) fits all distributions from \(\textbf{P}_{min}(t)\).

3 Suggested cipher: general construction

Suppose there is a set \(\textbf{M}\) of n-letter messages from \(\Lambda \) and a family of probability distributions \(\textbf{P}\) on \(\textbf{M}\). We can see from Theorem 1 that the choice of the length k of the key K depends significantly on the min-entropy of the probability distribution; specifically, \( k = n - h_{min} + 2 \log (1/\epsilon ) + 2\).

So, the following observation seems natural: transform the set \(\textbf{M}\) into a larger set of longer messages \(\textbf{M}^*\) in such a way that the min-entropy of the transformed distribution becomes close to the message length in \(\textbf{M}^*\). Then, apply the entropically-secure cipher \(cipher_{rw-ds}\) to messages from \(\textbf{M}^*\). It turns out that the resulting cipher for the original set \(\textbf{M}\) is also entropically secure with a short key.

The proposed cipher is based on this observation and is described as follows. Suppose that there is a set \(\textbf{M}\) of n-letter messages with a probability distribution \(p \in \textbf{P} \).

Preliminary stage Alice and Bob build a random map \(\phi : \Lambda ^n \rightarrow \{0,1\}^{n^*}\), \(n^* \ge n\), such that

  i. there exists a map \(\phi ^{-1}: \{0,1\}^{n^*} \rightarrow \Lambda ^{n}\) such that \( \phi ^{-1}(\phi (m)) = m\) for any \(m \in \textbf{M} \), and

  ii. for any \(p \in \textbf{P}\) the min-entropy of the corresponding probability distribution \(\pi _p\) on \(\textbf{M}^*\) is \(n^* - \Delta \), where \(\Delta \) is significantly less than \(n^*\). (That is, \(\phi \) converts p to \(\pi _p\). Formally, for \(v \in \textbf{M}^*\), \(\pi _p(v) = Pr\{ \phi (u) = v\}\), where \(u \in \textbf{M}\) obeys the distribution p.)

Message encryption Suppose that Alice wants to encrypt a message \(m \in \textbf{M}\) with leakage \(\epsilon \). She calculates \( z = \) \(cipher_{ rw-ds} \) \((\phi (m)) \) with key length \(\Delta + 2 \log (1/\epsilon ) +2\) and sends z to Bob. To decrypt, Bob computes \(\phi ^{-1} \) \((decipher_{ rw-ds} (z))\).

The random map \(\phi \) uses data compression and randomisation, so we denote this cipher by \( cipher_{c \& r}\).
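Schematically, the encryption and decryption pipeline can be sketched as follows (an illustration only, not the authors' implementation; here cipher_rw_ds, decipher_rw_ds, phi and phi_inverse are assumed to be given, with \(cipher_{rw-ds}\) as in [5] and \(\phi \), \(\phi ^{-1}\) as in the preliminary stage):

```python
def encrypt(m, key, phi, cipher_rw_ds):
    """cipher_{c&r}: compress-and-randomise the message with phi, then
    apply the entropically secure cipher of [5] to the result."""
    return cipher_rw_ds(phi(m), key)

def decrypt(z, key, phi_inverse, decipher_rw_ds):
    """Inverse pipeline: undo cipher_{rw-ds}, then undo the map phi."""
    return phi_inverse(decipher_rw_ds(z, key))
```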

Theorem 3

Let there be a set of messages \(\textbf{M} \subset \Lambda ^n\), \(n >0\), and a probability distribution p on \(\textbf{M}\), \(p \in \textbf{P}\). Let \(\epsilon >0\), assume that the described cipher \( cipher_{c \& r}\) is applied to messages from \(\textbf{M}\) with a secret key of length \(\Delta + 2 \log (1/\epsilon ) +2\), and suppose that properties (i) and (ii) are valid.

Then this cipher is \(\epsilon \)-entropically secure, that is, for any function \(A: \, \{0,1\}^{*} \rightarrow \{0,1\}^*\), any \(f: \,\Lambda ^n \rightarrow \{0,1\}^*\) and \(M \in \textbf{M}\)

$$ \begin{aligned} | Pr \{ A(cipher_{c \& r} (M )) = f(M) \} - Pr \{{\hat{A}}(\,) = f(M) \} | \le \epsilon , \end{aligned}$$

where \({\hat{A}}\) does not use the value \( cipher_{c \& r} (M) \).

Proof

Taking into account that the min-entropy of \(\pi _p, p \in \textbf{P}, \) equals \(n^* - \Delta \), we see from Theorem 1 that for any function g

$$\begin{aligned} | Pr \{ A(cipher_{rw-ds}(v)) = g(v) \} - Pr \{{\hat{A}}(\,) = g(v) \} | \le \epsilon , \end{aligned}$$

where v is a random variable with distribution \(\pi _p\) on \( \{0,1\}^{n^*} \), g is any function defined on \(\{0,1\}^{n^*}\) (\(g: \{0,1\}^{n^*} \rightarrow \{0,1\}^*)\) and \({\hat{A}}(\,)\) does not depend on \(cipher_{rw-ds}(v)\). For any function \(f: \Lambda ^n\rightarrow \{0,1\}^*\), any \(u \in \textbf{M}\) and any \(v= \phi (u)\) define the function \(g(v) = f(\phi ^{-1}(v ) ) (=f(u))\) (so, g(v) is defined if \(\pi _p(v) >0\)). The last inequality is valid for this function g, hence

$$\begin{aligned} | Pr \{ A(cipher_{rw-ds}(\phi (u) )) = f(u) \} - Pr \{{\hat{A}}(\,) = f(u) \} | \le \epsilon . \end{aligned}$$

Taking into account that \( cipher_{c \& r} (u) = cipher_{rw-ds}(\phi (u) )\) and \( f(\phi ^{-1}(v)) = f(u)\), we can see from the last inequality that

$$ \begin{aligned} | Pr \{ A(cipher_{c \& r}( u)) = f(u) \} - Pr \{{\hat{A}}(\,) = f(u) \} | \le \epsilon \,. \end{aligned}$$

The theorem is proven. \(\square \)

We can see from Theorem 2 that the cipher \( cipher_{c \& r}\) is \(2 \epsilon \)-indistinguishable. Below we prove directly that \( cipher_{c \& r}\) is \(\epsilon \)-indistinguishable with \(k = \Delta + 2 \log (1/\epsilon ) + 6 \).

From Theorem 2 we know that indistinguishability is, in fact, equivalent to entropic security, and it holds for \(cipher_{rw-ds}\); however, we are interested in the indistinguishability of \( cipher_{c \& r}\). The following theorem establishes this.

Theorem 4

Let there be a set of messages \(\textbf{M} \subset \Lambda ^n\), \(n >0\), and a probability distribution p on \(\textbf{M}\), \(p \in \textbf{P}\). Let \(\epsilon >0\), assume that the described cipher \( cipher_{c \& r}\) is applied to messages from \(\textbf{M}\) with a secret key of length \(\Delta + 2 \log (1/\epsilon ) + 6\), and suppose that properties (i) and (ii) are valid.

Then this cipher is \(\epsilon \)-indistinguishable.

Proof

In order to prove it, suppose that \(cipher_{rw-ds}\) is applied to the words from the set \(\phi (\textbf{ M}) \subset \{0,1\}^{n^*} \) in such a way that it is \(\epsilon /4\)-entropically secure, where the length of the key equals \(\Delta + 2 \log (1/(\epsilon /4))+2 = \Delta + 2 \log (1/\epsilon ) +6\). From Theorem 2 we can see that this cipher is \(\epsilon \)-indistinguishable, that is, \(SD(cipher_{rw-ds}, G) \le \epsilon \), where G is a random variable on \(\{0,1\}^{n^*} \) which is independent of \(cipher_{rw-ds} \).

Define \(U_a = \{ cipher_{rw-ds} (\phi (a) )\}\) and let the random variable \(G' \) be defined as follows:

$$\begin{aligned} Pr\{G' = v\} = \sum _{w \in U_v} Pr\{ G = w\}. \end{aligned}$$

The following chain of equalities and inequalities is based on these definitions and the triangle inequality for \(L_1\):

$$ \begin{aligned} SD(cipher_{c \& r}, G')&= \frac{1}{2 }\sum _{u \in \Lambda ^n } | Pr\{cipher_{c \& r}=u\} - Pr\{ G'=u \}| \\&= \frac{1}{2 }\sum _{v \in \Lambda ^n } \Big| \sum _{w \in U_v} \big( Pr\{cipher_{rw-ds}=w\} - Pr\{ G=w \} \big) \Big| \\&\le \frac{1}{2 }\sum _{v \in \Lambda ^n } \sum _{w \in U_v} | Pr\{cipher_{rw-ds}=w\} - Pr\{ G=w \} \, | \\&= \frac{1}{2 } \sum _{w \in \{0,1\}^{n^*}} | Pr\{cipher_{rw-ds}=w\} - Pr\{ G=w \} \, | \\&= SD(cipher_{rw-ds}, G) \le \epsilon \,. \end{aligned}$$

So, \( SD(cipher_{c \& r}, G') \le \epsilon \).

The theorem is proven. \(\square \)

4 Applying \( cipher_{c \& r} \) to messages generated by a source with known statistics

In this part we apply \( cipher_{c \& r} \) to messages from the set \(\{0,1\}^n\) which obey a distribution p, where p is known to Alice, Bob and Eve. Let \(A \subset \{0,1\}^n\) be the set of messages with non-zero probabilities. In this part, it will be convenient to call \(A = \{a_1, \ldots , a_L\}\) an alphabet and the \(a_i\) letters, since we will consider letter-wise codes. Clearly,

$$\begin{aligned} \log L \le n \,. \end{aligned}$$
(4)

We first describe the preliminary stage, in which we use the Shannon code [4] to compress the messages and then apply randomisation.

4.1 Lossless codes

4.1.1 Shannon code and its generalisations

Suppose \(a_1, \ldots , a_L \in A\) are ordered in such a way that \(p(a_1) \ge p(a_2) \ge \cdots \ge p(a_L) > 0\). Define \(Q_1 = 0\), \(Q_t = \sum _{i=1}^{t-1} p(a_i)\), \(t=2, \ldots , L\), and let \(\hat{Q_i}\), \(i=1, \ldots , L\), be the representation of \(Q_i\) in the binary system as an infinite \(\{0,1\}\) word, without the initial 0. (If there are two such representations, the one with a finite number of ones is used.) That is, \(1/2 = 100000 \ldots \), \(1/3 = 010101 \ldots \). The codeword \({\hat{\lambda }}(a_i)\) for symbol \(a_i\) is chosen to be the first \(\lceil \log (1/p(a_i)) \rceil \) binary digits of \(\hat{Q_i}\), \(i=1, \ldots , L\). It is clear that

$$\begin{aligned} |{\hat{\lambda }}(a_i)| = \lceil \log (1/p(a_i) )\rceil \,. \end{aligned}$$
(5)

For example, let \(A = \{a_1, a_2, a_3\}\) and \(p(a_1) = 13/16, p(a_2) = 1/8, p(a_3) = 1/16\). Then \({\hat{\lambda }}(a_1) = 0,\) \({\hat{\lambda }}(a_2) = 110, \) \({\hat{\lambda }}(a_3) = 1111\). Clearly, these codewords can be made shorter as follows: \(\lambda (a_1) = 0,\) \(\lambda (a_2) = 10\), \(\lambda (a_3) = 11.\) This procedure for removing extra digits can be described using binary trees. It is known that the Shannon code can be represented as a binary tree whose branches correspond to codewords. In this tree, the left child is marked with 0 and the right child with 1. If some node has only one child, this node is contracted and the corresponding digit is removed from the corresponding codewords. We denote the obtained code by \(\lambda _{Sh}\) and derive from (5) the following:

$$\begin{aligned} | \lambda _{Sh}(a_i)| \le \lceil \log (1/p(a_i) ) \rceil \, \le \log (1/p(a_i) ) +1. \end{aligned}$$
(6)

Also, it is known that the set of codewords \(\lambda _{Sh}(a_1), \ldots , \lambda _{Sh}(a_L)\) is prefix-free. (Recall that, by definition, a set of words U is prefix-free if for any \(u,v \in U\) neither u is a prefix of v nor v is a prefix of u.)
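The construction can be illustrated by the following sketch (a minimal illustration, not the authors' implementation; it uses floating-point arithmetic and is therefore suitable only for small examples):

```python
from math import ceil, log2

def shannon_code(probs):
    """Shannon codewords for probabilities given in non-increasing order.
    Codeword i has length ceil(log(1/p_i)) and consists of the first binary
    digits of Q_i = p_1 + ... + p_{i-1}."""
    codewords = []
    q = 0.0                            # cumulative probability Q_i
    for p in probs:
        length = ceil(log2(1 / p))     # |codeword| = ceil(log(1/p(a_i)))
        word, frac = "", q
        for _ in range(length):        # first `length` digits of the binary expansion of Q_i
            frac *= 2
            bit = int(frac)
            word += str(bit)
            frac -= bit
        codewords.append(word)
        q += p
    return codewords

# Example from the text: p = (13/16, 1/8, 1/16) gives '0', '110', '1111'.
print(shannon_code([13/16, 1/8, 1/16]))
```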

Note that, for any prefix-free code \(\lambda '\) and any sequence \(x_1 x_2 \ldots x_n\) from A, \( n\ge 1\), the encoded sequence \(\lambda '(x_1) \lambda '(x_2) \ldots \lambda '(x_n) \) can be decoded to \(x_1 x_2 \ldots x_n\) without errors. Such a code \( \lambda ' \) is called a lossless code. Hence, any prefix-free code is a lossless one.

Note that the “initial” code \({\hat{\lambda }}\) has the same properties as the modified code \(\lambda _{Sh}\): it is prefix-free and (6) is valid. (That is why we do not describe the transformation of \({\hat{\lambda }}\) to \(\lambda _{Sh}\) in detail and do not estimate its complexity.)

4.1.2 Trimmed codes

Let \(\lambda \) be a lossless code for elements from A. Consider the following probability distribution: \(p(a_1) =1/2, p(a_2) = 1/4, \ldots , p(a_{L-1}) = p(a_L) = 2^{-(L-1)}\). From the description of the Shannon code we can see that \(|\lambda _{Sh}(a_L)| = L-1\).

In applications, the complexity of the cipher will largely depend on the lengths of the codewords. Thus, it will be convenient to use codes for which the codeword length of any letter does not exceed \( \lceil \log L \rceil + 1 \) for any probability distribution (instead of \( L-1 \) as in the previous example). It is also worth noting, as will be shown later, that one extra bit of codeword length can add at most one extra bit to the length of the encryption key. We call such codes trimmed and define one of them as follows: if \(\lambda \) is a code, then

$$\begin{aligned} \lambda ^{tr}(a_i) = {\left\{ \begin{array}{ll} 0\, \lambda (a_i) & \quad \text {if } |\lambda (a_i) |\le \lceil \log L \rceil \\ 1\, bin_{\lceil \log L \rceil }(i) & \quad \text {if } |\lambda (a_i) | > \lceil \log L \rceil \,, \end{array}\right. } \end{aligned}$$
(7)

where \( bin_{\lceil \log L \rceil }(i)\) is the binary representation of i whose length is \(\lceil \log L \rceil \) (for example, \(bin_3(3) = 011\)). We see that the maximal codeword length is not greater than \( \lceil \log L \rceil + 1 \). Also, note that for any prefix-free code the maximal codeword length is not less than \(\lceil \log L \rceil \).

Let us explain how to decode. First, the decoder reads the first binary letter. If it is 0, the decoder uses the codeword of the code \(\lambda \) that follows in order to find the encoded letter. If the first letter is 1, the next \(\lceil \log L \rceil \) letters contain the binary representation of i, i.e. the letter is \(a_i\).

If the trimmed code is built based on the Shannon code, from (7) and (6) we obtain

$$\begin{aligned} | \lambda ^{tr}_{Sh}(a_i)| \le \lceil \log (1/p(a_i) ) \rceil +1 \, < \log (1/p(a_i) ) +2. \end{aligned}$$
(8)
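A possible sketch of the trimmed encoding (7) and its decoding (illustrative only; letters are indexed \(1, \ldots , L\), codewords are bit strings, and the codebook of the original prefix-free code is assumed to be available):

```python
from math import ceil, log2

def trim_encode(codeword, i, L):
    """Trimmed codeword for the letter a_i, following (7)."""
    b = ceil(log2(L))
    if len(codeword) <= b:
        return "0" + codeword                 # short codeword: keep it, flag 0
    return "1" + format(i, "b").zfill(b)      # long codeword: store the index, flag 1

def trim_decode(bits, codebook, L):
    """Decode one trimmed codeword from the start of `bits`.
    `codebook` maps each original prefix-free codeword to its letter index i.
    Returns (i, number of bits consumed)."""
    b = ceil(log2(L))
    if bits[0] == "1":
        return int(bits[1:1 + b], 2), 1 + b
    for w, i in codebook.items():             # flag 0: match the original codeword
        if bits[1:1 + len(w)] == w:
            return i, 1 + len(w)
    raise ValueError("no codeword matches")

# Example: bin_3(3) = 011, so a long codeword for a_3 with L = 5 becomes '1011'.
print(trim_encode("110010", 3, 5))
```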

4.2 Randomised prefix-free codes

Let \(\lambda \) be a prefix-free code for the alphabet A and

$$\begin{aligned} n^* = \max _{i=1, \ldots ,L} |\lambda (a_i)| \,. \end{aligned}$$

The randomised code \(\phi _\lambda \), mapping letters from the alphabet A to the set \(\{0,1\}^{n^*}\), is defined as follows.

$$\begin{aligned} \phi _\lambda (a_i) = \lambda (a_i) \, r^i_{|\lambda (a_i)|+1} r^i_{|\lambda (a_i)| + 2} \ldots r^i_{n^*} \,, \end{aligned}$$
(9)

where \(r^i_{|\lambda (a_i)|+1}, r^i_{|\lambda (a_i)| + 2}, \ldots ,r^i_{n^*}\) are uniformly distributed and independent random bits (for all i). Let us define a probability distribution \(\pi _\lambda \) on \(\{0,1\}^{n^*}\) as follows:

$$\begin{aligned} \pi _\lambda (y_1y_2 \ldots y_{n^*}) = p(a_i)\, 2^{- (n^* - |\lambda (a_i) | )} \quad \text {if} \quad y_1 y_2 \ldots y_{|\lambda (a_i)|} = \lambda (a_i). \end{aligned}$$
(10)

If no \(\lambda (a_i) \) is a prefix of \(y=y_1 \ldots y_{n^*}\), then \(\pi _\lambda (y) = 0\).

Claim 1

\(h_{min}(\pi _\lambda ) = n^* - \max _{i=1, \ldots ,L} (|\lambda (a_i) | - \log (1/p(a_i) ))\). In particular,

$$\begin{aligned} h_{min}(\pi _{\lambda _{Sh}})> n^* -1,\,\, h_{min}(\pi _{\lambda ^{tr}_{Sh}}) > n^* -2 . \end{aligned}$$
(11)

Here the first equation follows from the definition of the min-entropy and (10), whereas (11) follows from (6) and (8).
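A sketch of the randomised map (9) and its inverse (illustrative only; the codebook of the prefix-free code \(\lambda \) is assumed to be given as a dictionary from codewords to letters):

```python
import secrets

def randomise(codeword, n_star):
    """phi_lambda in (9): pad a prefix-free codeword with independent,
    uniformly distributed random bits up to the fixed length n*."""
    pad = "".join(str(secrets.randbits(1)) for _ in range(n_star - len(codeword)))
    return codeword + pad

def derandomise(word, codebook):
    """phi^{-1}: strip the random tail by matching the unique codeword prefix."""
    for w, letter in codebook.items():
        if word.startswith(w):
            return letter
    raise ValueError("no codeword is a prefix")

# Example with the code lambda(a1)=0, lambda(a2)=10, lambda(a3)=11 from Sect. 4.1.1, n* = 2.
codebook = {"0": "a1", "10": "a2", "11": "a3"}
padded = randomise("0", 2)           # '00' or '01', each with probability 1/2
print(padded, derandomise(padded, codebook))
```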

4.3 The cipher for known statistics

Now we can apply the cipher from Section 3 (see Message encryption) with \(\phi = \phi _{\lambda ^{tr}_{Sh}}\) and \(\pi = \pi _{\lambda ^{tr}_{Sh}} \). From (11) we can see that \(\Delta = 2\), and applying Theorems 3 and 4 and this estimate we obtain

Claim 2

Let a set of messages from \( \{0,1\}^n\), \(n >0\), obey a known probability distribution p and let \(\epsilon >0\). Let the cipher \( cipher_{c \& r}\) be applied

  (i) with a secret key of length \(4 + 2 \log (1/\epsilon )\) bits. Then this cipher is \( \epsilon \)-entropically secure.

  (ii) If the key length equals \(k = 8 + 2 \log (1/\epsilon )\) bits, the cipher is \(\epsilon \)-indistinguishable.

Comment In this section, we described the Shannon code, for which the letters of the alphabet must be arranged in descending order of probabilities. Sometimes this can be a time-consuming operation. In such a case, one can use the Gilbert–Moore code [9], which can be applied to unordered probabilities, but its codewords are one bit longer than those of the Shannon code, i.e. the codeword length of \(a_i\) can be \(\log (1/p(a_i )) +2\). For this code, we can also use a trimmed version. In both cases, the use of the Gilbert–Moore code may add one extra bit to the key length.

5 Messages generated by Markov chains with unknown statistics

As in the previous part, we perform encryption in three steps: compress, randomise, and apply \(cipher_{rw-ds}\). For compression, we apply the so-called universal codes, which are used for unknown statistics.

5.1 Universal coding

First, we consider the simplest case where the messages belong to \( \{0,1\}^n\), \(n\ge 1\), and the letters are generated by some i.i.d. source \(\mu \) whose probabilities \(\mu (0)\), \(\mu (1)\) are unknown. The goal is to build a lossless code which “compresses” n-letter sequences in such a way that the average length (per letter) of the compressed sequence is close to the Shannon entropy \(h(\mu )\), which is the lower limit of the codeword length. (A lossless code is one for which the encoded messages can be decoded without errors, and \(h(\mu ) = - (\mu (0)\log \mu (0) + (1-\mu (0)) \log (1-\mu (0)))\).)

The first universal code was invented by Fitingof [8], and we use this code as a part of the suggested entropically secure cipher. In order to describe this code, consider any word \(v\in \{0,1\}^n\), denote by \(\nu \) the number of ones in v, and let \(S_\nu \) be the set of n-length words with \(\nu \) ones. Fitingof proposed to encode the word v by two subwords u (prefix) and w (suffix), where u is the binary representation of the integer \(\nu \) and w is the index of the word v in the subset \(S_\nu \). It is assumed that the words in \(S_\nu \) are numbered from 0 to \(|S_\nu | -1\) (say, lexicographically) and the lengths of u and w are equal to \(\lceil \log (n+1) \rceil \) and \(\lceil \log |S_\nu | \rceil \), respectively. For example, for \(n= 3\), \(v = 100\) we obtain \(\nu = 1, u = 01, w= 10\). Clearly, this code is prefix-free.

If we denote the Fitingof code by \(code_F\) we obtain from its description

$$\begin{aligned} |code_F(v) | = \lceil \log (n+1) \rceil + \lceil \log |S_\nu | \rceil \, . \end{aligned}$$
(12)

For this code the ability to compress messages is based on the simple observation that the probabilities of all messages from \(S_\nu \) are equal for any distribution \(\mu \) and, hence, \(\mu (v )\le 1/|S_\nu |\) for any \(\mu \) and any word \(v \in S_\nu \). From this inequality and (12) we obtain

$$\begin{aligned} |code_F(v) | \le \log (n+1) + 2 + \log (1/\mu (v) ) \,. \end{aligned}$$
(13)

(Let us explain the name “universal code”. Clearly, the average code length \(E_\mu (|code_F|)\) is not greater than \( \log (n+1) + 2 + n h(\mu )\) and, hence, the average length per letter \(E_\mu (|code_F|)/n\) is not greater than \( h(\mu ) + (\log (n+1) + 2)/n\). We can see that \(E_\mu (|code_F|)/n \rightarrow h(\mu )\) as \(n \rightarrow \infty \). So, one code compresses sequences generated by any \(\mu \), that is, the code is universal.)

The Fitingof code described above generalises to i.i.d. processes with any finite alphabet \(\Lambda \), as well as to Markov chains with memory (connectivity) m, based on the same method as for the binary i.i.d. case [13]. Namely, the set of all n-letter words is divided into subsets of equiprobable words, and the code of any word consists of a prefix and a suffix, where the prefix contains the number of the subset of equiprobable words which contains the encoded word, and the suffix is the index of the word within this subset. It can be shown that the number of subsets of equiprobable words is bounded above by \((n+1)^{(|\Lambda |-1) |\Lambda |^{m}}\) [8, 13], and similarly to (13) we can deduce that

$$\begin{aligned} |code_F(v) | \le (|\Lambda |-1) |\Lambda |^{m} \log (n+1) + 2 + \log (1/\mu (v) ) \,. \end{aligned}$$
(14)

It is important to note that there exists an algorithm for finding the codewords which is based on a method of fast calculation of indices within \(S_\nu \), see [16]. The complexity of this algorithm is \(O(n \log ^3 n \log \log n)\).
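For the binary i.i.d. case, the following sketch computes the Fitingof codeword of a binary word (illustrative only; it assumes lexicographic ordering within \(S_\nu \) and uses a straightforward index computation rather than the fast algorithm of [16]):

```python
from math import comb, ceil, log2

def fitingof_encode(v):
    """Fitingof code of a binary word v given as a string of '0'/'1'.
    Prefix: ceil(log(n+1)) bits holding the number of ones nu.
    Suffix: ceil(log|S_nu|) bits holding the lexicographic index of v in S_nu."""
    n, nu = len(v), v.count("1")
    index, ones_left = 0, nu
    for pos, bit in enumerate(v):
        if bit == "1":
            # words with '0' at this position (and the same prefix so far) come first
            index += comb(n - pos - 1, ones_left)
            ones_left -= 1
    prefix_len = ceil(log2(n + 1))
    suffix_len = ceil(log2(comb(n, nu))) if comb(n, nu) > 1 else 0
    prefix = format(nu, "b").zfill(prefix_len)
    suffix = format(index, "b").zfill(suffix_len) if suffix_len else ""
    return prefix + suffix

# Example from the text: n = 3, v = '100' gives nu = 1, prefix '01', suffix '10'.
print(fitingof_encode("100"))  # '0110'
```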

5.2 Randomisation

As in the case of known statistics, we randomise the compressed messages to construct the random map \(\phi \) and the map \(\phi ^{-1}\) for which \(\phi ^{-1}(\phi (u)) = u\) for any message u (see the preliminary stage in Section 3). This method is similar to that of Section 4.2.

Let \(n^*= \max _{w \in \Lambda ^n} \{ |code_F(w)| \}\,.\) The randomised code \(\phi _F\), mapping elements from \(\Lambda ^n\) to the set \(\{0,1\}^{n^*}\), is defined as follows:

$$\begin{aligned} \phi _F(w) = code_F(w) \, r^i_{|code_F(w)|+1} r^i_{|code_F(w)| + 2} \ldots r^i_{n^*} \,, \end{aligned}$$
(15)

where \(r^i_{|code_F(w)|+1}, r^i_{|code_F(w)| + 2}, \ldots ,r^i_{n^*}\) are uniformly distributed and independent random bits (for all i).

Let us define the probability distribution \(\pi _{F,\mu }\) on \(\{0,1\}^{n^*}\) as follows:

$$\begin{aligned} \pi _{F,\mu }(y_1y_2 \ldots y_ {n^*} )= \mu (v) 2^{- (n^* - |code_F(v) | )} \,\, \text {if} \,\, y_1 y_2 \ldots y_{|code_F(v)|} = code_F(v). \end{aligned}$$
(16)

If no \(code_F(v) \), \(v \in \Lambda ^n\), is a prefix of \(y=y_1 \ldots y_{n^*}\), then \(\pi _{F,\mu }(y) = 0\).

Let us estimate the min-entropy of the distribution \(\pi _{F,\mu }\). From (16) and the definition of the min-entropy we obtain the following:

$$\begin{aligned} h_{min}(\pi _{F,\mu }) = n^* - \max _{u \in \Lambda ^n} (|code_F(u) | - \log (1/\mu (u) )) \,. \end{aligned}$$
(17)

Now we consider the Fitingof code applied to n-letter sequences generated by a Markov chain \(\mu \) of memory m over some alphabet \(\Lambda \). The Fitingof code is prefix-free and, hence, from (14) and (17) we obtain the following

Claim 3

For any distribution \( \mu \)

$$\begin{aligned} h_{min}(\pi _{F, \mu }) > n^* - (|\Lambda |^m (|\Lambda | - 1) \log n \, + \,2)\,. \end{aligned}$$
(18)

In particular, for an i.i.d. source with binary alphabet

$$\begin{aligned} h_{min}(\pi _{F,\mu }) > n^* - (\log n +2)\,. \end{aligned}$$

5.3 Description of the cipher for Markov chains

Now we can apply the cipher \( cipher_{c \& r}\) from Section 3 with \(\phi = \phi _{F}\) and \(\pi = \pi _{F,\mu } \). (Recall that one and the same cipher is used for all distributions with the same min-entropy bound and, hence, it does not depend on the unknown distribution \(\mu \).)

From (18) we can see that \(\Delta = |\Lambda |^m (|\Lambda | - 1) \log n \, + \, 2\), and applying Theorems 3 and 4 and this estimate we obtain

Claim 4

Let a set of messages from \( \Lambda ^n\), \(n >0\), be generated by a Markov chain of memory \(m\), \(m \ge 0\), and let \(\epsilon >0\). Let the cipher \( cipher_{c \& r}\) be applied.

  (i) If the key length equals \(k = |\Lambda |^m (|\Lambda | - 1) \log n \, + \,5 \, + 2 \log (1/\epsilon )\) bits, then the cipher is \(\epsilon \)-entropically secure.

  (ii) If the key length equals \(k = |\Lambda |^m (|\Lambda | - 1) \log n \, + \,9 \, + \,2 \log (1/\epsilon ) \) bits, the cipher is \(\epsilon \)-indistinguishable.
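As a numerical illustration of Claim 4(i) with hypothetical parameters (not taken from the text), consider a binary Markov chain of memory \(m = 1\), messages of length \(n = 10^6\) and leakage \(\epsilon = 2^{-40}\):

```python
from math import log2

def key_length_claim4(alphabet_size, m, n, eps):
    """Key length from Claim 4(i): |Lambda|^m (|Lambda|-1) log n + 5 + 2 log(1/eps)."""
    return alphabet_size ** m * (alphabet_size - 1) * log2(n) + 5 + 2 * log2(1 / eps)

# About 125 key bits suffice for a message of 10**6 bits.
print(key_length_claim4(2, 1, 10 ** 6, 2 ** -40))
```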