1 Introduction

Digital steganography is a fairly new field of modern computer science concerned with camouflaging the presence of secret data in legitimate communications. In the general setting, a sender, often called Alice or the steganographer, wishes to send a hidden message to a recipient via a public channel that is completely monitored by an adversary called Warden or steganalyst. Taking a “typical” document, Alice tries to embed a secret message in it such that a steganalyst cannot determine whether the secret message is present or not. In particular, Warden should have little chance to distinguish original documents, called coverdocuments, from altered ones, called stegodocuments. This implies in general that the distributions of coverdocuments and stegodocuments have to be fairly close.

A crucial component when modeling steganography and steganalysis is the knowledge of the parties involved about coverdocuments. Considering different levels of knowledge, various models have been defined and studied. For example, if both the steganographer and the steganalyst have perfect knowledge about the distribution of coverdocuments and these documents satisfy certain conditions, secure steganography can be modeled and investigated by means of information and coding theory, whereas steganalysis can be done by applying statistical detection theory. But, though well understood, such models are quite artificial and far from reality (for more discussion, see [9]). The other extreme is to assume that the steganographer a priori has no knowledge whatsoever about typical documents and can only get information using a sampling oracle. Assuming the existence of secure cryptographic one-way functions, provably secure steganography is possible even if the steganalyst has full knowledge [7], but any secure steganographic system then requires an exponential number of samples with respect to the message length [4]. Thus, steganography becomes highly inefficient.

To be closer to the real world, newer approaches to steganalysis and steganography assume some reasonable partial knowledge about the type of covertext channel. Then steganalysis can be formulated as a binary classification problem and examined using methods from machine learning. This line of research has recently received much attention (see e.g. [6, 10, 17]). However, learning approaches to steganography have not been studied systematically so far.

As in real applications of steganography, we assume that Alice knows that the coverdocument distribution belongs to some class of distributions – she can choose the medium to embed into. Beyond that, she can only use a sampling oracle to get information about the actual coverdocument distribution. The steganographic encoding can then be stated as a two-stage problem (for a formal definition of steganography see Sect. 4):

  1. Algorithmic learning of the concrete distribution of coverdocuments, and

  2. Generating a stegodocument that encodes a given piece of the message.

Hence, the essential difficulties in constructing efficient algorithms arise for two reasons. First, a standard PAC approach to modeling this situation typically fails because of a fundamental difference: only positive samples are available. Second, algorithms for the random generation of combinatorial objects from a given (typically uniform) distribution, see e.g. [8], cannot be applied directly since the generated objects have to encode given messages.

Most recently, Liśkiewicz et al. [12] obtained several promising results on generating stegodocuments. They considered three families of coverdocument channels described by monomials, by decision trees (DTs), and by DNF formulas, respectively, assuming a uniform distribution of documents. The learning complexity of the corresponding concept classes in the general case ranges from low up to high (assuming \(RP \not = NP\)). For these families of channels, efficient generic algorithms have been constructed that, for a given description of the coverdocuments, suitably manipulate the documents to embed secret messages, even against a steganalyst with full knowledge. This solves Problem (2) above and allows secure steganography assuming the coverdocument distributions can be learned properly, i.e. such that the learning algorithm outputs a monomial, a DT, or a DNF expression, respectively, as its hypothesis when learning from positive data only.

Notice the importance of proper learning here. For example, it is well known that k-term DNF formulas can be learned efficiently from positive samples with respect to k-CNF formulas, i.e. such that the learning algorithm outputs a k-CNF formula for the concept represented by an unknown k-term DNF. However, such a k-CNF representation of coverdocuments is useless for stegodocument generation, because one would have to find satisfying assignments for k-CNF formulas, which cannot be done efficiently in general. Unlike for monomials and k-CNF formulas, the problem whether DTs and DNF formulas can be learned properly from positive samples in an efficient way remains open even for simple probability distributions like the uniform one. This paper gives an affirmative answer to this question for k-term DNFs.

Learnability of \(\varvec{k}\)-term DNF: Known Results. For the notion of learnability, we loosely follow the PAC model. In the standard setting (i.e. with positive and negative samples) it is not feasible to learn k-term DNF formulas properly in a distribution-free sense for fixed \(k\ge 2\) unless \(RP=NP\). Learning k-term DNF concepts for \(k\ge 4\) remains infeasible even if f(k)-term DNF hypotheses with \(f(k)\le 2k-3\) are allowed [14]. For unrestricted DNF formulas, learning with respect to DNF hypotheses is infeasible even if the number of terms in the hypotheses is arbitrarily large [1]. Assuming that samples are drawn from specific distributions over the learning domain, but still allowing positive and negative samples, the situation changes drastically. Flammini et al. [5] have shown that k-term DNF formulas are properly learnable in polynomial time using positive and negative samples drawn from q-bounded distributions (distributions for which the ratio D(x)/D(y) of the probabilities of elements in the support does not exceed q, for some number \(q\ge 1\)). This class is a natural generalization of the uniform distribution.

If the number of terms of the DNF may grow, from [19] we know that n-term DNF formulas over the uniform distribution can be learned from a polynomial number of samples in quasi-polynomial time. However, the hypothesis space has to be extended to \((n\cdot t)\)-term DNF, with t depending on the sample complexity.

Concerning steganographic applications, one has to learn DNF formulas properly and from positive samples only. The next serious complication is the need to exclude false positives in order to achieve steganographic security. In the distribution-free setting, this learning task can be mastered efficiently for 1-term DNF (monomials) [18]. But it becomes infeasible for k-term DNF with \(k\ge 2\), as well as for log-term and unrestricted DNF formulas [13]. There is a positive result for monotone DNF (MDNF) formulas over the uniform distribution: it is possible to learn log-term MDNF formulas from positive samples only [15]. The class of k-term MDNFs can even be learned over q-bounded distributions from positive samples [11, 16]. Also, a method for learning 2-term DNF over q-bounded distributions from positive samples is known [5]. Most recently, De et al. [3] have shown that DNF formulas have efficient learning algorithms from uniformly distributed positive samples, but instead of a k-term DNF hypothesis the learner outputs a sampler. This model seems unsuitable for embedding secret messages efficiently, because it is unknown how coverdocuments can be modified to securely embed a given message without knowing an adequate k-term DNF hypothesis.

Our Contribution. The main result of this paper is an efficient learner without false positives for k-term DNF formulas from positive samples, with hypothesis space identical to the concept class, for arbitrary fixed k over q-bounded distributions. The major challenge occurs already for the uniform distribution: false positives cannot be tolerated at all. Our solution works in two phases: the learner switches from the k-term DNF to a k-CNF representation in phase 1 and then back in phase 2. In more detail, in the first phase k-term DNF formulas are learned in the form of k-CNF formulas, with very high accuracy and without false positives, using a first sequence of positive samples.

In phase 2, we construct a set of maximal monomials that should cover most of the k-CNF formula generated. The number of candidates for these monomials can be extremely large. Thus, we have to design a mechanism to select a suitable subset. This subset will still contain many more than k monomials. Finally, we apply tests with a second sequence of positive samples to select a subset of size at most k as the final hypothesis.

As a negative result, we show that it is impossible to efficiently learn unrestricted DNF formulas without false positives: for q-bounded distributions, learning n-term DNF formulas requires an exponential number of positive samples, regardless of the hypothesis space. An overview of the current state of knowledge concerning DNF learning is given in Table 1.

Table 1. Positive and negative (unless \(RP=NP\)) results for learning DNF formulas from positive samples over several distributions in polynomial time.

2 Preliminaries

Let us start with some basic definitions. In the following, n will always denote the number of variables and \(\mathcal X = \{0,1\}^n\) the set of binary strings of length n. For a distribution D over \(\mathcal X\) let \(\texttt{sp}(D) := \{ x \in \mathcal X \mid D(x) > 0\}\) denote the support of D. For \(q \ge 1\) such a distribution is called q-bounded if \(\max \{ D(x) \mid x \in \texttt{sp}(D) \} \le q \cdot \min \{ D(x) \mid x \in \texttt{sp}(D) \}\).
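To make the definition concrete, here is a minimal Python check (our own illustration; representing D as a dictionary from documents to probabilities is an assumption, not part of the paper):

```python
def is_q_bounded(D, q):
    """Check the q-bounded condition for a distribution D, given as a dict
    mapping the documents of sp(D) to their (positive) probabilities."""
    probs = [p for p in D.values() if p > 0]
    return max(probs) <= q * min(probs)

# The uniform distribution on any support is 1-bounded:
assert is_q_bounded({"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}, q=1)
assert not is_q_bounded({"00": 0.7, "01": 0.3}, q=2)  # ratio 7/3 exceeds 2
```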

For a Boolean formula \(\varphi \) let \(\texttt {sat}(\varphi ) := \{x\in \mathcal X\mid \varphi (x)\}\) denote the set of assignments that satisfy \(\varphi \); \(\texttt {sat}(\varphi )\) will also be called the support of \(\varphi \). A k-CNF formula \(\psi \) is a conjunction of clauses each containing at most k literals. We may assume that \(\psi \) does not contain tautological clauses (having a variable and its negation simultaneously). A k-term DNF formula \(\varphi \) is a disjunction of at most k monomials. \(\varphi \) is called non-redundant if it contains no monomial M such that removing M from \(\varphi \) leaves \(\texttt {sat}(\varphi )\) unchanged; in particular, there are no identical monomials (having the same set of literals) and no trivial monomials with empty support (containing a variable and its negation). A monomial M is called shorter than a monomial \(M'\) if it consists of fewer literals than \(M'\); we call M larger than \(M'\) if \(|\texttt {sat}(M)|>|\texttt {sat}(M')|\). In this paper we consider the family of concept classes \(\{ \texttt {sat}(\varphi ) \subseteq \mathcal X\mid \varphi \text { is a }k\text {-term DNF formula}\}\) and proper learning of these classes from positive examples, i.e. we require that a learner seeing only satisfying assignments outputs a k-term DNF formula.
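To make these objects concrete, we accompany the constructions below with short Python sketches (our own illustrations, not part of the paper). They all use the following representation, which is our own convention: a literal is a signed integer \(\pm i\) for the variable \(x_i\), a monomial or clause is a frozenset of literals, and \(\texttt{sat}\)-sets are computed by brute force, so the snippets are meant for small n only.

```python
from itertools import product

def sat_lit(x, lit):
    """Does assignment x (a tuple of bools) satisfy the literal lit,
    where +i / -i stand for x_i / NOT x_i (variables numbered 1..n)?"""
    return x[abs(lit) - 1] == (lit > 0)

def sat_dnf(phi, n):
    """sat(phi) for a DNF phi (a list of frozenset monomials): all
    assignments satisfying at least one monomial."""
    return {x for x in product((False, True), repeat=n)
            if any(all(sat_lit(x, l) for l in M) for M in phi)}

def sat_cnf(psi, n):
    """sat(psi) for a CNF psi (a set of frozenset clauses): all
    assignments satisfying every clause."""
    return {x for x in product((False, True), repeat=n)
            if all(any(sat_lit(x, l) for l in K) for K in psi)}

# Example: phi = (x1 AND x2) OR (NOT x3), a non-redundant 2-term DNF, n = 3.
phi = [frozenset({1, 2}), frozenset({-3})]
assert len(sat_dnf(phi, 3)) == 5   # 2 + 4 - 1 by inclusion-exclusion
```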

The reader is assumed to be familiar with the standard concepts of PAC theory (see e.g. [18]). Below we present only the definition of learnability of a concept C from positive examples. This can be modeled by the condition that the underlying distribution D on \(\mathcal X\) fulfills \(\texttt {sp}(D)=C\). Allowing false positives would make the problem trivial, because the hypothesis \(H = \mathcal X\) has error \(D(C \; \triangle \; H) = 0\): all its mistakes lie outside the support of D. We therefore define: \(\mathcal{A}\) learns \(\mathcal{C}\) from positive samples without false positives if for every pair (C, D) of a concept \(C \in \mathcal{C}\) and a distribution \(D \in \mathcal{D}\) with \(\texttt {sp}(D)=C\), its hypothesis H satisfies \(H \subseteq C\) and \(\Pr [D(C \setminus H)\ge \varepsilon ] \le \delta \). A concept class \(\mathcal{C}\) with a set \(\mathcal{D}\) of q-bounded distributions can be learned efficiently if a learner exists with running time bounded by a polynomial in \((1/\varepsilon ,1/\delta ,n,q)\).

3 Learning \(\varvec{k}\)-term DNF from Positive Samples

Flammini et al. [5] have presented a method for learning a k-term DNF formula \(\varphi \) over q-bounded distributions. In a first phase, candidate monomials are generated from positive samples in such a way that all monomials of \(\varphi \) having enough satisfying assignments actually occur. But in general more monomials are generated, and some of them may have assignments that do not belong to \(\texttt {sat}(\varphi )\). Therefore, in the second phase, combinations of at most k candidate monomials are tested against a set of positive and negative samples. If such a combination fulfills a specific error bound, it becomes the output. It has been shown that with high probability this yields an approximately correct hypothesis.

In the following we develop a generalization of this method that is capable of learning k-term DNF formulas from positive samples only. The learner gets only positive samples and is not allowed to generate false positives.

Computing Maximal Monomials from CNF-Formulas. It is known how to learn a k-term DNF formula \(\varphi \) without false positives by using k-CNF formulas as hypothesis space. In this case \(((2n)^{k+1} - \ln \delta ) / \varepsilon \) positive samples suffice [2, 14, 18]. The learner starts with the conjunction of all possible non-tautological clauses of length at most k, of which there are at most \((2n)^{k+1}\). Then every clause falsified by some positive sample is deleted.
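As a hedged sketch of this standard elimination learner (our own code following the description above, reusing sat_lit from the Preliminaries sketch):

```python
from itertools import combinations

def all_short_clauses(n, k):
    """All non-tautological clauses with at most k literals over x_1..x_n;
    their number is bounded by (2n)^(k+1)."""
    lits = [l for v in range(1, n + 1) for l in (v, -v)]
    return {frozenset(K) for size in range(1, k + 1)
            for K in combinations(lits, size)
            if not any(-l in K for l in K)}

def learn_kcnf(samples, n, k):
    """Elimination learner: start from the conjunction of all short clauses
    and delete every clause falsified by some positive sample.  Since every
    clause of a k-CNF equivalent to the target survives, the result psi
    satisfies sat(psi) ⊆ sat(target), i.e. it has no false positives."""
    psi = all_short_clauses(n, k)
    for x in samples:
        psi = {K for K in psi if any(sat_lit(x, l) for l in K)}
    return psi
```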

Our first innovation is to construct candidate monomials for \(\varphi \) by learning a k-CNF representation \(\psi \) of \(\varphi \) and extracting monomials from \(\psi \) afterwards. We choose monomials M with \(\texttt {sat}(M) \subseteq \texttt {sat}(\psi )\) that are as large as possible. In general, for \(k \ge 3\) it is NP-hard to find even a single satisfying assignment of a k-CNF formula. But here we already know a number of satisfying assignments, namely the positive samples used to create \(\psi \). For this purpose, we define a criterion for potential candidate monomials generated from \(\psi \) and a sample \(x\in \texttt {sat}(\psi )\).

Definition 1

Let \(\psi \) be a Boolean formula and \(x \in \texttt {sat}(\psi )\). A monomial M is \((\psi ,x)\) -maximal if \( x \in \texttt {sat}(M) \subseteq \texttt {sat}(\psi ) \) and there is no submonomial of M with this property (a submonomial is obtained by removing some literals from M).

Algorithm 1 given below computes such maximal monomials. It starts with the empty monomial \(M=1\) and adds literals until \(\texttt {sat}(M) \subseteq \texttt {sat}(\psi )\) holds. We may assume that no clause of \(\psi \) contains any variable more than once.

Lemma 1

For a k-CNF formula \(\psi \) and \(x\in \texttt {sat}(\psi )\) Algorithm 1 computes a \((\psi ,x)\)-maximal monomial. Its runtime is bounded by a polynomial \(p_{k}(n)\). For every \((\psi ,x)\)-maximal monomial M there exists a sequence of literals selected in line 10 such that the algorithm outputs M.

[Algorithm 1: computing a \((\psi ,x)\)-maximal monomial]
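Since the pseudocode is not reproduced here, the following Python sketch (ours) implements the greedy idea just described, reusing sat_lit from the Preliminaries sketch; the nondeterministic literal choice corresponds to line 10 of Algorithm 1:

```python
def implies_cnf(M, psi):
    """sat(M) ⊆ sat(psi): a monomial implies a CNF without tautological
    clauses iff it shares a literal with every clause (otherwise the free
    variables of M can be set so as to falsify an uncovered clause)."""
    return all(M & K for K in psi)

def maximal_monomial(psi, x):
    """Greedy sketch of Algorithm 1: grow M with literals of x until M
    implies psi, then drop redundant literals so that no proper
    submonomial of the result still implies psi."""
    M = set()
    for K in psi:
        if not (M & K):
            # x satisfies K, so a literal of K satisfied by x exists;
            # different choices here yield different maximal monomials
            M.add(next(l for l in K if sat_lit(x, l)))
    for l in sorted(M, key=abs):            # minimization pass
        if implies_cnf(M - {l}, psi):
            M.discard(l)
    return frozenset(M)
```

Since the property “M shares a literal with every clause” is monotone in M, removing single literals until none is redundant already guarantees that no proper submonomial implies \(\psi \), as Definition 1 requires.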

The learner to be defined below needs several \((\psi ,x)\)-maximal monomials, but at most \(2^k-1\) many. To get them, one could perform a depth-first search over the literals that are selected and then deleted from \(\psi \), until enough maximal monomials have been found; a naive version is sketched below. However, different choices may eventually lead to the same monomial. In order to be efficient, we need a suitable mechanism to prune the search tree. Our strategy and its analysis are quite involved; the details will therefore be presented in a full version of this paper.
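For illustration, here is the naive depth-first enumeration mentioned above (our own unpruned version, reusing implies_cnf and sat_lit from the sketches above; the paper's pruning mechanism is deliberately not implemented, so this sketch may revisit the same monomial along different branches):

```python
def maximal_monomials(psi, x, cap):
    """Collect up to `cap` distinct (psi,x)-maximal monomials by branching
    on the literal chosen for the first clause not yet implied."""
    found = set()

    def grow(M):
        if len(found) >= cap:
            return
        K = next((K for K in psi if not (M & K)), None)
        if K is None:                     # M implies psi: minimize, record
            M = set(M)
            for l in sorted(M, key=abs):
                if implies_cnf(M - {l}, psi):
                    M.discard(l)
            found.add(frozenset(M))
        else:
            for l in K:                   # branch over the literals of K
                if sat_lit(x, l):         # that x satisfies
                    grow(M | {l})

    grow(frozenset())
    return found
```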

Learning Candidate Monomials. Considering every maximal monomial for each positive sample used to learn the k-CNF formula \(\psi \), one might get a very large set of monomials. Thus, a new idea is needed to handle such a situation. To obtain a bounded number of candidates to continue with, we try to prune the set of maximal monomials without losing too many satisfying assignments. To this aim, every monomial of the unknown k-term DNF formula \(\varphi \) that has a large support should become a candidate monomial. On the other hand, monomials with a small support may be removed without losing much accuracy.

Let us start by considering the number of maximal monomials in case the k-CNF formula \(\psi \) is equivalent to the unknown k-term DNF formula \(\varphi \). In general \(\texttt {sat}(\psi )\) may cover only parts of the satisfying region of a monomial in a scattered way. Hence, there could exist many \((\psi ,x)\)-maximal monomials.

Definition 2

Let \(\varphi =M_1 \vee \dots \vee M_k\) be a non-redundant k-term DNF formula, \(x \in \texttt {sat}(\varphi )\), and \(I=\{i_1,\dots ,i_p\} \subseteq \{1,\dots ,k\}\) be a non-empty set of indices. A monomial \(M_{I,x}\) is called \((\varphi ,I,x)\)-maximal if it is \((\varphi ,x)\)-maximal and \( \texttt {sat}(M_{I,x})\ \subseteq \ \texttt {sat}(M_{i_1}\vee \dots \vee M_{i_p}) \) and after removing any \( M_{i_{j}}\) from the right side this inclusion fails.

Lemma 2

For fixed \(\varphi \), I, and x, a \((\varphi ,I,x)\)-maximal monomial \(M_{I,x}\) is unique. If \(y \in \texttt {sat}(M_{i_1}\vee \dots \vee M_{i_p})\) has a maximal monomial \(M_{I,y}\) then \(M_{I,y}=M_{I,x}\).

This implies that the number of different \((\varphi ,I,x)\)-maximal monomials over all \(x\in \texttt {sat}(\varphi )\) and nonempty \( I \subseteq \{1,\dots ,k\}\) is bounded by \(2^k-1\). Next we will derive a bound on the number of satisfying assignments for those maximal monomials that intersect potentially scattered regions of \(\varphi \).

Lemma 3

Let \(\varphi =M_1\vee \dots \vee M_k\) be a non-redundant k-term DNF formula with monomials \(M_{i}\) ordered by increasing length. For \(d\in \mathbb {N}\) let \(\varphi _{d}=M_1\vee \dots \vee M_u\) be composed of all \(M_{i}\) with \(|\texttt {sat}(M_{i})| \ge 2^{d}\). For a Boolean formula \(\chi _{d}\) with \(\texttt {sat}(\chi _{d}) \subseteq \texttt {sat}(M_{u+1}\vee M_{u+2}\vee \dots \vee M_k)\) define \(\psi _{d} := \varphi _{d} \vee \chi _{d}\), \(\mathcal{M}^{[d]} := \{ M \mid M \text{ is a } (\psi _{d},x)\text{-maximal monomial for some } x \in \texttt {sat}(\chi _{d}) \setminus \texttt {sat}(\varphi _{d}) \}\), and \(\xi _{d} := \bigvee _{M \in \mathcal{M}^{[d]}} M\). Then it holds that \(|\texttt {sat}(\xi _{d})| \le 2^{d+k-1}\).

These notions provide the foundation for the learner specified in Algorithm 2 giving the following result.

Theorem 1

For constant k, Algorithm 2 learns k-term DNF formulas without false positives over q-bounded distributions in polynomial time with respect to \((1/\varepsilon , 1/\delta , n, q)\) by drawing no more positive samples than

\( \sigma (\varepsilon ,\delta ,n,k,q) \ := \ \varepsilon ^{-1} \; q\; k\; 2^{3k+1} \left( (2n)^{k+1}+\ln (2/\delta )\right) + 48\varepsilon ^{-2} \; \ln \left( 2^{k^2+2}/{\delta }\right) . \)

[Algorithm 2: the two-phase learner for k-term DNF formulas from positive samples]
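Since Algorithm 2 is likewise not reproduced here, the following Python outline is our reconstruction of its two-phase structure from the surrounding text (not the paper's exact pseudocode; in particular the pruning rule below is a simplification). It reuses learn_kcnf, maximal_monomials, and sat_lit from the sketches above; draw(m) is assumed to return m positive samples as bool tuples.

```python
import math
from itertools import combinations

def learn_k_term_dnf(draw, n, k, eps, delta, q):
    """Two-phase outline of Algorithm 2 (a reconstruction, ours)."""
    # Phase 1: learn a k-CNF psi with accuracy eps_1 (cf. Lemma 6) and no
    # false positives; the sample sizes follow the two summands of sigma
    # in Theorem 1.
    eps1 = eps / (q * k * 2 ** (3 * k + 1))
    m1 = math.ceil(((2 * n) ** (k + 1) + math.log(2 / delta)) / eps1)
    E = draw(m1)
    psi = learn_kcnf(E, n, k)

    # Phase 2: collect maximal monomials as candidates; by Lemma 2 and the
    # proof sketch of Lemma 6 the relevant ones are among the 2^k - 1
    # shortest, which we use here as a simplified pruning rule.
    cands = set()
    for x in E:
        cands |= maximal_monomials(psi, x, cap=2 ** k - 1)
    cands = sorted(cands, key=len)[: 2 ** k - 1]

    # Every candidate M fulfills sat(M) ⊆ sat(psi) ⊆ sat(phi), so no
    # disjunction of candidates can create false positives; choose the
    # subset of at most k that covers the most fresh samples.
    T = draw(math.ceil(48 * math.log(2 ** (k * k + 2) / delta) / eps ** 2))
    coverage = lambda H: sum(
        any(all(sat_lit(x, l) for l in M) for M in H) for x in T)
    return list(max((H for r in range(1, k + 1)
                     for H in combinations(cands, r)), key=coverage))
```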

Correctness Proof. We first show a bound on how much monomials may overlap (two monomials overlap if their \(\texttt {sat}\)-regions have a nonempty intersection).

Lemma 4

Let \(\varphi =M_1\vee \dots \vee M_k\) be a non-redundant k-term DNF formula and let \(\varphi _{i}\) equal \(\varphi \) without \(M_i\). Then \(|\texttt {sat}(M_{i}) \setminus \texttt {sat}(\varphi _{i})| \ge |\texttt {sat}(M_{i})| \cdot 2^{-k+1}\).

Next, let us estimate how well a k-CNF formula \(\psi \) can reconstruct the original monomials of the unknown k-term DNF \(\varphi \).

Definition 3

Let \(g(\varphi ,q,k) := q \, 2^{k}\, |\texttt {sat}(\varphi )|\). For \(\gamma >0\) call a monomial \(M_{i}\) of \(\varphi \) \(\gamma\)-large if \(|\texttt {sat}(M_{i})| \ge \gamma \, g(\varphi ,q,k)\).

Lemma 5

Let \(\varphi =M_1\vee \dots \vee M_k\) be a k-term DNF formula with monomials \(M_{i}\) and \(\psi =K_1\wedge \dots \wedge K_p\) be a k-CNF formula with clauses \(K_{j}\) and \(\texttt {sat}(\psi )\subseteq \texttt {sat}(\varphi )\).

Let D be a q-bounded distribution with \(\texttt {sp}(D)=\texttt {sat}(\varphi )\) and let \(\gamma >0\). If \(D(\texttt {sat}(\varphi )\setminus \texttt {sat}(\psi )) < \gamma \) then for every \(\gamma \)-large \(M_{i}\) it holds \(\texttt {sat}(M_i) \subseteq \texttt {sat}(\psi )\).

Thus, if a CNF-formula \(\psi \) approximates a k-term DNF-formula \(\varphi \) quite well then every monomial of \(\varphi \) with large support is completely covered by \(\psi \). Only monomials with small support may give rise to errors in the approximation.

Now we show that the set of candidate monomials \(\mathcal{M}\) constructed by Algorithm 2 contains all large monomials.

Lemma 6

Let \(\varphi = M_1\vee \dots \vee M_k\) be a non-redundant k-term DNF formula. With probability at least \(1- \delta /2\), Algorithm 2 adds a monomial \(M'_i\) with \(\texttt {sat}(M'_i) \supseteq \texttt {sat}(M_i)\) to \(\mathcal{M}\) for every \((\varepsilon _{1}2^{2k})\)-large \(M_{i}\), where \(\varepsilon _{1} = \varepsilon \; q^{-1} \, k^{-1} \; 2^{-(3k+1)}\).

Proof sketch

Let \(M_{i}\) be an \((\varepsilon _{1}2^{2k})\)-large monomial. Assume that the algorithm has learned a k-CNF formula \(\psi \) with \(D(\texttt {sat}(\varphi ) \setminus \texttt {sat}(\psi )) \le \varepsilon _{1}\), which happens with probability at least \(1-\delta /2\). Then, using Lemmas 3, 4, and 5 one can show that the sample sequence E contains at least one element \(e_j\in \texttt {sat}(M_{i})\) such that no \((\varphi ,e_j)\)-maximal monomial intersects potential scattered regions of \(\varphi \). Hence the number of \((\psi ,e_j)\)-maximal monomials can be bounded by Lemma 2, and some \(M'_i\) with \(\texttt {sat}(M'_{i}) \supseteq \texttt {sat}(M_i)\) will be added to \(\mathcal{M}\). All maximal monomials that intersect scattered regions have fewer satisfying assignments than \(M_{i}\) by Lemmas 3 and 5. Thus \(M'_{i}\) is among the \(2^{k}-1\) shortest monomials in \(\mathcal{M}\) by Lemma 2.    \(\square \)

From Lemma 6 one can conclude the correctness of Algorithm 2. The learning algorithm can be made applicable even if q is unknown (see [5]).

A Negative Result. Verbeurgt [19] has developed a method for learning \(\mathrm {poly}(n)\)-term DNF over the uniform distribution from a polynomial number of positive and negative samples with a quasi-polynomial running time. In contrast, we can show (proof omitted):

Theorem 2

For every q-bounded distribution D and every hypothesis space \(\mathcal H\), learning n-term DNF formulas without false positives requires an exponential number of positive samples drawn according to D for \(\varepsilon <1/q\).

4 Learning Documents for Steganography

We start this section with a short review of basic definitions similar to [7]. Let \(\mathcal X\) denote the set of cover- or stegodocuments. A channel \(\mathcal{C}\) is a mapping with domain \(\mathcal X^{*}\) that for every sequence h of documents, called a history, defines a probability distribution \(\mathcal{C}_{h}\) on \(\mathcal X\).

A sampling oracle for \(\mathcal{C}\) takes a history h as input and returns a random element according to \(\mathcal{C}_{h}\). In order to generate a typical sequence of coverdocuments \(c_{1}, c_{2}, \ldots \) of \(\mathcal{C}\) one starts with the empty history and asks the sampling oracle for a first element \(c_{1}\), then with history \(h_{1}= c_{1}\) a second element \(c_{2}\) is requested, and so on. \(\mathcal{C}\) is called supuniform if for every h, \(\mathcal{C}_{h}\) is the uniform distribution on \(\texttt {sp}(\mathcal{C}_{h})\).
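In code, such a channel is simply a history-indexed family of distributions. A minimal interface sketch (names and structure are ours, purely illustrative):

```python
import random

class SupuniformChannel:
    """A channel C mapping a history h to a distribution C_h; here C_h is
    uniform on its support sp(C_h), i.e. the channel is supuniform."""
    def __init__(self, support_of):
        self.support_of = support_of   # maps a history to the list sp(C_h)

    def sample(self, history):
        """The sampling oracle: one random document drawn according to C_h."""
        return random.choice(self.support_of(tuple(history)))

def typical_sequence(channel, length):
    """Generate c_1, c_2, ... as described above: each document is drawn
    with the history of all previously drawn documents."""
    history = []
    for _ in range(length):
        history.append(channel.sample(history))
    return history
```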

A stegosystem for \(\mathcal X\) is a pair of polynomial-time bounded probabilistic algorithms \(\mathcal{S}=[\textit{SE},\textit{SD}]\) such that, for a security parameter \(\kappa \),

  1. the encoder \(\textit{SE}\), having access to a sampling oracle for a channel \(\mathcal{C}\), gets as input a history h (elements that have already been generated by \(\mathcal{C}\)), a secret key \(K\in \{0,1\}^\kappa \), and a message \(\mu \in \{0,1\}^m\), and returns a sequence of stegodocuments \(s_{1},s_{2},\ldots \) that should look like typical elements of \(\mathcal{C}\) starting with history h (the length of this sequence may depend on \(\kappa \) and m);

  2. the decoder \(\textit{SD}\) takes as input a secret key K and a sequence of documents S and returns a string \(\mu \in \{0,1\}^m\).

The unreliability of \(\mathcal{S}=[\textit{SE},\textit{SD}]\) with respect to a channel \(\mathcal{C}\) is given by

\( \mathrm {UnRel}_{\mathcal{S},\mathcal{C}} \ :=\ \max _{h,\mu \in \{0,1\}^m} \left\{ \Pr _{K\in \{0,1\}^\kappa } [\textit{SD}(K,\textit{SE}(h,K,\mu )) \ne \mu ]\right\} \!. \)

For the security analysis we take as adversary a probabilistic machine W, called a \((t,\zeta )\)-warden, that can perform a chosen-hiddentext attack:

  • W can access a sampling oracle for the channel \(\mathcal{C}\) that in the following will be called his reference oracle;

  • W selects a history h and a message \(\mu \) and queries a challenge oracle \(\textit{CH}\) which is either \(\textit{SE}(h,K,\mu )\) or \(\mathcal{C}(h,\mu )\), where \(\mathcal{C}(h,\mu )\) returns a sequence of random elements of \(\mathcal{C}\) with history h of the same length as \(\textit{SE}(h,\cdot ,\mu )\);

  • W runs in time t and can make up to \(\zeta \) queries;

  • with the help of the reference oracle \(\mathcal{C}\) and the challenge oracle \(\textit{CH}\) the warden \(W^{\mathcal{C},\textit{CH}}\) tries to distinguish stego- from coverdocuments.

His advantage over random guessing is defined as the difference

$$\begin{aligned} \mathrm {Adv}_{\mathcal{S},\mathcal{C}}(W) \ := \ \left| {\mathop {\Pr }\limits _{K\in \{0,1\}^\kappa }}\left[ W^{\mathcal{C},\textit{SE}(\cdot ,K,\cdot )}=1\right] - \Pr \left[ W^{\mathcal{C},\mathcal{C}(\cdot ,\cdot )}=1\right] \right| \!. \end{aligned}$$

For a given family \(\mathcal{F}\) of channels \(\mathcal{C}\) the strongest notion of security for a stegosystem \(\mathcal{S}\) is defined as \( \mathrm {InSec}_{\mathcal{S},\mathcal{F}} (t,\zeta ) \ := \ \sup _{\mathcal{C}\in \mathcal{F}} \sup _W \mathrm {Adv}_{\mathcal{S},\mathcal{C}}(W),\) where W runs over all \((t,\zeta )\)-wardens. Thus, if \(\mathrm {InSec}_{\mathcal{S},\mathcal{F}}\) is small then for every channel \(\mathcal{C}\) of \(\mathcal{F}\) no warden W – even one having perfect knowledge about \(\mathcal{C}\) – can detect the usage of \(\mathcal{S}\) with significant advantage.

Now let us consider channels \(\mathcal{C}\) over the document space \(\mathcal X=\{0,1\}^n\) such that for every history h the support of \(\mathcal{C}_{h}\) can be described by a k-term DNF formula. These will be called k-term DNF channels. In [12] a polynomial-time bounded embedding algorithm has been constructed that, for a given string \(\omega \in \{0,1\}^b\), an arbitrary key K, and a k-term DNF formula \(\varphi \) with sufficiently large support (depending on b), generates a document \(s\in \texttt {sat}(\varphi )\) that encodes \(\omega \). The distribution of these stegodocuments is uniform over \(\texttt {sat}(\varphi )\), where the probability is taken over the random choice of K and the internal randomization of the algorithm. Assuming that the underlying k-term DNF channel \(\mathcal{C}\) is known exactly – that means, for every h a k-term DNF formula for \(\texttt {sp}(\mathcal{C}_{h})\) is available – one can use this embedding procedure to construct an efficient stegosystem \(\hat{\mathcal{S}}\) for the family \(\mathcal{F}\) of all supuniform k-term DNF channels \(\mathcal{C}\), with both small unreliability and small insecurity.
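The embedding procedure of [12] is not spelled out in this paper. As a hedged illustration of the kind of primitive it provides, the following sketch combines Karp–Luby uniform sampling from the support of a DNF with rejection sampling against a keyed hash; this is our own simplification – the helper names and the use of SHA-256 are assumptions, not the actual construction of [12]:

```python
import hashlib
import random

def uniform_sat_sample(phi, n):
    """Uniform sampling from sat(phi) for a DNF phi (Karp-Luby style):
    pick a monomial with probability proportional to |sat(M)| = 2^(n-|M|),
    complete it uniformly at random, and accept with probability
    1 / (number of monomials the result satisfies), which exactly cancels
    the overcounting of multiply covered assignments."""
    weights = [2 ** (n - len(M)) for M in phi]
    while True:
        M = random.choices(phi, weights=weights)[0]
        x = [bool(random.getrandbits(1)) for _ in range(n)]
        for l in M:
            x[abs(l) - 1] = (l > 0)
        hits = sum(all(x[abs(l) - 1] == (l > 0) for l in m) for m in phi)
        if random.random() < 1.0 / hits:
            return tuple(x)

def embed(phi, n, key, omega, b):
    """Rejection-sampling embedder (a simplification of the idea, not the
    procedure of [12]): redraw uniform documents from sat(phi) until the
    keyed hash of the document (key: bytes) equals the b-bit chunk omega
    (b <= 8 here); needs |sat(phi)| >> 2^b, about 2^b draws per chunk."""
    while True:
        s = uniform_sat_sample(phi, n)
        digest = hashlib.sha256(bytes(key) + bytes(map(int, s))).digest()
        if digest[0] % (2 ** b) == omega:
            return s
```

The decoder recovers \(\omega \) by recomputing the keyed hash of the received document, and the output distribution is uniform over \(\texttt {sat}(\varphi )\), as required for a supuniform channel.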

Definition 4

For \(\eta \ge 1\) and an integer \(k\ge 1\) let \(\mathcal{F}_{k,\eta }\) be the set of all supuniform k-term DNF channels \(\mathcal{C}\) such that for every history h it holds \(|\texttt {sp}(\mathcal{C}_h)| \ge 2^{\eta }\).

Let b denote the number of bits encoded per document and \(m=\ell \,\cdot \,b\) the length of the secret message \(\mu \) to be embedded. Combining the embedding technique of [12] with the results of the previous section we can show:

Theorem 3

For the channel family \(\mathcal{F}_{k,\eta }\) and given reliability parameters \(\varepsilon ,\delta >0\) there exists a stegosystem \(\mathcal{S}_k\) that for every \(\mathcal{C}\in \mathcal{F}_{k,\eta }\) achieves the insecurity bound of \(\hat{\mathcal{S}}\) and the unreliability bound \(\mathrm {UnRel}_{\mathcal{S}_k,\mathcal{C}} \le 2\ell (\varepsilon + \delta ) + 2\mathrm {e}\, m \left( k\cdot 2^{-\eta }/(1-\varepsilon )\right) ^{(\log \mathrm {e})/b}\).

Trying to extend this result to q-bounded channels, one faces the problem that the corresponding distributions are not efficiently learnable: their support can be learned, but not the individual probabilities, which in general cannot even be specified in polynomial length. Thus the stegoencoder cannot get complete knowledge about the channel – and the same should hold for the steganalyst, since otherwise he could easily detect any deviation from the channel distribution, implying that secure and efficient steganography would be impossible. The analysis for this situation is given in a full version of this paper.

5 Conclusions

We have provided a polynomial-time algorithm for properly learning k-term DNF formulas from positive samples only. Further, we have shown that, for information-theoretic reasons, unrestricted DNF formulas cannot be learned from positive samples without false positives. Although the analogous learnability problem for \(\log \)-term DNF formulas remains open, the negative result for unrestricted DNF formulas shows that this new method for learning k-term DNF formulas is quite powerful.

Combining our learning algorithm with the embedding procedure of [12], we are able to construct an efficient and provably secure stegosystem for a family of channels that can be described by k-term DNF formulas. This illustrates that methods of algorithmic learning are important for steganography. Here, however, both the learning and the embedding components are crucial: for example, the embedding problem for supports represented by efficiently learnable k-CNF formulas seems to be infeasible.