1 Introduction

Partial words are strings where certain positions are not specified. These positions are often called holes or don’t cares and printed by a diamond symbol \(\diamond \). Apart from theoretical reasons, the basic motivation for studying this mechanism comes from the study of biological operations in connection with DNA strands. In particular, DNA sequencing is a biological process to determine the base sequence of a given DNA strand. To this end, the two DNA strands are separated and cut into small pieces. Afterwards the small sequences are copied, multiplied, and then detected. Subsequently, the complete strand has to be derived out of the small pieces. This assembling can be done by aligning the fragments with the help of gaps (holes) which leads to the definition of partial words. The first time the idea of words with don’t cares has been investigated goes back to [7], where they were considered in connection with string matching. The notation partial word has firstly been defined in [2].

Partial words were mainly investigated in connection with combinatorics on words. A survey can be found in [4]. An interesting motivation in theory for this model is that ordinary languages can be compressed by the usage of holes. Consider for example the language L over the ternary alphabet \(\varSigma =\{a,b,c\}\), \(L=\{aaa,aba,aca\}\). It can be compressed by using a hole into \(L'=\{a\diamond a\}\). Simply by replacing the diamond by a, b, or c the original language L can be achieved.

In 2012, partial words were studied in connection with families of formal languages [6]. In particular, a regular language is represented by the image of a partial language under a substitution that only replaces the hole symbols. In connection with \(\text {DFA}\)s it turned out that the usage of holes can be somehow seen as a limited nondeterminism, since it allows to define \(\text {DFA}\)s with outgoing edges that are labeled with ordinary symbols and additionally with a diamond. If some of the ordinary symbols may be substituted for the hole symbol as well, the corresponding state allows a nondeterministic choice with respect to the target language. While in the original definition of partial words a hole represents a placeholder for all letters of the underlying alphabet, in investigations in connection with language families the substitution of the hole symbol can be an arbitrary subset of the alphabet (see for example [1, 6, 10]).

The applications of defining language families by partial words via partial word finite automata have also been investigated from a complexity point of view. Concerning the descriptional complexity, in [1] it has been shown that the state complexity for a \(\text {DFA}\) that simulates a partial word \(\text {DFA}\) is exponential in general. Moreover also the state complexity of the simulation of an \(\text {NFA}\) by a partial word \(\text {DFA}\) may become exponential. Concerning the computational complexity, different problems as, for example, minimization have been studied for partial word automata [5, 10].

The main aim of this paper is to extend the investigations on the state complexity of partial word automata. In connection with lower bounds on the number of states necessary for an automaton to accept a given language, the problem arises to prove the minimality of a given automaton. In Sect. 3 we discuss this problem by referring to known results from the literature and provide methods to prove the minimality of partial word \(\text {DFA}\)s by utilizing minimal \(\text {NFA}\)s with certain properties. Section 4 considers the operational state complexity for Boolean operations. It turns out that upper and lower bounds are exponential. In the last section we consider the impact of the number of productive \(\diamond \)-transitions in a partial word finite automaton, where a transition is called productive, if it does not lead to the rejecting sink state. It comes out that even the reduction of one of these transitions may lead to an exponential state explosion, which leads to a state complexity hierarchy dependent on the number of \(\diamond \)-transitions.

2 Preliminaries

We denote the non-negative integers \(\{0,1,2,\dots \}\) by \(\mathbb {N}\). Let \(\varSigma ^*\) denote the set of all words over the finite alphabet \(\varSigma \). A subset \(L\subseteq \varSigma ^*\) is said to be a formal language over \(\varSigma \). We write \(\overline{L}\) for the complement of L with respect to \(\varSigma \), that is for \(\varSigma ^*\setminus L\). The empty word is denoted by \(\lambda \) and the reversal of a word w by \(w^R\). For the length of w we write |w|. We use \(\subseteq \) for inclusions and \(\subset \) for strict inclusions.

Setting \(\varSigma _{\diamond } =\varSigma \cup \{\diamond \}\), where \(\diamond \notin \varSigma \) represents undefined positions or holes, a partial word over \(\varSigma \) is a sequence of symbols from \(\varSigma _{\diamond }\). Denoting the set of all partial words over \(\varSigma \) by \(\varSigma _{\diamond }^*\), a partial language over \(\varSigma \) is a subset of \(\varSigma _{\diamond }^*\). Partial languages can be transformed to (ordinary) languages by using \(\diamond \)-substitutions over \(\varSigma \). A \(\diamond \)-substitution \(\sigma :\varSigma _{\diamond }^*\rightarrow 2^{\varSigma ^*}\) satisfies \(\sigma (a)= \{a\}\), for all \(a\in \varSigma \), \(\sigma (\diamond )\subseteq \varSigma \), and \(\sigma (uv) = \sigma (u)\sigma (v)\), for \(u, v \in \varSigma _{\diamond }^*\). As a result, \(\sigma \) is fully defined by \(\sigma (\diamond )\), for example, if \(\sigma (\diamond )=\{a,b\}\) and \(L = \{\diamond b,\diamond c\}\) then \(\sigma (L) = \{ab, bb, ac, bc\}\). So, applying \(\sigma \) to a partial language \(L\subseteq \varSigma _{\diamond }^*\) results in a (ordinary) language \(\sigma (L) \subseteq \varSigma ^*\).

A nondeterministic finite automaton (\(\text {NFA}\)) is a system \(M=\langle Q,\varSigma , \delta , q_0,F\rangle \), where Q is the finite set of internal states, \(\varSigma \) is the finite set of input symbols, \(q_0\in Q\) is the initial state, \(F\subseteq Q\) is the set of accepting states, and \(\delta :Q\times \varSigma \rightarrow 2^{Q}\) is the transition function. In the forthcoming, we sometimes refer to \(\delta \) as a subset of \(Q\times \varSigma \times Q\). A finite automaton M is deterministic (\(\text {DFA}\)) if and only if \(|\delta (q,a)|=1\), for all \(q\in Q\) and \(a\in \varSigma \). In this case, we simply write \(\delta (q,a)=q'\) for \(\delta (q,a)=\{q'\}\) assuming that the transition function is a total mapping \(\delta :Q\times \varSigma \rightarrow Q\). Note that here any \(\text {DFA}\) is complete, that is, the transition function is total, whereas it may be a partial function for \(\text {NFA}\)s in the sense that the transition function of nondeterministic machines may map to the empty set. A finite automaton is said to be minimal if there is no finite automaton of the same type with fewer states, accepting the same language. Note that a rejecting sink state is counted for \(\text {DFA}\)s, since they are always complete, whereas it is not counted for \(\text {NFA}\)s, since their transition function may map to the empty set.

Generally speaking, a language L can be represented by a partial language \(L'\) together with a \(\diamond \)-substitution \(\sigma \) such that \(\sigma (L') = L\). In particular, for regular languages, from the descriptional complexity point of view it is an interesting question to what extent there are regular languages \(L'\) such that the minimal \(\text {DFA}\) accepting \(L'\) has less states than the minimal \(\text {DFA}\) accepting L? In order to distinguish between finite automata accepting (ordinary) languages from those accepting partial languages, we refer to the latter as partial word deterministic finite automata (\(\diamond \text {-DFA}\)). Thus, \(\diamond \text {-DFA}\)s treat the hole symbol \(\diamond \) as an ordinary input letter.

The number of states of the (complete) minimal \(\text {DFA}\) accepting a regular language L is denoted by \(\min _{\textit{DFA}}(L)\). Similarly, \(\min _{\textit{NFA}}(L)\) denotes the minimal number of states necessary for some \(\text {NFA}\) to accept L. For partial languages, we write \(\min _{\diamond \textit{-DFA}}(L)\) to denote the minimal number of states of a \(\diamond \text {-DFA}\) accepting a language \(L'\) such that there exists a \(\diamond \)-substitution \(\sigma \) with \(\sigma (L')=L\).

3 Basic Constructions

In connection with lower bounds on the number of states necessary for an automaton to accept a given language, the problem arises to prove the minimality of a given automaton. While a couple of techniques exist to prove the minimality of \(\text {DFA}\)s, only a few techniques exist for \(\text {NFA}\)s. The situation is much worse for \(\diamond \text {-DFA}\)s. Clearly, a \(\diamond \text {-DFA}\) can be seen as a \(\text {DFA}\) over the alphabet \(\varSigma _{\diamond }\). But, in general, the minimization of a \(\diamond \text {-DFA}\) M changes the language that it represents, that is, \(\sigma (L(M))\). It has been shown in [10] that the problem to find a minimal \(\diamond \text {-DFA}\) \(M'\) (together with a \(\diamond \)-substitution) for a given regular language is PSPACE-complete. The problem has been studied in more detail in [5], where algorithms are given for the construction of minimal partial languages, associated with some \(\diamond \)-substitution, as well as approximation algorithms for the construction of minimal \(\diamond \text {-DFA}\)s. However, for particular languages that witness certain lower bounds, their minimality has to be proved almost from scratch. Here we continue with some observations that can nevertheless be applied in lower bound proofs.

First, we briefly recall the so-called (extended) fooling set technique (see, for example, [3, 8, 12]) that is widely used for proving lower bounds on the number of states necessary for an \(\text {NFA}\) to accept a given language.

Theorem 1

Let \(L\subseteq \varSigma ^*\) be a regular language and suppose there exists a set of pairs \(P=\{\,(x_i,y_i) \mid 1 \le i \le n\,\}\) such that (1) \(x_iy_i\in L\), for \(1 \le i \le n\), and (2) \(i\ne j\) implies \(x_i y_j\not \in L\) or \(x_jy_i\not \in L\), for \(1\le i,j\le n\). Then any nondeterministic finite automaton accepting L has at least n states. Here P is called an (extended) fooling set for L.

Let M be a \(\diamond \text {-DFA}=\langle Q,\varSigma _{\diamond }, \delta , q_0,F\rangle \) and \(\sigma \) be an associated \(\diamond \)-substitution. Then a minimal \(\text {DFA}\) \(M'\) that accepts the language \(\sigma (L(M))\) can be constructed as follows. First, modify M to the \(\text {NFA}\) \(\hat{M}=\langle Q,\varSigma , \hat{\delta }, q_0,F\rangle \) by replacing any transition \(\delta (p,\diamond )=q\) by the transitions \(\{\,\hat{\delta }(p,a)=q \mid a\in \sigma (\diamond )\,\}\) and keeping all other transitions from \(\delta \). Then determinize \(\hat{M}\) and minimize the outcome. We call \(M'\) constructed in this way the canonical \(\text {DFA}\) for M and \(\sigma \). This construction is presented as Algorithm 1 in [1].

The intermediate \(\text {NFA}\) in the construction exhibits the limited nondeterminism provided by \(\diamond \text {-DFA}\)s. In fact, for each state of the \(\text {NFA}\), there are at most two outgoing transitions for each input symbol. This is a valuable hint for the seek for suitable witness automata. For example, it is known that \(2^n-1\) is a tight bound on the number of states for a \(\text {DFA}\) that accepts the language \(\sigma (L(M))\) of an n-state \(\diamond \text {-DFA}\) M with associated \(\diamond \)-substitution [1]. In order to find further witnesses for the lower bound it is sufficient to look for complete \(\text {NFA}\)s having (i) the required form of limited nondeterminism for just one input symbol and (ii) causing the maximal state blow-up of \(2^n-1\) when determinized. An example is depicted in Fig. 1.

Fig. 1.
figure 1

A complete n-state \(\text {NFA}\) whose minimal equivalent \(\text {DFA}\) has \(2^n-1\) states.

In such an \(\text {NFA}\) M the nondeterminism can be removed by replacing one of the two nondeterministic outgoing transitions by a transition on \(\diamond \), respectively, and setting \(\sigma (\diamond )=\{x\}\), where x is the sole input symbol for which the nondeterminism occurs. Since M is complete, for all states of the resulting automaton on which a \(\diamond \)-transition is defined, transitions on all other input symbols are defined as well. So, in order to make the resulting automaton complete, it is sufficient to add a \(\diamond \)-transition to the states for which no \(\diamond \)-transition is defined so far. This can safely be done by copying the transition on x. The transition on x must exist, since M is complete. Clearly, the resulting automaton \(M'\) is a \(\diamond \text {-DFA}\) with \(\sigma (L(M'))=L(M)\) (see Fig. 2 for a possible \(\diamond \text {-DFA}\) obtained from the \(\text {NFA}\) of Fig. 1). Since \(\min _{\textit{NFA}}(L)\le \min \diamond \text {-DFA}(L)\) [6] and \(M'\) has the same number of states as M has, the \(\diamond \text {-DFA}\) \(M'\) is minimal, that is, even for any other \(\diamond \)-substitution no smaller equivalent \(\diamond \text {-DFA}\) exists.

Fig. 2.
figure 2

A minimal \(\diamond \text {-DFA}\) obtained from the \(\text {NFA}\) depicted in Fig. 1, \(\sigma (\diamond )=\{b\}\).

The example above dealt with the maximal state blow-up for “determinization”. However, the method to prove the minimality of \(\diamond \text {-DFA}\)s by utilizing minimal \(\text {NFA}\)s with certain properties can be extended.

Lemma 1

Let \(M=\langle Q,\varSigma , \delta , q_0,F\rangle \) be a possibly incomplete \(\text {DFA}\), \(S\subseteq \varSigma \) be a fixed subset of input symbols, \(P \subseteq Q\), and \(\alpha :Q\rightarrow Q\) be a total mapping. Moreover, let \(\hat{M}=\langle Q,\varSigma , \hat{\delta }, q_0,F\rangle \) be an \(\text {NFA}\) obtained from M by adding the transitions \(\{\hat{\delta }(p,a)=\alpha (p) \mid p\in P, a\in S\,\}\) to \(\delta \). Then \(\min _{\diamond \textit{-DFA}}(L(\hat{M})) \le |Q|\) if M is complete and \(P=Q\), and \(\min _{\diamond \textit{-DFA}}(L(\hat{M})) \le |Q|+1\) otherwise.

Proof

We set \(\sigma (\diamond )= S\) and construct a \(\diamond \text {-DFA}\) \(M'=\langle Q,\varSigma _{\diamond }, \delta ', q_0,F\rangle \) from the given \(\text {NFA}\) \(\hat{M}\). To this end, for any state \(p\in P\), a set of transitions \(\{\hat{\delta }(p,a)=\alpha (p) \mid a\in S\,\}\) is replaced by the transition \(\hat{\delta }(p,\diamond )=\alpha (p)\). By the construction of \(\hat{M}\) from the \(\text {DFA}\) M it follows that \(M'\) is deterministic. If M is complete and \(P=Q\), that is, for all states there is an outgoing \(\diamond \)-transition in \(M'\), then \(M'\) is complete. Otherwise, it is completed by adding missing transitions to a new rejecting sink state. This gives the transition function \(\delta '\). Since the input alphabet of \(M'\) is \(\varSigma _{\diamond }\), it is a \(\diamond \text {-DFA}\). Moreover, the canonical \(\text {DFA}\) for \(M'\) and \(\sigma \) is equivalent to \(\hat{M}\). Since apart from a possible new sink state the state set is the same for \(M'\) and \(\hat{M}\), we conclude \(\min _{\diamond \textit{-DFA}}(L(\hat{M})) \le |Q|\) if M is complete and \(P=Q\). Otherwise, we have \(\min _{\diamond \textit{-DFA}}(L(\hat{M})) \le |Q|+1\).    \(\square \)

Let \(L=\{aa,aaa,aaaa,aca,aaca,baca,baa,baaa\}\subset \{a,b,c\}^*\) be a finite language. It has been shown in [5] that any \(\diamond \text {-DFA}\) accepting L has at least seven states if \(\sigma (\diamond )=\{a,b\}\), and at least eight states if \(\sigma (\diamond )=\{a,c\}\). In order to show that any minimal \(\text {NFA}\) accepting L has five states, we apply Theorem 1 by providing the set \(P=\{(\lambda ,a^4),(b,a^3),(ba,a^2),(bac,a),(baca,\lambda )\}\) whose fooling set property for L is easily verified. A minimal \(\text {NFA}\) M accepting L is shown in Fig. 3.

Fig. 3.
figure 3

A minimal \(\text {NFA}\) accepting a finite language.

This \(\text {NFA}\) M cannot serve as witness for the constructions from above because there are three transitions on input symbol a defined for state 0. Moreover, let \(\alpha \) be the mapping of Lemma 1. Then, it can map state 0 to state 1 or to state 2 or to state 3 to remove the nondeterminism. However, in any case the nondeterminism is not removed entirely, when the transition \(\delta (0,a)= \alpha (0)\) is deleted. It is not hard to show that any minimal \(\text {NFA}\) accepting L must have three a-transitions from the initial state. So, changing to a possibly different but equivalent minimal \(\text {NFA}\) does not help. Nevertheless, we can utilize M to show the minimality of a \(\diamond \text {-DFA}\) accepting the finite language L as follows.

By way of contradiction, assume that there is a 6-state \(\diamond \text {-DFA}\) \(M'\) and a \(\diamond \)-substitution \(\sigma \) such that \(\sigma (L(M'))=L\). Since \(M'\) is complete and, for example, any input beginning with symbol c has to be rejected, \(M'\) has a rejecting sink state. We remove this sink state and all transitions to it, and obtain an equivalent incomplete 5-state \(\diamond \text {-DFA}\) \(M''\). Next, we construct an \(\text {NFA}\) \(\hat{M}\) from \(M''\) as in the construction of the canonical \(\text {DFA}\). That is, \(\hat{M}\) is obtained from \(M''\) by replacing any transition \(\delta (p,\diamond )=q\) from \(M''\) by the transitions \(\{\,\hat{\delta }(p,a)=q \mid a\in \sigma (\diamond )\,\}\) and keeping all other transitions from \(\delta \). So, \(\hat{M}\) has five states and is minimal. In particular, it has at most two transitions on input symbol a from the initial state. This is a contradiction, since any minimal \(\text {NFA}\) accepting L must have three a-transitions from the initial state. We conclude that, for any \(\diamond \)-substitution, a minimal \(\diamond \text {-DFA}\) accepting L has at least seven states.

In order to construct such a minimal \(\diamond \text {-DFA}\) with \(\sigma (\diamond )=\{a\}\), we can resolve the nondeterminism by replacing two a-transitions from the initial state by a single a-transition to a new state 2, 3 which, in turn, has the outgoing transitions of the states 2 and 3. The newly introduced nondeterminism for this state can be removed by a \(\diamond \)-transition as depicted in Fig. 4.

Fig. 4.
figure 4

A minimal \(\diamond \text {-DFA}\) accepting a finite language, where \(\sigma (\diamond )=\{a\}\). The rejecting sink state is not depicted.

4 Operational State Complexity

Let \(\circ \) be a fixed operation on languages that preserves regularity. Then the \(\circ \)-language operation problem for \(\diamond \text {-DFA}\)s is defined as follows:

  • Given an n-state \(\diamond \text {-DFA}\) \(M_1\) with \(\diamond \)-substitution \(\sigma _1\) and an m-state \(\diamond \text {-DFA}\) \(M_2\) with \(\diamond \)-substitution \(\sigma _2\).

  • How many states are sufficient and necessary in the worst case (in terms of n and m) for a \(\diamond \text {-DFA}\) \(M_3\) with some \(\diamond \)-substitution \(\sigma _3\) such that

    $$ \sigma _3(L(M_3))= \sigma _1(L(M_1)) \circ \sigma _2(L(M_2))? $$

Obviously, this problem generalizes to unary language operations like, for example, complementation or reversal.

We first consider the operation of complementation and show an upper bound and a lower bound that is tight up to a constant factor. The result reveals that complementation is an expensive operation from the state complexity point of view.

Proposition 1

Let \(n\ge 1\) be an integer and \(M_1\) be an n-state \(\diamond \text {-DFA}\) with \(\diamond \)-substitution \(\sigma _1\). Then \(2^{n}-1\) states are sufficient for a \(\diamond \text {-DFA}\) \(M_2\) with some \(\diamond \)-substitution \(\sigma _2\) such that \(\sigma _2(L(M_2))\) is the complement of \(\sigma _1(L(M_1))\). Therefore, we have \(\min _{\diamond \textit{-DFA}}(\overline{L}) \le 2^{\min _{\diamond \textit{-DFA}}(L)}-1\), for all regular languages L.

Theorem 2

Let \(n > 2\) be an integer. There exists a minimal n-state \(\diamond \text {-DFA}\) \(M_1\) with \(\diamond \)-substitution \(\sigma _1\) such that any \(\diamond \text {-DFA}\) \(M_2\) with any \(\diamond \)-substitution \(\sigma _2\), where \(\sigma _2(L(M_2))\) is the complement of \(\sigma _1(L(M_1))\), has at least \(2^{n-3}\) states. Therefore, we have \(\min _{\diamond \textit{-DFA}}(\overline{L}) \le 2^{\min _{\diamond \textit{-DFA}}(L)-3}\), for infinitely many regular languages L.

Proof

We are going to utilize Lemma 1 to construct minimal witness automata. To this end, consider the incomplete \(\text {DFA}\) M depicted in Fig. 5. We set \(S=\{a\}\), \(P=\{0\}\), and \(\alpha \) to be the identity on the state set. After adding the required transitions to M, we obtain the \(\text {NFA}\) \(\hat{M}\) depicted in Fig. 6.

Fig. 5.
figure 5

An incomplete \(\text {DFA}\).

Fig. 6.
figure 6

The \(\text {NFA}\) obtained from the \(\text {DFA}\) of Fig. 5.

So, for \(k \ge 0\), we consider the witness languages \(L_k = \{a,b\}^* a \{a,b\}^k b \{a,b\}^*\). Now, by Lemma 1, we have \(\min _{\diamond \textit{-DFA}}(L_k) \le k+4\). On the other hand, it is not hard to see that the \(\text {NFA}\) of Fig. 6 is minimal. Therefore, by \(\min _{\textit{NFA}}(L_k)\le \min _{\diamond \textit{-DFA}}(L_k)\) we derive \(k+3\le \min _{\diamond \textit{-DFA}}(L_k) \le k+4\). Since a minimal \(\diamond \text {-DFA}\) \(M'\) with \(\diamond \)-substitution \(\sigma '\) such that \(\sigma '(L(M')= L_k\) is complete and, thus, has a rejecting sink state which the \(\text {NFA}\) of Fig. 6 does not have, we conclude \(\min _{\diamond \textit{-DFA}}(L_k)=k+4\).

Essentially, in order to accept the complement of \(L_k\) an \(\text {NFA}\) has to verify that the input has no substring \(a\{a,b\}^k b\). Therefore, after reading a symbol a the \(\text {NFA}\) must be able to remember the next k input symbols. Altogether this needs \(2^{k+1}\) states. In fact, it has been shown in [11] that any \(\text {NFA}\) that accepts the complement of \(L_k\) needs at least \(2^{k+1}\) states. Again, by \(\min _{\textit{NFA}}(\overline{L_k})\le \min _{\diamond \textit{-DFA}}(\overline{L_k})\), we derive that any \(\diamond \text {-DFA}\) \(M_2\) with any \(\diamond \)-substitution \(\sigma _2\) where \(\sigma _2(L(M_2))= \overline{L_k}\) has at least \(2^{k+1}\) states. Setting \(n=k+4\) shows the theorem.    \(\square \)

We continue with Boolean operations. In general, neither the union nor the intersection of partial languages gives a partial language whose substitution is the union or intersection of the substitutions of the given partial languages. So a simple cross-product construction does not help. The idea for the union is to take a \(\diamond \text {-DFA}\) for one of the given partial languages and the canonical \(\text {DFA}\) for the other one, and build their cross-product automaton to obtain a \(\diamond \text {-DFA}\) for the upper bound of the state costs. However, for the intersection, the idea does not apply. The reason is that a \(\diamond \) in the input that can be substituted by at least two different symbols \(a_1\) and \(a_2\), must be treated by the canonical \(\text {DFA}\) as if the input were \(a_1\) or \(a_2\) and both symbols lead to accepting computations.

So, currently the best general upper bound for the intersection is the trivial one obtained by building the cross-product automaton of two canonical \(\text {DFA}\)s. In particular, let \(m, n \ge 1\) be two integers, \(M_1\) be an m-state \(\diamond \text {-DFA}\) with \(\diamond \)-substitution \(\sigma _1\), and \(M_2\) be an n-state \(\diamond \text {-DFA}\) with \(\diamond \)-substitution \(\sigma _2\). Then \((2^{m}-1)\cdot (2^{n}-1)\) states are sufficient for a \(\diamond \text {-DFA}\) \(M_3\) with some \(\diamond \)-substitution \(\sigma _3\) such that \(\sigma _3(L(M_3))= \sigma _1(L(M_1)) \cap \sigma _2(L(M_2))\). In fact, \(M_3\) is a \(\text {DFA}\).

For the special case that one of the two involved \(\diamond \)-substitutions is a singleton, a much better upper bound can be shown, which turns out to be tight in the order of magnitude.

Theorem 3

Let \(m, n \ge 1\) be two integers, \(M_1\) be an m-state \(\diamond \text {-DFA}\) with \(\diamond \)-substitution \(\sigma _1\), where \(|\sigma _1(\diamond )|=1\), and \(M_2\) be an n-state \(\diamond \text {-DFA}\) with \(\diamond \)-substitution \(\sigma _2\). Then \(m\cdot (2^{n}-1)\) states are sufficient for a \(\diamond \text {-DFA}\) \(M_3\) with some \(\diamond \)-substitution \(\sigma _3\) such that \(\sigma _3(L(M_3))= \sigma _1(L(M_1)) \cap \sigma _2(L(M_2))\).

Proof

To construct \(M_3\), we take the \(\diamond \text {-DFA}\) \(M_1\) with \(\sigma _1\) as it is. Then we build the canonical \(\text {DFA}\) \(M'_2=\langle Q,\varSigma , \delta , q_0,F\rangle \) for \(M_2\) and \(\sigma _2\). Let \(\sigma _1(\diamond )=\{a\}\). We add a \(\diamond \)-transition to each state of \(M'_2\) by copying the a-transition. More precisely, for each state \(q\in Q\), we define additionally \(\delta (q, \diamond )=\delta (q,a)\). Finally, we construct the cross-product automaton from \(M_1\) and \(M'_2\), call it \(M_3\), and set \(\sigma _3(\diamond )=\{a\}\).

Now let \(w\in \sigma _1(L(M_1)) \cap \sigma _2(L(M_2))\). Then there is a word \(w'\in \sigma _1^{-1}(w)\) accepted by \(M_1\). Moreover, w is accepted by the canonical \(\text {DFA}\) for \(M_2\) and \(\sigma _2\). Since \(\sigma _1(\diamond )=\{a\}\) and \(M'_2\) is this canonical \(\text {DFA}\) extended by a \(\diamond \)-transition in parallel to every a-transition, we derive that \(w'\) is accepted by \(M'_2\) as well. So, \(w'\) is accepted by \(M_3\) and, thus, \(w=\sigma _3(w')\in \sigma _3(L(M_3))\).

Conversely, let \(w\in \sigma _3(L(M_3))\). Then there is a word \(w'\in \sigma _3^{-1}(w)\) accepted by \(M_3\). Therefore, \(w'\) is accepted by \(M_1\) and by \(M'_2\). Since \(\sigma _1=\sigma _3\), we have \(w=\sigma _3(w')\in \sigma _1(L(M_1))\). Furthermore, by construction, \(\sigma _2(L(M_2)) = \sigma _3(L(M'_2))\) and, hence, \(w=\sigma _3(w') \in \sigma _2(L(M_2))\). We conclude \(w\in \sigma _1(L(M_1)) \cap \sigma _2(L(M_2))\) and altogether have derived \(\sigma _3(L(M_3))= \sigma _1(L(M_1)) \cap \sigma _2(L(M_2))\).

For the construction of \(M_3\) as cross-product automaton of \(M_1\) and \(M'_2\), a number of states that is the product of the number of states of \(M_1\) and \(M'_2\), that is \(m\cdot (2^{n}-1)\), is sufficient.    \(\square \)

The proofs of the lower bounds are more involved, in a sense that the minimality of a \(\diamond \text {-DFA}\) accepting the intersection or union has to be shown.

Theorem 4

Let \(m \ge n \ge 1\) be two positive integers. There exist a 2m-state \(\diamond \text {-DFA}\) \(M_1\) with \(\diamond \)-substitution \(\sigma _1\), and an n-state \(\diamond \text {-DFA}\) \(M_2\) with \(\diamond \)-substitution \(\sigma _2\), such that any \(\diamond \text {-DFA}\) \(M_3\) with any \(\diamond \)-substitution \(\sigma _3\) where \(\sigma _3(L(M_3))= \sigma _1(L(M_1)) \cap \sigma _2(L(M_2))\) has at least \((m+1)\cdot (2^{n}-1)\) states. Therefore, we have

$${\min \nolimits _{\diamond \textit{-DFA}}}(L_1\cap L_2) \ge ({\min \nolimits _{\diamond \textit{-DFA}}}(L_1)/2+1) \cdot 2^{\min _{\diamond \textit{-DFA}}(L_2)}-1,$$

for infinitely many regular languages \(L_1\) and infinitely many regular languages \(L_2\).

The last Boolean operation we are looking at is the union. As mentioned above, the idea for the upper bound is to take a \(\diamond \text {-DFA}\) for one of the given partial languages and the canonical \(\text {DFA}\) for the other one, and build their cross-product automaton.

Theorem 5

Let \(m \ge n \ge 1\) be two integers, \(M_1\) be an m-state \(\diamond \text {-DFA}\) with \(\diamond \)-substitution \(\sigma _1\), and \(M_2\) be an n-state \(\diamond \text {-DFA}\) with \(\diamond \)-substitution \(\sigma _2\). Then \(m\cdot (2^{n}-1)\) states are sufficient for a \(\diamond \text {-DFA}\) \(M_3\) with some \(\diamond \)-substitution \(\sigma _3\) such that \(\sigma _3(L(M_3))= \sigma _1(L(M_1)) \cup \sigma _2(L(M_2))\).

A lower bound for the union is shown in the next theorem. While it is exponential, it does not match the upper bound, since it consists of the sum of the number of states of the larger given automaton and two to the power of the number of states of the smaller given automaton. The upper bound was given by their product.

Theorem 6

Let \(m \ge n \ge 0\) be two positive integers. There exist a \((m+1)\)-state \(\diamond \text {-DFA}\) \(M_1\) with \(\diamond \)-substitution \(\sigma _1\), and an \((n+1)\)-state \(\diamond \text {-DFA}\) \(M_2\) with \(\diamond \)-substitution \(\sigma _2\), such that any \(\diamond \text {-DFA}\) \(M_3\) with any \(\diamond \)-substitution \(\sigma _3\) where \(\sigma _3(L(M_3))= \sigma _1(L(M_1)) \cup \sigma _2(L(M_2))\) has at least \(m+2^{n}\) states. Therefore, we have \(\min _{\diamond \textit{-DFA}}(L_1\cup L_2) \ge (\min _{\diamond \textit{-DFA}}(L_1)-1) + 2^{\min _{\diamond \textit{-DFA}}(L_2)-1}\), for infinitely many regular languages \(L_1\) and infinitely many regular languages \(L_2\).

5 Hierarchy of \(\diamond \)-Transitions

Here we turn to considering the number of productive \(\diamond \)-transitions in a \(\diamond \text {-DFA}\). Here a transition is called productive, if it does not lead to the rejecting sink state. By the tight bound of \(2^n-1\) states for the \(\diamond \text {-DFA}\) to \(\text {DFA}\) conversion, the state costs for removing all \(\diamond \)-transitions are already known. But this raises the question for the state costs when only some of the productive \(\diamond \)-transitions are removed. In other words, we consider the following \((k_1,k_2)\)-\(\diamond \)-transition problem:

  • Let \(k_1> k_2 \ge 0\) be two integers.

  • Given an n-state \(\diamond \text {-DFA}\) \(M_1\) with \(\diamond \)-substitution \(\sigma _1\) having at most \(k_1\) productive \(\diamond \)-transitions.

  • How many states are sufficient and necessary in the worst case (in terms of n) for a \(\diamond \text {-DFA}\) \(M_2\) with some \(\diamond \)-substitution \(\sigma _2\) having at most \(k_2\) productive \(\diamond \)-transitions such that \( \sigma _2(L(M_2))= \sigma _1(L(M_1))? \)

Corollary 1

For any \(k_1 > 0\), the upper bound of the \((k_1,0)\)-\(\diamond \)-transition problem is \(2^n-1\).

Next, we generalize the problem and derive exponential lower bounds. In particular, the lower bound for the \((k_1,k_1-1)\)-\(\diamond \)-transition problem turns out to be exponential in the order of magnitude. Moreover, for every further productive \(\diamond \)-transition that is removed, an exponential number of states is additionally necessary in the worst case.

Theorem 7

Let \(k_1> k_2 \ge 0\) be two constant integers. Then, for each \(\ell \ge 2\), there exist a \((5k_1+k_1\ell -1)\)-state \(\diamond \text {-DFA}\) \(M_1\) with \(\diamond \)-substitution \(\sigma _1\) having \(k_1\) productive \(\diamond \)-transitions, such that any \(\diamond \text {-DFA}\) \(M_2\) with any \(\diamond \)-substitution \(\sigma _2\) having at most \(k_2\) productive \(\diamond \)-transitions and \(\sigma _2(L(M_2))= \sigma _1(L(M_1))\) has at least \(2 k_1 + k_2 (\ell +1) + (k_1-k_2)2^{\ell }-1\) states.

Proof

First, we construct a witness automaton. To this end, let \(\ell \ge 2\) be an integer. We consider the \((\ell +1)\)-state \(\diamond \text {-DFA}\) \(\hat{M}\) with \(\sigma _1(\diamond )=\{a\}\) as depicted on the right-hand side of Fig. 7, where all transitions not depicted go into the rejecting sink-state that is not depicted as well. Clearly, \(\hat{M}\) has exactly one productive \(\diamond \)-transition.

Fig. 7.
figure 7

A \(\diamond \text {-DFA}\) with \(\sigma _1(\diamond )=\{a\}\) and 4 productive \(\diamond \)-transitions. Four copies of \(\hat{M}\) are plugged in as \(\hat{M}_i\), \(1\le i\le 4\). The common rejecting sink-state as well as the transitions to it are not depicted.

Next, we use \(k_1\) copies \(\hat{M}_i\), \(1\le i\le k_1\) of \(\hat{M}\) that are distinct except for a common sink-state. Say the states are \(p_{i,j}\), for \(1\le i\le k_1\) and \(0\le j\le \ell -1\), and \(p_e\), for the sink-state. Finally, these copies are assembled into one \(\diamond \text {-DFA}\) \(M_1\) by selecting \(k_1\) different words \(z_1,z_2,\dots , z_{k_1}\) of length \(\lceil \log (k_1)\rceil \) from \(\{a,b\}^*\). These words are processed from an initial state in a tree-like structure, where the initial state is the root and each of the \(k_1\) leaves is connected to and from one copy by \(\texttt {\$}\)-transitions, where \(\texttt {\$}\) is a new symbol (see the left-hand side of Fig. 7). In this way, each copy \(\hat{M}_i\) is selected by an individual prefix \(z_i\). Again, all missing transitions are directed to the common sink-state \(p_e\). Let \(L(\hat{M})\) denote the language of words accepted by \(\hat{M}\) with initial state \(p_0\) and sole accepting state \(p_{\ell -1}\). Then a word w is accepted by \(M_1\) if and only if it has the form \(z (\texttt {\$}L(\hat{M}) \texttt {\$})^* z^R\), where \(z\in \{z_1,z_2,\dots , z_{k_1}\}\) (see Fig. 7 for an example with \(k_1=4\)). In total, the \(\diamond \text {-DFA}\) \(M_1\) has \(k_1\) productive \(\diamond \)-transitions and at most \((2\cdot (2^{\lceil \log (k_1)\rceil }-1))+ k_1 + k_1 \ell +1 \le 5k_1+k_1\ell -1\) states.

The rest of the proof is to show the claimed lower bound for the number of states necessary for any \(\diamond \text {-DFA}\) \(M_2\) with any \(\diamond \)-substitution \(\sigma _2\) having at most \(k_2\) productive \(\diamond \)-transitions and \(\sigma _2(L(M_2))= \sigma _1(L(M_1))\).    \(\square \)