1 Introduction

The deterministic ordered restarting automaton (or det-ORWW-automaton) was introduced in [9] in the setting of picture languages. While the nondeterministic variant of this type of automaton even accepts some languages that are not context-free, it has been shown in [9] that the deterministic variant accepts exactly the regular languages.

In [10] an investigation of the descriptional complexity of the det-ORWW-automaton was initiated. It was shown that each det-ORWW-automaton can be simulated by an automaton of the same type that has only a single state, which means that for these automata, states are actually not needed. Accordingly, such an automaton is called a stateless det-ORWW-automaton (stl-det-ORWW-automaton). For these automata, the size of their working alphabets can be taken as a measure for their descriptional complexity, and it has been shown that these automata are polynomially related in size to the weight-reducing Hennie machines studied by Průša in [12]. Actually, for \(n\ge 1\), there exists a regular language that is accepted by a stl-det-ORWW-automaton of size \(O(n)\) such that each DFA for this language has size at least \(2^{2^n}\). On the other hand, each stl-det-ORWW-automaton of size \(n\) can be simulated by a DFA of size \(2^{2^{O(n^2\cdot \log n)}}\). Thus, there is a huge gap between the upper and lower bounds.

Here we present a new construction that, for a stl-det-ORWW-automaton of size \(n\), yields an equivalent unambiguous NFA of size \({2^{O(n)}}\), which implies that there is an equivalent DFA of size \(2^{2^{O(n)}}\). Actually, we will show that these bounds are sharp (up to the \(O\)-notation). We then exploit our construction to establish that many basic decision problems, like emptiness, universality, finiteness, inclusion, and equivalence, are PSPACE-complete for stl-det-ORWW-automata. In addition, we consider the problem of deciding, given a stl-det-ORWW-automaton, whether the language accepted belongs to a certain subclass of the regular languages. For the subclasses of strictly locally \(k\)-testable languages (\(k\ge 1\)), nilpotent languages, combinatorial languages, and some others, we obtain that the corresponding decision problems are PSPACE-complete, too.

This paper is structured as follows. In Sect. 2, we introduce the stl-det-ORWW-automata, and we restate the main results on them from [10]. Then, in Sect. 3, we present the announced construction of an NFA from a given stl-det-ORWW-automaton, and in Sect. 4 we consider the decision problems mentioned above. The paper closes with Sect. 5, which summarizes our results briefly and states a number of open problems for future work.

2 Stateless Deterministic Ordered Restarting Automata

A stateless deterministic ordered restarting automaton (stl-det-ORWW-automaton) is a one-tape machine that is described by a 6-tuple \(M=(\varSigma ,\varGamma ,\rhd ,\lhd ,\delta ,>),\) where \(\varSigma \) is a finite input alphabet, \(\varGamma \) is a finite tape alphabet such that \(\varSigma \subseteq \varGamma \), the symbols \(\rhd , \lhd \not \in \varGamma \) serve as markers for the left and right border of the work space, respectively,

$$\begin{aligned} \delta :(((\varGamma \cup \{\rhd \})\cdot \varGamma \cdot (\varGamma \cup \{\lhd \}))\cup \{\rhd \lhd \}) \dashrightarrow {\{\mathsf{MVR}\}\cup \varGamma \cup \{\mathsf{Accept}\}} \end{aligned}$$

is the (partial) transition function, and \(>\) is a partial ordering on \(\varGamma \). The transition function describes three different types of transition steps:

  (1) A move-right step has the form \(\delta (a_1a_2a_3)={\mathsf{MVR}}\), where \(a_1\in \varGamma \cup \{\rhd \}\) and \(a_2,a_3\in \varGamma \). It causes \(M\) to shift the window one position to the right. Observe that no move-right step is possible if the window contains the symbol \(\lhd \).

  (2) A rewrite/restart step has the form \(\delta (a_1a_2a_3)=b\), where \(a_1\in \varGamma \cup \{\rhd \}\), \(a_2,b\in \varGamma \), and \(a_3\in \varGamma \cup \{\lhd \}\) such that \(a_2>b\) holds. It causes \(M\) to replace the symbol \(a_2\) in the middle of its window by the symbol \(b\) and to restart.

  (3) An accept step has the form \( \delta (a_1a_2a_3)={\mathsf{Accept}}\), where \(a_1\in \varGamma \cup \{\rhd \}\), \(a_2\in \varGamma \), and \(a_3\in \varGamma \cup \{\lhd \}\). It causes \(M\) to halt and accept. In addition, we allow an accept step of the form \(\delta (\rhd \lhd )=\mathsf{Accept}\).

If \(\delta (u)\) is undefined for some word \(u\), then \(M\) necessarily halts when it sees \(u\) in its window, and we say that \(M\) rejects in this situation. Further, the letters in \(\varGamma \backslash \varSigma \) are called auxiliary symbols.

A configuration of a stl-det-ORWW-automaton \(M\) is a pair of words \((\alpha ,\beta )\), where \(|\beta |\ge 3\), and either \(\alpha =\lambda \) (the empty word) and \(\beta \in \{\rhd \}\cdot \varGamma ^+\cdot \{\lhd \}\) or \(\alpha \in \{\rhd \}\cdot \varGamma ^*\) and \(\beta \in \varGamma \cdot \varGamma ^+\cdot \{\lhd \}\); here \(\alpha \beta \) is the current content of the tape, and it is understood that the window contains the first three symbols of \(\beta \). In addition, we admit the configuration \((\lambda ,\rhd \lhd )\). A restarting configuration has the form \((\lambda ,\rhd w\,\lhd )\); if \(w\in \varSigma ^*\), then \((\lambda ,\rhd w\,\lhd )\) is also called an initial configuration. Furthermore, we use Accept to denote the accepting configurations, which are those configurations that \(M\) reaches by an accept step. We let \(\vdash _M\) denote the single-step computation relation that \(M\) induces on the set of configurations, and the computation relation \(\vdash _M^*\) of \(M\) is the reflexive and transitive closure of \(\vdash _M\).

Any computation of a stl-det-ORWW-automaton \(M\) consists of certain phases. A phase, called a cycle, starts in a restarting configuration; the head is then moved along the tape by MVR steps until a rewrite/restart step is performed, whereby a new restarting configuration is reached. If no further rewrite operation is performed, the computation necessarily finishes in a halting configuration; such a final phase is called a tail. By \(\vdash ^c_M\) we denote the execution of a complete cycle, and \(\vdash ^{c^*}_M\) is the reflexive transitive closure of this relation. It can be seen as the rewrite relation that \(M\) induces on its set of restarting configurations.

An input \(w\in \varSigma ^*\) is accepted by \(M\) if the computation of \(M\) which starts with the initial configuration \((\lambda ,\rhd w\,\lhd )\) ends with an accept step. The language consisting of all input words that are accepted by \(M\) is denoted by \(L(M)\).

As each cycle ends with a rewrite operation, which replaces a symbol \(a\) by a symbol \(b\) that is strictly smaller than \(a\) with respect to the given ordering \(>\), we see that each computation of \(M\) on an input of length \(n\) consists of at most \((|\varGamma |-1)\cdot n\) cycles and a tail. Thus, \(M\) can be simulated by a deterministic single-tape Turing machine in time \(O(n^2)\). The following example illustrates the way in which a stl-det-ORWW-automaton works.

Example 1

Let \(n\ge 2\) be a fixed integer, and let \(M=(\varSigma ,\varGamma ,\rhd ,\lhd ,\delta ,>)\) be defined by taking \(\varSigma = \{a,b\}\) and \(\varGamma = \varSigma \cup \{\,a_{i},b_{i},x_i\mid 1\le i\le n-1\,\}\), by choosing the ordering \(>\) such that \(a>a_{i}>x_j\) and \(b>b_{i}>x_j\) hold for all \(1\le i,j\le n-1\), and by defining the transition function \(\delta \) in such a way that \(M\) proceeds as follows: on input \(w=w_1w_2\cdots w_m\), \(w_1,\dots ,w_m\in \varSigma \), \(M\) numbers the first \(n-1\) letters of \(w\) from left to right, by replacing \(w_i=a\) (\(b\)) by \(a_{i}\) (\(b_{i}\)) for \(i=1,\dots ,n-1\). If \(w_n\not =a\), then the computation fails, but if \(w_n=a\), then \(M\) continues by replacing the last \(n-1\) letters of \(w\) from right to left using the letters \(x_1\) to \(x_{n-1}\). If the \(n\)-th last letter is \(b\) or some \(b_{i}\), then \(M\) accepts, otherwise the computation fails again.

Then \(L(M) = \{\,w\in \{a,b\}^m\mid m > n,\,w_n =a, \text{ and } w_{m+1-n}=b\,\}.\) As shown in [6], every det-RR(\(1\))-automaton for \(L(M)\) has \(\varOmega (2^n)\) states. Here a det-RR(\(1\))-automaton is another type of deterministic restarting automaton that characterizes the regular languages (see [7]).

While nondeterministic ORWW-automata are quite expressive, the deterministic variants are fairly weak. Taking the size of the tape alphabet as the measure for the descriptional complexity of a stl-det-ORWW-automaton, the following results are shown in [10].

Theorem 2

  (a) For each DFA \(A=(Q,\varSigma ,q_0,F,\varphi )\), there is a stl-det-ORWW-automaton \(M=(\varSigma ,\varGamma ,\rhd , \lhd ,\delta ,>)\) such that \(L(M)=L(A)\) and \(|\varGamma | = |Q| + |\varSigma |\).

  (b) For each stl-det-ORWW-automaton \(M\) with an alphabet of size \(n\), there exists a DFA \(A\) of size \(2^{2^{O(n^2 \log n)}}\) such that \(L(A)=L(M)\) holds.

  (c) For each \(n\ge 1\), there exists a regular language \(B_n\subseteq \{0,1,\$\}^*\) such that \(B_n\) is accepted by a stl-det-ORWW-automaton over an alphabet of size \(O(n)\), but each DFA for accepting \(B_n\) has at least \(2^{2^n}\) states.

Thus, there is a double exponential trade-off for converting a stl-det-ORWW-automaton into a DFA. Observe, however, that the gap between the lower and upper bounds is still huge.

3 Simulating a stl-det-ORWW-automaton by an NFA

Here we present our main result, which consists in the construction of an unambiguous NFA \(A\) of size \(2^{O(n)}\) from a stl-det-ORWW-automaton \(M\) of size \(n\) such that \(A\) accepts the same language as \(M\). In order to simplify this construction, we require that \(M\) only accepts on reaching the right sentinel \(\lhd \). This is not a restriction, as shown by the following lemma.

Lemma 3

From a stl-det-ORWW-automaton \(M = ( \varSigma , \varGamma , \rhd , \lhd , \delta , >)\), one can construct a stl-det-ORWW-automaton \(M' = ( \varSigma , \varDelta , \rhd , \lhd , \delta ', >)\) such that \(L(M')=L(M)\), \(|\varDelta |\le |\varGamma |+1\), and \(M'\) only accepts when its window contains the right sentinel \(\lhd \).

To motivate our main construction we consider an example.

Example 4

Let \(M\) be a stl-det-ORWW-automaton on the input alphabet \(\varSigma =\{a_1,a_2,a_3,a_4,a_5\}\) and the working alphabet \(\varGamma = \varSigma \cup \{b_1,b_2,b_3,b_4,c_1,c_2,c_3,c_4\}\) with the ordering \(a_i>b_i>c_i\) for all \(1\le i\le 4\), and let the transition function be given by the following table:

$$\begin{array}{lcllcllcllcl} \delta (\rhd a_1a_2) &{} = &{} b_1,\quad &{} \delta (\rhd b_1 a_2) &{} = &{} \mathsf{MVR},\quad &{} \delta (b_1a_2a_3) &{} = &{} \mathsf{MVR},\quad &{} \delta (a_2a_3a_4) &{} = &{} b_3,\\ \delta (b_1a_2b_3) &{} = &{} b_2,&{} \delta (c_2c_3a_4) &{} = &{} \mathsf{MVR},&{} \delta (\rhd c_1b_2) &{} = &{} \mathsf{MVR},&{} \delta (c_1b_2b_3) &{} = &{} \mathsf{MVR},\\ \delta (b_2b_3a_4) &{} = &{} c_3,&{} \delta (c_2c_3c_4) &{} = &{} \mathsf{MVR},&{} \delta (\rhd c_1c_2) &{} = &{} \mathsf{MVR},&{} \delta (c_1c_2c_3) &{} = &{} \mathsf{MVR},\\ \delta (c_1b_2c_3) &{} = &{} c_2,&{} \delta (c_3a_4a_5) &{} = &{} b_4,&{} \delta (c_2c_3b_4) &{} = &{} \mathsf{MVR},&{} \delta (c_3b_4a_5) &{} = &{} c_4,\\ \delta (\rhd b_1b_2) &{} = &{} c_1,&{} \delta (c_3c_4a_5) &{} = &{} \mathsf{MVR},&{} \delta (c_4a_5\lhd ) &{} = &{} \mathsf{Accept}. \end{array}$$

Given the word \(w=a_1a_2a_3a_4a_5\) as input, \(M\) executes the following accepting computation, where the rewritten letters are underlined:

$$\begin{array}{lclclclcl} (\lambda ,\rhd \underline{a_1}a_2a_3a_4a_5\lhd ) &{} \vdash _M^c &{} (\lambda ,\rhd b_1a_2\underline{a_3}a_4a_5\lhd ) &{} \vdash _M^c &{} (\lambda ,\rhd b_1\underline{a_2}b_3a_4a_5\lhd ) &{} \vdash _M^c \\ [+0.1cm] (\lambda ,\rhd \underline{b_1}b_2b_3a_4a_5\lhd ) &{} \vdash _M^c &{} (\lambda ,\rhd c_1b_2\underline{b_3}a_4a_5\lhd ) &{} \vdash _M^c &{} (\lambda ,\rhd c_1\underline{b_2}c_3a_4a_5\lhd ) &{} \vdash _M^c \\ [+0.1cm] (\lambda ,\rhd c_1c_2c_3\underline{a_4}a_5\lhd ) &{} \vdash _M^c &{} (\lambda ,\rhd c_1c_2c_3\underline{b_4}a_5\lhd ) &{} \vdash _M^c &{} (\lambda ,\rhd c_1c_2c_3c_4a_5\lhd ) &{} \vdash _M^* &{} \mathsf{Accept}. \end{array}$$
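The computation above can be replayed mechanically. The following is a minimal sketch of a simulator, assuming a dictionary encoding of \(\delta \) keyed by the window content; the names `run`, `DELTA`, and the ASCII stand-ins `">"` and `"<"` for the sentinels are choices made for this illustration and are not part of the construction.

```python
# Sketch of a simulator for a stl-det-ORWW-automaton. The encoding of delta
# as a dictionary and all names below are assumptions for this illustration.
LEFT, RIGHT = ">", "<"            # stand-ins for the two sentinels
MVR, ACCEPT = "MVR", "Accept"

# Transition table of Example 4, keyed by the content of the window.
DELTA = {
    (LEFT, "a1", "a2"): "b1", (LEFT, "b1", "a2"): MVR,
    ("b1", "a2", "a3"): MVR,  ("a2", "a3", "a4"): "b3",
    ("b1", "a2", "b3"): "b2", ("c2", "c3", "a4"): MVR,
    (LEFT, "c1", "b2"): MVR,  ("c1", "b2", "b3"): MVR,
    ("b2", "b3", "a4"): "c3", ("c2", "c3", "c4"): MVR,
    (LEFT, "c1", "c2"): MVR,  ("c1", "c2", "c3"): MVR,
    ("c1", "b2", "c3"): "c2", ("c3", "a4", "a5"): "b4",
    ("c2", "c3", "b4"): MVR,  ("c3", "b4", "a5"): "c4",
    (LEFT, "b1", "b2"): "c1", ("c3", "c4", "a5"): MVR,
    ("c4", "a5", RIGHT): ACCEPT,
}


def run(delta, word):
    """Return (accepted, number_of_cycles, final_tape) for the given word.

    Termination relies on every rewrite replacing a letter by a strictly
    smaller one, as required by the definition; this is not checked here.
    """
    tape = [LEFT, *word, RIGHT]
    cycles = 0
    pos = 0                                   # leftmost cell of the window
    while True:
        action = delta.get(tuple(tape[pos:pos + 3]))
        if action is None:                    # delta undefined: reject
            return False, cycles, tape[1:-1]
        if action == ACCEPT:
            return True, cycles, tape[1:-1]
        if action == MVR:
            pos += 1
        else:                                 # rewrite middle cell, restart
            tape[pos + 1] = action
            cycles += 1
            pos = 0


accepted, cycles, tape = run(DELTA, ["a1", "a2", "a3", "a4", "a5"])
print(accepted, cycles, tape)
```

On the input \(a_1a_2a_3a_4a_5\) the sketch performs the eight cycles shown above followed by an accepting tail.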

To encode this computation in a compact way, we introduce a 3-tuple of vectors \(T = (L,W,R)\) for each position on the tape of \(M\), where

  • \(W\) is a sequence of letters \(W=(x_1,x_2,\dots ,x_r)\) over \(\varGamma \) such that \(x_1>x_2>\dots > x_r\) using the ordering on \(\varGamma \) defined by \(M\),

  • \(L\) is a sequence of indices \(L=(i_1,\dots ,i_{r-1})\) such that \(i_1\le \dots \le i_{r-1}\le |\varGamma |\),

  • \(R\) is a sequence of indices \(R=(j_1,\dots ,j_{r-1})\) such that \(j_1\le \dots \le j_{r-1}\le |\varGamma |\).

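The idea is that \(W\) encodes the sequence of letters that are produced by \(M\) in an accepting computation for a particular field, and \(L\) and \(R\) encode the information on the neighbouring letters to the left and to the right that are used to perform the corresponding rewrite operations. A sentinel is treated as a sequence of length one, so a rewrite next to \(\rhd \) or \(\lhd \) records the index \(1\). For the computation above we obtain the following sequence of triples \((L_i,W_i,R_i)\), \(i=1,\dots ,5\), where \(\varLambda \) denotes an empty sequence:

$$\begin{array}{c|ccccc} i &{} 1 &{} 2 &{} 3 &{} 4 &{} 5\\ \hline L_i &{} (1,1) &{} (2,3) &{} (1,2) &{} (3,3) &{} \varLambda \\ W_i &{} (a_1,b_1,c_1) &{} (a_2,b_2,c_2) &{} (a_3,b_3,c_3) &{} (a_4,b_4,c_4) &{} (a_5)\\ R_i &{} (1,2) &{} (2,3) &{} (1,1) &{} (1,1) &{} \varLambda \end{array}$$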

For example, the triple \((2,b_3,1)\in (L_3,W_3,R_3)\) means that \(b_3\) is rewritten into \(c_3\), while the left neighbouring field contains the second letter of its sequence \(W_2\), and the right neighbouring field contains the first letter of its sequence \(W_4\).

If a letter is not rewritten at all, like \(a_5\), then the corresponding sequences \(L\) and \(R\) are empty. In fact, there is a consistency condition that must be met by the sequences \(R_{i-1}\) and \(L_{i}\) for each index \(i\), as the rewrites at positions \(i-1\) and \(i\) are executed in some order, and this order is encoded in these sequences. For example, \(L_3=(1,2)\), which means that \(a_3\) is rewritten into \(b_3\), while tape field \(2\) still contains the original letter \(a_2\), and \(b_3\) is rewritten into \(c_3\), while tape field \(2\) contains the next letter \(b_2\). Thus, before the second rewrite at position \(3\) can occur, the letter \(a_2\) at position \(2\) has been rewritten into \(b_2\), which is expressed by the fact that \(R_2=(2,3)\) starts with the number \(2\). Finally, the second number in \(R_2\) states that \(b_2\) is rewritten into \(c_2\) only after the second rewrite at position \(3\) has been performed. Hence, \(R_2=(2,3)\) and \(L_3=(1,2)\) lead to the sequence of rewrite steps \((1: a_3\rightarrow b_3), (2: a_2\rightarrow b_2), (3: b_3\rightarrow c_3), (4: b_2\rightarrow c_2).\) \(\square \)
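The triples of Example 4 can be derived mechanically from the order in which the rewrites occur. The sketch below replays the eight rewrites of the computation and records, for each rewrite, the index of the letter currently held by the left and the right neighbour. The function name `triples` and the list encoding are assumptions of this illustration; a sentinel neighbour is recorded with index 1, in line with the one-letter sequence \((\rhd )\) used in the formal construction.

```python
# Replays the rewrites of Example 4 and derives the triples (L, W, R).
# Names and data layout are illustrative assumptions.
WORD = ["a1", "a2", "a3", "a4", "a5"]
# (tape position, new letter), positions counted from 1, in execution order
REWRITES = [(1, "b1"), (3, "b3"), (2, "b2"), (1, "c1"),
            (3, "c3"), (2, "c2"), (4, "b4"), (4, "c4")]


def triples(word, rewrites):
    n = len(word)
    W = [[x] for x in word]        # letters produced at each position
    L = [[] for _ in word]         # neighbour indices to the left
    R = [[] for _ in word]         # neighbour indices to the right
    for pos, new in rewrites:
        i = pos - 1
        # len(W[j]) is the 1-based index of the letter currently at field j;
        # a sentinel neighbour counts as a one-letter sequence (index 1)
        L[i].append(len(W[i - 1]) if i > 0 else 1)
        R[i].append(len(W[i + 1]) if i < n - 1 else 1)
        W[i].append(new)
    return [(tuple(L[i]), tuple(W[i]), tuple(R[i])) for i in range(n)]


for i, t in enumerate(triples(WORD, REWRITES), start=1):
    print(i, t)
```

For position 3 this yields \(L_3=(1,2)\) and \(R_3=(1,1)\), and for position 2 it yields \(R_2=(2,3)\), matching the values discussed in the example.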

To formalize the notion of compatibility of two finite non-decreasing sequences of integers \(R = (r_1,\dots ,r_k)\) and \(L=(\ell _1,\dots ,\ell _s)\), where \(k,s\ge 0\), we define a multiset \({\mathrm {order}}(R,L)\) as follows:

$${\mathrm {order}}(R,L) = \{\,r_i+i-1\mid i=1,\dots ,k\,\}\,\cup \,\{\,\ell _j+j-1\mid j=1,\dots ,s\,\}.$$

Now the pair of sequences \((R,L)\) is called consistent, if \({\mathrm {order}}(R,L)=\{1,2,\dots ,k+s\}\), that is, it is the integer interval \([1,k+s]\). In the example above, we obtain \({\mathrm {order}}(R_2,L_3) = {\mathrm {order}}((2,3),(1,2)) = \{2,4,1,3\}=\{1,2,3,4\}\), thus we assign a number between 1 and \(4=|R_2|+|L_3|\) to each of the rewrites at positions \(i-1\) and \(i\), in this way specifying the order in which these rewrites must be executed.
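The consistency test is easy to state in code. The following sketch computes \({\mathrm {order}}(R,L)\) and, for a consistent pair, recovers the interleaving of the rewrites at the two neighbouring positions. The function names and the labels `"left"` and `"right"` are assumptions of this illustration: `"left"` refers to the position that owns \(R\), and `"right"` to the position that owns \(L\).

```python
# Sketch of order(R, L), the consistency test, and the induced schedule.
def order(R, L):
    """The multiset order(R, L), returned as a sorted list."""
    # r_i + i - 1 with 1-based i equals r + i with 0-based i
    return sorted([r + i for i, r in enumerate(R)] +
                  [l + j for j, l in enumerate(L)])


def consistent(R, L):
    """True iff order(R, L) is exactly the interval [1, |R| + |L|]."""
    return order(R, L) == list(range(1, len(R) + len(L) + 1))


def schedule(R, L):
    """Order of the rewrites at the two positions, for a consistent pair."""
    steps = [(r + i, "left", i + 1) for i, r in enumerate(R)]
    steps += [(l + j, "right", j + 1) for j, l in enumerate(L)]
    return [(side, k) for _, side, k in sorted(steps)]


print(order((2, 3), (1, 2)))       # values for R_2 and L_3 of Example 4
print(consistent((2, 3), (1, 2)))
print(schedule((2, 3), (1, 2)))
```

For \(R_2=(2,3)\) and \(L_3=(1,2)\) the schedule alternates between the first rewrite at position 3, the first at position 2, the second at position 3, and the second at position 2, which is the sequence of rewrite steps listed at the end of Example 4.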

Based on the above ideas, we will now establish the following general result.

Theorem 5

Let \(M=(\varSigma ,\varGamma ,\rhd ,\lhd ,\delta _M,>)\) be a stl-det-ORWW-automaton. Then an unambiguous NFA \(A=(Q,\varSigma ,\varDelta _A,q_0,F)\) can be constructed from \(M\) such that \(L(A)=L(M)\) and \(|Q|\in 2^{O(|\varGamma |)}\).

Proof

Let \(M=(\varSigma ,\varGamma ,\rhd ,\lhd ,\delta _M,>)\) be a stl-det-ORWW-automaton. At the extra cost of at most one additional tape symbol, we can assume by Lemma 3 that \(M\) executes an accept step only when its window contains the right sentinel \(\lhd \). Let \(n=|\varGamma |\). As a first step we construct an NFA \(B\) for the characteristic language \(L_C(M) = \{\,w\in \varGamma ^*\mid (\lambda ,\rhd w \lhd )\vdash _M^* \mathsf{Accept}\,\}\) of \(M\), which consists of all words over \(\varGamma \) that \(M\) accepts.

The NFA \(B=(Q,\varGamma ,\varDelta _B,q_0,F)\) is constructed as follows:

  • The set \(Q\) contains the initial state \(q_0\), a designated final state \(q_F\), and all pairs of triples of the form \(\left( (L_1,W_1,R_1), (L_2,W_2,R_2) \right) \), where, for \(i=1,2\),

    • \(W_i\) is a sequence of letters \(W_i = \left( w_{i,1} , \dots , w_{i,k_i} \right) \) from \(\varGamma \) of length \(1\le k_i\le n\) such that \(w_{i,1}> w_{i,2} > \dots > w_{i,k_i}\), or \(W_i=\left( \rhd \right) \) and \(k_i=1\),

    • \(L_i\) is a sequence of positive integers \(L_i = \left( l_{i,1} , \dots , l_{i,k_i-1} \right) \) of length \(k_i - 1\) such that \(l_{i,1}\le l_{i,2} \le \dots \le l_{i,k_i-1}\le n\),

    • \(R_i\) is a sequence of positive integers \(R_i = \left( r_{i,1} , \dots , r_{i,k_i-1} \right) \) of length \(k_i - 1\) such that \(r_{i,1}\le r_{i,2} \le \dots \le r_{i,k_i-1}\le n\),

    • the sequences \(R_1\) and \(L_2\) are consistent, that is, \({\mathrm {order}}(R_1,L_2) = \{1,2,\dots ,k_1+k_2-2\}\).

The transition relation \(\varDelta _B\) is given through the following rules, where \(x \in \varGamma \) and \(\left( (L_{i-1}, W_{i-1}, R_{i-1}), (L_i, W_i, R_i) \right) \), \(i=2,3\), are states from \(Q\):

  • \( \varDelta _B(q_0,\lambda ) \ni q_F\), if \(\delta _M(\rhd \lhd ) = \mathsf{Accept}\).

  • \(\varDelta _B(q_0, x) \ni \left( (\varLambda , (\rhd ), \varLambda ), (L_1, W_1, R_1) \right) \), if \(x = w_{1,1}\).

  • \( \varDelta _B\left( ( (L_1, W_1, R_1),(L_2, W_2, R_2) ), x\right) \ni ( (L_2, W_2, R_2),(L_3, W_3, R_3) )\), if

    1. \(x = w_{3,1}\),

    2. \(\forall 1 \le j \le k_2 -1: \delta _M\left( w_{1,l_{2,j}} w_{2,j} w_{3,r_{2,j}} \right) = w_{2,j+1}\),

    3. \(\forall 1 \le j \le k_3 -1: \delta _M\left( w_{1,l_{2,l_{3,j}}} w_{2,l_{3,j}} w_{3,j} \right) = \mathsf{MVR}\), where \(l_{2,k_2} = k_1\) is taken, and

    4. \( \delta _M\left( w_{1,k_1} w_{2,k_2} w_{3,k_3} \right) = \mathsf{MVR}\).

  • \(\varDelta _B\left( ( (L_1, W_1, R_1),(L_2, W_2, R_2) ), \lambda \right) \ni q_F\), if

    1. \(R_2\) is a sequence of \(1\)’s of length \(k_2-1\),

    2. \(\delta _M\left( w_{1,k_1} w_{2,k_2} \lhd \right) = \mathsf{Accept}\), and

    3. \(\forall 1 \le j \le k_2 -1: \delta _M\left( w_{1,l_{2,j}} w_{2,j} \,\lhd \right) = w_{2,j+1} \).
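As a sanity check of the rules for \(\varDelta _B\), the sketch below verifies conditions 2 to 4 of the letter-reading rule and the three conditions of the final \(\lambda \)-rule on the automaton of Example 4. The triples in `T` are derived here from the computation in Example 4; the encoding and the names `step_ok`, `final_ok`, and `is_accepting_run` are assumptions of this illustration. For brevity the sketch does not re-check condition 1 or the consistency of \((R_1,L_2)\) inside each state.

```python
# Sketch: checking the transition conditions of B on Example 4.
LEFT, RIGHT, MVR, ACCEPT = ">", "<", "MVR", "Accept"
DELTA = {
    (LEFT, "a1", "a2"): "b1", (LEFT, "b1", "a2"): MVR,
    ("b1", "a2", "a3"): MVR,  ("a2", "a3", "a4"): "b3",
    ("b1", "a2", "b3"): "b2", ("c2", "c3", "a4"): MVR,
    (LEFT, "c1", "b2"): MVR,  ("c1", "b2", "b3"): MVR,
    ("b2", "b3", "a4"): "c3", ("c2", "c3", "c4"): MVR,
    (LEFT, "c1", "c2"): MVR,  ("c1", "c2", "c3"): MVR,
    ("c1", "b2", "c3"): "c2", ("c3", "a4", "a5"): "b4",
    ("c2", "c3", "b4"): MVR,  ("c3", "b4", "a5"): "c4",
    (LEFT, "b1", "b2"): "c1", ("c3", "c4", "a5"): MVR,
    ("c4", "a5", RIGHT): ACCEPT,
}
# Triples (L, W, R): the left sentinel first, then tape positions 1 to 5.
T = [
    ((), (LEFT,), ()),
    ((1, 1), ("a1", "b1", "c1"), (1, 2)),
    ((2, 3), ("a2", "b2", "c2"), (2, 3)),
    ((1, 2), ("a3", "b3", "c3"), (1, 1)),
    ((3, 3), ("a4", "b4", "c4"), (1, 1)),
    ((), ("a5",), ()),
]


def step_ok(delta, t1, t2, t3):
    """Conditions 2-4 for the move from state (t1, t2) to state (t2, t3)."""
    (_, W1, _), (L2, W2, R2), (L3, W3, _) = t1, t2, t3
    l2 = list(L2) + [len(W1)]            # convention l_{2,k_2} = k_1
    # 2. each rewrite at the middle position is a transition of M
    cond2 = all(delta.get((W1[L2[j] - 1], W2[j], W3[R2[j] - 1])) == W2[j + 1]
                for j in range(len(W2) - 1))
    # 3. before each rewrite at the right position, M moves past the middle
    cond3 = all(delta.get((W1[l2[L3[j] - 1] - 1], W2[L3[j] - 1], W3[j])) == MVR
                for j in range(len(W3) - 1))
    # 4. the last letters of the three sequences admit a move-right step
    cond4 = delta.get((W1[-1], W2[-1], W3[-1])) == MVR
    return cond2 and cond3 and cond4


def final_ok(delta, t1, t2):
    """The three conditions of the lambda-rule leading to q_F."""
    (_, W1, _), (L2, W2, R2) = t1, t2
    return (all(r == 1 for r in R2)
            and delta.get((W1[-1], W2[-1], RIGHT)) == ACCEPT
            and all(delta.get((W1[L2[j] - 1], W2[j], RIGHT)) == W2[j + 1]
                    for j in range(len(W2) - 1)))


def is_accepting_run(delta, ts):
    steps = all(step_ok(delta, *ts[i:i + 3]) for i in range(len(ts) - 2))
    return steps and final_ok(delta, ts[-2], ts[-1])


print(is_accepting_run(DELTA, T))
```

Changing a single index, for instance replacing \(L_3=(1,2)\) by \((1,1)\), makes the check fail, because the corresponding window of \(M\) then no longer admits the required step.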

We will prove that \(L(B) = L_C(M)\) holds.

Claim 1. \(L_C(M)\subseteq L(B)\).

Proof

Let \(w\in \varGamma ^*\) be a word that belongs to the language \(L_C(M)\). Thus, the computation of \(M\) that starts with the restarting configuration \((\lambda ,\rhd w \lhd )\) is accepting. If \(w = \lambda \), then \(\delta _M(\rhd \lhd ) = \mathsf{Accept}\), which implies that \(q_F\in \varDelta _B(q_0,\lambda )\). It follows that \(w \in L(B)\) holds in this case.

Now assume that \(w = w_1w_2 \cdots w_n\) for some \(n\ge 1\) and letters \(w_1,\dots ,w_n\in \varGamma \). As \(w\in L_C(M)\), we can now use the accepting computation of \(M\) for \(w\) to construct a representation as in the example above. This representation translates into a sequence of states of \(B\), and it can be shown that this sequence of states yields an accepting computation of \(B\) for the input \(w\). \(\square \)

Claim 2. \(L(B)\subseteq L_C(M)\).

Proof

We have to check that we can deduce a valid computation of \(M\) from an accepting computation of \(B\). So let \(w\in \varGamma ^*\) be any word in \(L(B)\), and let \(n=|w|\). If \(w = \lambda \), then \(q_F\in \varDelta _B(q_0,\lambda )\), which implies that \(\delta _M(\rhd \lhd ) = \mathsf{Accept}\) holds, which in turn means that \(w \in L_C(M)\).

If \(w=w_1\in \varGamma \), then there exist sequences \(W_1 = (w_{1,1},\dots , w_{1,k_1})\) over \(\varGamma \) and \(L_1=(l_{1,1},\dots ,l_{1,k_1-1})\) and \(R_1=(r_{1,1},\dots ,r_{1,k_1-1})\) over \(\mathbb {N}\) such that

  • \(w_{1,1} = w_1\),

  • \(( (\varLambda ,(\rhd ),\varLambda ) , (L_1,W_1,R_1)) \in \varDelta _B(q_0,w_1)\), and

  • \(q_F \in \varDelta _B(( (\varLambda ,(\rhd ),\varLambda ) , (L_1,W_1,R_1)), \lambda ).\)

From the definition of \(\varDelta _B\) it follows that either \(k_1=1\), and then \(\delta _M(\rhd w_1\lhd ) = \mathsf{Accept}\), or \(k_1>1\), and then \(l_{1,j} = 1 = r_{1,j}\) for all \(j=1,\dots , k_1-1\), \(\delta _M(\rhd w_{1,j}\lhd ) = w_{1,j+1}\) for all \(j=1,\dots , k_1-1\), and \(\delta _M(\rhd w_{1,k_1}\lhd ) = \mathsf{Accept}\). Hence, we see that the computation of \(M\) that begins with the restarting configuration \((\lambda ,\rhd w \lhd )\) accepts, that is, \(w = w_1\in L_C(M)\).

Now assume that \(w = w_1 \cdots w_n\) for some \(n\ge 2\) and letters \(w_1,\dots ,w_n\in \varGamma \). As \(B\) accepts on input \(w\), there exist sequences \(W_i=(w_{i,1}, \dots , w_{i,k_i})\) over \(\varGamma \) and sequences of integers \(L_i= ( l_{i,1}, \dots , l_{i,k_i -1})\) and \(R_i = ( r_{i,1}, \dots , r_{i,k_i -1})\), \(i=1,\dots ,n\), such that all of the following conditions are met:

\( \begin{array}{cl} 1. &{} ( (\varLambda ,(\rhd ),\varLambda ) , (L_1,W_1,R_1)) \in \varDelta _B(q_0,w_1),\\ 2. &{} ( (L_{i-1},W_{i-1},R_{i-1}) , (L_i,W_i,R_i)) \in \\ &{} \varDelta _B(( (L_{i-2},W_{i-2},R_{i-2}) , (L_{i-1},W_{i-1},R_{i-1}) ), w_i) \text{ for } \text{ all } i=2,\dots ,n,\\ &{} \text{ where } (L_0,W_0,R_0) = (\varLambda ,(\rhd ),\varLambda ) \text{ is taken},\\ 3. &{} q_F \in \varDelta _B(( (L_{n-1},W_{n-1},R_{n-1}) , (L_{n},W_{n},R_{n}) ), \lambda ). \end{array} \)

From the definition of \(\varDelta _B\) we see that, for all \(i=1,\dots ,n\), \(k_i\ge 1\) and \(w_{i,1}=w_i\). Now let \(N=N(R_1, \ldots , R_n ) = \sum _{i=1}^n |R_i| = \sum _{i=1}^n (k_i-1)\). By induction on \(N\) we will prove the following technical statement.

Claim 2.1. The computation of \(M\) that begins with the restarting configuration \((\lambda ,\rhd w\lhd )\) consists of \(N\) cycles and an accepting tail, that is, it has the form

$$(\lambda ,\rhd w\lhd ) \vdash _M^{c} (\lambda ,\rhd z^{(1)}\lhd ) \vdash _M^c \dots \vdash _M^c (\lambda ,\rhd z^{(N)}\lhd ) \vdash _M^* (\rhd u,v\lhd ) \vdash _M \mathsf{Accept},$$

where \(z^{(N)} = uv\) and \(|v| = 2\).

Proof

If \(N=0\), then \(k_i=1\) for all \(i=1,\dots ,n\), and hence, \(W_i = (w_i)\) and \(L_i=R_i=\varLambda \) for all \(i=1,\dots ,n\). From the definition of \(\varDelta _B\) it follows that \(\delta _M(w_{i-2}w_{i-1}w_i ) = \mathsf{MVR}\) for all \(i=2,\dots ,n\), where \(w_0=\rhd \) is taken, and \(\delta _M (w_{n-1}w_n \lhd ) = \mathsf{Accept}\). Thus, the computation of \(M\) that begins with the restarting configuration \((\lambda ,\rhd w\lhd )\) is simply an accepting tail computation.

Now assume that \(N\ge 1\). Then \(k_i>1\) for some indices \(i\in \{1,\dots ,n\}\), and accordingly, the corresponding sequences \(L_i\) and \(R_i\) are non-empty. Because of the consistency of the pairs \((R_{i-1},L_i)\), \(i=1,\dots ,n\), there exists an index \(j\in \{1,\dots ,n\}\) such that \(l_{j,1} = 1 = r_{j,1}\). Let \(s\in \{1,\dots ,n\}\) be the minimal index such that \(l_{s,1} = 1 = r_{s,1}\) holds. It follows that \(k_s>1\) and that \(W_s = (w_{s,1},w_{s,2},\dots ,w_{s,k_s})\), where \(w_s=w_{s,1}>w_{s,2}\). Let \(\hat{w}\) denote the word \(\hat{w} = w_1\cdots w_{s-1}w_{s,2}w_{s+1}\cdots w_n\in \varGamma ^n\). For this word the following result can be shown.

Claim 2.1.1. \((\lambda ,\rhd w \lhd ) \vdash _M^c (\lambda ,\rhd \hat{w}\lhd )\).

We continue with the proof of Claim 2.1 by establishing the following claim, which will allow us to perform the intended inductive step.

Claim 2.1.2. The word \(\hat{w}\) is accepted by the NFA \(B\).

Proof

For all \(i=1,\dots ,n\), we define sequences \(\hat{W}_i\) over \(\varGamma \) and sequences of integers \(\hat{L}_i\) and \(\hat{R}_i\) as follows:

$$\begin{array}{lcl} \hat{W}_i &{}=&{}\left\{ \begin{array}{ll} (w_{i,2}, \dots , w_{i,k_i}), &{} \text {if } i= s,\\ W_i, &{} \text {otherwise;} \end{array}\right. \\ \\ \hat{L}_i &{}=&{}\left\{ \begin{array}{ll} ( l_{i,2}, \dots , l_{i,k_i -1}), &{} \text {if } i= s, \\ ( l_{i,1} - 1, \dots , l_{i,k_i -1}-1), &{} \text {if } i= s +1, \\ L_i, &{} \text {otherwise;} \end{array}\right. \\ \\ \hat{R}_i &{} = &{}\left\{ \begin{array}{ll} (r_{i,2}, \dots , r_{i,k_i -1}), &{} \text {if } i= s ,\\ ( r_{i,1}-1, \dots , r_{i,k_i -1}-1), &{} \text {if } i= s - 1, \\ R_i, &{} \text {otherwise,} \end{array}\right. \end{array}$$

and we take \(\hat{k}_i\) to denote the length of the sequence \(\hat{W}_i\), \(i=1,\dots ,n\). Then \(\hat{k}_s = k_s-1\), and \(\hat{k}_i=k_i\) for all \(i\not =s\). In order to unify the notation we write \(\hat{w} = w_1\cdots w_{s-1}w_{s,2}w_{s+1}\cdots w_n\) as \(\hat{w}= \hat{w}_1 \cdots \hat{w}_n\). Also we write \(\hat{W}_i\) as \(\hat{W}_i=(\hat{w}_{i,1},\dots , \hat{w}_{i,\hat{k}_i})\), and \(\hat{L}_i\) and \(\hat{R}_i\) as \(\hat{L}_i = (\hat{l}_{i,1},\dots ,\hat{l}_{i,\hat{k}_i-1})\) and \(\hat{R}_i = (\hat{r}_{i,1},\dots ,\hat{r}_{i,\hat{k}_i-1})\), \(i=1,\dots ,n\). It can now be shown that the above sequences satisfy all of the following conditions:

\(\begin{array}{cl} 1. &{} ( (\varLambda ,(\rhd ),\varLambda ) , (\hat{L}_1,\hat{W}_1,\hat{R}_1)) \in \varDelta _B(q_0,\hat{w}_1),\\ 2. &{} ( (\hat{L}_{i-1},\hat{W}_{i-1},\hat{R}_{i-1}) , (\hat{L}_i,\hat{W}_i,\hat{R}_i)) \in \\ &{} \varDelta _B(( (\hat{L}_{i-2},\hat{W}_{i-2},\hat{R}_{i-2}) , (\hat{L}_{i-1},\hat{W}_{i-1},\hat{R}_{i-1}) ), \hat{w}_i) \text{ for } \text{ all } i=2,\dots ,n,\\ &{} \text{ where } (\hat{L}_0,\hat{W}_0,\hat{R}_0) = (\varLambda ,(\rhd ),\varLambda ) \text{ is taken},\\ 3. &{} q_F \in \varDelta _B(( (\hat{L}_{n-1},\hat{W}_{n-1},\hat{R}_{n-1}) , (\hat{L}_{n},\hat{W}_{n},\hat{R}_{n}) ), \lambda ). \end{array}\)

It follows that the word \(\hat{w}\) is accepted by \(B\) using the sequence of states defined above. As

$$N(\hat{R}_1,\dots ,\hat{R}_n) = \sum _{i=1}^n (\hat{k}_i-1) = \sum _{i=1}^{n} (k_i-1)-1 = N(R_1,\dots ,R_n)-1=N-1,$$

we can apply our induction hypothesis, which implies that the computation of \(M\) that begins with the restarting configuration \((\lambda ,\rhd \hat{w}\lhd )\) consists of \(N-1\) cycles and an accepting tail. Together with Claim 2.1.1 this says that the computation of \(M\) that begins with the restarting configuration \((\lambda ,\rhd w\lhd )\) consists of \(N\) cycles and an accepting tail, which completes the proof of Claim 2.1. \(\square \)

From the claims above we obtain that \(L(B)=L_C(M)\) holds. As \(M\) is deterministic, there is only a single accepting computation of \(B\) for each word \(w\in L_C(M)\). It follows that \(B\) is unambiguous. \(\square \)

Claim 3. \(|Q| \in 2^{O(|\varGamma |)}\).

Proof

The set \(Q\) of states of \(B\) contains the two designated states \(q_0\) and \(q_F\) and certain states that consist of pairs of triples of the form \((L,W,R)\), where \(W\) is a sequence of letters \(W=(a_1,\dots ,a_m)\) from \(\varGamma \) such that \(a_1>\dots > a_m\), and \(L\) and \(R\) are sequences of integers \(L=(l_1,\dots ,l_{m-1})\) and \(R=(r_1,\dots , r_{m-1})\) such that \(1\le l_1\le \dots \le l_{m-1}\) and \(1\le r_1\le \dots \le r_{m-1}\). From upper bounds for the number of these sequences we will obtain an upper bound for the size of \(Q\).

From the condition on the sequence \(W\) we see that \(m\le n=|\varGamma |\), and also \(l_{m-1}\le n\) and \(r_{m-1}\le n\). The sequence \(W\) defines the subset \(\{a_1,\dots ,a_m\}\) of \(\varGamma \), and different sequences \(W\) and \(W'\) yield different subsets. Hence, the number \(2^n-1\) of non-empty subsets of \(\varGamma \) is an upper bound for the number of different sequences \(W\).

The sequence \(L\) can be interpreted as a multiset over the set of integers \(\{1,\dots ,n\}\), because it can contain repetitions. This multiset is of size at most \(n-1\) (counting elements with their multiplicities). There are \({n+r-1 \atopwithdelims ()r}\) such multisets of size \(r\) (see, e.g., [14]), and hence, the number of possible sequences \(L\) is bounded from above by the expression

$$ \sum _{r=0}^{n-1} {n+ r -1 \atopwithdelims ()r} \;\le \; \sum _{r=0}^{n-1} { 2n \atopwithdelims ()r } \;\le \; \sum _{r=0}^{2n} { 2n \atopwithdelims ()r } \; =\; 2^{2n}, $$

and the same is true for the number of possible sequences \(R\). Hence, there are at most \(2^{2n}\cdot 2^{n}\cdot 2^{2n} = 2^{5n}\) different triples of the form \((L,W,R)\), and so the number of states of \(B\) is bounded from above by the number \(2^{10n}\). \(\square \)
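The arithmetic behind Claim 3 can be checked numerically for small alphabet sizes. The sketch below evaluates the sum of multiset counts, compares it with \(2^{2n}\), and confirms that the resulting number of pairs of triples, plus the two states \(q_0\) and \(q_F\), stays below \(2^{10n}\). The function names are assumptions of this illustration; the closed form \({2n-1 \atopwithdelims ()n-1}\) used in the last line is the standard hockey-stick identity and is not taken from the text.

```python
# Numerical check of the counting argument in Claim 3 (illustrative sketch).
from math import comb


def multiset_sequences(n):
    """Number of non-decreasing sequences of length < n over {1, ..., n}."""
    return sum(comb(n + r - 1, r) for r in range(n))


def state_bound_ok(n):
    seqs = multiset_sequences(n)          # candidates for L (and for R)
    triples = seqs * (2**n - 1) * seqs    # candidates for (L, W, R)
    return (seqs <= 2**(2 * n)
            and triples <= 2**(5 * n)
            and triples**2 + 2 <= 2**(10 * n))


print(all(state_bound_ok(n) for n in range(1, 30)))
print(all(multiset_sequences(n) == comb(2 * n - 1, n - 1) for n in range(1, 30)))
```

The count is deliberately generous: it ignores that \(L\) and \(R\) must have length exactly \(|W|-1\) and that \((R_1,L_2)\) must be consistent, so the true number of states is smaller.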

It follows that \(B\) is of size \(2^{O(n)}\). From \(B\) we now obtain an NFA \(A\) for the language \(L(M) = L_C(M) \cap \varSigma ^*\) by simply deleting all transitions from \(\varDelta _B\) that read a letter \(x\in (\varGamma \backslash \varSigma )\). Then it is immediate that \(A\) is an unambiguous NFA of size \(2^{O(n)}\) that accepts the language \(L(A) = L(B)\cap \varSigma ^* = L(M)\). \(\square \)

For all \(n\ge 3\), the language \(U_n=\{a^{2^n}\}\) can be shown to be accepted by a stl-det-ORWW-automaton with an alphabet of \(3n-1\) letters, while each NFA for \(U_n\) needs at least \(2^n+1\) states. Hence, the bound given in Theorem 5 is sharp up to the \(O\)-notation. In addition, we have the following consequence, which is a clear improvement over the upper bound given in Theorem 2 (b).

Corollary 6

For each stl-det-ORWW-automaton \(M\) with alphabet of size \(n\), there exists a DFA \(C\) of size \(2^{2^{O(n)}}\) such that \(L(C)=L(M)\) holds.

4 Decision Problems for stl-det-ORWW-automata

The emptiness problem for an NFA \(A=(Q,\varSigma ,\delta ,q_0,F)\) of size \(|Q|=m\) is decidable nondeterministically in space \(O(\log m)\) (see, e.g., [5]), and so, by Savitch’s Theorem [13] it follows that \(\mathsf{NFA{\text {-}}Emptiness}\in \mathsf{DSPACE}((\log |Q|)^2)\). Based on this observation we can use Theorem 5 to derive the following result.

Theorem 7

The emptiness problem for stl-det-ORWW-automata is PSPACE-complete.

Proof

Let \(M=(\varSigma ,\varGamma ,\rhd ,\lhd ,\delta ,>)\) be a stl-det-ORWW-automaton such that \(|\varGamma |= n\). By Theorem 5, there exists an NFA \(A\) of size \(2^{O(n)}\) such that \(L(A) = L(M)\). Now we can check emptiness of \(L(A)\) deterministically using space \((\log (2^{O(n)}))^2 = O(n^2)\). Thus, we see that \(\mathsf{stl{\text {-}}det{\text {-}}ORWW{\text {-}}Emptiness} \in {\mathrm {PSPACE}}\).

Now let \(A_1,\dots ,A_t\) be \(t\ge 2\) DFAs over a common input alphabet \(\varSigma \) of size \(k\) such that \(A_i\) has \(n_i\) states, \(1\le i\le t\). From these DFAs we can construct a stl-det-ORWW-automaton \(M\) with a tape alphabet of size \(k\cdot (1+n_1+\dots +n_{t-1})+n_t\) such that \(L(M)=\bigcap _{i=1}^t L(A_i)\) [10]. Hence, \(M\) has at most \(O((k\cdot \sum _{i=1}^t n_i)^3)\) transitions, and so it can be computed from \(A_1,\dots , A_t\) in polynomial time. Now \(L(M)\not =\emptyset \) iff \(L(A_1)\cap \dots \cap L(A_t)\not =\emptyset \), which shows that the above construction yields a polynomial-time reduction from the \(\mathsf{DFA{\text {-}}Intersection{\text {-}}Emptiness~Problem}\) to \(\mathsf{stl{\text {-}}det{\text {-}}ORWW{\text {-}}Emptiness}\). As the former is PSPACE-complete (see, e.g., [4]), we see that the latter is also PSPACE-hard. Together with the membership in PSPACE shown above, PSPACE-completeness follows. \(\square \)
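The membership part of the proof rests on deciding reachability in the transition graph of the NFA with little space. The sketch below illustrates the midpoint recursion behind Savitch's Theorem on an explicitly given graph; in the proof itself the graph of \(A\) is never written down, and its edges are tested on demand from \(M\). The function names and the edge-set encoding are assumptions of this illustration.

```python
# Sketch of Savitch-style reachability for NFA emptiness (illustrative only).
def reachable(edges, u, v, k):
    """True iff v can be reached from u by a path of length at most 2**k."""
    if u == v or (u, v) in edges:
        return True
    if k == 0:
        return False
    nodes = {x for e in edges for x in e} | {u, v}
    # guess a midpoint m and solve both halves with recursion depth k - 1
    return any(reachable(edges, u, m, k - 1) and reachable(edges, m, v, k - 1)
               for m in nodes)


def nfa_nonempty(edges, initial, finals, num_states):
    """Non-emptiness of an NFA whose transitions, labels dropped, are edges."""
    k = max(1, (num_states - 1).bit_length())   # 2**k >= num_states - 1
    return any(reachable(edges, initial, f, k) for f in finals)


edges = {(0, 1), (1, 2), (2, 3)}
print(nfa_nonempty(edges, 0, {3}, 4))
print(nfa_nonempty(edges, 3, {0}, 4))
```

The recursion depth is \(k = O(\log m)\) for \(m\) states, and each level stores only a constant number of state names of \(O(\log m)\) bits, which gives the \(O((\log m)^2)\) space bound used above.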

From this theorem we also get the following completeness results.

Corollary 8

For stl-det-ORWW-automata, universality, finiteness, inclusion, and equivalence are PSPACE-complete.

Proof

Universality: Let \(M\) be a stl-det-ORWW-automaton with input alphabet \(\varSigma \). In polynomial time we can construct a stl-det-ORWW-automaton \(M^c\) for the language \(L(M^c) = (L(M))^c = \varSigma ^*\backslash L(M)\) from \(M\) such that \(M^c\) uses the same tape alphabet as \(M\) [10]. The automaton \(M\) is universal, that is, \(L(M)=\varSigma ^*\), iff \(L(M^c) = \emptyset \). PSPACE-completeness of the universality problem now follows from PSPACE-completeness for the emptiness problem.

Inclusion and Equivalence: Let \(M_1\) and \(M_2\) be stl-det-ORWW-automata with alphabets of sizes \(n_1\) and \(n_2\), respectively. In polynomial time we can construct a stl-det-ORWW-automaton \(M\) with an alphabet of size \(O(n_1\cdot n_2)\) from \(M_1\) and \(M_2\) such that \(L(M) = L(M_1)\cap L(M_2)^c\) [10]. Now \(L(M_1) \subseteq L(M_2)\) iff \(L(M_1)\cap L(M_2)^c = \emptyset \) iff \(L(M)=\emptyset \). It follows that the inclusion problem is in PSPACE, which in turn implies immediately that the equivalence problem is in PSPACE.

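On the other hand, let \(M'\) be a stl-det-ORWW-automaton that accepts the empty set. Then, for every stl-det-ORWW-automaton \(M\), \(L(M)=L(M')\) iff \(L(M)\subseteq L(M')\) iff \(L(M)=\emptyset \). Hence, emptiness reduces to both inclusion and equivalence, and so PSPACE-hardness of these two problems follows from PSPACE-hardness of the emptiness problem. Together with the membership in PSPACE shown above, PSPACE-completeness follows.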
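The reductions used for universality, inclusion, and equivalence all have the same shape: build an automaton for a complement or an intersection and test it for emptiness. The sketch below replays this pattern with complete DFAs as stand-ins for stl-det-ORWW-automata; the tuple encoding `(states, initial, finals, delta)` and all function names are assumptions made for illustration.

```python
def is_empty(dfa, alphabet):
    """Emptiness by depth-first search from the initial state."""
    _, initial, finals, delta = dfa
    seen, stack = {initial}, [initial]
    while stack:
        q = stack.pop()
        if q in finals:
            return False
        for a in alphabet:
            r = delta[(q, a)]
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return True


def complement(dfa):
    """Swap final and non-final states (requires a complete DFA)."""
    states, initial, finals, delta = dfa
    return (states, initial, set(states) - set(finals), delta)


def intersect(d1, d2, alphabet):
    """Product automaton for L(d1) ∩ L(d2)."""
    s1, i1, f1, t1 = d1
    s2, i2, f2, t2 = d2
    states = {(p, q) for p in s1 for q in s2}
    finals = {(p, q) for p in f1 for q in f2}
    delta = {((p, q), a): (t1[(p, a)], t2[(q, a)])
             for (p, q) in states for a in alphabet}
    return (states, (i1, i2), finals, delta)


def is_universal(dfa, alphabet):
    return is_empty(complement(dfa), alphabet)


def is_included(d1, d2, alphabet):
    # L(d1) ⊆ L(d2) iff L(d1) ∩ L(d2)^c is empty.
    return is_empty(intersect(d1, complement(d2), alphabet), alphabet)


def is_equivalent(d1, d2, alphabet):
    return is_included(d1, d2, alphabet) and is_included(d2, d1, alphabet)
```

In the other direction, `is_empty(d, alphabet)` agrees with `is_included(d, empty, alphabet)` for any DFA `empty` accepting no word, which mirrors the hardness argument.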

Finiteness: Let \(M=(\varSigma ,\varGamma ,\rhd ,\lhd ,\delta ,>)\) be a stl-det-ORWW-automaton. We take a new symbol \(\Box \), that is, \(\Box \not \in \varGamma \), and define a stl-det-ORWW-automaton \(M'=(\varSigma ',\varGamma ',\rhd ,\lhd ,\delta ',>)\) as follows:

  • \(\varSigma ' = \varSigma \cup \{\Box \}\) and \(\varGamma '=\varGamma \cup \{\Box \}\),

  • the transition function \(\delta '\) is obtained from \(\delta \) by simply interpreting an occurrence of the symbol \(\Box \) as an occurrence of the right delimiter \(\lhd \).

Then \(L(M') = L(M) \cup (L(M) \cdot \Box \cdot {\varSigma '}^*)\), which means that \(L(M')\) is finite iff \(L(M)=\emptyset \). PSPACE-hardness of finiteness now follows from PSPACE-hardness of the emptiness problem.

On the other hand, by Theorem 5, from a stl-det-ORWW-automaton \(M\) with an alphabet of size \(n\) we can construct an NFA \(A\) of size \(2^{O(n)}\) such that \(L(M)=L(A)\). Like emptiness, infiniteness of \(L(A)\) is decidable nondeterministically in space \(\log (2^{O(n)}) \in O(n)\), and hence, it is decidable deterministically in space \(O(n^2)\). As deterministic space classes are closed under complement, finiteness is decidable within the same space bound. Thus, finiteness for stl-det-ORWW-automata is indeed PSPACE-complete. \(\square \)
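The infiniteness test for an NFA rests on a standard criterion: the accepted language is infinite iff some state is reachable from the initial state, lies on a cycle, and can reach a final state. The sketch below checks this criterion explicitly, assuming an NFA without \(\varepsilon \)-transitions whose transition structure is given as a dictionary `succ` from states to successor sets (symbols are irrelevant for this test). The function names are hypothetical.

```python
def reachable_from(sources, succ):
    """All states reachable from `sources` by zero or more transitions."""
    seen, stack = set(sources), list(sources)
    while stack:
        q = stack.pop()
        for r in succ.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def nfa_is_infinite(initial, finals, succ):
    """True iff a reachable and co-reachable state lies on a cycle."""
    forward = reachable_from({initial}, succ)
    # Reverse the edges to find the states from which a final state is reachable.
    pred = {}
    for q, targets in succ.items():
        for r in targets:
            pred.setdefault(r, set()).add(q)
    backward = reachable_from(set(finals), pred)
    for q in forward & backward:
        # q lies on a cycle iff q is reachable from one of its own successors.
        if q in reachable_from(set(succ.get(q, ())), succ):
            return True
    return False
```

A nondeterministic logarithmic-space procedure would instead guess such a state together with the three paths step by step; the explicit search above is only meant to show which property is being tested.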

In the literature many subfamilies of the regular languages have been studied (see, e.g., [1, 3, 11]). Here we only consider some of them, beginning with the strictly locally testable languages of [8, 15], but the corresponding problem can be stated for any subclass of REG.

For \(k\ge 1\), a language \(L\subseteq \varSigma ^*\) is strictly locally \(k\)-testable if \(L \cap \varSigma ^k\cdot \varSigma ^* = (A\cdot \varSigma ^* \cap \varSigma ^*\cdot B) \backslash \varSigma ^+\cdot (\varSigma ^k\backslash C)\cdot \varSigma ^+\) for some finite sets \(A,B,C\subseteq \varSigma ^k\), and \(L\) is strictly locally testable if it is strictly locally \(k\)-testable for some \(k\ge 1\). For example, the language \((a+b)^*\) is strictly locally 1-testable, and the language \(a(baa)^+\) is strictly locally 3-testable, but the language \((aa)^*\) is not strictly locally testable.
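The defining condition can be read as a sliding-window test: a word of length at least \(k\) belongs to the language iff its prefix of length \(k\) is in \(A\), its suffix of length \(k\) is in \(B\), and every factor of length \(k\) that is neither the prefix nor the suffix is in \(C\). The following sketch implements this test; the function name and the concrete sets chosen for \(a(baa)^+\) in the usage note are illustrative assumptions.

```python
def satisfies_window_condition(w, k, A, B, C):
    """Check the strictly locally k-testable condition for a word w with |w| >= k."""
    if len(w) < k:
        raise ValueError("the definition only constrains words of length >= k")
    if w[:k] not in A or w[-k:] not in B:
        return False
    # Interior factors start at positions 1 .. len(w) - k - 1, so that at least
    # one letter precedes and at least one letter follows the window.
    return all(w[i:i + k] in C for i in range(1, len(w) - k))
```

For \(a(baa)^+\) and \(k=3\) one possible choice is \(A=\{aba\}\), \(B=\{baa\}\), and \(C=\{baa,aab,aba\}\): with these sets the test accepts `abaa` and `abaabaa` and rejects `abaaba`.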

For each \(k\ge 1\), if a language \(L\) is given through a DFA, then it is decidable in polynomial time whether or not \(L\) is strictly locally \(k\)-testable. Also it is decidable in polynomial time whether \(L\) is strictly locally testable [2]. We are interested in the corresponding variant of these problems in which the language considered is given through a stl-det-ORWW-automaton. Here we have the following result.

Theorem 9

The following problem is PSPACE-complete for each \(k\ge 1\):

INSTANCE: A stl-det-ORWW-automaton \(M\).

QUESTION: Is the language \(L(M)\) strictly locally \(k\)-testable?

The construction in the proof shows that the problem of deciding strict local testability is PSPACE-hard for stl-det-ORWW-automata as well, but it remains open whether this problem is in PSPACE.

Using the same kind of reasoning it can be shown that, for a stl-det-ORWW-automaton, also the problems of deciding whether the accepted language is nilpotent, combinatorial, circular, suffix-closed, prefix-closed, suffix-free, or prefix-free (see, e.g., [1, 3] for the definitions of these notions) are PSPACE-complete.

5 Concluding Remarks

We have shown that stl-det-ORWW-automata, although being deterministic devices, can provide exponentially more succinct representations for regular languages than NFAs. In addition, we have shown that many decision problems of interest are PSPACE-complete for stl-det-ORWW-automata. However, some open problems remain, for example:

  • Can the given upper bounds be further improved by providing small constants in the exponents?

  • Given a stl-det-ORWW-automaton \(M\), can it be decided in polynomial space whether the language \(L(M)\) is strictly locally testable?