
1 Introduction

Syntactic considerations of digital images have a tradition of about five decades. They should (somehow) reflect methods applied to picture processing. However, one of the basic methods of scanning pictures in practice has not been thoroughly investigated from a more theoretical point of view: that of using space-filling curves. Here, we start such an investigation with what can be considered the simplest way of defining space-filling curves: scanning an image line by line, alternating the direction of movement every time the image boundary is encountered (more information on the use of space-filling curves in connection with image processing or picture languages can be found in [12, 14, 20, 23]).
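To fix intuition, this simplest scanning strategy can be sketched in a few lines of Python (our illustration only; the formal definitions follow below):

```python
def boustrophedon_order(rows, cols):
    """Yield the (row, column) coordinates of a boustrophedon scan:
    the first row is read left to right, the second right to left,
    and so on, reversing direction at each image boundary."""
    for i in range(rows):
        if i % 2 == 0:
            cols_order = range(cols)
        else:
            cols_order = range(cols - 1, -1, -1)
        for j in cols_order:
            yield (i, j)
```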

We consider finite automata that work this way. We show that they are (essentially) equivalent to regular matrix languages (RMLs) as introduced in a sequence of papers by Rani Siromoney and her co-authors in the early 1970s. These two-dimensional picture languages have connections with the generation of kolam patterns [17, 24]. Perhaps surprisingly, we also present quite a number of new results for this class of picture languages, including a discussion of natural decidability questions that has been lacking so far.

2 Our Model and Some Examples

2.1 General Definitions

In this section, we briefly recall the standard definitions and notations regarding one- and two-dimensional words and languages.

Let \(\mathbb {N} := \{1, 2, 3, \ldots \}\) be the set of natural numbers. For a finite alphabet \(\varSigma \), a string or word (over \(\varSigma \)) is a finite sequence of symbols from \(\varSigma \), and \(\varepsilon \) stands for the empty string. The notation \(\varSigma ^+\) denotes the set of all nonempty strings over \(\varSigma \), and \(\varSigma ^*:=\varSigma ^+ \cup \{ \varepsilon \}\). For the concatenation of two strings \(w_1, w_2\) we write \(w_1 \cdot w_2\) or simply \(w_1 w_2\). We say that a string \(v \in \varSigma ^*\) is a factor of a string \(w \in \varSigma ^*\) if there are \(u_1, u_2 \in \varSigma ^*\) such that \(w = u_1 \cdot v \cdot u_2\). If \(u_1\) (or \(u_2\)) is the empty string, then v is a prefix (or a suffix, respectively) of w. The notation |w| stands for the length of a string w.

A two-dimensional word (also called a picture, a matrix or an array) over \(\varSigma \) is a tuple

$$\begin{aligned} W := ((a_{1,1}, a_{1,2}, \ldots , a_{1,n}), (a_{2,1}, a_{2,2}, \ldots , a_{2,n}), \ldots , (a_{m,1}, a_{m,2}, \ldots , a_{m,n})), \end{aligned}$$

where \(m, n \in \mathbb {N}\) and, for every i, \(1 \le i \le m\), and j, \(1 \le j \le n\), \(a_{i, j} \in \varSigma \). We define the number of columns (or width) and number of rows (or height) of W by \(|W|_c := n\) and \(|W|_r := m\), respectively. For the sake of convenience, we also denote W by \([a_{i, j}]_{m, n}\) or by a matrix in a more pictorial form. If we want to refer to the \(j^{\text {th}}\) symbol in row i of the picture W, then we use \(W[i, j] = a_{i, j}\). By \(\varSigma ^{++}\), we denote the set of all (non-empty) pictures over \(\varSigma \). Every subset \(L \subseteq \varSigma ^{++}\) is a picture language. \(L' = \varSigma ^{++} - L\) is the complement of the picture language L.

Let \(W := [a_{i, j}]_{m, n}\) and \(W' := [b_{i, j}]_{m', n'}\) be two non-empty pictures over \(\varSigma \). The column concatenation of W and \(W'\), denoted by , is undefined if \(m \ne m'\) and is the picture

$$\begin{aligned} \begin{matrix} a_{1,1} &amp; a_{1,2} &amp; \ldots &amp; a_{1,n} &amp; b_{1,1} &amp; b_{1,2} &amp; \ldots &amp; b_{1,n'} \\ a_{2,1} &amp; a_{2,2} &amp; \ldots &amp; a_{2,n} &amp; b_{2,1} &amp; b_{2,2} &amp; \ldots &amp; b_{2,n'} \\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots &amp; \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ a_{m,1} &amp; a_{m,2} &amp; \ldots &amp; a_{m,n} &amp; b_{m,1} &amp; b_{m,2} &amp; \ldots &amp; b_{m,n'} \end{matrix} \end{aligned}$$

otherwise. The row concatenation of W and \(W'\), denoted by , is undefined if \(n \ne n'\) and is the picture

$$\begin{aligned} \begin{matrix} a_{1,1} &amp; a_{1,2} &amp; \ldots &amp; a_{1,n} \\ a_{2,1} &amp; a_{2,2} &amp; \ldots &amp; a_{2,n} \\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ a_{m,1} &amp; a_{m,2} &amp; \ldots &amp; a_{m,n} \\ b_{1,1} &amp; b_{1,2} &amp; \ldots &amp; b_{1,n'} \\ b_{2,1} &amp; b_{2,2} &amp; \ldots &amp; b_{2,n'} \\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ b_{m',1} &amp; b_{m',2} &amp; \ldots &amp; b_{m',n'} \end{matrix} \end{aligned}$$

otherwise. In order to denote that, e. g., is undefined, we also write .

Example 1

Let

Then , but

For a picture W and \(k, k' \in \mathbb {N}\), by \(W^k\) we denote the k-fold column-concatenation of W, by \(W_k\) we denote the k-fold row-concatenation of W, and \(W^k_{k'} = (W^k)_{k'}\).
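The concatenation operators and the powers \(W^k\), \(W_k\) can be made concrete with a small sketch (ours), encoding a picture as a list of equal-length strings and returning None where the operation is undefined:

```python
def col_concat(W1, W2):
    # column concatenation: undefined unless both pictures have equally many rows
    if len(W1) != len(W2):
        return None
    return [r1 + r2 for r1, r2 in zip(W1, W2)]

def row_concat(W1, W2):
    # row concatenation: undefined unless both pictures have equally many columns
    if len(W1[0]) != len(W2[0]):
        return None
    return W1 + W2

def power(W, k, kp):
    # W^k_{k'} = (W^k)_{k'}: k-fold column and k'-fold row concatenation
    return [row * k for row in W] * kp
```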

2.2 Boustrophedon Finite Automata

We now give the main definition of this paper, introducing a new automaton model for picture processing.

Definition 1

A boustrophedon finite automaton, or BFA for short, can be specified as a 7-tuple , where Q is a finite set of states, \(\varSigma \) is an input alphabet, and \(R\subseteq Q\times (\varSigma \cup \{\#\})\times Q\) is a finite set of rules. A rule \((q,a,p)\in R\) is usually written as \(qa\rightarrow p\). The special symbol \(\# \notin \varSigma \) indicates the border of the rectangular picture that is processed, \(s\in Q\) is the initial state, and \(F\subseteq Q\) is the set of final states.

We are now going to discuss the notions of configurations, valid configurations and a corresponding configuration transition relation in order to formalize the work of BFAs, based on snapshots of their work.

Let be a new symbol indicating an erased position and let . Then \(C_M:=Q\times \varSigma _+^{++}\times \mathbb {N}\) is the set of configurations of M.

A configuration \((p, A, \mu ) \in C_M\) is valid if \(1 \le \mu \le |A|_r\) and, for every i, \(1 \le i \le \mu -1\), the \(i^\text {th}\) row equals , for every j, \(\mu + 1 \le j \le |A|_r\), the \(j^\text {th}\) row equals \(\# w \#\), \(w \in \varSigma ^{|A|_c - 2}\), and, for some \(\nu \), \(0 \le \nu \le |A|_c - 2\), \(w \in \varSigma ^{|A|_c - \nu - 2}\), the \(\mu ^\text {th}\) row equals , if \(\mu \) is odd and , if \(\mu \) is even. Notice that valid configurations model the idea of observable snapshots of the work of the BFA.

If \((p,A,\mu )\) and \((q,A',\mu )\) are two valid configurations such that A and \(A'\) are identical but for one position (i, j), where while \(A[i,j]\in \varSigma \), then \((p,A,\mu )\vdash _M (q,A',\mu )\) if \(p A[i,j]\rightarrow q\in R\). If \((p,A,\mu )\) and \((q,A,\mu +1)\) are two valid configurations, then \((p,A,\mu )\vdash _M (q,A,\mu +1)\) if the \(\mu ^\text {th}\) row contains only \(\#\) and symbols, and if \(p\#\rightarrow q\in R\). The reflexive transitive closure of the relation \(\vdash _M\) is denoted by \(\vdash _M^*\).

The BFA M is deterministic, or a BDFA for short, if for all \(p\in Q\) and \(a\in \varSigma \cup \{\#\}\), there is at most one \(q\in Q\) with \(pa\rightarrow q\in R\).

The language L(M) accepted by M is then the set of all \(m\times n\) pictures A over \(\varSigma \) such that

for some \(f\in F\).

Note that the automaton works on a picture with a first and last column of only \(\#\) symbols, but only the part in between these border columns is accepted. In other words, the computation starts with scanning the left uppermost corner of the picture and then working through the picture row-by-row, as the ox turns, i.e., the boustrophedon way, until the last entry of the last row is scanned. The following illustrates how a BFA scans some input picture and how a picture of a valid configuration looks like; it can be seen that the sequence of only indicates how far the input has been processed:

Notice that rules of the form \(p\#\rightarrow q\) need not be present in R, so that in some natural sense the classical regular string languages are a special case of BFA languages.

Example 2

The set of tokens L of all sizes and of all proportions is accepted by the BFA , where \(Q = \{s, s_1, s_2, s_3, s_4\}\), \(\varSigma = \{\mathtt {x},\cdot \}\), \(R = \{s \mathtt {x}\rightarrow s_1, s_1 \cdot \rightarrow s_1, s_1 \# \rightarrow s_2, s_1\# \rightarrow s_4, s_2\cdot \rightarrow s_2, s_2\mathtt {x}\rightarrow s_3, s_3\# \rightarrow s, s_3\# \rightarrow s_4, s_4\mathtt {x}\rightarrow s_4\}\), and \(F = \{s_4\}\).

A sample token of L accepted by M is

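The processing mode of a BFA is easy to simulate. The following sketch (ours; pictures are lists of equal-length strings) tracks the set of reachable states, which also handles nondeterministic rule sets such as the one of Example 2:

```python
def bfa_accepts(picture, rules, start, finals):
    """Nondeterministic boustrophedon run; rules maps (state, symbol)
    to a set of successor states, '#' marks the border between rows."""
    states = {start}
    for i, row in enumerate(picture):
        # rows with odd 0-based index (the even rows, 1-indexed) are read right to left
        scan = row if i % 2 == 0 else row[::-1]
        for a in scan:
            states = {q for p in states for q in rules.get((p, a), ())}
        if i + 1 < len(picture):      # read '#' to move on to the next row
            states = {q for p in states for q in rules.get((p, '#'), ())}
    return bool(states & finals)

# the rule set of Example 2
R = {('s', 'x'): {'s1'}, ('s1', '.'): {'s1'}, ('s1', '#'): {'s2', 's4'},
     ('s2', '.'): {'s2'}, ('s2', 'x'): {'s3'}, ('s3', '#'): {'s', 's4'},
     ('s4', 'x'): {'s4'}}
```

For instance, the 3 × 3 L token (rows x.., x.., xxx) is accepted, while a picture with rows x. and .x is not.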
We now recall the notion of two-dimensional right-linear grammars (2RLG) as given in [7]. The original definition of a 2RLG (under the name of a regular matrix grammar (RMG)) and the properties of the corresponding class of picture languages called RML can be found in [8, 15, 16, 21].

Definition 2

A two-dimensional right-linear grammar (2RLG) is defined by a 7-tuple \(G = (V_h, V_v, \varSigma _I, \varSigma , S, R_h, R_v)\), where:

  • \(V_h\) is a finite set of horizontal nonterminals;

  • \(V_v\) is a finite set of vertical nonterminals;

  • \(\varSigma _I \subseteq V_v\) is a finite set of intermediates;

  • \(\varSigma \) is a finite set of terminals;

  • \(S \in V_h\) is a starting symbol;

  • \(R_h\) is a finite set of horizontal rules of the form \(V \rightarrow AV'\) or \(V \rightarrow A\), where V, \(V' \in V_h\) and \(A \in \varSigma _I\);

  • \(R_v\) is a finite set of vertical rules of the form \(W \rightarrow aW'\) or \(W \rightarrow a\), where \(W, W' \in V_v\) and \(a \in \varSigma \).

There are two phases of derivation of a 2RLG. In the first phase, a horizontal string of intermediate symbols is generated by means of the string grammar \(G_h = (V_h, \varSigma _I, S, R_h)\); the language of these strings is denoted by H(G). In the second phase, treating each intermediate as a start symbol, the vertical generation of the actual picture is done in parallel, by applying a finite set of right-linear rules \(R_v\). In order to produce a rectangular-shaped picture, the rules of \(R_v\) must be applied in parallel; in particular, the rules of the form \(W \rightarrow a\) are applied in all columns simultaneously to finish the picture with the generation of its last row. These grammars make sure that the columns can grow only in the downward direction.
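The parallel second phase can be illustrated by a deterministic toy sketch (our own rule encoding, not the paper's notation): every intermediate grows its column step by step, and a terminating rule finishes all columns in the same (last) row. With hypothetical intermediates X (left arm) and A (bottom arm), it derives an L token:

```python
def derive_columns(intermediates, step_rules, final_rules, height):
    # step_rules[W] = (a, W') encodes W -> aW'; final_rules[W] = a encodes W -> a
    cols = []
    for A in intermediates:
        col, cur = [], A
        for _ in range(height - 1):
            a, cur = step_rules[cur]          # vertical rule W -> aW'
            col.append(a)
        col.append(final_rules[cur])          # W -> a, applied in all columns at once
        cols.append(col)
    # transpose the finished columns into the rows of the picture
    return ["".join(c[i] for c in cols) for i in range(height)]

step = {'X': ('x', 'X'), 'A': ('.', 'A')}
final = {'X': 'x', 'A': 'x'}
```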

We note that our model is closely connected with 2RLG, as we will show more precisely in the following. The formalization of 2RLG that we chose is closer to our model than the original one due to Siromoney and her co-authors.

3 Characterization Results

Clearly, BFAs are a special form of 4-NFAs (4-way nondeterministic finite automata, see [7]), and it is known that for these 2-dimensional automata, the deterministic variant is weaker in descriptive capacity than the nondeterministic one. Hence, and also because of the practical relevance of the deterministic model, the following result is interesting.

Theorem 1

BDFAs and BFAs describe the same class of picture languages.

Proof

Apply the well-known subset construction. This works out as our BFAs are syntactically the same as classical finite automata, only the interpretation of their processing is different.    \(\square \)
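For concreteness, the subset construction can be sketched as follows (our rendering; rules as a dict from (state, symbol) pairs to sets of states, with '#' treated like any other symbol):

```python
def determinize(rules, start, finals, symbols):
    """Subset construction: reachable state sets become single states,
    so the resulting rule dict is deterministic by construction."""
    start_set = frozenset({start})
    index, todo = {start_set: 0}, [start_set]
    drules, dfinals = {}, set()
    while todo:
        S = todo.pop()
        if S & finals:
            dfinals.add(index[S])
        for a in symbols:
            T = frozenset(q for p in S for q in rules.get((p, a), ()))
            if not T:
                continue      # a sink state could be added here to make the BDFA complete
            if T not in index:
                index[T] = len(index)
                todo.append(T)
            drules[(index[S], a)] = index[T]
    return drules, 0, dfinals
```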

Next, we examine the question whether the boustrophedon processing mode of our automata is essential. To this end, let us consider yet another interpretation of finite automata, this time termed returning finite automata, or RFA for short. Syntactically, they are identical to BFA, so they can be again described by a 7-tuple . However, they always process rows from left to right. Formally, this means that we can carry over all parts of the definition of BFA apart from the notion of a valid configuration, which needs to be slightly modified. Now, a configuration \((p, A, \mu ) \in C_M\) is valid if \(1 \le \mu \le |A|_r\) and, for every i, \(1 \le i \le \mu -1\), the ith row equals , for every j, \(\mu + 1 \le j \le |A|_r\), the jth row equals \(\# w \#\), \(w \in \varSigma ^{|A|_c - 2}\), and, for some \(\nu \), \(0 \le \nu \le |A|_c - 2\), \(w \in \varSigma ^{|A|_c - \nu - 2}\), the \(\mu \)th row equals .

Theorem 2

BFAs and RFAs describe the same class of picture languages.

Proof

We first show how an RFA can simulate a BFA. The basic idea can be summarised as follows. On the first row, which is scanned from left to right by both automata, the RFA simulates the BFA one to one. Assume that the BFA, while moving on to the second row, changes into a state q, scans the row from right to left and enters a state p when the beginning of this row is reached. In order to simulate this behaviour, the RFA stores its current state q in the finite state control and guesses the state p. It then scans the second row from left to right (starting in state p) by applying the transitions of the BFA in reverse direction. When the end of the row is reached, the computation only proceeds if the RFA is in state q. This procedure is then repeated.

More formally, the states of the RFA are either BFA states or triples of such states. The triples simulate the processing of even rows and have the form \((\ell ,q,r)\), where q is the current state, r is the state that the RFA should reach after finishing the current even row at the right border, and \(\ell \) is the state in which the RFA started simulating the current row (at the left border). The formal definition is as follows.

Let be some BFA. Then, the equivalent RFA is defined by \(Q'= Q\cup (Q\times Q\times Q)\),

$$\begin{aligned} R'&= \{pa\rightarrow q\mid pa\rightarrow q\in R, a\in \varSigma \} \\&\cup \{(\ell ,q,r)a\rightarrow (\ell ,p,r)\mid pa\rightarrow q\in R, \ell ,r\in Q,a\in \varSigma \}\\&\cup \{p\#\rightarrow (\ell ,\ell ,q)\mid p\#\rightarrow q\in R, \ell \in Q\}\\&\cup \{(\ell ,r,r)\#\rightarrow q\mid \ell \#\rightarrow q\in R, r\in Q\} \,, \end{aligned}$$

and \(F'=F\cup \{(\ell ,q,q)\mid \ell \in F, q\in Q\}\). The formal (induction) proof of the correctness of the construction is left to the reader.
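Generating the new rule and final-state sets from R is purely mechanical; a sketch of ours, with rules given as triples (p, a, q), follows the four rule schemata of the construction:

```python
def bfa_to_rfa_rules(rules, states, finals):
    # builds R' and F' along the four rule schemata of the construction
    R2 = set()
    for (p, a, q) in rules:
        if a != '#':
            R2.add((p, a, q))                          # odd rows: simulate directly
            for l in states:
                for r in states:
                    R2.add(((l, q, r), a, (l, p, r)))  # even rows: rules in reverse
        else:
            for l in states:
                R2.add((p, '#', (l, l, q)))            # guess the state l at the left border
            for r in states:
                R2.add(((p, r, r), '#', q))            # verify the guess at the row's end
    F2 = set(finals) | {(l, q, q) for l in finals for q in states}
    return R2, F2
```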

The converse direction, simulating RFAs with BFAs, can be seen in a similar way.    \(\square \)

We can likewise define finite automata that read all rows in a right-to-left fashion. A construction similar to that of the previous theorem shows (again) that this model is equivalent to BFAs. All conversions between the different FA models for picture processing are at worst quadratic. This also shows that the direction of the rotation mentioned in the next theorem does not matter.

Theorem 3

A picture language can be described by a BFA if and only if its image, rotated by 90 degrees, is in RML.

Proof

We provide two simulations to show the claim.

Let \(G = (V_h, V_v, \varSigma _I, \varSigma , S, R_h, R_v)\) be a 2-dimensional right-linear grammar. The rotation can be interpreted as H(G) describing the leftmost column of the picture, while the second phase of G then means to generate all rows, starting from the intermediate string from H(G). We are going to construct an equivalent RFA , which is sufficient for giving a BFA thanks to Theorem 2. Let \(Q=(V_h\cup \{f\})\times (V_v\cup \{f\})\cup \{s\}\), where \(f\notin V_h\cup V_v\), and \(F=\{(f,f)\}\). Let R contain the following rules:

  • \(sa\rightarrow (S',A')\), if \(S\rightarrow AS'\in R_h\) and \(A\rightarrow aA'\in R_v\),

  • \(sa\rightarrow (f,A')\), if \(S\rightarrow A\in R_h\) and \(A\rightarrow aA'\in R_v\),

  • \((X,A)a\rightarrow (X,A')\), if \(X\in V_h\cup \{f\}\) and \(A\rightarrow aA'\in R_v\),

  • \((X,A)a\rightarrow (X,f)\), if \(A\rightarrow a\in R_v\), \(X\in V_h\),

  • \((X,f)\#\rightarrow (X',A)\), if \(X\rightarrow AX'\in R_h\),

  • \((X,f)\#\rightarrow (f,A)\), if \(X\rightarrow A\in R_h\),

  • \((f,A)a\rightarrow (f,f)\), if \(A\rightarrow a\in R_v\).

The idea of the construction is that the generation of the columns of G is performed in the second component of the state pairs, whereas the first component corresponds to the generation of the axiom (i.e., the first row of the pictures generated by G). The crucial difference is that the first symbol of the axiom (which in the case of the RFA is the first column instead of the first row) is generated and then the first row is generated before the second letter of the axiom is generated in the next row. Hence, the two phases of the picture construction of G are dovetailed.

The converse is seen as follows. Let be some RFA. We construct an equivalent 2-dimensional right-linear grammar \(G = (V_h, V_v, \varSigma _I, \varSigma , S, R_h, R_v)\) (generating the rotated picture) with \(V_h=Q\cup \{S\}\), \(\varSigma _I=Q\times Q\), and rules

  • \(S\rightarrow (s,r)r\in R_h\) for all \(r\in Q\),

  • \(q\rightarrow (q,r)r\in R_h\) for all \(q,r\in Q\),

  • \(q\rightarrow \varepsilon \in R_h\) for all \(q\in Q\),

  • \((p,r)\rightarrow (q,r)a\in R_v\) for all \(pa\rightarrow q\in R\), \(r\in Q\),

  • \((p,q)\rightarrow \varepsilon \in R_v\) for all \(p\#\rightarrow q\in R\).

The astute reader will have noticed that we have taken the liberty of incorporating erasing productions for convenience, but these can be avoided by using standard formal language constructions. This concludes the proof.    \(\square \)

Due to Theorem 3, we can inherit several properties for the class of picture languages described by BFAs. For instance, the class is not closed under rotation by 90 degrees, also known as quarter turns, see [16]. On the positive side, RML (and hence BFA picture languages) are closed under Boolean operations. More precisely, it was shown in [16] that RML (and hence BFA picture languages) are closed under union. We supplement this by the following two results.

Theorem 4

BFA picture languages are closed under complementation.

Proof

First, let us recall from Theorem 1 that BDFAs and BFAs describe the same class of picture languages. Let be some BDFA. Without loss of generality, we assume that M is complete. Then we can construct a BDFA \(\overline{M}\) by complementing the set of final states, i.e., . On any input picture \(A \in \varSigma ^{++}\), since both M and \(\overline{M}\) are deterministic and complete, there is exactly one state \(q \in Q\) that is reached by both M and \(\overline{M}\) on input A. This directly implies that \(A \in L(M)\) if and only if \(A \notin L(\overline{M})\); thus, \(L(\overline{M}) = \overline{L(M)}\). Hence, BFA picture languages are closed under complementation.    \(\square \)
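For a complete BDFA, the construction amounts to flipping the final states. A sketch (ours), together with a deterministic boustrophedon run; the toy automaton below accepts exactly the pictures with an even number of cells:

```python
def run_bdfa(drules, start, picture):
    # deterministic boustrophedon run; assumes the rule dict is complete
    q = start
    for i, row in enumerate(picture):
        for a in (row if i % 2 == 0 else row[::-1]):
            q = drules[(q, a)]
        if i + 1 < len(picture):      # '#' transition between rows
            q = drules[(q, '#')]
    return q

def complement(drules, states, finals):
    # identical rules, complemented set of final states
    return drules, states - finals

# toy complete BDFA over {'a'}: the state is the parity of the number of cells read
drules = {(0, 'a'): 1, (1, 'a'): 0, (0, '#'): 0, (1, '#'): 1}
```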

Notice that the previous theorem has become easy because we have a deterministic model for BFAs, in contrast to what has been established for RML before. De Morgan’s law now immediately yields:

Corollary 1

BFA picture languages are closed under intersection.

Conversely, the results we derive in the following for BFAs can be immediately read as results for RML, as well.

4 Pumping and Interchange Lemmas

Since in the pictures of an RML, the first row as well as the columns are generated by regular grammars, there are two ways to apply the pumping lemma for regular languages: we can pump the first row, which results in repetitions of a column-factor of the picture, or we can pump each column individually, which will only lead to a rectangular shaped picture if the pumping exponents are, in a sense, well-chosen. Hence, we can conclude a horizontal and a vertical pumping lemma for RML (see [9]) and, due to Theorem 3, these pumping lemmas carry over to BFA languages:

Lemma 1

Let M be a BFA. Then there exists an \(n \in \mathbb {N}\), such that, for every \(W \in L(M)\) with \(|W|_{r} \ge n\), , , \(|Y|_r \ge 1\) and, for every \(k \ge 0\), .

Lemma 2

Let M be a BFA and let \(m \in \mathbb {N}\). Then there exist \(n, r_1, r_2, \ldots , r_m \in \mathbb {N}\), such that, for every \(W \in L(M)\) with \(|W|_r = m\) and \(|W|_{c} \ge n\),

, \(|y_i|_c \ge 1\), \(1 \le i \le m\), and, for every \(k \ge 1\),

where \(t_i = \frac{\text {lcm}(r_1, r_2, \ldots , r_m)}{r_i}\), \(1 \le i \le m\).

Lemma 1 is straightforward, and in order to see that Lemma 2 holds, it is sufficient to note that n is the maximum of all the pumping lemma constants for the individual rows (recall that each row is generated by an individual regular grammar) and that the \(r_i\) are the lengths of the factors that are pumped. Obviously, not every way of pumping the rows results in a rectangular-shaped picture, so we can only pump by multiples of the \(t_i\).
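The exponents \(t_i\) are easy to compute; a small sketch (ours):

```python
from math import lcm

def pumping_exponents(r):
    """t_i = lcm(r_1, ..., r_m) / r_i: pumping row i by a multiple of t_i
    copies of its length-r_i factor keeps all rows equally long."""
    common = lcm(*r)
    return [common // ri for ri in r]
```

For instance, with factor lengths 2, 3 and 4 the rows must be pumped by 6, 4 and 3 copies, respectively, each row thereby gaining the same 12 columns.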

While the vertical pumping lemma has the nice property that a whole row-factor can be pumped, in the horizontal pumping lemma we can only pump factors of each individual row, which are independent of each other. As a result, this lemma does not guarantee the possibility of pumping by 0, i.e., of removing a factor, which, for classical regular languages, often constitutes a particularly elegant way of showing the non-regularity of a language.

However, it can be shown that for BFAs, too, there exists a horizontal pumping lemma that pumps whole column-factors (which then also translates into a vertical pumping lemma for RML that pumps whole row-factors).

Lemma 3

Let M be a BFA and let \(m \in \mathbb {N}\). Then there exists an \(n \in \mathbb {N}\), such that, for every \(W \in L(M)\) with \(|W|_{r} \le m\) and \(|W|_{c} \ge n\), , , \(|Y|_c \ge 1\) and, for every \(k \ge 0\), .

Proof

Let Q be the set of states of M, let \(q_0\) be the start state and let \(n = |Q|^m + 1\). Furthermore, let \(W \in L(M)\) with \(|W|_{r} = m\) (the case \(|W|_{r} < m\) can be handled analogously) and \(|W|_{c} = n' \ge n\). Since M accepts W, there is an accepting computation \((p_1, W_1, m_1) \vdash _M^* (p_k, W_k, m_k)\) for W, i.e., and . We can now consider the extended configurations \((p_i, W'_i, m_i)\), where \(W'_i\) is like \(W_i\) with the only difference that each symbol is replaced by the state that has been entered by producing this occurrence of . Since W has m rows, the maximum number of different columns in \(W'_k\) is \(|Q|^m\) and since W has at least \(n = |Q|^m + 1\) columns, we can conclude that , where \(|\alpha '|_c = 1\). Furthermore, and . Now let , where \(|X|_c = |X'|_c\), \(|Y|_c = |Y'|_c\) and \(|Z|_c = |Z'|_c\). By definition of BFA, for every \(i \ge 0\), M accepts .    \(\square \)

We wish to point out that in a similar way, we can also prove a row and a column interchange lemma (the only difference is that the number n has to be chosen large enough to enforce repeating pairs of states):

Lemma 4

Let M be a BFA. Then there exists an \(n \in \mathbb {N}\), such that, for every \(W \in L(M)\) with \(|W|_{r} \ge n\), there exists a factorisation , \(|X|_r \ge 1\), \(|Y|_r \ge 1\), such that .

Lemma 5

Let M be a BFA and let \(m \in \mathbb {N}\). Then there exists an \(n \in \mathbb {N}\), such that, for every \(W \in L(M)\) with \(|W|_{r} \le m\) and \(|W|_{c} \ge n\), there exists a factorisation , \(|X|_c \ge 1\), \(|Y|_c \ge 1\), such that .

5 Complexity Results

Only a few complexity results have been obtained so far for RML. The only reference that we could find was an unpublished manuscript of Dassow [2], which merely classified several decision problems as decidable or undecidable. Here, we give the exact complexity status of the basic decidability questions for RML, formulated in terms of BFAs.

We will only look into classical formal language questions, which are:

  • Universal membership: Given a B(D)FA M and a picture A (as input of some algorithm), is A accepted by M?

  • Non-emptiness: Given a B(D)FA M, is there some picture A accepted by M?

  • Inequivalence: Given two B(D)FAs \(M_1\) and \(M_2\), is there a picture that is accepted by one of the automata but not by the other?

Also, we briefly discuss the issue of minimization in the context of BDFAs. Our complexity considerations will be concerned with standard complexity classes, like (N)L, i.e., (non-)deterministic logarithmic space, (N)P, i.e., (non-)deterministic polynomial time, and PSPACE, i.e., polynomial space. All hardness reductions that we sketch can be implemented in deterministic logarithmic space.

Theorem 5

The universal membership problem for BFAs is NL-complete.

Proof

As universal membership is NL-hard for NFAs (working on strings), NL-hardness is clear. As a configuration of a BFA can be specified (basically) by the state and a pointer into the input picture, membership in NL is obvious for this problem, as well.    \(\square \)

By a similar argument applied to deterministic devices, we conclude:

Corollary 2

The universal membership problem for BDFAs is L-complete.

We now turn to the problem of deciding whether or not a given B(D)FA accepts a non-empty language. Interestingly, membership in NP is not as easy to derive as is usually the case.

Theorem 6

The non-emptiness problem for B(D)FAs is NP-complete, even for unary input alphabets.

Proof

We reduce from the well-known NP-complete intersection emptiness problem for finite automata with a one-letter input alphabet, see [10]. The idea is to run each of the input automata \(A_1,\dots ,A_m\) on one row of the picture. We can assume that the state sets are pairwise disjoint. All transition rules of each \(A_i\) are also transition rules of the BFA M we are going to construct. For each final state \(f_i\) of \(A_i\), we introduce a rule \(f_i\#\rightarrow s_{i+1}\), where \(s_{i+1}\) is the initial state of \(A_{i+1}\). The set \(F_m\) of final states of \(A_m\) is the set of final states of M. The intersection \(L(A_1)\cap \cdots \cap L(A_m)\) is non-empty if and only if some word \(a^k\) is accepted by all these automata, which means that the picture \(a^k_m\) is accepted by M.

Conversely, let be some B(D)FA. Given \(q_0,q_f\in Q\), we first associate to M the (string-processing) NFA \(A[q_0,q_f]\) with state set Q, initial state \(q_0\), the (only) final state \(q_f\) and (unary) input alphabet \(\{a\}\), as well as with a transition \(pa\rightarrow q\) if and only if there is some \(x\in \varSigma \) (notice that \(x\ne \#\)) such that \(px\rightarrow q\in R\). Our nondeterministic algorithm for deciding whether or not \(L(M)\ne \emptyset \) now works as follows:

  • First, it guesses \(2r\le 2|Q|\) states \(q_0^1,q_f^1,q_0^2,q_f^2,\dots ,q_0^r,q_f^r\) and verifies the following properties:

    • \(q_0^1=s\) and \(q_f^r\in F\);

    • for each \(j=1,\dots ,r-1\): \(q_f^j\#\rightarrow q_0^{j+1}\in R\).

    If these tests are passed, the computation continues (otherwise, it simply aborts).

  • Call (as a black box) the NP algorithm that decides the non-emptiness of intersection for finite automata with unary input alphabet with the r automata \(A[q_0^j,q_f^j]\), \(1\le j\le r\), as input.

  • Output that \(L(M)\ne \emptyset \) if and only if \(\bigcap _{j=1}^r L(A[q_0^j,q_f^j])\ne \emptyset \).

The construction is correct, as by the pumping lemmas we know that any n-state BFA M either accepts a picture with at most n rows and at most \(n^n\) columns, or it does not accept any picture at all. Hence, it is sufficient to guess at most 2n states as indicated.    \(\square \)
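The reduction in the first half of the proof is easy to state in code; a sketch of ours, with rules as triples over the unary alphabet {a}:

```python
def stack_unary_automata(automata):
    """Builds the BFA of the reduction: the i-th row simulates the i-th
    automaton, and a '#'-rule bridges each final state to the next start
    state. automata: list of (rules, start, finals), pairwise disjoint states."""
    R = set()
    for i, (rules, start, finals) in enumerate(automata):
        R |= set(rules)
        if i + 1 < len(automata):
            next_start = automata[i + 1][1]
            for f in finals:
                R.add((f, '#', next_start))
    return R, automata[0][1], automata[-1][2]
```

A picture \(a^k_m\) is then accepted if and only if \(a^k\) lies in every \(L(A_i)\).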

Theorem 7

The inequivalence problem for BDFAs is NP-complete.

Proof

The hardness immediately transfers from Theorem 6. As the BDFA picture languages are closed under Boolean operations (and the constructions can be carried out in polynomial time), given two BDFAs \(M_1\) and \(M_2\), we can construct a BDFA M such that the picture language accepted by M is the symmetric difference of the picture languages of \(M_1\) and \(M_2\); hence, \(M_1\) and \(M_2\) are equivalent if and only if the picture language of M is empty.    \(\square \)

From what is known about the string language case, we immediately infer:

Corollary 3

The inequivalence problem for BFAs is PSPACE-complete.

Let us finally comment shortly about minimization. Here, the question is, given a BDFA A, to find a BDFA \(A^*\) that has as few states as possible but describes the same pictures as A does. Notice that this problem can be solved in polynomial time for DFAs accepting words. It might be tempting to use this well-known algorithm and apply it to a given BDFA A. In fact, this would result, in general, in a smaller automaton \(A'\) that is also picture-equivalent to A. However, in general \(A^*\) and \(A'\) can differ significantly.

Let us explain the problems with this approach with a simple example. Consider the (string) language

$$\begin{aligned}L=\{a^{385}\}^+\#\{a^{385}\}^+\#\{a^{385}\}^+\,.\end{aligned}$$

The smallest DFA accepting this language has 1159 states. The pictures that are accepted by this automaton (as a BDFA) are pictures of three rows, where each row has a number of a's that is a multiple of 385. However, as the lengths of the rows have to synchronize, it is sufficient to make sure that at least one row has the right length, so that \(L'=\{a^{385}\}^+\#\{a\}^+\#\{a\}^+\) would be another string language, whose minimal deterministic finite automaton has only 388 states, that accepts the same picture language. As \(385=5\cdot 7\cdot 11\), we can even see that the minimal DFA for

$$\begin{aligned}L''=\{a^5\}^+\#\{a^7\}^+\#\{a^{11}\}^+\,,\end{aligned}$$

which has 26 states, again accepts the same picture language when interpreted as a BDFA. Still, it remains unclear to us whether this is the minimal deterministic automaton for the picture language in question. Moreover, we do not see a general efficient methodology for obtaining minimal-state BDFAs. The only method that we can propose is brute force: cycling through all smaller BDFAs and testing for equivalence. This can easily be implemented in polynomial space, so that we can conclude (in terms of a decision problem):

Proposition 1

The question of determining whether a given BFA is minimal-state can be solved in polynomial space.

Notice that we could state this (even) for nondeterministic devices, but, as indicated above, we do not know anything better for deterministic ones, either. Basically the same results can be stated for RFAs as well (also for this type of automaton, a deterministic variant can be considered). In particular, our (bad) example carries over, as the input alphabet is a singleton.
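The synchronization effect behind the example above can be double-checked numerically (a quick sanity check of ours): a three-row picture over {a} is in the picture language of L if and only if the common row length n is a multiple of 385, and for \(L''\) it suffices that 5, 7 and 11 each divide n:

```python
def in_picture_language_L(n):
    # common row length accepted via L: every row a multiple of 385
    return n > 0 and n % 385 == 0

def in_picture_language_Lpp(n):
    # rows constrained to multiples of 5, 7 and 11; the common length n
    # must therefore be a multiple of lcm(5, 7, 11) = 385
    return n > 0 and n % 5 == 0 and n % 7 == 0 and n % 11 == 0
```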

6 Possible Applications to Character Recognition

Character recognition has always been the testbed application for picture processing methods. We refer to [4, 13] and the literature quoted therein. In this regard, we are now going to discuss the recognition of some classes of characters, also (sometimes) showing the limitations of our approach, making use of the pumping lemmas that we have shown above.

For example, consider the set K of all L tokens of all sizes with fixed proportion, i.e., with the ratio between the two arms of the L being 1. The first three members of K are as follows:

$$\begin{aligned} \begin{matrix} \mathtt {x}&amp; \cdot \\ \mathtt {x}&amp; \mathtt {x}\end{matrix}\,,\quad \begin{matrix} \mathtt {x}&amp; \cdot &amp; \cdot \\ \mathtt {x}&amp; \cdot &amp; \cdot \\ \mathtt {x}&amp; \mathtt {x}&amp; \mathtt {x}\end{matrix}\,,\quad \begin{matrix} \mathtt {x}&amp; \cdot &amp; \cdot &amp; \cdot \\ \mathtt {x}&amp; \cdot &amp; \cdot &amp; \cdot \\ \mathtt {x}&amp; \cdot &amp; \cdot &amp; \cdot \\ \mathtt {x}&amp; \mathtt {x}&amp; \mathtt {x}&amp; \mathtt {x}\end{matrix}\,. \end{aligned}$$

We claim that K is not accepted by any BFA. Suppose that there exists a BFA accepting K. Then, by Lemma 1, there exists an \(n \in \mathbb {N}\), such that, for every \(W \in K\) with \(|W|_{r} \ge n\), , , \(|Y|_r \ge 1\) and, for every \(k \ge 0\), . But, unfortunately, for many values of k we obtain L tokens with unequal arms, which are not members of K; this contradicts our assumption.

On the other hand, as pointed out by Example 2, if we do not require the ratio between the two arms to be fixed, then the corresponding set of pictures can be recognised by a BFA. Similarly, the characters A (if given in the form

figure a

), E, F, H, I, P (if given in the form

figure b

), T, U (if given in the form

figure c

) can be recognised by a BFA if we do not require fixed proportions. In particular, this means that

figure d

are valid characters as well. Note that the character I plays a special role: this set of characters can only be recognised by a BFA if it is given in a form that is fixed by some constant \(k \in \mathbb {N}\) (i.e., a BFA is not able to recognise the set of all vertical lines).
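To make the row-by-row viewpoint concrete, the following Python sketch (our own illustration, not part of the formal model; it ignores the alternating scan direction, which is irrelevant for this check) accepts L shapes with arbitrary arm lengths by matching every row against a regular pattern, which is exactly the kind of per-row information a BFA can verify:

```python
import re

# Rows of an L with free proportions: vertical-arm rows 'x' followed by
# dots, and a final horizontal-arm row consisting of 'x' only.
ARM_ROW  = re.compile(r'x\.*')   # vertical arm
BASE_ROW = re.compile(r'x+')     # horizontal arm

def accepts_l(pic):
    """Row-by-row check mimicking a BFA: all rows but the last match the
    vertical-arm pattern, the last row is the horizontal arm."""
    if len(pic) < 2 or len(set(map(len, pic))) != 1:
        return False
    return (all(ARM_ROW.fullmatch(row) for row in pic[:-1])
            and BASE_ROW.fullmatch(pic[-1]) is not None)

assert accepts_l(['x...', 'x...', 'xxxx'])    # unequal arms are allowed
assert accepts_l(['x.', 'x.', 'x.', 'xx'])
assert not accepts_l(['x..', '.x.', 'xxx'])   # broken vertical arm
```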

However, if we insist on fixed proportions, then it can be easily shown that the character classes mentioned above cannot be recognised by BFAs. For example, if the length of an arm of a character (or the distance between two parallel arms) is only allowed to grow in proportion to the length of another arm, then the vertical or horizontal pumping lemma shows that this class of characters cannot be recognised by a BFA.

More generally, even single diagonal lines cannot be detected by BFAs, which excludes several classes of characters from the class of BFA languages, e.g., A, K, M, N, X. We shall prove this claim more formally by applying the vertical interchange lemma.

Let L be the set of pictures of diagonal lines from the upper-left corner to the lower-right corner, i.e.,

figure e

If L can be recognised by a BFA, then, according to Lemma 4, there is a picture \(W \in L\) containing two vertical factors X and Y with \(|X|_c \ge 1\) and \(|Y|_c \ge 1\) that can be interchanged such that the resulting picture again belongs to L. The following illustrates how this leads to a contradiction:

figure f

We chose L to contain only square pictures with a diagonal connecting the upper-left corner with the lower-right one for presentational reasons. In the same way, it can be shown that the set of single continuous diagonal lines cannot be recognised by a BFA.
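The interchange argument can be illustrated by a short Python sketch (our own illustration; the helper names are ours): interchanging two vertical stripes of a diagonal picture destroys the diagonal, so no picture of L survives such a swap.

```python
def diagonal(n):
    """n-by-n picture with 'x' exactly on the main diagonal."""
    return [''.join('x' if i == j else '.' for j in range(n))
            for i in range(n)]

def is_diagonal(pic):
    n = len(pic)
    return (all(len(row) == n for row in pic) and
            all(pic[i][j] == ('x' if i == j else '.')
                for i in range(n) for j in range(n)))

def swap_columns(pic, a, b):
    """Interchange two vertical stripes (here: single columns a and b)."""
    out = []
    for row in pic:
        cells = list(row)
        cells[a], cells[b] = cells[b], cells[a]
        out.append(''.join(cells))
    return out

w = diagonal(4)
assert is_diagonal(w)
assert not is_diagonal(swap_columns(w, 1, 2))   # the swap breaks the diagonal
```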

7 Discussions

Scanning pictures line by line ‘as the ox turns’ is certainly not a new invention in image processing. We have tried to derive a formal model that mirrors this strategy. On the one hand, we have shown that this formal model is quite stable, as it has various characterizations and is even linked to RML, one of the earliest formal models of picture processing. On the other hand, there are quite a few natural operations, for example quarter turns, under which one would hope such a model to be closed, but it is not.

There are more powerful models than ours that have been proposed for picture processing, like 4-way NFAs or OTAs, see [7]. These have better closure properties, but also much weaker decidability results; for instance, the emptiness problem for such devices is undecidable.

However, OTAs are related to our model in the sense that they process a picture diagonal by diagonal, whereas our model processes it row by row. The additional power seems to come from the fact that, during a computation, OTAs label positions of the picture by states, and this labelling depends not only on the current symbol but also on the state labels of the upper and left neighbours (i.e., OTAs are special versions of cellular automata). This means that information can be passed from top to bottom in every single column, whereas a BFA can only accumulate information about a whole row. Notice that, when we remove this option from the way OTAs work, we arrive at a model that is possibly even closer to ours, the only difference being the way images are scanned. Clearly, diagonal scans can (now) detect diagonal lines, but there is then no way to detect vertical or horizontal ones, as would be the case for RML or BFA.
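The diagonal-by-diagonal visiting order of OTAs can be sketched as follows (a small helper of our own, for illustration only): cell (i, j) belongs to anti-diagonal i + j, and the diagonals are processed in increasing order.

```python
def antidiagonals(rows, cols):
    """Enumerate cell coordinates diagonal by diagonal, in the order in
    which an OTA-style scan visits them (anti-diagonal d holds the cells
    with i + j == d)."""
    return [[(i, d - i) for i in range(rows) if 0 <= d - i < cols]
            for d in range(rows + cols - 1)]

# A 2-by-3 picture is visited in 4 diagonals:
assert antidiagonals(2, 3) == [[(0, 0)], [(0, 1), (1, 0)],
                               [(0, 2), (1, 1)], [(1, 2)]]
```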

Conversely, we have seen that diagonals can be detected by neither RML nor BFA. Possibly, a more thorough study of different scanning schemes, from the point of view of the (typical) classes of images that can be accepted, would lead to new insights into how images should be scanned by computers in practice.

The complexity status of variants of the membership problem for NFAs and DFAs is well understood. As we have seen, by the very definition of the work of BFAs and BDFAs, these results immediately transfer to this type of picture processing automata, as well. We have made this explicit above for the uniform membership problem, but the same observation also holds for the fixed membership problem. This should also enable practical implementations in future works. However, several other interesting problems appear to be much harder for picture automata, compared to their string-processing counterparts. Whether dimensionality is the core problem is still a matter of investigation.

It might be interesting to enhance the power of the automata that scan images. As we have seen, many interesting decidability questions are already pretty hard for finite automata; however, if we now extend our basic models, for example, from finite automata to weak forms of counter automata, we might be able to stay within the same complexity classes for these decision problems, while significantly increasing the usefulness of these models. For instance, we can use counters or linear-type grammars to recognize X-shaped letters (or diagonals).

The easiest way to deal with this might be to think about processing pictures with automata using two or more heads in a synchronized fashion, as already proposed in [5]. From a formal language point of view, using two heads, scanning row by row from left to right and right to left in parallel corresponds to even-linear languages as introduced in [1] and further generalized in [3, 6, 18, 22]. We also mention bio-inspired models of computation that are closely linked to these language classes, as discussed in [11].
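The two-head idea can be illustrated on strings: reading positions i and n − 1 − i in parallel and deciding acceptance from the resulting symbol pairs is the mechanism behind even-linear languages. A minimal Python sketch of our own (a stateless pair test, i.e., a special case of a finite automaton over symbol pairs), with even-length palindromes as the example language:

```python
def two_head_accepts(word, pair_ok):
    """Two heads scan the word from both ends in parallel; the word is
    accepted iff every symbol pair passes the test pair_ok."""
    n = len(word)
    if n % 2:                  # restrict the example to even lengths
        return False
    return all(pair_ok(word[i], word[n - 1 - i]) for i in range(n // 2))

# For even-length palindromes, the two heads must always agree:
assert two_head_accepts('abba', lambda x, y: x == y)
assert not two_head_accepts('abab', lambda x, y: x == y)
```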

We are currently investigating the relationships of BFA languages with other types of picture languages introduced in the literature. We are determining the exact complexity status of the state minimization problem, as well. Also, learnability issues might be of interest, as only very few results are known about learning picture languages, see [19] as one example.