1 Introduction

The state complexity, as used here, of a regular language L is the minimal number of states needed in a complete deterministic automaton recognizing L. The state complexity of an operation on regular languages is the greatest state complexity of the result of this operation as a function of the (maximal) state complexities of its arguments.

The study of the state complexity of regularity-preserving operations on regular languages (see [7] for a survey) was initiated by Maslov in [20] and taken up systematically by Yu, Zhuang and Salomaa in [27].

A language is called commutative, if for each word in the language, every permutation of this word is also in the language. The class of commutative automata, which recognize commutative regular languages, was introduced in [2].

The shuffle and iterated shuffle were introduced and studied to understand the semantics of parallel programs. This was undertaken, apparently independently, by Campbell and Habermann [3], by Mazurkiewicz [22] and by Shaw [25]. They introduced flow expressions, which allow for sequential operators (catenation and iterated catenation) as well as for parallel operators (shuffle and iterated shuffle) to specify sequential and parallel execution traces.

The shuffle operation as a binary operation, but not the iterated shuffle, is regularity-preserving on all regular languages. The state complexity of the shuffle operation in the general case was investigated in [1] for complete deterministic automata and in [4] for incomplete deterministic automata. In the former case, the bound \(2^{nm-1} + 2^{(m-1)(n-1)}(2^{m-1}-1)(2^{n-1}-1)\) was obtained, which is not known to be tight; in the latter case, the tight bound \(2^{nm}-1\) was obtained.

A word is a (scattered) subsequence of another word, if it can be obtained from the latter word by deleting letters. This gives a partial order, and the upward and downward closure and interior operations refer to this partial order. The upward closures are also known as shuffle ideals. The state complexity of these operations was investigated in [11, 12, 13, 19, 23].

The state complexity of the projection operation was investigated in [17, 18, 26]. In [26], the tight upper bound \(3 \cdot 2^{n-2} - 1\) was shown, and in [18] the refined, and tight, bound \(2^{n-1} + 2^{n-m} - 1\) was shown, where m is related to the number of unobservable transitions for the projection operator. Both results were established for incomplete deterministic automata.

In [14,15,16,17] the state complexity of these operations was investigated for commutative regular languages. The results are summarized in Table 1.

Table 1. Overview of results for commutative regular languages. The state complexities of the input languages are n and m. Also, \(f(n,m) = 2^{nm-1} + 2^{(m-1)(n-1)}(2^{m-1}-1)(2^{n-1}-1)\) is the general bound for shuffle from [1] in case of complete automata.
Table 2. State complexity results on the subclass of commutative languages with product-form minimal automaton for input languages with state complexities n and m.

In [8] the minimal commutative automaton was introduced, which can be associated with every commutative regular language. This automaton played a crucial role in [14, 15] to derive the bounds mentioned in Table 1. Here, we will investigate the subclass of those languages for which the minimal commutative automaton is in fact the smallest automaton recognizing a given commutative language. For this language class, we will derive the state complexity bounds summarized in Table 2. Additionally, we will prove other characterizations and properties of the subclass considered and, in the final section, relate it to other subclasses in a more general setting.

2 Preliminaries

In this section and Sect. 3, we assume that \(k \geqslant 0\) denotes our alphabet size and \(\varSigma = \{a_1, \ldots , a_k\}\) is our alphabet. We will also write a, b, c for \(a_1,a_2,a_3\) in case of \(|\varSigma | \leqslant 3\). The set \(\varSigma ^{*}\) denotes the set of all finite sequences over \(\varSigma \), i.e., of all words. The finite sequence of length zero, or the empty word, is denoted by \(\varepsilon \). For a given word w, we denote by |w| its length, and for \(a \in \varSigma \) by \(|w|_a\) the number of occurrences of the symbol a in w. For \(a \in \varSigma \), we set \(a^* = \{a\}^*\). A language is a subset of \(\varSigma ^*\). For \(u \in \varSigma ^*\), the left quotient is \(u^{-1}L = \{ v \in \varSigma ^* \mid uv \in L\}\) and the right quotient is \(Lu^{-1} = \{ v \in \varSigma ^* \mid vu \in L \}\).

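The shuffle operation, denoted by \(⧢\), is defined by

$$ u ⧢ v = \{ x_1 y_1 x_2 y_2 \cdots x_n y_n \mid u = x_1 x_2 \cdots x_n,\ v = y_1 y_2 \cdots y_n,\ x_i, y_i \in \varSigma ^{*},\ 1 \leqslant i \leqslant n,\ n \geqslant 1 \}, \qquad L_1 ⧢ L_2 = \bigcup _{x \in L_1, y \in L_2} (x ⧢ y) $$

for \(u,v \in \varSigma ^{*}\) and for \(L_1, L_2 \subseteq \varSigma ^{*}\). If \(L_1, \ldots , L_n \subseteq \varSigma ^*\), we set \(⧢_{i=1}^{n} L_i = L_1 ⧢ \ldots ⧢ L_n\).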
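To make the shuffle of words concrete, the following minimal Python sketch enumerates all interleavings of two words and lifts this to finite languages. The function names `shuffle` and `shuffle_languages` are illustrative choices, not notation from the paper, and the recursion is only practical for short words.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def shuffle(u: str, v: str) -> frozenset:
    """All interleavings of u and v that keep the letter order inside each word."""
    if not u:
        return frozenset({v})
    if not v:
        return frozenset({u})
    # The first letter of an interleaving is taken either from u or from v.
    return frozenset({u[0] + w for w in shuffle(u[1:], v)}
                     | {v[0] + w for w in shuffle(u, v[1:])})

def shuffle_languages(l1, l2):
    """Shuffle of two finite languages: union over all pairs of words."""
    return set().union(*(shuffle(x, y) for x in l1 for y in l2))

if __name__ == "__main__":
    print(sorted(shuffle("ab", "c")))
```

The recursion mirrors the observation that an interleaving starts with the first letter of one of the two words; it is a sketch for experimentation, not an efficient procedure.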

Let \(\varGamma \subseteq \varSigma \). The projection homomorphism \(\pi _{\varGamma } : \varSigma ^* \rightarrow \varGamma ^*\) is given by \(\pi _{\varGamma }(x) = x\) for \(x \in \varGamma \) and \(\pi _{\varGamma }(x) = \varepsilon \) for \(x \notin \varGamma \) and extended to \(\varSigma ^*\) by \(\pi _{\varGamma }(\varepsilon ) = \varepsilon \) and \(\pi _{\varGamma }(wx) = \pi _{\varGamma }(w)\pi _{\varGamma }(x)\) for \(w \in \varSigma ^*\) and \(x \in \varSigma \). As a shorthand, we set, with respect to a given naming \(\varSigma = \{a_1, \ldots , a_k\}\), \(\pi _j = \pi _{\{a_j\}}\). Then \(\pi _j(w) = a_j^{|w|_{a_j}}\).

A language \(L \subseteq \varSigma ^*\) is commutative, if, for \(u,v \in \varSigma ^*\) such that \(|v|_x = |u|_x\) for every \(x \in \varSigma \), we have \(u \in L\) if and only if \(v \in L\), i.e., L is closed under permutation of letters in words from L.
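The projection and the commutativity condition can be explored with a small Python sketch. It assumes a language is given by a membership predicate `member` (a hypothetical interface chosen for illustration) and only tests words up to a length bound, so a positive answer is evidence rather than a proof.

```python
from itertools import product

def parikh(word, alphabet):
    """Letter counts of word, listed in the order given by alphabet."""
    return tuple(word.count(x) for x in alphabet)

def project(word, gamma):
    """Projection onto the sub-alphabet gamma: erase all other letters."""
    return "".join(x for x in word if x in gamma)

def commutative_up_to(member, alphabet, max_len):
    """Bounded test: do words with equal letter counts agree on membership?"""
    verdict = {}
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            key = parikh(w, alphabet)
            # Remember the first verdict for this count vector and compare later words.
            if verdict.setdefault(key, member(w)) != member(w):
                return False
    return True
```

For instance, the predicate `lambda w: w.count("a") == 0 or w.count("b") > 0` passes the bounded test over the alphabet `"ab"`, whereas `lambda w: w.startswith("a")` fails it already for words of length two.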

A quintuple \(\mathcal A = (\varSigma , Q, \delta , q_0, F)\) is a finite deterministic and complete automaton (DFA), where \(\varSigma \) is the input alphabet, Q the finite set of states, \(q_0 \in Q\) the start state, \(F \subseteq Q\) the set of final states and \(\delta : Q \times \varSigma \rightarrow Q\) is the totally defined state transition function. Here, we do not consider incomplete automata. The transition function \(\delta : Q \times \varSigma \rightarrow Q\) extends to a transition function on words \(\delta ^{*} : Q \times \varSigma ^{*} \rightarrow Q\) by setting \(\delta ^{*}(q, \varepsilon ) := q\) and \(\delta ^{*}(q, wa) := \delta (\delta ^{*}(q, w), a)\) for \(q \in Q\), \(a \in \varSigma \) and \(w \in \varSigma ^{*}\). In the remainder, we drop the distinction between both functions and also denote this extension by \(\delta \). The language recognized by an automaton \(\mathcal A = (\varSigma , Q, \delta , q_0, F)\) is \( L(\mathcal A) = \{ w \in \varSigma ^{*} \mid \delta (q_0, w) \in F \}. \) A language \(L \subseteq \varSigma ^{*}\) is called regular if \(L = L(\mathcal A)\) for some finite automaton \(\mathcal A\).

The Nerode right-congruence with respect to \(L \subseteq \varSigma ^*\) is defined, for \(u,v \in \varSigma ^*\), by \(u \equiv _L v\) if and only if \( \forall x \in \varSigma ^* : ux \in L \Leftrightarrow vx \in L. \) The equivalence class of \(w \in \varSigma ^{*}\) is denoted by \([w]_{\equiv _L} = \{ x \in \varSigma ^{*} \mid x \equiv _L w \}\). A language is regular if and only if the above right-congruence has finite index, and it can be used to define the minimal deterministic automaton \(\mathcal A_L = (\varSigma , Q_L, \delta _L, [\varepsilon ]_{\equiv _L}, F_L)\) with \(Q_L = \{ [u]_{\equiv _L} \mid u \in \varSigma ^{*} \}\), \(\delta _L([w]_{\equiv _L}, a) = [wa]_{\equiv _L}\) and \(F_L = \{ [u]_{\equiv _L} \mid u \in L \}\). Let \(L \subseteq \varSigma ^*\) be regular with minimal automaton \(\mathcal A_L = (\varSigma , Q_L, \delta _L, [\varepsilon ]_{\equiv _L}, F_L)\). The number \(|Q_L|\) is called the state complexity of L and denoted by \({\text {sc}}(L)\). The state complexity of a regularity-preserving operation on a class of regular languages is the greatest state complexity of the result of this operation as a function of the (maximal) state complexities for argument languages from the class.
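For a language given by a complete DFA, the state complexity can be computed by merging equivalent states. The sketch below uses Moore-style partition refinement, which is one standard way to do this and not a method taken from the paper; the encoding of the transition function as a dictionary `delta[(state, letter)]` is an assumption made for illustration.

```python
def state_complexity(alphabet, delta, start, finals):
    """Number of states of the minimal complete DFA, by partition refinement."""
    # Step 1: restrict to the states reachable from the start state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)
    # Step 2: start with the split final / non-final and refine until stable.
    block = {q: int(q in finals) for q in reachable}
    count = len(set(block.values()))
    while True:
        ids, refined = {}, {}
        for q in reachable:
            # A state's signature: its own block and the blocks of its successors.
            signature = (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            refined[q] = ids.setdefault(signature, len(ids))
        if len(ids) == count:  # no block was split, so the partition is stable
            return count
        block, count = refined, len(ids)
```

Applied to a four-state automaton that merely records whether an a and whether a b has been read, with acceptance if no a or some b was read, the function merges the two states in which a b has been seen.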

Fig. 1. The minimal deterministic automaton (left) and the minimal commutative automaton (right) of the language \(\{ w \in \varSigma ^* \mid |w|_a = 0 \text{ or } |w|_b > 0 \}\).

Given two automata \(\mathcal A = (\varSigma , S, \delta , s_0, F)\) and \(\mathcal B = (\varSigma , T, \mu , t_0, E)\), an automaton homomorphism \(h : S \rightarrow T\) is a map between the state sets such that for each \(a \in \varSigma \) and state \(s \in S\) we have \( h(\delta (s, a)) = \mu (h(s),a), \) \(h(s_0) = t_0\) and \(h^{-1}(E) = F\). If \(h : S \rightarrow T\) is surjective, then \(L(\mathcal B) = L(\mathcal A)\). A bijective homomorphism between automata \(\mathcal A\) and \(\mathcal B\) is called an isomorphism, and the two automata are said to be isomorphic.

The minimal commutative automaton was introduced in [8] to investigate the learnability of commutative languages. In [14, 15] this construction was used to define the index and period vector and in the derivation of the state complexity bounds mentioned in Table 1.

Definition 1

(minimal commutative automaton). Let \(L \subseteq \varSigma ^*\) be regular. The minimal commutative automaton for L is \(\mathcal C_L = (\varSigma , S_1 \times \ldots \times S_k, \delta , s_0, F)\) with

$$ S_j = \{ [a_j^m]_{\equiv _L} : m \geqslant 0 \}, \quad F = \{ ([\pi _1(w)]_{\equiv _L}, \ldots , [\pi _k(w)]_{\equiv _L}) : w \in L \} $$

and \(\delta ((s_1, \ldots , s_j, \ldots , s_k), a_j) = (s_1, \ldots , \delta _{j}(s_j, a_j), \ldots , s_k)\) with one-letter transitions \(\delta _{j}([a_j^m]_{\equiv _L}, a_j) = [a_j^{m+1}]_{\equiv _L}\) for \(j = 1,\ldots , k\) and \(s_0 = ([\varepsilon ]_{\equiv _L}, \ldots , [\varepsilon ]_{\equiv _L})\).

In [8], the next result was shown.

Theorem 2

(Gómez and Alvarez [8]). Let \(L \subseteq \varSigma ^*\) be a commutative regular language. Then, \(L = L(\mathcal C_L)\).

In general the minimal commutative automaton is not equal to the minimal deterministic and complete automaton for a regular commutative language L, see Example 1.

Example 1

For \(L = \{ w \in \varSigma ^* \mid |w|_a = 0 \text{ or } |w|_b > 0 \}\) with \(\varSigma = \{a,b\}\) the minimal deterministic and complete automaton and the minimal commutative automaton are not the same, see Fig. 1. This language is from [8]. In fact, the difference can get quite large, as shown by \(L_p = \{ w \in \varSigma ^* \mid \sum _{j=1}^k j\cdot |w|_{a_j} \equiv 0 \pmod {p} \}\) for a prime \(p > k\). Here, \({\text {sc}}(L_p) = p\), but \(\mathcal C_{L_p}\) has \(p^k\) states.
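The gap claimed for \(L_p\) can be checked numerically for small parameters. The sketch below assumes the natural residue automaton for \(L_p\) (state r in \(\{0, \ldots , p-1\}\), reading \(a_j\) adds j modulo p, state 0 accepts), which is an illustrative modelling choice. It counts how many residues are pairwise separated by test words \(a_1^m\), and how many residues the powers of each single letter reach.

```python
from math import prod

def lp_state_counts(p, k):
    """Return (number of Nerode classes of L_p, number of states of C_{L_p})."""
    # Acceptance pattern of residue r on the test words a_1^0, ..., a_1^(p-1).
    # Distinct patterns witness distinct Nerode classes; the residue automaton
    # has only p states, so p distinct patterns mean exactly p classes.
    signatures = {tuple((r + m) % p == 0 for m in range(p)) for r in range(p)}
    nerode_classes = len(signatures)
    # Residues reached by powers of a_j. When all residues are inequivalent,
    # this is the size of the j-th component S_j.
    component_sizes = [len({(j * m) % p for m in range(p)}) for j in range(1, k + 1)]
    return nerode_classes, prod(component_sizes)

if __name__ == "__main__":
    print(lp_state_counts(5, 3))
```

For a prime p larger than k every letter generates all residues, so the second component of the result is \(p^k\) while the first stays at p.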

The next definition from [14, 15] generalizes the notion of a cyclic and non-cyclic part for unary automata [24], and the notion of periodic language [6, 14, 15].

Definition 3

(index and period vector). The index vector \((i_1, \ldots , i_k)\) and period vector \((p_1, \ldots , p_k)\) for a commutative regular language \(L \subseteq \varSigma ^*\) with minimal commutative automaton \(\mathcal C_L = (\varSigma , S_1 \times \ldots \times S_k, \delta , s_0, F)\) are the unique minimal numbers such that \(\delta (s_0, a_j^{i_j}) = \delta (s_0, a_j^{i_j + p_j})\) for all \(j \in \{1,\ldots ,k\}\).

Note that, in Definition 3, we have, for all \(j \in \{1,\ldots ,k\}\), \(|S_j| = i_j + p_j\). Also note that for unary languages, i.e., if \(|\varSigma | = 1\), \(\mathcal C_L\) equals \(\mathcal A_L\) and \(i_1 + p_1\) equals the number of states of the minimal automaton.
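For a single letter, index and period describe the shape of the path obtained by reading that letter repeatedly from the start state: a tail of length \(i_j\) followed by a cycle of length \(p_j\). A minimal sketch, assuming the one-letter transition is available as a Python function `step` (a hypothetical interface) and that the period is taken to be at least one:

```python
def index_and_period(step, start):
    """Smallest i >= 0 and p >= 1 with step^i(start) == step^(i + p)(start)."""
    first_seen = {}
    state, n = start, 0
    while state not in first_seen:
        first_seen[state] = n  # number of steps after which this state first occurs
        state = step(state)
        n += 1
    index = first_seen[state]  # the walk re-enters the cycle at this position
    return index, n - index
```

The number of distinct states visited is the sum of the two returned values, in line with \(|S_j| = i_j + p_j\).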

Example 2

Let . Then \((i_1, i_2) = (0,0)\), \((p_1, p_2) = (4,2)\), \(\pi _1(L) = (a a)^{*}\) and \(\pi _2(L) = b^{*}\).

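Let \(u, v \in \varSigma ^*\). Then, u is a subsequence of v, denoted by \(u \preccurlyeq v\), if and only if \(v \in u ⧢ \varSigma ^*\), i.e., u can be obtained from v by deleting letters. The thereby given order is called the subsequence order. Let \(L \subseteq \varSigma ^*\). Then, we define (1) the upward closure \(\mathop {\uparrow \!} L = \{ u \in \varSigma ^* \mid \exists v \in L : v \preccurlyeq u \}\); (2) the downward closure \(\mathop {\downarrow \!} L = \{ u \in \varSigma ^* \mid \exists v \in L : u \preccurlyeq v \}\); (3) the upward interior, denoted by \(\mathop {\circlearrowright \!} L\), as the largest upward-closed set in L, i.e., the largest subset \(U \subseteq L\) such that \(\mathop {\uparrow \!} U = U\) and (4) the downward interior, denoted by \(\mathop {\circlearrowleft \!} L\), as the largest downward-closed set in L, i.e., the largest subset \(U \subseteq L\) such that \(\mathop {\downarrow \!} U = U\). We have \(\mathop {\circlearrowright \!} L = \varSigma ^* \setminus \mathop {\downarrow \!} (\varSigma ^* \setminus L)\) and \(\mathop {\circlearrowleft \!} L = \varSigma ^* \setminus \mathop {\uparrow \!} (\varSigma ^* \setminus L)\).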
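The subsequence order and the two closures are easy to test for finite languages. The following sketch uses illustrative function names and assumes the language is given as a finite Python set of strings.

```python
def is_subsequence(u, v):
    """True if u can be obtained from v by deleting letters."""
    remaining = iter(v)
    # Each letter of u must be found in the not yet consumed part of v.
    return all(x in remaining for x in u)

def in_upward_closure(w, language):
    """w lies in the upward closure if some word of the language is a subsequence of w."""
    return any(is_subsequence(v, w) for v in language)

def in_downward_closure(w, language):
    """w lies in the downward closure if w is a subsequence of some word of the language."""
    return any(is_subsequence(w, v) for v in language)
```

The membership test `x in remaining` advances the iterator, which is what enforces that the letters of u are matched from left to right.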

The following two results, which will be needed later, are from [14, 15].

Theorem 4

Let \(U,V \subseteq \varSigma ^*\) be commutative regular languages with index and period vectors \((i_1, \ldots , i_k), (j_1, \ldots , j_k)\) and \((p_1, \ldots , p_k), (q_1, \ldots , q_k)\). Then, the index vector of \(U ⧢ V\) is at most

$$ (i_1 + j_1 + \mathrm {lcm}(p_1, q_1) - 1, \ldots , i_k + j_k + \mathrm {lcm}(p_k,q_k) - 1) $$

and the period vector is at most \( (\mathrm {lcm}(p_1, q_1), \ldots , \mathrm {lcm}(p_k, q_k)). \) So, \({\text {sc}}(U ⧢ V) \leqslant \prod _{l=1}^k (i_l + j_l + 2\cdot \mathrm {lcm}(p_l, q_l) - 1)\).
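The bounds of Theorem 4 are simple to evaluate. The sketch below computes the componentwise bounds on the index and period vectors and, assuming that the minimal commutative automaton has one component of size index plus period per letter (as noted after Definition 3), the resulting bound on the number of states. The function name is an illustrative choice.

```python
from math import gcd, prod

def lcm(a, b):
    return a * b // gcd(a, b)

def shuffle_bounds(index_u, period_u, index_v, period_v):
    """Componentwise bounds for the shuffle of two commutative regular languages."""
    periods = [lcm(p, q) for p, q in zip(period_u, period_v)]
    indices = [i + j + l - 1 for i, j, l in zip(index_u, index_v, periods)]
    # One component per letter, each with at most index + period states.
    states = prod(i + l for i, l in zip(indices, periods))
    return indices, periods, states

if __name__ == "__main__":
    print(shuffle_bounds((1, 0), (2, 3), (0, 2), (3, 3)))
```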

Theorem 5

Let \(\varSigma = \{a_1, \ldots , a_k\}\). Suppose \(L \subseteq \varSigma ^*\) is commutative and regular with index vector \((i_1, \ldots , i_k)\) and period vector \((p_1, \ldots , p_k)\). Then, .

3 Product-Form Minimal Automata

As shown in Example 1, the minimal automaton, in general, does not equal the minimal commutative automaton. Here, we introduce the class of commutative regular languages for which both are isomorphic. The corresponding commutative languages are called languages with a minimal automaton of product-form, as the minimal commutative automaton is built with the Cartesian product.

Definition 6

(languages with product-form minimal automaton). A commutative and regular language \(L \subseteq \varSigma ^*\) is said to have a minimal automaton of product-form, if \(\mathcal C_L\) is isomorphic to \(\mathcal A_L\).

If \(|\varSigma | = 1\), we see easily that \(\mathcal C_L\) is the minimal deterministic and complete automaton.

Proposition 7

If \(|\varSigma | = 1\), then each commutative and regular \(L \subseteq \varSigma ^*\) has a minimal automaton of product-form. More generally, if \(L \subseteq \{a\}^*\), then has a minimal automaton of product-form.

Apart from the unary languages, we give another example of a language with minimal automaton of product-form next.

Example 3

Let over \(\varSigma = \{a,b\}\). See Fig. 2 for the minimal commutative automaton. Here, the minimal commutative automaton equals the minimal automaton.

Fig. 2. \(\mathcal C_L\) for the language L from Example 3. Here \(\mathcal C_L\) is isomorphic to \(\mathcal A_L\).

However, the next proposition gives a strong necessary criterion for a commutative language to have a minimal automaton of product-form.

Proposition 8

If \(L \subseteq \varSigma ^*\) is commutative and regular with a minimal automaton of product-form, then \(|\{ x \in \varSigma \mid \pi _{\{x\}}(L) \text{ is } \text{ finite } \}| \leqslant 1\). So, \(\pi _{\varGamma }(L)\) is infinite for \(|\varGamma | \geqslant 2\), in particular no finite language over an at least binary alphabet is in this class.

For example, \(L = \{\varepsilon \}\) over \(\varSigma \) does not have a minimal automaton of product-form if \(|\varSigma | > 1\). Recall that the minimal automaton, as defined here, is always complete. Note that the converse of Proposition 8 is not true, as shown by \(aa^*\) over \(\varSigma = \{a,b\}\).

In the following statement, we give alternative characterizations for commutative languages with minimal automata of product-form.

Theorem 9

Let \(L \subseteq \varSigma ^*\) be a commutative regular language with index vector \((i_1, \ldots , i_k)\) and period vector \((p_1, \ldots , p_k)\). The following are equivalent:

  1. the minimal automaton has product-form;

  2. \({\text {sc}}(L) = \prod _{j=1}^k (i_j + p_j)\);

  3. \(u \equiv _L v\) implies \(\forall a \in \varSigma : a^{|u|_a} \equiv _L a^{|v|_a}\);

  4. \(u \equiv _L v\) if and only if \(\forall a \in \varSigma : a^{|u|_a} \equiv _L a^{|v|_a}\).
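Condition 2 of Theorem 9 suggests a simple test. The sketch below assumes that the input is already the minimal complete DFA of a commutative language, encoded as a dictionary `delta[(state, letter)]` (an illustrative encoding). Under this assumption the states are the Nerode classes, so the number of states reached from the start state by powers of \(a_j\) equals \(i_j + p_j\).

```python
from math import prod

def has_product_form(alphabet, delta, start, num_states):
    """Compare sc(L) with the product of the one-letter orbit sizes.

    ASSUMPTION: delta describes the minimal complete DFA of a commutative language.
    """
    sizes = []
    for a in alphabet:
        orbit, q = set(), start
        while q not in orbit:  # follow powers of the letter a until a state repeats
            orbit.add(q)
            q = delta[(q, a)]
        sizes.append(len(orbit))  # index plus period for this letter
    return prod(sizes) == num_states
```

For the three-state minimal automaton of \(aa^*\) over \(\{a,b\}\) both letters have orbits of size two, so the product is four and the test fails, in line with the remark after Proposition 8.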

Next, we give a way to construct commutative regular languages with minimal automata of product-form.

Lemma 10

Let \(\varSigma = \{a_1, \ldots , a_k\}\) and, for \(j \in \{1,\ldots ,k\}\), \(L_j \subseteq \{a_j\}^*\) be regular and infinite with index \(i_j\) and period \(p_j\). Then, and has index vector \((i_1, \ldots , i_k)\) and period vector \((p_1, \ldots , p_k)\). With Theorem 9, has a product-form minimal automaton.

In the next theorem and the following remark, we investigate closure properties of the class in question.

Theorem 11

The class of commutative regular languages with minimal automata of product-form is closed under left and right quotients and complementation. It is not closed under union, intersection and projection.

Remark 1

We have , showing, using Proposition 7 and 8, that this class is not closed under intersection and by DeMorgan’s laws, as we have closure under complementation, we also cannot have closure under union. Also, has a minimal automaton of product-form, but is the language from Example 1. So, this class is also not closed under projection.

Theorem 12

Let \(U, V \subseteq \varSigma ^*\) be commutative regular languages with product-form minimal automata with \({\text {sc}}(U) = n\) and \({\text {sc}}(V) = m\).

  1. We have and if \(|\varSigma | = 1\). Furthermore, for any \(\varSigma \), there exist U, V as above such that .

  2. In the worst case, n states are sufficient and necessary for a DFA to recognize \(\uparrow \! U\). Similarly for the downward closure and interior operations.

  3. In the worst case, n states are sufficient and necessary for a DFA to recognize the projection of U.

  4. In the worst case, nm states are sufficient and necessary for a DFA to recognize \(U \cap V\) or \(U \cup V\).
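That nm states suffice for intersection and union follows from the standard product construction, which holds for arbitrary complete DFAs and is not specific to this paper. A minimal sketch, assuming each automaton is passed as a triple `(delta, start, finals)` with `delta[(state, letter)]` (an illustrative encoding):

```python
def product_dfa(alphabet, dfa_u, dfa_v, mode="and"):
    """Reachable part of the product automaton for intersection or union."""
    (delta_u, start_u, finals_u), (delta_v, start_v, finals_v) = dfa_u, dfa_v
    start = (start_u, start_v)
    delta, states, stack = {}, {start}, [start]
    while stack:
        s, t = stack.pop()
        for a in alphabet:
            # Both components read the same letter simultaneously.
            target = (delta_u[(s, a)], delta_v[(t, a)])
            delta[((s, t), a)] = target
            if target not in states:
                states.add(target)
                stack.append(target)
    if mode == "and":
        finals = {(s, t) for (s, t) in states if s in finals_u and t in finals_v}
    else:
        finals = {(s, t) for (s, t) in states if s in finals_u or t in finals_v}
    return delta, start, finals, states
```

The product has at most nm states by construction; whether all of them are needed for a particular pair of languages still has to be checked, for example by minimizing the result.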

Remark 2

I do not know whether the bound 2nm stated in Theorem 12 for the shuffle operation is tight, but the next example shows that if we have a binary alphabet, we can find commutative languages with state complexities n and m and product-form minimal automata whose shuffle needs an automaton with strictly more than nm states. A similar construction works for more than two letters. Let \(p, q > 11\) be two coprime numbers. Set and . Then, using that shuffle distributes over union and a number-theoretical result from [27, Lemma 5.1], we find

figure b

where \(a^{q-1 + p - 1}(a^p)^* (a^q)^* = F \cup a^{pq - 1}a^*\) for some finite set \(F \subseteq \{\varepsilon , a, \ldots , a^{pq - 3} \}\) and \(W = E \cup b^{pq-1}b^*\) for some \(E \subseteq \{\varepsilon , b, \ldots , b^{pq-3} \}\). Note that by [27, Lemma 5.1] we have . All languages involved have a product-form minimal automaton. The minimal automaton for U has \((2 + p) \cdot (1+p)\) states, the minimal automaton for V has \((1 + q)\cdot (q+2)\) states and that for \(U ⧢ V\) has \(2pq\cdot (pq+3)\) states. As \((p-11)(q-11) > 0\) we can deduce \((1+p)(2+p)(1+q)(2+q)< 2(pq)^2 < 2pq(pq+3)\).

4 Partial Commutativity and Other Subclasses

A partial commutation on \(\varSigma \) is a symmetric and irreflexive relation \(I \subseteq \varSigma \times \varSigma \), often called the independence relation. Of interest is the congruence \(\sim _I\) generated on \(\varSigma ^*\) by the relation \( \{ (ab, ba) \mid (a,b) \in I \}. \) A language \(L \subseteq \varSigma ^*\) is closed under I-commutation if \(u \in L\) and \(u \sim _I v\) implies \(v \in L\). If \(I = \{ (a,b) \in \varSigma \times \varSigma \mid a \ne b \}\), then the languages closed under I-commutation are precisely the commutative languages.

Languages closed under some partial commutation relation have been extensively studied, see [10], also for further references, and in particular with relation to (Mazurkiewicz) trace theory [5, 10, 21], a formalism to describe the execution histories of concurrent programs.

Here, we will focus on the case that \((\varSigma \times \varSigma ) \setminus I\) is transitive, i.e., \(u \not \sim _I v\) and \(v \not \sim _I w\) imply \(u \not \sim _I w\). In this case, \((\varSigma \times \varSigma ) \setminus I\) is an equivalence relation and we will write \(\varSigma _1, \ldots , \varSigma _k\) for the different equivalence classes.

The reason to focus on this particular generalization is, as we will see later, that the definition of the minimal commutative automaton transfers to this more general setting without much difficulty.

To ease the notation, if we have a partial commutation relation as above with a corresponding partition \(\varSigma = \varSigma _1 \cup \ldots \cup \varSigma _k\) of the alphabet, we also write \(\mathcal L_{\varSigma _1, \ldots , \varSigma _k}\) for the class of languages closed under this partial commutation. Then, as is easily seen, we have \(L \in \mathcal L_{\varSigma _1, \ldots , \varSigma _k}\) if and only if, for \(x \in \varSigma _i\), \(y \in \varSigma _j\) (\(i \ne j\)) and each \(u, v \in \varSigma ^*\) we have \( uxyv \in L \Leftrightarrow uyxv \in L. \) For example, L is commutative if and only if \(L \in \mathcal L_{\{a_1\}, \ldots , \{a_k\}}\) for \(\varSigma = \{a_1, \ldots , a_k\}\).
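The characterization by swapping adjacent letters from different classes can be tested on an automaton. The sketch below assumes that the language is given by its minimal complete DFA, encoded as `delta[(state, letter)]` (an illustrative encoding). In that case the condition amounts to requiring that reading xy and reading yx lead to the same state from every state. For a DFA that is not minimal the check is still sufficient, but may reject languages that are in fact closed.

```python
def closed_under_partition(delta, states, partition):
    """Check delta(q, xy) == delta(q, yx) for letters x, y from different classes.

    ASSUMPTION: delta describes a minimal complete DFA; partition is a list of
    disjoint sets of letters covering the alphabet.
    """
    for i, class_i in enumerate(partition):
        for class_j in partition[i + 1:]:
            for x in class_i:
                for y in class_j:
                    for q in states:
                        if delta[(delta[(q, x)], y)] != delta[(delta[(q, y)], x)]:
                            return False
    return True
```

With the partition into singletons this is a commutativity test; with a single class containing all letters the condition is vacuous and every language passes.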

4.1 The Canonical Automaton

Here, we generalize our notion of commutative minimal automaton, Definition 1, to have uniform recognition devices for languages in \(\mathcal L_{\varSigma _1,\ldots ,\varSigma _k}\).

Definition 13

Let \(\varSigma = \varSigma _1 \cup \ldots \cup \varSigma _k\) be a partition and \(L \subseteq \varSigma ^*\). Set \(\mathcal C_{L, \varSigma _1, \ldots , \varSigma _k} = (\varSigma , S_1 \times \ldots \times S_k, \delta , s_0, F)\) with, for \(i \in \{1,\ldots , k\}\), \( S_i = \{ [u]_{\equiv _L} \mid u \in \varSigma _i^* \}\), \(F = \{ ([\pi _{\varSigma _1}(u)]_{\equiv _L}, \ldots , [\pi _{\varSigma _k}(u)]_{\equiv _L}) \mid u \in L \}\), \(s_0 = ( [\varepsilon ]_{\equiv _L}, \ldots , [\varepsilon ]_{\equiv _L})\) and, for \(x \in \varSigma _i\),

$$ \delta (([u_1]_{\equiv _L}, \ldots , [u_i]_{\equiv _L}, \ldots , [u_k]_{\equiv _L}), x) = ([u_1]_{\equiv _L}, \ldots , [u_ix]_{\equiv _L}, \ldots , [u_k]_{\equiv _L}) $$

with words \(u_j \in \varSigma _j^*\), \(j \in \{1,\ldots , k\}\). This is called the canonical automaton for the given L with respect to \(\varSigma = \varSigma _1 \cup \ldots \cup \varSigma _k\).

Next, we show that the canonical automata recognize precisely the languages in \(\mathcal L_{\varSigma _1, \ldots , \varSigma _k}\). Note that we have dropped the assumption of regularity of L.

Theorem 14

Let \(L \subseteq \varSigma ^*\) and \(\varSigma = \varSigma _1 \cup \ldots \cup \varSigma _k\) be a partition. Then,

  1. \(L \subseteq L(\mathcal C_{L,\varSigma _1,\ldots ,\varSigma _k})\) and \(L(\mathcal C_{L,\varSigma _1,\ldots ,\varSigma _k}) \in \mathcal L_{\varSigma _1, \ldots , \varSigma _k}\).

  2. \(L = L(\mathcal C_{L,\varSigma _1,\ldots ,\varSigma _k}) \Leftrightarrow L \in \mathcal L_{\varSigma _1, \ldots , \varSigma _k}\).

  3. Let \(L \in \mathcal L_{\varSigma _1,\ldots ,\varSigma _k}\). Then L is regular if and only if \(\mathcal C_{L,\varSigma _1,\ldots ,\varSigma _k}\) is finite.

We will also derive, from \(\mathcal C_{L,\varSigma _1,\ldots ,\varSigma _k}\), a canonical automaton for certain projected languages; it is used to define a subclass in the next subsection. Essentially, the next definition and proposition mean that if we only use one “coordinate” of \(\mathcal C_{L,\varSigma _1, \ldots , \varSigma _k}\), then this recognizes a projection of L.

Definition 15

Let \(i \in \{1,\ldots , k\}\) and \(L \in \mathcal L_{\varSigma _1,\ldots ,\varSigma _k}\). The canonical projection automaton (for \(\varSigma _i)\) is \(\mathcal C_{L,\varSigma _i} = (\varSigma _i, S_i, \delta _i, [\varepsilon ]_{\equiv _L}, F_i)\) with \(S_i = \{ [u]_{\equiv _L} \mid u \in \varSigma _i^* \}\), \(\delta _i([u]_{\equiv _L}, x) = [ux]_{\equiv _L} \text{ for } x \in \varSigma _i\) and \(F_i = \{ [\pi _{\varSigma _i}(u)]_{\equiv _L} \mid u \in L \}\).

Proposition 16

Let \(L \in \mathcal L_{\varSigma _1, \ldots , \varSigma _k}\). Then, for \(i \in \{1,\ldots ,k\}\), \(\pi _{\varSigma _i}(L) = L(\mathcal C_{L, \varSigma _i})\).

4.2 Subclasses in \(\mathcal L_{\varSigma _1, \ldots , \varSigma _k}\)

Here, we investigate several subclasses of \(\mathcal L_{\varSigma _1,\ldots , \varSigma _k}\). Recall that, for \(L \subseteq \varSigma ^*\), the minimal automaton of L is denoted by \(\mathcal A_L\).

Definition 17

Let \(\varSigma = \varSigma _1 \cup \ldots \cup \varSigma _k\) be a partition. Then, define the following classes of languages.

First, we show that these are in fact subclasses of \(\mathcal L_{\varSigma _1, \ldots , \varSigma _k}\).

Proposition 18

Let \(\varSigma = \varSigma _1 \cup \ldots \cup \varSigma _k\) be a partition. For each \(i \in \{1,2,3,4\}\) we have \(\mathcal L_i \subseteq \mathcal L_{\varSigma _1, \ldots , \varSigma _k}\).

Remark 3

Regarding \(\mathcal L_1\), note that there exist languages \(L = L(\mathcal C_{L,\varSigma _1, \ldots , \varSigma _k})\) such that the minimal automaton has a single final state, but \(\mathcal C_{L,\varSigma _1, \ldots , \varSigma _k}\) has more than one final state. For example, \(L = \{ w \in \{a,b\}^* \mid |w|_a> 0 \text{ or } |w|_b > 0 \}\). However, if \(\mathcal C_{L,\varSigma _1, \ldots , \varSigma _k}\) has a single final state, then the minimal automaton also has only a single final state.

Example 4

Let \(\varSigma = \varSigma _1 \cup \varSigma _2\) with \(\varSigma _1 = \{a\}\) and \(\varSigma _2 = \{b\}\). Set . Then \(L \in (\mathcal L_3 \cap \mathcal L_4) \setminus \mathcal L_2\).

Example 5

Set . Then \(L \in \mathcal L_3 \setminus \mathcal L_4\).

The languages in \(\mathcal L_1\) arise in connection with the canonical automaton.

Proposition 19

Let \(L \in \mathcal L_{\varSigma _1, \ldots , \varSigma _k}\) and \( \mathcal C_{L, \varSigma _1, \ldots , \varSigma _k} = (\varSigma , S_1 \times \ldots \times S_k, \delta , s_0, F). \) Then, for all \(s \in S_1 \times \ldots \times S_k\), \( \{ w \in \varSigma ^* \mid \delta (s_0, w) = s \} \in \mathcal L_1. \)

Next, we give alternative characterizations for \(\mathcal L_2, \mathcal L_3\) and \(\mathcal L_4\).

Theorem 20

Let \(L \in \mathcal L_{\varSigma _1, \ldots , \varSigma _k}\). Then,

  1. \(L \in \mathcal L_2\) if and only if, for each \(w \in \varSigma ^*\), the following is true:

    $$ w \in L \Leftrightarrow \forall i \in \{1,\ldots ,k\} : \pi _{\varSigma _i}(w) \in \pi _{\varSigma _i}(L); $$

  2. \( L \in \mathcal L_3\) if and only if, for all \(i \in \{1,\ldots ,k\}\) and \(u \in \varSigma _i^*\), we have

    $$ [u]_{\equiv _L} \cap \varSigma _i^* = [u]_{\equiv _{\pi _{\varSigma _i}(L)}} \cap \varSigma _i^*; $$

  3. \(L \in \mathcal L_4\) if and only if, for each \(u,v \in \varSigma ^*\),

    $$ u \equiv _L v \Leftrightarrow \forall i \in \{1,\ldots ,k\} : \pi _{\varSigma _i}(u) \equiv _L \pi _{\varSigma _i}(v). $$

Example 6

Let \(L_1\) be the language from Example 3. Set . Both of their letters commute for the partition \(\{a_1,a_2\} = \{a_1\} \cup \{a_2\}\). Then, \(L_1 \in \mathcal L_4 \setminus \mathcal L_3\) and \(L_2 \in \mathcal L_1 \setminus \mathcal L_4\).

Finally, in Theorem 21, we establish inclusion relations, which are all proper, between \(\mathcal L_1, \mathcal L_2\) and \(\mathcal L_3\), also see Fig. 3.

Fig. 3. Inclusion relations between the language classes.

Theorem 21

We have \(\mathcal L_1 \subsetneq \mathcal L_2 \subsetneq \mathcal L_3\).

Remark 4

Theorem 21 and Example 6 show that \(\mathcal L_4\) is incomparable to each of the other language classes with respect to inclusion.

5 Conclusion

The language class of commutative regular languages with minimal automata of product-form behaves well with respect to the descriptional complexity measure of state complexity for certain operations, see Table 2, and Lemma 10 allows us to construct infinitely many commutative regular languages with product-form minimal automaton. The investigation started here could be carried out for other operations and measures of descriptional complexity as well. Likewise, it would be interesting to know whether the learning algorithms given in [8, 9] for commutative languages and more general partial commutativity conditions can be improved for the language class introduced here.

Lastly, whether the bound 2nm for shuffle is tight remains an open problem. Remark 2 shows that nm states are not sufficient; however, no infinite family of commutative regular languages with minimal automata of product-form attaining the bound 2nm for shuffle is known.