
1 Introduction

In computer science, trace monoids were introduced by Mazurkiewicz [22] as a model of concurrent events, describing which actions may permute with one another (we give a formal definition of traces and trace monoids in Sect. 2; see also [14] for a treatise on the subject). In combinatorics, they are related to the fundamental studies of the “monoïde partiellement commutatif” introduced by Cartier and Foata in [10], and to its convenient geometrical view as heaps of pieces proposed by Viennot in [25].

Several classical problems in language theory (recognition of rational and context-free trace languages, determination of the number of representative words of a given trace, computation of the finite state automaton recognizing these words) can be solved by algorithms that work in time and space proportional to (or strictly depending on) the number of prefixes of the input trace [3, 6,7,8, 15, 23]. This is due to the fact that prefixes represent the possible decompositions of a trace into two parts and hence they are natural indexes for computations on traces.

This motivates the analysis of the number of prefixes of a trace of given length both in the worst and in the average case. In the average case analysis, two natural sequences of random variables play a key role:

  • \(\{T_n\}_{n\in \mathbb {N}}\), the number of prefixes of traces of length n generated at random under the equidistribution of traces of given size;

  • \(\{W_n\}_{n\in \mathbb {N}}\), the number of prefixes of traces of length n generated at random under the equidistribution of representative words of given size.

For some families of trace monoids, the asymptotic average, variance, and limit distributions of \(\{T_n\}\) and \(\{W_n\}\) are known [6, 7, 19,20,21]. Interestingly, they rely on structural properties of an underlying graph (the independency graph, defined in Sect. 2). For example, it is known that, for every trace monoid \(\mathcal {M}\), the maximum number of prefixes of a trace of length n is of order \(\varTheta (n^{\alpha })\), where \(\alpha \) is the size of the largest clique in the independency graph defining \(\mathcal {M}\) [8]. We summarize further such results in Sect. 3. In analytic combinatorics (see [17] for an introduction to this field), it remains a nice challenge to obtain a more universal description of the possible asymptotics of \(T_n\) and \(W_n\).

In this work we prove that, if the concurrent alphabet \((\varSigma ,{\mathcal C})\) admits a transitive orientation, then

$$\begin{aligned} {\text {E}}[T_n] = O({\text {E}}[W_n]).\end{aligned}$$

This is obtained by showing a general property of undirected graphs, which in our context is applied to the concurrent alphabet \((\varSigma ,{\mathcal C})\) and its complement \((\varSigma ,{\mathcal C}^c)\). Such a property states that, for any undirected graph \(\mathcal{G}\) admitting a transitive orientation of its edges, the number of connected components of its complement is greater than or equal to the multiplicity of the root of smallest modulus in the clique polynomial of \(\mathcal{G}\). The interest of the present approach mainly lies in the use of finite state automata and of classical tools from formal language theory to study properties of integer random variables, in particular the asymptotic behaviour of their moments.

The paper is organized as follows: in Sect. 2 we recall the basic definitions on trace monoids; in Sect. 3 we summarize some asymptotic results on the random variables \(T_n\) and \(W_n\); in Sects. 4 and 5, we present our main results on cross-sections of trace monoids, clique polynomials, and a new bound relating the asymptotic behaviour of \(T_n\) and \(W_n\); we then conclude with possible future extensions of our work.

2 Notation and Preliminary Notions

For the reader not already familiar with the terminology of trace languages, we present in this section the key notions used in this article (see e.g. [14] for more details on all these notions).

Given a finite alphabet \(\varSigma \), as usual \(\varSigma ^*\) denotes the free monoid of all words over \(\varSigma \), \(\varepsilon \) is the empty word and |w| is the length of a word w for every \(w\in \varSigma ^*\). We recall that, for any \(w\in \varSigma ^*\), a prefix of w is a word \(u\in \varSigma ^*\) such that \(w=uv\), for some \(v \in \varSigma ^*\). Also, for any finite set \({\mathcal S}\), we denote by \(\#{\mathcal S}\) the cardinality of \({\mathcal S}\).

A concurrent alphabet is then a pair \((\varSigma ,{\mathcal C})\), where \({\mathcal C}\subseteq \varSigma \times \varSigma \) is a symmetric and irreflexive relation over \(\varSigma \). Such a pair can alternatively be defined by an undirected graph, which we call the independency graph, where \(\varSigma \) is the set of nodes and \(\{\{a,b\} \mid (a,b) \in {\mathcal C}\}\) is the set of edges. Its complement \((\varSigma , {\mathcal C}^c)\) is called the dependency graph. As the notions of concurrent alphabet and independency graph are equivalent, in the sequel we refer indifferently to either of them. Informally, a concurrent alphabet lists the pairs of letters which may commute.

The trace monoid generated by a concurrent alphabet \((\varSigma ,{\mathcal C})\) is defined as the quotient monoid \(\varSigma ^*/ \equiv _{\mathcal C}\), where \(\equiv _{\mathcal C}\) is the smallest congruence extending the equations \(\{ab=ba : (a,b)\in {\mathcal C}\}\), and is denoted by \(\mathcal {M}(\varSigma ,{\mathcal C})\) or simply by \(\mathcal {M}\). Its elements are called traces and its subsets are named trace languages. In other words, a trace is an equivalence class of words with respect to the relation \(\equiv _{\mathcal C}\) given by the reflexive and transitive closure of the binary relation \(\sim _{\mathcal C}\) over \(\varSigma ^*\) such that \(uabv \sim _{\mathcal C}ubav\) for every \((a,b)\in {\mathcal C}\) and every \(u,v \in \varSigma ^*\). For any \(w\in \varSigma ^*\), we denote by [w] the trace represented by w; in particular \([\varepsilon ]\) is the empty trace, i.e. the unit of \(\mathcal {M}\). Note that the product of two traces \(r,s \in \mathcal {M}\), where \(r=[x]\) and \(s=[y]\), is the trace \(t= [xy]\), which does not depend on the representative words \(x,y\in \varSigma ^*\); we denote this product by \(t=r\cdot s\). The length of a trace \(t \in \mathcal {M}\), denoted by |t|, is the length of any representative word. For any \(n\in \mathbb {N}\), let \(\mathcal {M}_n := \{ t \in \mathcal {M}: |t| = n\}\) and \(m_n:=\# \mathcal {M}_n\).

Note that if \({\mathcal C}=\emptyset \) then \(\mathcal {M}\) reduces to \(\varSigma ^*\), while if \({\mathcal C}=\{(a,b)\in \varSigma \times \varSigma \mid a\ne b\}\) then \(\mathcal {M}\) is the commutative monoid of all monomials with letters in \(\varSigma \).

Any trace \(t \in \mathcal {M}\) can be represented by a partial order over the multiset of letters of t, denoted by \(\text{ PO }(t)\). It works as follows: first, consider a word w satisfying \(t=[w]\). Then, for any pair of letters (a, b) of w, let \(a_i\) be the i-th occurrence of the letter a and \(b_j\) the j-th occurrence of the letter b. The partial order is then defined by setting \(a_i < b_j\) whenever \(a_i\) precedes \(b_j\) in all representative words of [w]. (See Example 1 hereafter.)

A prefix of a trace \(t \in \mathcal {M}\) is a trace p such that \(t=p\cdot s\) for some \(s \in \mathcal {M}\). Clearly, any prefix of t is a trace \(p=[u]\) where u is a prefix of a representative of t. It is easy to see that if \(p=[u]\) is a prefix of t then \(\text{ PO }(u)\) is an order ideal of \(\text{ PO }(t)\), which can be represented by the corresponding antichain. We recall that an antichain of a partially ordered set \(({\mathcal S},\le )\) is a subset \(A\subseteq {\mathcal S}\) such that \(a\le b\) does not hold for any pair of distinct elements \(a,b\in A\), while an order ideal in \(({\mathcal S},\le )\) is a subset of the form \(\{a\in {\mathcal S}\mid \exists \ b\in A \text{ such } \text{ that } a \le b\}\) for some antichain A of \(({\mathcal S},\le )\). For every \(t \in \mathcal {M}\), we denote by \({\text {Pref}}(t)\) the set of all prefixes of t.

Example 1

Let \(\mathcal {M}\) be the trace monoid characterized by the following independency graph:

[Figure a: the independency graph on \(\{a,b,c,d\}\), with edges \(\{a,b\}\), \(\{b,c\}\), \(\{c,d\}\).]

That is, one has \({ab=ba, bc=cb, cd=dc}\). Then, the trace [bacda] (i.e., the equivalence class of the word bacda) is the set of words \(\{bacda, badca, abdca, abcda, acbda \}\). The corresponding partially ordered set is given by the following diagram

[Figure b: diagram of the partial order \(\text{ PO }([bacda])\).]

where an arrow from \(x_i\) to \(y_j\) means that \(x_i\) always precedes \(y_j\) and where we omitted the arrows implied by transitivity. The set of prefixes is given by

$$\begin{aligned} {\text {Pref}}([bacda]) \ = \{ [\varepsilon ], [{a}],[{b}], [{ab}], [a{c}], [a{bc}], [ab{d}], [ab{cd}], [abcd{a}] \}. \end{aligned}$$

In this set, we now overline the letters belonging to the antichain of each prefix: \(\{ [\varepsilon ], [\overline{a}],[\overline{b}], [\overline{ab}], [a\overline{c}], [a\overline{bc}], [ab\overline{d}], [ab\overline{cd}], [abcd\overline{a}] \}\). \(\blacksquare \)
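To make the correspondence between prefixes and order ideals concrete, the following Python sketch (ours, not part of the original text) builds the dependence order \(\text{ PO }([bacda])\) for the commutations of Example 1 and enumerates its order ideals by brute force; it should recover exactly the nine prefixes listed above.

```python
from itertools import combinations

# Independency relation of Example 1: ab = ba, bc = cb, cd = dc.
INDEP = {frozenset("ab"), frozenset("bc"), frozenset("cd")}

def dependence_order(word):
    """before[i][j] is True when the i-th occurrence precedes the j-th one in
    every representative word: i comes before j in `word`, their letters do not
    commute, and the relation is closed under transitivity."""
    n = len(word)
    before = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if frozenset((word[i], word[j])) not in INDEP:
                before[i][j] = True
    for k in range(n):                       # transitive closure
        for i in range(n):
            for j in range(n):
                if before[i][k] and before[k][j]:
                    before[i][j] = True
    return before

def prefixes(word):
    """Enumerate the order ideals (downward-closed sets) of PO([word])."""
    n = len(word)
    before = dependence_order(word)
    ideals = []
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            s = set(subset)
            if all(i in s for j in s for i in range(n) if before[i][j]):
                ideals.append(s)
    return ideals

if __name__ == "__main__":
    ideals = prefixes("bacda")
    print(len(ideals))                                    # expected: 9
    for s in sorted(ideals, key=lambda x: (len(x), sorted(x))):
        print("[" + "".join("bacda"[i] for i in sorted(s)) + "]")
```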

Recognizable, rational and context-free trace languages are well defined by means of linearization and closure operations over traditional string languages; their properties and in particular the complexity of their membership problems are widely studied in the literature (see for instance [8, 14, 15, 23]).

For any alphabet \(\varSigma \) and trace monoid \(\mathcal {M}\), we denote by \({\mathbb {Z}\langle \!\langle \varSigma \rangle \!\rangle }\) the set of formal series on words (thus series in noncommutative variables), by \({\mathbb {Z}\langle \!\langle \mathcal {M}\rangle \!\rangle }\) the set of formal series on traces (thus series in partially commutative variables), and by \({\mathbb {Z}[\![z]\!]}\) the ring of classical power series in the variable z. These three distinct rings (with the operations of sum and Cauchy product, see [5, 14, 24]) will be used in Sects. 4 and 5.

3 Asymptotic Results for the Number of Prefixes

Several algorithms are presented in the literature for the recognition of rational and context-free trace languages, or for other problems like computing the number of representative words of a trace, that take a trace t as input and then carry out some operations on all prefixes of t [3, 6,7,8,9, 15, 23]. Thus, their time and space complexity strictly depend on the number of prefixes of t and in many cases they work just in time \(\varTheta (\#{\text {Pref}}(t))\). Now, it follows from [8] that

$$\begin{aligned} \max \{\#{\text {Pref}}(t) : t \in \mathcal {M}_n\} = \varTheta (n^{\alpha }), \end{aligned}$$
(1)

where \(\alpha \) is the size of the largest clique in the independency graph of \(\mathcal {M}\). It is thus essential to get a more refined analysis of the asymptotic behaviour of \(\#{\text {Pref}}(t)\) under natural distribution models in order to obtain a better understanding of the average complexity of all these algorithms.

In this section, we recall the main results on the number of prefixes of a random trace, under two different probabilistic models.

3.1 Probabilistic Analysis on Equiprobable Words

A main goal of the present contribution is to compare the random variables \(T_n\) and \(W_n\), defined by

$$\begin{aligned} T_n = \#{\text {Pref}}(t)\quad \text {and} \quad W_n = \#{\text {Pref}}([w]), \end{aligned}$$
(2)

where t is uniformly distributed over \(\mathcal {M}_n\), while w is uniformly distributed over \(\varSigma ^n\). Clearly, the properties of \(T_n\) and \(W_n\) immediately yield results on the time complexity of the algorithms described in [3, 6, 7] assuming, respectively, equiprobable input traces of length n and equiprobable representative words of length n. Since every trace of length n has at least \(n+1\) prefixes, and (1) provides an upper bound, a first crude asymptotic bound is

$$\begin{aligned} n+1 \le T_n \le dn^{\alpha }, \quad n+1 \le W_n \le dn^{\alpha } \qquad (\forall \ n\in \mathbb {N}), \end{aligned}$$

for a suitable constant \(d>0\), where \(\alpha \) is defined as in (1). More precise results on the moments of \(W_n\) are studied in [6, 7, 20]:

$$\begin{aligned} {\text {E}}[W_n^j] = \varTheta (n^{jk})\ \quad \forall \ j \in \mathbb {N}, \end{aligned}$$
(3)

where k is the number of connected components of the dependency graph of \(\mathcal {M}\). This relation is obtained by relating each moment of \(W_n\), through suitable bijections, to the number of words of length n in an appropriate regular language [6]. These bijections also allow proving a first-order cancellation of the variance, i.e. \(\text{ var }(W_n) = O(n^{2k-1})\) [20]. Further, when the dependency graph is transitive, this leads to two different limit laws, either chi-squared or Gaussian, according to whether all the connected components of \((\varSigma ,{\mathcal C}^c)\) have the same size or not [19].
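As an elementary illustration of this behaviour (not taken from the paper), consider the commutative monoid over two letters \(\{a,b\}\): its dependency graph has the \(k=2\) components \(\{a\}\) and \(\{b\}\), and a trace with i occurrences of a and j of b has exactly \((i+1)(j+1)\) prefixes. The following Python sketch estimates \({\text {E}}[W_n]\) by sampling words uniformly and exhibits the quadratic growth.

```python
import random

def prefix_count(word):
    # For the commutative monoid over {a, b}: a trace with i letters 'a' and
    # j letters 'b' has (i + 1) * (j + 1) prefixes.
    i, j = word.count("a"), word.count("b")
    return (i + 1) * (j + 1)

def estimate_expected_Wn(n, samples=20000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        w = "".join(rng.choice("ab") for _ in range(n))
        total += prefix_count(w)
    return total / samples

if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        e = estimate_expected_Wn(n)
        # the last column should stabilise (around 1/4 for this alphabet)
        print(n, round(e, 1), round(e / n**2, 3))
```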

3.2 Probabilistic Analysis on Equiprobable Traces

Now, in order to analyse \(T_n\) (the number of prefixes of a random trace of size n), it is useful to introduce the generating function of the trace monoid \(\mathcal {M}\):

$$\begin{aligned} M(z):=\sum _{n\in \mathbb {N}} m_n z^n, \qquad \text {with }m_n:=\#\mathcal {M}_n = \#\{t\in \mathcal {M}: |t| = n\}. \end{aligned}$$

The Möbius function of \(\mathcal {M}\) is defined as \(\mu _{\mathcal {M}} :=\sum _{t\in \mathcal {M}} \mu _{\mathcal {M}} (t) \, t\), where

$$\begin{aligned} \mu _{\mathcal {M}} (t) = \left\{ \begin{array}{ll} 1 & \text { if } t=[\varepsilon ], \\ (-1)^n & \text { if } t=[a_1a_2\cdots a_n], \text { where all } a_i\in \varSigma \text { are distinct and } (a_i,a_j)\in {\mathcal C}\text { for any } i\ne j, \\ 0 & \text { otherwise.} \end{array} \right. \end{aligned}$$

It is in fact a polynomial belonging to \({\mathbb {Z}\langle \!\langle \mathcal {M}\rangle \!\rangle }\). As established by Cartier and Foata in [10], an important property of \(\mu _{\mathcal {M}}\) is that

$$\begin{aligned} \xi _{\mathcal {M}} \cdot \mu _{\mathcal {M}} = \mu _{\mathcal {M}}\cdot \xi _{\mathcal {M}} = 1, \end{aligned}$$
(4)

where \(\xi _{\mathcal {M}} = \sum _{t \in \mathcal {M}} t \) is the characteristic series of \(\mathcal {M}\). Here, \(\xi _\mathcal {M}\) can be seen as a partially commutative analogue of M(z).

Now, let \(p_{\mathcal {M}}\in \mathbb {Z}[z]\) be the commutative analogue of \(\mu _{\mathcal {M}}\). It then follows that

$$\begin{aligned} p_{\mathcal {M}} (z) = 1 - c_1z + c_2 z^2 - \cdots + (-1)^{\alpha }c_{\alpha }z^{\alpha }, \end{aligned}$$
(5)

where \(c_i\) is the number of cliques of size i in the independency graph of \(\mathcal {M}\). For this reason, we call \(p_{\mathcal {M}}\) the clique polynomial of the independency graph \((\varSigma ,C)\). Its properties are studied in several papers (see for instance [18, 21]). In particular, the commutative analogue of Eq. (4) is then

$$\begin{aligned} M(z) \cdot p_{\mathcal {M}}(z) = p_{\mathcal {M}}(z) \cdot M(z) = 1. \end{aligned}$$
(6)

This entails that \(M(z)=\left( p_{\mathcal {M}}(z)\right) ^{-1}\), a fundamental identity which can also be derived by an inclusion-exclusion principle.

As it is known from [21] that \(p_{\mathcal {M}}\) has a unique root \(\rho \) of smallest modulus (and clearly \(\rho >0\) via Pringsheim’s theorem, see [17]), one gets \(m_n=\#\mathcal {M}_n = c \rho ^{-n} n^{\ell -1} + O\left( \rho ^{-n}n^{\ell -2}\right) \), where \(c>0\) is a constant and \(\ell \) is the multiplicity of \(\rho \) in \(p_{\mathcal {M}}(z)\). We observe that the existence of a unique root of smallest modulus for \(p_{\mathcal {M}}(z)\) is not a consequence of the strict monotonicity of the sequence \(\{m_n\}\). Indeed, if one considers \(M(z)=\frac{1}{(1-z^3)(1-z)^2}\), one has \(m_{n+3}= ((n + 5) m_n + 2 m_{n + 1} + 2 m_{n + 2})/(n + 3)\), so the sequence \(\{m_n\}\) is strictly increasing; however, the polynomial \((1-z^3)(1-z)^2\) has 3 distinct roots of smallest modulus. Therefore, such an M(z) cannot be the generating function of a trace monoid.
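The identity (6) translates into a simple linear recurrence for the coefficients of \(M(z)\): since \(p_{\mathcal {M}}(0)=1\), one has \(m_n = -\sum _{i\ge 1} p_i m_{n-i}\), where \(p_i\) denotes the coefficient of \(z^i\) in \(p_{\mathcal {M}}(z)\). The following Python sketch (ours, for illustration only) expands \(1/p(z)\) in this way; applied to the polynomial \((1-z^3)(1-z)^2\) of the counterexample above, it confirms both the strict monotonicity of the resulting sequence and the recurrence quoted in the text.

```python
from fractions import Fraction

def series_inverse(p, n_terms):
    """Coefficients of 1/p(z) up to order n_terms - 1, assuming p[0] == 1.
    From sum_i p_i * m_{n-i} = [n == 0] one gets m_n = -sum_{i>=1} p_i m_{n-i}."""
    m = [Fraction(0)] * n_terms
    m[0] = Fraction(1)
    for n in range(1, n_terms):
        m[n] = -sum(Fraction(p[i]) * m[n - i]
                    for i in range(1, min(n, len(p) - 1) + 1))
    return m

if __name__ == "__main__":
    # (1 - z^3)(1 - z)^2 = 1 - 2z + z^2 - z^3 + 2z^4 - z^5
    p = [1, -2, 1, -1, 2, -1]
    m = series_inverse(p, 30)
    # strictly increasing, and satisfies the recurrence quoted in the text
    assert all(m[n] < m[n + 1] for n in range(29))
    assert all(m[n + 3] == ((n + 5) * m[n] + 2 * m[n + 1] + 2 * m[n + 2]) / (n + 3)
               for n in range(27))
    print([int(x) for x in m[:10]])
```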

In our context, clique polynomials are particularly relevant as they are related to the average value of the number of prefixes of traces [7, 21]. In fact, for any trace monoid \(\mathcal {M}\), we have \({\text {E}}[T_n] = \frac{P_n}{m_n}\), where \(P_n = \sum _{t\in \mathcal {M}_n}\#{\text {Pref}}(t) \). Since \(\xi _{\mathcal {M}}^2 = \sum _{t\in \mathcal {M}} \#{\text {Pref}}(t) t\), from (4) and (6) its commutative analogue becomes \(\sum _n P_n z^n = p_{\mathcal {M}}(z)^{-2}\) and hence \(P_n = \varTheta (\rho ^{-n} n^{2\ell -1})\), which proves

$$\begin{aligned} {\text {E}}[T_n] = \varTheta (n^{\ell }), \end{aligned}$$
(7)

where \(\ell \) is the multiplicity of the smallest root of \(p_{\mathcal {M}}(z)\).
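These identities are easy to check numerically: once the coefficients of \(1/p_{\mathcal {M}}(z)\) and \(1/p_{\mathcal {M}}(z)^2\) are available, \({\text {E}}[T_n]=P_n/m_n\) follows by a division. Below is a small Python sketch (not from the paper) for the path-shaped independency graph of Example 1, whose clique polynomial is \(1-4z+3z^2\) (four letters, three commuting pairs, no commuting triple), so that \(\ell =1\) and \({\text {E}}[T_n]\) should grow linearly.

```python
def inverse_series(p, n_terms):
    """Coefficients of 1/p(z), assuming p[0] == 1."""
    m = [0] * n_terms
    m[0] = 1
    for n in range(1, n_terms):
        m[n] = -sum(p[i] * m[n - i] for i in range(1, min(n, len(p) - 1) + 1))
    return m

def convolve(a, b):
    """Coefficients of the product of two truncated series (same truncation)."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(len(a))]

if __name__ == "__main__":
    p = [1, -4, 3]                 # clique polynomial of the path a-b-c-d
    N = 50
    m = inverse_series(p, N)       # m_n = #M_n
    P = convolve(m, m)             # sum_n P_n z^n = 1/p(z)^2
    for n in (10, 20, 40):
        # E[T_n] = P_n / m_n; the last column should stabilise (around 3/2 here)
        print(n, P[n] / m[n], P[n] / m[n] / n)
```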

4 Cross-Sections of Trace Monoids

Cross-sections are standard tools to study the properties of trace monoids by lifting the analysis to the level of free monoids. Intuitively, a cross-section of a trace monoid \(\mathcal {M}\) is a language \({\mathcal L}\) having exactly one representative string for each trace in \(\mathcal {M}\). Thus, the generating function of \({\mathcal L}\) coincides with M(z) and hence satisfies equality (6). As a consequence, by choosing an appropriate regular cross-section \({\mathcal L}\), one can use the properties of a finite state automaton recognizing \({\mathcal L}\) to study the singularities of M(z), i.e. the roots of \(p_{\mathcal {M}}(z)\).

Formally, a cross-section of a trace monoid \(\mathcal {M}\) over a concurrent alphabet \((\varSigma ,C)\) is a language \({\mathcal L}\subseteq \varSigma ^*\) such that

  • for each trace \(t\in \mathcal {M}\), there exists a word \(w\in {\mathcal L}\) such that \(t=[w]\),

  • for each pair of words \(x, y \in {\mathcal L}\), if \([x]=[y]\) then \(x=y\).

Among all cross-sections of \(\mathcal {M}\), it is convenient to consider a canonical one. A natural choice is based on the normal form given by the lexicographic order [1]. Alternatively, one can see it as based on the orientations of edges in the independency graph of \(\mathcal {M}\), as used in [12, 13] to study properties of Möbius functions in trace monoids. It works as follows. Let \(\le \) be any total order on the alphabet \(\varSigma \) and let \(\le ^*\) be the lexicographic linear order induced by \(\le \) over \(\varSigma ^*\). We denote by \(<_{\mathcal C}\) the binary relation over \(\varSigma \) such that \(a<_{\mathcal C}b\) if \((a,b)\in {\mathcal C}\) and \(a\le b\). Thus, \(<_{\mathcal C}\) is an orientation of the independency graph of \(\mathcal {M}\). We now consider the following cross-section of \(\mathcal {M}\): the language \({\mathcal L}_{\le }\) of all lexicographically minimal representatives of traces in \(\mathcal {M}\), i.e. \({\mathcal L}_{\le } = \{w\in \varSigma ^* \mid w \le ^* y \text{ for } \text{ every } y\in [w] \} \). Moreover, \({\mathcal L}_{\le }\) is regular, as it satisfies the equality

$$\begin{aligned} {\mathcal L}_{\le } = \varSigma ^* \backslash \bigcup _{\begin{array}{c} (a,b)\in {\mathcal C}\\ a<_{\mathcal C}b \end{array}} \varSigma ^* b\, {\mathcal C}_a^*\, a \, \varSigma ^*, \end{aligned}$$
(8)

where \({\mathcal C}_a := \{c \in \varSigma \mid (a,c)\in {\mathcal C}\}\) is the set of letters allowed to commute with a. Thus, \({\mathcal L}_{\le }\) is the set of all words in \(\varSigma ^*\) that do not contain any factor of the form bva with \(a<_{\mathcal C}b\) and \(v\in {\mathcal C}_a^*\). Then, for any \(w\in \varSigma ^*\), in order to verify whether \(w\in {\mathcal L}_{\le }\), one can read the letters of w from left to right, updating at each step the family of letters \(a\in \varSigma \) whose occurrence would complete a “forbidden” factor of the form bva, with \(a<_{\mathcal C}b\), \(v\in {\mathcal C}_a^*\). If one of these letters is read then w is rejected, otherwise it is accepted.

To formalize the definition, for each \(b\in \varSigma \), the predecessors of b are \({\text {Pred}}(b)=\{a\in \varSigma \mid a <_{\mathcal C}b \}\). Define the finite state automaton \(\mathcal{A}\) as the 4-tuple \((2^{\varSigma },\emptyset ,\delta ,F)\), where the set of states is \(2^{\varSigma }\), i.e. the power set of \(\varSigma \), the initial state is the empty set \(\emptyset \), \(F=\{{\mathcal S}\in 2^{\varSigma }\mid {\mathcal S}\ne \varSigma \}\) is the family of final states and the transition function \(\delta :(2^{\varSigma }\times \varSigma ) \rightarrow 2^{\varSigma }\) is given by

$$\begin{aligned} \delta ({\mathcal S},b) = \left\{ \begin{array}{ll} \varSigma & \text{ if } b\in {\mathcal S}, \\ {\text {Pred}}(b) \cup ({\mathcal S}\cap {\mathcal C}_{b}) & \text{ otherwise } \end{array} \right. \qquad (\forall \ {\mathcal S}\subseteq \varSigma , \ \forall \ b \in \varSigma ). \end{aligned}$$

Note that, during a computation, the current state \({\mathcal S}\) represents the set of forbidden letters. At the beginning, all input letters are allowed, as \(\emptyset \) is the initial state, while \(\varSigma \) is a trap state, where all letters are forbidden. In a general step, if \({\mathcal S}\subseteq \varSigma \) is the current state and \(b\notin {\mathcal S}\) is an input letter, the new set of forbidden letters must be obtained from \({\mathcal S}\cup {\text {Pred}}(b)\) by removing those elements that do not commute with b. This justifies the above definition of \(\delta \) and it is clear that \(\mathcal A\) recognizes \({\mathcal L}_{\le }\).
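The following Python sketch (ours, for illustration only) implements this automaton for the concurrent alphabet of Example 1 with the order \(a<b<c<d\): the current state is exactly the set of forbidden letters maintained by \(\delta \).

```python
# Sketch (not from the paper): the automaton A of Sect. 4, which accepts exactly
# the lexicographically minimal representatives.  We use the concurrent alphabet
# of Example 1 (a-b, b-c, c-d commute) with the total order a < b < c < d.

SIGMA = "abcd"
INDEP = {frozenset("ab"), frozenset("bc"), frozenset("cd")}

def commuting(letter):
    """C_letter: the letters allowed to commute with `letter`."""
    return {x for x in SIGMA if frozenset((x, letter)) in INDEP}

def predecessors(letter):
    """Pred(letter) = {x | x <_C letter}: smaller letters commuting with it."""
    return {x for x in commuting(letter) if x < letter}

def delta(state, letter):
    """Transition function: `state` is the current set of forbidden letters."""
    if letter in state:
        return set(SIGMA)                     # trap state: reject from now on
    return predecessors(letter) | (state & commuting(letter))

def accepts(word):
    state = set()                             # initial state: nothing forbidden
    for letter in word:
        state = delta(state, letter)
    return state != set(SIGMA)                # every non-trap state is final

if __name__ == "__main__":
    print(accepts("abcda"))   # True: lexicographically minimal representative
    print(accepts("bacda"))   # False: contains the forbidden factor "ba"
```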

Moreover, the state set of the above automaton can be reduced to the states \({\mathcal S}\subsetneq \varSigma \) reachable from \(\emptyset \). Setting

$$\begin{aligned} Q=\{ {\mathcal S}\subseteq \varSigma \mid {\mathcal S}\ne \varSigma , \exists \ w\in \varSigma ^* : \delta (\emptyset ,w)={\mathcal S}\}, \end{aligned}$$

the entries of the transition matrix \(\widetilde{A}\) of the automaton \(\mathcal A\) are given by:

$$\begin{aligned} \widetilde{A}_{{\mathcal S},{\mathcal S}'} = \sum _{b \in \varSigma : \delta ({\mathcal S},b)={\mathcal S}'} b \qquad (\forall \ {\mathcal S},{\mathcal S}'\in Q). \end{aligned}$$

The commutative analogue in \(\mathbb {N}[[z]]\) of this transition matrix therefore has all its entries equal to monomials of degree one in z. Factorizing by z, this commutative analogue can thus be written zA, for a matrix A that we call the adjacency matrix of \({\mathcal A}\). Note that A strictly depends on both the concurrent alphabet \((\varSigma ,{\mathcal C})\) and the total order \(\le \) over \(\varSigma \).

As a consequence, since \(\mathcal A\) recognizes a cross-section of \(\mathcal {M}\), denoting by \(\pi \) and \(\eta \), respectively, the characteristic (column) vectors of \(\emptyset \) and Q, the generating function M(z) is given by

$$\begin{aligned} M(z) = \sum _{n=0}^{+\infty } \pi ' A^n \eta z^n = \pi ' (I-zA)^{-1}\eta , \end{aligned}$$
(9)

where I is the identity matrix of size \(\#Q\times \#Q\) and \(\pi '\) is the transpose of \(\pi \). This identity, together with relation (6), proves the following proposition.

Proposition 1 (Factorisation property)

For any trace monoid \(\mathcal {M}\) with a concurrent alphabet \((\varSigma ,{\mathcal C})\), let \(\le \) be a total order on \(\varSigma \), let A be the adjacency matrix of the automaton \(\mathcal A\) recognizing the cross-section \({\mathcal L}_{\le }\) of \(\mathcal {M}\), and assume I, \(\pi \) and \(\eta \) defined as in (9). Then, M(z) and \(p_{\mathcal {M}}(z)\) satisfy the identities

$$\begin{aligned} M(z) = \pi ' (I-zA)^{-1}\eta = \frac{\pi ' {\text {adj}}(I-zA)\eta }{\det (I-zA)}\ ,\ p_{\mathcal {M}}(z) = \frac{\det (I-zA)}{\pi ' {\text {adj}}(I-zA)\eta }. \end{aligned}$$
(10)
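As a concrete check of this factorisation property, the following Python sketch (ours, not from the paper) builds the reachable non-trap states Q and the adjacency matrix A for the concurrent alphabet of Example 1 with the order \(a<b<c<d\), and compares the coefficients \(\pi ' A^n \eta \) with those obtained from \(1/p_{\mathcal {M}}(z)\), where \(p_{\mathcal {M}}(z)=1-4z+3z^2\) for this alphabet (four letters, three commuting pairs, no commuting triple).

```python
from collections import deque

SIGMA = "abcd"
INDEP = {frozenset("ab"), frozenset("bc"), frozenset("cd")}

def commuting(x):
    return frozenset(y for y in SIGMA if frozenset((x, y)) in INDEP)

def predecessors(x):
    return frozenset(y for y in commuting(x) if y < x)

def delta(state, x):
    return frozenset(SIGMA) if x in state else predecessors(x) | (state & commuting(x))

# breadth-first search of the states reachable from the empty set (trap excluded)
start, trap = frozenset(), frozenset(SIGMA)
Q, queue = [start], deque([start])
while queue:
    s = queue.popleft()
    for x in SIGMA:
        t = delta(s, x)
        if t != trap and t not in Q:
            Q.append(t)
            queue.append(t)

index = {s: i for i, s in enumerate(Q)}
A = [[0] * len(Q) for _ in Q]                     # adjacency matrix of the automaton
for s in Q:
    for x in SIGMA:
        t = delta(s, x)
        if t != trap:
            A[index[s]][index[t]] += 1

def automaton_counts(n_terms):
    """m_n = pi' A^n eta: number of accepted words of each length."""
    row = [1 if s == start else 0 for s in Q]     # pi'
    counts = []
    for _ in range(n_terms):
        counts.append(sum(row))                   # eta is the all-ones vector on Q
        row = [sum(row[i] * A[i][j] for i in range(len(Q))) for j in range(len(Q))]
    return counts

def from_clique_polynomial(p, n_terms):
    m = [1]
    for n in range(1, n_terms):
        m.append(-sum(p[i] * m[n - i] for i in range(1, min(n, len(p) - 1) + 1)))
    return m

if __name__ == "__main__":
    print(automaton_counts(8))
    print(from_clique_polynomial([1, -4, 3], 8))   # the two lists should coincide
```

For this alphabet the reachable non-trap states turn out to be \(\emptyset \), \(\{a\}\), \(\{b\}\), \(\{c\}\) and \(\{a,c\}\), and both printed sequences should start with 1, 4, 13, 40.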

Example 2

Consider the concurrent alphabet \((\varSigma ,{\mathcal C})\) defined by the graph

[Figure c: the independency graph \((\varSigma ,{\mathcal C})\) over \(\varSigma =\{a,b,c,d,e\}\).]

Then, the clique polynomial and the generating function of \(\mathcal {M}\) are given by

$$\begin{aligned} p_{\mathcal {M}}(z) = 1-5z+6z^2 - z^3, \quad M(z) = \sum _{n=0}^{+\infty } m_n z^n = \frac{1}{1-5z+6z^2 - z^3}. \end{aligned}$$

The standard ordering (abcde) on \(\varSigma \) induces the following (non-transitive) orientation \(<_{\mathcal C}\) over the independency graph

[Figure d: the orientation \(<_{\mathcal C}\) of the independency graph induced by the order \(a<b<c<d<e\).]

Thus the predecessors of each letter are given by \({\text {Pred}}(a)={\text {Pred}}(b)=\emptyset \), \({\text {Pred}}(c)={\text {Pred}}(d)=\{a,b\}\), \({\text {Pred}}(e)=\{b,c\}\) and the transition matrix of \(\mathcal A\) is defined by the following table, where rows and columns are labelled by the states of \(\mathcal A\):

  

$$\begin{aligned} \widetilde{A} \ = \ \begin{array}{c|cccc} & \emptyset & \{a,b\} & \{b,c\} & \{c\} \\ \hline \emptyset & a+b & c+d & e & 0 \\ \{a,b\} & 0 & c+d & e & 0 \\ \{b,c\} & 0 & d & e & a \\ \{c\} & 0 & d & e & a+b \end{array} \end{aligned}$$

From this table, \(I-zA\) is easily computed (where A is the adjacency matrix of \({\mathcal A}\)):

$$\begin{aligned} I-zA = \left[ \begin{array}{cccc} 1-2z & -2z & -z & 0 \\ 0 & 1-2z & -z & 0 \\ 0 & -z & 1-z & -z \\ 0 & -z & -z & 1-2z \end{array} \right] \end{aligned}$$

and, accordingly, \(\det (I-zA) = 1-7z+16z^2-13z^3+2z^4 = (1-2z) p_{\mathcal {M}}(z)\). \(\blacksquare \)
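The determinant computation at the end of Example 2 is also easy to verify mechanically; here is a short sketch (assuming the sympy library is available) for the matrix above:

```python
import sympy as sp

z = sp.symbols('z')
# I - zA for Example 2 (rows/columns ordered as the states {}, {a,b}, {b,c}, {c})
M = sp.Matrix([
    [1 - 2*z, -2*z,     -z,    0      ],
    [0,        1 - 2*z, -z,    0      ],
    [0,       -z,        1 - z, -z    ],
    [0,       -z,       -z,     1 - 2*z],
])
p = 1 - 5*z + 6*z**2 - z**3             # clique polynomial of Example 2
print(sp.factor(M.det()))                # an equivalent form of (1 - 2z) * p_M(z)
print(sp.simplify(M.det() - (1 - 2*z) * p) == 0)   # expected: True
```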

Proposition 2

For any trace monoid \(\mathcal {M}\) over a concurrent alphabet \((\varSigma ,C)\) and any total order \(\le \) on \(\varSigma \), all roots of the clique polynomial \(p_{\mathcal {M}}(z)\) are reciprocals of eigenvalues of the corresponding adjacency matrix A. More precisely, the clique polynomial of any independency graph \((\varSigma ,{\mathcal C})\) is of the form

$$\begin{aligned} p_{\mathcal {M}}(z) = \prod _{i=1}^{\alpha }(1-x_i z) \end{aligned}$$

where \(\alpha \) is the size of the maximum clique in \((\varSigma ,{\mathcal C})\) and all the \(x_i\)’s are eigenvalues of the adjacency matrix A.

Proof (sketch)

The result follows from Proposition 1 by refining equalities (10) and recalling that all roots of clique polynomials are different from 0. \(\square \)

We observe that the converse does not hold in general, i.e. it may happen that an eigenvalue of A is not the reciprocal of a root of \(p_{\mathcal {M}}(z)\). However, as shown in the following section, the converse statement is true whenever the graph \((\varSigma ,{\mathcal C})\) admits a transitive orientation.

5 Concurrent Alphabets with Transitive Orientation

Now let us consider a trace monoid \(\mathcal {M}\) such that its independency graph \((\varSigma ,{\mathcal C})\) admits a transitive orientation. Then, we may fix a total order \(\le \) on \(\varSigma \) such that \(<_{\mathcal C}\) is transitive. In this case, the definition of the cross-section \({\mathcal L}_{\le }\) and of the automaton \(\mathcal A\) can be simplified, since the set of “forbidden” factors of the form bwa, with \(a<_{\mathcal C}b\) and \(w\in {\mathcal C}_a^*\), can be reduced to the simple set of words \({\mathcal S}= \{\tau \sigma \in \varSigma ^2 \mid \sigma <_{\mathcal C}\tau \}\). To prove this property, consider a shortest factor of the above form bwa, with \(a<_{\mathcal C}b\) and \(w\in {\mathcal C}_a^*\), occurring in a given word. If w is empty, then ba itself belongs to \({\mathcal S}\). Otherwise, let c be the first symbol of w, so that \((a,c)\in {\mathcal C}\) and hence either \(a<_{\mathcal C}c\) or \(c <_{\mathcal C}a\): the first case contradicts the minimality of the chosen factor, since writing \(w=cw'\) the word \(cw'a\) would be a shorter factor of the same form; in the second case, by transitivity of \(<_{\mathcal C}\) we have \(c <_{\mathcal C}b\) and hence the factor bc belongs to \({\mathcal S}\).

Thus, identity (8) can be simplified as \({\mathcal L}_{\le } = \varSigma ^* \backslash \bigcup _{a<_{\mathcal C}b} \varSigma ^* b a \varSigma ^*.\) Moreover, the state set of the automaton \(\mathcal A\) can be reduced to \(Q = \{{\text {Pred}}(a)\mid a \in \varSigma \}\) and the transition function now assumes values \(\delta ({\mathcal S},b)={\text {Pred}}(b)\), for every \({\mathcal S}\in Q\) and every \(b\in \varSigma \backslash {\mathcal S}\).
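The reduced construction is small enough to be spelled out directly. The following Python sketch (ours, not from the paper) builds it for the concurrent alphabet of Example 2, whose commutations (read off from the \({\text {Pred}}\) sets given there) make c and d commute with a and b, and e commute with b and c, using the total order \(c<d<a<e<b\) of Example 3 below; it then tests primitivity of the adjacency matrix by checking that some power is entrywise positive.

```python
SIGMA = "abcde"
INDEP = {frozenset(p) for p in ("ac", "ad", "bc", "bd", "be", "ce")}
ORDER = "cdaeb"                                    # total order c < d < a < e < b
rank = {x: i for i, x in enumerate(ORDER)}

def pred(x):
    """Pred(x) = {y | y <_C x}: commuting letters smaller than x."""
    return frozenset(y for y in SIGMA
                     if frozenset((x, y)) in INDEP and rank[y] < rank[x])

# states of the reduced automaton and its adjacency matrix
Q = sorted({pred(x) for x in SIGMA}, key=lambda s: (len(s), sorted(s)))
index = {s: i for i, s in enumerate(Q)}
A = [[0] * len(Q) for _ in Q]
for s in Q:
    for b in SIGMA:
        if b not in s:                             # delta(S, b) = Pred(b) for b not in S
            A[index[s]][index[pred(b)]] += 1

def is_primitive(A, max_power=None):
    """Primitivity test: does some power of A have all entries positive?"""
    n = len(A)
    max_power = max_power or (n - 1) ** 2 + 1      # Wielandt's bound
    B = [row[:] for row in A]
    for _ in range(max_power):
        if all(x > 0 for row in B for x in row):
            return True
        B = [[sum(B[i][k] * A[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return False

if __name__ == "__main__":
    for s, row in zip(Q, A):
        print(sorted(s), row)
    print("primitive:", is_primitive(A))           # expected True for this example
```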

Proposition 3

Let \((\varSigma ,{\mathcal C})\) be a concurrent alphabet with an associated independency graph admitting a transitive orientation \(<_{\mathcal C}\). Let \(\le \) be a total order on \(\varSigma \) extending \(<_{\mathcal C}\). Also assume that the dependency graph \((\varSigma ,{\mathcal C}^c)\) is connected. Then the adjacency matrix A is primitive.

Proof (sketch)

Under these hypotheses, by the simplifications above, it turns out that the state diagram of the automaton \(\mathcal A\) (defined by \(\le \)) is strongly connected and has at least one loop. \(\square \)

The hypothesis of transitivity for \(<_{\mathcal C}\) cannot be dropped if one wants to guarantee that A is primitive. For instance, in Example 2 the dependency graph \((\varSigma ,{\mathcal C}^c)\) is connected but the orientation \(<_{\mathcal C}\) of \((\varSigma ,{\mathcal C})\) is not transitive, and indeed the corresponding transition matrix is not irreducible, hence A is not primitive. Nevertheless, the smallest root of \(p_{\mathcal {M}}(z)\) is simple, so the same concurrent alphabet still satisfies the conclusion of the following theorem.

Theorem 4

Let \((\varSigma ,{\mathcal C})\) be a concurrent alphabet. If its independency graph admits a transitive orientation \(<_{\mathcal C}\), then one has \(\ell \le k\), where \(\ell \) and k denote, respectively, the multiplicity of the smallest root of \(p_{\mathcal {M}}(z)\) and the number of connected components of the dependency graph \((\varSigma ,{\mathcal C}^c)\).

Proof (sketch)

First, it is well-known [18, 21] that \(p_{\mathcal {M}}(z)\) is always the product of the clique polynomials of all independency subalphabets given by the connected components of \((\varSigma ,C^c)\). Then, each of these clique polynomials (using the additional condition that one has a transitive orientation) has a smallest root of multiplicity 1: this follows from Proposition 3 and a commutative analogue of a result in [11] stating that, when \((\varSigma ,C)\) has a transitive orientation, its clique polynomial equals \(\det (I-zA)\). \(\square \)
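The factorisation used in the first step of this proof is easy to verify on small instances. The following Python sketch (ours, with an ad hoc alphabet not taken from the paper) computes the clique polynomial of an independency graph and the connected components of its dependency graph, and checks that the former equals the product of the clique polynomials of the induced subalphabets; the test case is \(\varSigma =\{a,b,c,d\}\) where each of a, b commutes with each of c, d.

```python
from itertools import combinations

def clique_polynomial(vertices, indep_edges):
    """Coefficients [1, -c_1, c_2, ...] of the clique polynomial."""
    coeffs = [1]
    for size in range(1, len(vertices) + 1):
        c = sum(1 for s in combinations(vertices, size)
                if all(frozenset(p) in indep_edges for p in combinations(s, 2)))
        if c == 0:
            break
        coeffs.append((-1) ** size * c)
    return coeffs

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def components_of_complement(vertices, indep_edges):
    """Connected components of the dependency graph (Sigma, C^c)."""
    remaining, comps = set(vertices), []
    while remaining:
        comp, stack = set(), [next(iter(remaining))]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(u for u in remaining
                         if u != v and frozenset((u, v)) not in indep_edges)
        remaining -= comp
        comps.append(sorted(comp))
    return comps

if __name__ == "__main__":
    # every letter of {a, b} commutes with every letter of {c, d}
    sigma = "abcd"
    indep = {frozenset(p) for p in ("ac", "ad", "bc", "bd")}
    comps = components_of_complement(sigma, indep)
    product = [1]
    for comp in comps:
        product = poly_mul(product, clique_polynomial(comp, indep))
    print(comps)                              # the components {a,b} and {c,d}
    print(clique_polynomial(sigma, indep))    # [1, -4, 4], i.e. (1 - 2z)^2
    print(product)                            # should coincide with the line above
```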

Applying the previous theorem to relations (3) and (7), one gets the following.

Theorem 5

Let \((\varSigma ,{\mathcal C})\) be a concurrent alphabet. If its independency graph admits a transitive orientation \(<_{\mathcal C}\), then the random variables counting the number of prefixes in traces (as defined in (2)) satisfy \({\text {E}}[T_n] = O({\text {E}}[W_n])\).

Example 3

Consider the concurrent alphabet \((\varSigma ,{\mathcal C})\) and the orientation \(<_{\mathcal C}\) of Example 2. Note that \((\varSigma ,{\mathcal C})\) is connected but \(<_{\mathcal C}\) is not transitive and in fact A is not primitive. However, \((\varSigma ,{\mathcal C})\) admits a (different) orientation that is transitive, given by

[Figure e: a transitive orientation of the independency graph of Example 2.]

A total order extending the previous orientation is \(c<d<a<e<b\). Computing matrix A with respect to this total order we obtain

$$\begin{aligned} I-zA = \left[ \begin{array}{cccc} 1-2z & -z & -z & -z \\ 0 & 1-z & -z & -z \\ -z & -z & 1-z & -z \\ 0 & -z & 0 & 1-z \end{array} \right] , \end{aligned}$$

and hence \(\det (I-zA) = 1-5z+6z^2-z^3 = p_{\mathcal {M}}(z)\). \(\blacksquare \)

The following example considers an independency graph of \(\mathcal {M}\) that does not admit any transitive orientation. In this case \(p_{\mathcal {M}}(z)\) is a proper factor of \(\det (I-zA)\), but its smallest root is again simple and hence \(\ell \le k\) is still true even if the hypothesis of Theorem 4 is not satisfied.

Example 4

Consider the concurrent alphabet corresponding to the following independency graph \({\mathcal G}\), together with the following orientation \(<_{\mathcal C}\):

[Figure f: the independency graph \({\mathcal G}\) with its orientation \(<_{\mathcal C}\).]

Thus the transition matrix, defined according to Sect. 4, is given by the following table:

  

$$\begin{aligned} \widetilde{A} \ = \ \begin{array}{c|ccccc} & \emptyset & \{a,b\} & \{a\} & \{b,d\} & \{d\} \\ \hline \emptyset & a+b & c & d & e & 0 \\ \{a,b\} & 0 & c & d & e & 0 \\ \{a\} & b & c & d & e & 0 \\ \{b,d\} & 0 & c & 0 & e & a \\ \{d\} & b & c & 0 & e & a \end{array} \end{aligned}$$

Accordingly, one has \(\det (I-zA) =1-6z+10z^2-5z^3 = (1-z) p_{\mathcal {M}}(z).\) \(\blacksquare \)

6 Conclusion

We have investigated the fundamental role played by the clique polynomial in asymptotic studies of trace monoids. Building on the factorization property stated in Proposition 1, we obtained a link between the multiplicity of its smallest root and the number of connected components of an associated graph (Theorem 4). This, in turn, is the key to a new asymptotic relation between the average numbers of prefixes of traces of length n under two natural models (uniform distribution over traces and over words): \({\text {E}}[T_n] = O({\text {E}}[W_n])\) (Theorem 5). In the long version of this article, we plan to extend these analyses to more general cases (including concurrent alphabets without transitive orientation).

Several other problems remain open in our context and could be at the centre of future investigations. The first one concerns the adjacency matrix A defined in Sect. 4, which does not seem to have been studied much in the literature; in particular, in all our examples \(\text{ det }(I-zA)\) is a clique polynomial, even when the concurrent alphabet \((\varSigma ,{\mathcal C})\) does not admit any transitive orientation. For this purpose, similarly to the approach used in [11] and in our proof of Theorem 4, it is possible to adapt a noncommutative approach building on links with words avoiding forbidden patterns (see [2]). We plan to use these links to tackle the asymptotic behaviour of the variance and higher moments of \(\{T_n\}\), and the limit distributions of both \(\{T_n\}\) and \(\{W_n\}\), for all trace monoids.

In conclusion, all these studies are a further illustration of the nice interplay between complex analysis (analytic combinatorics) and the structural properties of formal languages, as also illustrated e.g. in [4, 5, 16, 17, 19, 20].