Abstract
We present a non-Gaussian local limit theorem for the number of occurrences of a given symbol in a word of length n generated at random. The stochastic model for the random generation is defined by a rational formal series with non-negative real coefficients. The result yields a local limit towards a uniform density function and holds under the assumption that the formal series defining the model is recognized by a weighted finite state automaton with two primitive components having equal dominant eigenvalue.
Keywords
- Local Limit Properties
- Rational Formal Series
- Primitive Components
- Rational Stochastic Model
- Bicomponent Model
1 Introduction
A classical counting problem concerning formal languages is the evaluation of the number of occurrences of a given symbol a in a word w of length n generated at random by a suitable stochastic source. Denoting such a random variable by \(Y_n\), the traditional goals are the asymptotic evaluation of the mean value and variance of \(Y_n\), its limit distribution, and local limit estimates for its probability function. Clearly these properties depend on the stochastic model used for generating the random text w. Classical models are the Bernoulli and Markovian ones [11, 13]. Here we consider the rational stochastic models, which are defined by rational formal series in non-commutative variables with coefficients in \(\mathbb {R}_+\) [3]. In these models the generation of a random word w can be done in linear time [8] once we know an \(\mathbb {R}_+\)-weighted finite state automaton that recognizes the series. Such a probabilistic source is rather general, since it includes as special cases the traditional Bernoulli and Markovian models and also encompasses the random generation of words of length n in any regular language under uniform distribution.
The problem above is of interest for several reasons. First, it has been studied in connection with the analysis of pattern statistics and in particular those occurring in computational biology [3, 11,12,13]. It turns out that evaluating the frequency of patterns from a regular expression in a random text generated by a Markovian model can be reduced to determining the frequency of a single symbol in a word over a binary alphabet generated by a rational stochastic model [3, 11].
Moreover, it is well known that the average number of occurrences of symbols in words of regular and context-free languages plays a relevant role in the analysis of the descriptional complexity of languages and computational models [5, 6]. Clearly the limit distributions of these quantities (also in the local form) yield more complete information and, in particular, allow one to evaluate their dispersion around the average values.
Our problem is also related to the asymptotic estimate of the coefficients of rational formal series in commutative variables. In particular, the local limit properties of \(Y_n\) made it possible to show that the maximum coefficient of monomials of size n for some of those series is of the order \(\varTheta (n^{k-3/2}\lambda ^n)\) for \(\lambda \ge 1\) and any positive \(k\in \mathbb {N}\) [4, Corollary 16]. Similar consequences hold for the study of the degree of ambiguity of rational trace languages (subsets of free partially commutative monoids) [3].
The asymptotic behaviour of \(Y_n\) assuming the rational stochastic model defined by a series r depends on the finite state automata that recognize r. The main results known in the literature concern the case when such an automaton has a primitive transition matrix. In this case asymptotic expressions for the mean value \(E(Y_n)\) and the variance \(var(Y_n)\) are known. In particular, there is a real value \(\beta \), \(0< \beta < 1\), such that \(E(Y_n) = \beta n + O(1)\) and a similar result holds for \(var(Y_n)\). Under the same hypothesis it is also proved that \(Y_n\) has a Gaussian limit distribution (i.e. it satisfies a central limit theorem) and it admits local limit properties intuitively stating that its probability function approximates a normal density [3, 4].
The properties of \(Y_n\) have been studied also when the transition matrix consists of two primitive components. A variety of results on the asymptotic behaviour of \(Y_n\) are obtained in this case [7], but none of them concerns local limit properties. Here, we extend the previous results by showing a non-Gaussian local limit theorem that holds assuming that the two components have equal dominant eigenvalue while the main constants of the average value, \(\beta _1\) and \(\beta _2\) (associated with the first and the second component, respectively) are different. Under these hypotheses, it is known that \(Y_n/n\) converges in distribution to a random variable U uniformly distributed over the interval \([\min \{\beta _1,\beta _2\}, \max \{\beta _1,\beta _2\}]\) [7]. In the present work, assuming a further natural aperiodicity condition on the transition matrix, we prove that, as n grows to \(+\infty \) and for any integer expression \(k=k(n)\) such that k / n converges to a value x different from \(\beta _1\) and \(\beta _2\), we have \(n\text{ Pr }(Y_n = k) = f_U(x) + o(1)\), where \(f_U\) is the density function of the uniform random variable U defined above.
The proof of our result is based on the analysis of the characteristic function of \(Y_n\) and it is obtained by adapting to our settings the so-called Saddle Point Method, traditionally used for proving Gaussian local limit properties [9].
The material we present is organized as follows. In Sect. 2 we recall the rational stochastic model and other preliminary notions. In Sect. 3 we revisit the properties of \(Y_n\) when the transition matrix of the automaton is primitive (Gaussian case). In Sect. 4 we introduce the bicomponent model and prove the main result comparing it with the convergence in distribution given in [7]. In the last section we discuss possible extensions and future work.
2 Preliminary Notions
In this section we give some preliminary notions and define our problem.
Given the binary alphabet \(\{a,b\}\), for every word \(w\in \{a,b\}^*\) we denote by |w| the length of w and by \(|w|_a\) the number of occurrences of a in w. For each \(n\in \mathbb {N}\), we also represent by \(\{a,b\}^n\) the set \(\{w\in \{a,b\}^*:|w|=n\}\). A formal series in the non-commutative variables a, b with coefficients in the set \(\mathbb {R}_+\) of non-negative real numbers is a function \(r:\{a,b\}^* \rightarrow \mathbb {R}_+\), usually represented in the form \(r = \sum _{w\in \{a,b\}^*}(r,w)w\), where each coefficient (r, w) is the value of r at w. The set \(\mathbb {R}_+\langle \!\langle a,b \rangle \!\rangle \) of all such formal series forms a semiring with respect to the operations of sum and Cauchy product. A series \(r\in \mathbb {R}_+\langle \!\langle a,b \rangle \!\rangle \) is called rational [14] if for some integer \(m>0\) there is a monoid morphism \(\mu : \{a,b\}^* \rightarrow \mathbb {R}_+^{m\times m}\) and two arrays \(\xi , \eta \in \mathbb {R}_+^m\), such that \((r,w) = \xi '\mu (w) \eta \), for every \(w\in \{a,b\}^*\). In this case, as the morphism \(\mu \) is generated by matrices \(A=\mu (a)\) and \(B=\mu (b)\), we say that the 4-tuple \((\xi ,A,B,\eta )\) is a linear representation [2] of r.
Now, consider a rational formal series \(r\in \mathbb {R}_+\langle \!\langle a,b \rangle \!\rangle \) with linear representation \((\xi ,A,B,\eta )\) and let \(\mu \) be the morphism generated by A and B. Assume that the set \(\{w\in \{a,b\}^n : (r,w)>0\}\) is not empty for every positive integer n. Then we can consider the probability measure \(\text{ Pr }\) over the set \(\{a,b\}^n\) given by
\[ \text{ Pr }(w) = \frac{(r,w)}{\sum _{x\in \{a,b\}^n}(r,x)} = \frac{\xi '\mu (w)\eta }{\xi '(A+B)^n\eta }, \qquad w\in \{a,b\}^n. \]
Note that, if r is the characteristic series of a language \(L\subseteq \{a,b\}^*\), then \(\text{ Pr }\) is the uniform probability function over the set \(L\cap \{a,b\}^n\). Moreover, it is easy to see that any Bernoullian or Markovian source for the random generation of words in \(\{a,b\}^*\) produces strings in \(\{a,b\}^n\) with probability Pr for some rational series \(r\in \mathbb {R}_+\langle \!\langle a,b \rangle \!\rangle \). We also recall that there are linear time algorithms that, on input n, generate a random word in \(\{a,b\}^n\) according to the probability Pr [8].
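As a concrete illustration of the measure Pr, the following minimal Python sketch evaluates \(\xi '\mu (w)\eta \) for a word and normalizes it by the total weight of all words of the same length. The linear representation used here (the characteristic series of the words over \(\{a,b\}\) with no two consecutive occurrences of a) and the helper names `word_weight` and `word_probability` are assumptions chosen for illustration; they are not taken from the paper. The normalization enumerates all \(2^n\) words, so the sketch is only meant for small n.

```python
from itertools import product


def word_weight(xi, A, B, eta, word):
    """Return (r, w) = xi' mu(w) eta for a word over {a, b}."""
    m = len(xi)
    row = list(xi)  # row vector xi' mu(prefix read so far)
    for symbol in word:
        mat = A if symbol == "a" else B
        row = [sum(row[i] * mat[i][j] for i in range(m)) for j in range(m)]
    return sum(row[j] * eta[j] for j in range(m))


def word_probability(xi, A, B, eta, word):
    """Pr(w): weight of w divided by the total weight of words of length |w|."""
    n = len(word)
    total = sum(
        word_weight(xi, A, B, eta, "".join(x)) for x in product("ab", repeat=n)
    )
    return word_weight(xi, A, B, eta, word) / total


# Hypothetical example: automaton accepting words without the factor "aa".
# State 0: last symbol was not a; state 1: last symbol was a.
XI = [1, 0]
A = [[0, 1], [0, 0]]
B = [[1, 0], [1, 0]]
ETA = [1, 1]
```

Since this series is the characteristic series of a language, `word_probability(XI, A, B, ETA, "aba")` is the uniform probability over the five admissible words of length 3, while a word containing "aa" gets probability 0.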
Then we can define the random variable (r.v.) \(Y_n\) representing the number of occurrences of the symbol a in a word w chosen at random in \(\{a,b\}^n\) with probability \(\text{ Pr }(w)\). In this work we are interested in the asymptotic properties of \(\{Y_n\}\). It is clear that, for every \(k\in \{0,1,\ldots ,n\}\),
\[ \text{ Pr }(Y_n=k) = \frac{\sum _{|w|=n,\, |w|_a=k}(r,w)}{\sum _{w\in \{a,b\}^n}(r,w)}. \]
Since r is rational, the previous probability can also be expressed by using its linear representation. It turns out that
\[ \text{ Pr }(Y_n=k) = \frac{[x^k]\, \xi '(Ax+B)^n\eta }{\xi '(A+B)^n\eta }, \]
where \([x^k] q(x)\) denotes the coefficient of \(x^k\) in a polynomial \(q\in \mathbb {R}[x]\). For the sake of brevity we say that \(Y_n\) is defined by the linear representation \((\xi ,A,B,\eta )\). Then the distribution of each \(Y_n\) can be characterized by the function \(h_n(z)\) given by
\[ h_n(z) = \xi '(Ae^z+B)^n\eta . \]
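Indeed, setting \(M=A+B\), the moment generating function of \(Y_n\) is given by
\[ E(e^{zY_n}) = \sum _{k=0}^{n}\text{ Pr }(Y_n=k)\, e^{zk} = \frac{h_n(z)}{h_n(0)} = \frac{\xi '(Ae^z+B)^n\eta }{\xi 'M^n\eta }. \]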
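This allows us to determine the mean value and variance of \(Y_n\) as
\[ E(Y_n) = \frac{h_n'(0)}{h_n(0)}, \qquad var(Y_n) = \frac{h_n''(0)}{h_n(0)} - \left( \frac{h_n'(0)}{h_n(0)}\right) ^2. \tag{2} \]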
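Analogously, the characteristic function of \(Y_n\) is given by
\[ \varPsi _n(t) = \sum _{k=0}^{n}\text{ Pr }(Y_n=k)\, e^{itk} = \frac{h_n(it)}{h_n(0)} = \frac{\xi '(Ae^{it}+B)^n\eta }{\xi 'M^n\eta }. \]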
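For moderate n the distribution of \(Y_n\) can be computed exactly. The sketch below (an illustration, not the authors' procedure) expands \(\xi '(Ax+B)^n\eta \) as a polynomial in x, keeping one coefficient list per state, and normalizes the coefficients to obtain \(\text{ Pr }(Y_n=k)\); mean and variance then follow directly from the probabilities. The function names and the example representation (words without the factor aa) are assumptions made for illustration.

```python
def occurrence_distribution(xi, A, B, eta, n):
    """Return [Pr(Y_n = k) for k = 0..n] from the linear representation."""
    m = len(xi)
    # rows[i][k]: total weight of the words read so far that contain k
    # occurrences of a and lead to state i, i.e. the coefficients of
    # the i-th entry of the row vector xi' (A x + B)^step.
    rows = [[float(xi[i])] for i in range(m)]
    for step in range(n):
        new_rows = [[0.0] * (step + 2) for _ in range(m)]
        for i in range(m):
            for j in range(m):
                a, b = A[i][j], B[i][j]
                if a == 0 and b == 0:
                    continue
                for k, c in enumerate(rows[i]):
                    new_rows[j][k] += c * b      # read b: count unchanged
                    new_rows[j][k + 1] += c * a  # read a: count grows by one
        rows = new_rows
    weights = [sum(rows[i][k] * eta[i] for i in range(m)) for k in range(n + 1)]
    total = sum(weights)  # equals xi' (A + B)^n eta
    return [w / total for w in weights]


def mean_and_variance(dist):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(k * p for k, p in enumerate(dist))
    second = sum(k * k * p for k, p in enumerate(dist))
    return mean, second - mean * mean


# Hypothetical example: words over {a, b} without the factor "aa".
XI, ETA = [1, 0], [1, 1]
A = [[0, 1], [0, 0]]
B = [[1, 0], [1, 0]]
DIST = occurrence_distribution(XI, A, B, ETA, 3)
```

For n = 3 the admissible words are bbb, abb, bab, bba and aba, so `DIST` assigns probabilities 1/5, 3/5, 1/5 and 0 to the values 0, 1, 2 and 3. The weights grow like \(\lambda ^n\), so for large n and \(\lambda \ne 1\) a rescaling of A and B may be needed to avoid floating-point overflow or underflow.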
It turns out that the limit distribution of \(Y_n\) depends on the properties of the matrix \(M=A+B\). A relevant case occurs when M is primitive (i.e. \(\exists k \in \mathbb {N}: M^k>0\)). In this case it is known that \(Y_n\) has a Gaussian limit distribution [1, 3] and satisfies a local limit theorem (in the sense of the De Moivre-Laplace theorem [10]), which we recall in the next section.
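Primitivity can be tested mechanically. The sketch below relies on Wielandt's classical bound, a fact not stated in the paper: a non-negative \(m\times m\) matrix M is primitive if and only if \(M^{(m-1)^2+1}>0\). Only the zero pattern of M matters, so the powers are computed over Booleans; the function name `is_primitive` is an assumption.

```python
def is_primitive(M):
    """Check whether a non-negative square matrix has a strictly positive power."""
    m = len(M)
    pattern = [[M[i][j] > 0 for j in range(m)] for i in range(m)]
    power = [row[:] for row in pattern]  # zero pattern of M^1
    # By Wielandt's bound it suffices to inspect M^1, ..., M^((m-1)^2 + 1).
    for _ in range((m - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        power = [
            [any(power[i][k] and pattern[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)
        ]
    return False
```

For instance, `is_primitive([[1, 1], [1, 0]])` holds because the square of that matrix is strictly positive, whereas the permutation matrix `[[0, 1], [1, 0]]` is irreducible but periodic and hence not primitive.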
3 Primitive Case
In this section we consider the case when \(M=A+B\) is a primitive matrix [15] and recall some properties proved in [3, 4, 11] that are useful in the sequel.
Since M is primitive, by the Perron-Frobenius Theorem, it admits a real eigenvalue \(\lambda > 0\) that is greater than the modulus of any other eigenvalue of M. Thus, we can consider the function \(u=u(z)\) implicitly defined by the equation
\[ \det (Iu - Ae^z - B) = 0 \]
such that \(u(0)= \lambda \). It turns out that, in a neighbourhood of \(z=0\), u(z) is analytic, is a simple root of the characteristic polynomial of \(Ae^z + B\) and |u(z)| is strictly greater than the modulus of all other eigenvalues of \(Ae^z + B\).
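Now, consider the bivariate matrix-valued function H(x, y) given by
\[ H(x,y) = \sum _{n=0}^{+\infty }(Ax+B)^n y^n = \left( I - (Ax+B)y\right) ^{-1}. \]
Clearly, \(\xi 'H(e^z,y)\eta \) is the generating function of \(\{h_n(z)\}_n\), i.e.
\[ \xi 'H(e^z,y)\eta = \sum _{n=0}^{+\infty }h_n(z)\, y^n. \]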
Thus, for every z near 0, the singularities of \(\xi 'H(e^z,y)\eta \) are the values \(\mu ^{-1}\) for all (non-null) eigenvalues \(\mu \) of \(Ae^z + B\) and hence \(u(z)^{-1}\) is its (unique) singularity of minimum modulus. Then, by the properties of u(z) one can get the following
Proposition 1
([3]). If M is primitive then there are two positive constants c, \(\rho \) and a function r(z) analytic and non-null at \(z=0\), such that for every \(|z|\le c\)
\[ h_n(z) = r(z)\, u(z)^n + O(\rho ^n) \]
and \(\rho < |u(z)|\). In particular \(\rho < \lambda \).
Mean value and variance of \(Y_n\) can be estimated from Eq. (2). It turns out that the constants \(\beta = u'(0)/\lambda \) and \(\gamma = \frac{u''(0)}{\lambda } - \left( \frac{u'(0)}{\lambda } \right) ^2\) are positive and satisfy the equalities \(E(Y_n) = \beta n + O(1)\) and \(var(Y_n) = \gamma n + O(1)\) [3]. Explicit expressions for \(\beta \) and \(\gamma \), which depend on the matrices A, M, and in particular on \(\lambda \) and the corresponding left and right eigenvectors, are also obtained in [3].
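To make the dependence on the eigenvectors concrete, the sketch below computes \(\beta \) numerically. It assumes the standard first-order perturbation formula for a simple eigenvalue, \(u'(0) = v'Aw/(v'w)\), where v and w are left and right eigenvectors of M for \(\lambda \); this gives \(\beta = v'Aw/(\lambda \, v'w)\). The formula is stated here as an assumption and is not quoted from [3]; the power iteration and the names `perron_vector` and `beta_constant` are likewise illustrative.

```python
def perron_vector(M, iterations=500):
    """Power iteration for a primitive matrix.

    Returns the dominant eigenvalue and a right eigenvector whose entries sum to 1.
    """
    m = len(M)
    w = [1.0 / m] * m
    lam = 0.0
    for _ in range(iterations):
        nxt = [sum(M[i][j] * w[j] for j in range(m)) for i in range(m)]
        lam = sum(nxt)  # since the entries of w sum to 1, this estimates lambda
        w = [x / lam for x in nxt]
    return lam, w


def beta_constant(A, B):
    """beta = v' A w / (lambda v' w), with v, w left/right Perron eigenvectors of A + B."""
    m = len(A)
    M = [[A[i][j] + B[i][j] for j in range(m)] for i in range(m)]
    M_transposed = [[M[j][i] for j in range(m)] for i in range(m)]
    lam, right = perron_vector(M)
    _, left = perron_vector(M_transposed)
    numerator = sum(left[i] * A[i][j] * right[j] for i in range(m) for j in range(m))
    return numerator / (lam * sum(left[i] * right[i] for i in range(m)))


# Hypothetical example: words over {a, b} without the factor "aa".
A = [[0, 1], [0, 0]]
B = [[1, 0], [1, 0]]
BETA = beta_constant(A, B)
```

For this example \(\lambda \) is the golden ratio and the computed value agrees with \((5-\sqrt{5})/10\), the asymptotic density of a in words avoiding aa.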
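Other properties concern the function \(y(t) = u(it)/\lambda \) used in Sect. 4, defined for real t in a neighbourhood of 0. By Proposition 1, for any t near 0, \(y(t)^n\) is the leading term of the characteristic function \(\varPsi _n(t)\). Moreover, for some \(c>0\) and every \(|t| \le c\), the following relations hold [3] (see Note 1):
\[ |y(t)| = 1 - \frac{\gamma }{2}t^2 + O(t^4), \qquad \arg y(t) = \beta t + O(t^3), \qquad |y(t)| \le e^{-\frac{\gamma }{4}t^2}. \tag{4} \]
The behaviour of y(t) can be estimated precisely when t tends to 0. For any q such that \(1/3< q < 1/2\) it can be proved that
\[ \left| y(t)^n - e^{-\frac{\gamma }{2}t^2n + i\beta tn}\right| = O\left( n^{1-3q}\right) \qquad \text{for all } |t|\le n^{-q}. \tag{5} \]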
The previous properties can be used to prove a local limit theorem for \(\{Y_n\}\) when M is primitive [3]. The result holds under a further assumption (introduced to avoid periodicity phenomena) stating that, for every \(0< t < 2\pi \),
\[ |\mu | < \lambda \quad \text{for every eigenvalue } \mu \text{ of } Ae^{it}+B. \tag{6} \]
Such a property is studied in detail in [4] and is often verified. For instance it holds true whenever there are two indices i, j such that \(A_{ij} >0\) and \(B_{ij} >0\), or \(A_{ii} >0\) and \(B_{jj} >0\). Intuitively, it corresponds to an aperiodicity property of the oriented graph defined by matrices A and B concerning the number of occurrences of the label a in cycles of equal length.
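The two sufficient conditions just quoted are easy to test on the entries of A and B. The following sketch checks them; the function name is an assumption, and a result of `False` only means that neither sufficient condition applies, not that the aperiodicity property fails.

```python
def satisfies_sufficient_aperiodicity(A, B):
    """Check the two sufficient conditions on the entries of A and B.

    True if some entry is positive in both A and B, or if A and B each
    have a positive diagonal entry (possibly at different indices).
    """
    m = len(A)
    shared_entry = any(
        A[i][j] > 0 and B[i][j] > 0 for i in range(m) for j in range(m)
    )
    a_loop = any(A[i][i] > 0 for i in range(m))
    b_loop = any(B[j][j] > 0 for j in range(m))
    return shared_entry or (a_loop and b_loop)
```

For example, a Bernoulli source with `A = [[0.5]]` and `B = [[0.5]]` meets the first condition, while `A = [[1, 0], [0, 0]]`, `B = [[0, 1], [1, 1]]` meets the second one through the loops at indices 0 and 1.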
The local limit theorem in the primitive case can be stated as follows.
Theorem 2
Let \(\{Y_n\}\) be defined by a linear representation \((\xi ,A,B,\eta )\) such that the matrix \(M=A+B\) is primitive and assume that property (6) holds for every \(0< t <2\pi \). Moreover, let \(\beta \) and \(\gamma \) be defined as above. Then, as n tends to \(+\infty \), the following equation holds uniformly for every \(k=0,1,\ldots ,n\),
\[ \text{ Pr }(Y_n=k) = \frac{e^{-\frac{(k-\beta n)^2}{2\gamma n}}}{\sqrt{2\pi \gamma n}} + o\left( \frac{1}{\sqrt{n}}\right) . \]
4 Bicomponent Models
In this section we study the behaviour of \(\{Y_n\}_{n\in \mathbb {N}}\) in the bicomponent model. We first recall some notions and properties introduced in [7] for this model: in particular, we need an analogue of Proposition 1 for this case.
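Here \(\{Y_n\}_{n\in \mathbb {N}}\) is defined by a linear representation \((\xi ,A,B,\eta )\) of size m, such that the matrix \(M=A+B\) consists of two primitive components. More precisely, there are two linear representations \((\xi _1,A_1,B_1,\eta _1)\), \((\xi _2,A_2,B_2,\eta _2)\), of size \(m_1\) and \(m_2\), respectively, with \(m=m_1+m_2\), such that for some \(A_0, B_0 \in \mathbb {R}_+^{m_1\times m_2}\)
\[ \xi ' = (\xi _1', \xi _2'), \qquad A = \begin{pmatrix} A_1 & A_0 \\ 0 & A_2 \end{pmatrix}, \qquad B = \begin{pmatrix} B_1 & B_0 \\ 0 & B_2 \end{pmatrix}, \qquad \eta = \begin{pmatrix} \eta _1 \\ \eta _2 \end{pmatrix}. \tag{8} \]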
Moreover we assume the following conditions:
- (A) The matrices \(M_1 =A_1 + B_1\) and \(M_2 =A_2 + B_2\) are primitive and we denote by \(\lambda _1\) and \(\lambda _2\) the corresponding Perron-Frobenius eigenvalues;
- (B) \(\xi _1 \ne 0 \ne \eta _2\) and \(A_0 + B_0 \ne 0\);
- (C) \(A_1 \ne 0 \ne B_1\) and \(A_2 \ne 0 \ne B_2\).
Since the two components are primitive the properties presented in the previous section hold for each of them. In particular, for \(j=1,2\), we can define \(H^{(j)}(x,y)\), \(h_n^{(j)}(z)\), \(u_j(z)\), \(y_j(t)\), \(\beta _j\), and \(\gamma _j\), respectively, as the values H(x, y), \(h_n(z)\), u(z), y(t), \(\beta \), \(\gamma \) referred to component j. Note that condition (C) guarantees that \(0< \beta _j < 1\) and \(0 < \gamma _j\) for every \(j=1,2\), while condition (B) implies that both components contribute to probability values of \(Y_n\).
In such a bicomponent model the limit distribution of \(\{Y_n\}\) mainly depends on whether \(\lambda _1 \ne \lambda _2\) or \(\lambda _1=\lambda _2\). If \(\lambda _1 > \lambda _2\) then \(\frac{Y_n-\beta _1 n}{\sqrt{\gamma _1 n}}\) converges in distribution to a standard normal r.v. (the case \(\lambda _1 < \lambda _2\) is symmetric) [7]. If \(\lambda _1=\lambda _2\) and \(\beta _1 \ne \beta _2\) then \(Y_n/n\) converges in distribution to a random variable U uniformly distributed over the interval \([b_1,b_2]\), where \(b_1= \min \{\beta _1,\beta _2\}\) and \(b_2= \max \{\beta _1,\beta _2\}\) [7, Theorem 15]. This means that, for every \(x\in \mathbb {R}\),
\[ \lim _{n\rightarrow +\infty } \text{ Pr }\left( \frac{Y_n}{n} \le x\right) = \text{ Pr }(U \le x). \tag{9} \]
However, this relation does not give information about the probability that \(Y_n\) takes a specific value \(k \in \mathbb {N}\) (possibly depending on n). Here we want to show that, by adding a further condition on the model, such a probability can be estimated at least for reasonable expressions \(k=k(n)\). To this end, we still consider the case \(\lambda _1=\lambda _2\) and \(\beta _1 \ne \beta _2\) and assume a further hypothesis analogous to condition (6): for every \(0< t < 2\pi \)
\[ |\mu | < \lambda \quad \text{for every eigenvalue } \mu \text{ of } A_1e^{it}+B_1 \text{ and of } A_2e^{it}+B_2, \tag{10} \]
where we set \(\lambda = \lambda _1 = \lambda _2\).
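In this case, following [7], the matrix-valued function H(x, y) is given by
\[ H(x,y) = \begin{pmatrix} H^{(1)}(x,y) & G(x,y) \\ 0 & H^{(2)}(x,y) \end{pmatrix}, \qquad G(x,y) = H^{(1)}(x,y)\, (A_0x+B_0)\, y\, H^{(2)}(x,y). \]
Thus, the generating function of \(\{h_n(z)\}_n\) is now given by
\[ \xi ' H(e^z,y) \eta = \xi _1' H^{(1)}(e^z,y) \eta _1 + \xi _1' G(e^z,y) \eta _2 + \xi _2' H^{(2)}(e^z,y) \eta _2. \tag{11} \]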
An analysis of the singularities of \(\xi ' H(e^z,y) \eta \) is presented in [7, Sect. 7.2] where the following property is proved.
Proposition 3
For some constant \(c>0\) and every \(z\in \mathbb {C}\) such that \(|z|\le c\), we have
\[ h_n(z) = s(z) \sum _{j=0}^{n-1} u_1(z)^j\, u_2(z)^{n-1-j} + O(u_1(z)^n) + O(u_2(z)^n), \]
where s(z) is a function analytic and non-null for \(|z|\le c\).
Since \(u_1(0) = \lambda = u_2(0)\) the previous proposition implies
\[ h_n(0) = s(0)\, n \lambda ^{n-1} + O(\lambda ^n). \tag{12} \]
4.1 Analysis of the Characteristic Function
Here we study the characteristic function \(\varPsi _n(t) = \frac{h_n(it)}{h_n(0)}\), for \(-\pi \le t \le \pi \). We split this interval into three sets:
\[ \{t : |t| \le n^{-q}\}, \qquad \{t : n^{-q}< |t| < c\}, \qquad \{t : c \le |t| \le \pi \}, \]
where c and q are positive constants and \(\frac{1}{3}< q < \frac{1}{2}\). We observe that such a splitting is typical of the “Saddle Point Method”, and it is often used to derive local limit properties in the Gaussian case [9].
Proposition 4
For every \(0< c < \pi \) there exists \(0< \varepsilon < 1\) such that
\[ |\varPsi _n(t)| = O(\varepsilon ^n) \qquad \text{for all } c\le |t| \le \pi . \]
Proof
From Eq. (11) it is clear that, for every \(z \in \mathbb {C}\), the singularities of the generating function \(\xi ' H(e^z,y) \eta \) are the inverses of the eigenvalues of the matrices \((A_1e^z+B_1)\) and \((A_2e^z + B_2)\). Then, by condition (10), for every \(0<c<\pi \), all singularities of \(\xi ' H(e^{it},y) \eta \), for any \(c\le |t|\le \pi \), are in modulus greater than a value \(\tau ^{-1}\) such that \(0< \tau < \lambda \), and hence \(|h_n(it)| = O(\tau ^n)\). Thus, by equality (12), for some \(0< \varepsilon < 1\) we have
\[ |\varPsi _n(t)| = \frac{|h_n(it)|}{h_n(0)} = O(\varepsilon ^n) \qquad \text{for all } c\le |t| \le \pi . \]
\(\square \)
Now, let us study \(\varPsi _n(t)\) for t in a neighbourhood of 0. We recall that in such a set both functions \(y_1(t)= u_1(it)/\lambda \) and \(y_2(t)= u_2(it)/\lambda \) satisfy equations (4). Then, for some \(c>0\) and every \(|t|\le c\), we have
Moreover, by Proposition 3 we immediately get, for \(|t|\le c\), with \(t\ne 0\),
Thus from equalities (12), (13) and (14), we have
This leads to evaluate \(\varPsi _n(t)\) in the second set, i.e. for \(n^{-q}< |t| < c\).
Proposition 5
Let \(0< q < 1/2\). Then there are two positive constants a, c such that, for every real t satisfying \(n^{-q}< |t| < c\),
\[ |\varPsi _n(t)| = O\left( n^{q-1} e^{-an^{1-2q}}\right) . \]
Proof
From Eq. (15), taking \(a= \min \{\gamma _1,\gamma _2\}/4\), we obtain for some \(c>0\)
and by (14) we get \(|\varPsi _n(t)| = O\left( n^{q-1} e^{-an^{1-2q}}\right) \) proving the result. \(\square \)
Now, let us evaluate \(\ \varPsi _n(t)\ \) in the first set, that is for \(\ |t|\le n^{-q}\) where \(1/3< q < 1/2\). First note that, by simple computations, the following relations can be proved:
Therefore, by Eq. (15), for every \(k \in \{0,1,\ldots ,n\}\) we get
Also observe that both \(y_1(t)\) and \(y_2(t)\) satisfy Eq. (5), whence
Thus, replacing these values in (16), after some computations (similar to the previous ones) we obtain the following
Proposition 6
For every \(k \in \{0,1,\ldots ,n\}\) and every \(1/3< q < 1/2\) we have
4.2 Main Result
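Without loss of generality assume \(\beta _1 < \beta _2 \), and denote by \(f_U(x)\) the density function of a uniform r.v. U in the interval \([\beta _1,\beta _2]\), that is
\[ f_U(x) = \frac{1}{\beta _2-\beta _1}\, \chi _{[\beta _1,\beta _2]}(x), \qquad x\in \mathbb {R}, \]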
where \(\chi _I\) denotes the indicator function of the interval \(I\subset \mathbb {R}\).
For our purpose we need the following property.
Lemma 7
For \(k,m \in \mathbb {N}\), \(k < m\), let \(g:[2k\pi ,2m\pi ]\rightarrow \mathbb {R}_+\) be a monotone function, and let \(I_{k,m} = \int _{2k\pi }^{2m\pi } g(x) \sin (x) dx \). Then:
- (a) if g is non-increasing we have \(0 \le I_{k,m} \le 2[g(2k\pi )-g(2m\pi )]\);
- (b) if g is non-decreasing we have \(2[g(2k\pi )-g(2m\pi )] \le I_{k,m} \le 0 \).
In both cases \(|I_{k,m}| \le 2|g(2k\pi )-g(2m\pi )|\).
Proof
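If g is non-increasing, for each integer \(k\le j < m\) the following relations hold
\[ I_{j,j+1} = \int _{2j\pi }^{(2j+1)\pi } g(x) \sin (x) dx - \int _{(2j+1)\pi }^{2(j+1)\pi } g(x) |\sin (x)| dx \le 2[g(2j\pi ) - g(2(j+1)\pi )], \]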
while \(0 \le I_{j,j+1} \) is obvious. Thus (a) follows by summing the expressions above for \(j=k,\ldots ,m-1\).
Part (b) follows by applying (a) to the function \(h(x) = g(2m\pi ) - g(x)\). \(\square \)
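The bounds of Lemma 7 can be checked numerically on simple monotone functions. The sketch below is only a sanity check under stated assumptions: the integral is approximated by a composite Simpson rule, and the test functions \(g(x)=1/(1+x)\) and \(g(x)=x/(1+x)\), as well as the helper names, are chosen for illustration.

```python
import math


def simpson(f, lo, hi, intervals=20000):
    """Composite Simpson rule on [lo, hi]; `intervals` must be even."""
    h = (hi - lo) / intervals
    total = f(lo) + f(hi)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def lemma7_quantities(g, k, m):
    """Return (I_{k,m}, 2|g(2k pi) - g(2m pi)|) for a monotone function g."""
    lo, hi = 2 * k * math.pi, 2 * m * math.pi
    integral = simpson(lambda x: g(x) * math.sin(x), lo, hi)
    return integral, 2 * abs(g(lo) - g(hi))


# Non-increasing case (a) and non-decreasing case (b) on [0, 6 pi].
I_DEC, BOUND_DEC = lemma7_quantities(lambda x: 1 / (1 + x), 0, 3)
I_INC, BOUND_INC = lemma7_quantities(lambda x: x / (1 + x), 0, 3)
```

In the first case the computed integral is non-negative and below the bound, in the second it is non-positive and above the negated bound, in agreement with parts (a) and (b).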
Now, we can state our main result.
Theorem 8
Let \((\xi ,A,B, \eta )\) be a linear representation of the form (8) satisfying conditions (A), (B), (C) above; also assume \(\lambda _1=\lambda _2\), \(\beta _1 \ne \beta _2\) together with the aperiodicity condition (10). Then, the r.v. \(Y_n\) satisfies the relation
\[ n\, \text{ Pr }(Y_n = k) = f_U(x) + o(1) \tag{17} \]
for every integer \(k=k(n)\), provided that \(k/n\rightarrow x\) for a constant x such that \(\beta _1 \ne x \ne \beta _2\) (as \(n\rightarrow +\infty \)).
Proof
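It is known [10] that the probability \(p_n(k)= \text{ Pr }\left\{ Y_n = k \right\} \), for every \(k\in \{0,1,\ldots ,n\}\), can be obtained from \(\varPsi _n(t)\) by the inversion formula
\[ p_n(k) = \frac{1}{2\pi } \int _{-\pi }^{\pi } \varPsi _n(t)\, e^{-itk}\, dt. \]
To evaluate the integral above let us split the interval \([-\pi ,\pi ]\) into the three sets
\[ \{t : |t| \le n^{-q}\}, \qquad \{t : n^{-q}< |t| < c\}, \qquad \{t : c \le |t| \le \pi \}, \]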
with c as in Proposition 5 and some \(1/3< q < 1/2\). Then, by Propositions 4, 5, 6, we obtain
Now, set \(v=k/n\) and note that for \(n\rightarrow +\infty \), v approaches a value different from \(\beta _1\) and \(\beta _2\). Thus, defining
we have to prove that
Since \(\beta _1 < \beta _2\) set \(\delta = \beta _2 - \beta _1\). Then, \(\varDelta _n(v)\) is the integral of the difference of two functions of the form
where \(\beta \) and \(\gamma \) take the values \(\beta _2\), \(\gamma _2\) and \(\beta _1\), \(\gamma _1\), respectively. Using the symmetries of real and imaginary part of \(A_n\), by a change of variable we get
As \(\int _{0}^{+\infty } \frac{\sin (u)}{u} du = \pi /2\), for \(n\rightarrow +\infty \) the first term converges to \(\frac{\pi }{\delta } \text{ sgn }(\beta -v)\). Now we show that the second term of (20) tends to 0 as \(n\rightarrow +\infty \). This term is equal to
where \(B_n(u) = u^{-1} \left( 1 - e^{-\frac{\gamma u^2}{2(\beta -v)^2n}}\right) \). To evaluate (21) we use Lemma 7. Note that \(B_n(u)>0\) for all \(u>0\), and \(\lim _{u\rightarrow 0} B_n(u) = 0 = \lim _{u\rightarrow +\infty } B_n(u)\). Moreover in the set \((0,+\infty )\) its derivative is null only at the point \(u_n = \alpha |\beta - v| \sqrt{n/\gamma }\), for a constant \(\alpha \in (1,2)\) independent of n and v. Thus, for n large enough, \(u_n\) belongs to the interval \((0,|\beta -v| n^{1-q})\), \(B_n(u)\) is increasing in the set \((0,u_n)\) and decreasing in \((u_n,+\infty )\), while its maximum value is
Defining \(k_n= \lfloor \frac{u_n}{2\pi }\rfloor \) and \(K = \lfloor \frac{|\beta - v| n^{1-q}}{2\pi } \rfloor \), we can apply Lemma 7 to the intervals \([0,2k_n\pi ]\) and \([2(k_n+1)\pi ,2K\pi ]\), to get
where c is a positive constant independent of v and n.
This implies that, for any v approaching a constant different from \(\beta _1\) and \(\beta _2\), the second term of (20) is \(O(n^{-1/2})\). Therefore, we get
which proves Eq. (19) and the proof is complete. \(\square \)
A typical consequence of this result is that \(n \text{ Pr }(Y_n= \lfloor xn \rfloor )\) converges to \(f_U(x)\) for every real x different from \(\beta _1\) and \(\beta _2\). Intuitively, equalities of the form (17) are considered more precise than convergence in distribution, since they estimate the probability that the n-th random variable of the sequence takes a specific value rather than lying in an interval.
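This consequence can be observed numerically on a small example. The sketch below uses a hypothetical bicomponent model with two components of size 1, chosen here for illustration and not taken from the paper: \(A_1=(0.25)\), \(B_1=(0.75)\), \(A_2=(0.75)\), \(B_2=(0.25)\), \(A_0=B_0=(0.5)\), so that \(\lambda _1=\lambda _2=1\), \(\beta _1=0.25\), \(\beta _2=0.75\) and \(f_U(x)=2\) on \([0.25,0.75]\). The distribution of \(Y_n\) is computed exactly by expanding \(\xi '(Ax+B)^n\eta \) as a polynomial in x.

```python
import math


def occurrence_distribution(xi, A, B, eta, n):
    """Exact distribution of Y_n: normalized coefficients of xi'(Ax+B)^n eta."""
    m = len(xi)
    rows = [[float(xi[i])] for i in range(m)]  # one coefficient list per state
    for step in range(n):
        new_rows = [[0.0] * (step + 2) for _ in range(m)]
        for i in range(m):
            for j in range(m):
                a, b = A[i][j], B[i][j]
                if a == 0 and b == 0:
                    continue
                for k, c in enumerate(rows[i]):
                    new_rows[j][k] += c * b
                    new_rows[j][k + 1] += c * a
        rows = new_rows
    weights = [sum(rows[i][k] * eta[i] for i in range(m)) for k in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical bicomponent model with lambda_1 = lambda_2 = 1.
XI, ETA = [1, 0], [0, 1]
A = [[0.25, 0.5], [0.0, 0.75]]
B = [[0.75, 0.5], [0.0, 0.25]]
N = 400
DIST = occurrence_distribution(XI, A, B, ETA, N)


def scaled_probability(x):
    """n * Pr(Y_n = floor(x n)), to be compared with f_U(x)."""
    return N * DIST[math.floor(x * N)]
```

With these parameters `scaled_probability(0.5)` is close to \(f_U(0.5)=2\), while `scaled_probability(0.1)` and `scaled_probability(0.9)` are close to 0, as expected for points outside \([\beta _1,\beta _2]\).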
On the other hand we observe that (without condition (10)) the convergence in distribution (9) does not imply our equality (17). In particular if there are periodicity phenomena in the occurrences of letter a it may happen that (9) holds while (17) does not. For instance if the overall series r of linear representation \((\xi ,A,B,\eta )\) has non-zero coefficients (r, w) only for words w with even \(|w|_a\), then \(\text{ Pr }(Y_n=k) = 0\) for all odd integers k, and hence (17) cannot hold while (9) may still be valid. This observation also shows that condition (10) prevents such periodicity phenomena in the stochastic model.
5 Conclusions
In this work we have presented a non-Gaussian local limit property for the number of occurrences of a symbol in words generated at random according to a rational stochastic model defined by a linear representation with two primitive components. Our result concerns the case when the two components have the same dominant eigenvalue but different main constants of the respective mean value (\(\beta _1\) and \(\beta _2\)). We expect that, in the case of different dominant eigenvalues, a Gaussian local limit property holds, where the main terms of mean value and variance correspond to the dominant component. On the other hand, we conjecture that results similar to ours (that is, of a non-Gaussian type) hold for other rational stochastic models, defined by assuming different hypotheses on the key parameters associated with the mean value and variance of the statistic of interest (e.g. \(\beta _1=\beta _2\)), or by assuming more than two primitive components with equal dominant eigenvalues.
Notes
1. Here, for every interval \(I\subseteq \mathbb {R}\) and functions \(f,g :I \rightarrow \mathbb {C}\), by “\(g(t)=O(f(t))\) for \(t\in I\)” we mean “\(|g(t)| \le b |f(t)|\) for all \(t\in I\)”, for some constant \(b>0\).
References
1. Bender, E.A.: Central and local limit theorems applied to asymptotic enumeration. J. Comb. Theory 15, 91–111 (1973)
2. Berstel, J., Reutenauer, C.: Rational Series and Their Languages. Springer, New York (1988)
3. Bertoni, A., Choffrut, C., Goldwurm, M., Lonati, V.: On the number of occurrences of a symbol in words of regular languages. Theoret. Comput. Sci. 302, 431–456 (2003)
4. Bertoni, A., Choffrut, C., Goldwurm, M., Lonati, V.: Local limit properties for pattern statistics and rational models. Theory Comput. Syst. 39, 209–235 (2006)
5. Broda, S., Machiavelo, A., Moreira, N., Reis, R.: A hitchhiker’s guide to descriptional complexity through analytic combinatorics. Theoret. Comput. Sci. 528, 85–100 (2014)
6. Broda, S., Machiavelo, A., Moreira, N., Reis, R.: On the average complexity of strong star normal form. In: Pighizzini, G., Câmpeanu, C. (eds.) DCFS 2017. LNCS, vol. 10316, pp. 77–88. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60252-3_6
7. de Falco, D., Goldwurm, M., Lonati, V.: Frequency of symbol occurrences in bicomponent stochastic models. Theoret. Comput. Sci. 327(3), 269–300 (2004)
8. Denise, A.: Génération aléatoire uniforme de mots de langages rationnels. Theoret. Comput. Sci. 159, 43–63 (1996)
9. Flajolet, P., Sedgewick, R.: Analytic Combinatorics. Cambridge University Press, Cambridge (2009)
10. Gnedenko, B.V.: Theory of Probability. Gordon and Breach Science Publisher, Amsterdam (1997)
11. Nicodeme, P., Salvy, B., Flajolet, P.: Motif statistics. Theoret. Comput. Sci. 287(2), 593–617 (2002)
12. Prum, B., Rudolphe, F., Turckheim, E.: Finding words with unexpected frequencies in deoxyribonucleic acid sequence. J. Roy. Stat. Soc. Ser. B 57, 205–220 (1995)
13. Régnier, M., Szpankowski, W.: On pattern frequency occurrences in a Markovian sequence. Algorithmica 22(4), 621–649 (1998)
14. Salomaa, A., Soittola, M.: Automata-Theoretic Aspects of Formal Power Series. Springer, New York (1978). https://doi.org/10.1007/978-1-4612-6264-0
15. Seneta, E.: Non-negative Matrices and Markov Chains. Springer, New York (1981). https://doi.org/10.1007/0-387-32792-4
© 2018 IFIP International Federation for Information Processing
About this paper
Cite this paper
Goldwurm, M., Lin, J., Vignati, M. (2018). A Local Limit Property for Pattern Statistics in Bicomponent Stochastic Models. In: Konstantinidis, S., Pighizzini, G. (eds) Descriptional Complexity of Formal Systems. DCFS 2018. Lecture Notes in Computer Science(), vol 10952. Springer, Cham. https://doi.org/10.1007/978-3-319-94631-3_10
DOI: https://doi.org/10.1007/978-3-319-94631-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94630-6
Online ISBN: 978-3-319-94631-3
eBook Packages: Computer Science (R0)