
1 Introduction

A classical counting problem concerning formal languages is the evaluation of the number of occurrences of a given symbol a in a word w of length n generated at random by a suitable stochastic source. Denoting by \(Y_n\) such a random variable, traditional goals of interest are the asymptotic evaluation of the mean value and variance of \(Y_n\) as well as its limit distribution and local limit evaluations of its probability function. Clearly these properties depend on the stochastic model used for generating the random text w. Classical models are the Bernoulli and Markovian ones [11, 13]. Here we consider the rational stochastic models, which are defined by rational formal series in non-commutative variables with coefficients in \(\mathbb {R}_+\) [3]. In these models the computation of a random word w can be done easily in linear time [8] once we know an \(\mathbb {R}_+\)-weighted finite state automaton that recognizes the series. Such a probabilistic source is rather general since it includes as special cases the traditional Bernoulli and Markovian models and also encompasses the random generation of words of length n in any regular language under uniform distribution.

The problem above is of interest for several reasons. First, it has been studied in connection with the analysis of pattern statistics and in particular those occurring in computational biology [3, 11,12,13]. It turns out that evaluating the frequency of patterns from a regular expression in a random text generated by a Markovian model can be reduced to determining the frequency of a single symbol in a word over a binary alphabet generated by a rational stochastic model [3, 11].

Moreover, it is well-known that the average number of occurrences of symbols in words of regular and context-free languages plays a relevant role in the analysis of the descriptional complexity of languages and computational models [5, 6]. Clearly the limit distributions of these quantities (also in the local form) yield more complete information and in particular allow one to evaluate their dispersion around the average values.

Our problem is also related to the asymptotic estimate of the coefficients of rational formal series in commutative variables. In particular the local limit properties of \(Y_n\) made it possible to show that the maximum coefficient of monomials of size n for some of those series is of the order \(\varTheta (n^{k-3/2}\lambda ^n)\) for \(\lambda \ge 1\) and any positive \(k\in \mathbb {N}\) [4, Corollary 16]. Similar consequences hold for the study of the degree of ambiguity of rational trace languages (subsets of free partially commutative monoids) [3].

The asymptotic behaviour of \(Y_n\) assuming the rational stochastic model defined by a series r depends on the finite state automata that recognize r. The main results known in the literature concern the case when such an automaton has a primitive transition matrix. In this case asymptotic expressions for the mean value \(E(Y_n)\) and the variance \(var(Y_n)\) are known. In particular, there is a real value \(\beta \), \(0< \beta < 1\), such that \(E(Y_n) = \beta n + O(1)\) and a similar result holds for \(var(Y_n)\). Under the same hypothesis it is also proved that \(Y_n\) has a Gaussian limit distribution (i.e. it satisfies a central limit theorem) and it admits local limit properties intuitively stating that its probability function approximates a normal density [3, 4].

The properties of \(Y_n\) have been studied also when the transition matrix consists of two primitive components. A variety of results on the asymptotic behaviour of \(Y_n\) are obtained in this case [7], but none of them concerns local limit properties. Here, we extend the previous results by showing a non-Gaussian local limit theorem that holds assuming that the two components have equal dominant eigenvalue while the main constants of the average value, \(\beta _1\) and \(\beta _2\) (associated with the first and the second component, respectively) are different. Under these hypotheses, it is known that \(Y_n/n\) converges in distribution to a random variable U uniformly distributed over the interval \([\min \{\beta _1,\beta _2\}, \max \{\beta _1,\beta _2\}]\) [7]. In the present work, assuming a further natural aperiodicity condition on the transition matrix, we prove that, as n grows to \(+\infty \) and for any integer expression \(k=k(n)\) such that k / n converges to a value x different from \(\beta _1\) and \(\beta _2\), we have \(n\text{ Pr }(Y_n = k) = f_U(x) + o(1)\), where \(f_U\) is the density function of the uniform random variable U defined above.

The proof of our result is based on the analysis of the characteristic function of \(Y_n\) and it is obtained by adapting to our settings the so-called Saddle Point Method, traditionally used for proving Gaussian local limit properties [9].

The material we present is organized as follows. In Sect. 2 we recall the rational stochastic model and other preliminary notions. In Sect. 3 we revisit the properties of \(Y_n\) when the transition matrix of the automaton is primitive (Gaussian case). In Sect. 4 we introduce the bicomponent model and prove the main result, comparing it with the convergence in distribution given in [7]. In the last section we discuss possible extensions and future work.

2 Preliminary Notions

In this section we give some preliminary notions and define our problem.

Given the binary alphabet \(\{a,b\}\), for every word \(w\in \{a,b\}^*\) we denote by |w| the length of w and by \(|w|_a\) the number of occurrences of a in w. For each \(n\in \mathbb {N}\), we also represent by \(\{a,b\}^n\) the set \(\{w\in \{a,b\}^*:|w|=n\}\). A formal series in the non-commutative variables a, b with coefficients in the set \(\mathbb {R}_+\) of non-negative real numbers is a function \(r:\{a,b\}^* \rightarrow \mathbb {R}_+\), usually represented in the form \(r = \sum _{w\in \{a,b\}^*}(r,w)w\), where each coefficient \((r,w)\) is the value of r at w. The set \(\mathbb {R}_+\langle \!\langle a,b \rangle \!\rangle \) of all such formal series forms a semiring with respect to the operations of sum and Cauchy product. A series \(r\in \mathbb {R}_+\langle \!\langle a,b \rangle \!\rangle \) is called rational [14] if for some integer \(m>0\) there is a monoid morphism \(\mu : \{a,b\}^* \rightarrow \mathbb {R}_+^{m\times m}\) and two arrays \(\xi , \eta \in \mathbb {R}_+^m\), such that \((r,w) = \xi '\mu (w) \eta \), for every \(w\in \{a,b\}^*\). In this case, as the morphism \(\mu \) is generated by matrices \(A=\mu (a)\) and \(B=\mu (b)\), we say that the 4-tuple \((\xi ,A,B,\eta )\) is a linear representation [2] of r.

Now, consider a rational formal series \(r\in \mathbb {R}_+\langle \!\langle a,b \rangle \!\rangle \) with linear representation \((\xi ,A,B,\eta )\) and let \(\mu \) be the morphism generated by A and B. Assume that the set \(\{w\in \{a,b\}^n : (r,w)>0\}\) is not empty for every positive integer n. Then we can consider the probability measure \(\text{ Pr }\) over the set \(\{a,b\}^n\) given by

$$ \text{ Pr }(w) = \frac{(r,w)}{\sum _{x\in \{a,b\}^n} (r,x)} = \frac{\xi '\mu (w)\eta }{\xi '(A+B)^n\eta } \qquad \ \forall \ w\in \{a,b\}^n $$

Note that, if r is the characteristic series of a language \(L\subseteq \{a,b\}^*\) then \(\text{ Pr }\) is the uniform probability function over the set \(L\cap \{a,b\}^n\). Moreover, it is easy to see that any Bernoulli or Markovian source, for the random generation of words in \(\{a,b\}^*\), produces strings in \(\{a,b\}^n\) with probability Pr for some rational series \(r\in \mathbb {R}_+\langle \!\langle a,b \rangle \!\rangle \). We also recall that there are linear time algorithms that on input n generate a random word in \(\{a,b\}^n\) according to the probability Pr [8].
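As an illustration of such a sequential generation procedure (linear time after an \(O(nm^2)\) preprocessing of the tails \(M^j\eta\), where \(M=A+B\)), here is a Python sketch; the function name and interface are ours, not those of [8], and numpy is assumed:

```python
import random

import numpy as np

def sample_word(xi, A, B, eta, n, rng=None):
    """Draw w in {a,b}^n with Pr(w) proportional to (r,w) = xi' mu(w) eta.

    After a prefix p with j letters still to generate, the next letter
    is 'a' with probability (xi' mu(p) A M^{j-1} eta) / (xi' mu(p) M^j eta).
    """
    rng = rng or random.Random(0)
    M = A + B
    tails = [eta]
    for _ in range(n):
        tails.append(M @ tails[-1])        # tails[j] = M^j eta
    v, word = xi.copy(), []
    for j in range(n, 0, -1):
        p_a = (v @ A @ tails[j - 1]) / (v @ tails[j])
        if rng.random() < p_a:
            word.append('a')
            v = v @ A
        else:
            word.append('b')
            v = v @ B
    return ''.join(word)
```

In the one-state unweighted case (\(\xi=\eta=(1)\), \(A=B=(1)\)) this reduces to tossing a fair coin for each letter.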

Then we can define the random variable (r.v.) \(Y_n\) representing the number of occurrences of the symbol a in a word w chosen at random in \(\{a,b\}^n\) with probability \(\text{ Pr }(w)\). In this work we are interested in the asymptotic properties of \(\{Y_n\}\). It is clear that, for every \(k\in \{0,1,\ldots ,n\}\),

$$ p_n(k) := \text{ Pr }(Y_n =k) = \frac{\sum _{|w|=n,|w|_a=k} (r,w)}{\sum _{w\in \{a,b\}^n} (r,w)} $$

Since r is rational, the previous probability can also be expressed by using its linear representation. It turns out that

$$\begin{aligned} p_n(k) = \frac{[x^k]\xi ' (Ax+B)^n \eta }{\xi ' (A+B)^n \eta } \qquad \forall \ k\in \{0,1,\ldots ,n\} \end{aligned}$$
(1)

where \([x^k] q(x)\) denotes the coefficient of \(x^k\) in a polynomial \(q\in \mathbb {R}[x]\). For the sake of brevity we say that \(Y_n\) is defined by the linear representation \((\xi ,A,B,\eta )\). Then the distribution of each \(Y_n\) can be characterized by the function \(h_n(z)\) given by

$$ h_n(z) = \xi ' (Ae^z + B)^n\eta $$

Indeed, setting \(M=A+B\), the moment generating function of \(Y_n\) is given by

$$ F_n(z) = \sum _{k=0}^n p_n(k) e^{zk} = \frac{\xi ' (Ae^z + B)^n\eta }{\xi ' M^n\eta } = \frac{h_n(z)}{h_n(0)} \qquad \forall \ z \in \mathbb {C}$$

This allows us to determine the mean value and variance of \(Y_n\) as

$$\begin{aligned} E(Y_n) = F_n'(0) = \frac{h_n'(0)}{h_n(0)}, \ \ Var(Y_n) = \frac{h_n''(0)}{h_n(0)} - \left( \frac{h_n'(0)}{h_n(0)} \right) ^2 \end{aligned}$$
(2)
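For concreteness, Eq. (1) and Eq. (2) can be evaluated mechanically by expanding \(\xi '(Ax+B)^n\eta\) coefficient by coefficient; the following Python sketch (function names ours, numpy assumed) reproduces the binomial distribution in the one-state unweighted case:

```python
import numpy as np

def occurrence_distribution(xi, A, B, eta, n):
    """p_n(k) of Eq. (1): expand xi'(Ax+B)^n eta in powers of x.

    v[k] holds the row vector multiplying x^k in xi'(Ax+B)^j.
    """
    v = np.zeros((n + 1, len(xi)))
    v[0] = xi
    for j in range(n):
        new = np.zeros_like(v)
        new[1:j + 2] += v[:j + 1] @ A      # the A-part carries one extra x
        new[:j + 1] += v[:j + 1] @ B
        v = new
    weights = v @ eta                      # total weight of |w| = n, |w|_a = k
    return weights / weights.sum()

def mean_var(p):
    """Mean and variance of Y_n, cf. Eq. (2), from its distribution."""
    k = np.arange(len(p))
    m = float(p @ k)
    return m, float(p @ k ** 2) - m ** 2
```

With \(\xi=\eta=(1)\) and \(A=B=(1)\) one gets \(p_n(k)=\binom{n}{k}2^{-n}\), as expected.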

Analogously, the characteristic function of \(Y_n\) is given by

$$\begin{aligned} \varPsi _n(t) = \sum _{k=0}^n p_n(k) e^{itk} = \frac{\xi ' (Ae^{it} + B)^n\eta }{\xi ' M^n\eta } = \frac{h_n(it)}{h_n(0)} \qquad \forall \ t \in \mathbb {R}\end{aligned}$$
(3)

It turns out that the limit distribution of \(Y_n\) depends on the properties of the matrix \(M=A+B\). A relevant case occurs when M is primitive (i.e. \(\exists k \in \mathbb {N}: M^k>0\)). In this case it is known that \(Y_n\) has a Gaussian limit distribution [1, 3] and satisfies a local limit theorem (in the sense of the De Moivre-Laplace Theorem [10]), which we recall in the next section.
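Primitivity itself is easy to test: by Wielandt's theorem, a non-negative \(m\times m\) matrix M is primitive if and only if \(M^{(m-1)^2+1}>0\) entrywise. A small sketch (naming ours, numpy assumed):

```python
import numpy as np

def is_primitive(M):
    """Test M^k > 0 entrywise for k = (m-1)^2 + 1 (Wielandt's bound),
    tracking only the zero/non-zero pattern to avoid overflow."""
    m = M.shape[0]
    P = (M > 0).astype(np.int64)
    R = np.eye(m, dtype=np.int64)
    for _ in range((m - 1) ** 2 + 1):
        R = np.minimum(R @ P, 1)           # boolean matrix product
    return bool((R > 0).all())
```

For instance the permutation matrix of a 2-cycle is irreducible but not primitive, and the check detects this.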

3 Primitive Case

In this section we consider the case when \(M=A+B\) is a primitive matrix [15] and recall some properties proved in [3, 4, 11] that are useful in the sequel.

Since M is primitive, by the Perron-Frobenius Theorem it admits a real eigenvalue \(\lambda > 0\) that is greater than the modulus of any other eigenvalue of M. Thus, we can consider the function \(u=u(z)\) implicitly defined by the equation

$$ \text{ Det }(Iu - Ae^z - B) = 0$$

such that \(u(0)= \lambda \). It turns out that, in a neighbourhood of \(z=0\), u(z) is analytic, is a simple root of the characteristic polynomial of \(Ae^z + B\) and |u(z)| is strictly greater than the modulus of all other eigenvalues of \(Ae^z + B\).

Now, consider the bivariate matrix-valued function H(x, y) given by

$$ H(x,y) = \sum _{n=0}^{+\infty } (Ax + B)^n \ y^n \ = \ \left( I - (Ax+B) y \right) ^{-1} $$

Clearly, \(\xi 'H(e^z,y)\eta \) is the generating function of \(\{h_n(z)\}_n\), i.e.

$$ \xi 'H(e^z,y)\eta = \sum _{n=0}^{+\infty } h_n(z) y^n = \frac{\xi ' \text{ Adj }\left( I - (Ae^z+B) y \right) \eta }{\text{ Det }\left( I-(Ae^z+B) y\right) } $$

Thus, for every z near 0, the singularities of \(\xi 'H(e^z,y)\eta \) are the values \(\mu ^{-1}\) for all (non-null) eigenvalues \(\mu \) of \(Ae^z + B\) and hence \(u(z)^{-1}\) is its (unique) singularity of minimum modulus. Then, by the properties of u(z) one can get the following

Proposition 1

([3]). If M is primitive then there are two positive constants c, \(\rho \) and a function r(z) analytic and non-null at \(z=0\), such that for every \(|z|\le c\)

$$ h_n(z) = r(z)\; u(z)^n + O(\rho ^n) $$

and \(\rho < |u(z)|\). In particular \(\rho < \lambda \).

Mean value and variance of \(Y_n\) can be estimated from Eq. (2). It turns out that the constants \(\beta = u'(0)/\lambda \) and \(\gamma = \frac{u''(0)}{\lambda } - \left( \frac{u'(0)}{\lambda } \right) ^2\) are positive and satisfy the equalities \(E(Y_n) = \beta n + O(1)\) and \(var(Y_n) = \gamma n + O(1)\) [3]. Explicit expressions of \(\beta \) and \(\gamma \) are also obtained in [3]; they depend on the matrices A, M, and in particular on \(\lambda \) and the corresponding left and right eigenvectors.
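When the closed forms are not at hand, \(\beta\) and \(\gamma\) can be approximated by central finite differences on the dominant eigenvalue of \(Ae^z+B\); this is a rough numerical sketch of ours, not the explicit formulas of [3]. For \(A=(2)\), \(B=(1)\) one has \(u(z)=2e^z+1\), hence \(\beta=2/3\) and \(\gamma=2/9\):

```python
import numpy as np

def beta_gamma(A, B, h=1e-5):
    """Estimate beta = u'(0)/lambda and gamma = u''(0)/lambda - beta^2
    by central finite differences on the dominant eigenvalue u(z) of
    A e^z + B (real and maximal by Perron-Frobenius for real z)."""
    def u(z):
        return np.linalg.eigvals(A * np.exp(z) + B).real.max()
    lam = u(0.0)
    du = (u(h) - u(-h)) / (2 * h)
    d2u = (u(h) - 2 * lam + u(-h)) / h ** 2
    beta = du / lam
    return beta, d2u / lam - beta ** 2
```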

Other properties concern the function \(y(t) = u(it)/\lambda \) used in Sect. 4, defined for real t in a neighbourhood of 0. By Proposition 1, for any t near 0, \(y(t)^n\) is the leading term of the characteristic function \(\varPsi _n(t)\). Moreover, for some \(c>0\) and every \(|t| \le c\), the following relations hold [3]:

$$\begin{aligned} |y(t)| = 1 - \frac{\gamma }{2} t^2 + O(t^4), \qquad \arg {y(t)} = \beta t + O(t^3), \qquad |y(t)| \le e^{-\frac{\gamma }{4}t^2} \end{aligned}$$
(4)

The behaviour of y(t) can be estimated precisely when t tends to 0. For any q such that \(1/3< q < 1/2\) it can be proved that

$$\begin{aligned} y(t)^n = e^{-\frac{\gamma }{2}t^2n + i\beta t n} (1 + O(t^3 n)) \qquad \text{ for } |t| \le n^{-q} \end{aligned}$$
(5)

The previous properties can be used to prove a local limit theorem for \(\{Y_n\}\) when M is primitive [3]. The result holds under a further assumption (introduced to avoid periodicity phenomena) stating that for every \(0< t < 2\pi \)

$$\begin{aligned} |\mu | < \lambda \ \ \text{ for } \text{ every } \text{ eigenvalue } \mu \text{ of } Ae^{it} + B \end{aligned}$$
(6)

Such a property is studied in detail in [4] and is often satisfied. For instance it holds true whenever there are two indices i, j such that \(A_{ij} >0\) and \(B_{ij} >0\), or \(A_{ii} >0\) and \(B_{jj} >0\). Intuitively, it corresponds to an aperiodicity property of the oriented graph defined by the matrices A and B, concerning the number of occurrences of the label a in cycles of equal length.
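In practice, condition (6) can at least be probed numerically, by sampling t in \((0,2\pi)\) and comparing the spectral radius of \(Ae^{it}+B\) with \(\lambda\); a sampled sanity check (not a proof), with naming ours:

```python
import numpy as np

def check_aperiodicity(A, B, num_samples=200):
    """Sample t in (0, 2*pi) and test whether the spectral radius of
    A e^{it} + B stays below lambda, the Perron-Frobenius eigenvalue
    of M = A + B; a numerical probe of condition (6), not a proof."""
    lam = np.abs(np.linalg.eigvals(A + B)).max()
    for t in np.linspace(0.0, 2 * np.pi, num_samples + 2)[1:-1]:
        rho = np.abs(np.linalg.eigvals(A * np.exp(1j * t) + B)).max()
        if rho >= lam - 1e-9:
            return False
    return True
```

The two-state example with \(A_{12}=B_{21}=1\) and all other entries zero (every cycle alternates a and b) is periodic in the above sense, and the probe rejects it.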

The local limit theorem in the primitive case can be stated as follows.

Theorem 2

Let \(\{Y_n\}\) be defined by a linear representation \((\xi ,A,B,\eta )\) such that the matrix \(M=A+B\) is primitive and assume that property (6) holds for every \(0< t <2\pi \). Moreover, let \(\beta \) and \(\gamma \) be defined as above. Then, as n tends to \(+\infty \), the following equation holds uniformly for every \(k=0,1,\ldots ,n\),

$$\begin{aligned} \text{ Pr }\left\{ Y_n = k \right\} \, = \, \frac{e^{-\frac{(k-\beta n)^2}{2\gamma n}}}{\sqrt{2\pi \gamma n}} \, + \, \text{ o }\left( \frac{1}{\sqrt{n}}\right) \end{aligned}$$
(7)
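In the simplest primitive instance \(\xi=\eta=(1)\), \(A=B=(1)\), one has \(\beta=1/2\), \(\gamma=1/4\) and \(Y_n\) is binomial, so (7) reduces to the De Moivre-Laplace local limit theorem and can be checked directly:

```python
from math import comb, exp, pi, sqrt

# One-state unweighted model: Pr(Y_n = k) = C(n,k)/2^n, beta = 1/2,
# gamma = 1/4.  Compare the exact probability with the Gaussian term
# of Eq. (7) at a point about one standard deviation from the mean.
n, beta, gamma = 4000, 0.5, 0.25
k = n // 2 + 30
exact = comb(n, k) / 2 ** n
approx = exp(-(k - beta * n) ** 2 / (2 * gamma * n)) / sqrt(2 * pi * gamma * n)
```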

4 Bicomponent Models

In this section we study the behaviour of \(\{Y_n\}_{n\in \mathbb {N}}\) in the bicomponent model. We first recall some notions and properties introduced in [7] for this model: in particular we need an analogue of Proposition 1 for this case.

Here \(\{Y_n\}_{n\in \mathbb {N}}\) is defined by a linear representation \((\xi ,A,B,\eta )\) of size m, such that the matrix \(M=A+B\) consists of two primitive components. More precisely, there are two linear representations \((\xi _1,A_1,B_1,\eta _1)\), \((\xi _2,A_2,B_2,\eta _2)\), of size \(m_1\) and \(m_2\), respectively, with \(m=m_1+m_2\), such that for some \(A_0, B_0 \in \mathbb {R}_+^{m_1\times m_2}\)

$$\begin{aligned} \xi ' = (\xi '_1, \xi '_2) , \quad A = \left( \begin{array}{cc} A_1 &{} A_0 \\ 0 &{} A_2 \end{array} \right) , \quad B = \left( \begin{array}{cc} B_1 &{} B_0 \\ 0 &{} B_2 \end{array} \right) , \quad \eta =\left( \begin{array}{c} \eta _1 \\ \eta _2 \end{array} \right) \end{aligned}$$
(8)

Moreover we assume the following conditions:

(A) The matrices \(M_1 =A_1 + B_1\) and \(M_2 =A_2 + B_2\) are primitive, and we denote by \(\lambda _1\) and \(\lambda _2\) the corresponding Perron-Frobenius eigenvalues;

(B) \(\xi _1 \ne 0 \ne \eta _2\) and \(A_0 + B_0 \ne 0\);

(C) \(A_1 \ne 0 \ne B_1\) and \(A_2 \ne 0 \ne B_2\).

Since the two components are primitive, the properties presented in the previous section hold for each of them. In particular, for \(j=1,2\), we can define \(H^{(j)}(x,y)\), \(h_n^{(j)}(z)\), \(u_j(z)\), \(y_j(t)\), \(\beta _j\), and \(\gamma _j\), respectively, as the values H(x, y), \(h_n(z)\), u(z), y(t), \(\beta \), \(\gamma \) referring to component j. Note that condition (C) guarantees that \(0< \beta _j < 1\) and \(0 < \gamma _j\) for every \(j=1,2\), while condition (B) implies that both components contribute to the probability values of \(Y_n\).

In such a bicomponent model the limit distribution of \(\{Y_n\}\) mainly depends on whether \(\lambda _1 \ne \lambda _2\) or \(\lambda _1=\lambda _2\). If \(\lambda _1 > \lambda _2\) then \(\frac{Y_n-\beta _1 n}{\sqrt{\gamma _1 n}}\) converges in distribution to a standard normal r.v. (the case \(\lambda _1 < \lambda _2\) is symmetric) [7]. If \(\lambda _1=\lambda _2\) and \(\beta _1 \ne \beta _2\) then \(Y_n/n\) converges in distribution to a random variable U uniformly distributed over the interval \([b_1,b_2]\), where \(b_1= \min \{\beta _1,\beta _2\}\) and \(b_2= \max \{\beta _1,\beta _2\}\) [7, Theorem 15]. This means that, for every \(x\in \mathbb {R}\),

$$\begin{aligned} \lim _{n \rightarrow +\infty } \text{ Pr }(Y_n/n \le x) = \text{ Pr }(U \le x) \end{aligned}$$
(9)
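A concrete toy instance of (8) satisfying (A)-(C) with \(\lambda_1=\lambda_2\) (our own example, not taken from [7]): two scalar components with \(A_1=B_1=(1)\) (so \(\beta_1=1/2\)) and \(A_2=(3/2)\), \(B_2=(1/2)\) (so \(\beta_2=3/4\)), joined by \(A_0=(1)\), \(B_0=(0)\). Computing \(p_n(k)\) directly from Eq. (1) exhibits the uniform limit law (9):

```python
import numpy as np

def occurrence_distribution(xi, A, B, eta, n):
    """p_n(k) of Eq. (1): expand xi'(Ax+B)^n eta in powers of x."""
    v = np.zeros((n + 1, len(xi)))
    v[0] = xi
    for j in range(n):
        new = np.zeros_like(v)
        new[1:j + 2] += v[:j + 1] @ A
        new[:j + 1] += v[:j + 1] @ B
        v = new
    w = v @ eta
    return w / w.sum()

# Bicomponent representation of the form (8): lambda_1 = lambda_2 = 2,
# beta_1 = 1/2, beta_2 = 3/4, so U is uniform on [1/2, 3/4].
xi = np.array([1.0, 0.0])
A = np.array([[1.0, 1.0], [0.0, 1.5]])
B = np.array([[1.0, 0.0], [0.0, 0.5]])
eta = np.array([0.0, 1.0])
n = 400
p = occurrence_distribution(xi, A, B, eta, n)
cdf_06 = p[: int(0.6 * n) + 1].sum()   # Pr(Y_n/n <= 0.6); the limit is 0.4
```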

However this relation does not give information about the probability that \(Y_n\) takes a specific value \(k \in \mathbb {N}\) (possibly depending on n). Here we want to show that, by adding a further condition on the model, such a probability can be estimated at least for reasonable expressions \(k=k(n)\). To this end, we still consider the case \(\lambda _1=\lambda _2\) and \(\beta _1 \ne \beta _2\) and assume a further hypothesis analogous to condition (6): for every \(0< t < 2\pi \)

$$\begin{aligned} |\mu | < \lambda \text{ for } \text{ all } \text{ eigenvalues } \mu \text{ of } \text{ the } \text{ matrices } A_1e^{it} + B_1 \text{ and } A_2e^{it} + B_2 \end{aligned}$$
(10)

where we set \(\lambda = \lambda _1 = \lambda _2\).

In this case, following [7], the matrix-valued function H(xy) is given by

$$\begin{aligned} H(x,y)=\sum _{n=0}^{+\infty } (Ax+B)^n y^n= \left[ \begin{array}{cc} H^{(1)}(x,y) &{} G(x,y) \nonumber \\ 0 &{} H^{(2)}(x,y) \end{array} \right] , \quad \text{ where } \qquad \nonumber \\ H^{(1)}(x,y) = \frac{\text{ Adj }\left( I-(A_1x+B_1)y\right) }{\text{ Det }\left( I-(A_1x+B_1) y\right) }, \ H^{(2)}(x,y) = \frac{\text{ Adj }\left( I-(A_2x+B_2)y\right) }{\text{ Det }\left( I-(A_2x+B_2) y\right) } \nonumber \\ \text{ and } \ G(x,y) = H^{(1)}(x,y) \; (A_0 x + B_0)y \; H^{(2)}(x,y).\qquad \end{aligned}$$
(11)

Thus, the generating function of \(\{h_n(z)\}_n\) is now given by

$$ \sum _{n=0}^\infty h_n(z) y^n = \xi ' H(e^z,y) \eta = \xi _1' H^{(1)}(e^z,y) \eta _1 + \xi _1' G (e^z,y) \eta _2 + \xi _2' H^{(2)} (e^z,y) \eta _2 $$

An analysis of the singularities of \(\xi ' H(e^z,y) \eta \) is presented in [7, Sect. 7.2] where the following property is proved.

Proposition 3

For some constant \(c>0\) and every \(z\in \mathbb {C}\) such that \(|z|\le c\), we have

$$ h_n(z) = s(z) \sum _{j=0}^{n-1} u_1(z)^j u_2(z)^{n-1-j} + O(u_1(z)^n) + O(u_2(z)^n) $$

where s(z) is a function analytic and non-null for \(|z|\le c\).

Since \(u_1(0) = \lambda = u_2(0)\) the previous proposition implies

$$\begin{aligned} h_n(0) = s(0)\, n \lambda ^{n-1} + O(\lambda ^n) \qquad (s(0)\ne 0) \end{aligned}$$
(12)

4.1 Analysis of the Characteristic Function

Here we study the characteristic function \(\varPsi _n(t) = \frac{h_n(it)}{h_n(0)}\), for \(-\pi \le t \le \pi \). We split this interval into three sets:

$$|t| \le n^{-q},\ \ n^{-q}< |t| <c, \ \ c \le |t| \le \pi $$

where c and q are positive constants and \(\frac{1}{3}< q < \frac{1}{2}\). We observe that such a splitting is typical of the “Saddle Point Method”, and it is often used to derive local limit properties in the Gaussian case [9].

Proposition 4

For every \(0< c < \pi \) there exists \(0< \varepsilon < 1\) such that

$$ |\varPsi _n(t)| = O(\varepsilon ^n) \qquad \text{ for } \text{ all } \ c \le |t| \le \pi . $$

Proof

From Eq. (11) it is clear that, for every \(z \in \mathbb {C}\), the singularities of the generating function \(\xi ' H(e^z,y) \eta \) are the inverses of the eigenvalues of the matrices \((A_1e^z+B_1)\) and \((A_2e^z + B_2)\). Then, by condition (10), for every \(0<c<\pi \), all singularities of \(\xi ' H(e^{it},y) \eta \), for any \(c\le |t|\le \pi \), are in modulus greater than a value \(\tau ^{-1}\) such that \(0< \tau < \lambda \), and hence \(|h_n(it)| = O(\tau ^n)\). Thus, by equality (12), for some \(0< \varepsilon < 1\) we have

$$ |\varPsi _n(t)| = \left| \frac{h_n(it)}{h_n(0)} \right| = \frac{O(\tau ^n)}{\varTheta (n\lambda ^n)} = O(\varepsilon ^n) \qquad \text{ for } \text{ any } \ c \le |t| \le \pi $$

   \(\square \)

Now, let us study \(\varPsi _n(t)\) for t in a neighbourhood of 0. We recall that in such a set both functions \(y_1(t)= u_1(it)/\lambda \) and \(y_2(t)= u_2(it)/\lambda \) satisfy equations (4). Then, for some \(c>0\) and every \(|t|\le c\), we have

$$\begin{aligned}&y_1(t) = 1 + i\beta _1t + O(t^2), \qquad y_2(t) = 1 + i\beta _2t + O(t^2) \end{aligned}$$
(13)
$$\begin{aligned}&|y_1(t)| \le e^{-\frac{\gamma _1}{4}t^2}, \qquad |y_2(t)| \le e^{-\frac{\gamma _2}{4}t^2} \end{aligned}$$
(14)

Moreover, by Proposition 3 we immediately get, for \(|t|\le c\), with \(t\ne 0\),

$$ h_n(it) = s(it)\frac{u_1(it)^n - u_2(it)^n}{u_1(it) - u_2(it)} + O(u_1(it)^n) + O(u_2(it)^n) $$

Thus from equalities (12), (13) and (14), we have

$$\begin{aligned} \varPsi _n(t) = \frac{h_n(it)}{h_n(0)} = (1+O(t)) \left( \frac{y_1(t)^n - y_2(t)^n}{it\;(\beta _1-\beta _2)\; n} \right) + \sum _{j=1,2} O\left( \frac{e^{-\frac{\gamma _j}{4}t^2n}}{n} \right) \end{aligned}$$
(15)

This leads us to evaluate \(\varPsi _n(t)\) in the second set, i.e. for \(n^{-q}< |t| < c\).

Proposition 5

Let \(0< q < 1/2\). Then there are two positive constants a, c such that, for every real t satisfying \(n^{-q}< |t| < c\),

$$ |\varPsi _n(t)| = O\left( e^{-a n^{1-2q}} \right) $$

Proof

From Eq. (15), taking \(a= \min \{\gamma _1,\gamma _2\}/4\), we obtain for some \(c>0\)

$$ |\varPsi _n(t)| \le \frac{|y_1(t)|^n+|y_2(t)|^n}{\varTheta (n^{1-q})} + O\left( {e^{-at^2n}/n}\right) \ \qquad \text{ for } \text{ all } n^{-q}< |t| < c $$

and by (14) we get \(|\varPsi _n(t)| = O\left( n^{q-1} e^{-an^{1-2q}}\right) \) proving the result.    \(\square \)

Now, let us evaluate \(\varPsi _n(t)\) in the first set, that is, for \(|t|\le n^{-q}\) where \(1/3< q < 1/2\). First note that, by simple computations, the following relations can be proved:

$$\begin{aligned} \int _{|t|\le n^{-q}}O\left( {e^{-\frac{\gamma _j}{4}t^2n}}/n\right) dt = O(n^{-1-q}) = o(n^{-4/3}) \qquad \text{ for } j=1,2, \\ \int _{|t|\le n^{-q}} O(t)\frac{y_1(t)^n - y_2(t)^n}{it\;(\beta _1-\beta _2)\; n} dt = \int _{|t|\le n^{-q}} O(1/n) dt = o(n^{-4/3}) \end{aligned}$$

Therefore, by Eq. (15), for every \(k \in \{0,1,\ldots ,n\}\) we get

$$\begin{aligned} \int _{|t|\le n^{-q}} \varPsi _n(t) e^{-ikt} dt = \int _{|t|\le n^{-q}} \left( \frac{y_1(t)^n - y_2(t)^n}{it\;(\beta _1-\beta _2)\; n} \right) e^{-ikt} dt \ + \ o(n^{-4/3}) \end{aligned}$$
(16)

Also observe that both \(y_1(t)\) and \(y_2(t)\) satisfy Eq. (5), whence

$$ y_j(t)^n = e^{-\frac{\gamma _j}{2}t^2n + i\beta _j t n} (1 + O(t^3 n)) \quad \text { for all } |t|\le n^{-q} , \quad j=1,2 $$

Thus, replacing these values in (16), after some computations (similar to the previous ones) we obtain the following

Proposition 6

For every \(k \in \{0,1,\ldots ,n\}\) and every \(1/3< q < 1/2\) we have

$$ \int _{|t|\le n^{-q}} \varPsi _n(t) e^{-ikt} dt = \int _{|t|\le n^{-q}} \left( \frac{e^{-\frac{\gamma _1}{2}t^2n + i\beta _1 t n} - e^{-\frac{\gamma _2}{2}t^2n + i\beta _2 t n}}{it\;(\beta _1-\beta _2)\; n} \right) e^{-ikt} dt + o(1/n) $$

4.2 Main Result

Without loss of generality assume \(\beta _1 < \beta _2 \), and denote by \(f_U(x)\) the density function of a uniform r.v. U in the interval \([\beta _1,\beta _2]\), that is

$$f_U(x) = \frac{1}{\beta _2-\beta _1} \chi _{[\beta _1,\beta _2]}(x)\ \qquad \forall x \in \mathbb {R}$$

where \(\chi _I\) denotes the indicator function of the interval \(I\subset \mathbb {R}\).

For our purpose we need the following property.

Lemma 7

For \(k,m \in \mathbb {N}\), \(k < m\), let \(g:[2k\pi ,2m\pi ]\rightarrow \mathbb {R}_+\) be a monotone function, and let \(I_{k,m} = \int _{2k\pi }^{2m\pi } g(x) \sin (x) dx \). Then:

(a) if g is non-increasing we have \(0 \le I_{k,m} \le 2[g(2k\pi )-g(2m\pi )]\);

(b) if g is non-decreasing we have \(2[g(2k\pi )-g(2m\pi )] \le I_{k,m} \le 0\).

In both cases \(|I_{k,m}| \le 2|g(2k\pi )-g(2m\pi )|\).

Proof

If g is non-increasing, for each integer \(k\le j < m\) the following relations hold

$$ I_{j,j+1} = \int _{2j\pi }^{(2j+1)\pi } g(x) \sin (x) dx - \int _{(2j+1)\pi }^{2(j+1)\pi } g(x) |\sin (x)| dx \le 2[g(2j\pi ) - g(2(j+1)\pi )] $$

while \(0 \le I_{j,j+1} \) is obvious. Thus (a) follows by summing the expressions above for \(j=k,\ldots ,m-1\).

Part (b) follows by applying (a) to the function \(h(x) = g(2m\pi ) - g(x)\).    \(\square \)
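As a quick numerical sanity check of part (a), take the non-increasing function \(g(x)=1/x\) on \([2\pi,10\pi]\) (so \(k=1\), \(m=5\)) and integrate by the plain trapezoidal rule (numpy assumed):

```python
import numpy as np

# Lemma 7(a) with g(x) = 1/x, k = 1, m = 5: the oscillating integral
# I = int sin(x)/x dx over [2*pi, 10*pi] must lie between 0 and
# 2*(g(2*pi) - g(10*pi)).
k, m = 1, 5
x = np.linspace(2 * k * np.pi, 2 * m * np.pi, 200001)
y = np.sin(x) / x
dx = x[1] - x[0]
I = dx * (y.sum() - 0.5 * (y[0] + y[-1]))      # trapezoidal rule
bound = 2 * (1 / (2 * k * np.pi) - 1 / (2 * m * np.pi))
```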

Now, we can state our main result.

Theorem 8

Let \((\xi ,A,B, \eta )\) be a linear representation of the form (8) satisfying conditions (A), (B), (C) above; also assume \(\lambda _1=\lambda _2\), \(\beta _1 \ne \beta _2\) together with the aperiodicity condition (10). Then, the r.v. \(Y_n\) satisfies the relation

$$\begin{aligned} \lim _{n\rightarrow +\infty } \ n\; \text{ Pr }(Y_n= k) \ = \ f_U(x) \end{aligned}$$
(17)

for every integer \(k=k(n)\), provided that \(k/n\rightarrow x\) for a constant x such that \(\beta _1 \ne x \ne \beta _2\) (as \(n\rightarrow +\infty \)).

Proof

It is known [10] that the probability \(p_n(k)= \text{ Pr }\left\{ Y_n = k \right\} \), for every \(k\in \{0,1,\ldots ,n\}\), can be obtained from \(\varPsi _n(t)\) by the inversion formula

$$ p_n(k) \, = \, \frac{1}{2\pi } \int _{-\pi }^{\pi } \varPsi _n(t) e^{-itk} dt $$

To evaluate the integral above let us split the interval \([-\pi ,\pi ]\) into the three sets

$$[-n^{-q},n^{-q}]\ , \qquad \{t\in \mathbb {R}: n^{-q}< |t| <c \}, \qquad \{t\in \mathbb {R}: c \le |t| \le \pi \}$$

with c as in Proposition 5 and some \(1/3< q < 1/2\). Then, by Propositions 4, 5, 6, we obtain

$$\begin{aligned} p_n(k) = \frac{1}{2\pi } \int _{|t|\le n^{-q}} \left( \frac{e^{-\frac{\gamma _2}{2}t^2n + i\beta _2 t n} - e^{-\frac{\gamma _1}{2}t^2n + i\beta _1 t n}}{it\;(\beta _2-\beta _1)\; n} \right) e^{-ikt} dt + o(1/n) \end{aligned}$$
(18)

Now, set \(v=k/n\) and note that, as \(n\rightarrow +\infty \), v approaches a value different from \(\beta _1\) and \(\beta _2\). Thus, defining

$$ \varDelta _n(v) = \int _{|t|\le n^{-q}} \frac{e^{i(\beta _2 - v)tn - \frac{\gamma _2}{2} t^2n} - e^{i(\beta _1 - v)tn - \frac{\gamma _1}{2} t^2n}}{i(\beta _2 - \beta _1) t} \; dt $$

we have to prove that

$$\begin{aligned} \varDelta _n(v) = 2\pi f_U(v) + o(1) \end{aligned}$$
(19)

Since \(\beta _1 < \beta _2\) set \(\delta = \beta _2 - \beta _1\). Then, \(\varDelta _n(v)\) is the integral of the difference of two functions of the form

$$ A_n(t,v) = \frac{e^{i(\beta - v)tn - \frac{\gamma }{2} t^2n} - 1}{i\delta t} $$

where \(\beta \) and \(\gamma \) take the values \(\beta _2\), \(\gamma _2\) and \(\beta _1\), \(\gamma _1\), respectively. Using the symmetries of the real and imaginary parts of \(A_n\), by a change of variable we get

$$\begin{aligned} \int _{|t|\le n^{-q}} A_n(t,v) dt \ = \frac{2}{\delta } \int _0^{n^{-q}} \frac{e^{-\frac{\gamma }{2} t^2n} \sin ((\beta -v)tn)}{t}\, dt \nonumber \\ = \ \frac{2}{\delta } \int _0^{(\beta -v)n^{1-q}} \frac{\sin (u)}{u}\, du - \frac{2}{\delta } \int _0^{(\beta -v)n^{1-q}} \left( 1 - e^{-\frac{\gamma u^2}{2(\beta -v)^2n}}\right) \frac{\sin (u)}{u}\, du \end{aligned}$$
(20)

Since \(\int _{0}^{+\infty } \frac{\sin (u)}{u} du = \pi /2\), as \(n\rightarrow +\infty \) the first term converges to \(\frac{\pi }{\delta } \text{ sgn }(\beta -v)\). Now we show that the second term of (20) tends to 0 as \(n\rightarrow +\infty \). This term is equal to

$$\begin{aligned} \frac{2}{\delta } \int _0^{(\beta -v)n^{1-q}} B_n(u) \sin (u) du \end{aligned}$$
(21)

where \(B_n(u) = u^{-1} \left( 1 - e^{-\frac{\gamma u^2}{2(\beta -v)^2n}}\right) \). To evaluate (21) we use Lemma 7. Note that \(B_n(u)>0\) for all \(u>0\), and \(\lim _{u\rightarrow 0} B_n(u) = 0 = \lim _{u\rightarrow +\infty } B_n(u)\). Moreover, in the set \((0,+\infty )\) its derivative vanishes only at the point \(u_n = \alpha |\beta - v| \sqrt{n/\gamma }\), for a constant \(\alpha \in (1,2)\) independent of n and v. Thus, for n large enough, \(u_n\) belongs to the interval \((0,|\beta -v| n^{1-q})\), \(B_n(u)\) is increasing in the set \((0,u_n)\) and decreasing in \((u_n,+\infty )\), while its maximum value is

$$ B_n(u_n) = \frac{1-e^{-\frac{\alpha ^2}{2}}}{\alpha |\beta -v|} \sqrt{\frac{\gamma }{n}} = \varTheta (n^{-1/2}) $$

Defining \(k_n= \lfloor \frac{u_n}{2\pi }\rfloor \) and \(K = \lfloor \frac{|\beta - v| n^{1-q}}{2\pi } \rfloor \), we can apply Lemma 7 to the intervals \([0,2k_n\pi ]\) and \([2(k_n+1)\pi ,2K\pi ]\), to get

$$\begin{aligned}&{\left| \int _0^{|\beta - v| n^{1-q}} B_n(u) \sin u du \right| \le 2 B_n(2k_n\pi ) + \left| \int _{2k_n\pi }^{2(k_n+1)\pi } B_n(u) \sin u du \right| } \\&\qquad \ \ \ \ + 2[B_n(2(k_n+1)\pi ) - B_n(2K\pi )] + \int _{2K\pi }^{|\beta - v| n^{1-q}} B_n(u) \sin u du \\&\le 2 B_n(2k_n\pi ) + 2 B_n(u_n) + 2[B_n(2(k_n+1)\pi ) - B_n(2K\pi )] + 2 B_n(2K\pi ) \\&\le 6 B_n(u_n) \ = \ \frac{c \ \sqrt{\gamma }}{|\beta - v| \sqrt{n}} \end{aligned}$$

where c is a positive constant independent of v and n.

This implies that, for any v approaching a constant different from \(\beta _1\) and \(\beta _2\), the second term of (20) is \(O(n^{-1/2})\). Therefore, we get

$$\begin{aligned} \varDelta _n(v)&= \displaystyle \frac{2}{\delta } \left[ \int _{0}^{(\beta _2 -v) n^{1-q}} \frac{\sin u}{u} du - \int _{0}^{(\beta _1 -v) n^{1-q}} \frac{\sin u}{u} du \right] + \text{ O }(n^{-1/2}) \\&\displaystyle = \frac{\pi }{\delta } \left[ \text{ sgn }(\beta _2-v) - \text{ sgn }(\beta _1-v) \right] + o(1) \ = \ 2\pi f_U(v) + o(1) \end{aligned}$$

which proves Eq. (19) and completes the proof.    \(\square \)

A typical consequence of this result is that \(n \text{ Pr }(Y_n= \lfloor xn \rfloor )\) converges to \(f_U(x)\) for every real x different from \(\beta _1\) and \(\beta _2\). Intuitively, equalities of the form (17) are considered more precise than convergence in distribution since they estimate the probability that the n-th random variable of the sequence takes a specific value rather than lying in an interval.
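Theorem 8 can be observed numerically on a small bicomponent instance of (8) (a toy example of ours: two scalar components with \(\lambda_1=\lambda_2=2\), \(\beta_1=1/2\), \(\beta_2=3/4\), joined by \(A_0=(1)\), \(B_0=(0)\), so that \(f_U\equiv 4\) on \((1/2,3/4)\)), with \(p_n(k)\) computed exactly from Eq. (1):

```python
import numpy as np

def occurrence_distribution(xi, A, B, eta, n):
    """p_n(k) of Eq. (1): expand xi'(Ax+B)^n eta in powers of x."""
    v = np.zeros((n + 1, len(xi)))
    v[0] = xi
    for j in range(n):
        new = np.zeros_like(v)
        new[1:j + 2] += v[:j + 1] @ A
        new[:j + 1] += v[:j + 1] @ B
        v = new
    w = v @ eta
    return w / w.sum()

# Two scalar primitive components with lambda_1 = lambda_2 = 2,
# beta_1 = 1/2 and beta_2 = 3/4, joined as in (8).
xi = np.array([1.0, 0.0])
A = np.array([[1.0, 1.0], [0.0, 1.5]])
B = np.array([[1.0, 0.0], [0.0, 0.5]])
eta = np.array([0.0, 1.0])
n = 1000
p = occurrence_distribution(xi, A, B, eta, n)
inside = n * p[int(0.6 * n)]     # x = 0.6 in (1/2, 3/4): f_U(0.6) = 4
outside = n * p[int(0.3 * n)]    # x = 0.3 outside [1/2, 3/4]: f_U(0.3) = 0
```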

On the other hand we observe that (without condition (10)) the convergence in distribution (9) does not imply our equality (17). In particular, if there are periodicity phenomena in the occurrences of the letter a it may happen that (9) holds while (17) does not. For instance, if the overall series r of the linear representation \((\xi ,A,B,\eta )\) has non-zero coefficients \((r,w)\) only for words w with even \(|w|_a\), then \(\text{ Pr }(Y_n=k) = 0\) for all odd integers k, and hence (17) cannot hold while (9) may still be valid. This observation also shows that condition (10) prevents such periodicity phenomena in the stochastic model.

5 Conclusions

In this work we have presented a non-Gaussian local limit property for the number of occurrences of a symbol in words generated at random according to a rational stochastic model defined by a linear representation with two primitive components. Our result concerns the case when the two components have the same dominant eigenvalue but different main constants of the respective mean values (\(\beta _1\) and \(\beta _2\)). We expect that in the case of different dominant eigenvalues a Gaussian local limit property holds, where the main terms of the mean value and variance correspond to the dominant component. In contrast, we conjecture that results similar to ours (that is, of a non-Gaussian type) hold for other rational stochastic models, defined by assuming different hypotheses on the key parameters associated to the mean value and variance of the statistic of interest (e.g. \(\beta _1=\beta _2\)), or by assuming more than two primitive components with equal dominant eigenvalues.