1 Introduction and model

A basic and important problem in biology is to understand the dynamics of the group formation process of social animals, that is, animals that spend their lives in groups, such as wolves, gazelles, elephants, and lions. To address this problem, biologists have proposed many models for animal grouping, e.g., fusion/fission models, kinship models and models based on game theory; see the introduction of Durand et al. (2007) for a detailed discussion.

Some of these models use genetic relatedness as one of the driving forces behind the group formation process. Moreover, in many real-world studies, it has been observed that animals within a group are indeed often genetically related. Thus, in Durand et al. (2007), the authors proposed a simplification of previous models where only genetic relatedness is used to decide which animals belong to the same group.

The advantage of such a simplified model is that one can use the coalescent process to define group patterns. Moreover, the model is simple enough to devise statistical tests with which one can check whether genetic relatedness really is the major driving force behind the group formation process. For this, the authors of Durand et al. (2007) defined the extra clustering model, which depends on a parameter \(0\le p\le 1\). The parameter gives the probability of additional group formation which does not correspond to genetic relatedness. Hence, for \(p=0\), no factors other than genetic relatedness are present, and in this case, the authors of Durand et al. (2007) called their model the neutral model. From a statistical point of view, one is now interested in testing the hypothesis \(p=0\) against \(p>0\). For this purpose, the authors of Durand et al. (2007) used the maximum-likelihood test and applied it to real-world data. The outcome was a good fit of the neutral model for many classes of social animals, except for classes with many predators, likely because in this case safety is another important reason why animals huddle together.

A first probabilistic analysis of the extra clustering model was carried out by Durand and François (2010), who derived asymptotic expansions for the mean number of groups. However, knowledge of the mean value alone gives little information about the distribution of the number of groups. Thus, in this paper, we will look at more refined properties. Firstly, we will derive asymptotic expansions of the variance and all higher moments of the number of groups. These results will show that there is a strong concentration around the mean value for the neutral model (\(p=0\)), whereas for the extra clustering model with \(0<p<1\) such a concentration does not take place. Secondly, we will prove limiting distribution results for all values of p which further highlight the above-mentioned concentration phenomena and also give precise insight into the behaviour of the limiting distributions. In the neutral model, a surprising central limit theorem holds which, from a theoretical angle, adds a further layer of richness to the model proposed by Durand et al. In the case \(0<p<1/2\), not only does concentration fail, but the limit law has an atom at 0 of mass \(p/(1-p)\), which means that with positive probability the number of groups is significantly smaller than the expected number. Finally, a phase change will be observed at \(p=1/2\), where the transition from many small groups to one big group takes place.

Before going into more details, we will give a precise definition of the extra clustering model. We start with the case \(p=0\) (neutral model). Here, the model is defined via the Kingman coalescent (Kingman 1982): start with n animals which are considered to be singleton units; at every time point pick two units uniformly at random and let them coalesce; continue until only one unit is left. This random process can be depicted by a rooted, binary tree, where the animals are the leaves and every coalescent event corresponds to the creation of an internal node. If the leaves are drawn at the bottom and the root at the top, then the Kingman coalescent corresponds to a random process building the tree bottom-up; see Fig. 1. Alternatively, one can build a random tree top-down as follows: start with the root and two leaves; choose a leaf uniformly at random and replace it by an internal node with two leaves; repeat until n leaves are created. It is well-known that these two random processes yield the same random model on the set of all rooted, binary trees; see Blum et al. (2006). Moreover, this random model is also equivalent to the Yule-Harding model on phylogenetic trees; see Chang and Fuchs (2010) for details.

Fig. 1

The tree arising from the coalescent process applied to five animals (grey nodes). The number of groups in this example is two

We recall some properties of the above random tree. First, if the two subtrees of the root have sizes j and \(n-j\), respectively, then, given their sizes, the two subtrees are again random trees generated by the same model. Moreover, the (random) size of the first subtree is uniformly distributed, i.e., it equals j with probability \(1/(n-1)\) for every \(1\le j\le n-1\); for these properties see, e.g., Chang and Fuchs (2010).
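
These two constructions are easy to experiment with. The following Python sketch (an illustration of ours, not part of the formal development; all function names are hypothetical) implements the top-down construction and empirically confirms the uniform splitting property just stated:

```python
import random
from collections import Counter

def leaf_count(t):
    return 1 if t == () else leaf_count(t[0]) + leaf_count(t[1])

def split_leaf(t, i):
    """Replace the i-th leaf (counted left to right) by a cherry ((), ())."""
    if t == ():
        return ((), ())
    left = leaf_count(t[0])
    if i < left:
        return (split_leaf(t[0], i), t[1])
    return (t[0], split_leaf(t[1], i - left))

def top_down_tree(n):
    """Top-down construction: start with a root and two leaves, then
    repeatedly split a uniformly chosen leaf until there are n leaves."""
    t, leaves = ((), ()), 2
    while leaves < n:
        t = split_leaf(t, random.randrange(leaves))
        leaves += 1
    return t

# Empirical check of the splitting property: the left subtree of the
# root should have size j with probability 1/(n-1) for 1 <= j <= n-1.
n, trials = 10, 10_000
freq = Counter(leaf_count(top_down_tree(n)[0]) for _ in range(trials))
print({j: round(freq[j] / trials, 3) for j in sorted(freq)})  # each ~ 1/9
```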

The above random tree was used in Durand et al. (2007) to define the random number of groups. More precisely, consider n animals and construct the above random tree. This random tree describes genetic relatedness of the animals. In particular, for a given leaf of the tree, all the animals belonging to the subtree rooted at the father are genetically closely related to the leaf and this set of animals is called a clade; see Blum and François (2005) and Chang and Fuchs (2010). The number of groups of the n animals is now given by the number of maximal clades; see Fig. 1. In the sequel, we will denote this number by \(X_n\). From the top-down construction of the random tree and the above stochastic properties, we immediately see that \(X_n\) satisfies the following distributional recurrence

$$\begin{aligned} X_n\mathop {=}\limits ^{d} {\left\{ \begin{array}{ll} 1,&{}\quad \text {if}\ I_n\in \{1,n-1\};\\ X_{I_n}+X_{n-I_n}^{*},&{}\quad \text {otherwise}, \end{array}\right. } \qquad (n\ge 3), \end{aligned}$$
(1)

where \(X_2=1\), \(I_n\) has a uniform distribution on \(\{1,\ldots ,n-1\}\), and \(X_n^{*}\) denotes an independent copy of \(X_n\). This recurrence is explained as follows: the number of groups is computed as the sum of the number of groups of the subtrees of the root unless there is only one maximal clade which is the case if and only if one of the subtrees has size one.
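
Recurrence (1) lends itself to a simple dynamic program for the exact law of \(X_n\): for every split size, convolve the laws of the two independent subtree contributions. The following Python sketch is our own illustration (names are ours); the parameter p anticipates the extra clustering recurrence (2) introduced below, and \(p=0\) reproduces, for instance, the exact standard deviation \(6.82125\cdots \) for \(n=100\) quoted in Remark 1.

```python
def group_dist(n, p=0.0):
    """Exact law of the number of groups X_m for 2 <= m <= n, computed
    from recurrence (1) (p = 0) resp. (2) (p > 0, introduced below).
    dist[m][k] = P(X_m = k); note that X_m <= floor(m/2)."""
    dist = {2: [0.0, 1.0]}                      # X_2 = 1
    for m in range(3, n + 1):
        row = [0.0] * (m // 2 + 1)
        row[1] = p + (1 - p) * 2 / (m - 1)      # only one maximal clade
        w = (1 - p) / (m - 1)
        for j in range(2, m - 1):               # X_j + X*_{m-j}
            for a, pa in enumerate(dist[j]):
                for b, pb in enumerate(dist[m - j]):
                    row[a + b] += w * pa * pb
        dist[m] = row
    return dist[n]

law = group_dist(100)                           # neutral model, n = 100
mean = sum(k * q for k, q in enumerate(law))
std = (sum(k * k * q for k, q in enumerate(law)) - mean ** 2) ** 0.5
print(mean, std)                                # std ~ 6.82125, cf. Remark 1
```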

Recurrences of the above type have been extensively studied over the last few decades because they also arise in the analysis of certain algorithms and data structures from computer science. In particular, in Hwang and Neininger (2002), the authors proposed a very general framework for deriving limit laws of sequences of random variables satisfying distributional recurrences similar to (1). Our recurrence, although closely related, does not fall into the framework of Hwang and Neininger (2002). In particular, new phenomena not observed before for these recurrences will appear, which makes a detailed analysis of (1) highly interesting.

We next explain the extra clustering model from Durand et al. (2007). As mentioned before, this model depends on a probability p which describes the probability of extra clustering in the group formation process. More precisely, the recurrence (1) for the number of groups is replaced by the following distributional recurrence

$$\begin{aligned} X_n\mathop {=}\limits ^{d} {\left\{ \begin{array}{ll} 1,&{}\quad \text {with probability}\ p;\\ 1,&{}\quad \text {if}\ I_n\in \{1,n-1\}\ \text {(with probability}\ 1-p);\\ X_{I_n}+X_{n-I_n}^{*},&{}\quad \text {otherwise}, \end{array}\right. } \qquad (n\ge 3), \end{aligned}$$
(2)

where \(X_2=1\) and notation is as above. Note that \(p=0\) corresponds to the neutral model. For this model, the authors in Durand and François (2010) computed the following asymptotic expansion of the mean

$$\begin{aligned} {{{\mathbb {E}}}}(X_n)=\frac{c(p)}{\Gamma (2(1-p))}\,n^{1-2p}+{{\mathcal {O}}}(1), \end{aligned}$$
(3)

where

$$\begin{aligned} c(p)=\frac{1}{e^{2(1-p)}}\int _{0}^{1}(1-t)^{-2p}e^{2(1-p)t} (1-(1-p)t^2)\,\mathrm{d}t. \end{aligned}$$
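
As a quick numerical sanity check, c(p) can be evaluated by quadrature; for \(p=0\) this should give \((1-e^{-2})/4=0.21616\cdots \), the constant appearing in the mean of the neutral model (see Sect. 2). A minimal Python sketch, assuming SciPy is available:

```python
from math import exp
from scipy.integrate import quad

def c(p):
    """c(p) above, by numerical quadrature (the integrand has an
    integrable singularity at t = 1 for 0 < p < 1/2)."""
    f = lambda t: (1 - t) ** (-2 * p) * exp(2 * (1 - p) * t) * (1 - (1 - p) * t * t)
    return quad(f, 0, 1)[0] / exp(2 * (1 - p))

print(c(0.0), (1 - exp(-2)) / 4)   # both ~ 0.2161661..., as expected for p = 0
```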

We will refine this result by proving asymptotic expansions for the variance and all higher moments and by investigating the limiting distribution of \(X_n\) for all p. Our results, together with discussions and comparisons with the results from Durand and François (2010), will be given in the next section.

We conclude the introduction by pointing out that a preliminary version of this paper already appeared as an extended abstract (Drmota et al. 2014). The present version contains full proofs of our results [in Drmota et al. (2014) the proof of the neutral model was only sketched and only some special cases of the extra clustering model were treated]. Moreover, we correct the expression for the density of the continuous part of the limiting distribution in the case \(0<p<1/2\) which was stated wrongly in Drmota et al. (2014).

2 Results

In this section, we will state our results and discuss them. We start with the neutral model. Note that in this case, we have from (3) that \({\mathbb {E}}(X_n) = (1-e^{-2})n/4 + {{\mathcal {O}}}(1)\).

Theorem 1

Suppose that \(p=0.\) Then,  we have

$$\begin{aligned} \mathrm{Var}(X_n)=\frac{(1-e^{-2})^2}{4}n\log n+cn+{{\mathcal {O}}}(\log n) \end{aligned}$$
(4)

with

$$\begin{aligned} c= & {} \frac{(4\gamma -3)(1-e^{-2})^2}{16}\\&+e^{-2}\int _{0}^{1}\left( \frac{(1-(1+2t-2t^2)e^{2t})^2}{8(1-t)^2e^{2t}}+(1-t)^2e^{2t}-\frac{e^2(1-e^{-2})^2(3-2t)}{8(1-t)^2}\right) \mathrm{d}t\\= & {} -\,0.45679\cdots , \end{aligned}$$

where \(\gamma \) denotes Euler’s constant,  and for all \(k\ge 3,\)

$$\begin{aligned} {{{\mathbb {E}}}}\left( X_n-{{{\mathbb {E}}}}(X_n)\right) ^k\sim (-1)^k\frac{2k}{k-2}\left( \frac{1-e^{-2}}{4}\right) ^kn^{k-1}. \end{aligned}$$

Furthermore, 

$$\begin{aligned} \frac{X_n-{{{\mathbb {E}}}}(X_n)}{\sqrt{\mathrm{Var}(X_n)/2}}\mathop {\longrightarrow }\limits ^{d} N(0,1), \end{aligned}$$

where N(0, 1) denotes the standard normal distribution,  and

$$\begin{aligned} X_n \sim {\mathbb {E}}(X_n)\quad \text {a.s.} \end{aligned}$$

with the coupling arising from the top-down construction of the random tree.

Remark 1

Note that according to the above result, the standard deviation has order \(\sqrt{n\log n}\), which shows strong concentration of the number of groups around the mean (which is of order n). For instance, for 100 animals (\(n=100\)), we obtain from (4) a standard deviation of \(6.35582\cdots \). Comparing this with the exact value \(6.82125\cdots \), which can be computed from (1), we see that this is quite a good approximation.

However, the convergence to the normal limit law is slow; see Fig. 2 for a plot of the limiting distribution function and the exact distribution functions for \(n=100,200,400,800\), which were computed from (1) (computing the exact distribution function for n beyond 1000 becomes rather time-consuming). Looking at the data gathered in Durand et al. (2007), sample sizes are too small for our limit result to be applicable (the only larger class considered in Durand et al. (2007) are browsing springboks from the Etosha National Park in Namibia, where 1064 animals were observed).

Remark 2

Mathematically, there are two surprising facts. First, there is the curious normalization by half of the variance. Note that a similar (but seemingly unrelated) phenomenon was also observed by Janson and Kersting (2011) in their analysis of the total external path length of the Kingman coalescent. A probabilistic proof of the above central limit theorem, shedding further light on the curious normalization, was given recently by Janson; see Janson (2014). Second, the asymptotics of the centralized moments do not correspond to the moments of the normal distribution. Thus, the central limit theorem cannot be obtained from the method of moments; for the latter method see Section 30 in Billingsley (1995). This is in sharp contrast to Hwang and Neininger (2002), where the method of moments was applied to many examples of \(X_n\) satisfying recurrences similar to (1).

Fig. 2

The distribution functions for \(n=100,200,400,800\) and limiting distribution function of the number of groups under the neutral model

We next turn to the extra clustering model with \(p>0\). From the data presented in Durand et al. (2007), we see that p usually does not exceed 1/2. Thus, the case \(0<p<1/2\) (together with \(p=0\)) is of particular relevance for real-world applications, and this is the range we will treat next.

Theorem 2

Suppose that \(0<p<1/2\). Then,  for all \(k\ge 1,\)

$$\begin{aligned} {{{\mathbb {E}}}}(X_n^k)\sim \frac{d_k}{\Gamma (k(1-2p)+1)}n^{k(1-2p)}, \end{aligned}$$

where \(d_k\) is recursively given by \(d_1=c(p)\) and for \(k\ge 2\)

$$\begin{aligned} d_k=\frac{1-p}{(k-1)(1-2p)}\sum _{j=1}^{k-1}\left( {\begin{array}{c}k\\ j\end{array}}\right) d_{j}d_{k-j}. \end{aligned}$$

Moreover, 

$$\begin{aligned} \frac{X_n}{n^{1-2p}}\mathop {\longrightarrow }\limits ^{d} X \end{aligned}$$

with convergence of all moments, where the law of X is the sum of a point mass at 0 of weight \(p/(1-p)\) and a continuous distribution on \([0,\infty )\) with density

$$\begin{aligned} f(x)=-\delta (p)\frac{1-2p}{1-p}\sum _{k\ge 0}\frac{\delta (p)^k}{k!\Gamma (2(k+1)p-k)}x^k, \end{aligned}$$
(5)

where

$$\begin{aligned} \delta (p)=\frac{(1-2p)^2W_{p,(1-2p)/2}(-2(1-p))}{e^{2\pi ip}4^{p-1}(1-p)^{2p}M_{p,(1-2p)/2}(-2(1-p))}, \end{aligned}$$

and where \(M_{\kappa ,\mu }(z)\) and \(W_{\kappa ,\mu }(z)\) are the Whittaker M and W functions (see Section 6.7 in Beals and Wong 2010).
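
The recursion for the constants \(d_k\) is straightforward to evaluate. The following Python sketch (our illustration; it recomputes \(c(p)\) by quadrature as above and assumes SciPy) computes \(d_1,d_2\) for \(p=1/4\) and checks that the resulting second-moment constant matches the variance asymptotics (6) stated in Remark 3 below:

```python
from math import comb, gamma, exp
from scipy.integrate import quad

def c(p):   # the constant c(p) from Sect. 1, by numerical quadrature
    f = lambda t: (1 - t) ** (-2 * p) * exp(2 * (1 - p) * t) * (1 - (1 - p) * t * t)
    return quad(f, 0, 1)[0] / exp(2 * (1 - p))

def d_constants(p, K):
    """The constants d_k of Theorem 2; d_1 = c(p)."""
    d = [0.0, c(p)]
    for k in range(2, K + 1):
        s = sum(comb(k, j) * d[j] * d[k - j] for j in range(1, k))
        d.append((1 - p) / ((k - 1) * (1 - 2 * p)) * s)
    return d

p = 0.25
d = d_constants(p, 2)
# leading constant of Var(X_n)/n^{2(1-2p)}, computed in two ways (cf. (6) below):
print(d[2] / gamma(2 * (1 - 2 * p) + 1) - (d[1] / gamma(2 * (1 - p))) ** 2)
print((2 * (1 - p) / ((1 - 2 * p) * gamma(3 - 4 * p)) - 1 / gamma(2 * (1 - p)) ** 2) * c(p) ** 2)
```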

Remark 3

The most remarkable fact is that the limiting distribution has an atom at 0 (of mass \(p/(1-p)\)), which shows that \(X_n\) is significantly smaller than \({\mathbb {E}}(X_n)\) with positive probability. It is also of interest to look at the density of the continuous part. Figure 3 shows a plot of the density functions for several values of p. Note that the density becomes less peaked as p approaches 1/2, which reflects the fact that the distribution becomes less and less concentrated. In particular, for \(p=1/4\), one obtains

$$\begin{aligned} f(x)=0.3780064347\cdots e^{-0.2525054668\cdots x^2}. \end{aligned}$$

For other values of p the resulting expressions are usually less explicit.
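
As a consistency check of the Gaussian form displayed above for \(p=1/4\), the continuous part must have total mass \((1-2p)/(1-p)=2/3\), complementing the atom of mass \(p/(1-p)=1/3\) at 0. A short Python computation (our illustration) confirms this:

```python
from math import pi, sqrt

A, B = 0.3780064347, 0.2525054668   # f(x) = A exp(-B x^2) for p = 1/4
mass = A * 0.5 * sqrt(pi / B)       # integral of f over [0, infinity)
print(mass, 2 / 3)                  # both ~ 0.6666..., i.e. (1-2p)/(1-p)
```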

Note also that in contrast to the asymptotic expansion of the mean (\(k=1\)), for the variance

$$\begin{aligned} \mathrm{Var}(X_n)\sim \left( \frac{2(1-p)}{(1-2p)\Gamma (3-4p)}-\frac{1}{\Gamma (2(1-p))^2}\right) c(p)^2n^{2(1-2p)}, \end{aligned}$$
(6)

if we let \(p\rightarrow 0\), we do not recover the result of the neutral model. Thus, one expects a less accurate approximation in (6) for p close to 0; see Fig. 4, which shows a plot of the relative error of the standard deviation for \(n=100\) when using the approximation (6) for various values of p. We see that the error is indeed large for small values of p. Moreover, the error is also large for values of p approaching 1/2. The latter was also observed for the mean in Durand et al. (2007); however, the approximation of the mean is very accurate even for small values of p. The minimum of the relative error in Fig. 4 is attained at a value of p close to 0.09 (which is why we plotted the relative error of the standard deviation only for values of p in the vicinity of this minimum).

Fig. 3

The density f(x) of the continuous part of the limiting distribution of the number of groups X for \(p=1/8, 3/16, 1/4, 5/16\) (top to bottom)

Fig. 4

Relative error of standard deviation of the number of groups for \(n=100\) as a function of p

Remark 4

We have again plotted the limiting distribution functions and the exact distribution functions for \(n=100,200,400,800\) and two values of p, namely, \(p=0.02\) and \(p=0.24\); see Fig. 5. The reason for these two choices of p comes from Durand et al. (2007): the former is the maximum-likelihood estimate of p for Alaska wolves and the latter is the maximum-likelihood estimate for browsing springboks from the Etosha National Park in Namibia (these are among the smallest and the largest, respectively, of the values of p obtained from the real-world data presented in Durand et al. 2007).

Fig. 5

The distribution functions for \(n=100,200,400,800\) for \(p=0.02\) (left: Alaska wolves) and \(p=0.24\) (right: browsing springboks) and their limiting distribution functions

The remaining range of \(1/2\le p\le 1\) is less important from a practical point of view. Nevertheless, we will give results for this range as well for the sake of completeness. Our results show again that no strong concentration around the mean takes place (except in the trivial case \(p=1\)). We start with the case \(p=1/2\).

Theorem 3

Suppose that \(p=1/2\). Then,  we have

$$\begin{aligned} {{{\mathbb {E}}}}(X_n^k)\sim \frac{k!J_{2k-1}}{(2k-1)!2^{2k-1}}\log ^{2k-1}n, \end{aligned}$$

where \(J_{2k-1}\) are the Euler numbers of odd index (see,  e.g.,  page 144 in Flajolet and Sedgewick 2009). Furthermore, 

$$\begin{aligned} X_n\mathop {\longrightarrow }\limits ^{d} X, \end{aligned}$$

where X has a discrete law on \(\{1,2,\ldots \}\) which is given by

$$\begin{aligned} P(X = k)=\frac{2^{-2k}}{2k-1}{2k \atopwithdelims ()k}. \end{aligned}$$

Note that in this case the moments of \(X_n\) do not converge.

Finally, we turn to the case \(1/2<p\le 1\).

Theorem 4

Suppose that \(1/2<p\le 1\). Then,  for all \(k\ge 1,\)

$$\begin{aligned} {{{\mathbb {E}}}}(X_n^k)\sim e_k, \end{aligned}$$

where \(e_k\) is recursively given by \(e_1=p/(2p-1)\) and for \(k\ge 2\)

$$\begin{aligned} e_k=\frac{1-p}{2p-1}\sum _{j=1}^{k-1}\left( {\begin{array}{c}k\\ j\end{array}}\right) e_{j}e_{k-j}+\frac{p}{2p-1}. \end{aligned}$$

Moreover, 

$$\begin{aligned} X_n\mathop {\longrightarrow }\limits ^{d} X \end{aligned}$$

with convergence of all moments,  where X has a discrete law on \(\{1,2,\ldots \}\) which is given by

$$\begin{aligned} P(X = k)=\frac{p^k(1-p)^{k-1}}{2(2k-1)}{2k \atopwithdelims ()k}. \end{aligned}$$
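
Note that setting \(p=1/2\) in this formula recovers the law of Theorem 3 (cf. Remark 5 below). The following Python sketch (our illustration) accumulates the law via the term ratio \(P(X=k+1)/P(X=k)=2p(1-p)(2k-1)/(k+1)\), which avoids huge binomial coefficients, and checks the total mass and the mean \(e_1=p/(2p-1)\):

```python
def law_stats(p, K=200_000):
    """Partial mass and mean of the discrete law from Theorem 4
    (p = 1/2 gives the law of Theorem 3), iterated via the term ratio."""
    t = p                                # P(X = 1) = p * binom(2,1) / 2 = p
    mass, mean = t, t
    for k in range(1, K):
        t *= 2 * p * (1 - p) * (2 * k - 1) / (k + 1)   # P(X=k+1)/P(X=k)
        mass += t
        mean += (k + 1) * t
    return mass, mean

print(law_stats(0.75))  # ~ (1.0, 1.5); indeed e_1 = p/(2p-1) = 3/2
print(law_stats(0.5))   # mass tends to 1 only slowly; the mean diverges (Theorem 3)
```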

Remark 5

Note that Theorems 3 and 4 can be merged. However, there is a significant difference between these results: in Theorem 3 we only have convergence in distribution, whereas in Theorem 4 all moments converge as well.

Overall, our above results combined give a full picture of the limiting behavior of the number of groups under the extra clustering model. In particular, we see that the limit law is continuous for \(p=0\), is a mixture of a discrete and a continuous distribution for \(0<p<1/2\), and finally becomes discrete for \(1/2\le p\le 1\) (degenerating at \(p=1\)). The most important range for real-world applications is \(0\le p<1/2\). Here, a notable phenomenon is the strong concentration around the mean for the neutral model, which, however, does not hold for the extra clustering model with \(p>0\). This shows a qualitative difference between the neutral model and the extra clustering model with \(p>0\), something which was not visible from the previous analysis of the mean value (Durand and François 2010). Moreover, from a mathematical point of view, an interesting aspect is that the limit law can be obtained via the method of moments precisely in the ranges \(0<p<1/2\) and \(1/2<p\le 1\), but not in the two cases \(p=0\) and \(p=1/2\).

We conclude this section with a short outline of the paper. Since the derivation of the moments of \(X_n\) is quite technical but standard, we have deferred this analysis to Appendix 1 (as the main tool we will use singularity analysis, which will be briefly reviewed at the beginning of that appendix). In Sect. 3, we will introduce the mathematical tools needed for the proofs of our limit laws. More precisely, this section will contain a short discussion of Whittaker functions and some of their properties which are needed in the proofs. Moreover, we will explain our approach to limit laws via singularity perturbation analysis. The proofs of the limiting distribution results are then contained in Sect. 4 for \(p=0\) and in Appendix 2 for \(0<p\le 1\). We will end the paper with a conclusion.

3 Whittaker functions and singularity perturbation analysis

In this section, we will explain the analytic method used for proving our limiting distribution results for the number of groups under the extra clustering model (Theorems 1–4). The method relies on the explicit solution of (7), which involves Whittaker functions. Thus, properties of Whittaker functions will play a crucial role, and we will recall them below. The method itself uses singularity perturbation analysis and will also be explained in detail below.

We consider the moment-generating function of \(X_n\) which by (2) satisfies the recurrence, for \(n\ge 3\),

$$\begin{aligned} {{{\mathbb {E}}}}\big (e^{yX_n}\big )=pe^y+(1-p)\frac{2}{n-1}e^{y}+ \frac{1-p}{n-1}\sum _{j=2}^{n-2}{{{\mathbb {E}}}} \big (e^{yX_j}\big ){{{\mathbb {E}}}}\big (e^{yX_{n-j}}\big ) \end{aligned}$$

with initial condition \({{{\mathbb {E}}}}\left( e^{yX_2}\right) =e^{y}\) (the above sum is equal to 0 for \(n=3\)). Next, set

$$\begin{aligned} X(y,z)=\sum _{n\ge 2}{{{\mathbb {E}}}}\big (e^{yX_n}\big )z^n. \end{aligned}$$

Then, by a straightforward computation

$$\begin{aligned} z\frac{\partial }{\partial z}X(y,z)=X(y,z)+(1-p)X(y,z)^2+e^y\frac{z^2(1-(1-p)z^2)}{(1-z)^2} \end{aligned}$$
(7)

with \(X(y,0)=0\).
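
Comparing coefficients of \(z^n\) in (7) gives, for \(n\ge 2\), the identity \((n-1)a_n=(1-p)\sum _{j=2}^{n-2}a_ja_{n-j}+e^{y}\left( (n-1)-(1-p)\max \{n-3,0\}\right) \) for \(a_n={{{\mathbb {E}}}}(e^{yX_n})\), where we used \([z^n]z^2/(1-z)^2=n-1\) and \([z^n]z^4/(1-z)^2=n-3\) for \(n\ge 4\); this is equivalent to the recurrence above. The following Python sketch (our illustration, not part of the formal development) verifies the identity numerically:

```python
from math import exp

def mgf(y, p, N):
    """a_n = E(exp(y X_n)), n = 2..N, from the recurrence above."""
    a = {2: exp(y)}
    for n in range(3, N + 1):
        s = sum(a[j] * a[n - j] for j in range(2, n - 1))
        a[n] = p * exp(y) + (1 - p) * (2 * exp(y) + s) / (n - 1)
    return a

y, p = 0.3, 0.2
a = mgf(y, p, 30)
for n in range(2, 31):   # coefficient of z^n in (7)
    lhs = (n - 1) * a[n]
    rhs = (1 - p) * sum(a[j] * a[n - j] for j in range(2, n - 1)) \
        + exp(y) * ((n - 1) - (1 - p) * max(n - 3, 0))
    assert abs(lhs - rhs) < 1e-9 * lhs
print("coefficient check of (7) passed up to n = 30")
```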

Solution of (7). Note that (7) is a Riccati differential equation for which a standard solution procedure exists. Therefore, set

$$\begin{aligned} {\tilde{X}}(y,z)=\frac{X(y,z)}{z}. \end{aligned}$$

Then, (7) becomes

$$\begin{aligned} \frac{\partial }{\partial z}{\tilde{X}}(y,z)=(1-p){\tilde{X}}(y,z)^2+e^{y}\frac{1-(1-p)z^2}{(1-z)^2} \end{aligned}$$

with \({\tilde{X}}(y,0)=0\). Next, set

$$\begin{aligned} {\tilde{X}}(y,z)=-\frac{1}{1-p}\cdot \frac{V'(y,z)}{V(y,z)}, \end{aligned}$$

where \(V(y,0)=1\) and differentiation is with respect to z. Then, we obtain the second-order differential equation

$$\begin{aligned} V''(y,z)+(1-p)e^y\frac{1-(1-p)z^2}{(1-z)^2}V(y,z)=0 \end{aligned}$$

with \(V(y,0)=1\) and \(V'(y,0)=0\). This differential equation is a variant of Whittaker’s differential equation. Thus, its solution can be expressed in terms of the Whittaker functions as follows

$$\begin{aligned} V(y,z)= & {} M_{-(1-p)e^{y/2},\sqrt{1-4p(1-p)e^{y}}/2} (2(1-p)e^{y/2}(z-1))\\&+\,c(y)W_{-(1-p)e^{y/2},\sqrt{1-4p(1-p)e^{y}}/2} (2(1-p)e^{y/2}(z-1)) \end{aligned}$$

with

$$\begin{aligned} c(y)=\frac{(1+\sqrt{1-4p(1-p)e^{y}}-2(1-p)e^{y/2}) M_{-(1-p)e^{y/2}+1,\sqrt{1-4p(1-p)e^{y}}/2} (-2(1-p)e^{y/2})}{2W_{-(1-p)e^{y/2}+1,\sqrt{1-4p(1-p)e^{y}}/2} (-2(1-p)e^{y/2})}. \end{aligned}$$

We will work in the next section with this explicit solution. Consequently, we will need some background knowledge on Whittaker functions which we will recall next.

Whittaker functions. Here, we gather some properties of the Whittaker functions. The exposition will follow Section 6 in Beals and Wong (2010).

We start with the definition of the Whittaker functions which are independent solutions of Whittaker’s differential equation

$$\begin{aligned} v''(z)+\left( -\frac{1}{4}+\frac{\kappa }{z}+\frac{1-4\mu ^2}{4z^2}\right) v(z)=0. \end{aligned}$$

They can be expressed as follows

$$\begin{aligned} M_{\kappa ,\mu }(z)= & {} e^{-z/2}z^{\mu +1/2}M\left( \mu -\kappa +\frac{1}{2},1+2\mu ,z\right) ,\\ W_{\kappa ,\mu }(z)= & {} e^{-z/2}z^{\mu +1/2}U\left( \mu -\kappa +\frac{1}{2},1+2\mu ,z\right) . \end{aligned}$$

Here, \(M(a,c;z)\) and \(U(a,c;z)\) are the Kummer functions. The former is defined for all \(a,c,z\in {{\mathbb {C}}}\) with \(c\ne 0,-1,-2,\ldots \) by the following series

$$\begin{aligned} M(a,c;z)=\sum _{\ell =0}^{\infty }\frac{(a)_{\ell }}{(c)_{\ell }\ell !}z^\ell , \end{aligned}$$

where \((a)_{\ell }\) is the Pochhammer symbol

$$\begin{aligned} (a)_{\ell }:=a(a+1)\cdots (a+\ell -1). \end{aligned}$$

Note that the above expression shows that \(M(a,c;z)\) is analytic in all three variables. The definition of the Kummer function of the second kind, namely \(U(a,c;z)\), is slightly more involved. More precisely, \(U(a,c;z)\) is defined as

$$\begin{aligned} U(a,c;z)=\frac{\Gamma (1-c)}{\Gamma (a+1-c)}M(a,c;z)+\frac{\Gamma (c-1)}{\Gamma (a)}z^{1-c}M(a+1-c,2-c;z) \end{aligned}$$

for all \(a,c,z\) with \(c\not \in {{\mathbb {Z}}}\). The definition can be extended to \(c=m\in {{\mathbb {N}}}\) (where the limit exists) as follows

$$\begin{aligned} U(a,m;z)= & {} \frac{(-1)^m}{\Gamma (a+1-m)(m-1)!}\left( M(a,m;z)\log z \right. \\&+\left. \sum _{\ell =0}^{\infty }\frac{(a)_{\ell }}{(m)_{\ell }\ell !} \left( \psi (a+\ell )-\psi (\ell +1)-\psi (m+\ell )\right) z^{\ell }\right) \\&+\frac{(m-2)!}{\Gamma (a)}z^{1-m}\sum _{\ell =0}^{m-2} \frac{(a+1-m)_{\ell }}{(2-m)_{\ell }\ell !}z^{\ell } \end{aligned}$$

with \(\psi (z)=\Gamma '(z)/\Gamma (z)\). In the sequel, we choose the branches of the logarithm and of the powers so that there is a branch cut along \([0,\infty )\). Then, from the above definitions, we obtain the following lemma.

Lemma 1

Assume that \(\mu \ne -1/2,-1,-3/2,\ldots \) Then,  both Whittaker functions are analytic on \({{\mathbb {C}}}{\setminus }[0,\infty ).\)

Finally, note that the above expressions also give singularity expansions as \(z\rightarrow 0\). For instance, if \(\mu \ne -1/2,-1,-3/2,\ldots \), then

$$\begin{aligned} M_{\kappa ,\mu }(z)\sim z^{\mu +1/2},\quad (z\rightarrow 0) \end{aligned}$$

and if in addition \(\mu \ne 0,1/2,1,\ldots ,\) then

$$\begin{aligned} W_{\kappa ,\mu }(z)\sim \frac{\Gamma (2\mu )}{\Gamma (\mu -\kappa +1/2)}\,z^{-\mu +1/2},\quad (z\rightarrow 0). \end{aligned}$$

Similar expansions can be found for \(W_{\kappa ,\mu }(z)\) when \(\mu =0,1/2,1,\ldots \) as well.
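
As a small numerical illustration of these definitions, the following Python sketch (ours; it assumes SciPy is available) compares the truncated series for \(M(a,c;z)\) with SciPy's built-in confluent hypergeometric function:

```python
from scipy.special import hyp1f1

def kummer_M(a, c, z, terms=80):
    """Truncated series for M(a, c; z), updating the term via the
    Pochhammer recursion (a)_{l+1}/(a)_l = a + l."""
    total, term = 0.0, 1.0
    for l in range(terms):
        total += term
        term *= (a + l) * z / ((c + l) * (l + 1))
    return total

print(kummer_M(0.5, 1.5, -2.0), hyp1f1(0.5, 1.5, -2.0))  # should agree closely
```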

Singularity perturbation analysis. We will now explain the method of proof for our limit laws from Sect. 2. The method is based on singularity perturbation analysis, a term coined by Flajolet and Lafforgue (1994). The idea is to work directly with the moment-generating function of \(X_n\), which by Cauchy's integral formula is obtained from X(y,z) by

$$\begin{aligned} {{{\mathbb {E}}}}\big (e^{yX_n}\big )=\frac{1}{2\pi i}\int _{\gamma }\frac{X(y,z)}{z^{n+1}}\,\mathrm{d}z. \end{aligned}$$
(8)

Here, y is considered to be a parameter for which we assume that \(\vert y\vert <\eta \) with \(\eta >0\) suitably small.

In order to use (8), one has to choose a suitable contour \(\gamma \) and to study the singularity structure of X(y,z). Due to the above explicit expression for X(y,z), we see that the singularities are either the branch point singularities (with moving branch cut) of the Whittaker functions or poles arising from the zeros of V(y,z). By a change of variable in (8) (replacing \(e^{y/2}(z-1)\) by \(z-1\)), we can consider

$$\begin{aligned} {\tilde{V}}(y,z)= & {} V(y,1+e^{-y/2}(z-1))=M_{-(1-p)e^{y/2}, \sqrt{1-4p(1-p)e^{y}}/2}(2(1-p)(z-1))\nonumber \\&+\,c(y)W_{-(1-p)e^{y/2},\sqrt{1-4p(1-p)e^{y}}/2} (2(1-p)(z-1)). \end{aligned}$$
(9)

Now, the branch cut is fixed at \([1,\infty )\). As for the zeros of this function, we will prove that there are two cases:

  • Case I: \(p=0\). Here, we will show that for \(\vert z\vert <1+\delta \) with a suitable \(\delta \), we have exactly one zero \(z_0(y)\) of \({\tilde{V}}(y,z)\). Moreover, this zero has the property that it converges to the branch point singularity as y tends to 0.

  • Case II: \(p>0\). Here, we will show that for \(\vert z\vert <1+\delta \) with a suitable \(\delta \), we have no zeros of \({\tilde{V}}(y,z)\).

The second case is more in line with other instances to which singularity perturbation analysis was applied; see Flajolet and Lafforgue (1994) and Chapter X of Flajolet and Sedgewick (2009). In this case, \(\gamma \) will be deformed into a Hankel-type contour; see the right contour in Fig. 6. The asymptotic evaluation of (8) is then immediate and the main term comes from the part of the contour close to the branch point singularity.

Fig. 6

The integration contour and singularities in the two cases. The only (but crucial) difference is the additional polar singularity in case I

The first case is more involved, in particular, due to the fact that the polar singularity (arising from the zero of \({\tilde{V}}(y,z)\)) coalesces with the branch point singularity as y tends to 0. Note that a somewhat similar situation was encountered in a recent study of Drmota et al. (2009). In fact, our approach in case I will resemble the one of Drmota et al. (2009). More precisely, we will again deform the contour into the same type of contour as in case II; see the left contour in Fig. 6. This will lead to a contribution coming from the polar singularity by a straightforward application of the residue theorem. Then, in contrast to case II, we will show that the contribution of the branch point singularity is negligible. From this, the unusual central limit theorem of the neutral model will follow.

Remark 6

Analytically, the unusual normalization in Theorem 1 arises from the two coalescing singularities. If, for instance, there were only a polar singularity, then a central limit theorem with the usual normalization would hold; see Flajolet et al. (1997).

4 Limit laws

In this section, we will prove the limiting distribution result for the neutral model (Theorem 1). The corresponding proofs of the limiting distribution results for the extra clustering model with \(p>0\) (Theorems 24) will be given in Appendix 2. For the proof, we will use singularity perturbation analysis and the properties of the Whittaker functions from the previous section.

We first collect some properties of \({\tilde{V}}(y,z)\) and \({\tilde{V}}'(y,z) = \frac{\partial }{\partial z}{\tilde{V}}(y,z)\) (for the definition see (9)).

Lemma 2

Let \(\vert y\vert <\eta \) and

$$\begin{aligned} {\tilde{\Delta }}=\{z\in {{\mathbb {C}}}\ :\ \vert z\vert <1+\delta ,\, \arg (z-1)\ne 0\}, \end{aligned}$$

where \(\eta ,\delta >0\). Then,  \({\tilde{V}}(y,z)\) and \({\tilde{V}}'(y,z)\) are analytic in \({\tilde{\Delta }}\) and satisfy

$$\begin{aligned} {\tilde{V}}(y,z)= & {} 2(z-1)+2ay+4ay(z-1)\log (z-1)\nonumber \\&+\,{{\mathcal {O}}}(\max \{\vert y\vert ^2,\vert y\vert \vert z-1\vert ,\vert z-1\vert ^2\}),\end{aligned}$$
(10)
$$\begin{aligned} {\tilde{V}}'(y,z)= & {} 2+4ay\log (z-1)+{{\mathcal {O}}}\left( \max \{\vert y\vert ,\vert z-1\vert \}\right) , \end{aligned}$$
(11)

where \(a=(1-e^{-2})/4\).

Proof

This follows from the properties of the Whittaker functions from Sect. 3. \(\square \)

Next, we need the following lemma which was already announced in the previous section.

Lemma 3

For \(\eta ,\delta \) sufficiently small,  \({\tilde{V}}(y,z)\) as a function of z has only one (simple) zero \(z_0(y)\) in \({\tilde{\Delta }}\). Moreover,  we have,  as \(y\rightarrow 0,\)

$$\begin{aligned} z_0(y)=1-ay+2a^2y^2\log y+{{\mathcal {O}}}(y^2). \end{aligned}$$

Proof

First note that \({\tilde{V}}(0,z)=2(z-1)e^{z-1}\), which is an entire function with only one (simple) zero at \(z=1\). Next, \({\tilde{V}}(y,z)\) is analytic in both y and z in \({\tilde{\Delta }}\) and thus its zeros vary continuously with y. Also, note that because of (10) of the above lemma, there is no zero in a sufficiently small neighborhood of \(z=1\) for y sufficiently small (the limits as z tends to the branch cut in this neighborhood are nonzero as well). Thus, for \(\eta ,\delta \) sufficiently small, we have exactly one zero in \({\tilde{\Delta }}\), which in addition must move to 1 as y tends to 0. This proves the first claim.

As for the proof of the second claim, we use bootstrapping. We already know that

$$\begin{aligned} z_0(y)=1+o(y). \end{aligned}$$

Plugging this into (10), we obtain that, as \(y\rightarrow 0\),

$$\begin{aligned} z_0(y)=1-ay+o(y). \end{aligned}$$

Using another bootstrapping step, this can be refined to

$$\begin{aligned} z_0(y)=1-ay+2a^2y^2\log y+o(y^2\log y). \end{aligned}$$

Yet another bootstrapping step gives the following refined error bound

$$\begin{aligned} z_0(y)=1-ay+2a^2y^2\log y+{{\mathcal {O}}}(y^2). \end{aligned}$$

This is the second claim. \(\square \)

Now, we state the key lemma for the proof of the central limit theorem.

Lemma 4

Let \(y=it/(2a\sqrt{n\log n})\). Then, 

$$\begin{aligned} {{{\mathbb {E}}}}\big (e^{yX_n}\big )=z_0(y)^{-n}+{{\mathcal {O}}} \left( \frac{1}{\log n}\right) . \end{aligned}$$
(12)

Proof

For the proof, we use Cauchy’s integral formula

$$\begin{aligned} {{{\mathbb {E}}}}\big (e^{yX_n}\big )= & {} [z^n]X(y,z)=-[z^{n-1}] \frac{V'(y,z)}{V(y,z)}\\= & {} -\frac{1}{2\pi i}\int _{\tilde{\gamma }} \frac{V'(y,z)}{V(y,z)}\frac{\mathrm{d}z}{z^n}\\= & {} -\frac{1}{2\pi i}\int _{\gamma }\frac{{\tilde{V}}'(y,w)}{{\tilde{V}}(y,w)}\frac{\mathrm{d}w}{(e^{-y/2}(w-1)+1)^n}, \end{aligned}$$

where \(\tilde{\gamma }\) is a small positively oriented circle centered at the origin and the last step follows from the change of variables \(e^{y/2}(z-1)=w-1\). We now deform the contour \(\gamma \) into a contour \(\gamma '\) which is given by \(\gamma '=\gamma _1'\cup \gamma _2'\) with

$$\begin{aligned} \gamma _1'=\{w=1+v/n\ :\ v\in {{\mathcal {H}}}_n\}, \end{aligned}$$

where \({{\mathcal {H}}}_n\) denotes the major part of the Hankel contour with

$$\begin{aligned} {{\mathcal {H}}}_n= & {} \{v\in {{\mathbb {C}}}\ :\ \vert v\vert =1, \mathfrak {R}(v)\le 0\}\\&\cup \,\left\{ v\in {{\mathbb {C}}}\ :\ 0\le \mathfrak {R}(v)\le \sqrt{(1+\delta ')^2n^2-1}-n, \mathfrak {I}(v)=\pm 1\right\} \end{aligned}$$

(here, as usual, \(\mathfrak {R}\) and \(\mathfrak {I}\) denote the real and imaginary part of a complex number) and \(\gamma _2'\) completes the contour with an almost-full circle of radius \(1+\delta '\) with \(0<\delta '<\delta \); see Fig. 6, where \(\gamma _1'\) is the small noose around the branch cut and the almost-full circle \(\gamma '_2\) is the remainder of the contour. Note that the above integral then becomes

$$\begin{aligned} {{{\mathbb {E}}}}\big (e^{yX_n}\big )=(e^{-y/2}(z_0(y)-1)+1)^{-n}-\frac{1}{2\pi i}\int _{\gamma '}\frac{{\tilde{V}}'(y,w)}{{\tilde{V}}(y,w)}\frac{\mathrm{d}w}{(e^{-y/2}(w-1)+1)^n} \end{aligned}$$

since by the residue theorem, we have to add the residue

$$\begin{aligned} \mathrm{Res}\left( \frac{{\tilde{V}}'(y,w)}{{\tilde{V}}(y,w)}(e^{-y/2}(w-1)+1)^{-n},w=z_0(y)\right) =(e^{-y/2}(z_0(y)-1)+1)^{-n}. \end{aligned}$$

In order to derive (12) from this, we first note that from \(z_0(y)=1+{{\mathcal {O}}}(1/\sqrt{n\log n})\), we obtain that

$$\begin{aligned} (e^{-y/2}(z_0(y)-1)+1)^{-n}=z_0(y)^{-n}\left( 1+{{\mathcal {O}}}\left( \frac{1}{n\log n}\right) \right) ^{-n}=z_0(y)^{-n}+{{\mathcal {O}}}\left( \frac{1}{\log n}\right) . \end{aligned}$$

Next for the integral, note that by (10) and (11) and again \(z_0(y)=1+{{\mathcal {O}}}(1/\sqrt{n\log n})\), we have for \(w\in \gamma _1'\),

$$\begin{aligned} \frac{{\tilde{V}}'(y,w)}{{\tilde{V}}(y,w)}={{\mathcal {O}}}\left( \frac{n}{\log ^2 n}\right) . \end{aligned}$$

Moreover, for \(w\in \gamma _1'\)

$$\begin{aligned} \vert e^{-y/2}(w-1)+1\vert ^{-n}\le \left( 1+\frac{\mathfrak {R}(e^{-y/2}v)}{n}\right) ^{-n}\le e^{-\mathfrak {R}(e^{-y/2}v)}={{\mathcal {O}}}\big (e^{-\epsilon \mathfrak {R}(v)}\big )\qquad \end{aligned}$$
(13)

for a suitable \(\epsilon >0\). Hence, we obtain that for \(w\in \gamma _1'\),

$$\begin{aligned} \frac{{\tilde{V}}'(y,w)}{{\tilde{V}}(y,w)}\cdot \frac{1}{(e^{-y/2}(w-1)+1)^n}={{\mathcal {O}}}\left( \frac{n}{\log ^2 n}e^{-\epsilon \mathfrak {R}(v)}\right) . \end{aligned}$$

Consequently,

$$\begin{aligned}&-\frac{1}{2\pi i}\int _{\gamma '_1}\frac{{\tilde{V}}'(y,w)}{{\tilde{V}}(y,w)}\frac{\mathrm{d}w}{(e^{-y/2}(w-1)+1)^n}\\&\quad =-\frac{1}{2\pi i}\int _{{{\mathcal {H}}}_n}{{\mathcal {O}}}\left( \frac{n}{\log ^2 n}e^{-\epsilon \mathfrak {R}(v)}\right) \frac{\mathrm{d}v}{n}={{\mathcal {O}}}\left( {\frac{1}{\log ^2 n}}\right) . \end{aligned}$$

Finally, suppose that \(\vert w\vert =1+\delta '\). First, from the analyticity of \({\tilde{V}}(y,z)\) and \({\tilde{V}}'(y,z)\), we obtain that

$$\begin{aligned} \frac{{\tilde{V}}'(y,z)}{{\tilde{V}}(y,z)}={{\mathcal {O}}}(1). \end{aligned}$$

Moreover,

$$\begin{aligned} \vert e^{-y/2}(w-1)+1\vert \ge \vert w\vert +{{\mathcal {O}}}\left( \frac{1}{\sqrt{n\log n}}\right) \ge 1+\delta '' \end{aligned}$$

for n large enough with \(0<\delta ''<\delta '\). Thus,

$$\begin{aligned} -\frac{1}{2\pi i}\int _{\gamma '_2}\frac{{\tilde{V}}'(y,w)}{{\tilde{V}}(y,w)}\frac{\mathrm{d}w}{(e^{-y/2}(w-1)+1)^n}={{\mathcal {O}}} ((1+\delta '')^{-n}). \end{aligned}$$

Putting everything together gives

$$\begin{aligned} {{{\mathbb {E}}}}\big (e^{yX_n}\big )= & {} z_0(y)^{-n} +{{\mathcal {O}}}\left( \frac{1}{\log n}\right) +{{\mathcal {O}}}\left( \frac{1}{\log ^2n}\right) +{{\mathcal {O}}}((1+\delta '')^{-n})\\= & {} z_0(y)^{-n}+{{\mathcal {O}}}\left( \frac{1}{\log n}\right) \end{aligned}$$

which is the claimed result. \(\square \)

The proof of the central limit theorem in Theorem 1 now follows from the last lemma.

Proof of the central limit theorem from Theorem 1

As in Lemma 4 set \(y=it/(2a\sqrt{n\log n})\). Then, by the expansion of \(z_0(y)\) from Lemma 3, we obtain

$$\begin{aligned} z_0(y)=1-\frac{it}{2\sqrt{n\log n}}+\frac{t^2}{4n}+{{\mathcal {O}}}\left( \frac{\log \log n}{n\log n}\right) . \end{aligned}$$

Inserting this into the result from Lemma 4 yields

$$\begin{aligned} {{{\mathbb {E}}}}\left( e^{yX_n}\right) =\exp \left( \frac{it\sqrt{n}}{2\sqrt{\log n}}-\frac{t^2}{4}\right) \left( 1+{{\mathcal {O}}}\left( \frac{\log \log n}{\log n}\right) \right) \end{aligned}$$

and by rearranging

$$\begin{aligned} {{{\mathbb {E}}}}\left( e^{it(X_n-an)/(2a\sqrt{n\log n})}\right) =\exp \left( -\frac{t^2}{4}\right) \left( 1+{{\mathcal {O}}}\left( \frac{\log \log n}{\log n}\right) \right) . \end{aligned}$$

Since \(\exp (-t^2/4)\) is the characteristic function of a normal distribution with mean 0 and variance 1/2, the claimed central limit theorem follows from this by Lévy's continuity theorem. \(\square \)

The proofs of the remaining limiting distribution results can be found in Appendix 2. Note that in the cases \(0< p < 1/2\) and \(1/2 < p \le 1\), our results can also be derived from the moment asymptotics (given in Appendix 1) together with the property that the limiting distribution is characterized by its moments.

5 Conclusion

In this paper, we gave a detailed analysis of the extra clustering model, which was recently introduced by Durand et al. (2007) for two reasons: (i) to model the group formation process of social animals and (ii) to test whether genetic relatedness is the main driving force behind this process. Our analysis extends the previous analysis of Durand and François (2010), which was concerned with asymptotic expansions of the mean of the number of groups formed by the animals. We derived all higher moments and completely classified the limiting distribution of the number of groups for all values of p. Our results are most relevant for the range \(0\le p<1/2\), which covers the p values observed in real-world data. They show that the distribution of the number of groups is strongly concentrated around the mean for the neutral model, but not for the extra clustering model with \(p>0\). Thus, there is a phase change in the behaviour from \(p=0\) to \(p>0\), something which was not visible from previous results for the mean.

As for limiting distributions, our results show that the limit law is a continuous law for the neutral model (\(p=0\)), a mixture of a discrete and a continuous law for the extra clustering model with \(0<p<1/2\), and a discrete law for \(1/2\le p\le 1\). This transition from continuous to discrete is in fact expected, since the extra clustering model becomes less random as p increases (because the animals are more and more likely to form one huge group).

From a mathematical point of view, our results contain two surprises. First, the limit law cannot be obtained by the method of moments in all cases: we have seen two cases, namely \(p=0\) and \(p=1/2\), where we have weak convergence but the moments do not converge. The case of the neutral model is particularly surprising because the underlying sequence of random variables satisfies a divide-and-conquer recurrence of a type which often appears in computer science and for which, in many previous studies, an application of the method of moments led to the limit law. The second surprise is the curious limit law for the neutral model. In fact, our proof does not give much insight as to why this surprising result holds. A better explanation was given in a recent paper of Janson (2014). However, many things about this result are still shrouded in mystery, in particular, whether such a surprising result also holds for other classes of random trees such as random m-ary search trees (in this work, we considered trees which are equivalent to random binary search trees; for a definition of this family of trees as well as random m-ary search trees see Mahmoud 1992). We hope to come back to this question in future work.