1 Introduction

Statistical methods based on randomization tests have been useful to determine whether a random process can explain a particular observed pattern of species occurrences at discrete locations. This strategy has contributed to the development of null model analysis of presence-absence (1–0) data in community ecology. Depending on the assumptions made in a particular situation, randomization tests for presence-absence matrices may rely on simulation protocols where the row and column totals of the randomised matrices are allowed to vary. This is the case when the null model assumes that the presence of species k on site l is a Bernoulli-distributed random variable with parameter \(\pi_{kl}\) (the “cell” probability of occurrence of species k on location l, \(\pi_{kl}\), k = 1,.., t; l = 1,…,s). Following ideas used in contingency table analysis by Whittam and Siegel-Causey (1981), Gilpin and Diamond (1982) pioneered this strategy to deliberately construct a null model from which they could test whether species co-occurrences are non-randomly structured, to provide support to Diamond’s (1975) assembly rules. Gilpin and Diamond raised a discussion about the suitability of using an empirical formula for a cell probability versus a modelling approach, by fitting a logistic model. Although these authors did not favour the idea in using logistic regression for the estimation of cell probabilities, other attempts supported that the modelling approach is not unrealistic (Ryti and Gilpin 1987). On this line of thinking, Navarro and Manly (2009) (NM) suggested that null model analyses of presence-absence data require a definition of statistical independence and developed several possible variations of the logistic function for modelling the presence and absence of species on discrete locations, and its further application in null model analysis of species occurrences. In particular, the estimated probabilities \(\hat{\pi }_{kl}\) resulting from those fitted null models naturally induced randomization protocols for the generation of null matrices.

The work of Navarro and Manly (2009) has been cited in varied contexts as plant communities (Götzenberger et al. 2012), associations in microbial communities (Pascual-García et al. 2014) and evaluation of chemical risks (Tornero-Velez et al. 2012); it has also been cited and suggested by methodologists who have evaluated the performance of (“frequentist”) algorithms for the analysis of presence-absence matrices (Veech 2013; Kallio 2016). The NM method assumes a log-linear model for the so-called “quasi-abundance” (unobserved counts) as a function of additive qualitative species and location effects. By linking the probability structures of i.i.d. non-negative quasi-abundances and i.i.d Bernoulli presences-absences, NM were able to produce null scenarios from three families of generalized linear models (GLMs) for \(\pi_{kl}\) when quasi-abundances follow (1) Poisson, (2) Negative Binomial and (3) Binomial distributions. This array of candidate (null) models was put to compete one to each other (using profile deviance), such that the model attaining the minimum deviance could be selected as the “best” null model for the estimation of species occurrences on discrete locations. This best model makes cell probabilities available for simulation in a randomization protocol seeking to detect non-random patterns in the observed presence-absence matrix. All those competitor null models are characterized by two main assumptions. First, the occurrences of different cells in the presence–absence matrix are independent. Second, the models include only additive qualitative effects of species and locations as explanatory variables. These assumptions provide minimal restrictions about each species–location combination for the purpose to apply a probabilistic approach in null model analyses of presence-absence data.

The motivations for our formal analysis of presence-absence data comes from classic literature on over and under dispersed count data. However, the resulting analysis will also apply to many situations where the presences/absence data is fundamentally binary, including the Case (1986) lizard data set analysed in this paper. A question that can be raised now is whether only three families of distributions for count data are sufficient to support the existence of an “optimal” null model for the randomization protocol. A guidance for finding the answer is the relationship between the expected value and the variance of the assumed distribution for the quasi-abundance. When this is Poisson, both the expected value and the variance are equal, while when it is Negative-Binomially-distributed the former is larger than the latter (showing over-dispersion relative to the Poisson distribution), and the expected value will be smaller than the variance when assuming a binomially distributed random variable (showing under-dispersion relative to the Poisson distribution). The Negative Binomial and the Binomial distribution are not the only over-dispersed and under-dispersed distributions, respectively. Thus, it would be worth including other over-dispersed and under-dispersed distributions in the null model selection process. To achieve this goal, we consider a family of distributions proposed by Faddy (1997) which offered a unified approach to over-dispersed and under-dispersed non-negative discrete random variables, based on extended Poisson process modelling. The purpose of this paper is to provide a theoretical formulation of the original Navarro and Manly (2009) protocol for null model analysis that will allow the extension of quasi-abundance distributions, using Faddy’s (1997) method such that an ampler array of candidate null models can be evaluated. It is also discussed that the choice of predictors (e.g., variables associated to species and location effects), and the null-model selection process are important steps for the generation of null scenarios and, in general, for a proper interpretation of results under any probabilistic approach in null model analysis.

2 Estimation of the probability of occurrence of species on discrete locations with GLMs

2.1 Null models using species and locations as qualitative predictors

Assume that t species on s locations are of interest, and the abundance of species k on location l, has been registered for each of the t × s species-location combination (denoted as \(n_{kl}\)). The NM protocol assumed that the \(n_{kl}\)’s are realizations of t × s non-negative independent discrete random variables \(N_{kl}\), with probability mass function (p.m.f.) \(f_{kl} (n_{kl} )\). A procedure for modelling the dependence of the expected value of \(N_{kl}\) on suitable covariates in null models based on \(N_{kl}\) suggests the use of qualitative effects of species and locations, in a similar fashion found in a t × s factorial arrangement without replication, i.e., to define a non-interaction model involving row (species) and column (location) effects. Now, abundance (count) data, are typically studied with log-linear models. Therefore, the expected value of the abundance or mean intensity \(\mu_{kl} = E(N_{kl} )\) can be modelled using a multiplicative relationship through the log-linear model:

$$ \mu_{kl} = \exp (\gamma + \alpha_{k} + \beta_{l} ) $$
(1)

here, \(\gamma\) = constant effect, \(\alpha_{k}\) = qualitative species effect and \(\beta_{l}\) = qualitative site effect. Very often \(N_{kl}\) is unknown and the only information available is the presence and absence of each species in the locations, the statistical problem would be reduced to estimating the probability of occurrence of species k on location l, \(\pi_{kl}\). The absence of information about \(N_{kl}\) forces us to refer it as the “quasi-abundance”. The null-model proposed by NM assumes that families of quasi-abundance models are shared for all species, and that it is possible to associate the estimation problem for \(\pi_{kl}\) with a particular quasi-abundance distribution, when the only information available is a binary matrix.

Let \(y_{kl}\) be the (k, l)-th entry of an observed t × s binary matrix, where \(y_{kl} = 1\) if species k occurs in location l, and \(y_{kl} = 0\) otherwise, k = 1,…,t, l = 1,…,s. Let \(R_{k} = \sum\nolimits_{l = 1}^{s} {y_{kl} }\) the total number of locations where species k was found, \(C_{l} = \sum\nolimits_{k = 1}^{t} {y_{kl} }\) the total number of species occurring in location l. Under the NM null model, the \(y_{kl}\)’s are realizations of t × s independent binary random variables \(Y_{kl}\) with p.m.f.’s Bernoulli-distributed with parameter \(\pi_{kl} = P(Y_{kl} = 1)\). Here, each \(\pi_{kl}\) is regarded as the probability of occurrence of species k on site l and, simultaneously, as the expected value of \(Y_{kl}\). The expected row or column totals are the sum of the probabilities of the corresponding row or column:

$$ E(R_{k} ) = \sum\limits_{l = 1}^{s} {E(y_{kl} ) = } \sum\limits_{l = 1}^{s} {\pi_{kl} } ;\quad E(C_{l} ) = \sum\limits_{k = 1}^{t} {E(y_{kl} ) = } \sum\limits_{k = 1}^{t} {\pi_{kl} } . $$

The dependence of \(E(Y_{kl} ) = \pi_{kl}\) on the species and island effects can be modelled by connecting the probability structures of \(N_{kl}\) and \(Y_{kl}\), under the following premises:

  1. 1.

    Species k is absent on island l if and only if its associated quasi-abundance is nil.

  2. 2.

    Species k is present on island l if and only if its associated quasi-abundance is greater than 0.

The null model also assumes that

$$ P(Y_{kl} = 0) = P(N_{kl} = 0);\quad P(Y_{kl} = 1) = P(N_{kl} \ge 1), $$
(2)

which will be called the assumption of continuity, in accordance with premises (1) and (2). The estimation of all probabilities of occurrence \(E(Y_{kl} ) = \pi_{kl}\) under the null model uses the two expressions in (2):

$$ 1 - \pi_{kl} = P(N_{kl} = 0);\quad \pi_{kl} = P(N_{kl} \ge 1) $$

Then,

$$ E(Y_{kl} ) = \pi_{kl} = P(N_{kl} \ge 1) = 1 - f_{kl} (0), $$
(3)

where \(f_{kl} ( \cdot )\) is the p.m.f. of \(N_{kl}\) with mean \(\mu_{kl}\). The function \(f_{kl} (0)\) in (3) may depend on a parameter vector \({{\varvec{\upzeta}}}_{(k,l)} = (\zeta_{1(k,l)} ,\zeta_{2(k,l)} , \ldots ,\zeta_{r(k,l)} )\) for some r. Without loss of generality, we set \(\zeta_{1(k,l)} = \mu_{kl}\) as one of the parameters on which \(f_{kl} (0)\) depends. Thus, we express \(f_{kl} (0)\) as

$$ f_{kl} (N_{kl} = 0;\; \mu_{kl} ,{{\varvec{\upzeta}}}^{\prime}_{(k,l)} ) $$

where \({{\varvec{\upzeta}}}^{\prime}_{(k,l)} = (\zeta_{2(k,l)} , \ldots ,\zeta_{r(k,l)} )\). Therefore:

$$ \pi_{kl} = 1 - f_{kl} (N_{kl} = 0;\mu_{kl} ,{{\varvec{\upzeta}}}^{\prime}_{(k,l)} ). $$
(4)

2.2 Estimation of the probability of occurrence of species on discrete locations using qualitative predictors

Equations (1) and (4) lead to a model for \(E(Y_{kl} ) = \pi_{kl}\) of the form:

$$ \pi_{kl} = 1 - f_{kl} (N_{kl} = 0;\mu_{kl} (\gamma ,\alpha_{k} ,\beta_{l} ),{{\varvec{\upzeta}}}^{\prime}_{(k,l)} ). $$
(5)

Distinct probability distributions for \(N_{kl}\) will provide different ways to estimate \(\pi_{kl}\) under the null model. If the right-hand expression in (5) can be written as:

$$ g(\pi_{kl} ) = {\mathbf{x}}_{kl}^{{\text{T}}} {\boldsymbol{\theta ,}} $$
(6)

where g is a monotonic, differentiable function, \({\mathbf{x}}_{kl}^{{\text{T}}}\) is a p × 1 vector of dummy variables and \({{\varvec{\uptheta}}}\) is a p × 1 vector of parameters such that \({{\varvec{\uptheta}}}^{\prime} = [\theta_{1} , \cdots ,\theta_{p} ]\), then we will be working with a GLM for \(\pi_{kl}\) with link function g (Dobson and Barnett 2018). Expression (5) shows that the link function (6) maps the mean intensity \({\mu}_{\text{kl}}\) onto the [0–1] Bernoulli parameter space for \(\pi_{kl}\).

Letting \(\alpha_{1} = \beta_{1} = 0\), \({{\varvec{\uptheta}}}\) can be parameterized with \(p = t + s - 1\) free parameters: \({{\varvec{\uptheta}}}^{{\text{T}}} = \left[ {\theta_{1} ,\theta_{2} , \ldots ,\theta_{t} ,\theta_{t + 1} , \ldots ,\theta_{t + s - 1} } \right]\), where \(\theta_{1} = \gamma ,\theta_{2} = \alpha_{2} , \ldots ,\theta_{t} = \alpha_{t} ,\theta_{t + 1} = \beta_{2} , \ldots ,\theta_{t + s - 1} = \beta_{s}\). Maximum likelihood estimation of the parameter vector \({{\varvec{\uptheta}}}\) may involve extra parameters depending on the distribution of the quasi-abundance model connected to \(\pi_{kl}\).

3 Distributions for the quasi-abundance and model selection

3.1 Link functions induced by Poisson, negative binomial and binomial distributions (Navarro and Manly 2009)

We briefly describe the quasi-abundance distributions already mentioned in Navarro and Manly (2009) and the resulting GLMs for \(\pi_{kl} = E(Y_{kl} )\).

  1. (1)

    \(N_{kl} \sim {\text{Poisson}}(\mu_{kl} )\). The GLM for \(\pi_{kl}\), is:

    $$ \pi_{kl} = \pi (\gamma ,\alpha_{k} ,\beta_{l} ) = 1 - \exp ( - \exp (\gamma + \alpha_{k} + \beta_{l} )) = 1 - \exp ( - \mu_{kl} ) $$
    (7)

    The link function is the known complementary log-log link:

    $$ \eta_{kl} = g(\pi_{kl} ) = \log \left[ { - \log (1 - \pi_{kl} )} \right] $$
    (8)
  2. (2)

    \(N_{kl} \sim {\text{negative binomial}}\,(\mu_{kl} ,\upphi )\). Here \({{\varvec{\upzeta}}}^{\prime}_{(k,l)} = (\upphi )\), for all k, l (i.e., \(\upphi\) is equal for all the t × s species-location combinations). The GLM for \(\pi_{kl}\)is:

    $$ \pi_{kl} = \pi (\gamma ,\alpha_{k} ,\beta_{l} ,\upphi ) = 1 - \left( {\frac{1}{{\upphi \exp (\gamma + \alpha_{k} + \beta_{l} ) + 1}}} \right)^{1/\upphi } = 1 - \left( {\frac{1}{{\upphi \mu_{kl} + 1}}} \right)^{1/\upphi } , $$
    (9)

    with link function:

    $$ \eta_{kl} = g(\pi_{kl} ;\varphi ) = \log \left( {\left( {{1 \mathord{\left/ {\vphantom {1 {(1 - \pi_{kl} )}}} \right. \kern-\nulldelimiterspace} {(1 - \pi_{kl} )}}} \right)^{\varphi } - 1} \right) - \log \varphi $$
    (10)

    McCullagh and Nelder (1989) refer this as a one-parameter link family for binomial data (Aranda-Ordaz 1981) containing both the (canonical) logistic link (\(\varphi = 1\), when \(N_{kl}\) follows a geometric distribution) and the complementary log–log link (\(\varphi \to 0\)) as special cases).

  3. (3)

    \(N_{kl} \sim {\text{binomial}}\; (\mu_{kl} ,m)\). Here \({{\varvec{\upzeta}}}^{\prime}_{(k,l)} = (m)\), for all k, l (i.e., m is shared by all the t × s combinations). The GLM is:

    $$ \pi_{kl} = \pi (\gamma ,\alpha_{k} ,\beta_{l} ,m) = 1 - \left( {1 - \frac{{\exp (\gamma + \alpha_{k} + \beta_{l} )}}{m}} \right)^{m} = 1 - \left( {1 - \frac{{\mu_{kl} }}{m}} \right)^{m} $$
    (11)

    with link function:

    $$ \eta_{kl} = g(\pi_{kl} ;m) = \log m + \log \left( {1 - (1 - \pi_{kl} )^{1/m} } \right) $$
    (12)

For the case m = 1, (11) turns into a log-Bernoulli model. Letting \(m \to \infty\) with \(\pi_{kl}\) fixed in (12), the complementary log–log link function (8) is obtained again. In the next section, the quasi-abundance distributions are extended to include Faddy distributions, which yields a more general family of link functions that includes the canonical link functions for negative binomial and the Poison distributions.

3.2 Extending the distributions for the quasi-abundance

Faddy (1997) considered “pure birth” stochastic processes X(t) of events occurring in the time interval (t, t + δt) such that

$$ P[\left. {X(t + \delta t) = n + 1\;} \right|\;X(t) = n] = \lambda_{n} \delta t + o(\delta t). $$
(13)

\(\lambda_{n}\), the “rate” of the process in (13), is allowed to depend on the current number of events. If \(\lambda_{n} = \lambda\) is constant, the process X(t) is Poisson distributed with mean \(\lambda t\). When \(\lambda_{n}\) is not constant, X(t) does not have a Poisson distribution. In fact, an increasing sequence \(\lambda_{0} < \lambda_{1} < \lambda_{2} < \ldots\) induces an over-dispersed distribution for X(t) relative to the Poisson distribution (\({\text{Var[}}X(t)] > {\text{E[}}X(t)]\)). Analogously, a decreasing sequence \(\lambda_{0} > \lambda_{1} > \lambda_{2} > \ldots\) induces an under-dispersed distribution for X(t) relative to the Poisson distribution (\({\text{Var[}}X(t)] < {\text{E[}}X(t)]\)). The probabilities \(p_{n} (t) = {\text{P}}[X(t) = n]\) can be found numerically, by solving Kolmogorov forward differential equations (Faddy 1997). For any of discrete probability distribution on {0, 1,2, …}, namely \(\{ \omega_{0} ,\omega_{1} ,\omega_{2} ,...\}\), it is always possible to find a sequence \(\lambda_{0} ,\lambda_{1} ,\lambda_{2} ,...\) such that \(p_{n} (t) = \omega_{n}\), and t is some fixed time. The procedure for obtaining that sequence is recursive, and it starts with:

$$ \lambda_{0} = - \log (\omega_{0} )/t. $$

As time is arbitrary, t can be taken to 1. Thus: \(\lambda_{0} = - \log (\omega_{0} )\). We are interested here in the probability distribution on {0, 1,2, …} for \(N_{kl}\). Assume k and l in \(N_{kl}\) as fixed. As \(p_{0} (1) = \omega_{0} = P(N_{kl} = 0)\), Faddy´s procedure indicates that

$$ \lambda_{0} = - \log [P(N_{kl} = 0)]. $$
(14)

Faddy (1997) suggested

$$ \lambda_{n} = a(b + n)^{c} ,\;\;n = 0,\,1,\,2,...,\quad {\text{with}} \; a > 0,\quad b > 0\quad {\text{and}} \; \quad c \le 1 $$
(15)

in order to produce general concave sequences and to cover over-dispersed probability distributions “between” Poisson and Negative Binomial, and under-dispersed distributions relative to Poisson. When c = 0, the \(\lambda_{n}\)’s are constant, giving a Poisson distribution for X(t); values of c > 0 generate increasing sequences of \(\lambda_{n}\)’s, producing over-dispersed distributions for X(t) relative to the Poisson distribution. Finally, c < 0, the result is a decreasing sequence and an under-dispersed distribution. Letting n = 0 in (15), we obtain: \(\lambda_{0} = ab^{c}\). Then, using (14):

$$ P(N_{kl} = 0) = \exp ( - ab^{c} ). $$
(16)

According to Eq. 2, Sect. 2.1, the probability structures of \(N_{kl}\) with mean \(\mu_{kl}\) and \(Y_{kl}\) with parameter \(\pi_{kl}\) can be linked using expression (16):

$$ 1 - \pi_{kl} = P(Y_{kl} = 0) = P(N_{kl} = 0) = \exp ( - ab^{c} ), $$

or, equivalently:

$$ E(Y_{kl} ) = \pi_{kl} = 1 - \exp ( - ab^{c} ) $$
(17)

The connection between \(\pi_{kl} = E(Y_{kl} )\) and \(\mu_{kl}\) leads us to ask about the possibility of expressing \(\pi_{kl}\) in terms the mean of any Faddy distribution, say μ, using Eq. 17. Closed expressions for μ in terms of a, b and c are not available in general, but Faddy (1997, Eq. 3.3) proposed a “diffusion approximation to the mean μ”. This approximation is valid as long as the rate of the process is large. Although the diffusion approximation clearly does not apply in this case, we have chosen to use it with the only purpose of establishing a wider collection of generalized linear models with valid link functions (as indicated by Eq. 6). Thus, applying Eq. 3.3 in Faddy (1997) for \(\mu_{kl}\), we set:

$$ \mu_{kl} = E\left( {N_{kl} } \right) \approx b\left\{ {\left[ {1 + \frac{a(1 - c)}{{b^{1 - c} }}} \right]^{{\frac{1}{1 - c}}} - 1} \right\} $$
(18)

Expression (18) is an equality for the cases c = 0 and c → 1. These “Faddy-inspired” link functions express the dependence of the mean quasi-abundance on the species and site effects, \(\alpha_{k}\) and \(\beta_{l}\), namely \(E(N_{kl} ) = \mu_{kl} = \exp (\gamma + \alpha_{k} + \beta_{l} )\), as a function of the usual predictor \(g(\pi_{kl} ) = {\mathbf{x}}_{kl}^{{\text{T}}} {{\varvec{\uptheta}}}\), as in (6) Sect. 2.2, with a (Faddy and Smith 2011, Eq. 5) given by:

$$ a = \left\{ {\begin{array}{*{20}c} {\frac{{(\mu_{kl} + b)^{1 - c} - b^{1 - c} }}{1 - c},} & {c < 1} \\ {\log \left( {1 + \frac{{\mu_{kl} }}{b}} \right),} & {c = 1} \\ \end{array} } \right. $$
(19)

Substituting (19) in (17), we obtain:

$$ \pi_{kl} = \left\{ {\begin{array}{ll} {1 - \exp \left\{ {\frac{b}{1 - c}\left[ {1 - \left( {1 + \frac{{\mu_{kl} }}{b}} \right)^{1 - c} } \right]} \right\},} & {c < 1} \\ {1 - \left( {\frac{b}{{b + \mu_{kl} }}} \right)^{b} ,} & {c = 1} \\ \end{array} } \right. $$
(20)

This expression for \(\pi_{kl}\) includes two generalized linear models described in Sect. 3.1, when b and c are set to particular values: for c = 0, b > 0, model (7) is obtained, and the values c = 1 and b = 1/ϕ in (20) lead to model (9). Moreover, the binomial distribution for the quasi-abundance with probability of success \(p = \frac{{\mu_{kl} }}{m}\) (case 3, Sect. 3.1) can be included as a member of the Faddy distributions in this modelling process, but parameterization (15) is not useful. One suitable parameterization for the rate of the process \(\lambda_{n}\) inducing a generalized linear model under the assumption of a binomial quasi-abundance is (Faddy and Smith 1997):

$$ \lambda_{n} = - (m - n)\log \left( {1 - \frac{{\mu_{kl} }}{m}} \right),\;\;0 \le n \le m $$

The proof is straightforward. From (14) and letting n = 0 in the last expression above:

$$ \lambda_{0} = - \log (1 - \pi_{kl} ) = - m\log \left( {1 - \frac{{\mu_{kl} }}{m}} \right) $$
$$ {\text{Solving for}}\; \pi_{kl} :\pi_{kl} = 1 - \left( {1 - \frac{{\mu_{kl} }}{m}} \right)^{m} . $$
(21)

Expression (21) is the same as Eq. 11, in Sect. 3.1. Other expressions for the rate parameter \({\lambda }_{n}\) having the binomial distribution as a particular case are described in Faddy and Smith (2011, 2012), but they will not be considered here.

3.3 Link functions induced by Faddy distributions

The GLMs with link function \(\eta_{kl} = g(\pi_{kl} )\) can be expressed as:

$$ \eta_{kl} = \sum\limits_{j = 1}^{t + s - 1} {x_{(k - 1) \cdot s + l,j} \theta_{j} }{,}\;k = 1, \ldots ,t; \, l = 1, \ldots ,s, $$
(22)

where X = (xij)(ts)×(t+s−1) is the design matrix assuming the constraints \(\hat{\alpha }_{1} = \hat{\beta }_{1} = 0\). The different link functions for every Faddy distribution based on the models given by Eq. 20 (for fixed values of b and c, say, b0 and c0, respectively) and 11 (for a fixed value m0), are shown below.

  1. 1.

    c0 = 1. The negative binomial family with fixed dispersion parameter \(b_{0}\), defines the link function:

    $${ \eta }_{\text{kl}} \, = \text{ log} \left({\text{b}}_{0}{\text{(}{1}/\text{(1 }-{\mu}_{\text{kl}}\text{))}}^{\frac{1}{{\text{b}}_{0}}}-{1}\right)\text{, }\quad{{k}}{=1, \ldots , }{{t}}{; \,}\quad{{l}}{=1,\ldots, }{\text{s.}}$$
    (23)
  2. 2.

    0 < c0 < 1 (Models where the distribution of the quasi-abundance lies between Poisson and Negative Binomial). The link is:

    \({\eta}_{\text{kl}}= \text{ log} {\text{b}}_{0}+ \text{log} \left[{\left({1}-\frac{{1}-{\text{c}}_{0}}{{\text{b}}_{0}}\text{log(1}-{ \pi }_{\text{kl}}\text{)}\right)}^{\frac{1}{{1}-{\text{c}}_{0}}}-{1} \right], \, {{k}}{=1,\ldots, }{{t}}\text{; }{{l}}{=1,\ldots, }{\text{s}}.\)

    Letting \(d_{0} = 1 - c_{0}\), then 0 < d0 < 1.

    $${\eta}_{\text{kl}}\text{ = log}{\text{b}}_{0}+ \text{log} \left[{\left({1}-\frac{{\text{d}}_{0}}{{\text{b}}_{0}}\text{log(1}-{\pi}_{\text{kl}}\text{)}\right)}^{\frac{1}{{\text{d}}_{0}}}-{1}\right] , {{k}}{=1,\ldots, }{{t}}\text{; }{{l}}{=1,\ldots, }{\text{s.}}$$
    (24)
  3. 3.

    c0 = 0. This case produces the complementary log-log link (Eq. 11, Sect. 3.1).

  4. 4.

    c0 < 0 (Models where the distribution of the quasi-abundance lies between Poisson and Binomial)

    The link function is the same as in case 2, but now the parameter \(d_{0} = 1 - c_{0}\) implies d0 > 1.

  5. 5.

    The quasi-abundance follows a binomial distribution with parameter m, fixed in m0. The link function is given by Eq. 11, Sect. 3.1, with m = m0.

3.4 Null model selection by profile deviance

The wide array of distributional families for the quasi-abundance will produce different estimates for the probability of occurrence of species k on island l. Given an observed presence-absence matrix, the selection of generalized linear (null) models for the estimation of \(\pi_{kl}\) is of interest. The deviance is typically used as a criterion for goodness-of-fit in generalized linear model selection. It is defined as the maximized log-likelihood ratio criterion for testing the fitted model against the saturated alternative, which assumes a separate parameter for each of the N expectations, which in our case consist of N = t × s observations (McCullagh and Nelder 1989); the deviance for modelling observed Bernoulli response variables ykl, is given by:

$$ D\left( {\widehat{\varvec{\Pi} },y} \right) = 2\left[ {l\left( {\widehat{\varvec{\Pi} }_{{\max }} ;y} \right) - l\left( {\widehat{\varvec{\Pi} };y} \right)} \right] = - 2\sum\limits_{{k = 1}}^{t} {\sum\limits_{{l = 1}}^{s} {\left\{ {y_{{kl}} \log \left( {\frac{{\hat{\pi }_{{kl}} }}{{1 - \hat{\pi }_{{kl}} }}} \right) + \log (1 - \hat{\pi }_{{kl}} )} \right\}} } .$$
(25)

here \({\widehat{\varvec{\Pi}}} = {[}\hat{\pi }_{{{11}}} , \ldots ,\hat{\pi }_{ts} {]}\) is the vector of fitted probabilities and \({\widehat{\varvec{\Pi}}}_{\max }\) is the vector of maximum likelihood estimates corresponding to the maximal model, which turns out to be simply the observed vector \({\mathbf{y}} = (y_{11} , \cdots ,y_{ts} )^{{\text{T}}}\). McCullagh and Nelder (1989) have emphasized that the deviance can be used as a measure of goodness-of-fit only when it is approximately independent of the estimated parameters, \({\widehat{\boldsymbol{\theta }}}\). This is not the case when each observation \(y_{kl}\) is Bernoulli-distributed: the fitted probabilities \(\widehat{\varvec{\Pi}}\) are not independent from the deviance. Therefore, the usual chi-square tests of the deviance for goodness-of-fit are not supported, as the distribution of the deviance given the estimated parameters reduces to a conditionally degenerate distribution, in the sense that the deviance \(D\) is a function of \(\widehat{{\boldsymbol{\theta }}}\). Even for a large number of cells (t × s), D is not independent of \({\widehat{\boldsymbol{\theta }}}\). Applications of the deviance for binary data are constrained “to those cases where the chi-square approximations work reasonably well”: (1) when the deviance is used in the comparison of two nested models and (2) in profile likelihood (or profile deviance) (McCullagh and Nelder 1989). Profile likelihood is one of the tools for goodness-of-link analyses where several link functions indexed by one or more parameters are tested against a pre-established link, usually the canonical link. In the case of modelling binary random variables, it is known that the logit link \(\eta_{kl} = \log \left\{ {\pi_{kl} /(1 - \pi_{kl} )} \right\}\) is the canonical link function connecting the mean response and the linear predictor. However, “canonical links do not always provide the best fit available for a given data set” (Czado and Munk 2000). Thus, non-canonical links are sometimes preferable (Huang 2020).

For simplicity, let us suppose that the link families we establish as competitor models are indexed by only one extra parameter, say \(\varsigma\). The binomial and negative binomial distributions are examples of one-parameter links of that sort. The usual application of profile likelihood for goodness-of-link analysis is that a particular GLM is fitted and embedded in a wider class of models including one extra parameter \(\varsigma\), with \(\varsigma_{0}\) the value corresponding to the link function for the original model. The deviance (likelihood ratio) test is applied to find the estimate \(\hat{\varsigma }\) giving the best fit in the complete family of models, and to compare \(\hat{\varsigma }\) with that at \(\varsigma_{0}\). This process produces an estimate of the profile deviance for \(\varsigma\). A model reaching the minimum deviance in the profile will be classified as a “good” candidate for a null model. Whenever possible, the model attaining the overall minimum deviance will be used as representative of the null models for the occurrence for each species-location combination. Calculation of the profile deviance for each family tested in this study follows the procedure described above, and it can be extended to more than one extra parameter (e.g., in Faddy distributions there are two extra parameters, b0 and c0).

3.5 Implementing algorithms for maximum likelihood estimation in the profile deviance

One advantage of seeing several families of discrete distributions for the quasi-abundance as Faddy distributions is that most of the inverse link functions for the calculations are log-concave [Tang and Ye (2020); see also “Appendix”], ensuring that the deviance is concave in the linear parameters and thus standard optimization methods should provide unique solutions.

3.5.1 Poisson, negative binomial, overdispersed (b > 0, 0 < c < 1) and underdispersed (b > 0, c < 0) Faddy-distributed models

At first glance, the only task needed to determine the minimum deviance among competitor models is to apply unconstrained optimization techniques for an assorted set of b and c values such that b > 0 and c < 1. However, the basic assumptions of the NM null model (Sect. 2.1) impose some restrictions in the fitting process. One constraint is that the expected value of the unobserved quasi-abundance \({\mu}_{\text{kl}}\) is an upper bound of the probability \({\pi}_{\text{kl}}\) [i.e. \({\pi}_{\text{kl}}\left({\mu}_{\text{kl}}\right){\le}{\mu}_{\text{kl}}\); see “Appendix”]. Additionally, the inverse link functions (20) are only log-concave in the region R = {b > 0, c ≤ 1, b ≥ ‒c}, a helpful property that guarantees the attainment of a global minimum of the deviance function (see “Appendix”). Therefore, our search for a suitable null model was carried out in that region, R. We also computed the deviance values inside the “non-permissible” region R1 = {b > 0, c < 0, b < ‒c}; however, as a way to detect possible convergence issues in the maximum likelihood estimation inside R1, the same set of initial values were used in the analysis of the example data described in Sect. 4.1. The profile deviance procedure was run in R (R Core Team 2021), specifically with the glm function for the Poisson case, and the maxBFGS function from the maxLik package (Henningsen and Toomet 2011) for the Aranda-Ordaz family of links, corresponding to the negative binomial distribution (Eq. 10). Here BFGS stands for the variable metric optimization method by Broyden-Fletcher-Goldfarb-Shanno (Fletcher 2000). For the remaining Faddy models, the quasi-Newton method L-BFGS-B without constraints as implemented in R (Byrd et al. 1995) was used.

3.5.2 Binomial models

Maximum likelihood estimation corresponding to the models assuming a binomial distribution with parameter m (Eq. 19) may suffer of a well-known issue in log-binomial models whenever the log-likelihood is maximized (i.e., the deviance is minimized) on the boundary of the parameter space. Very often, the iterative process of model fitting fails to locate the maximum as the algorithm used may inadvertently iterate to a prohibited place. One must guarantee that the fitted probabilities always remain inside the interval [0,1]. For the case m = 1, one solution is given by an EM algorithm (Marschner 2014), implemented in R in the logbin package (Donoghoe and Marschner 2018). However, for m > 1, Eq. 19 cannot be handled by the log-binomial model, and one must rely on alternative algorithms for ML estimation. Williamson et al. (2013) have suggested to implement “brute force” procedures for the optimization of the log-likelihood when all else fails. Our brute force method tries to estimate the minimum deviance using constraints. To searching a feasible solution \(\widehat{\theta}\), we solved the following non-linear optimization problem:

$$ \begin{gathered} \mathop {\min }\limits_{\theta } D\left(\varvec{\Pi} (\theta ),y\right) \hfill \\ {\text{s}}.{\text{t}}. \;0 \le \hat{\pi }_{kl} \le 1{\text{for\,all\,k,l}}\,\left( {2 \cdot t \cdot s \,{\text{constraints}}} \right) \hfill \\ \end{gathered} $$
(26)

To search the minimum deviance for different values of the binomial parameter m0, subject to the set of constraints indicated in Eq. 26, we applied the Augmented Lagrangian Adaptive Barrier Minimization Algorithm implemented in R-package alabama (Varadhan 2015).

4 Generation of “null matrices”

4.1 Algorithm

In this section we briefly discuss the randomization protocol implemented in Navarro and Manly (2009). To obtain the fitted probabilities \(\hat{\pi }_{kl}\), the model with the overall minimum deviance is chosen among competitor models with different links. These probabilities can then be used to produce random binary matrices. Let \({\mathbf{Y}}_{t \times s} = (Y_{kl} )\), where each \(Y_{kl}\) is distributed as \(F({{\varvec{\uptheta}}},\varsigma )\), F = Bernoulli (\(\pi_{kl}\)), depending on \({{\varvec{\uptheta}}}\), and assume that a total of NRM random matrices will be generated. The allocation of a presence or an absence to each cell (k,l) for each random matrix is given by:

$$ W_{kl} = \left\{ {\begin{array}{*{20}c} 1 & {{\text{if }}U_{kl} \le \hat{\pi }_{kl} } \\ 0 & {{\text{if }}U_{kl} > \hat{\pi }_{kl} } \\ \end{array} } \right. $$
(27)

where \(U{}_{kl}\) is a [0,1]-uniformly distributed pseudo-random number generated for cell (k,l) and \(\hat{\pi }_{kl}\) is the fitted probability for that cell. The values \(w_{kl}\) of the random variables \(W_{kl}\) are interpreted as realizations of i.i.d. Bernoulli random variables

$$ W_{kl} \mathop {{\sim }}\limits^{i.i.d.} {\text{Bernoulli}}\left( {\left. {\hat{\pi }_{kl} } \right|({\mathbf{Y}},{\mathbf{X}})} \right), $$
(28)

and their generation is described as a parametric bootstrap of the estimated distributions \(\hat{F}\). Because of the assumptions in the basic GLM, each row total (or each column total) is a random variable. Thus, the protocol allows row and column totals to vary from one simulation to another compared to the corresponding row and column totals of the observed presence-absence matrix. This protocol represents the null model algorithm, from which the pattern in the observed matrix needs to be quantified with an appropriate metric and compared to the distribution of the metric generated by simulating a large number of null matrices and calculating the metric for each (Gotelli and Ulrich 2012).

Metrics for testing non-random patterns may have paradoxical features, combining “checkerboardness” and species aggregation (Stone and Roberts 1992), or how the species share sites (Roberts and Stone 1990). Consequently, ecologists suggest that different indices summarizing potential patterns of presence/absence matrices be reported (Peres-Neto et al. 2001), in terms of avoidance (like the C-score), aggregation (like the T-score) and number of locations shared (like the S-score). The C-score is the mean number of checkerboard units per species-pair of the community (Stone and Roberts 1990) and has been often used as a statistic for testing actual distributions for non-randomness. In particular, the C-score is motivated as the way a pair of species might colonize a pair of islands. A high C-score is interpreted as a high average degree of mutual exclusivity between species. The T-score measures the degree of togetherness, enumerating the mean number of sites where the species occur together. A high T-score indicates that, on average, species tend to co-occur between sites. Finally, the S-score measures the average number of shared sites for all species pairs. A high S-score means that most species share most sites. The observed values for these three metrics are contrasted to the null distribution of random values, by means of a one-tailed randomization test.

4.2 Example: lizards on Islands in the Gulf of California

We illustrate the model selection process by considering a data set analyzed in Navarro and Manly (2009) and Manly (1995) about the presence and absence of 20 lizard species on 25 islands in the Gulf of California, taken from Case (1983). Species identities are displayed in Manly (1995). In Navarro and Manly (2009, Fig. 2) the minimum deviance was attained by a Poisson model for Nkl. For null model selection, in the present study we included Faddy distributions corresponding to GLMs with link functions defined by the diffusion approximation to the mean, for a selected set of values of b > 0, 0 < c0 < 1 and c0 < 0 in the profile deviance but separating the permissible deviance values (those computed inside region R, as described in Sect. 3.2) from those in which a global minimum is not assured. A global profile deviance plot was generated for these models, using bilinear interpolation as implemented in the R-package fields (Nychka et al. 2017). We also included a separate profile deviance analysis for Faddy distributions inducing a binomial quasi-abundance, using a small set of binomial parameter values, m (Navarro and Manly 2009). The best null model chosen was the one attaining the overall minimum deviance. Once this best model was selected, NRM = 9999 matrices were generated using the parametric bootstrap given by (27) and (28). t × s pseudo-random numbers uniformly distributed on [0,1] were generated using the R-function runif (R Core Team 2021), to produce either a presence or an absence. The simulated matrices were generated for the following subsets:

  1. 0.

    Unconstrained simulations.

  2. 1.

    Simulated matrices without degenerate distributions (a degenerate distribution means a presence-absence matrix with rows and/or columns made of just zeros).

  3. 2.

    Simulated matrices with the same grand total as the observed presence-absence matrix.

These scenarios represent different universes of simulated matrices in the testing procedure for a non-random pattern, and their comparison are useful in highlighting any bias that might occur in null model tests. Thus, subset 1 is motivated by the claim that degenerate arrangements are problematic and disadvantageous for any null model test since degenerate simulated matrices will bias the testing procedures towards the rejection of the null hypothesis (Wilson 1995; Gotelli 2000). A similar assertion justifies the interest in Subset 2 (Gotelli 2000).

Stone and Roberts’ C-, T- and S- scores were computed for the observed matrix, and the mean and standard deviation for the C-, T- and S-scores on each randomly generated matrix meeting conditions 0.-2. were obtained. The observed C-, T-, and S-scores were contrasted to the null distribution of random values, by means of a one-tailed randomization test. The probability of rejection was computed as

$$ p = \frac{{{\text{number of }}\left\{ {Score_{j} \, \ge Score_{{{\text{obs}}}} { , }\;j = 1, \ldots ,NRM} \right\} + 1}}{NRM + 1}, $$

where \(Score_{{{\text{obs}}}}\) and \(Score_{j}\) are the observed and simulated scores of the j-th randomized matrix, respectively. Numerical procedures were implemented in R for all randomizations and tests, using algorithms described in Navarro (2003).

5 Results

Table 1 displays the profile deviance of the null model selection for Faddy distributions corresponding to different values of the parameters b, c and d = 1 − c., Deviances for overdispersed Faddy distributions behaved similarly to those for negative binomial distributions, all of them being larger than the deviance for the Poisson distribution. In contrast, deviances for underdispersed Faddy distributions in the permissible region R were consistently smaller than deviances for negative binomial, overdispersed Faddy and Poisson distributions. It is worth noticing that deviances outside the permissible region, described as a “quasi-under-dispersed region”, seem to be even smaller, but the implied link functions cannot occur under our null model. The profile deviance shown in Tables 1 and 2 (depicted by Fig. 1) suggests that the minimum deviance is possibly attained on the lower boundary of the valid region R, b = − c, c < 0, as b goes to infinity. Thus, we fitted a new “Faddy model” in the profile deviance, given by:

$$ \begin{aligned}\pi _{{kl}} \left( {\mu _{{kl}} } \right) & = \mathop {\lim }\limits_{{b \to \infty }} \left( {1 - \exp \left( {1 - \left( {b/\left( {1 + b} \right)} \right)\left( {1 + \mu _{{kl}} /b} \right)^{{1 + b}} } \right)} \right) \\ & = 1 - \exp (1 - \exp (\mu _{{kl}} )) \end{aligned}$$
(29)
Fig. 1
figure 1

Null model selection of the lizard data. The plot shows the profile deviance for several members of the Faddy distribution family with parameters b and c, when the generalized linear models for the quasi-abundance have link functions defined via the diffusion approximation to the mean, and deviances are taken from Table 1. The level curves were estimated by bilinear interpolation. Two more members of the family, Poisson and Negative Binomial, were added for completeness, with the Geometric distribution as a particular case of the latter, corresponding to the logit link. Log-concavity of the deviance function does not hold for Region 1 (c < 0, b <  − c), while log-concavity is present for Region 2 (c < 0, b ≥  − c) and Region 3 (b > 0, 0 ≤ c ≤ 1) thus, the minimum deviance was only searched inside these two regions (see “Appendix”). Observing the deviance values in Tables 1 and 2, it is proposed that the overall minimum deviance is attained by an underdispersed Faddy distribution (23) with \({\text{b}}\text{ } = -{\text{c}} \to { \infty}\)

Table 1 Deviances for GLMs of the probability of occurrence of 20 species of lizards on 25 islands in the Gulf of California (Case 1983; Manly 1995)
Table 2 Deviances for models described in Table 1 for the case \(b\text{ = }-c\)

The non-canonical link function associated to (29) is \({\eta}_{\text{kl}}= \log(\log\left(1-\log\left(1-{\pi}_{\text{kl}}\right)\right))\), which can be called a “complementary triple-log link”. This limiting Faddy model turned out to be the optimal (Tables 1 and 2) among all models of the form (20) and among those models where the quasi-abundance follows a binomial distribution (Table 3). Therefore, for the lizard data, model (29) is a good representative null model for random matrix generation. The null model analysis continued with the ML estimation of the parameters for that model (not reported), the estimated probabilities for each cell, \(\hat{\pi }_{kl}\), and the deviance residuals (not reported) for each binary observation under this model. Fitted cell probabilities are shown in Table 4, as well as the observed and estimated row and column totals. A non-linear relation between the estimated cell probabilities and species and island effects is present for variables like the row and column totals, Rk and Cl. In order to illustrate this, we can explore the relationship between the probability of occurrence of “species” 16 (Sauromalus) for the lizard dataset, and the number of species on the islands, Cl. It was noticed above that the chosen null model distinguishes the variation in richness and composition between sites for the estimation of probabilities of occurrence. Let us consider the set of islands where Sauromalus is present. In Fig. 2 it is evident the non-linear behaviour of \(\hat{\pi }_{16,l}\) for increasing values of Cl, with probabilities as large as 1, the maximum value, for species-rich islands [San Marcos (14), Coronados (15) and Carmen (16), 10 species each; San José (22) and Espíritu Santo (24), 11 species each; and Tiburón (1), 13 species]. Furthermore, islands 4, 8 and 9 (Pond, San Lorenzo Norte and San Lorenzo Sur) have got the same composition of (widespread) species: Phyllodactylus, Cnemidophorus tigris, Uta and Sauromalus) On the other hand, the widespread species C. tigris didn't occur on Mejia Island (island number 2). Therefore, a higher probability of occurrence was given to species Sauromalus on islands having widespread species, namely, on islands 4, 8 and 9. This example illustrates that under the assumption of independence, the row and column effects govern the fitted probabilities in a way not followed by other approaches that make use of the observed row and column totals. Thus, if one chooses any of the eight algorithms SIM1–SIM8 in Gotelli (2000) and considers species 16 (Sauromalus) and islands 2, 4, 8 and 9), the probability of the first “hit” on the four cells (16,2), (16,4), (16,8) and (16,9) are all the same (for example, for SIM8 the probability is 0.003) because the species total is 18 (the number of times Sauromalus is present) and the island totals are all the same (4). The probabilities are governed by both site and species marginal totals only. This apparently trivial finding makes us to connect our approach of using the observed binary table in the null model and the pioneering approaches by Gilpin and Diamond (1982) and Ryti and Gilpin (1987). Under the Gilpin and Diamond (1982) strategy, fitting the expected row and column sums to the observed row and column sums is computationally less problematic since the number of observations is equal to the number of parameters.

Fig. 2
figure 2

Relation between the estimated probability of occurrence of Sauromalus (vertical axis) and richness, Cl, on islands (numbers in parentheses) where Sauromalus is present (horizontal axis) based on the presence-absence matrix of 20 lizard species on 25 islands in the Gulf of California (Manly 1995). (Null-model) probabilities were estimated using Eq. 23, b =  − c  → ∞. The four islands encircled have the same richness (four each); in addition, composition of species for islands 4 (Pond), 8 (San Lorenzo Norte) and 9 (San Lorenzo Sur) is the same: Phyllodactylus, Cnemidophorus tigris, Uta and Sauromalus). These are ubiquitous species, however C. tigris does not occur on Mejia island (number 2). This explains the differences in the probability of occurrence under the assumption of additivity of species and island effects. A similar explanation can be given for the difference in probability of occurrence of Sauromalus on islands (18) and (19)

Table 3 Profile deviance for Faddy models when the quasi-abundance follows a binomial distribution with parameter n (Eq. 21)
Table 4 Fitted probabilities of occurrence of 20 species of lizards on 25 islands in the Gulf of California

In Navarro and Manly (2009) results for the C-score were described using a different null model (Poisson), and it was found that the observed C-score was significantly larger than those from the simulated matrices. Here we updated the results of the randomization test for the C-score and added new ones for the S- and T-scores, using the limiting Faddy model (29) as the null model. All this information is shown in Table 5. It is confirmed that neither the species seem to occur together nor the species to share the same islands, on the basis of the non-significant S- and T- scores. The estimated p-values for subsets 1. and 2. (Sect. 4.2) varied in comparison to the whole set of 9999 simulations but the conclusions were consistent in terms of a significance level of \(\alpha = 0.05\).

Table 5 Summary of the observed and 9999 simulated values of three statistics for testing non-random patterns of 20 species of lizards on 25 islands in the Gulf of California

6 Discussion

The selection of the randomized distribution to be used as a null hypothesis in the analysis of species co-occurrences is one of the controversial problems in community ecology and biogeography. The material presented here gives a way to evaluate a wide range of null models for the analysis of species occurrences on discrete locations. Testing that observed data is consistent with the null hypothesis of a random pattern is not an easy task because several random or randomized distributions (null models) may be compared to empirical data. The method developed in Navarro and Manly (2009) and its extension allowing underdispersed and overdispersed quasi-abundance (Faddy) distributions are interpretable in terms of hypothetical probability distributions of non-observable non-negative discrete random variables: the quasi-abundance. This auxiliary variable is crucial for giving some structure to a procedure that seeks to provide suitable null-model estimates of the probability of occurrence, given the simplest information about the species and locations (expressed as qualitative additive effects). The null model selection approach for presence-absence data has two advantages: the straight availability of cell probabilities for simulation of binary variables, and the ability to estimate differently the probability of occurrence of a particular species even for sites with equal number of species (i.e. equal column totals). Much of this behaviour is associated with the maximum likelihood estimation of species and location effects. As an example, using the best null model, in the lizard data set, the widespread species Phyllodactylus (species 2 in Table 4) has some probabilities of occurrence practically equal to 1 for species-rich islands. If it is desired to generate random binary matrices “by reading” the fitted probabilities obtained with any GLM among the competitor null models, then it will be observed that some cells will always receive a presence in each random matrix (or some cells will always receive an absence). This behaviour has been seen in the simulation protocols where the row and column totals of the randomized matrices are maintained, particularly in highly nested matrices. This is an area where some underlying philosophy about the assumptions in the null models is important.

The null models in this study are characterized by two main assumptions. First, occurrences of different cells in the presence-absence matrix are independent. Second, the models include only additive qualitative effects of species and islands as independent variables. In terms of ecological variables, the null models simply impose two conditions for the fitted probabilities: they are governed by both site and species marginal totals only. Although ecologists have been reluctant to use row and column totals as explanatory variables of cell probabilities, possibly it is a good opportunity to reformulate the assumptions and procedures behind all model-fitting approaches for the estimation of cell probabilities in null-model analyses. The null-model selection strategy described in the present study can be combined with probabilistic approaches proposed in the past, e.g. Gilpin and Diamond’s (1982), and nourish the collection of methods for the analysis of species occurrences. We offer here a vast family of models whose links are induced by Faddy distributions in order to search the best null model by the profile deviance, a well-established statistical method of model selection. This strategy guarantees a systematic method that eliminates the subjectivity involved in the choice of competitor models.

The proper selection of an “optimal” null model should not be ignored, as interpretations may change if the model chosen is not supported by statistical arguments. This was apparent in our method: an extra assumption of the null model [the so-called “assumption of continuity” expressed by Eq. (2), Sect. 2.1] imposed restrictions to the set of valid models to be compared (Sect. 3.5.1). In fact, the strategy for optimality is not complete yet using our protocol, as it may be questionable to apply unconstrained and constrained optimization for the same ML estimation problem. Apparently, the fit for a constrained linear model may not be as good as the fit for a similar unconstrained linear model. This is another issue that needs to be explored more in-depth.

Another question to raise is how divergent the profile deviance can be in selecting the “appropriate” null model. To tackle this question, we suggest: (1) specify a fixed GLM not equal to the model chosen by profile deviance (the “optimal model”), (2) simulate data from that “non-optimal” model, and (3) determine how often the profile deviance selects a non-optimal model. An extra topic worth exploring in depth is the choice of test statistics (indices) to summarize co-occurrence patterns in the observed presence-absence data. The selection of the index or test statistic is crucial for the detection of patterns and there is no general agreement on which index should be used. For instance, in this study we used three indices that presumably address Diamond’s assembly rules: the number of species combinations, the number of species forming perfect checkerboards and the C-score. However, “ad-hoc” tests based on the deviance can be used under the combined GLM-parametric bootstrap used here. Some results were analyzed by Navarro (2003) using the overall minimum deviance (OMD). He found that the OMD is positively associated to the C-score. The connotation of this association means that the larger the deviance the more segregated the species. Thus, the OMD apparently summarise patterns of species’ complementarities in a presence-absence matrix. Further investigation is necessary for explaining the association between these indices based on the deviance and the C-score; the obvious task would be to explore the deviance of a particular model and find out how much it is altered by an increase or decrease of the number of co-occurring species.