1 Introduction

The analysis and modeling of count time series, i.e., of quantitative time series having the range \(\mathbb {N}_0=\{0,1,\ldots \}\), has become a popular area in research and applications; see the books by Davis et al. (2016) and Weiß (2018). An important step during model fitting is diagnostic checks with regard to the considered candidate model, to find out, e.g., a possible misspecification of the model order or the dispersion structure. In the present paper, we are concerned with the latter task, i.e., the aim is to detect a misfit of the (conditional) variance of the actual data-generating process (DGP) \((X_t)_{\mathbb {Z}}\). In this context, we speak about overdispersion (underdispersion) if the DGP has more (less) variance than captured by the fitted model. If concentrating on the marginal dispersion of a stationary DGP, the use of an appropriate type of dispersion index is quite common among practitioners. For example, to check for marginal equidispersion, Fisher’s dispersion index might be used, which is defined as the ratio of the (sample) variance to the mean, i.e., as \(I=\sigma ^2/\mu \) or \(\hat{I}=S^2/\bar{X}\), respectively. The distribution of \(\hat{I}\) for count time series data was analyzed by Schweer and Weiß (2014), among other works.
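As a small illustration, the sample index \(\hat{I}=S^2/\bar{X}\) can be computed in a few lines. The following sketch assumes Python with NumPy; the function name is ours:

```python
import numpy as np

def dispersion_index(x):
    """Sample version of Fisher's dispersion index, I_hat = S^2 / X_bar."""
    x = np.asarray(x, dtype=float)
    return x.var() / x.mean()  # population variance; ddof=1 is an equally common choice
```

For an equidispersed (e.g., Poisson) sample, the index should be close to 1.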

When also considering the conditional dispersion structure, statistics based on the (standardized) Pearson residuals appear to be a reasonable choice (Harvey and Fernandes 1989; Czado et al. 2009; Jung and Tremayne 2011; Jung et al. 2016; Weiß 2018). Let the candidate model (depending on some parameter vector \({\varvec{\theta }}\in \mathbb {R}^m\)) have the conditional mean \(E[X_t\ |\ X_{t-1},\ldots ;\ {\varvec{\theta }}]\) and variance \(V[X_t\ |\ X_{t-1},\ldots ;\ {\varvec{\theta }}]\). Inserting the estimated parameter \(\hat{{\varvec{\theta }}}\) into these functions of \({\varvec{\theta }}\), the Pearson residuals are defined as

$$\begin{aligned} R_t:=R_t(\hat{{\varvec{\theta }}})\ =\ \frac{X_t-E\big [X_t\ |\ X_{t-1},\ldots ;\ \hat{{\varvec{\theta }}}\big ]}{\sqrt{V\big [X_t\ |\ X_{t-1},\ldots ;\ \hat{{\varvec{\theta }}}\big ]}}. \end{aligned}$$
(1)

Note that these residuals are computed from the same set of data that was used for parameter estimation.

Assuming that the type of candidate model was chosen adequately, the “true” Pearson residuals \(R_t({\varvec{\theta }})\) have mean 0 and variance 1, and they are serially uncorrelated. So it is natural to analyze the estimated Pearson residuals \(R_t(\hat{{\varvec{\theta }}})\) for these properties as well. For example, Harvey and Fernandes (1989) suggest to “check on whether the sample variance of the residuals is close to 1. A value greater than 1 indicates overdispersion relative to the model that is being fitted.” (p. 413). Such a conclusion, however, has to be drawn with some caution. In a recent work, Weiß et al. (2019) conducted a comprehensive simulation study and showed that statistics based on Pearson residuals offer a good potential for detecting certain misspecifications, but that a decision for or against the actual candidate model is difficult because of considerable deviations from the above target values even under model adequacy. Comparing the distributions of statistics based on either \(R_t({\varvec{\theta }})\) or \(R_t(\hat{{\varvec{\theta }}})\), they also showed that large parts of these deviations are caused by estimation uncertainty. Therefore, in this work, we aim at capturing the effect of the estimated parameters on the residuals’ distribution for certain types of count process models. To the best of our knowledge, such derivations have so far only been done for classical regression models, e.g., by Pierce and Schafer (1986) and Cordeiro and Simas (2009).

Remark 1

Pearson residuals are very popular in practice, also because they are universally applicable in some sense: As long as the conditional mean and variance required for (1) can be computed for the considered model, the Pearson residuals can be used for checking the model adequacy. On the other hand, because of their general definition, they do not use further model properties beyond conditional mean and variance. Therefore, it might be possible to define alternative (and perhaps refined) types of residuals for specific model classes. As an example, Jung et al. (2016) define “component residuals” for the so-called INAR model (to be defined in Sect. 2), which allow one to draw inferences about the parts of the INAR recursion separately. Similarly, Zhu and Wang (2010) define residuals that are tailor-made for the so-called INARCH model (see Sect. 2). When such a type of DGP is considered under the null, it is certainly recommended to apply these (and further) diagnostic tools in addition to the Pearson residuals.

Furthermore, if not only the null model is specified but also the class of alternative models, this information might be used for constructing a hypothesis test, e.g., in the spirit of Sun and McCabe (2013), who consider INAR models based on the Katz family of distributions. As mentioned before, however, our focus here is on the widely applicable Pearson residuals, which we investigate analytically for certain INAR and INARCH models; possible extensions (e.g., to different model classes) are also briefly discussed. Furthermore, our simulation study (Sect. 4) shows that the considered tests perform rather well despite the general nature of the Pearson residuals.

Let the computed residuals \(R_t\) be indexed by \(t=1,\ldots ,n\). Since our focus is on the conditional dispersion structure, we consider the statistics

$$\begin{aligned} \textstyle \mathrm{MS}_R\ =\ \frac{1}{n}\,\mathop {\sum }\limits _{t=1}^n R_t^2 \qquad \text {and}\qquad S_R^2\ =\ \mathrm{MS}_R\ -\ \bar{R}^2, \end{aligned}$$
(2)

where \(\bar{R}\, =\, \frac{1}{n}\,\sum _{t=1}^n R_t\). For an adequately chosen model type, both statistics in (2) should take a value “close to 1.” For the two types of Markov count process described in Sect. 2, the aim is to derive closed-form formulae providing an asymptotic approximation to the distribution of (2). This is done in Sect. 3, where we also analyze these asymptotics and illustrate their application with some real-data examples. The finite-sample performance of the asymptotic approximations and the power of the dispersion tests implied by (2) are investigated in Sect. 4. Sect. 5 outlines possible extensions of our approach, and Sect. 6 provides concluding remarks on the residual-based dispersion tests.
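The computation of the residuals (1) and the statistics (2) can be sketched as follows for a first-order Markov model, where the conditional moments depend on \(X_{t-1}\) only. This is an illustrative Python/NumPy sketch; the function names are ours:

```python
import numpy as np

def pearson_residuals(x, cond_mean, cond_var):
    """Pearson residuals (1): R_t = (X_t - E[X_t|X_{t-1}]) / sqrt(V[X_t|X_{t-1}]),
    for t = 1, ..., n, given the model's conditional mean/variance functions."""
    x = np.asarray(x, dtype=float)
    prev, curr = x[:-1], x[1:]
    return (curr - cond_mean(prev)) / np.sqrt(cond_var(prev))

def dispersion_statistics(r):
    """The statistics (2): MS_R (mean of squared residuals) and S_R^2 = MS_R - Rbar^2."""
    r = np.asarray(r, dtype=float)
    ms = np.mean(r ** 2)
    return ms, ms - np.mean(r) ** 2
```

For an adequately fitted model, both returned values should be close to 1.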

2 Markov count processes

The computation of conditional mean and variance, as required for calculating the Pearson residuals (1), is particularly simple when dealing with a pth-order Markov process. Certainly, the computation is also possible for many further processes, for example, hidden-Markov processes; see p. 112 in Weiß (2018). But in the present work, we restrict ourselves to count-data Markov chains (i.e., \(p=1\)), where the distribution of \(X_t\) only depends on the previous observation \(X_{t-1}\). Many models for count-data Markov chains have been proposed in the literature (Weiß 2018), where especially those having a conditional linear autoregressive (CLAR(1)) structure are widely used in practice, i.e., where the conditional mean is of the form \(E[X_t\ |\ X_{t-1}] = \alpha \,X_{t-1}+\beta \) (see Grunwald et al. 2000). Two popular instances of CLAR(1) models are the INAR(1) and the INARCH(1) model (integer-valued autoregressive (conditional heteroscedasticity)); see McKenzie (1985), Ferland et al. (2006) and Weiß (2018).

The INAR(1) model is defined by the recursion \(X_t\, =\, \alpha \circ X_{t-1} + \epsilon _t\) with \(\alpha \in (0;1)\), where the innovations \(\epsilon _t\) are i.i.d. count random variables satisfying \(E[\epsilon _t]=\mu _{\epsilon }>0\) and \(V[\epsilon _t]=\sigma _{\epsilon }^2>0\). The involved binomial thinning operation “\(\circ \)” (Steutel and van Harn 1979) is defined by requiring that \(\alpha \circ X|X\, \sim Bin (X,\alpha )\), where X is a count random variable. Conditional mean and variance are given by

$$\begin{aligned} \begin{array}{rl} M_t\ =\ E[X_t\ |\ X_{t-1}]\ =&{} \alpha \cdot X_{t-1}\ +\ \mu _{\epsilon },\\ V_t\ =\ V[X_t\ |\ X_{t-1}]\ =&{} \alpha (1-\alpha )\cdot X_{t-1}\ +\ \sigma _{\epsilon }^2, \end{array} \end{aligned}$$
(3)

which are both linear functions of \(X_{t-1}\). A Poi-INAR(1) model assumes Poisson-distributed innovations, say \(\epsilon _t\sim Poi (\beta )\) with \(\mu _{\epsilon }=\sigma _{\epsilon }^2=\beta \). Then also the observations \(X_t\) are Poisson-distributed, now \(X_t\sim Poi (\mu )\) with \(\mu =\sigma ^2=\frac{\beta }{1-\alpha }\). In this case, the parameter vector \({\varvec{\theta }}\) is given by \((\beta ,\alpha )\).
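A Poi-INAR(1) path is easy to generate from the recursion: thin the previous observation binomially and add a Poisson innovation. A minimal sketch in Python with NumPy (the function name and the choice to start in the stationary law are ours):

```python
import numpy as np

def simulate_poi_inar1(n, alpha, beta, seed=None):
    """Simulate X_0, X_1, ..., X_n from a Poi-INAR(1) model:
    X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poi(beta), where the binomial
    thinning alpha o X_{t-1} is drawn as Bin(X_{t-1}, alpha)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1, dtype=np.int64)
    x[0] = rng.poisson(beta / (1 - alpha))   # start in the stationary Poi(mu) law
    for t in range(1, n + 1):
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(beta)
    return x
```

A long simulated path should show the stationary mean \(\mu =\beta /(1-\alpha )\) and an empirical dispersion index close to 1 (Poisson marginal).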

The INARCH(1) model assumes that the conditional mean is linear in the previous observation, i.e.,

$$\begin{aligned} M_t\ =\ \alpha \cdot X_{t-1}\ +\ \beta \quad \text {with } \beta >0 \text { and } \alpha \in (0;1), \end{aligned}$$
(4)

whereas the conditional variance follows from the chosen conditional distribution of \(X_t\) given \(X_{t-1}\). If choosing a Poisson distribution, i.e., if \(X_t\,\sim \,Poi (\alpha \cdot X_{t-1} + \beta )\), then we obtain the Poi-INARCH(1) model satisfying \(V_t=M_t=\alpha \cdot X_{t-1} + \beta \). The two-dimensional parameter vector \({\varvec{\theta }}\) is again commonly chosen as \((\beta ,\alpha )\). Note that the unconditional distribution of \(X_t\) is not Poisson; in fact, we have \(I=1/(1-\alpha ^2)>1\) (overdispersion) and \(\mu =\frac{\beta }{1-\alpha }\).
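Simulating a Poi-INARCH(1) path is even simpler, since each observation is drawn directly from the Poisson distribution with intensity \(\alpha X_{t-1}+\beta \). Another illustrative Python/NumPy sketch (function name and burn-in handling are ours):

```python
import numpy as np

def simulate_poi_inarch1(n, alpha, beta, seed=None, burnin=500):
    """Simulate a Poi-INARCH(1) path: X_t | X_{t-1} ~ Poi(alpha*X_{t-1} + beta).
    A burn-in is discarded so the returned path is close to stationarity."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + burnin, dtype=np.int64)
    x[0] = rng.poisson(beta / (1 - alpha))   # crude start; the burn-in removes its effect
    for t in range(1, n + burnin):
        x[t] = rng.poisson(alpha * x[t - 1] + beta)
    return x[burnin:]
```

A long path should exhibit mean \(\mu =\beta /(1-\alpha )\) and marginal overdispersion with index close to \(1/(1-\alpha ^2)\).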

Remark 2

There are many ways of estimating the model parameters of a Poi-INAR(1) or INARCH(1) model, respectively. In this work, the main aim is to obtain closed-form formulae for the asymptotic distribution of the residual-based dispersion tests (2), enabling a detailed analysis of these asymptotics (to be done in Sect. 3). For this reason, we decided to use moment estimators, which are easy to compute by simple formulae (in exactly the same way for both models!), and for which explicit expressions for the asymptotic distribution are readily available, see Weiß and Schweer (2016). These moment estimators use the empirical mean \(\bar{X}\), variance \(\hat{\gamma }(0)\), and first-order autocovariance \(\hat{\gamma }(1)\), and they estimate \(\alpha \) by \(\hat{\rho }(1)=\hat{\gamma }(1)/\hat{\gamma }(0)\) as well as \(\beta \) by \(\bar{X}\big (1-\hat{\rho }(1)\big )\).
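The moment estimators just described can be sketched directly from their defining formulae (Python/NumPy; the function name is ours):

```python
import numpy as np

def moment_estimates(x):
    """Moment estimators of Sect. 2: alpha_hat = rho_hat(1) = gamma_hat(1)/gamma_hat(0)
    and beta_hat = X_bar * (1 - rho_hat(1)). Returns (beta_hat, alpha_hat)."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    g0 = np.mean((x - xbar) ** 2)                       # empirical variance gamma_hat(0)
    g1 = np.mean((x[:-1] - xbar) * (x[1:] - xbar))      # first-order autocovariance
    rho1 = g1 / g0
    return xbar * (1 - rho1), rho1
```

The same formulae apply to both the INAR(1) and the INARCH(1) model, which is exactly the convenience emphasized above.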

But other types of estimators might be used as well for computing the Pearson residuals, e.g., maximum likelihood (ML) estimators, as was done in the simulation study by Weiß et al. (2019). Then, however, it is no longer possible to find closed-form formulae for the asymptotics of (2), because both the ML estimators themselves and their asymptotic distribution can only be computed numerically. In addition, one has to be aware that the Pearson residuals based on ML estimators will behave differently (not necessarily worse) than those based on moment estimators in some situations. This was observed by Weiß et al. (2019); see the last paragraph of their Sect. 3, where deviations in the dispersion structure also affected the mean and autocorrelation of the ML-based residuals. Since the asymptotics for an ML-based implementation of (2) are not explicitly available, hypothesis testing is only possible based on a bootstrap implementation. But this causes a considerable amount of additional computational effort, whereas the moment-based implementation of (2) only requires the evaluation of a few formulae. Nevertheless, we also did some simulation experiments regarding such bootstrap implementations; these are presented in Sect. 5.

3 Approximating the squared Pearson residual’s distribution

Let \(X_0,\ldots ,X_n\) be the available time series, which is assumed to originate from a Markov count DGP. These data are used to estimate the DGP’s model parameter vector \({\varvec{\theta }}\) on the one hand, and to compute the Pearson residuals \(R_1,\ldots ,R_n\) according to (1) on the other hand. Then, we compute the statistics (2) to check the adequacy of the fitted model’s dispersion structure. For this purpose, we need to approximate the distribution of the statistics (2) under the null hypothesis of having fitted the correct type of model to the data. Our approach for doing this is as follows. Since \(\bar{R}^2\) is expected to produce values very close to zero, the values of \(S_R^2\) and \(\mathrm{MS}_R\) will usually nearly coincide; see also the examples and simulations below. Hence, we approximate the distribution of the empirical variance \(S_R^2\) by the one of \(\mathrm{MS}_R = \frac{1}{n}\,\sum _{t=1}^n R_t(\hat{{\varvec{\theta }}})^2\). The latter, in turn, is derived in two steps. First, we derive an asymptotic approximation for the joint distribution of \(\frac{1}{n}\,\sum _{t=1}^n R_t({\varvec{\theta }})^2\) (mean of squared “true” residuals) and \(\hat{{\varvec{\theta }}}\). Then, in analogy to Weiß et al. (2017), we approximate \(\mathrm{MS}_R\) linearly in \(\hat{{\varvec{\theta }}}-{\varvec{\theta }}\) by

$$\begin{aligned} \textstyle \mathrm{MS}_R(\hat{{\varvec{\theta }}}) \ \approx \ \mathrm{MS}_R({\varvec{\theta }})\ -\ \mathop {\sum }\limits _{i=1}^m\, E\big [\tfrac{\partial V_t}{\partial \theta _i}\,\tfrac{1}{V_t}\big ]\,(\hat{\theta }_i-\theta _i), \end{aligned}$$
(5)

see Appendix A.1. This general expression is finally adapted to the considered types of DGP and used to compute an asymptotic approximation for the distribution of \(\mathrm{MS}_R\). The results obtained for a Poi-INAR(1) DGP are presented in Sect. 3.1, the ones for a Poi-INARCH(1) DGP in Sect. 3.2. The detailed derivations are provided by Appendices A and B, respectively. Since it is not clear in advance that our approximation approach works successfully in practice, its performance has to be checked with simulations, which is done later in Sect. 4.

3.1 Approximation for Poi-INAR(1) DGP

For the Poi-INAR(1) DGP with conditional mean \(M_t=\alpha \, X_{t-1} + \beta \) and variance \(V_t=\alpha (1-\alpha )\, X_{t-1} + \beta \), which was briefly surveyed in Sect. 2, the statistic \(\mathrm{MS}_R\) from (2) is computed as

$$\begin{aligned} \mathrm{MS}_R=\mathrm{MS}_R(\hat{\beta },\hat{\alpha }) \ =\ \frac{1}{n}\,\sum _{t=1}^n \frac{(X_t-\hat{\alpha }\, X_{t-1} - \hat{\beta })^2}{\hat{\alpha } (1-\hat{\alpha })\, X_{t-1} + \hat{\beta }}, \end{aligned}$$
(6)

see (3). In Appendix A.1, we show that (5) implies the following linear approximation for \(\mathrm{MS}_R(\hat{\beta },\hat{\alpha })\):

$$\begin{aligned} \begin{array}{rl} \mathrm{MS}_R(\hat{\beta },\hat{\alpha }) \ \approx &{} \mathrm{MS}_R(\beta ,\alpha ) \ -\ (1-2\alpha )\, E\big [\frac{X_{t-1}}{V_t}\big ]\,(\hat{\alpha }-\alpha ) \\ &{} -\ \frac{1}{\beta }\, \Big (1-\alpha (1-\alpha )\,E\big [\frac{X_{t-1}}{V_t}\big ]\Big )\,(\hat{\beta }-\beta ). \end{array} \end{aligned}$$
(7)

This approximation allows us to separate the randomness of the residuals from the one of the estimated parameters. The required moment

$$\begin{aligned} E\Big [\frac{X_{t-1}}{V_t}\Big ] \ \overset{(3)}{=}\ E\Big [\frac{X_{t-1}}{\alpha (1-\alpha )\, X_{t-1} + \beta }\Big ] \ =\ (1-\alpha )^{-1}\,E\Big [\frac{X_{t-1}}{\alpha \, X_{t-1} + \mu }\Big ] \end{aligned}$$

can be computed numerically as \((1-\alpha )^{-1}\,\sum _{x=0}^M \frac{x}{\alpha \, x + \mu }\, p_x\) with M sufficiently large, where the marginal probabilities \(p_x = P(X_{t-1}=x)\) are from the \(Poi (\mu )\)-distribution. For \(\alpha \rightarrow 0\), we have \(E\big [\frac{X_{t-1}}{V_t}\big ]\rightarrow 1\).
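This truncated sum is straightforward to evaluate with the Poisson recursion \(p_{x+1} = p_x\,\mu /(x+1)\). A minimal sketch (Python, standard library only; function name and truncation default are ours):

```python
import math

def e_x_over_v(alpha, mu, M=500):
    """E[X_{t-1}/V_t] for a Poi-INAR(1) DGP, evaluated as the truncated sum
    (1-alpha)^(-1) * sum_{x=0}^{M} x/(alpha*x + mu) * p_x with p_x from Poi(mu)."""
    s, px = 0.0, math.exp(-mu)          # p_0 = e^{-mu}
    for x in range(M + 1):
        s += x / (alpha * x + mu) * px
        px *= mu / (x + 1)              # Poisson recursion p_{x+1} = p_x * mu/(x+1)
    return s / (1 - alpha)
```

For \(\alpha \) close to 0, the result should be close to 1, in line with the stated limit.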

Equation (7) immediately implies the following approach for bias correction,

$$\begin{aligned} \begin{array}{rl} E\big [\mathrm{MS}_R(\hat{\beta },\hat{\alpha })\big ] \ \approx &{} 1\ -\ (1-2\alpha )\, E\big [\frac{X_{t-1}}{V_t}\big ]\,E\big [\hat{\alpha }-\alpha \big ] \\ &{} -\ \frac{1}{\beta }\, \Big (1-\alpha (1-\alpha )\,E\big [\frac{X_{t-1}}{V_t}\big ]\Big )\,E\big [\hat{\beta }-\beta \big ], \end{array} \end{aligned}$$
(8)

where bias approximations for \(E\big [\hat{\alpha }-\alpha \big ], E\big [\hat{\beta }-\beta \big ]\) are to be inserted. For the case of the moment estimators used here (Sect. 2), such bias formulae are provided by Weiß and Schweer (2016).

To derive an approximate distribution for (7), we need the following result.

Theorem 1

Let \((X_t)_{\mathbb {Z}}\) be a Poi-INAR(1) process with \(\mu =\frac{\beta }{1-\alpha }\), and let \(\hat{\beta },\hat{\alpha }\) be moment estimators of \(\beta ,\alpha \); see Sect. 2. Then,

$$\begin{aligned} \sqrt{n}\,\left( \frac{1}{n}\,\sum _{t=1}^n \frac{(X_t-M_t)^2}{V_t}\ -1,\ \hat{\beta }-\beta ,\ \hat{\alpha }-\alpha \right) \end{aligned}$$

is asymptotically normally distributed with mean \({\varvec{0}}\) and covariance matrix \({\varvec{\Sigma }}=(\sigma _{ij})_{i,j=1,2,3}\), where

$$\begin{aligned} \begin{array}{rl} \sigma _{11}\ =&{} 2+\frac{1}{\mu (1-\alpha )}-\frac{\alpha }{\mu }\, E\big [\frac{X_{t-1}}{V_t}\big ]-6\alpha ^2(1-\alpha )^2\, E\big [\frac{X_{t-1}}{V_t^2}\big ] ,\\ \sigma _{12}\ =&{} 1+2\alpha \mu -2\alpha (1-\alpha )\big (\alpha +(1+\alpha )\mu \big )\, E\big [\frac{X_{t-1}}{V_t}\big ],\\ \sigma _{13}\ =&{} -2\alpha +2\alpha (1-\alpha ^2)\, E\big [\frac{X_{t-1}}{V_t}\big ], \qquad \sigma _{22}\ =\ (1-\alpha )\mu + (1-\alpha ^2)\mu ^2, \\ \sigma _{23}\ =&{} -(1-\alpha ^2)\mu , \qquad \sigma _{33}\ =\ 1-\alpha ^2 + \frac{\alpha (1-\alpha )}{\mu }. \end{array} \end{aligned}$$

The proof of Theorem 1 is given in Appendix A.3. The covariances \(\sigma _{22},\sigma _{23},\sigma _{33}\) have first been derived by Freeland and McCabe (2005).

Next, we combine the linear approximation (7) with Theorem 1 and derive the following normal approximation (see Appendix A.4).

Theorem 2

Let \((X_t)_{\mathbb {Z}}\) be a Poi-INAR(1) process with \(\mu =\frac{\beta }{1-\alpha }\), and let \(\hat{\mu },\hat{\alpha }\) be moment estimators of \(\mu ,\alpha \); see Sect. 2. Then, the distribution of the linear approximation (7) for \(\mathrm{MS}_R(\hat{\beta },\hat{\alpha })\) can be approximated by a normal distribution with mean 1 and variance \(\sigma _{\mathrm{MS}_R}^2/n\), where

$$\begin{aligned} \begin{array}{@{}rl} \sigma _{\mathrm{MS}_R}^2\ =&{} \frac{3-5\alpha }{1-\alpha } \ +\ (1-\alpha ) \big ( (1-5\alpha )(1-\alpha ^2) + \frac{\alpha (1-3\alpha )}{\mu } \big )\, E\big [\frac{X_{t-1}}{V_t}\big ]^2 \\ &{}-\ 6 \alpha ^2 (1-\alpha )^2\, E\big [\frac{X_{t-1}}{V_t^2}\big ] \ -\ \big (\frac{\alpha (1-4 \alpha )}{\mu }+2(1-4\alpha -\alpha ^2)\big )\,E\big [\frac{X_{t-1}}{V_t}\big ] . \end{array} \end{aligned}$$

A plot of approximate bias and standard deviation (SD) of \(\mathrm{MS}_R\), as implied by (8) and Theorem 2, is shown in Fig. 1 (where we set \(n=1\)). It becomes clear that the actual marginal mean \(\mu \) has little effect on these approximations, whereas the effect of the dependence parameter \(\alpha \) is very strong. Note that for \(\alpha \rightarrow 0\), we have \(E\big [\frac{X_{t-1}}{V_t^2}\big ]\rightarrow 1/\mu \). So the limiting value of \(\sigma _{\mathrm{MS}_R}\) for \(\alpha \rightarrow 0\) is given by \(\sqrt{2}\), which can also be recognized from Fig. 1b.

Fig. 1: Approximate bias in (a) and SD in (b) of \(\mathrm{MS}_R\) for Poi-INAR(1) DGP (setting \(n=1\)), plotted against \(\alpha \) for different mean levels \(\mu \)

In applications, the normal distribution implied by Theorem 2, together with the bias-corrected mean (8), can now be used to approximate the true distribution of both statistics \(\mathrm{MS}_R\) and \(S_R^2\) from (2) under the null hypothesis of a Poi-INAR(1) DGP. The performance of this approximation is investigated in Sect. 4, where results from a simulation study are presented.
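The variance formula of Theorem 2 can be evaluated numerically by truncating the Poisson expectations \(E[X_{t-1}/V_t]\) and \(E[X_{t-1}/V_t^2]\). The following sketch (Python, standard library only; function names and truncation default are ours) implements it:

```python
import math

def poisson_expect(f, mu, M=500):
    """E[f(X)] for X ~ Poi(mu), with the sum truncated at M."""
    s, px = 0.0, math.exp(-mu)
    for x in range(M + 1):
        s += f(x) * px
        px *= mu / (x + 1)
    return s

def sigma2_msr_inar(alpha, mu):
    """Asymptotic variance sigma_MS_R^2 of Theorem 2 (Poi-INAR(1), beta = mu*(1-alpha))."""
    beta = mu * (1 - alpha)
    v = lambda x: alpha * (1 - alpha) * x + beta              # conditional variance V_t
    e1 = poisson_expect(lambda x: x / v(x), mu)               # E[X_{t-1}/V_t]
    e2 = poisson_expect(lambda x: x / v(x) ** 2, mu)          # E[X_{t-1}/V_t^2]
    return ((3 - 5 * alpha) / (1 - alpha)
            + (1 - alpha) * ((1 - 5 * alpha) * (1 - alpha ** 2)
                             + alpha * (1 - 3 * alpha) / mu) * e1 ** 2
            - 6 * alpha ** 2 * (1 - alpha) ** 2 * e2
            - (alpha * (1 - 4 * alpha) / mu + 2 * (1 - 4 * alpha - alpha ** 2)) * e1)
```

As a sanity check, the limit for \(\alpha \rightarrow 0\) equals 2, i.e., \(\sigma _{\mathrm{MS}_R}\rightarrow \sqrt{2}\); and plugging in the moment estimates of Example 1 below (\(\mu \approx 8.604\), \(\alpha \approx 0.452\), \(n=95\)) reproduces an SD of about 0.177.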

Example 1

Let us consider a time series of monthly counts (Jan. 1987 to Dec. 1994, so 96 observations) of claims caused by burn-related injuries in the heavy manufacturing industry, which was presented in Example 2.5.1 by Freeland (1998) as an illustration for the Poi-INAR(1) model. Under this assumption, together with the moment estimates \(\hat{\mu }\approx 8.604\) and \(\hat{\alpha }\approx 0.452\), we compute 95 Pearson residuals, see (6), where both \(\mathrm{MS}_R\) and \(S_R^2\) take the value \(\approx 1.310\). Following Harvey and Fernandes (1989), this indicates that the data exhibit more variation than captured by the Poi-INAR(1) model. But does 1.310 constitute a significant deviation from 1?

The same data were also analyzed in Schweer and Weiß (2014) by using Fisher’s dispersion index, and they ended up with a “quite narrow decision” against the Poi-INAR(1) model. Let us complement this result by a residual-based test concerning the alternative \(\mathrm{MS}_R,S_R^2>1\). Using the asymptotics of (8) and Theorem 2 (and plugging in the above moment estimates instead of the unknown model parameters), we approximate the mean and standard deviation of \(\mathrm{MS}_R\) as 0.971 and 0.177, respectively. On a 5%-level, the critical value computes as 1.263, so we actually have a “less narrow” decision (P value 0.028) against the Poi-INAR(1) model than in Schweer and Weiß (2014).
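The test decision of this example amounts to simple normal-distribution arithmetic. A minimal sketch (Python, standard library only; the function name is ours, and the plug-in mean and SD are the figures stated above):

```python
from math import erf, sqrt

def upper_sided_test(ms_r, mean0, sd0, z95=1.6449):
    """Upper-sided 5%-level test of H0 (adequate dispersion): reject if MS_R
    exceeds mean0 + z95*sd0. Returns the critical value and the P value."""
    crit = mean0 + z95 * sd0
    z = (ms_r - mean0) / sd0
    pval = 1 - 0.5 * (1 + erf(z / sqrt(2)))   # 1 - Phi(z) via the error function
    return crit, pval

# Figures from Example 1: MS_R ~ 1.310, approximated mean 0.971 and SD 0.177
crit, p = upper_sided_test(1.310, 0.971, 0.177)
```

This reproduces the critical value of about 1.263 and the P value of about 0.028 reported above.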

3.2 Approximation for Poi-INARCH(1) DGP

For the Poi-INARCH(1) DGP, conditional mean and variance are equal to each other, both given by \(M_t=\alpha \, X_{t-1} + \beta \), see Sect. 2. So this time, the statistic \(\mathrm{MS}_R\) from (2) is computed as

$$\begin{aligned} \mathrm{MS}_R(\hat{\beta },\hat{\alpha }) \ =\ \frac{1}{n}\,\sum _{t=1}^n \frac{(X_t-\hat{\alpha }\, X_{t-1} - \hat{\beta })^2}{\hat{\alpha }\, X_{t-1} + \hat{\beta }}, \end{aligned}$$
(9)

see (4). In Appendix B.1, we again use (5) to derive the following linear approximation of \(\mathrm{MS}_R(\hat{\beta },\hat{\alpha })\):

$$\begin{aligned} \begin{array}{rl} \mathrm{MS}_R(\hat{\beta },\hat{\alpha }) \ \approx &{} \mathrm{MS}_R(\beta ,\alpha ) \ -\ E\big [\frac{X_{t-1}}{M_t}\big ]\,(\hat{\alpha }-\alpha ) \\ &{} -\ \frac{1}{\beta }\, \Big (1-\alpha \,E\big [\frac{X_{t-1}}{M_t}\big ]\Big )\,(\hat{\beta }-\beta ). \end{array} \end{aligned}$$
(10)

Like before, this approximation allows us to separate the randomness of the residuals from the one of the estimated parameters. The required moment

$$\begin{aligned} E\Big [\frac{X_{t-1}}{M_t}\Big ] \ \overset{(4)}{=}\ E\Big [\frac{X_{t-1}}{\alpha \, X_{t-1} + \beta }\Big ] \end{aligned}$$

can be computed numerically as \(\sum _{x=0}^M \frac{x}{\alpha \, x + \beta }\, p_x\) with M sufficiently large. The marginal probabilities \(p_x=P(X_{t-1}=x)\) can be approximated based on the invariance equation of the INARCH(1) model; see Sect. 3.3 in Weiß et al. (2017) or Remark 2.1.3.4 in Weiß (2018). For \(\alpha \rightarrow 0\), we have \(E\big [\frac{X_{t-1}}{M_t}\big ]\rightarrow 1\).
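One simple way to approximate the marginal probabilities is to iterate the invariance equation \(p_x = \sum _y p_y\,P(X_t=x\,|\,X_{t-1}=y)\) on a truncated support until the probability vector stabilizes. The following sketch (Python, standard library only; function names, truncation, and iteration defaults are ours) does this and then evaluates \(E[X_{t-1}/M_t]\):

```python
import math

def inarch1_marginal(alpha, beta, M=80, iters=200):
    """Approximate the Poi-INARCH(1) marginal p_x on {0,...,M} by iterating
    the invariance equation with the Poi(alpha*y + beta) transition rows."""
    def poi_pmf_row(lam):
        row = [math.exp(-lam)]
        for x in range(M):
            row.append(row[-1] * lam / (x + 1))
        return row
    P = [poi_pmf_row(alpha * y + beta) for y in range(M + 1)]   # P[y][x]
    p = [1.0 / (M + 1)] * (M + 1)
    for _ in range(iters):
        q = [sum(p[y] * P[y][x] for y in range(M + 1)) for x in range(M + 1)]
        s = sum(q)
        p = [v / s for v in q]    # renormalize mass lost to truncation
    return p

def e_x_over_m(alpha, beta, M=80):
    """E[X_{t-1}/M_t] = sum_x x/(alpha*x + beta) * p_x."""
    p = inarch1_marginal(alpha, beta, M)
    return sum(x / (alpha * x + beta) * p[x] for x in range(M + 1))
```

As checks, the approximated marginal should have mean \(\beta /(1-\alpha )\), and the expectation should tend to 1 for \(\alpha \rightarrow 0\).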

As in the INAR(1) case, (10) also implies a bias correction, namely

$$\begin{aligned} \begin{array}{rl} E\big [\mathrm{MS}_R(\hat{\beta },\hat{\alpha })\big ] \ \approx &{} 1\ -\ E\big [\frac{X_{t-1}}{M_t}\big ]\,E\big [\hat{\alpha }-\alpha \big ] \\ &{} -\ \frac{1}{\beta }\, \Big (1-\alpha \,E\big [\frac{X_{t-1}}{M_t}\big ]\Big )\,E\big [\hat{\beta }-\beta \big ], \end{array} \end{aligned}$$
(11)

where bias approximations for \(E\big [\hat{\alpha }-\alpha \big ], E\big [\hat{\beta }-\beta \big ]\) are to be inserted. For the case of the moment estimators used here (Sect. 2), such bias formulae are provided by Weiß and Schweer (2016). But our simulations to be presented in Sect. 4 show that this time, the bias correction does not work satisfactorily.

To derive an approximate distribution for (10), we again proceed in a stepwise manner and first derive the following result.

Theorem 3

Let \((X_t)_{\mathbb {Z}}\) be a Poi-INARCH(1) process with \(\mu =\frac{\beta }{1-\alpha }\), and let \(\hat{\beta },\hat{\alpha }\) be moment estimators of \(\beta ,\alpha \); see Sect. 2. Then,

$$\begin{aligned} \sqrt{n}\,\left( \frac{1}{n}\,\sum _{t=1}^n \frac{(X_t-M_t)^2}{M_t}\ -1,\ \hat{\beta }-\beta ,\ \hat{\alpha }-\alpha \right) \end{aligned}$$

is asymptotically normally distributed with mean \({\varvec{0}}\) and covariance matrix \({\varvec{\Sigma }}=(\sigma _{ij})_{i,j=1,2,3}\), where

$$\begin{aligned} \begin{array}{rl} \sigma _{11}\ =&{} 2+ \frac{1}{\mu (1-\alpha )} \big (1-\alpha \, E\big [\frac{X_{t-1}}{M_t}\big ]\big ) , \qquad \sigma _{12}\ =\ 1,\qquad \sigma _{13}\ =\ 0, \\ \sigma _{22}\ =&{} \frac{1+2\alpha ^4}{1+\alpha +\alpha ^2}\,\mu + (1-\alpha ^2)\mu ^2, \qquad \sigma _{23}\ =\ -\frac{\alpha ^3(1+2\alpha )}{1+\alpha +\alpha ^2} - (1-\alpha ^2)\mu ,\\ \sigma _{33}\ =&{} 1-\alpha ^2+\frac{\alpha (1+\alpha )(1+2\alpha ^2)}{\mu (1+\alpha +\alpha ^2)} . \end{array} \end{aligned}$$

The proof of Theorem 3 is given in Appendix B.3. The covariances \(\sigma _{22},\sigma _{23},\sigma _{33}\) have first been derived by Weiß (2010).

Next, we combine the linear approximation (10) with Theorem 3 and derive the following normal approximation (see Appendix B.4).

Theorem 4

Let \((X_t)_{\mathbb {Z}}\) be a Poi-INARCH(1) process with \(\mu =\frac{\beta }{1-\alpha }\), and let \(\hat{\mu },\hat{\alpha }\) be moment estimators of \(\mu ,\alpha \); see Sect. 2. Then, the distribution of the linear approximation (10) for \(\mathrm{MS}_R(\hat{\beta },\hat{\alpha })\) can be approximated by a normal distribution with mean 1 and variance \(\sigma _{\mathrm{MS}_R}^2/n\), where

$$\begin{aligned} \begin{array}{rl} \sigma _{\mathrm{MS}_R}^2\ =&{} \frac{3-\alpha }{1-\alpha } + \frac{\alpha ^3(1+2\alpha )}{\mu (1-\alpha )(1-\alpha ^3)} \ -\ (1+\alpha ) \big ( \frac{2}{1-\alpha } + \frac{\alpha (1-\alpha +3\alpha ^2)}{\mu (1-\alpha )(1-\alpha ^3)} \big )\, E\big [\frac{X_{t-1}}{M_t}\big ]\\ &{}+\ \big (\frac{1+\alpha }{1-\alpha } + \frac{\alpha (1+\alpha ^2+\alpha ^3)}{\mu (1-\alpha )(1-\alpha ^3)}\big )\, E\big [\frac{X_{t-1}}{M_t}\big ]^2. \end{array} \end{aligned}$$

A plot of approximate bias and SD of \(\mathrm{MS}_R\), as implied by (11) and Theorem 4, is shown in Fig. 2 (where we set \(n=1\)). The dependence parameter \(\alpha \) again has a very strong effect on these quantities, but in contrast to Fig. 1 for a Poi-INAR(1) DGP, this time they are also notably influenced by the marginal mean \(\mu \). The limiting value of \(\sigma _{\mathrm{MS}_R}\) for \(\alpha \rightarrow 0\) again equals \(\sqrt{2}\), which can also be recognized from Fig. 2b.

Fig. 2: Approximate bias in (a) and SD in (b) of \(\mathrm{MS}_R\) for Poi-INARCH(1) DGP (setting \(n=1\)), plotted against \(\alpha \) for different mean levels \(\mu \)

In applications, the normal distribution implied by Theorem 4 can now be used to approximate the true distribution of both statistics \(\mathrm{MS}_R\) and \(S_R^2\) from (2) under the null hypothesis of a Poi-INARCH(1) DGP. The performance of this approximation is investigated in Sect. 4, where results from a simulation study are presented. There, also the possible bias correction using (11) is investigated.
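Given a numerical value for \(e_1 = E[X_{t-1}/M_t]\) (e.g., from the scheme just described), the variance formula of Theorem 4 can be evaluated directly. An illustrative Python sketch (the function name is ours):

```python
def sigma2_msr_inarch(alpha, mu, e1):
    """Asymptotic variance sigma_MS_R^2 of Theorem 4 (Poi-INARCH(1));
    e1 is a numerical value for E[X_{t-1}/M_t]."""
    a, m = alpha, mu
    c = m * (1 - a) * (1 - a ** 3)    # common denominator of the 1/mu-type terms
    return ((3 - a) / (1 - a) + a ** 3 * (1 + 2 * a) / c
            - (1 + a) * (2 / (1 - a) + a * (1 - a + 3 * a ** 2) / c) * e1
            + ((1 + a) / (1 - a) + a * (1 + a ** 2 + a ** 3) / c) * e1 ** 2)
```

As a sanity check, for \(\alpha \rightarrow 0\) (where \(e_1\rightarrow 1\)) the formula again yields the limit 2, i.e., \(\sigma _{\mathrm{MS}_R}\rightarrow \sqrt{2}\).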

Example 2

Jung and Tremayne (2011) presented a time series of counts of iceberg orders (per 20 min for 32 consecutive trading days in 2004, so 800 observations) with respect to the Deutsche Telekom shares traded in the XETRA system of Deutsche Börse. These overdispersed data were also analyzed by Weiß (2015), where a Poi-INARCH(1) model turned out to be a reasonable candidate model. So we compute the 799 (squared) Pearson residuals under this model assumption, see (9), where we have the moment estimates \(\hat{\mu }\approx 1.406\) and \(\hat{\alpha }\approx 0.635\). Both \(\mathrm{MS}_R\) and \(S_R^2\) take the value \(\approx 1.088\), which is rather close to 1. But because of the large sample size, we have a significant though narrow violation of the null hypothesis on a 5%-level, because the P value equals 0.049 (with bias correction: 0.043). This goes along with the results in Weiß (2015), who finally preferred another model for these data.

Example 3

Recalling Example 1, the claims counts time series requires a model with more variation than provided by a Poi-INAR(1) model, and such a model is given by the Poi-INARCH(1) model (also see the conclusions in Schweer and Weiß (2014)). Proceeding as in Example 2, we compute \(\mathrm{MS}_R\) and \(S_R^2\) as \(\approx 1.050\), so again only a small exceedance of 1. But this time, the sample size is much smaller, and the critical value on a 5%-level becomes rather large, namely 1.239 (with bias correction: 1.238). Also the P value 0.365 (with bias correction: 0.363) shows that there is no contradiction against a Poi-INARCH(1) model for the claims counts data.

4 Results from a simulation study

We conducted a simulation study, the aim of which was twofold: to check the finite-sample performance of the approximations for mean and variance of \(\mathrm{MS}_R\) and \(S_R^2\) derived in Sect. 3, and to investigate size and power of the diagnostic tests based on \(\mathrm{MS}_R\) and \(S_R^2\). For each considered scenario, we simulated \(10^5\) replications. As a general result, it turned out that the difference between \(\mathrm{MS}_R\) and \(S_R^2\) from (2) was virtually negligible. For example, if we observed a difference in power at all, this difference was very small and followed a regular pattern. Since \(S_R^2 = \mathrm{MS}_R - \bar{R}^2\ < \mathrm{MS}_R\), \(\mathrm{MS}_R\) led to equal or slightly better power if an upper-sided test against overdispersion is done, and \(S_R^2\) led to equal or slightly better power if a lower-sided test against underdispersion is done. So to save some space, the summarizing tables collected in Appendix C are restricted to \(\mathrm{MS}_R\) only.

The first part of our simulations refers to a hypothetical Poi-INAR(1) model, i.e., the Pearson residuals and the asymptotics for \(\mathrm{MS}_R\) are computed as described in Sect. 3.1. The results in Table 1 for the marginal means \(\mu \in \{2,5\}\) and the autocorrelation parameters \(\alpha \in \{0.25,0.5,0.75\}\) show a negative bias for \(\mathrm{MS}_R\), but this bias is captured quite well by the asymptotic approximation (8), at least for sample size \(n\ge 250\). An analogous conclusion applies to the standard errors of \(\mathrm{MS}_R\), where the approximation quality slightly deteriorates with increasing \(\alpha \). In practice, these approximations are used for testing the hypothetical Poi-INAR(1) model, in analogy to Example 1. So the performance of such a test (size and power) is crucial for applications. This is investigated for diverse alternative scenarios: We use upper-sided tests to uncover overdispersion as generated by an NB-INAR(1) DGP in Table 2 or by a ZIP-INAR(1) DGP in Table 3, and lower-sided tests to uncover underdispersion as generated by a Good-INAR(1) DGP in Table 4. Here, the INAR(1) innovations follow a negative binomial, a zero-inflated Poisson, or a Good distribution, respectively; for background information on these distributions, see Weiß (2018). As a competitor, we use the \(\hat{I}\)-test relying on the sample dispersion index as derived by Schweer and Weiß (2014).

The size values in Table 2 (columns \(I=1\); replicated in Table 3) for the \(\mathrm{MS}_R\)-test are always very close to the nominal 5%-level, whereas those of the \(\hat{I}\)-test are either somewhat larger than 5% for \(\alpha =0.25\), or smaller than 5% for \(\alpha =0.75\). Also concerning the lower-sided tests, see Table 4, the size values of the \(\mathrm{MS}_R\)-test are usually closer to the 5%-level than those of the \(\hat{I}\)-test. In particular, with only a few exceptions, the \(\mathrm{MS}_R\)-test has better power than the \(\hat{I}\)-test, both for under- and overdispersion, and also in the case where overdispersion is actually caused by an excessive number of zeros. So we can give a clear recommendation for using the \(\mathrm{MS}_R\)-test instead of the \(\hat{I}\)-test in the INAR(1) case.

The second part of our simulation study refers to a hypothetical Poi-INARCH(1) DGP, see Sect. 3.1. Here, we do not consider the dispersion index as a competitor, because it is known to perform poorly for such INARCH processes. Instead, we consider the \(\widehat{C}_{1;2}\)-test proposed by Weiß et al. (2017), and we also use the parametrizations given there, i.e., \(\mu \in \{2.5,5\}\) and \(\alpha \in \{0.2,0.4,0.6,0.8\}\). Table 5 compares the simulated means and standard errors with the asymptotic approximations implied by (11) and Theorem 4. While the approximation of the standard errors works rather well, especially for \(\mu =5\), the mean approximation is not satisfactory: in most cases, the simulated bias is clearly stronger than predicted, but for \(\mu =2.5\) and \(\alpha =0.8\), we also find a deviation in the opposite direction. For this reason, we did not use the bias correction when executing the \(\mathrm{MS}_R\)-test in the INARCH(1) case. The size values of the \(\mathrm{MS}_R\)-test (without bias correction) are given in Table 6 (column “\(\theta =1\)”). Most often, they are reasonably close to the nominal 5%-level, and in case of a notable deviation (essentially for \(T=100\) or \(\mu =2.5\), \(\alpha =0.8\)), the \(\mathrm{MS}_R\)-test is conservative. This differs from the \(\widehat{C}_{1;2}\)-test, the size of which often exceeds the 5%-level, especially for large \(\alpha \).

For power analysis, we use an alternative DGP with additional conditional variation, namely the same NB-INARCH(1) process as in Weiß et al. (2017):

$$\begin{aligned} X_t\ \big |\ X_{t-1}, X_{t-2},\ldots \quad \sim \ NB \left( \frac{\beta +\alpha \, X_{t-1}}{\theta -1},\ \frac{1}{\theta }\right) \quad \text {with }\theta >1. \end{aligned}$$

The corresponding power values in Table 6 show that the \(\mathrm{MS}_R\)-test has better power in most cases despite its tendency to be conservative. The advantage over the \(\widehat{C}_{1;2}\)-test is particularly large for a highly correlated process (\(\alpha =0.8\)). So taking all results together, we clearly recommend the \(\mathrm{MS}_R\)-test for uncovering a misfit of the DGP’s dispersion structure.
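For illustration, the NB-INARCH(1) alternative can be simulated directly from its conditional distribution. Since its conditional variance equals \(\theta \) times its conditional mean \(\beta +\alpha X_{t-1}\), the mean of the squared Pearson residuals computed under a hypothetical Poi-INARCH(1) fit (where conditional mean and variance coincide) tends toward \(\theta \), which is what the upper-sided test exploits. A sketch, with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_nb_inarch1(n, beta, alpha, theta):
    # X_t | past ~ NB((beta + alpha*X_{t-1})/(theta - 1), 1/theta), theta > 1,
    # so the conditional mean is beta + alpha*X_{t-1} and the conditional
    # variance is theta times the conditional mean.
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(beta / (1 - alpha))  # start near the marginal mean
    for t in range(1, n):
        r = (beta + alpha * x[t - 1]) / (theta - 1)
        x[t] = rng.negative_binomial(r, 1 / theta)
    return x

def ms_r_poi_inarch(x):
    # Pearson residuals under a hypothetical Poi-INARCH(1) model, with
    # moment estimates alpha_hat (lag-1 autocorrelation) and beta_hat.
    alpha_hat = np.corrcoef(x[:-1], x[1:])[0, 1]
    beta_hat = x.mean() * (1 - alpha_hat)
    m = beta_hat + alpha_hat * x[:-1]  # conditional mean = conditional variance
    return np.mean((x[1:] - m) ** 2 / m)

x = simulate_nb_inarch1(5000, beta=2.0, alpha=0.4, theta=1.5)
ms = ms_r_poi_inarch(x)  # inflated roughly toward theta
```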

5 Possible extensions

In view of their good performance, as demonstrated in Sect. 4, a promising direction for future research is to extend the residual-based dispersion tests to other relevant types of count processes. This could be done in two ways. On the one hand, analogous asymptotic approximations could be derived for further types of DGP. For example, if appropriately parametrized, the non-Markovian Poi-INMA(1) process (a moving-average-type process) has the same formulae for the 1-step-ahead conditional mean and variance as the Poi-INAR(1) process, see (3). Hence, we also get the same formulae for the test statistic and its linear approximation, namely (6) and (7), respectively. So it remains to derive an asymptotic result analogous to Theorem 1, where the asymptotics derived by Aleksandrov and Weiß (2019) constitute a starting point. Another idea could be to consider models for bounded counts, i.e., having a finite range of the form \(\{0,\ldots ,N\}\) with some \(N\in \mathbb {N}\), like the binomial AR(1) or INARCH(1) model, both being finite Markov chains; see Weiß (2018) for details and references. These processes, however, have slightly different expressions for the conditional variance (if choosing the model parameters to match the conditional mean), so the expressions for the residuals also differ slightly from the ones discussed in this article. For \(N\rightarrow \infty \), these formulae converge to those of the respective Poisson counterpart.

A second way of implementing residual-based tests for count DGPs could be a parametric bootstrap (with respect to the respective hypothetical model for the DGP). This would allow one to consider further types of DGP, such as higher-order INAR and IN(G)ARCH models or hidden Markov models, or to use different estimation approaches for computing the residuals, e.g., ML instead of moment estimators. To get an idea about the performance of such a bootstrap implementation, we did a simulation experiment for the Poi-INAR(1) case, comparing the above asymptotic approximation with the parametric bootstrap scheme described in Jentsch and Weiß (2018). Since the bootstrap implementation requires much more computing time, we used only 5 000 Monte Carlo replicates this time (and always 500 bootstrap replicates). The results are summarized in Table 7, where the columns “asym” refer to the asymptotic approximation and are taken from Table 2, and the columns “boot” contain the new simulation results. Although showing more fluctuation, the sizes for this upper-sided bootstrap test are close to the 5%-level, and the power values regarding an NB-INAR(1) DGP agree quite well with those of the asymptotic approximation (with some deterioration for increasing \(\alpha \)). It should be noted, however, that the lower-sided bootstrap test suffers from oversizing, especially for \(\alpha =0.75\) (see Table 8), so further refinements would be required here.
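For orientation, such a parametric bootstrap can be sketched as follows for the upper-sided test of a Poi-INAR(1) null. This only illustrates the general scheme (refit on each bootstrap path, compare the observed statistic with the bootstrap distribution); it is not the exact implementation of Jentsch and Weiß (2018), and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_poi_inar1(n, mu, alpha):
    # Poi-INAR(1) path via binomial thinning plus Poisson innovations.
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(mu)
    for t in range(1, n):
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(mu * (1 - alpha))
    return x

def fit_and_ms_r(x):
    # Moment estimates and mean of squared Pearson residuals.
    mu_hat = x.mean()
    alpha_hat = np.clip(np.corrcoef(x[:-1], x[1:])[0, 1], 0.0, 0.99)  # guard
    m = alpha_hat * x[:-1] + mu_hat * (1 - alpha_hat)
    v = alpha_hat * (1 - alpha_hat) * x[:-1] + mu_hat * (1 - alpha_hat)
    return mu_hat, alpha_hat, np.mean((x[1:] - m) ** 2 / v)

def bootstrap_pvalue(x, B=500):
    # Upper-sided parametric bootstrap test of the Poi-INAR(1) null:
    # refit the model on each bootstrap path and recompute the statistic.
    mu_hat, alpha_hat, stat = fit_and_ms_r(x)
    boot = np.empty(B)
    for b in range(B):
        xb = simulate_poi_inar1(len(x), mu_hat, alpha_hat)
        boot[b] = fit_and_ms_r(xb)[2]
    return (1 + np.sum(boot >= stat)) / (B + 1)

x = simulate_poi_inar1(250, mu=2.0, alpha=0.5)  # data generated under the null
p = bootstrap_pvalue(x, B=200)
```

Refitting the model on every bootstrap path is essential here, since it is what lets the bootstrap distribution reflect the estimation uncertainty contained in the Pearson residuals.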

In addition, we also combined the above bootstrap implementation (Poi- against NB-INAR(1) DGP) with ML-based residuals; also recall the discussion in Remark 2. This, however, caused a further increase in computing time because of the numerical optimizations required for each simulation and each bootstrap run. The obtained simulation results are summarized in Table 9. Again, the sizes fluctuate somewhat more than in the case of the asymptotic implementation based on moment estimators (see Table 7), but they are sufficiently close to 0.050. The ML-based power values are nearly the same as those obtained using moment estimators for the low autocorrelation level \(\alpha =0.25\), but they become increasingly superior with growing \(\alpha \). This result appears plausible in view of the better performance of ML estimators compared to moment estimators for highly correlated INAR(1) processes; see the discussion in Weiß and Schweer (2016). So if the computational burden caused by the ML-based bootstrap implementation can be managed, this type of residual-based dispersion test seems to be even more powerful in detecting neglected dispersion than the simpler moment-based implementation.

6 Conclusions

For model diagnostics with respect to the conditional dispersion structure of a given count time series, we used statistics based on the squared Pearson residuals. To allow for hypothesis testing, we derived asymptotic approximations for the distribution of the test statistics under the null of a Poi-INAR(1) or Poi-INARCH(1) DGP. The simulations demonstrated that the resulting tests perform very well in uncovering diverse over- and underdispersion scenarios. Although we concentrated on Poi-INAR(1) and Poi-INARCH(1) DGPs (and computed the residuals based on moment estimators), we also argued that extensions to other types of count processes (or to other types of estimators) are possible, e.g., based on an appropriate bootstrap implementation. A more detailed study of such extensions and their performance is left for future research.