1 Introduction

During the last decade, modeling and analysis of count time series with a bounded support have become a popular topic, with a large number of articles in fields such as epidemiology, the social sciences, economics, the life sciences and others. The most commonly used approach to model this kind of data is the first-order binomial autoregressive (BAR(1)) process proposed by McKenzie (1985). The BAR(1) model is constructed by using the binomial thinning operator, which was proposed by Steutel and Van Harn (1979) and defined as \(\displaystyle \alpha \circ X=\sum \nolimits _{i=1}^{X}B_i,\) where \(\{B_i\}\) is an independent and identically distributed (i.i.d.) Bernoulli(\(\alpha \)) random sequence independent of X. Based on this operator, the definition of the BAR(1) model is given below.

Definition 1

The BAR(1) process \(\{X_t\}\) is defined by the recursion

$$\begin{aligned} X_t=\alpha \circ X_{t-1}+\beta \circ (n-X_{t-1})~\mathrm {with}~X_0\sim \mathrm {B}(n,p), \end{aligned}$$
(1.1)

where \(n\in {\mathbb {N}}\), \(\beta :=p(1-\rho )\), \(\alpha :=\beta +\rho \), \(p\in (0,1)\) and \(\rho \in \displaystyle \left( \max {\left\{ -\frac{p}{1-p},-\frac{1-p}{p}\right\} },1\right) \). All thinnings are performed independently of each other, and the thinnings at time t are independent of \((X_s)_{s<t}\).
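Since each thinning \(\alpha \circ X\) is conditionally \(\mathrm{Binomial}(X,\alpha)\) given X, the recursion (1.1) is straightforward to simulate. The following minimal sketch (an illustration only; the function name and defaults are ours, not from the paper) generates a BAR(1) path:

```python
import numpy as np

def bar1_simulate(T, n, p, rho, seed=None):
    """Simulate a BAR(1) path via binomial thinning (Definition 1)."""
    rng = np.random.default_rng(seed)
    beta = p * (1 - rho)
    alpha = beta + rho
    x = np.empty(T, dtype=int)
    x[0] = rng.binomial(n, p)  # X_0 ~ B(n, p)
    for t in range(1, T):
        survivors = rng.binomial(x[t - 1], alpha)        # alpha o X_{t-1}
        recruits = rng.binomial(n - x[t - 1], beta)      # beta o (n - X_{t-1})
        x[t] = survivors + recruits
    return x
```

By stationarity, a long sample path should have mean close to \(np\) and stay within \(\{0,\ldots,n\}\).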

The BAR(1) model is a homogeneous Markov chain with h-step transition probabilities

$$\begin{aligned}&{P}(X_t=j|X_{t-h}=i)\nonumber \\&\quad =\sum _{m=\max \{0,i+j-n\}}^{\min \{i,j\}}\left( {\begin{array}{c}i\\ m\end{array}}\right) \left( {\begin{array}{c}n-i\\ j-m\end{array}}\right) \alpha _{h}^{m}(1-\alpha _{h})^{i-m}\beta _{h}^{j-m}(1-\beta _{h})^{n-i-j+m},\qquad \end{aligned}$$
(1.2)

where \(\beta _{h}=p(1-\rho ^{h})\) and \(\alpha _{h}=\beta _{h}+\rho ^{h}\).
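Eq. (1.2) can be evaluated directly; useful sanity checks are that each row sums to one and that, as \(h\rightarrow\infty\), the transition law converges to the \(\mathrm{B}(n,p)\) marginal. A sketch (the function name is ours):

```python
from math import comb

def bar1_transition(i, j, h, n, p, rho):
    """h-step transition probability P(X_t = j | X_{t-h} = i), Eq. (1.2)."""
    beta_h = p * (1 - rho**h)
    alpha_h = beta_h + rho**h
    total = 0.0
    for m in range(max(0, i + j - n), min(i, j) + 1):
        total += (comb(i, m) * comb(n - i, j - m)
                  * alpha_h**m * (1 - alpha_h)**(i - m)
                  * beta_h**(j - m) * (1 - beta_h)**(n - i - j + m))
    return total
```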

During the past ten years, interest in the BAR(1) process has increased significantly and research on this model has produced plentiful and substantial results. For example, Scotto et al. (2014) and Ristić and Popović (2019) investigated different types of bivariate BAR(1) processes. Weiß and Pollett (2014) introduced a class of density-dependent BAR(1) processes, where the thinning probabilities depend on the current observations. Kim and Weiß (2015) considered modeling of bounded count time series with overdispersion by proposing a random coefficient BAR(1) process. Möller et al. (2016) proposed several types of self-exciting threshold BAR(1) models. Kang et al. (2020; 2021) introduced an extended version of the BAR(1) model and a mixture BAR(1) model based on the generalized binomial thinning operator and on mixing Pegram and binomial thinning operators, respectively. For some other meaningful contributions on the BAR(1) process, we refer to Cui and Lund (2010), Weiß (2009a, 2009b, 2013), Weiß and Pollett (2012), Weiß and Kim (2013a, 2013b, 2015), Yang et al. (2017) and Chen et al. (2021), among others.

In the context of count data with an unbounded support, zero inflation (excess zeros) means that the proportion of 0’s of a model is greater than the proportion of 0’s of the corresponding Poisson model. The zero-inflation phenomenon is frequently encountered in a great number of fields, such as econometrics, manufacturing defects, medical consultations, sexual behavior and so on (see Ridout et al. 1998). Research on modeling zero inflation is therefore both necessary and important. As pointed out by Zuur et al. (2009, Chapter 11), ignoring zero inflation can have at least two consequences: first, the estimated parameters and standard errors may be biased; second, the excessive number of zeros can cause overdispersion. Moreover, Perumean-Chaney et al. (2013) pointed out that when zero inflation in the data is ignored, estimation results are poor and some potentially significant statistical findings are missed; the misspecification caused by ignoring zero inflation may even lead to erroneous conclusions about the data and bring uncertainty to research and applications. In the context of count time series, modeling of zero inflation has also attracted the attention of researchers. On the one hand, Zhu (2012) proposed zero-inflated Poisson and negative binomial integer-valued generalized autoregressive conditional heteroscedastic (INGARCH) models. Jazi et al. (2012) introduced a first-order integer-valued autoregressive (INAR(1)) process based on the binomial thinning operator with zero-inflated Poisson innovations. Li et al. (2015) proposed a mixed INAR(1) process with zero-inflated generalized power series innovations. These articles provide solutions for modeling zero-inflated count time series with an unbounded support. On the other hand, there are only a few articles about modeling and testing bounded count time series containing a large number of zeros. Möller et al. (2018) developed four extensions of the BAR(1) model, which can accommodate a broad variety of zero patterns. Kim et al. (2018) proposed statistics to evaluate the number of zeros and the dispersion with respect to a binomial model. Besides, Möller et al. (2020) introduced a state-dependent zero-inflation mechanism for count distributions with an unbounded or bounded support.

However, count time series data sets that contain a great number of zeros along with a large number of ones can also arise in practice. For example, most car owners will file zero or one claim in an insurance period, since more than one claim can lead to higher premiums in the next insurance period; one may be infected by seasonal flu at most once in a quarter due to the persistence of antibodies over a period of time. Research on modeling zero-and-one inflated count time series is still extremely scarce, and only a few articles have concentrated on this issue so far. Maiti et al. (2018) constructed a new mixture INAR(1) process for modeling count time series data, in particular data consisting of many zeros and ones. Qi et al. (2019) and Mohammadi et al. (2022) presented binomial thinning INAR(1) processes with zero-and-one inflated Poisson and zero-and-one inflated Poisson–Lindley innovations, respectively. These three articles considered modeling of zero-and-one inflated unbounded count time series.

While it is also a vitally necessary and significant issue to come up with a solution for modeling bounded count time series with a large number of zeros and ones, to the best of our knowledge, there is only one article that studies this issue to date. Liu et al. (2022) proposed a zero-one-inflated bounded Poisson model with an autoregressive feedback mechanism in the intensity to fit normalcy-dominant ordinal time series. This article concentrates on establishing a statistical model from a new perspective, so as to enrich the methods for handling zero-and-one inflated bounded count time series. To this end, the concept of the hidden Markov model is utilized to construct a new version of the extended BAR(1) model with a zero-and-one inflated binomial marginal distribution. Furthermore, a binomial one-inflation index is presented and employed to construct test statistics to evaluate the number of ones with respect to a binomial model. To illustrate the significance of this article, it is necessary to explain its contribution, especially in comparison with Liu et al. (2022). This can be discussed from the following perspectives:

  1. (i)

    The model proposed by Liu et al. (2022) is constructed based on the zero-one-inflated bounded Poisson distribution and the INGARCH framework. However, it is well known that the binomial distribution and the BAR(1) process are among the most commonly used models for bounded count data. Hence, constructing an extended BAR(1) model to fit zero-and-one inflated bounded count time series is a natural issue.

  2. (ii)

    Liu et al. (2022) proposed the zero-one-inflated bounded Poisson INGARCH model to better analyze normalcy-dominant ordinal data, rather than the zero-and-one inflated bounded count time series. As we mentioned before, this article concentrates on modeling of the zero-and-one inflated bounded count time series via a different modeling framework.

  3. (iii)

    Another motivation is that the proposed model is flexible and constitutes a novel extension of the binomial modeling framework: the BAR(1) and RZ-BAR(1) (see Möller et al. 2018) models are special cases of our model. Moreover, the proposed model not only has an attractive potential for modeling and analyzing bounded count time series with excess zeros and ones, but can also be useful in other situations where the observed process exhibits characteristics such as overdispersion and bimodality.

The rest of this article is organized as follows. In Sect. 2, the new model is introduced and some probabilistic and statistical properties are investigated. In Sect. 3, probability-based, quasi-maximum likelihood, maximum likelihood and Bayesian methods are employed to estimate the model parameters. Section 4 studies the binomial one-inflation index and addresses the relevant test problem. In Sect. 5, we apply the proposed model to two rainy-days count series and an assaults-on-officers count series. The article ends with a conclusion section and is accompanied by online supplementary materials.

2 A BAR-hidden Markov model

To model zero-and-one inflated count time series with a bounded support, a direct approach is to construct a modified and generalized version of the BAR(1) process. Suppose that \(\{X_t\}\) is the underlying but unobservable BAR(1) process; it is then natural to generate additional zeros and ones by randomly replacing some of the \(X_t\) by zeros or ones. Formally, at each time t, a uniform random variable \(S_{t}\) is realized, independently of the previous \(\{X_m\}_{m<t}\) and \(\{S_{m}\}_{m<t}\). We define the observable process \(\{Y_t\}\) by \(Y_t:=0\) if \(0\le S_t< \phi _{0}\), \(Y_t:=1\) if \(\phi _{0}\le S_t< \phi _{0}+\phi _{1}\) and \(Y_t:=X_{t}\) if \(1-\phi _{2}\le S_t\le 1\), with \(0\le \phi _{0},\phi _{1},\phi _{2}\le 1\), \(\phi _{0}+\phi _{1}+\phi _{2}=1\). It can be seen that \(\{X_t,Y_t\}\) constitutes a hidden Markov model (see Zucchini et al. 2009) with the BAR(1) kernel \(\{X_t\}\) being the Markov chain of hidden states. From another point of view, some of the original counts \(\{X_t\}\) are masked by a zero or a one, but the underlying BAR(1) process is not affected. The masking by zeros and ones is accomplished completely at random by the i.i.d. uniform random sequence \(\{S_t\}\), in the spirit of values missing completely at random. Hence, the BAR-hidden Markov model can also be viewed as the BAR(1) process with zeros and ones at random. Now, we are ready to give the definition of our model based on the above analysis.

Definition 2

The BAR-hidden Markov model \(\{Y_{t}\}_{t\ge 1}\) can be written as follows:

$$\begin{aligned} Y_{t}=\left\{ \begin{array}{ll} 0,&{}\quad \mathrm {with~probability~} \phi _{0},\\ 1,&{}\quad \mathrm {with~probability~} \phi _{1},\\ X_{t},&{}\quad \mathrm {with~probability~} \phi _{2}, \end{array} \right. \end{aligned}$$
(2.1)

where \(0\le \phi _{0},\phi _{1},\phi _{2}\le 1\), \(\phi _{0}+\phi _{1}+\phi _{2}=1\) and \(\{X_t\}\) is the BAR(1) model defined in Eq. (1.1). We call the new process the first-order zero-and-one inflated binomial autoregressive (ZOIBAR(1)) process.

Remark 1

The ZOIBAR(1) model given in Eq. (2.1) can be rewritten as

$$\begin{aligned} Y_t=0 \cdot {\mathbb {I}}_{\{0\le S_{t}< \phi _{0}\}}+1 \cdot {\mathbb {I}}_{\{\phi _{0}\le S_{t}<\phi _{0}+\phi _{1}\}}+X_{t} \cdot {\mathbb {I}}_{\{1-\phi _{2}\le S_{t}\le 1\}},~t=1,2,\ldots , \end{aligned}$$

where \(\{S_{t}\}\) is an i.i.d. uniform(0,1) random sequence and \(\{X_t\}\) is the BAR(1) model.
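Remark 1 suggests a direct simulation scheme: draw a latent BAR(1) path and mask it with the uniform sequence \(\{S_t\}\). A sketch (the function name and defaults are ours):

```python
import numpy as np

def zoibar1_simulate(T, n, p, rho, phi0, phi1, seed=None):
    """Simulate ZOIBAR(1) per Remark 1: mask a latent BAR(1) path with 0s and 1s."""
    rng = np.random.default_rng(seed)
    beta = p * (1 - rho)
    alpha = beta + rho
    x = np.empty(T, dtype=int)
    x[0] = rng.binomial(n, p)
    for t in range(1, T):
        x[t] = rng.binomial(x[t - 1], alpha) + rng.binomial(n - x[t - 1], beta)
    s = rng.uniform(size=T)  # i.i.d. uniform(0,1) masking sequence {S_t}
    y = np.where(s < phi0, 0, np.where(s < phi0 + phi1, 1, x))
    return y
```

For a long path, the sample mean should be close to the marginal mean \(\phi_1+\phi_2 np\) derived in Sect. 2.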

Remark 2

Some special cases of the ZOIBAR(1) process defined in Eq. (2.1) are:

  1. (i)

    the classical BAR(1) process proposed by McKenzie (1985) if \(\phi _0=\phi _1=0\);

  2. (ii)

    the BAR(1) process with zeros at random (RZ-BAR(1) process) proposed by Möller et al. (2018) if \(\phi _1=0\);

  3. (iii)

    the i.i.d. Bernoulli random sequence with \({P}(Y_t=0)=1-{P}(Y_t=1)=1-\phi _{1}=\phi _{0}\) if \(\phi _2=0\).

From Definition 2, the BAR(1) kernel \(\{X_t\}\) obviously is a Markov chain. However, the same conclusion cannot be extended to the process \(\{Y_t\}\), since some of the \(Y_t\) are randomly masked by zeros and ones. Given the tth kernel count \(X_t\), the corresponding observation \(Y_t\) is not only generated by \(X_t\), but also depends on the realization of \(S_t\). We give an example to illustrate this statement in the supplementary materials. Furthermore, the distribution of \(Y_t\) given \(X_t\) is as follows:

$$\begin{aligned} {P}(Y_{t}=y_{t}|X_{t}=x_{t})=\phi _{0}{\mathbb {I}}_{\{y_{t}=0\}} +\phi _{1}{\mathbb {I}}_{\{y_{t}=1\}}+\phi _{2}{\mathbb {I}}_{\{y_{t}=x_{t}\}}. \end{aligned}$$

Now, we turn to the statistical properties of the ZOIBAR(1) model. We first give the probability mass function (p.m.f.) of the marginal distribution of the ZOIBAR(1) model as follows:

$$\begin{aligned}{P}(Y_{t}=k)=\left\{ \begin{array}{ll} \phi _{0}+\phi _{2}(1-p)^{n},&{}\quad k=0,\\ \phi _{1}+\phi _{2}np(1-p)^{n-1}, &{}\quad k=1,\\ \phi _{2}\left( {\begin{array}{c}n\\ k\end{array}}\right) p^{k}(1-p)^{n-k},&\quad k=2,3,\ldots ,n. \end{array} \right. \end{aligned}$$

It can be easily seen that the marginal distribution of the ZOIBAR(1) model is a zero-and-one inflated binomial distribution. From the marginal distribution of the ZOIBAR(1) model, the mean and variance can be easily obtained by

$$\begin{aligned} {E}(Y_t)=\phi _1+\phi _2np,~\mathrm {Var}(Y_t)=(\phi _1+\phi _2np)[1-(\phi _1+\phi _2np)]+n(n-1)\phi _{2}p^{2}. \end{aligned}$$
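As a numerical check, the marginal p.m.f. above can be coded directly; it sums to one and reproduces the stated mean and variance. A sketch (the function name is ours):

```python
from math import comb

def zoibar1_pmf(k, n, p, phi0, phi1):
    """Marginal p.m.f. of the ZOIBAR(1) model (zero-and-one inflated binomial)."""
    phi2 = 1 - phi0 - phi1
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    return phi2 * binom + (phi0 if k == 0 else 0.0) + (phi1 if k == 1 else 0.0)
```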

The following proposition gives the autocovariance function and autocorrelation function at lag h for the ZOIBAR(1) process. The corresponding proof is provided in Supplement S5.

Proposition 1

The autocovariance function and autocorrelation function at lag h for the ZOIBAR(1) model are given by

$$\begin{aligned}\mathrm {Cov}(Y_{t},Y_{t+h})=\left\{ \begin{array}{ll} (\phi _1+\phi _2np)[1-(\phi _1+\phi _2np)]+n(n-1)\phi _{2}p^{2},&{}\quad h=0,\\ \rho ^{h}\phi _{2}^{2}np(1-p),&{}\quad h=1,2,\ldots , \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} \rho (h):= & {} \mathrm {Corr}(Y_{t},Y_{t+h})\\= & {} \left\{ \begin{array}{ll} 1,&{}\quad h=0,\\ \rho ^{h}\dfrac{\phi _{2}^{2}np(1-p)}{(\phi _1+\phi _2np)[1-(\phi _1+\phi _2np)]+n(n-1)\phi _{2}p^{2}},&{}\quad h=1,2,\ldots . \end{array} \right. \end{aligned}$$

This proposition implies that the autocorrelation function of the process decays exponentially to 0 as \(h\rightarrow \infty \). This characteristic is similar to that of the BAR(1) process. Furthermore, we have shown that the marginal mean of \(\{Y_t\}\) and the autocovariance function between \(Y_{t+h}\) and \(Y_{t}\) do not depend on the time index t, so it can be concluded that the ZOIBAR(1) model is covariance (weakly) stationary. In addition, we can give the partial autocorrelation function based on the autocorrelation function in Proposition 1.

Remark 3

Denote \(\rho _{\mathrm {part}}(h):=a_{hh}\) as the hth-order partial autocorrelation for the ZOIBAR(1) model. According to Box et al. (1994), the \(a_{hj}\) follow the recursive scheme

$$\begin{aligned} a_{h+1,h+1}&=\frac{\rho (h+1)-\sum \nolimits _{i=1}^{h}a_{hi}\cdot \rho (h+1-i)}{1-\sum \nolimits _{i=1}^{h}a_{hi}\cdot \rho (i)},~h\in {\mathbb {N}}_{0}\\ a_{h+1,j}&=a_{hj}-a_{h+1,h+1}\cdot a_{h,h-j+1},~j=1,\ldots ,h, \end{aligned}$$

where \(\rho (\cdot )\) is given in Proposition 1. Based on the above recursive scheme, the first-order and second-order partial autocorrelations for the ZOIBAR(1) model are

$$\begin{aligned} \rho _{\mathrm {part}}(1)=\rho (1),~ \rho _{\mathrm {part}}(2)=\frac{\rho (2)-\rho ^{2}(1)}{1-\rho ^{2}(1)}. \end{aligned}$$

Higher-order partial autocorrelations for the ZOIBAR(1) model can be obtained in the same way.
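The recursion in Remark 3 (a Durbin–Levinson-type scheme) is easy to implement; the first two partial autocorrelations then match the closed forms given above. A sketch (function names are ours):

```python
def zoibar1_acf(h, n, p, rho, phi0, phi1):
    """Autocorrelation rho(h) of the ZOIBAR(1) model, Proposition 1."""
    phi2 = 1 - phi0 - phi1
    if h == 0:
        return 1.0
    mu = phi1 + phi2 * n * p
    var = mu * (1 - mu) + n * (n - 1) * phi2 * p**2
    return rho**h * phi2**2 * n * p * (1 - p) / var

def zoibar1_pacf(hmax, n, p, rho, phi0, phi1):
    """Partial autocorrelations via the recursive scheme of Remark 3."""
    acf = [zoibar1_acf(h, n, p, rho, phi0, phi1) for h in range(hmax + 1)]
    a = {(1, 1): acf[1]}
    pacf = [acf[1]]
    for h in range(1, hmax):
        num = acf[h + 1] - sum(a[(h, i)] * acf[h + 1 - i] for i in range(1, h + 1))
        den = 1 - sum(a[(h, i)] * acf[i] for i in range(1, h + 1))
        a[(h + 1, h + 1)] = num / den
        for j in range(1, h + 1):
            a[(h + 1, j)] = a[(h, j)] - a[(h + 1, h + 1)] * a[(h, h - j + 1)]
        pacf.append(a[(h + 1, h + 1)])
    return pacf
```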

Now, we are ready to introduce the binomial index of dispersion, \(I_d\), which is a useful metric to quantify the dispersion behavior of a count random variable X with a finite range \(\{0,1,\ldots ,n\}\). The binomial index of dispersion \(I_d\) is defined as

$$\begin{aligned} I_{d}=\frac{n\sigma ^2}{\mu (n-\mu )}=1+\frac{n(\sigma ^2-\mu )+\mu ^2}{\mu (n-\mu )}\in (0,\infty ), \end{aligned}$$

where \(\mu \) and \(\sigma ^2\) are the mean and variance of the random variable X, respectively. A finite-range count random variable is said to be overdispersed if \(I_d>1\), equidispersed if \(I_d=1\), and underdispersed if \(I_d<1\). From the mean and variance formulae, we can calculate the binomial dispersion index of the ZOIBAR(1) process as follows:

$$\begin{aligned} I_{d}=1+\frac{(n-1)[\phi _{2}n^2p^2-(\phi _{1}+\phi _{2}np)^2]}{(\phi _1+\phi _2np)[n-(\phi _1+\phi _2np)]}. \end{aligned}$$
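For instance, the index above reduces to exactly 1 in the pure BAR(1) case (\(\phi_0=\phi_1=0\)), while inflating zeros pushes it above 1. A sketch (the function name is ours):

```python
def zoibar1_Id(n, p, phi0, phi1):
    """Binomial index of dispersion I_d of the ZOIBAR(1) process."""
    phi2 = 1 - phi0 - phi1
    mu = phi1 + phi2 * n * p
    return 1 + (n - 1) * (phi2 * (n * p)**2 - mu**2) / (mu * (n - mu))
```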

The ability of the ZOIBAR(1) model to explain zero inflation is an important topic in our work. It is worth emphasizing that relying entirely on the proportion of 0’s may lead us to assess the extent of zero inflation in a misleading way, especially if we ignore the mean of the data. To illustrate this statement, we consider a count time series data set, which represents monthly counts of car beats in Pittsburgh (among \(n=42\) such car beats) that had at least one police offense report of prostitution in that month. The data consist of \(T=96\) observations, starting from January 1990 and ending in December 1997. Figure S1 and Table S1, respectively, show the sample path, histogram and descriptive statistics for the prostitution series. A zero frequency of 63.5% may lead practitioners to firmly believe that the empirical degree of zero inflation is quite severe. However, the mean of the series is 0.5313, which is relatively small. By a simple calculation, a binomial random variable X with mean \({\bar{X}}=0.5313\) and \(n=42\) gives a zero probability of 0.5846, which is only a little smaller than 63.5%. Hence, declaring the degree of zero inflation of this series to be severe may be unconvincing. To quantify and assess the zero-inflation behavior more precisely, Kim et al. (2018) introduced the binomial zero-inflation index and the sample binomial zero-inflation index for a count random variable X with a finite range \(\{0,1,\ldots ,n\}\) as follows:

$$\begin{aligned} z_{0}=p_{0}\bigg (1-\frac{\mu }{n}\bigg )^{-n}\in (0,\infty ),~{\hat{z}}_{0}={\hat{p}}_{0}\bigg (1-\frac{{\bar{X}}}{n}\bigg )^{-n}, \end{aligned}$$

where \(p_{0}\) and \(\mu \) are the zero probability and mean of the random variable X, \({\hat{p}}_{0}=\frac{1}{T}\sum _{t=1}^{T}{\mathbb {I}}_{\{X_t=0\}}\) and \({\bar{X}}=\frac{1}{T}\sum _{t=1}^{T}X_t\). If X is binomially distributed, then \(z_{0}=1\) and \({\hat{z}}_{0}\) should be close to 1. If \(z_{0}>1\), naturally, X is said to show zero inflation with respect to the binomial distribution. Table S1 shows that the sample zero-inflation index for the prostitution series is \({\hat{z}}_0=1.0845\), which is close to 1. So we cannot conclude that the series under consideration is zero-inflated based on this aspect. In summary, the zero-inflation index is a more reasonable assessment criterion than the raw proportion of 0’s.

The binomial zero-inflation index gives us a powerful tool to measure the zero pattern departure from the binomial model. Hence, we can calculate the binomial zero-inflation index of the ZOIBAR(1) process as follows:

$$\begin{aligned} z_{0}=[\phi _{0}+\phi _{2}(1-p)^{n}]\bigg (1-\frac{\phi _{1}+\phi _{2}np}{n}\bigg )^{-n}. \end{aligned}$$

From the above equation, we find that \(z_0\) increases as \(\phi _0\) increases when \(\phi _1\), n and p are fixed.
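The index \(z_0\) of the ZOIBAR(1) process can be computed directly from the formula above; in the BAR(1) special case (\(\phi_0=\phi_1=0\)) it equals 1 exactly. A sketch (the function name is ours):

```python
def zoibar1_z0(n, p, phi0, phi1):
    """Binomial zero-inflation index z0 of the ZOIBAR(1) process."""
    phi2 = 1 - phi0 - phi1
    p0 = phi0 + phi2 * (1 - p)**n          # marginal zero probability
    mu = phi1 + phi2 * n * p               # marginal mean
    return p0 * (1 - mu / n)**(-n)
```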

The following proposition gives the h-step transition probabilities for the ZOIBAR(1) process; the corresponding proof is provided in Supplement S5.

Proposition 2

Let \({P}_{ij}^{(h)}:={P}(Y_{t}=j|Y_{t-h}=i)\), then the h-step transition probabilities for the ZOIBAR(1) model are given by

$$\begin{aligned}{P}_{ij}^{(h)}=\left\{ \begin{array}{ll} \phi _{2}{P}(X_{t}=j|X_{t-h}=i),&{}\quad i,j\ge 2,\\ \phi _{i}\dfrac{{P}(Y_{t}=j)}{{P}(Y_{t-h}=i)} +\phi _{2}\dfrac{{P}(X_{t-h}=i)[\phi _{j}+\phi _{2}{P}(X_{t}=j|X_{t-h}=i)]}{{P}(Y_{t-h}=i)},&{}\quad 0\le i,j\le 1,\\ \dfrac{\phi _{i}{P}(Y_{t}=j)+\phi _{2}^{2}{P}(X_{t-h}=i){P}(X_{t}=j|X_{t-h}=i)}{{P}(Y_{t-h}=i)},&{}\quad i\le 1,j\ge 2,\\ \phi _{j}+\phi _{2}{P}(X_{t}=j|X_{t-h}=i),&{}\quad i\ge 2,j\le 1, \end{array} \right. \end{aligned}$$

where \(\{X_t\}\) is the BAR(1) process and \({P}(X_{t}=j|X_{t-h}=i)\) is given by Eq. (1.2).
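Proposition 2 can be turned into code case by case; a useful sanity check is that each row of the resulting h-step transition matrix sums to one. A sketch (helper names are ours):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def x_trans(i, j, h, n, p, rho):
    """BAR(1) kernel h-step transition probability, Eq. (1.2)."""
    bh = p * (1 - rho**h)
    ah = bh + rho**h
    return sum(comb(i, m) * comb(n - i, j - m) * ah**m * (1 - ah)**(i - m)
               * bh**(j - m) * (1 - bh)**(n - i - j + m)
               for m in range(max(0, i + j - n), min(i, j) + 1))

def y_pmf(k, n, p, phi0, phi1):
    phi2 = 1 - phi0 - phi1
    extra = phi0 if k == 0 else (phi1 if k == 1 else 0.0)
    return phi2 * binom_pmf(k, n, p) + extra

def zoibar1_trans(i, j, h, n, p, rho, phi0, phi1):
    """h-step transition probability P(Y_t = j | Y_{t-h} = i), Proposition 2."""
    phi2 = 1 - phi0 - phi1
    phi = {0: phi0, 1: phi1}
    pxj = x_trans(i, j, h, n, p, rho)
    if i >= 2 and j >= 2:
        return phi2 * pxj
    if i <= 1 and j <= 1:
        return (phi[i] * y_pmf(j, n, p, phi0, phi1)
                + phi2 * binom_pmf(i, n, p) * (phi[j] + phi2 * pxj)) / y_pmf(i, n, p, phi0, phi1)
    if i <= 1 and j >= 2:
        return (phi[i] * y_pmf(j, n, p, phi0, phi1)
                + phi2**2 * binom_pmf(i, n, p) * pxj) / y_pmf(i, n, p, phi0, phi1)
    return phi[j] + phi2 * pxj  # case i >= 2, j <= 1
```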

Remark 4

As suggested by a referee, it should be pointed out that maximum likelihood estimation for the ZOIBAR(1) model cannot be implemented based on Proposition 2. The reason is that the ZOIBAR(1) process \(\{Y_{t}\}\) is not a Markov chain. To be specific, we have

$$\begin{aligned} {{P}(Y_{t}=y_{t}|Y_{t-h}=y_{t-h})\ne {P}(Y_{t}=y_{t}|Y_{t-h}=y_{t-h},\cdots ,Y_{1}=y_{1}),} \end{aligned}$$

i.e., \(Y_{t}\) depends not only on \(Y_{t-h}\), but also on \(Y_{t-k}\) for \(h< k \le t-1\). However, we still investigate the h-step transition probabilities in Proposition 2 based on the following considerations:

  1. (i)

    The h-step transition probabilities can help us implement the quasi-likelihood estimation for the ZOIBAR(1) model (see Sect. 3.2).

  2. (ii)

    The h-step transition probability is an important statistical property for the BAR(1) model. Since the ZOIBAR(1) model is a generalization of the BAR(1) model, it is a straightforward idea to investigate the h-step transition probability for the ZOIBAR(1) model.

From the h-step transition probabilities, we can obtain the conditional h-step moments of the ZOIBAR(1) model

$$\begin{aligned} {E}(Y_{t}|Y_{t-h})=\left\{ \begin{array}{ll} \phi _{1}+\phi _{2}\dfrac{\phi _0np+\phi _2(1-p)^{n}n\beta _{h}}{\phi _0+\phi _2(1-p)^n},&{} \quad Y_{t-h}=0,\\ \phi _{1}+\phi _{2}\dfrac{\phi _{1}np+\phi _{2}np(1-p)^{n-1}(n\beta _h+\rho ^h)}{\phi _1+\phi _2np(1-p)^{n-1}},&{} \quad Y_{t-h}=1,\\ \phi _{1}+\phi _{2}(\rho ^{h} Y_{t-h}+n\beta _{h}),&{} \quad Y_{t-h}\ge 2, \end{array} \right. \qquad \end{aligned}$$
(2.2)

and \({E}(Y_{t}^2|Y_{t-h})\)

$$\begin{aligned} =\left\{ \begin{array}{l} \phi _{1}+\phi _{2}\dfrac{\phi _0[np(1-p)+n^2p^2]+\phi _2(1-p)^{n}[n\beta _h(1-\beta _h)+n^2\beta _h^2]}{\phi _0+\phi _2(1-p)^n},~Y_{t-h}=0,\\ \phi _{1}+\phi _{2}\dfrac{\phi _{1}[np(1-p)+n^2p^2]+\phi _{2}np(1-p)^{n-1}[(n-1)\beta _h(1+n\beta _h+2\rho ^h)+\alpha _h]}{\phi _1+\phi _2np(1-p)^{n-1}},\\ \quad \quad Y_{t-h}=1,\\ \phi _{1}+\phi _{2}[\rho ^h(1-\rho ^h)(1-2p) Y_{t-h}+n\beta _h(1-\beta _h)+(\rho ^h Y_{t-h}+n\beta _h)^2],~Y_{t-h}\ge 2. \end{array} \right. \end{aligned}$$

It can be shown that the joint probability generating function of \(Y_{t-1}\) and \(Y_{t}\) is

$$\begin{aligned}&\Phi _{Y_{t-1},Y_{t}}(u,v)\\&\quad =\phi _{0}^{2}+\phi _{0}\phi _{1}(u+v)+\phi _{1}^{2}uv+\phi _{0}\phi _{2}[(1-p+pu)^{n}+(1-p+pv)^{n}]\\&\qquad +\phi _{1}\phi _{2}[u(1-p+pv)^{n}+v(1-p+pu)^{n}]\\&\qquad +\phi _{2}^{2}[(1-p)(1-\beta )+p(1-p)(1-\rho )(u+v)+p(p-p\rho +\rho )uv]^{n}, \end{aligned}$$

which shows that \(\Phi _{Y_{t-1},Y_{t}}(u,v)\) is symmetric in u and v, so the proposed process is time reversible (see Schweer 2015).

3 Estimation procedure

Suppose that \(\{Y_t\}\) are observations from the ZOIBAR(1) model defined in Definition 2. Our task is to estimate the unknown parameters \(\varvec{\lambda }=(p,\rho ,\phi _{0},\phi _{1})^{\mathrm {\top }}\) from a sample \({\mathbf {Y}}=(Y_1,Y_2,\ldots ,Y_T)\). The parameter n is assumed to be known. Four different estimators, namely probability-based estimator (PBE), quasi-maximum likelihood estimator (QMLE), maximum likelihood estimator (MLE) and Bayesian estimator (BE), are considered.

3.1 Probability-based estimation

The PBE of the parameters \(p,\phi _{0},\phi _{1}\) can be obtained from the fact that

$$\begin{aligned} {P}(Y_{t}=0)&=\phi _{0}+\phi _{2}(1-p)^{n}~ {(\mathrm {or}~{P}(Y_{t}=1)=\phi _{1}+\phi _{2}np(1-p)^{n-1})},\\ {P}(Y_{t}=k)&=\phi _{2}\left( {\begin{array}{c}n\\ k\end{array}}\right) p^{k}(1-p)^{n-k},~k=2,3,\ldots ,n. \end{aligned}$$

The probability \(p_{k}:={P}(Y_t=k)\) can be estimated by using the statistics

$$\begin{aligned} {\hat{p}}_{k}:=\frac{1}{T}\sum _{t=1}^{T}{\mathbb {I}}_{\{Y_{t}=k\}},~k=0,1,2,\ldots ,n. \end{aligned}$$

Therefore, the PBE of the parameters \(p,\phi _{0},\phi _{1}\) can be given by

$$\begin{aligned} {\hat{p}}_{PB}&=\frac{(k+1){\hat{p}}_{k+1}}{(k+1){\hat{p}}_{k+1}+(n-k){\hat{p}}_{k}},~~~ {\hat{\phi }}_{2,PB}=\frac{{\hat{p}}_{k}\Gamma (k+1)\Gamma (n-k+1)}{{\hat{p}}_{PB}^{k}(1-{\hat{p}}_{PB})^{n-k}\Gamma (n+1)}, \nonumber \\ {\hat{\phi }}_{0,PB}&={\hat{p}}_{0}-{\hat{\phi }}_{2,PB}(1-{\hat{p}}_{PB})^{n}~ {(\mathrm {or}~{\hat{\phi }}_{1,PB}={\hat{p}}_{1}-{\hat{\phi }}_{2,PB}n{\hat{p}}_{PB}(1-{\hat{p}}_{PB})^{n-1})}, \end{aligned}$$
(3.1)

where \(k=2,3,\ldots ,n-1\). We can also obtain the PBE for \(\rho \) by taking advantage of \({P}_{ij}^{(h)}\) given in Proposition 2. However, it is too complex to derive a closed-form estimator \({\hat{\rho }}_{PB}\). So we turn to moment (MM) estimation and obtain the following estimator:

$$\begin{aligned} {\hat{\rho }}_{MM}=\frac{\sum _{t=1}^{T-1}(Y_t-{\bar{Y}})(Y_{t+1}-{\bar{Y}})}{(T-1){\hat{\phi }}_{2,PB}^{2}n{\hat{p}}_{PB}(1-{\hat{p}}_{PB})},~\mathrm {with}~{\bar{Y}}=\frac{\sum _{t=1}^{T}Y_{t}}{T}. \end{aligned}$$

It is worth mentioning that the selection of the value of k in Eq. (3.1) is an unavoidable and crucial problem. The reason is that an inappropriate selection of k may cause the estimators to tend to infinity under some parameter combinations. For example, when we set large values of n, \(\phi _0\) and \(\phi _1\) and a small value of p in simulations, it is very likely that large counts are absent from the generated sample. Under this circumstance, taking \(k=n-1\) may fail to produce valid estimators. For the convenience of practitioners, we propose a simple method to select k based on the following two steps: (i) let \({\mathbf {S}}=\{s:{\hat{p}}_{s}>0,{\hat{p}}_{s+1}>0,s=2,3,\ldots ,n-1\}\); (ii) then \(k=\mathop {\arg \max }\limits _{s\in {\mathbf {S}}}{\hat{p}}_{s}\).
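The PBE of Eq. (3.1), together with the two-step rule for selecting k, can be sketched as follows (the function name is ours; the moment estimator \({\hat{\rho }}_{MM}\) follows analogously from the sample autocovariance):

```python
import numpy as np
from math import gamma

def pbe(y, n):
    """Probability-based estimates (p, phi2, phi0, phi1) per Eq. (3.1)."""
    arr = np.asarray(y)
    phat = np.array([(arr == k).mean() for k in range(n + 1)])
    # two-step selection of k: candidates s with phat[s] > 0 and phat[s+1] > 0,
    # then take the candidate with the largest phat[s]
    S = [s for s in range(2, n) if phat[s] > 0 and phat[s + 1] > 0]
    k = max(S, key=lambda s: phat[s])
    p = (k + 1) * phat[k + 1] / ((k + 1) * phat[k + 1] + (n - k) * phat[k])
    phi2 = (phat[k] * gamma(k + 1) * gamma(n - k + 1)
            / (p**k * (1 - p)**(n - k) * gamma(n + 1)))
    phi0 = phat[0] - phi2 * (1 - p)**n
    phi1 = phat[1] - phi2 * n * p * (1 - p)**(n - 1)
    return p, phi2, phi0, phi1
```

Feeding the estimator a sample whose empirical frequencies exactly match a ZOIBAR(1) marginal recovers the true parameters.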

3.2 Quasi-maximum likelihood estimation

In order to obtain the maximum likelihood estimates of the ZOIBAR(1) process, we must maximize the log-likelihood function

$$\begin{aligned} \ell (\varvec{\lambda })&=\log {P}(Y_{1}=y_{1},Y_{2}=y_{2},\ldots ,Y_{T}=y_{T})\\&=\log [{P}(Y_{1}=y_{1}){P}(Y_{2}=y_{2}|Y_{1}=y_{1}){P}(Y_{3}=y_{3}|Y_{1}=y_{1},Y_{2}=y_{2})\\&~~~\cdots {P}(Y_{T}=y_{T}|Y_{1}=y_{1},\ldots ,Y_{T-1}=y_{T-1})], \end{aligned}$$

where \(0<p<1\), \(\max \left\{ -\dfrac{1-p}{p},-\dfrac{p}{1-p}\right\}<\rho <1\), \(0\le \phi _0,\phi _1,\phi _2\le 1\) and \(\phi _0+\phi _1+\phi _2=1\). However, it is not straightforward to give the closed-form expression for

$$\begin{aligned} {P}(Y_{t}=y_{t}|Y_{1}=y_{1},\cdots ,Y_{t-1}=y_{t-1}),~t\in \{2,3,\ldots ,T\}. \end{aligned}$$

For this, we follow Maiti et al. (2018) and employ one-step quasi-maximum likelihood approach, which relies on the approximation

$$\begin{aligned} {P}(Y_{t}=y_{t}|Y_{t-1}=y_{t-1})\!\approx \! {P}(Y_{t}=y_{t}|Y_{1}=y_{1},\ldots ,Y_{t-1}=y_{t-1}), t\in \{2,3,\ldots ,T\} \end{aligned}$$

to simplify the computation. Then, the QMLE can be obtained by maximizing the following logarithmic quasi-likelihood function:

$$\begin{aligned} \ell ^{*}(\varvec{\lambda })=\log {P}(Y_{1}=y_{1})+\sum _{t=2}^{T}\log {P}(Y_{t}=y_{t}|Y_{t-1}=y_{t-1}), \end{aligned}$$

rather than the actual logarithmic likelihood function. The transition probabilities \({P}(Y_{t}=y_{t}|Y_{t-1}=y_{t-1}),~t\in \{2,\ldots ,T\}\), are given by Proposition 2. Note that no closed-form expressions for the QMLE are available, so a numerical optimization procedure is required.

3.3 Maximum likelihood estimation

The key to implementing the MLE method is the likelihood function. The ZOIBAR(1) process is a stationary HMM, where the hidden process \(\{X_t\}\) is the BAR(1) model. The BAR(1) model has the stationary binomial marginal distribution \(\mathrm {B}(n,p)\) and its transition probabilities are given by Eq. (1.2). Following Zucchini et al. (2009), we denote the corresponding transition matrix by \(\varvec{\Gamma }=(\gamma _{i+1,j+1})_{i,j=0,\ldots ,n}\) with \(\gamma _{i+1,j+1}={P}(X_{t+1}=j|X_{t}=i)\), where \(\varvec{\Gamma }^{k}\) is obtained from \(\varvec{\Gamma }\) by replacing \(\rho \) by \(\rho ^{k}\). We also denote the stationary marginal distribution of \(\{X_t\}\) by \(\varvec{\delta }=(\delta _{1,k+1})_{k=0,\ldots ,n}\) with \(\delta _{1,k+1}={P}(X_{t}=k)\). Define the diagonal matrices \({\mathbf {P}}(y):=\mathrm {diag}\big ({P}(Y_t=y|X_t=0),\ldots ,{P}(Y_t=y|X_t=n)\big )\in [0,1]^{(n+1)\times (n+1)}\). Using the results and notations from Zucchini et al. (2009), we obtain

$$\begin{aligned} {\varvec{\alpha }_{1}=\varvec{\delta }{\mathbf {P}}(y_{1}), ~\varvec{\alpha }_{t}=\varvec{\alpha }_{t-1}\varvec{\Gamma }{\mathbf {P}}(y_{t}),~\mathrm {for}~t=2,\ldots ,T,} \end{aligned}$$

where \(\varvec{\alpha }_{t}\) is the vector of “forward probabilities” at time t, i.e., \(\alpha _{t,x}={P}(Y_{1}=y_{1},\ldots ,Y_{t}=y_{t},X_{t}=x)\) with \(x\in \{0,1,\ldots ,n\}\). Based on the above formulae, the actual likelihood function can be given by

$$\begin{aligned} {\mathrm {L}(\varvec{\lambda })=\varvec{\alpha }_{T}{\mathbf {1}}^{\top }.} \end{aligned}$$
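The forward recursion above can be implemented with standard scaling to avoid numerical underflow for long series. A sketch (names are ours; a correctness check is that the resulting likelihood sums to one over all length-2 sample paths):

```python
import numpy as np
from math import comb

def zoibar1_loglik(y, n, p, rho, phi0, phi1):
    """Log-likelihood of a ZOIBAR(1) sample via the scaled HMM forward recursion."""
    phi2 = 1 - phi0 - phi1
    states = np.arange(n + 1)
    # stationary distribution delta and one-step transition matrix Gamma
    # of the hidden BAR(1) kernel (Eq. (1.2) with h = 1)
    delta = np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])
    b1 = p * (1 - rho)
    a1 = b1 + rho
    Gamma = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        for j in range(n + 1):
            Gamma[i, j] = sum(comb(i, m) * comb(n - i, j - m)
                              * a1**m * (1 - a1)**(i - m)
                              * b1**(j - m) * (1 - b1)**(n - i - j + m)
                              for m in range(max(0, i + j - n), min(i, j) + 1))
    def P(yt):  # diagonal of P(y): emission probabilities P(Y_t = yt | X_t = x)
        return phi0 * (yt == 0) + phi1 * (yt == 1) + phi2 * (states == yt)
    ll = 0.0
    a = delta * P(y[0])
    for t in range(1, len(y)):
        c = a.sum()
        ll += np.log(c)
        a = (a / c) @ Gamma * P(y[t])  # scaled forward step
    return ll + np.log(a.sum())
```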

3.4 Bayesian estimation

Likelihood inference might be the most commonly used approach for estimating model parameters in count time series analysis. However, this mainstream method still encounters some difficulties in practice: (i) it is highly affected by outliers; (ii) the estimators rely on an appropriate choice of numerical optimization procedure and of initial values for numerically maximizing the (logarithmic) likelihood function; (iii) it sometimes fails to outperform the moment method and the conditional least squares method for short time series. As an extension of the classic likelihood method, Bayesian analysis for time series of counts has attracted a lot of attention and a number of articles have appeared in the literature. In this section, our interest is focused on estimating \(\varvec{\lambda }=(p,\rho ,\phi _{0},\phi _{1})^{\mathrm {\top }}\) under the Bayesian paradigm. As mentioned in Sect. 3.2, the actual likelihood function can be approximated by the quasi-likelihood function, so the Bayesian estimator can be derived from either the quasi-likelihood function or the actual likelihood function. Besides, the priors play an important role and represent pre-experimental assumptions about the possible values of the model parameters. We now assign appropriate priors to the model parameters.

3.4.1 Prior for parameter p

The Beta distribution is commonly used as a prior for autoregressive coefficients. Here, due to the constraint \(p\in (0,1)\) for the ZOIBAR(1) model, we assume that the prior of parameter p is the Beta distribution given by

$$\begin{aligned} f_{1}(p)=\frac{1}{\mathrm {B}(a,b)}p^{a-1}(1-p)^{b-1},~0<p<1, \end{aligned}$$

where \(\mathrm {B}(a,b)=\dfrac{\Gamma (a)\Gamma (b)}{\Gamma (a+b)}\) and \(a,b>0\).

3.4.2 Prior for parameter \(\rho \)

The Kumaraswamy distribution was introduced by Kumaraswamy (1980) for modeling doubly bounded random processes, with a wide variety of applications, especially in hydrology. Since the parameter \(\rho \) of our model is doubly bounded, with \(\rho \in \displaystyle \left( \max {\left\{ -\frac{p}{1-p},-\frac{1-p}{p}\right\} },1\right) \), we assume that the prior of parameter \(\rho \) is the Kumaraswamy distribution given by

$$\begin{aligned} f_{2}(\rho )=\frac{\varphi \delta }{d-c}\bigg (\frac{\rho -c}{d-c}\bigg )^{\varphi -1}\bigg [1-\bigg (\frac{\rho -c}{d-c}\bigg )^{\varphi }\bigg ]^{\delta -1},~c<\rho <d, \end{aligned}$$

where \(c=\displaystyle \max {\left\{ -\frac{p}{1-p},-\frac{1-p}{p}\right\} }\), \(d=1\) and \(\delta ,\varphi >0\).
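For illustration, the scaled Kumaraswamy prior above can be sampled by inverting its distribution function. The following sketch is purely illustrative (the function name and the use of Python's standard library are our own choices), assuming \(d=1\) as above:

```python
import random

def kumaraswamy_prior_sample(p, varphi, delta, rng):
    """Draw rho from the scaled Kumaraswamy prior on (c, 1), where
    c = max{-p/(1-p), -(1-p)/p} depends on the current value of p.
    Uses the inverse CDF x = (1 - (1 - u)^(1/delta))^(1/varphi) of the
    standard Kumaraswamy(varphi, delta) distribution on (0, 1)."""
    c, d = max(-p / (1 - p), -(1 - p) / p), 1.0
    u = rng.random()
    x = (1 - (1 - u) ** (1 / delta)) ** (1 / varphi)
    return c + (d - c) * x
```

For \(p=0.5\) the support is \((-1,1)\), so all draws fall inside that interval.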

3.4.3 Prior for parameters \(\phi _0\) and \(\phi _{1}\)

The Dirichlet distribution is a direct extension of the Beta distribution to the simplex. Since \(\phi _0,\phi _1,\phi _2\) are nonnegative and sum to one, a suitable prior for \((\phi _0,\phi _1)\) is the Dirichlet distribution, which is given below:

$$\begin{aligned} f_{3}(\phi _{0},\phi _{1})&=\frac{\Gamma (\theta _0+\theta _1+\theta _2)}{\Gamma (\theta _0)\Gamma (\theta _1)\Gamma (\theta _2)}\phi _{0}^{\theta _0-1}\phi _{1}^{\theta _1-1}(1-\phi _{0}-\phi _{1})^{\theta _2-1}, \\ \phi _0,\phi _1&\ge 0,\phi _0+\phi _1<1, \end{aligned}$$

where \(\theta _0,\theta _1,\theta _2>0\).

Based on the priors of parameters \(p,\rho ,\phi _0,\phi _1\) given above, the prior \(\varvec{\pi }(\varvec{\lambda })\) of our model can be given as

$$\begin{aligned} \varvec{\pi }(\varvec{\lambda })= & {} \frac{\varphi \delta \Gamma (\theta _0+\theta _1+\theta _2)(\rho -c)^{\varphi -1}\phi _{0}^{\theta _0-1}\phi _{1}^{\theta _1-1}p^{a-1}(1-p)^{b-1}}{\mathrm {B}(a,b)\Gamma (\theta _0)\Gamma (\theta _1)\Gamma (\theta _2)(1-c)^{\varphi }(1-\phi _{0}-\phi _{1})^{1-\theta _2}}\\&\times \bigg [1-\bigg (\frac{\rho -c}{1-c}\bigg )^{\varphi }\bigg ]^{\delta -1}, \end{aligned}$$

where \(0<p<1\), \(c<\rho <1\), \(\phi _0,\phi _1\ge 0,\phi _0+\phi _1<1\), \(a,b,\delta ,\varphi ,\theta _0,\theta _1,\theta _2>0\) and \(c=\max \left\{ -\dfrac{1-p}{p},-\dfrac{p}{1-p}\right\} \). Hence, the corresponding posterior distribution can be written as

$$\begin{aligned} \varvec{\pi }(\varvec{\lambda }|\varvec{Y})=\frac{\varvec{\pi }(\varvec{\lambda })\mathrm {L}(\varvec{Y}|\varvec{\lambda })}{\int _{0}^{1}\int _{c}^{1}\int _{0}^{1}\int _{0}^{1-\phi _1}\varvec{\pi }(\varvec{\lambda })\mathrm {L}(\varvec{Y}|\varvec{\lambda }) d\phi _0 d\phi _1d\rho dp}, \end{aligned}$$
(3.2)

where \(\mathrm {L}(\varvec{Y}|\varvec{\lambda })\) is the quasi-likelihood function or actual likelihood function.

Finally, we consider Bayesian inference under the squared error loss function. It is well known that the Bayes estimator under squared error loss is the mean of the posterior distribution. Therefore, the BE of the parameter p is given by

$$\begin{aligned} {\hat{p}}&=\int _{0}^{1}\int _{c}^{1}\int _{0}^{1}\int _{0}^{1-\phi _1} p\varvec{\pi }(\varvec{\lambda }|\varvec{Y})\mathrm{d}\phi _0 \mathrm{d}\phi _1 \mathrm{d}\rho \mathrm{d}p, \end{aligned}$$

where \(\varvec{\pi }(\varvec{\lambda }|\varvec{Y})\) is the posterior distribution given in Eq. (3.2). The BEs of the parameters \(\rho ,\phi _{0},\phi _{1}\) are obtained analogously.
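Since the posterior-mean integrals above are rarely available in closed form, one may approximate them by Markov chain Monte Carlo in practice. The following random-walk Metropolis sketch is purely illustrative (names, step size and iteration count are hypothetical); the log-prior and log-(quasi-)likelihood are left as user-supplied callables that return \(-\infty \) outside the admissible region:

```python
import math
import random

def posterior_mean_metropolis(log_prior, log_lik, init, n_iter=5000,
                              step=0.05, seed=1):
    """Illustrative random-walk Metropolis sketch approximating the
    posterior mean (the Bayes estimator under squared error loss) of
    lambda = (p, rho, phi0, phi1)."""
    rng = random.Random(seed)
    cur = list(init)
    cur_lp = log_prior(cur) + log_lik(cur)
    draws = []
    for _ in range(n_iter):
        prop = [x + rng.gauss(0, step) for x in cur]
        lp = log_prior(prop) + log_lik(prop)
        diff = lp - cur_lp
        # Accept with probability min(1, exp(diff)).
        if diff >= 0 or rng.random() < math.exp(diff):
            cur, cur_lp = prop, lp
        draws.append(list(cur))
    burn = n_iter // 5                     # discard a short burn-in
    kept = draws[burn:]
    return [sum(d[i] for d in kept) / len(kept) for i in range(len(init))]
```

Averaging the retained draws coordinate-wise yields the approximate BEs of \(p,\rho ,\phi _0,\phi _1\).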

3.5 Simulation study

The goal of the simulation study presented in this section is to examine the performance of the PBE, QMLE, MLE and BE described above and to compare their behavior on clean and contaminated data. The initial value is \(Y_0\sim \mathrm {ZOIB}(n,p,\phi _0,\phi _1)\). We generate data from the ZOIBAR(1) model with sample sizes \(T=50,300,1000\), reflecting small (\(T=50\)), moderate (\(T=300\)) and large (\(T=1000\)) samples. The true parameter values are selected as:

  • Model (A): \((n,p,\rho ,\phi _{0},\phi _{1})=(30,0.5,-0.4,0.2,0.1)\);

  • Model (B): \((n,p,\rho ,\phi _{0},\phi _{1})=(30,0.4,-0.1,0.1,0.2)\);

  • Model (C): \((n,p,\rho ,\phi _{0},\phi _{1})=(50,0.5,0.2,0.1,0.1)\);

  • Model (D): \((n,p,\rho ,\phi _{0},\phi _{1})=(50,0.7,0.1,0.2,0.2)\).

Note that we assume n is known. In the simulations, we use the mean absolute deviation error (MADE) and the standard deviation (SD), based on \(m=1000\) replications for each parameter combination, to evaluate the performance of the proposed estimators. Taking \(\rho \) as an example, these criteria are defined as follows:

$$\begin{aligned} \mathrm {MADE}=\frac{1}{m}\sum _{k=1}^{m}|{\hat{\rho }}_{k}-\rho |,~\mathrm {SD}=\sqrt{\frac{1}{m-1}\sum _{k=1}^{m}({\hat{\rho }}_{k}-{\bar{\rho }})^{2}}, \end{aligned}$$

where \({\hat{\rho }}_{k}\) is the estimate of \(\rho \) in the kth replication and \({\bar{\rho }}=\frac{1}{m}\sum _{k=1}^{m}{\hat{\rho }}_{k}\). The hyper-parameters in the prior distributions are set to \(a=3\), \(b=5\), \(\theta _{0}=2\), \(\theta _{1}=4\), \(\theta _{2}=6\), \(\varphi =1\) and \(\delta =2\). Figure S2 shows the sample paths and marginal distributions of Models (A)–(D). From Figure S2, we can see that the ZOIBAR(1) model not only captures zero-and-one inflation but can also accommodate bimodality. Table S2 lists some statistics of Models (A)–(D), including the mean, variance, binomial dispersion index \(I_d\), zero probability \(p_{0}\) and one probability \(p_{1}\).
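The simulation design above can be sketched as follows, where the latent BAR(1) chain is observed as zero or one with probabilities \(\phi _0,\phi _1\) (our reading of Eq. (2.1)), and the MADE and SD criteria are computed over replicated estimates. The function names are hypothetical and the initialization is simplified to \(X_0\sim \mathrm {B}(n,p)\):

```python
import random

def simulate_zoibar1(T, n, p, rho, phi0, phi1, rng):
    """Sketch of a ZOIBAR(1) path: a latent BAR(1) chain whose value is
    observed as 0 with probability phi0 and as 1 with probability phi1
    at each time step."""
    beta = p * (1 - rho)          # beta = p(1 - rho)
    alpha = beta + rho            # alpha = beta + rho
    x = sum(rng.random() < p for _ in range(n))   # X_0 ~ B(n, p)
    ys = []
    for _ in range(T):
        # Binomial thinnings: alpha o X_{t-1} + beta o (n - X_{t-1}).
        x = (sum(rng.random() < alpha for _ in range(x))
             + sum(rng.random() < beta for _ in range(n - x)))
        u = rng.random()
        ys.append(0 if u < phi0 else (1 if u < phi0 + phi1 else x))
    return ys

def made_and_sd(estimates, true_value):
    """MADE and SD criteria over m replications of one estimator."""
    m = len(estimates)
    made = sum(abs(e - true_value) for e in estimates) / m
    mean = sum(estimates) / m
    sd = (sum((e - mean) ** 2 for e in estimates) / (m - 1)) ** 0.5
    return made, sd
```

With the parameters of Model (A), for instance, every simulated value stays in \(\{0,\ldots ,30\}\).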

3.5.1 Performance without outliers

In this section, we compare the performance of the estimators in two respects using clean data, i.e., data without outliers. First, as can be seen from Table 1, the MADE and SD values of all four estimators decrease as the sample size T increases, which is consistent with expectation. Broadly speaking, the MADEs and SDs of the BE are smaller than those of the QMLE and MLE when the sample size is small (\(T=50\)). As the sample size increases, the BE, QMLE and MLE become competitive. This phenomenon is understandable, since the effect of the priors becomes negligible when the sample size is fairly large. In addition, although the PBE fails to give satisfactory results compared with the QMLE, MLE and BE, it still has a computational advantage: the PBE has closed-form expressions and spares practitioners a time-consuming numerical optimization procedure.

Table 1 The MADEs and SDs of the estimates

Second, the estimates of \((p,\rho ,\phi _0,\phi _1)\) can always be computed by the methods discussed in Sects. 3.1–3.4, but they do not always satisfy the restrictions on \((p,\rho ,\phi _0,\phi _1)\). An estimate \(({\hat{p}},{\hat{\rho }},{\hat{\phi }}_0,{\hat{\phi }}_1)\) is classified as admissible if it satisfies the conditions on \((p,\rho ,\phi _0,\phi _1)\) in the definition of the ZOIBAR(1) process. Thus, we further evaluate the proposed approaches by the percentage of inadmissible estimates of each method for different sample sizes. The corresponding results are reported in Table 2. For the PBE, the percentages of inadmissible estimates are strongly affected by the parameter combination and sample size. Intuitively, the percentages of inadmissible estimates for the PBE increase as the parameters approach the boundary (see the cases \(\rho =0.1\) and \(\phi _0=\phi _1=0.1\)). In contrast, the QMLE, MLE and BE almost always produce admissible estimates for all parameter combinations, even when the sample size is fairly small and the parameters are close to the boundary. To illustrate this, we also considered an extremely small sample size and found that these three methods, especially the Bayesian method, still produce admissible estimates even for \(T=30\). Based on these results, we conclude that the BE is the most reliable method in this respect.

Table 2 Percentages of admissible estimates

3.5.2 Performance with outliers

In this section, we compare the performance of the estimators on contaminated data. We consider additive outlier-generating mechanisms with positive and negative outliers, given in detail as follows:

$$\begin{aligned} Z_t=\left\{ \begin{array}{ll} \min \{Y_t+\zeta ,n\}, &{}\quad t=\tau _1,\tau _2,\ldots ,\tau _k,\\ Y_t,&{}\quad \mathrm {otherwise}, \end{array} \right. \end{aligned}$$
(3.3)

and

$$\begin{aligned} Z_t=\left\{ \begin{array}{ll} \max \{Y_t-\zeta ,0\}, &{}\quad t=\tau _1,\tau _2,\ldots ,\tau _k,\\ Y_t,&{}\quad \mathrm {otherwise}, \end{array} \right. \end{aligned}$$
(3.4)

where \(Z_t\) is the contaminated observation at the specified times \(\tau _i\), \(i=1,\ldots ,k\), and \(\zeta \) is the outlier size.

In this article, we consider both consecutive and isolated outliers and set the outlier size and number to \(\zeta =2\) and \(k=3,5\), respectively. For consecutive outliers, we set \(\tau _1=T/2+1\); for example, when the sample size is \(T=50\) and the number of outliers is \(k=3\), three consecutive outliers occur at times \(t=26,27,28\). For isolated outliers, in contrast, the outliers are added at randomly chosen positions of the observed data.
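The contamination mechanisms of Eqs. (3.3)–(3.4) and the placement rules just described can be sketched as follows (function names are hypothetical; positions are 0-based indices):

```python
import random

def contaminate(y, n, zeta, positions, positive=True):
    """Additive-outlier mechanism of Eqs. (3.3)-(3.4): shift the series
    by +/- zeta at the given positions, clipped to the support {0,...,n}."""
    z = list(y)
    for t in positions:
        z[t] = min(z[t] + zeta, n) if positive else max(z[t] - zeta, 0)
    return z

def outlier_positions(T, k, consecutive, rng=random.Random(0)):
    """Consecutive outliers start at time tau_1 = T/2 + 1 (0-based index
    T//2); isolated outliers are placed at random positions."""
    if consecutive:
        return list(range(T // 2, T // 2 + k))
    return rng.sample(range(T), k)
```

For \(T=50\) and \(k=3\), the consecutive positions correspond to times \(t=26,27,28\), as in the example above.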

Tables S3–S6 report the MADEs and SDs of the estimators for sample size \(T=50\) and contaminated data generated from Eqs. (3.3)–(3.4), respectively. Comparing Tables S3–S4 with Tables S5–S6, consecutive and isolated outliers, as well as positive and negative outliers, have similar effects on the estimators. As expected, the performance of the estimators worsens as the number of outliers increases. Additive outliers are known to be particularly harmful to the estimation of dependence parameters, which is confirmed here by the significantly reduced accuracy of the QMLE, MLE and BE for the parameter \(\rho \). Comparing the four estimation methods, the BE still yields the smallest MADE and SD values in most situations.

3.5.3 Discussion

We have compared the proposed estimators from several aspects. On clean data, the BE yields the smallest MADE and SD values when the sample size is small, and it also gives the highest percentages of admissible estimates in most cases. On contaminated data, the BE is more robust than the other methods. In summary, we recommend the Bayesian approach for our model.

It must be pointed out that the quasi-likelihood function, rather than the actual likelihood function, is used in the Bayesian estimation. From Tables 1 and S3–S6, the MLE and QMLE always perform similarly. Specifically, the quasi-maximum likelihood and maximum likelihood methods yield the same estimates for the parameters p, \(\phi _{0}\) and \(\phi _{1}\). For the parameter \(\rho \), the MADE and SD values of the MLE are slightly smaller than those of the QMLE when the sample size is small. Hence, the improvement of the MLE over the QMLE is extremely limited, while the MLE requires considerably more computation. To illustrate this point, Table S7 reports the running times, MADEs and SDs of the MLE and QMLE in an additional simulation study. From Table S7, the MLE needs much more running time than the QMLE, especially for large values of n (\(n=50\) in Models (C) and (D)). Based on these discussions, employing the quasi-likelihood function in the Bayesian estimation is more reasonable than employing the actual likelihood function. It is also worth noting that the time cost of the MLE is not sensitive to the sample size; however, comparing Models (A) and (B) with Models (C) and (D), increasing the value of n greatly increases the computational burden and running time.

As suggested by a referee, the scenario \(\phi _0=\phi _1=0\) is also considered. We pointed out in Remark 2 that the ZOIBAR(1) model reduces to the BAR(1) model when \(\phi _0=\phi _1=0\); moreover, \(\phi _0=\phi _1=0\) lies on the boundary of the parameter space of \(\varvec{\lambda }=(p,\rho ,\phi _{0},\phi _{1})^{\mathrm {\top }}\). To investigate the performance of the proposed methods in this case, an additional simulation study is given in Table S8. The parameter combinations are selected as:

  • Model (E1): \((n,p,\rho ,\phi _{0},\phi _{1})=(10,0.5,-0.4,0,0)\);

  • Model (E2): \((n,p,\rho ,\phi _{0},\phi _{1})=(10,0.7,0.1,0,0)\).

From Table S8, the simulation results of the QMLE, MLE and BE remain satisfactory. This implies that these three methods are still effective even for degenerate forms of the ZOIBAR(1) model.

4 Testing for one inflation in BAR(1) model

Weiß and Pollett (2014) developed a test for detecting overdispersion based on the sample binomial dispersion index (without bias correction). Kim et al. (2018) improved and generalized their results by deriving a novel test for overdispersion and by proposing a zero-inflation index, which was employed to test for zero inflation. Moreover, those authors introduced a joint test for zero inflation/deflation and overdispersion/underdispersion with respect to a BAR(1) process. In this article, our goal is twofold:

  (i) The first aim is to test the null hypothesis of a BAR(1) process against the alternative that the data exhibit more ones than implied by a BAR(1) model. To achieve this, a binomial one-inflation index is constructed and used to develop a test for one inflation with respect to a BAR(1) model.

  (ii) The second aim is to generalize the results of Kim et al. (2018) by putting forward two new joint tests for zero inflation/deflation, one inflation/deflation and overdispersion/underdispersion with respect to a BAR(1) process.

4.1 Binomial one-inflation index

As discussed in Sect. 2, it is, in our opinion, unreasonable to diagnose one inflation or one deflation by relying entirely on the proportion of ones, since other important statistical properties, in particular the mean of the data, should also be taken into account (a similar discussion was given in Sect. 2). Hence, we propose a (sample) binomial one-inflation index to measure the departure from the binomial model. Let X be an integer-valued random variable with finite range \(\{0,1,\ldots ,n\}\); then the binomial one-inflation index \(z_1\) and the sample binomial one-inflation index \({\hat{z}}_1\) are defined as follows:

$$\begin{aligned} z_{1}=p_{1}\mu ^{-1}\bigg (1-\frac{\mu }{n}\bigg )^{1-n}\in (0,\infty ),~~~{\hat{z}}_{1}={\hat{p}}_{1}{\bar{X}}^{-1}\bigg (1-\frac{{\bar{X}}}{n}\bigg )^{1-n}, \end{aligned}$$

where \(p_{1}\) and \(\mu \) are the one probability and the mean of the random variable X, \({\hat{p}}_{1}=\frac{1}{T}\sum _{t=1}^{T}{\mathbb {I}}_{\{X_t=1\}}\) and \({\bar{X}}=\frac{1}{T}\sum _{t=1}^{T}X_t\). If X is binomially distributed, then \(z_{1}=1\) and \({\hat{z}}_{1}\) should be close to 1. If \(z_{1}>1\), X is said to exhibit one inflation with respect to the binomial distribution. The binomial one-inflation index of the ZOIBAR(1) process is given by

$$\begin{aligned} z_{1}=\frac{\phi _{1}+\phi _{2}np(1-p)^{n-1}}{\phi _1+\phi _2np}\bigg (1-\frac{\phi _1+\phi _2np}{n}\bigg )^{1-n}. \end{aligned}$$

From the above equation, we find that \(z_1\) increases as the parameter \(\phi _1\) increases when the parameters \(\phi _2\), n and p are fixed.
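A minimal sketch of computing the sample binomial one-inflation index \({\hat{z}}_{1}\) from data (the function name is our own choice):

```python
def sample_one_inflation_index(x, n):
    """Sample binomial one-inflation index z1_hat: the observed frequency
    of ones relative to the value implied by a binomial B(n, X_bar/n)
    marginal, i.e. z1_hat = p1_hat / X_bar * (1 - X_bar/n)^(1 - n)."""
    T = len(x)
    xbar = sum(x) / T
    p1_hat = sum(v == 1 for v in x) / T
    return p1_hat / xbar * (1 - xbar / n) ** (1 - n)
```

As a sanity check, a sample whose one frequency matches the binomial implication (e.g. the empirical distribution of \(\mathrm {B}(2,0.5)\)) yields an index of exactly 1.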

4.2 Asymptotic distribution of binomial indices

Our objective is to investigate the limiting behavior of the trivariate statistic composed of the sample binomial zero-inflation index, the sample binomial dispersion index and the sample binomial one-inflation index when the data-generating process (DGP) is a BAR(1) process. For this purpose, we first consider the vector-valued process \(\{\varvec{Z}_t\}\) defined by

$$\begin{aligned} \varvec{Z}_t:=\begin{pmatrix} {\mathbb {I}}_{\{X_t=0\}}-(1-p)^{n} \\ X_{t}-np \\ X_{t}^{2}-np(np+1-p) \\ {\mathbb {I}}_{\{X_t=1\}}-np(1-p)^{n-1} \end{pmatrix}~\mathrm {with}~{E}(\varvec{Z}_t)={\mathbf {0}}, \end{aligned}$$
(4.1)

and we derive the asymptotic distribution of \(\dfrac{1}{\sqrt{T}}\sum _{t=1}^{T}\varvec{Z}_t\) in the following theorem.

Theorem 1

Let \(\{X_t\}\) be a BAR(1) process with \(n\ge 2\), let \(p_0=(1-p)^{n}\) and \(p_1=np(1-p)^{n-1}\), and define \(\{\varvec{Z}_t\}\) as in Eq. (4.1). Then,

$$\begin{aligned} \frac{1}{\sqrt{T}}\sum _{t=1}^{T}\varvec{Z}_t \mathop {\longrightarrow }\limits ^{d}\mathrm {N}({\mathbf {0}},\varvec{\Sigma }), \end{aligned}$$

where the covariance matrix \(\varvec{\Sigma }\) has entries \(\sigma _{ij}(=\sigma _{ji})\) with \(i,j=1,2,3,4\), which will be given in the proof.

The next theorem establishes the joint asymptotic normality of the indices by using Theorem 1.

Theorem 2

Let \(\{X_t\}\) be a BAR(1) process with \(n\ge 2\). Then, we have

$$\begin{aligned} \sqrt{T}\begin{pmatrix} {\hat{z}}_{0}-1 \\ {\hat{I}}_{d}-1 \\ {\hat{z}}_{1}-1 \end{pmatrix} \mathop {\longrightarrow }\limits ^{d}\mathrm {N}({\mathbf {0}},\varvec{\Sigma }^{'}), \end{aligned}$$

where the covariance matrix \(\varvec{\Sigma }^{'}\) has entries \(\sigma _{ij}^{'}(=\sigma _{ji}^{'})\) with \(i,j=1,2,3\), which will be given in the proof.

The limiting behavior of \({\hat{z}}_{0},{\hat{I}}_{d},{\hat{z}}_{1}\) shows that they are asymptotically unbiased estimators of \(z_{0},I_{d},z_{1}\), respectively. However, when computed from a time series of finite length T, these estimators are noticeably biased. Therefore, it is necessary and meaningful to derive bias-corrected approximations to the means of \({\hat{z}}_{0},{\hat{I}}_{d},{\hat{z}}_{1}\), as in the following theorem.

Theorem 3

Let \(\{X_t\}\) be a BAR(1) process with \(n\ge 2\). Then, the means of \({\hat{z}}_{0},{\hat{I}}_{d},{\hat{z}}_{1}\) are asymptotically given by

$$\begin{aligned}&{E}({\hat{z}}_{0})\approx 1-\frac{(n-1)p(1+\rho )}{2T(1-p)(1-\rho )}, {E}({\hat{I}}_{d})\approx 1-\frac{(n-1)(1+\rho )}{Tn(1-\rho )},\\&\quad {E}({\hat{z}}_{1})\approx 1-\frac{(n-1)(np-2)(1+\rho )}{2Tn(1-p)(1-\rho )}. \end{aligned}$$

Theorem 3 can be applied to define (approximately) bias-corrected indices \({\hat{z}}_{0;\mathrm {corr}}\), \({\hat{I}}_{d;\mathrm {corr}}\) and \({\hat{z}}_{1;\mathrm {corr}}\) as

$$\begin{aligned}&{\hat{z}}_{0;\mathrm {corr}}={\hat{z}}_{0}+\frac{(n-1){\hat{p}}(1+{\hat{\rho }})}{2T(1-{\hat{p}})(1-{\hat{\rho }})},&{\hat{I}}_{d;\mathrm {corr}}={\hat{I}}_{d}+\frac{(n-1)(1+{\hat{\rho }})}{Tn(1-{\hat{\rho }})},\\&{\hat{z}}_{1;\mathrm {corr}}={\hat{z}}_{1}+\frac{(n-1)(n{\hat{p}}-2)(1+{\hat{\rho }})}{2Tn(1-{\hat{p}})(1-{\hat{\rho }})}, \end{aligned}$$

where plug-in estimates of p and \(\rho \) are recommended, i.e.,

$$\begin{aligned} {\hat{p}}=\frac{{\bar{X}}}{n},~{\hat{\rho }}=\frac{\sum _{t=1}^{T-1}(X_{t}-{\bar{X}})(X_{t+1}-{\bar{X}})}{\sum _{t=1}^{T}(X_{t}-{\bar{X}})^{2}},~\mathrm {with}~{\bar{X}}=\frac{\sum _{t=1}^{T}X_{t}}{T}. \end{aligned}$$
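The plug-in estimates and the bias corrections of Theorem 3 can be sketched as follows (function names are hypothetical):

```python
def plug_in_p_rho(x, n):
    """Plug-in estimates p_hat = X_bar / n and the lag-1 sample
    autocorrelation rho_hat, as recommended below Theorem 3."""
    T = len(x)
    xbar = sum(x) / T
    num = sum((x[t] - xbar) * (x[t + 1] - xbar) for t in range(T - 1))
    den = sum((v - xbar) ** 2 for v in x)
    return xbar / n, num / den

def corrected_indices(z0, i_d, z1, n, T, p, rho):
    """Bias corrections of Theorem 3 applied to the three sample indices."""
    z0c = z0 + (n - 1) * p * (1 + rho) / (2 * T * (1 - p) * (1 - rho))
    idc = i_d + (n - 1) * (1 + rho) / (T * n * (1 - rho))
    z1c = z1 + (n - 1) * (n * p - 2) * (1 + rho) / (
        2 * T * n * (1 - p) * (1 - rho))
    return z0c, idc, z1c
```

For example, with \(n=10\), \(T=100\), \(p=0.5\) and \(\rho =0\), the three corrections added to unit indices are \(0.045\), \(0.009\) and \(0.027\), respectively.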

4.3 Tests based on binomial indices

As discussed before, our first task is to test the null hypothesis of a BAR(1) process against the alternative that the data exhibit more ones than implied by a BAR(1) model. In other words, we are interested in testing \(H_0\): \(x_1,\ldots ,x_T\) stem from a BAR(1) process with \(p_1=\mu (1-\mu /n)^{n-1}\) against the alternative \(H_1\): \(x_1,\ldots ,x_T\) exhibit \(p_1\ne \mu (1-\mu /n)^{n-1}\). We take advantage of the sample one-inflation index and its bias-corrected version to propose the following two test statistics:

$$\begin{aligned} \mathrm {U}_1=\frac{{\hat{z}}_{1}-1}{\sqrt{\sigma _{33}^{'}/T}},~\mathrm {U}_{1;\mathrm {corr}}=\frac{{\hat{z}}_{1;\mathrm {corr}}-1}{\sqrt{\sigma _{33}^{'}/T}}. \end{aligned}$$

Let \(q_{\gamma }\) be the \(\gamma \)-quantile of the standard normal distribution, i.e., \(\Phi (q_{\gamma })=\gamma \), where \(\gamma \in (0,1)\) and \(\Phi (\cdot )\) is the distribution function of the standard normal distribution. We reject \(H_0\) at significance level \(\gamma \) if the test statistics \(\mathrm {U}_{1}\) and \(\mathrm {U}_{1;\mathrm {corr}}\) fall outside the two-sided critical region, i.e., \( \mathrm {U}_1,\mathrm {U}_{1;\mathrm {corr}} \notin (q_{\frac{\gamma }{2}},q_{1-\frac{\gamma }{2}}). \) Alternatively, we can check whether the p values \( 2[1-\Phi (|\mathrm {U}_1|)]~\mathrm {and}~2[1-\Phi (|\mathrm {U}_{1;\mathrm {corr}}|)] \) fall below the significance level \(\gamma \). If a hypothetical value for \(\sigma _{33}^{'}\) is not available, we recommend a plug-in approach, i.e., replacing p and \(\rho \) by their moment estimates in the formula for \(\sigma _{33}^{'}\) given by Theorem 2.
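The test based on \(\mathrm {U}_1\) can be sketched as follows, using the error function for the standard normal distribution function; the same routine applies to \(\mathrm {U}_{1;\mathrm {corr}}\) with the corrected index (names and interface are our own choices):

```python
import math

def one_inflation_test(z1_hat, sigma33, T, gamma=0.05):
    """Two-sided test based on U1 = (z1_hat - 1) / sqrt(sigma33' / T).
    Returns the statistic, its normal p-value 2(1 - Phi(|U1|)) and the
    rejection decision at level gamma."""
    u1 = (z1_hat - 1) / math.sqrt(sigma33 / T)
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x/sqrt 2))/2.
    pval = 2 * (1 - 0.5 * (1 + math.erf(abs(u1) / math.sqrt(2))))
    return u1, pval, pval < gamma
```

For instance, an index exactly equal to 1 gives \(\mathrm {U}_1=0\) and p value 1, while a strongly inflated index is rejected.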

We now turn to the second task: testing \(H_0\): \(x_1,\ldots ,x_T\) stem from a BAR(1) process with \(n\sigma ^2=\mu (n-\mu )\) and/or \(p_0=(1-\mu /n)^{n}\) and/or \(p_1=\mu (1-\mu /n)^{n-1}\) against the alternative \(H_1\): \(x_1,\ldots ,x_T\) with \(n\sigma ^2\ne \mu (n-\mu )\) and/or \(p_0\ne (1-\mu /n)^{n}\) and/or \(p_1\ne \mu (1-\mu /n)^{n-1}\). Under the null hypothesis, \( \mathrm {U}_{\mathrm {joint}}:=T (\hat{\varvec{I}}_{\mathrm {joint}}-\varvec{I}_{0})^{\top } (\hat{\varvec{\Sigma }}^{'})^{-1} (\hat{\varvec{I}}_{\mathrm {joint}}-\varvec{I}_{0})\) and \( \mathrm {U}_{\mathrm {joint;corr}}:=T (\hat{\varvec{I}}_{\mathrm {joint;corr}}-\varvec{I}_{0})^{\top } (\hat{\varvec{\Sigma }}^{'})^{-1} (\hat{\varvec{I}}_{\mathrm {joint;corr}}-\varvec{I}_{0}) \) asymptotically follow a \(\chi ^{2}\)-distribution with 3 degrees of freedom, with

$$\begin{aligned} \hat{\varvec{I}}_{\mathrm {joint}}=({\hat{z}}_{0},{\hat{I}}_{d},{\hat{z}}_{1})^{\top },~ \hat{\varvec{I}}_{\mathrm {joint;corr}}=({\hat{z}}_{\mathrm {0;corr}},{\hat{I}}_{d;\mathrm {corr}},{\hat{z}}_{\mathrm {1;corr}})^{\top },~ \varvec{I}_{0}=(1,1,1)^{\top }, \end{aligned}$$

and \(\hat{\varvec{\Sigma }}^{'}\) being the appropriate (plug-in) covariance matrix from Theorem 2. Based on the two test statistics, we reject \(H_0\) at significance level \(\gamma \) if \(\mathrm {U}_{\mathrm {joint}}\ge \chi _{3,1-\gamma }^{2}\) and \(\mathrm {U}_{\mathrm {joint;corr}}\ge \chi _{3,1-\gamma }^{2}\), respectively, where \(\chi _{3,1-\gamma }^2\) denotes the \((1-\gamma )\)-quantile of the \(\chi _{3}^2\)-distribution.
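The joint test can be sketched as follows; the default critical value 7.815 is the 0.95-quantile of the \(\chi _{3}^2\)-distribution, and a small Gaussian-elimination solver stands in for a linear-algebra library (all names are hypothetical):

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial
    pivoting -- enough to avoid a numerical library in this sketch."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def joint_index_test(z0, i_d, z1, sigma, T, crit=7.815):
    """U_joint = T (I_hat - I_0)' Sigma'^{-1} (I_hat - I_0), compared with
    the (1 - gamma)-quantile of chi-square(3) (7.815 for gamma = 0.05)."""
    d = [z0 - 1, i_d - 1, z1 - 1]
    x = solve3(sigma, d)            # x = Sigma'^{-1} d
    u = T * sum(di * xi for di, xi in zip(d, x))
    return u, u >= crit
```

With an identity covariance matrix, the statistic reduces to \(T\) times the squared Euclidean distance of the index vector from \((1,1,1)^{\top }\).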

4.4 Simulation study

In this section, simulations are conducted to investigate the performance of the four tests. We select significance levels \(\gamma =0.05,0.1\) and sample sizes \(T=50,100,300,1000\); each experiment is based on 10,000 replications.

4.4.1 Size study

For analyzing the empirical size, the following parameter combinations are considered: (E1) \((n,p,\rho )=(10,0.15,0.25)\), (E2) \((n,p,\rho )=(10,0.15,0.5)\), (E3) \((n,p,\rho )=(10,0.15,0.75)\). As can be seen from Table 3, all test statistics perform satisfactorily, and the empirical sizes for Models (E1)–(E3) get closer to the significance levels \(\gamma =0.05,0.1\) as the sample size increases. Compared with the test statistics \(\mathrm {U}_{1}\) and \(\mathrm {U}_{\mathrm {joint}}\), the bias-corrected statistics \(\mathrm {U}_{1;\mathrm {corr}}\) and \(\mathrm {U}_{\mathrm {joint;corr}}\) perform better, especially for the small sample size \(T=50\).

Table 3 Empirical sizes for test statistics
Table 4 Empirical powers for test statistics

4.4.2 Power study

We consider the ZOIBAR(1) model with the following parameter combinations to further investigate the power of the four test statistics under the alternative hypothesis:

  • (F1) \((n,p,\rho ,\phi _0,\phi _1)=(10,0.15,0.25,0,0.1)\),

  • (F2) \((n,p,\rho ,\phi _0,\phi _1)=(10,0.15,0.25,0,0.2)\),

  • (F3) \((n,p,\rho ,\phi _0,\phi _1)=(10,0.15,0.25,0,0.3)\);

  • (G1) \((n,p,\rho ,\phi _0,\phi _1)=(10,0.15,0.5,0,0.1)\),

  • (G2) \((n,p,\rho ,\phi _0,\phi _1)=(10,0.15,0.5,0,0.2)\),

  • (G3) \((n,p,\rho ,\phi _0,\phi _1)=(10,0.15,0.5,0,0.3)\);

  • (H1) \((n,p,\rho ,\phi _0,\phi _1)=(10,0.15,0.75,0,0.1)\),

  • (H2) \((n,p,\rho ,\phi _0,\phi _1)=(10,0.15,0.75,0,0.2)\),

  • (H3) \((n,p,\rho ,\phi _0,\phi _1)=(10,0.15,0.75,0,0.3)\).

The rejection rates of the tests are reported in Table 4. The rejection rates increase with the sample size and with \(\phi _1\). In each case, the rejection rates of \(\mathrm {U}_{1}\) and \(\mathrm {U}_{\mathrm {1;corr}}\) are larger than those of \(\mathrm {U}_{\mathrm {joint}}\) and \(\mathrm {U}_{\mathrm {joint;corr}}\). The reason is that the DGP is a ZOIBAR(1) model with \(\phi _0=0\) and varying values of \(\rho \) and \(\phi _1\), so the only distinction between the DGP and the corresponding BAR(1) model is the proportion of ones. \(\mathrm {U}_{1}\) and \(\mathrm {U}_{\mathrm {1;corr}}\) are constructed from the binomial one-inflation index and thus achieve higher rejection rates, whereas \(\mathrm {U}_{\mathrm {joint}}\) and \(\mathrm {U}_{\mathrm {joint;corr}}\) also take the information in \(I_d\) and \(z_0\) into account; since these two indices are close for the DGP and the corresponding BAR(1) model, their power is reduced. These findings are corroborated by a further power analysis given by the power graphs in Figs. 1 and 2. One abnormal occurrence appears in Figs. 1 and 2: when \(\phi _0=0.2,0.3\), the powers of \(\mathrm {U}_{\mathrm {1;corr}}\) and \(\mathrm {U}_{\mathrm {joint;corr}}\) decrease as \(\phi _1\) increases over \(0<\phi _1\le 0.2\). A possible explanation is that \(\phi _0=0.2,0.3\) combined with \(0<\phi _1\le 0.2\) may affect the mean of the model, so that \({\hat{z}}_0\), \({\hat{I}}_d\) and \({\hat{z}}_1\) cannot fully reflect the deviation between the DGP and the corresponding BAR(1) model.

Fig. 1 Power analysis for test statistics with significance level \(\gamma =0.1\)

Fig. 2 Power analysis for test statistics with significance level \(\gamma =0.1\)

5 Real data examples

In this section, three applications are conducted to illustrate the performance of the ZOIBAR(1) process in explaining zero-inflation, one-inflation and overdispersion phenomena. We compare our process with four extended BAR(1) models proposed by Möller et al. (2018) to handle zero inflation:

  • BAR(1) model with zeros at random (RZ-BAR(1) model);

  • BAR(1) model with innovational zeros (IZ-BAR(1) model);

  • zero-inflated binomial thinning AR(1) model (ZIB-AR(1) model);

  • BAR(1) model with zero threshold (ZT-BAR(1) model).

Moreover, we take the classical BAR(1) process as a benchmark due to its usefulness and popularity in real-world applications. Although we pointed out that the BE is the most appropriate method for our model, for fairness the (quasi) conditional maximum likelihood approach is employed for all models. We also compute the following statistics of the fitted models: Akaike information criterion (AIC), Bayesian information criterion (BIC), binomial dispersion index \(I_d\), binomial zero-inflation index \(z_0\) and binomial one-inflation index \(z_1\).

5.1 Rainy-days counts in Germany

In this section, we consider the number of rainy-days per week at Hamburg-Neuwiedenthal in Germany. The data were collected from January 1st, 2005 until December 31st, 2010 by the German Weather Service (DWD = "Deutscher WetterDienst," http://www.dwd.de/), where weeks are defined from Saturday to Friday. The length of the data is 313 and the fixed upper limit is \(n=7\). This data set was also investigated by Chen et al. (2020).

The sample path, histogram, ACF, PACF and summary statistics of the observations are given in Figure S3 and Table 5. The binomial dispersion index \(I_d\) of the data is 1.7084, which indicates that the data set is overdispersed. The zero and one frequencies are 0.0479 and 0.0831, respectively, which are close to zero. However, as discussed before, relying entirely on the proportions of zeros or ones may be misleading when diagnosing zero or one inflation; the corresponding indices are more reasonable assessment criteria. The sample binomial zero-inflation and one-inflation indices are 12.4623 and 2.5437, respectively, which confirm that this data set is zero- and one-inflated.

Table 5 Descriptive statistics for rainy-days counts in Germany

As can be seen from Table 6, the ZT-BAR(1) and BAR(1) models are not suitable for this data set, since they give the largest AIC and BIC values among the alternative models. Furthermore, the ZT-BAR(1) and BAR(1) processes are the only two models that fail to reflect the overdispersion of the counts. Although the IZ-BAR(1) model yields smaller AIC and BIC values and captures the overdispersion of the data, it cannot provide information about the zero and one patterns, since a closed-form expression for the stationary marginal distribution of the IZ-BAR(1) process is difficult to derive. As a special case of our model, the RZ-BAR(1) model can accurately explain zero inflation, but it gives the wrong information that the data set is one-deflated. Based on AIC and BIC, the ZIB-AR(1) and ZOIBAR(1) processes are the most appropriate models for the data. However, the ZIB-AR(1) model also suffers from the difficulty that information about the zero and one patterns is not easily obtained. In contrast, the ZOIBAR(1) process accurately captures the zero-inflation, one-inflation and overdispersion characteristics and gives the smallest AIC and BIC values. Hence, we recommend the ZOIBAR(1) process for this data set.

Table 6 Estimates of the parameters and statistics for rainy-days counts in Germany
Table 7 Properties of the standardized Pearson residuals for rainy-days counts in Germany

To further compare the above models, we consider their corresponding Pearson residual analysis. The standardized Pearson residual is defined as

$$\begin{aligned} e_t=\frac{Y_t-{E}(Y_{t}|Y_{t-1})}{\sqrt{\mathrm {Var}(Y_t|Y_{t-1})}},~t=2,\ldots ,T. \end{aligned}$$

If the model is correctly specified, the residuals should have zero mean, unit variance, and no significant serial correlation in \(e_t\) or \(e_t^2\). Table 7 shows the mean, variance, ACF(1) of \(e_t\) (\({\hat{\rho }}_{e_{t}}\)) and ACF(1) of \(e_t^2\) (\({\hat{\rho }}_{e_{t}^{2}}\)). All models perform satisfactorily with respect to the mean of the residuals. Regarding the variance of the residuals, although all models give variance values larger than 1, the ZOIBAR(1) process gives a significantly better result than the others. We therefore conclude that the Pearson residual analysis supports the conclusion that the ZOIBAR(1) process is the most suitable model for this data set. One may ask whether there is correlation in the residuals, since \({\hat{\rho }}_{e_t}=0.1294\) is not close to 0. The Ljung–Box Q-test for the residuals and squared residuals yields p values of 0.3993 and 0.4077, based on 15 lags at the 5% level, suggesting no significant autocorrelation in the residuals or squared residuals.

As pointed out by a referee, it is important to discuss the actual meaning of the parameter estimates of the ZOIBAR(1) model. Here, on the one hand, the hidden process \(\{X_{t}\}\) in Eq. (2.1) can be viewed as the number of rainy-days per week in the absence of special circumstances. \({\hat{p}}=0.6054\) indicates that the probability of rain on a given day is 0.6054, i.e., the area is rainy. \({\hat{\rho }}=0.0977\) indicates that the correlation between the numbers of rainy-days in two consecutive weeks is weak, suggesting that the local weather is changeable and difficult to predict. On the other hand, the observed process \(\{Y_{t}\}\) in Eq. (2.1) is the actual number of rainy-days per week. The difference between \(X_{t}\) and \(Y_{t}\) is that \(X_{t}\) is masked by a zero or a one with probability \({\hat{\phi }}_{0}+{\hat{\phi }}_{1}=0.1139\) due to unusual changes in the climate, which can lead to less rainfall. \({\hat{\phi }}_{0}=0.0463\) and \({\hat{\phi }}_{1}=0.0676\) represent the probabilities of observing zero or one rainy-days per week because of such unusual climate changes.

5.2 Rainy-days counts in Sweden

The second data set is the number of rainy-days per week at Stockholm in Sweden. The data were obtained from the ECA&D website, http://www.ecad.eu, and were collected from January 1st, 2000 to March 21st, 2008 by the German Weather Service (DWD = "Deutscher WetterDienst", http://www.dwd.de/), where weeks are defined from Saturday to Friday. The length of the series is 480 and the fixed upper limit is \(n=7\). The sample path, histogram, ACF, PACF and summary statistics of the observations are given in Figure S4 and Table S9. The binomial dispersion index \(I_d\) of the data is 1.8746, which indicates that the data set is overdispersed. The sample binomial zero index and one index are 3.3525 and 1.7472, respectively, which suggests that this data set is zero-and-one inflated.
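
The overdispersion diagnosis can be sketched in code. One common form of the binomial dispersion index is \(I_d=n{\hat{\sigma }}^{2}/({\hat{\mu }}(n-{\hat{\mu }}))\), with \(I_d>1\) indicating extra-binomial variation; whether this exact form (and the population rather than the sample variance) matches the paper's definition is an assumption, as the definition is not reproduced in this section.

```python
# Binomial dispersion index, one common form:
#   I_d = n * var / (mean * (n - mean)),
# where the population variance is used (an assumption about the paper's
# exact definition). I_d > 1 indicates overdispersion relative to a
# binomial distribution with the same mean.

def binomial_dispersion_index(y, n):
    T = len(y)
    mean = sum(y) / T
    var = sum((v - mean) ** 2 for v in y) / T
    return n * var / (mean * (n - mean))
```

For the Stockholm series this quantity evaluates to 1.8746, confirming overdispersion.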

As can be seen from Table S10, the BAR(1) and ZR-BAR(1) processes are not suitable for the data since they give larger AIC and BIC values than the other models. Moreover, the ZR-BAR(1) process overestimates the degree of zero inflation and gives the wrong information that the data set is one-deflated. The IZ-BAR(1), ZIB-AR(1) and ZT-BAR(1) models perform similarly in terms of AIC and BIC. The IZ-BAR(1) and ZIB-AR(1) models can also capture the overdispersion, but their marginal distributions are not available in closed form, which constrains their use in explaining zero inflation and one inflation. Similar to the first data set, the ZOIBAR(1) process accurately captures the zero-inflation, one-inflation and overdispersion characteristics and gives the smallest AIC and BIC values. The Pearson residual analysis also indicates that the ZOIBAR(1) process is the most appropriate model for this data set (see Table S11).

The actual meaning of the parameter estimates for the ZOIBAR(1) model can be discussed in a similar way as in Sect. 5.1. Again, the hidden process \(\{X_{t}\}\) is the number of rainy-days per week without special circumstances. \({\hat{p}}=0.5551\) shows that the probability of rain on each day is 0.5551, which means that this area is also rainy. \({\hat{\rho }}=0.2493\) indicates that the correlation between the numbers of rainy-days in two consecutive weeks is significant, which implies that the local weather is easier to predict. We observe the real series \(\{Y_{t}\}\) of the number of rainy-days per week by masking \(\{X_{t}\}\) with a zero or a one, with masking probability \({\hat{\phi }}_{0}+{\hat{\phi }}_{1}=0.1873\), which reveals that unusual climate changes are more commonly encountered in Sweden than in Germany.

5.3 Assaults-on-officers counts

The third data set is taken from the file PghCarBeat.csv, which was downloaded from the Forecasting Principles site (http://www.forecastingprinciples.com). The data cover 42 different car beats from January 1990 to June 2001. For each month t, the value \(y_t\) counts the number of car beats that reported at least one case of assaults-on-officers. Hence, the data set has a finite range with fixed upper limit \(n=42\), and the series contains 138 observations. The sample path, histogram, ACF, PACF and summary statistics of the observations are given in Figure S5 and Table S12. The binomial dispersion index \(I_d\) of the data is 1.1490, which indicates that the data set is overdispersed. The sample binomial zero index and one index are 0.9507 and 1.2008, respectively, which indicates that the data set is zero-deflated but one-inflated.

The performances of the alternative models are reported in Table S13. Based on AIC and BIC, the models are competitive and the BAR(1) process yields the smallest BIC value. The reason is that BIC penalizes the number of model parameters more severely. It can be observed that the parameter \(\omega \) for the ZR-BAR(1) model equals 0, which indicates that the ZR-BAR(1) model reduces to the BAR(1) model in this case. This is understandable, since the data set is zero-deflated and a zero-inflated binomial marginal distribution is therefore just the opposite of what one wishes. Except for the ZOIBAR(1) model, all the alternatives fail to capture the zero deflation and one inflation. Again, the ZOIBAR(1) model successfully captures zero deflation, one inflation and overdispersion, and the corresponding indices are quite close to the empirical values. During our study, we also tried to apply the ZIB-AR(1) model to this data set, but no workable results could be obtained, so the relevant information is not reported here. The Pearson residual analysis also supports the ZOIBAR(1) model since its residual variance is the closest to 1 among the alternatives. One may ask whether there is correlation within the squared residuals, since \({\hat{\rho }}_{e_t^2}=-0.1218\) is not close to 0. The Ljung–Box Q-test for the residuals and the squared residuals, based on 15 lags, yields p values of 0.2097 and 0.9796, so at the 5% level there is no significant autocorrelation in either the residuals or the squared residuals.

Now we turn to the actual meaning of the parameter estimates for the ZOIBAR(1) model in the assaults-on-officers case. On the one hand, the hidden process \(\{X_{t}\}\) is the number of car beats that reported at least one case of assaults-on-officers per month during periods of normal state. \({\hat{p}}=0.0484\) shows that the probability of an assaults-on-officers case appearing in each car beat in a month is 0.0484, which is close to zero and reflects that this area is in a state of harmony and stability. \({\hat{\rho }}=0.3013\) indicates that the correlation between the counts in two consecutive months is clearly significant. This phenomenon can be interpreted by the fact that the improvement or deterioration of social order changes gradually over time; hence, the assaults-on-officers cases in one month will certainly affect those in the next month. On the other hand, the observed process \(\{Y_{t}\}\) is the real series of the number of car beats that reported at least one case of assaults-on-officers per month. The difference between \(X_{t}\) and \(Y_{t}\) is that \(X_{t}\) is masked by a zero or a one with probability \({\hat{\phi }}_{0}+{\hat{\phi }}_{1}=0.4116\) due to government management measures, and these measures can lead to social stability. Based on \({\hat{\phi }}_{0}+{\hat{\phi }}_{1}=0.4116\), we may surmise that the local government attached great importance to the security situation and frequently adopted corresponding measures. \({\hat{\phi }}_{0}=0.1606\) and \({\hat{\phi }}_{1}=0.2510\) represent the probabilities of zero and one car beats, respectively, reporting at least one case of assaults-on-officers under the circumstances of government management.

5.4 Prediction

Prediction is always a popular issue in time series analysis and serves to check the adequacy and predictive ability of the selected model. Three different prediction methods will be discussed in the following:

First, we discuss the classical prediction method. The conditional expectation method is the most common approach for constructing classical predictions in time series models due to its optimality in terms of the mean squared error. The h-step-ahead predictor for \(Y_{t+h}\) is \({\hat{Y}}_{t+h}={E}(Y_{t+h}|Y_{t}),\) where \({E}(Y_{t+h}|Y_{t})\) is given by Eq. (2.2). In practice, the parameters \(p,\rho ,\phi _{0}\) and \(\phi _{1}\) are replaced by their QMLEs.
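
For the underlying BAR(1) chain, the h-step conditional mean follows directly from the transition law (1.2): given \(X_t=x\), \(X_{t+h}\) is the sum of \(\mathrm{B}(x,\alpha _h)\) and \(\mathrm{B}(n-x,\beta _h)\) variables, so \(E(X_{t+h}|X_t=x)=\alpha _h x+\beta _h(n-x)=\rho ^h x+np(1-\rho ^h)\). The sketch below implements this quantity; Eq. (2.2) for the observed process \(Y_t\) additionally involves the inflation parameters and is not reproduced here.

```python
# h-step conditional mean of the hidden BAR(1) chain, from Eq. (1.2):
# X_{t+h} | X_t = x is the sum of Bin(x, alpha_h) and Bin(n - x, beta_h),
# so E(X_{t+h} | X_t = x) = alpha_h * x + beta_h * (n - x).
# Eq. (2.2) for the observed Y_t adds the zero/one-inflation adjustment.

def bar1_conditional_mean(x, h, n, p, rho):
    beta_h = p * (1.0 - rho ** h)
    alpha_h = beta_h + rho ** h
    return alpha_h * x + beta_h * (n - x)
```

As \(h\rightarrow \infty \), the predictor converges to the marginal mean \(np\), mirroring the mean reversion of the chain.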

However, the conditional expectation method violates data coherence and forecast coherence, since it can hardly produce integer-valued predictors. We therefore call for a procedure that is able to produce integer-valued predictors. Freeland and McCabe (2004) proposed a feasible approach which uses the h-step-ahead predictive conditional distribution to predict the future value; one can obtain a point prediction from the median or the mode of the predictive distribution. This method has been employed by several researchers; see Möller et al. (2016), Maiti and Biswas (2017) and Kang et al. (2021), among others. Following this path, we compute the h-step-ahead conditional distribution of \(Y_{t+h}\) given \(Y_1=y_1,\ldots ,Y_t=y_t\), i.e., \({P}(Y_{t+h}=y_{t+h}|Y_1=y_1,\ldots ,Y_{t}=y_{t})\), of the ZOIBAR(1) process based on HMM theory (see Zucchini et al. 2009, Section 5.3). Motivated by the idea of quasi-likelihood estimation, an alternative is to calculate the h-step-ahead quasi-conditional distribution \({P}(Y_{t+h}=y_{t+h}|Y_{t}=y_{t})\) based on Proposition 2.
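
For the hidden BAR(1) chain, the h-step-ahead conditional distribution is available in closed form from Eq. (1.2), and a coherent point forecast is then the median or mode of the resulting probability row. The sketch below evaluates Eq. (1.2) directly; the full ZOIBAR(1) forecast distribution additionally requires the HMM forward recursion of Zucchini et al. (2009), which is not reproduced here.

```python
# h-step transition probability P(X_{t+h} = j | X_t = i) of the BAR(1)
# chain, evaluated directly from Eq. (1.2). The full ZOIBAR(1) forecast
# distribution additionally needs the HMM forward recursion, not shown.
from math import comb

def bar1_transition(i, j, h, n, p, rho):
    beta_h = p * (1.0 - rho ** h)
    alpha_h = beta_h + rho ** h
    total = 0.0
    for m in range(max(0, i + j - n), min(i, j) + 1):
        total += (comb(i, m) * comb(n - i, j - m)
                  * alpha_h ** m * (1.0 - alpha_h) ** (i - m)
                  * beta_h ** (j - m) * (1.0 - beta_h) ** (n - i - j + m))
    return total
```

A median or mode point forecast is read off the row \(\left( P(X_{t+h}=j|X_t=i)\right) _{j=0}^{n}\); as \(h\rightarrow \infty \) the row converges to the \(\mathrm{B}(n,p)\) marginal.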

The Bayesian prediction method is another approach that produces coherent predictors. The future value is derived by using the h-step-ahead Bayesian predictive probability function, which treats both \(Y_{t+h}\) and the unknown parameters as random and is given by

$$\begin{aligned}&{P}(Y_{t+h}=y_{t+h}|\varvec{Y})\\&\quad =\int _{0}^{1}\int _{c}^{1}\int _{0}^{1}\int _{0}^{1-\phi _1}{P}(Y_{t+h}=y_{t+h}|\varvec{Y},\varvec{\lambda })\varvec{\pi }(\varvec{\lambda }|\varvec{Y})\mathrm{d}\phi _0 \mathrm{d}\phi _1\mathrm{d}\rho \mathrm{d}p, \end{aligned}$$

where \(y_{t+h}\in \{0,1,\ldots ,n\}\), \(\varvec{Y}=(Y_{1},\ldots ,Y_{T})^{\top }\) and the posterior distribution \(\varvec{\pi }(\varvec{\lambda }|\varvec{Y})\) is given by Eq. (3.2). Again, a point prediction can be obtained from the median or the mode of the h-step-ahead Bayesian predictive probability function.

Table 8 The MADEs of the h-step-ahead predictor for rainy-days counts in Germany

To compare the above four prediction approaches, we evaluate their performances on the three data sets via the MADE of out-of-sample predictions, where the last h observations are excluded when estimating the parameters. The corresponding results are shown in Tables 8 and S15–S16, and the h-step-ahead predictive quasi-conditional distributions, predictive conditional distributions and Bayesian predictive conditional distributions of the considered data sets are given in Figs. 3, 4, 5 and S6–S11. We can see that the h-step-ahead predictive conditional distribution is similar to the h-step-ahead Bayesian predictive conditional distribution, so the predictors produced by the two methods are close. This is well understood, since the only difference between the predictive conditional distribution and the Bayesian predictive conditional distribution lies in the priors. It is also observed that the conditional expectation prediction produces more stable results, while the conditional distribution prediction and the Bayesian prediction give better predictors in some situations. In general, we are more inclined toward the conditional distribution prediction and the Bayesian prediction, because these two methods not only give reliable predictors but also provide additional information such as the predictive distribution. The predictive distribution can help practitioners establish an early-warning system of risk by utilizing the idea of Value-at-Risk (VaR, see Chen and Watanabe 2019). Taking the assaults-on-officers counts as an example, Figures S9–S11 show that an observation larger than 2 is a small-probability event. Hence, if we observe that the number of car beats reporting at least one case of assaults-on-officers is larger than 2, it can be deduced that some events have caused unrest in social security and corresponding treatment measures should be adopted. A similar discussion applies to the rainy-days counts.
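
The out-of-sample comparison can be summarized in a few lines, taking MADE to be the mean absolute error over the held-out observations (our reading of the abbreviation; the paper's exact definition is not reproduced in this section):

```python
# Out-of-sample MADE, taken here as the mean absolute prediction error
# over the held-out observations (an assumption about the definition).

def made(observed, predicted):
    assert len(observed) == len(predicted)
    return sum(abs(y - yhat)
               for y, yhat in zip(observed, predicted)) / len(observed)
```

Smaller MADE values in Tables 8 and S15–S16 then indicate more accurate h-step-ahead predictors.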

Fig. 3

h-step-ahead predictive quasi-conditional distribution of rainy-days counts in Germany

Fig. 4

h-step-ahead predictive conditional distribution of rainy-days counts in Germany

Fig. 5

h-step-ahead Bayesian predictive conditional distribution of rainy-days counts in Germany

6 Conclusion

This article proposes a hidden Markov model with zero-and-one inflated binomial marginals to better analyze bounded counts with excess zeros and ones. The stochastic properties of the new model are investigated, and estimators of the model parameters are derived by probability-based, quasi-maximum likelihood, maximum likelihood and Bayesian approaches. A binomial one-inflation index is constructed and further utilized to develop a method to test whether a data set is zero-and-one inflated relative to a BAR(1) model. Three applications to real-data examples are given to assess the performance of our model.

However, more research is still needed on some aspects of the new model. The first issue is that the estimation problem for the ZOIBAR(1) model should be treated in more detail. For example, it would be interesting to apply the empirical likelihood approach to the ZOIBAR(1) model and to investigate the asymptotic behavior of the estimators. Extensions of the ZOIBAR(1) model are the second issue; the construction of high-order and multivariate ZOIBAR processes may be an interesting topic. The third issue is that models for bounded counts with excess zeros and ones are still rare, and the corresponding Markov models are urgently needed. Related research will be pursued in future work.