1 Introduction

Given a random sample of observations \(X_{1},\ldots ,X_{n}\) with unknown probability mass function (pmf) f supported on a discrete set \(\mathbb {T}\) (e.g., \(\mathbb {N}\), \(\mathbb {Z}\) or \(\mathbb {T}=\{0,1,\ldots ,N\}\)), the discrete kernel estimator \(\hat{f}_{h}(x)\) of \(f(x)=\Pr (X_{i}=x)\) is defined as follows [see, for example, Kokonendji and Senga Kiessé (2011)]:

$$\begin{aligned} \hat{f}_{h}(x)=\frac{1}{n}\sum \limits _{i=1}^{n}K_{x,h}(X_{i}), \end{aligned}$$

where \(h=h(n)>0\) is a bandwidth (or smoothing parameter) and \(K_{x,h}\) is a discrete kernel, assumed to be a suitable pmf with support \(\mathbb {S}_{x}\) not depending on h; see, e.g., Kokonendji et al. (2007) and Kokonendji and Senga Kiessé (2011). Naturally, discrete kernels are more appropriate than continuous kernels for estimating a discrete function; see again Kokonendji et al. (2007) and Kokonendji and Senga Kiessé (2011). See also Aitchison and Aitken (1976) for categorical data and finite discrete distributions, and Wang and Van Ryzin (1980) for ordered discrete variables.

In view of the fact that the bias of \(\hat{f}_{h}\) is O(h) as \(h\rightarrow 0\), this paper considers improvements in discrete kernel estimation that reduce the order of magnitude of the bias to \(O(h^{2})\), while maintaining the order of magnitude of the variance. In the case of symmetric kernels, this kind of rate improvement can typically be achieved by employing higher-order kernels; see Jones and Foster (1993) for methods of generating higher-order kernels from a given second-order kernel. To the best of our knowledge, equivalent techniques are yet to be proposed for discrete kernels. Instead, this paper applies two classes of multiplicative bias correction (MBC) techniques to attain the rate improvement. The MBC approaches have been proposed and extensively studied by several authors in symmetric and asymmetric kernel density estimation (the continuous case); see, e.g., Terrell and Scott (1980), Jones et al. (1995), Hirukawa (2010), Hirukawa and Sakudo (2014), Hirukawa and Sakudo (2015), Zougab and Adjabi (2015) and Funke and Kawka (2015). The first class of MBC methods considered here constructs a multiplicative combination of two density estimators using different smoothing parameters; this idea was originally proposed by Terrell and Scott (1980) as an additive bias correction to the logarithm of densities. The second class of MBC, in the spirit of Jones et al. (1995), is based on the idea of writing \(f(x)=\hat{f}(x)\left\{ f(x)/\hat{f}(x)\right\} \) and estimating the bias-correction term \(f(x)/\hat{f}(x)\) nonparametrically. When applied to discrete kernel estimation, both MBC techniques still yield estimators that are free of boundary bias. In addition, these estimators have a practically appealing property: like \(\hat{f}_{h}\), they always generate nonnegative density estimates everywhere by construction.

This paper is organized as follows. Section 2 briefly recalls discrete kernels for pmf estimation. In Sect. 3 we first introduce the MBC discrete kernel estimators; second, we develop asymptotic properties such as the bias and variance of the newly proposed estimators; third, we adapt the unbiased cross-validation (UCV) procedure for choosing the bandwidth. Section 4 conducts Monte Carlo simulations to compare the finite sample performance of the standard discrete kernel estimators and the proposed MBC discrete kernel estimators. Section 5 provides applications to real data. Section 6 concludes the paper. All proofs are given in the “Appendix”.

2 Discrete kernel estimator

Given a random sample \(X_{1},\ldots ,X_{n}\) with unknown probability mass function (pmf) f supported on a discrete set \(\mathbb {T}\) (\(\mathbb {N}\), \(\mathbb {Z}\) or \(\mathbb {T}=\{0,1,\ldots ,N\}\)), the discrete kernel estimator of \(f(x)=\Pr (X_{i}=x)\) using kernel \(L\in \{\mathrm{DT},\mathrm{DDU},\mathrm{WVR},\mathrm{LR}\}\) can be expressed as

$$\begin{aligned} \widehat{f}_{L}(x)=\frac{1}{n}\sum _{i=1}^{n}K_{L(x,h)}(X_{i}), \end{aligned}$$
(1)

where \(x\in \mathbb {T}\) is the target (the point where the pmf is estimated), \(h>0\) is a bandwidth (or smoothing parameter), and the explicit forms of the kernels are listed in Table 1. The asymptotic properties of the estimator (1) are studied in detail in Kokonendji and Senga Kiessé (2011). The asymptotic bias as \(h\rightarrow 0\) is given by

$$\begin{aligned} \mathrm{bias}(\widehat{f}_{L}(x))=q(x,f)h+o(h), \end{aligned}$$

where the explicit forms of q(x, f) for each specific kernel L are given in Table 2.

Table 1 Univariate discrete kernels
Table 2 Explicit forms of q(x, f)

Similarly, when \(n\rightarrow \infty \) and \(h\rightarrow 0\), the asymptotic variance is

$$\begin{aligned} \mathrm{Var}(\widehat{f}_{L}(x))=\frac{1}{n}f(x)\{1-f(x)\}K^{2}_{L(x,h)} (x)+o\left( \frac{1}{n}\right) . \end{aligned}$$

The mean integrated squared error (\(\mathrm{MISE}\)) is given in Kokonendji and Senga Kiessé (2011) and Kokonendji et al. (2007) and is expressed as

$$\begin{aligned} \mathrm{MISE}(\widehat{f}_{L})= & {} \sum \limits _{x\in \mathbb {T}}\mathrm{bias}^{2}(\widehat{f}_{L}(x))+ \sum \limits _{x\in \mathbb {T}}\mathrm{Var}(\widehat{f}_{L}(x)) \nonumber \\= & {} h^{2}\sum \limits _{x\in \mathbb {T}}q^{2}(x,f)+\frac{1}{n}\sum \limits _{x\in \mathbb {T}}f(x)\{1-f(x)\}K^{2}_{L(x,h)}(x)+ o\left( h^{2}+\frac{1}{n}\right) . \end{aligned}$$
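For concreteness, the following minimal sketch shows one way the classical estimator (1) could be implemented. It is an illustration under our own naming conventions: the DT kernel is written in its discrete triangular form with arm a and the DDU kernel in its Aitchison–Aitken (Dirac discrete uniform) form, as we read them from Kokonendji and Senga Kiessé (2011); this is not the authors' code.

```python
# Illustrative sketch of estimator (1); the kernel forms below are our
# reading of Kokonendji and Senga Kiesse (2011), and all names are ours.
import numpy as np

def dt_kernel(x, y, h, a=2):
    """Discrete triangular kernel K_{DT(x,h)}(y) with arm a."""
    k = np.abs(np.asarray(y, dtype=float) - x)
    # normalizing constant: sum of (a+1)^h - |u|^h over u = -a, ..., a
    norm = (2 * a + 1) * (a + 1) ** h - 2 * sum(j ** h for j in range(1, a + 1))
    return np.where(k <= a, ((a + 1) ** h - k ** h) / norm, 0.0)

def ddu_kernel(x, y, h, n_cat=4):
    """Dirac discrete uniform kernel: mass 1 - h at y = x, h/(n_cat - 1) elsewhere."""
    return np.where(np.asarray(y) == x, 1.0 - h, h / (n_cat - 1))

def f_hat(x, sample, h, kernel=dt_kernel):
    """Classical discrete kernel estimator (1) at the target x."""
    return float(np.mean(kernel(x, sample, h)))
```

The number of categories n_cat has to be supplied according to the data at hand; the default value here is purely illustrative.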

In analogy to kernel density estimation, the choice of a suitable bandwidth is also a crucial issue in the discrete kernel method, and several approaches have been proposed in the literature. The common methods for continuous kernel estimators that adopt the mean integrated squared error (MISE) as a criterion, as well as cross-validation (CV) techniques, have also been developed for discrete kernel estimation; see, e.g., Kokonendji et al. (2007), Kokonendji and Senga Kiessé (2011) and Chu et al. (2015).

3 MBC for discrete kernel estimators

In this section, we adapt the aforementioned MBC methods to the estimation of probability mass functions, where the special kernels used for pmf estimation are called discrete kernels and have support on some discrete set such as \(\mathbb {N}\), \(\mathbb {Z}\) or a finite set of integers. Our proposed approaches have the same intuition as in the continuous case: each of the MBC methods is shown to improve the bias convergence of univariate pmf estimators from O(h) to \(O(h^{2})\) while the order of their variance remains unchanged at \(O(n^{-1})\). Globally, the proof strategies of each MBC method in pmf estimation largely follow those of the corresponding method originally developed for kernel density estimation of scalar continuous random variables; see, e.g., Terrell and Scott (1980), Jones et al. (1995), Hirukawa (2010), Hirukawa and Sakudo (2014), Zougab and Adjabi (2015) and Funke and Kawka (2015).

3.1 Estimators

We follow the idea of the geometric estimator of Terrell and Scott (1980) and Hirukawa (2010), abbreviated as “TS”, which can be readily extended to the discrete kernel L in the context of probability mass function estimation. For a given kernel L, let \(\hat{f}_{L,h}(x)\) and \(\hat{f}_{L,h/c}(x)\) be the pmf estimators using smoothing parameters h and h/c, respectively, where \(c\in (0,1)\) is some predetermined constant that does not depend on the design point x. Then, the TS-MBC kernel pmf estimator can be adapted as follows:

$$\begin{aligned} \tilde{f}_{TS,L}(x)=\left\{ \hat{f}_{L,h}(x)\right\} ^{\frac{1}{1-c}} \left\{ \hat{f}_{L,h/c}(x)\right\} ^{-\frac{c}{1-c}}. \end{aligned}$$
(2)

The second MBC technique for symmetric kernel density estimators is attributed to Jones et al. (1995) [see also Hirukawa (2010), Hirukawa and Sakudo (2014), Zougab and Adjabi (2015) and Funke and Kawka (2015) for asymmetric kernels], is abbreviated as “JLN”, and utilizes a single smoothing parameter h. The JLN technique proposed by Jones et al. (1995) is based on the identity \(f(x)=\hat{f}_{L,h}(x)\left\{ f(x)/\hat{f}_{L,h}(x)\right\} \). In analogy to their estimators, using the discrete kernel L, we denote by \(\tilde{f}_{JLN,L}(x)\) the following estimator:

$$\begin{aligned} \tilde{f}_{JLN,L}(x)=\hat{f}_{L,h}(x)\left\{ \frac{1}{n}\sum _{i=1}^{n} \frac{K_{L(x,h)}(X_{i})}{\hat{f}_{L,h}(X_{i})}\right\} , \end{aligned}$$
(3)

where \(K_{L(x,h)}\) is the kernel L. Note that the term inside the braces is a natural estimator of the bias-correction term \(f(x)/\hat{f}_{L,h}(x)\). Also, by construction, both \(\tilde{f}_{TS,L}(x)\) and \(\tilde{f}_{JLN,L}(x)\) always generate nonnegative probability mass function estimates everywhere.
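To make the constructions (2) and (3) concrete, the sketch below implements both MBC estimators on top of a generic classical estimator f_hat(x, sample, h), such as the one sketched in Sect. 2. This is a hedged reading of the displayed formulas, not the authors' implementation.

```python
import numpy as np

def ts_mbc(x, sample, h, c, f_hat):
    """TS-MBC estimator (2): geometric combination of fits at bandwidths h and h/c."""
    return f_hat(x, sample, h) ** (1.0 / (1.0 - c)) \
        * f_hat(x, sample, h / c) ** (-c / (1.0 - c))

def jln_mbc(x, sample, h, kernel, f_hat):
    """JLN-MBC estimator (3): classical fit times the estimated correction f/f_hat."""
    correction = np.mean(
        kernel(x, sample, h)
        / np.array([f_hat(xi, sample, h) for xi in sample])
    )
    return f_hat(x, sample, h) * correction
```

Both functions return nonnegative values by construction, in line with the remark above.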

3.2 Asymptotic properties

The asymptotic bias and variance of the MBC estimators are presented in the following theorems. We assume that

A1.

    The derivatives of f at each point \(x\in \mathbb {N}\) are replaced by the finite differences [see Kokonendji and Senga Kiessé (2011); a small computational sketch follows the assumptions]:

    $$\begin{aligned} f^{(j)}(x)=\{f^{(j-1)}(x)\}^{(1)}, \end{aligned}$$

    where

    $$\begin{aligned} f^{(1)}(x)=\left\{ \begin{array}{ll} \{f(x+1)-f(x-1)\}/2 &{} \quad \hbox {if}\quad x \in \mathbb {N} \setminus \{0\}; \\ f(1)-f(0) &{}\quad \hbox {if}\quad x=0. \end{array} \right. \end{aligned}$$
A2.

    The smoothing parameter \(h=h(n)\) satisfies \(h\rightarrow 0\) as \(n\rightarrow \infty \).
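As a small illustration of A1, the finite differences can be evaluated recursively; the sketch below uses our own naming.

```python
def finite_diff(f, x, j=1):
    """j-th finite-difference derivative f^{(j)} on N, as defined in A1."""
    if j == 0:
        return f(x)
    g = lambda t: finite_diff(f, t, j - 1)  # g plays the role of f^{(j-1)}
    return g(1) - g(0) if x == 0 else (g(x + 1) - g(x - 1)) / 2.0
```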

Theorem 1

Let \(\tilde{f}_{TS,L}\) be the TS-MBC estimator using kernel L defined by (2). For a given target x and under assumptions A1 and A2, the following holds:

\((i)\) The bias of the TS-MBC discrete kernel estimator admits the following approximation:

$$\begin{aligned} \mathrm{bias}(\tilde{f}_{TS,L}(x))=\frac{1}{c}\left[ \frac{1}{2} \left\{ \frac{l_{1}^{2}(x,f)}{f(x)}-l_{2}(x,f)\right\} \right] h^{2}+ o(h^{2}), \end{aligned}$$

where the explicit forms of \(l_{1}(x,f)\) and \(l_{2}(x,f)\) are given in Tables 3 and 4, respectively. \((ii)\) The variance of the TS-MBC estimator with kernel L admits the following expansion:

$$\begin{aligned} \mathrm{Var}(\tilde{f}_{TS,L}(x))=\frac{f(x)(1-f(x))}{n(1-c)^{2}}\left( K_{L,h} (x)-cK_{L,h/c}(x)\right) ^{2}+o\left( \frac{1}{n}\right) . \end{aligned}$$
Table 3 Explicit forms of \(l_{1}(x,f)\)
Table 4 Explicit forms of \(l_{2}(x,f)\)

Proof

The proof is given in the “Appendix”.

Theorem 2

Let \(\tilde{f}_{JLN,L}\) be the JLN-MBC estimator with kernel L defined by (3). For a given target x and under assumptions A1 and A2, the following holds:

\((i)\) The bias of the JLN-MBC discrete kernel estimator admits the following approximation:

$$\begin{aligned} \mathrm{bias}(\tilde{f}_{JLN,L}(x))=-f(x)l_{1}(x,g)h^{2}+o(h^{2}), \end{aligned}$$

where \(l_{1}(x,g)\) is obtained by replacing f by g in \(l_{1}(x,f)\) with \(g=g(x,f)=l_{1}(x,f)/f(x)\).

\((ii)\) The variance of the JLN-MBC discrete kernel estimator admits the following expression:

$$\begin{aligned} \mathrm{Var}(\tilde{f}_{JLN,L}(x))=\frac{f(x)(1-f(x))}{n}K^{2}_{L,h}(x) +o\left( \frac{1}{n}\right) . \end{aligned}$$

Proof

The proof is given in the “Appendix”. \(\square \)

3.3 Global property

As a global criterion we use the mean integrated squared error (MISE), defined as

$$\begin{aligned} \mathrm{MISE}(\tilde{f}_{MBC,L})= & {} \sum \limits _{x\in \mathbb {T}}\mathrm{bias}^{2}(\tilde{f}_{MBC,L}(x))+ \sum \limits _{x\in \mathbb {T}}\mathrm{Var}(\tilde{f}_{MBC,L}(x)), \end{aligned}$$

where \(\tilde{f}_{MBC,L}\) denotes either the TS-L or the JLN-L kernel estimator. The MISE of the TS-L kernel estimator defined in (2) is given by

$$\begin{aligned} \mathrm{MISE}(\tilde{f}_{TS,L})= & {} \frac{h^{4}}{c^{2}}\sum \limits _{x\in \mathbb {T}}\left[ \frac{1}{2}\left\{ \frac{l_{1}^{2}(x,f)}{f(x)}-l_{2}(x,f)\right\} \right] ^{2}\\&+ \sum \limits _{x\in \mathbb {T}}\frac{f(x)(1-f(x))}{n(1-c)^{2}}\big (K_{L,h}(x)-cK_{L,h/c}(x)\big )^{2}+ o\left( h^{4}+\frac{1}{n}\right) . \end{aligned}$$

Similarly, the MISE of the JLN-L kernel estimator defined in (3) is expressed as

$$\begin{aligned} \mathrm{MISE}(\tilde{f}_{JLN,L})= & {} h^{4}\sum \limits _{x\in \mathbb {T}}f(x)^{2}l_{1}^{2}(x,g)+\frac{1}{n} \sum \limits _{x\in \mathbb {T}}f(x)\{1-f(x)\}K^{2}_{L(x,h)}(x)\\&+\, o\left( h^{4}+\frac{1}{n}\right) . \end{aligned}$$

Remark 1

These global results transfer directly to the pointwise bias and variance at a given point x. As we have seen, the bias is uniformly of order \(O(h^{2})\) over the whole support. Moreover, the variance exhibits the following order:

$$\begin{aligned} \mathrm{Var}(\tilde{f}_{TS,L}(x))=\mathrm{Var}(\tilde{f}_{JLN,L}(x))=\mathrm{Var}(\hat{f}_{L}(x))=O(n^{-1}). \end{aligned}$$

The two theorems demonstrate that both the TS and JLN estimators are free of boundary bias. More importantly, these two MBC estimators reduce the order of magnitude of the bias from O(h) to \(O(h^{2})\), while their variances remain of order \(O(n^{-1})\). The variance of the JLN estimator is first-order asymptotically equivalent to that of the corresponding classical estimator. In contrast, since the variance of the TS-MBC estimators depends on \(c\in (0,1)\), it tends to be larger than that of the classical estimator; see, e.g., Hirukawa (2010) and Hirukawa and Sakudo (2014) for more details in the continuous case.

3.4 Normalization

Neither \(\tilde{f}_{TS,L}(x)\) nor \(\tilde{f}_{JLN,L}(x)\) sums to one; in general, MBC leads to a lack of normalization. Hirukawa (2010), for example, argues that this issue can be resolved and proposes two renormalized beta MBC kernel density estimators. Taking the structures of \(\tilde{f}_{TS,L}(x)\) and \(\tilde{f}_{JLN,L}(x)\) into account and following Hirukawa (2010), we adopt his macro approach to obtain the renormalized versions of our MBC estimators (a small computational sketch is given at the end of this subsection):

$$\begin{aligned} \tilde{f}^{R}_{TS,L}(x)= & {} \frac{\tilde{f}_{TS,L}(x)}{\sum _{y \in \mathbb {T}}\tilde{f}_{TS,L}(y)},\\ \tilde{f}^{R}_{JLN,L}(x)= & {} \frac{\tilde{f}_{JLN,L}(x)}{\sum _{y \in \mathbb {T}}\tilde{f}_{JLN,L}(y)}. \end{aligned}$$

Since

$$\begin{aligned} \mathbb {E}\left( \sum _{x \in \mathbb {T}}\tilde{f}_{TS,L}(x)\right)= & {} \sum _{x \in \mathbb {T}}\left( \mathbb {E}(\tilde{f}_{TS,L}(x))\right) ,\\= & {} 1+\frac{1}{c}\sum _{x \in \mathbb {T}}\left[ \frac{1}{2}\left\{ \frac{l_{1}^{2}(x,f)}{f(x)}-l_{2}(x,f)\right\} \right] h^{2}+ o(h^{2}) \end{aligned}$$

and

$$\begin{aligned} \mathbb {E}\left( \sum _{x \in \mathbb {T}}\tilde{f}_{JLN,L}(x)\right)= & {} \sum _{x \in \mathbb {T}}\left( \mathbb {E}(\tilde{f}_{JLN,L}(x))\right) ,\\= & {} 1-\sum _{x \in \mathbb {T}}f(x)l_{1}(x,g)h^{2}+o(h^{2}), \end{aligned}$$

the biases of \(\tilde{f}^{R}_{TS,L}(x)\) and \(\tilde{f}^{R}_{JLN,L}(x)\) can be approximated by

$$\begin{aligned}&\mathrm{bias}\left( \tilde{f}^{R}_{TS,L}(x)\right) \sim \frac{1}{c}\left[ \frac{1}{2}\left\{ \frac{l_{1}^{2}(x,f)}{f(x)}-l_{2}(x,f)\right\} -\sum _{y \in \mathbb {T} }\frac{1}{2}\left\{ \frac{l_{1}^{2}(y,f)}{f(y)}-l_{2}(y,f)\right\} \right] h^{2},\\&\mathrm{bias}\left( \tilde{f}^{R}_{JLN,L}(x)\right) \sim \left[ -f(x)l_{1}(x,g)+\sum _{y \in \mathbb {T} }f(y)l_{1}(y,g)\right] h^{2}. \end{aligned}$$
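In practice the macro renormalization is a single division by the total estimated mass over the evaluation grid; a minimal sketch, assuming the support has been truncated to a finite grid when \(\mathbb {T}=\mathbb {N}\):

```python
import numpy as np

def renormalize(estimates):
    """Macro renormalization: divide pointwise MBC estimates by their total mass.

    For countable supports such as N, the estimates are assumed to be computed
    on a finite truncation that carries essentially all of the mass.
    """
    estimates = np.asarray(estimates, dtype=float)
    return estimates / estimates.sum()
```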

3.5 Choice of smoothing parameter for discrete MBC kernel estimators

In this section we adapt the popular unbiased cross-validation (UCV) method. First, we consider the TS kernel estimators based on the kernel L. The optimal smoothing parameter \(h^{opt}_{TS,L}\) is given by

$$\begin{aligned} h^{opt}_{TS,L}=\mathop {\arg \min }\limits _{h}~UCV_{TS,L}(h), \end{aligned}$$

where

$$\begin{aligned} UCV_{TS,L}(h)= & {} \sum _{x\in \mathbb {T}}\tilde{f}^{2}_{TS,L}(x)-\frac{2}{(n-1)}\sum _{i=1}^{n}\tilde{f}^{(-i)}_{TS,L}(X_{i})\\= & {} \sum _{x\in \mathbb {T}}\left\{ \hat{f}_{L,h}(x)\right\} ^{\frac{2}{1-c}}\left\{ \hat{f}_{L,h/c}(x)\right\} ^{-\frac{2c}{1-c}} -\frac{2}{n(n-1)}\\&\times \sum _{i=1}^{n}\left[ \left\{ \sum _{j\ne i}K_{L(X_{i},h)}(X_{j})\right\} ^{\frac{1}{1-c}}\left\{ \sum _{j\ne i}K_{L(X_{i},h/c)}(X_{j})\right\} ^{-\frac{c}{1-c}}\right] . \end{aligned}$$

In the case of the JLN kernel estimators, the UCV criterion takes the following form:

$$\begin{aligned} UCV_{JLN,L}(h)= & {} \sum _{x\in \mathbb {T}}\tilde{f}^{2}_{JLN,L}(x)-\frac{2}{(n-1)}\sum _{i=1}^{n}\tilde{f}^{(-i)}_{JLN,L}(X_{i})\\= & {} \frac{1}{n^{2}}\sum _{x\in \mathbb {T}}\hat{f}^{2}_{L,h}(x)\left\{ \sum _{i=1}^{n}\frac{K_{L(x,h)}(X_{i})}{\hat{f}_{L,h}(X_{i})}\right\} ^{2}\\&- \frac{2}{n(n-1)}\times \sum _{i=1}^{n}\sum _{j\ne i}K_{L(X_{i},h)}(X_{j})\frac{\hat{f}_{L,h}(X_{i})}{\hat{f}_{L,h}(X_{j})} \end{aligned}$$

and the bandwidth \(h^{opt}_{JLN,L}\) is defined as follows:

$$\begin{aligned} h^{opt}_{JLN,L}=\mathop {\arg \min }\limits _{h}~UCV_{JLN,L}(h). \end{aligned}$$
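Operationally, both UCV criteria can be minimized by a grid search. The sketch below follows the first line of each displayed criterion, obtaining the leave-one-out term by refitting the estimator without \(X_{i}\); it is illustrative only and assumes a generic estimator(x, data, h), such as ts_mbc or jln_mbc above with their extra arguments fixed.

```python
import numpy as np

def ucv(h, sample, support, estimator):
    """UCV criterion for a generic pmf estimator over a finite support grid."""
    n = len(sample)
    fit = sum(estimator(x, sample, h) ** 2 for x in support)
    loo = sum(estimator(sample[i], np.delete(sample, i), h) for i in range(n))
    return fit - 2.0 * loo / (n - 1)

def h_ucv(sample, support, estimator, h_grid):
    """Grid-search minimizer of the UCV criterion."""
    scores = [ucv(h, sample, support, estimator) for h in h_grid]
    return h_grid[int(np.argmin(scores))]
```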

4 Illustrations from simulated data

This section investigates the performance of the TS-DDU, TS-DT, JLN-DDU and JLN-DT kernel estimators considered in the previous section and compares it with that of the standard DDU and DT kernel estimators. Note that for the DT kernel we used the arm \(a=2\); see, for example, Kokonendji and Senga Kiessé (2011). We consider six probability mass functions defined as follows:

(a):

\(\mathbf F _{1}\) a Poisson distribution with parameter \(\lambda =8\):

$$\begin{aligned} f(x)=e^{-8} \frac{8^{x}}{x!},~~~x\in \mathbb {N}. \end{aligned}$$
(b):

\(\mathbf F _{2}\) a mixture of three Poisson distributions with parameters \(\mu _{1}=3\), \(\mu _{2}=12\) and \(\mu _{3}=24\):

$$\begin{aligned} f(x)=\frac{1}{3}e^{-3} \frac{3^{x}}{x!}+ \frac{1}{3}e^{-12} \frac{12^{x}}{x!}+\frac{1}{3}e^{-24} \frac{24^{x}}{x!},~~~x\in \mathbb {N} \end{aligned}$$
(c):

\(\mathbf F _{3}\) a Geometric distribution with parameter \(p=0.1\):

$$\begin{aligned} f(x)= 0.1\cdot (0.9)^{x},~~~x\in \mathbb {N}. \end{aligned}$$
(d):

\(\mathbf F _{4}\) a mixture of Poisson and Geometric distributions with parameters \(\mu =10\) and \(p=0.1\):

$$\begin{aligned} f(x)=\frac{2}{5}\cdot e^{-10} \frac{10^{x}}{x!}+ \frac{3}{5}\cdot 0.1\cdot (0.9)^{x},~~~x\in \mathbb {N}. \end{aligned}$$
(e):

\(\mathbf F _{5}\) a negative binomial distribution with parameters \(n_{1}=20\) and \(p=2/3\):

$$\begin{aligned} f(x)=\frac{(19+x)!}{x!\,19!}\left( \frac{2}{3}\right) ^{20}\left( \frac{1}{3}\right) ^{x},~~~x\in \mathbb {N}. \end{aligned}$$
(f):

\(\mathbf F _{6}\) a binomial distribution with parameters \(n_{1}=5\) and \(p=0.1\):

$$\begin{aligned} f(x)=\frac{5!}{x!(5-x)!}0.1^{x}\cdot 0.9^{5-x},~~~x\in \{0,1,\ldots ,5\}. \end{aligned}$$
Table 5 Empirical mean \(\mathrm{ISE}\) values based on 500 replications for the previously considered pmfs
Table 6 Empirical \(\mathrm{ISB}\) values based on 500 replications for the previously considered pmfs
Table 7 Empirical IV values based on 500 replications for the previously considered pmfs

For each of these pmfs, 500 replications of sizes \(n=20, 50, 100\) and 200 are generated. The MBC-DDU (TS-DDU and JLN-DDU) and MBC-DT (TS-DT and JLN-DT) discrete kernel estimators are applied to estimate the pmfs generated from the Poisson\((\lambda =8)\) distribution, the mixture of three Poisson distributions with \((\mu _{1}=3,\mu _{2}=12, \mu _{3}=24)\), the Geometric\((p=0.1)\) distribution, the mixture of Poisson\((\mu =10)\) and Geometric\((p=0.1)\) distributions, the negative binomial distribution BN\((n_{1}=20,p=2/3)\) and the binomial distribution B\((n_{1}=5,p=0.1)\). Note that, for our simulations, the value of c, which can be chosen in the sense of the mean integrated squared error (MISE) [see, e.g., Hirukawa (2010)], is fixed at \(c=0.5\). We use the standard DDU and DT kernel estimators as benchmarks for the MBC-DDU and MBC-DT kernel estimators. For the choice of the bandwidth, we use the UCV technique presented in the previous section. Finally, the performance of the different standard and MBC estimators is examined via the integrated squared error (ISE) and the integrated squared bias (ISB), given respectively as follows:

$$\begin{aligned} \mathrm{ISE}:=\sum _{x \in \mathbb {T} } \left[ \widehat{f}(x)-f(x)\right] ^{2} \end{aligned}$$

and

$$\begin{aligned} \mathrm{ISB}:=\sum _{x \in \mathbb {T}}\left[ \mathbb {E}\{\widehat{f}(x)\}-f(x)\right] ^{2}. \end{aligned}$$

We also compute the integrated variance IV given by

$$\begin{aligned} \mathrm{IV}:=\sum _{x \in \mathbb {T} } \left[ Var\{\widehat{f}(x)\}\right] . \end{aligned}$$
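For reproducibility, the sketch below outlines how the empirical ISE, ISB and IV reported in Tables 5, 6 and 7 can be approximated over replications; the truncated support, bandwidth and all function names are our illustrative choices, not the authors' settings.

```python
import numpy as np

def mc_criteria(estimator, true_pmf, sampler, support, n, n_rep=500, seed=0):
    """Empirical mean ISE, ISB and IV of a pmf estimator over n_rep replications."""
    rng = np.random.default_rng(seed)
    fits = np.empty((n_rep, len(support)))
    for r in range(n_rep):
        sample = sampler(n, rng)
        fits[r] = [estimator(x, sample) for x in support]
    truth = np.array([true_pmf(x) for x in support])
    ise = np.mean(np.sum((fits - truth) ** 2, axis=1))  # mean ISE
    isb = np.sum((fits.mean(axis=0) - truth) ** 2)      # ISB
    iv = np.sum(fits.var(axis=0))                       # IV
    return ise, isb, iv

# Example for F1 = Poisson(8) on a truncated support (illustrative settings):
# from scipy.stats import poisson
# mc_criteria(lambda x, s: f_hat(x, s, 0.3), lambda x: poisson.pmf(x, 8),
#             lambda n, rng: rng.poisson(8, n), support=range(31), n=100)
```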

Through simulation results (Tables 5, 6, 7), we can observe immediately that:

1.

    for all estimators, the means of ISE and ISB based on 500 replications decrease as the sample size n increases, which indicates that our estimators are consistent;

2.

    in terms of ISE and ISB, the relative performance of the JLN-DDU and TS-DDU kernel estimators is mixed, depending on the distribution. For example, in the case of the binomial distribution, the JLN-DDU kernel estimator generally works better than its competitors in the sense of ISB;

3.

    for all sample sizes, the TS-DDU, JLN-DDU, TS-DT and JLN-DT kernel estimators outperform the standard DDU and DT kernel estimators in terms of \(\mathrm{ISE}\) and \(\mathrm{ISB}\).

Note that the performances of the WVR and LR kernels are similar to those obtained with the DDU kernel; for this reason, and to avoid making the manuscript more cumbersome, we have considered the DDU kernel rather than the WVR or LR kernels in the simulations and empirical illustrations.

The comparison is also illustrated in Figs. 1 and 2, where we have plotted the estimates for sample size \(n = 200\) with the DDU and DT(\(a=2\)) kernels. The solid lines represent the true pmf, the dotted lines the classical (C) estimator with the DDU or DT kernel, the dashed lines the TS-DDU and TS-DT estimators, and the solid gray lines the JLN-DDU and JLN-DT estimators. The plots show that, in general, the MBC-DDU and MBC-DT estimators improve on the standard DDU and DT kernel estimators for all pmfs. The smoothing quality is satisfactory, and the best smoothing quality is obtained by using the MBC-DDU (TS-DDU and JLN-DDU) or the MBC-DT (TS-DT and JLN-DT) kernel estimators.

Fig. 1

The pmf estimation of Binomial data with \(n=200\) using the standard and MBC discrete associated kernel estimators. a DDU kernel. b DT(\(a=2\)) kernel

Fig. 2

The pmf estimation of mixture of Poisson and Geometric distributions data with \(n=200\) using the standard and MBC discrete associated kernel estimators. a DDU kernel. b DT(\(a=2\)) kernel

The TS-MBC estimator depends on two smoothing parameters, h and h/c, which also play a role in determining the boundary region; controlling both h and h/c is a cumbersome task. Because \(0<c<1\), the pmf estimator using h/c tends to be oversmoothed, which is potentially a source of large bias in every TS-MBC estimator. On the other hand, c should not be chosen too small, in order to keep h/c reasonably small.

5 Illustrations from real data

To complement our Monte Carlo simulations, we consider in this section two real data applications. First, we illustrate the performance of the MBC techniques for discrete kernel estimators based on the DDU and DT(\(a=2\)) kernels using the travel mode choice data (between Sydney and Melbourne, Australia) from Greene (2011). This data set consists of \(n=210\) observations and \(m=4\) categories (air, train, bus and car); the relative proportions of air, train, bus and car are 0.28, 0.30, 0.14 and 0.28, respectively. Table 8 provides the summary statistics of these observations. The second real application is related to the development of an insect pest called the spiraling whitefly, observed in the Republic of Congo; see Senga Kiessé and Mizère (2012). This insect damages plants by sucking the sap, decreasing photosynthetic activity and drying up the leaves. Congolese biologists are searching for a suitable model by studying count data characterizing the growth of the spiraling whitefly, such as the longevity of the adult insect (see Table 9).

Table 8 Summary statistics for the travel mode choice data
Table 9 Data of longevity of adult insects observed in days

Now we apply the MBC-DDU and MBC-DT kernel estimators to estimate the pmfs for the real data under consideration. The value of c is fixed at 0.5 for the TS-DDU and TS-DT kernel estimators. The standard DDU and DT kernel estimators are also used for comparison.

In order to measure the performance of all estimators, we simply use the practical integrated squared error given by [see Kokonendji and Senga Kiessé (2011)]:

$$\begin{aligned} \mathrm{ISE^{0}}:=\sum _{x \in \mathbb {N} } \left[ \widehat{f}(x)-f_{0}(x)\right] ^{2}, \end{aligned}$$

where \(f_{0}(x)\) is the empirical (naive) estimator. Categorical variables can be used in nonparametric pmf estimation, but they need to be coded; in our study we use the following coding: 1 = “air”; 2 = “train”; 3 = “bus”; 4 = “car”. The bandwidths for all estimators are chosen by the popular UCV technique. The obtained values of \(h_{ucv}\) and \(\mathrm{ISE^{0}}\) for both applications are given in Tables 10 and 11.

Table 10 Results from bandwidth and \(ISE^{0}\) by discrete and MBC kernels estimators of real data from the travel mode choice (between Sydney and Melbourne, Australia) of \(n = 210\)
Table 11 Results from bandwidth and \(ISE^{0}\) by discrete and MBC kernels estimators of real data from longevity of adult insects observed in days of \(n = 82\)
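The practical criterion \(\mathrm{ISE^{0}}\) compares each fitted pmf with the empirical frequencies; a minimal sketch, with the travel modes coded 1–4 as described above (all names are ours):

```python
import numpy as np

def ise0(support, estimates, sample):
    """Practical ISE of pointwise estimates against the empirical pmf f_0."""
    sample = np.asarray(sample)
    f0 = np.array([np.mean(sample == x) for x in support])
    return float(np.sum((np.asarray(estimates) - f0) ** 2))

# e.g. for the travel mode data coded 1="air", 2="train", 3="bus", 4="car":
# ise0([1, 2, 3, 4], [f_hat(x, modes, h) for x in [1, 2, 3, 4]], modes)
```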

We can see that, in terms of \(\mathrm{ISE^{0}}\), the MBC discrete kernel estimators with UCV bandwidths perform better than the standard kernel estimators in both applications. We have also plotted the estimates obtained by the classical (C) estimator and the MBC (TS and JLN) estimators with the Dirac discrete uniform and the discrete triangular kernels for the second data set on the longevity of adult insects observed in days. From Fig. 3, we observe that the smoothing quality is satisfactory, with the smoothing produced by the JLN estimator being the most suitable.

Fig. 3

The pmf estimation of real data of longevity of adult insects observed in days with \(n=82\) using the standard and MBC discrete associated kernel estimators. a DDU kernel. b DT(\(a=2\)) kernel

6 Conclusion

This paper has proposed two multiplicative bias correction (MBC) techniques for discrete kernels in the context of probability mass function (pmf) estimation. We have shown that these two classes of MBC techniques improve the order of magnitude of the bias from O(h) to \(O(h^{2})\). The performance of the MBC techniques for discrete kernel estimators (the TS-DDU, TS-DT, JLN-DT and JLN-DDU kernel estimators) with unbiased cross-validation (UCV) bandwidth selectors has been investigated through a simulation study and real data applications for count and categorical data. In general, the MBC discrete kernel estimators perform better than the standard discrete kernel estimators in the sense of the integrated squared error (ISE) and the integrated squared bias (ISB).

This paper deals only with the univariate case; an obvious extension is the estimation of multivariate pmfs. We are aware of two recent publications that deal with multivariate (discrete) kernels. Kokonendji and Somé (2015) investigated multivariate kernels for the estimation of the density of continuously distributed random vectors, while discrete multivariate kernels have been studied by Belaid et al. (2016), where a Bayesian bandwidth selection method for those kernels was also proposed.

Hence, let \(X=\{(X_{i1},\ldots ,X_{id}),~i=1,\ldots ,n\}\) be a sample of i.i.d. random vectors of dimension \(d\ge 1\). Following the approach of Belaid et al. (2016), we define the multivariate version of the discrete kernel estimator with diagonal bandwidth matrix \(H=diag(h_1,\ldots ,h_d)\) and kernel L according to

$$\begin{aligned} \hat{f}_L(\mathbf x ):=\frac{1}{n}\sum _{i=1}^n\prod _{j=1}^dK_{L(x_j,h_j)}^{[j]}(X_{ij}), \end{aligned}$$

where \(K_{L(x_j,h_j)}^{[j]}(X_{ij})\) denotes the univariate discrete kernel studied in this paper and

$$\begin{aligned} \mathbf x :=(x_{1},\ldots ,x_{d})\in \mathbb {T}^d:=\times _{j=1}^{d} \mathbb {T}^{[j]}\subseteq \mathbb {Z}^{d} \end{aligned}$$

denotes the target vector. Moreover, \(\mathbb {T}^d\) denotes the support of the underlying pmf f, which has to be estimated at \(\mathbf x \).

In view of our univariate findings and in analogy to the asymmetric-kernel-based approach of Funke and Kawka (2015), we define the multivariate version of the TS estimator as

$$\begin{aligned} \hat{f}_{TS,L}(\mathbf x ):=\left( \hat{f}_{L,H}(\mathbf x )\right) ^{\frac{1}{1-c}} \left( \hat{f}_{L,H/c}(\mathbf x )\right) ^{-\frac{c}{1-c}}, \end{aligned}$$

where \(\hat{f}_{L,H}\) and \(\hat{f}_{L,H/c}\) denote the estimator \(\hat{f}_L\) computed with bandwidth matrices H and H/c, respectively. In an analogous way, the multivariate JLN estimator is defined according to

$$\begin{aligned} \hat{f}_{JLN}(\mathbf x ):=\hat{f}_L(\mathbf x )\frac{1}{n}\sum _{i=1}^n \frac{\prod _{j=1}^dK_{L(x_j,h_j)}^{[j]}(X_{ij})}{\hat{f}_L(\mathbf X _i)}. \end{aligned}$$
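A brief sketch of the product-kernel construction and of the multivariate JLN estimator above, assuming a univariate kernel with signature kernel(x, y, h) as in the earlier sketches (all names are ours):

```python
import numpy as np

def f_hat_mv(x, sample, h, kernel):
    """Multivariate discrete kernel estimator with product kernel and diagonal H."""
    x, sample, h = np.asarray(x), np.asarray(sample), np.asarray(h)
    # weight of observation i: prod_j K^{[j]}_{L(x_j, h_j)}(X_ij)
    w = np.prod([kernel(x[j], sample[:, j], h[j]) for j in range(x.size)], axis=0)
    return float(w.mean())

def jln_mv(x, sample, h, kernel):
    """Multivariate JLN estimator: product-kernel fit times the correction term."""
    x, sample, h = np.asarray(x), np.asarray(sample), np.asarray(h)
    w = np.prod([kernel(x[j], sample[:, j], h[j]) for j in range(x.size)], axis=0)
    denom = np.array([f_hat_mv(xi, sample, h, kernel) for xi in sample])
    return float(w.mean() * np.mean(w / denom))
```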

Under appropriate assumptions, it can be shown that the mean squared errors of both estimators satisfy

$$\begin{aligned} MSE\left( \hat{f}_{TS,L}(\mathbf x )\right) =O\left( h^4+\frac{\prod _{j=1}^d \left( K_{L(x_j,h)}(x_j)-cK_{L(x_j,h/c)}(x_j)\right) ^{2}}{n}\right) \text { as }n\rightarrow \infty , \end{aligned}$$

as well as

$$\begin{aligned} MSE\left( \hat{f}_{JLN}(\mathbf x )\right) =O\left( h^4+\frac{1}{n}\prod _{j=1}^d K_{L(x_j,h)}^2(x_j)\right) \text { as }n\rightarrow \infty , \end{aligned}$$

where, for the sake of simplicity, the bandwidths are taken to be equal, \(h\equiv h_1=\cdots =h_d\). Exact analytical expressions of both bias terms are under investigation and will be covered in a follow-up paper.