1 Introduction

Nonparametric methods are becoming increasingly popular for analyzing problems in many fields, such as economics, biology, and actuarial science. In most cases, this is because little is known about the variables being analyzed. Smoothing of functions such as the density or the cumulative distribution function plays a special role in nonparametric analysis. Knowledge of a density function, or of its estimate, allows one to characterize the data more completely. From an estimate of the density function we can derive other characteristics of a random variable, such as probabilities, the hazard rate, the mean, and the variance.

Let \(X_1,X_2,...,X_n\) be independent and identically distributed random variables with an absolutely continuous distribution function \(F_X\) and a density \(f_X\). Parzen (1962) and Rosenblatt (1956) introduced the kernel density estimator (we will call it the standard one) as a smooth and continuous estimator for density functions. It is defined as

$$\begin{aligned} {\widehat{f}}_X(x)=\frac{1}{nh}\sum _{i=1}^n K\left( \frac{x-X_i}{h}\right) , \; \; \; x\in {\mathbb {R}}, \end{aligned}$$
(1)

where K is a function called a “kernel”, and \(h>0\) is the bandwidth, a parameter that controls the smoothness of \({\widehat{f}}_X\). It is usually assumed that K is a continuous nonnegative function, symmetric about 0, with \(\int _{-\infty }^\infty K(v)\mathrm {d}v=1\), and that \(h\rightarrow 0\) and \(nh\rightarrow \infty \) as \(n\rightarrow \infty \). It is easy to prove that the standard kernel density estimator is continuous and satisfies all the properties of a density function.
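
As an illustration, the following is a minimal sketch of Eq. (1) in Python with a Gaussian kernel; the kernel choice, the bandwidth value, and the function names are ours and only for demonstration.

```python
import numpy as np

def kde_standard(x, data, h):
    """Evaluate the standard kernel density estimator of Eq. (1) at points x,
    using a Gaussian kernel K."""
    x = np.asarray(x, dtype=float)
    u = (x[:, None] - data[None, :]) / h           # (x - X_i) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel values
    return K.sum(axis=1) / (len(data) * h)

# usage: estimate a standard normal density from 200 simulated observations
rng = np.random.default_rng(0)
sample = rng.normal(size=200)
grid = np.linspace(-3.0, 3.0, 121)
fhat = kde_standard(grid, sample, h=0.4)
```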

A typical general measure of the accuracy of \({\widehat{f}}_X(x)\) is the mean integrated squared error, defined as

$$\begin{aligned} MISE({\widehat{f}}_X)=E\left[ \int _{-\infty }^\infty \{{\widehat{f}}_X(x)-f_X(x)\}^2 w(x)\mathrm {d}x\right] , \end{aligned}$$
(2)

where w is a weight function. In this article, we consider only \(w(x)=1\). For point-wise measures of accuracy, we will use bias, variance, and the mean squared error \(MSE[{\widehat{f}}_X(x)]=E[\{{\widehat{f}}_X(x)-f_X(x)\}^2]\). It is well known that the MISE and the MSE can be computed with

$$\begin{aligned} MISE({\widehat{f}}_X)= & {} \int _{-\infty }^\infty MSE[{\widehat{f}}_X(x)]\mathrm {d}x, \end{aligned}$$
(3)
$$\begin{aligned} MSE[{\widehat{f}}_X(x)]= & {} Bias^2[{\widehat{f}}_X(x)]+Var[{\widehat{f}}_X(x)]. \end{aligned}$$
(4)

Under the condition that \(f_X\) has a continuous second-order derivative \(f_X''\), the above-mentioned authors proved that, as \(n\rightarrow \infty \),

$$\begin{aligned} Bias[{\widehat{f}}_X(x)]= & {} \frac{h^2}{2}f_X''(x)\int u^2 K(u)\mathrm {d}u+o(h^2),\\ Var[{\widehat{f}}_X(x)]= & {} \frac{f_X(x)}{nh}\int K^2(u)\mathrm {d}u+o\left( \frac{1}{nh}\right) . \end{aligned}$$
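
These expressions follow from a change of variables and a Taylor expansion; a brief sketch of the bias calculation (a standard argument, not specific to this paper) is

$$\begin{aligned} E[{\widehat{f}}_X(x)]=\int K(u)f_X(x-hu)\mathrm {d}u=f_X(x)-hf_X'(x)\int uK(u)\mathrm {d}u+\frac{h^2}{2}f_X''(x)\int u^2K(u)\mathrm {d}u+o(h^2), \end{aligned}$$

where the first-order term vanishes because K is symmetric about 0; the variance is obtained analogously from \(E[K^2(\{x-X_1\}/h)]\).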

There have been many proposals in the literature for improving the bias of the standard kernel density estimator. Typically, under sufficient smoothness conditions on the underlying density \(f_X\), the bias is reduced from \(O(h^2)\) to \(O(h^4)\), while the variance remains of order \(n^{-1}h^{-1}\). Among the methods with potentially greater impact are bias reduction by geometric extrapolation by Terrell and Scott (1980), variable bandwidth kernel estimators by Abramson (1982), variable location estimators by Samiuddin and El-Sayyad (1990), nonparametric transformation estimators by Ruppert and Cline (1994), and multiplicative bias correction estimators by Jones et al. (1995). One could, of course, also use so-called higher-order kernel functions, but these have the disadvantage that negative values might appear in the resulting density and distribution function estimates.

All of the previous explanations implicitly assume that the true density is supported on the entire real line. If we deal with a distribution supported on the nonnegative half-line, for instance, the standard kernel density estimator suffers from the so-called boundary bias problem, and this is the main problem that we are going to overcome. Throughout this paper, the interval [0, h] is called the “boundary region”, and points greater than h are called “interior points”.

In the boundary region, the standard kernel density estimator \({\widehat{f}}_X(x)\) usually underestimates \(f_X(x)\). This is because it does not “feel” the boundary and assigns weight to the negative axis, where no data can occur. To be more precise, if we use a symmetric kernel supported on \([-1,1]\), we have

$$\begin{aligned} Bias[{\widehat{f}}_X(x)]=\left[ \int _{-1}^c K(u)\mathrm {d}u-1\right] f_X(x)-hf_X'(x)\int _{-1}^c uK(u)\mathrm {d}u+O(h^2) \end{aligned}$$

when \(x\le h\), where \(c=\frac{x}{h}\). In particular, the estimator is not consistent at \(x=0\) because

$$\begin{aligned} \lim _{n\rightarrow \infty }Bias[{\widehat{f}}_X(0)]=\left[ \int _{-1}^0 K(u)\mathrm {d}u-1\right] f_X(0)=-\frac{1}{2}f_X(0)\ne 0, \end{aligned}$$

unless \(f_X(0)=0\).
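
A small numerical illustration of this effect (ours, not from the paper): for Exp(1) data and a Gaussian kernel, the standard estimator at \(x=0\) recovers only about half of the true value \(f_X(0)=1\).

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=1.0, size=5000)   # Exp(1) sample, f_X(0) = 1
h = 0.2                                        # illustrative bandwidth

def kde_at(x, data, h):
    """Standard Gaussian-kernel density estimate at a single point x."""
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u**2)) / (h * np.sqrt(2 * np.pi))

# Roughly half of the kernel mass at x = 0 falls on the negative axis,
# so the estimate is typically around 0.43 instead of 1.
print(kde_at(0.0, data, h))
```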

Several ways of removing the boundary bias problem, each with its own advantages and disadvantages, are data reflection (Schuster 1985), simple nonnegative boundary correction (Jones and Foster 1996), boundary kernels (Muller 1991, 1993; Muller and Wang 1994), pseudodata generation (Cowling and Hall 1996), a hybrid method (Hall and Wehrly 1991), empirical transformation (Marron and Ruppert 1994), a local linear estimator (Lejeune and Sarda 1992; Jones 1993), data binning with local polynomial fitting on the bin counts (Cheng et al. 1997), and others. Most of them use symmetric kernel functions as usual and then modify their forms or transform the data.

Chen (2000) proposed a simple way to circumvent the boundary bias that appears in standard kernel density estimation. The remedy consists in replacing symmetric kernels with asymmetric gamma kernels, which never assign weight outside the support. In addition to satisfactory asymptotic features, Chen (2000) reported good finite-sample performance of this cure in a simulation study.

Let \(K(y;x,h)\) be an asymmetric function parameterized by x and h, called an “asymmetric kernel”. Then, the definition of the asymmetric kernel density estimator is

$$\begin{aligned} {\widehat{f}}(x)=\frac{1}{n}\sum _{i=1}^n K(X_i;x,h). \end{aligned}$$
(5)

Since the density of \(Gamma(xh^{-1}+1,h)\),

$$\begin{aligned} \frac{y^{\frac{x}{h}}e^{-\frac{y}{h}}}{\Gamma \left( \frac{x}{h}+1\right) h^{\frac{x}{h}+1}}, \end{aligned}$$
(6)

is an asymmetric function parameterized by x and h, it is natural to use it as an asymmetric kernel. Hence, Chen (2000) defined his first gamma kernel density estimator as

$$\begin{aligned} {\widehat{f}}_C(x)=\frac{1}{n}\sum _{i=1}^{n} \frac{X_i^{\frac{x}{h}}e^{-\frac{X_i}{h}}}{\Gamma \left( \frac{x}{h} +1\right) h^{\frac{x}{h}+1}}. \end{aligned}$$
(7)
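
A minimal sketch of Eq. (7) in Python, evaluated on the log scale for numerical stability; the function name is ours, and SciPy's gammaln is assumed to be available.

```python
import numpy as np
from scipy.special import gammaln

def chen_gamma_kde(x, data, h):
    """Chen's first gamma kernel density estimator, Eq. (7), at points x >= 0."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    a = x / h                                     # Gamma(x/h + 1, h) shape minus one
    logk = (a[:, None] * np.log(data[None, :])    # log of the gamma kernel at each X_i
            - data[None, :] / h
            - gammaln(a + 1)[:, None]
            - (a + 1)[:, None] * np.log(h))
    return np.exp(logk).mean(axis=1)              # average over the sample
```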

An intuitive way to see why Eq. (7) is a consistent estimator is as follows. Let Y be a \(Gamma(xh^{-1}+1,h)\) random variable with the pdf stated in Eq. (6); then,

$$\begin{aligned} E[{\widehat{f}}_C(x)]=\int _0^\infty f_X(y)K(y;x,h)\mathrm {d}y=E[f_X(Y)]. \end{aligned}$$

By Taylor expansion,

$$\begin{aligned} E[f_X(Y)]=f_X(x)+h\left[ f_X'(x)+\frac{1}{2}xf_X''(x)\right] +o(h), \end{aligned}$$

which will converge to \(f_X(x)\) as \(n\rightarrow \infty \). For a detailed theoretical explanation regarding the consistency of asymmetric kernels, see Bouezmarni and Scaillet (2005).
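
To see where the term of order h comes from, note that Y has mean \(x+h\) and variance \(xh+h^2\), so \(E[(Y-x)^2]=xh+2h^2\) and a second-order Taylor expansion of \(f_X\) about x gives

$$\begin{aligned} E[f_X(Y)]\approx f_X(x)+f_X'(x)E[Y-x]+\frac{1}{2}f_X''(x)E[(Y-x)^2]=f_X(x)+hf_X'(x)+\frac{1}{2}(xh+2h^2)f_X''(x), \end{aligned}$$

which matches the expansion above up to terms of order h.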

The bias and variance of Chen’s first gamma kernel density estimator are

$$\begin{aligned} Bias[{\widehat{f}}_C(x)]= & {} \left[ f_X'(x)+\frac{1}{2}xf_X''(x)\right] h+o(h),\end{aligned}$$
(8)
$$\begin{aligned} Var[{\widehat{f}}_C(x)]= & {} {\left\{ \begin{array}{ll} \frac{f_X(x)}{2\sqrt{\pi x}n\sqrt{h}}, &{} \frac{x}{h}\rightarrow \infty , \\ \frac{\Gamma (2c+1)f_X(x)}{2^{2c+1}\Gamma ^2(c+1)nh}, &{} \frac{x}{h}\rightarrow c, \end{array}\right. } \end{aligned}$$
(9)

for some \(c>0\). Since the result is quite similar, we do not discuss Chen’s second gamma kernel density estimator in this article; see Chen (2000) for reference.

Chen’s gamma kernel density estimator clearly solves the boundary bias problem because the gamma pdf is supported on the nonnegative half-line, so no weight is put on the negative axis. However, it also has some drawbacks:

  • The variance depends on a factor \(x^{-1/2}\) in the interior, which means the variance grows rapidly as x becomes small,

  • Zhang (2010) showed that the optimal MSE is \(O(n^{-2/3})\) when x is close to the boundary (worse than for the standard kernel density estimator).

In this article, we try to improve Chen’s estimator. Using a similar idea but with different parameters of the gamma density as a kernel function, we intend to reduce the variance. We then reduce the bias by a further modification based on expansions of the exponential and logarithmic functions. Hence, our modified gamma kernel density estimator is not only free of boundary bias, but its variance is also of smaller order both in the interior and near the boundary, compared with Chen’s method. As a result, the optimal orders of the MSE and the MISE are smaller as well. A simulation study is discussed in Section 3, and detailed proofs can be found in the appendices.

2 New type of gamma kernel density estimator formulation

Before starting our discussion, we need to impose the following assumptions:

A1. The bandwidth \(h>0\) satisfies \(h\rightarrow 0\) and \(nh\rightarrow \infty \) when \(n\rightarrow \infty \),

A2. The density \(f_X\) is three times continuously differentiable, and \(f_X^{(4)}\) exists,

A3. The integrals \(\int \left[ \frac{f_X'(x)}{f_X(x)}\right] ^2\mathrm {d}x\), \(\int x^4\left[ \frac{f_X''(x)}{f_X(x)}\right] ^2\mathrm {d}x\), \(\int x^2[f_X''(x)]^2\mathrm {d}x\), and \(\int x^6[f_X'''(x)]^2\mathrm {d}x\) are finite.

The first assumption is the usual assumption for the standard kernel density estimator. Since we will use exponential and logarithmic expansions, we need A2 to ensure the validity of our proofs. The last assumption is necessary to make sure we can calculate the MISE.

As stated above, our modification of the gamma kernel, intended to improve the performance of Chen’s method, starts by replacing the shape and scale parameters of the gamma density with suitable functions of x and h; the resulting kernel is what we call the new gamma kernel. Our purpose in doing this is to obtain a variance smaller than that of Chen’s method. After trying several combinations of functions, we chose the density of \(Gamma(h^{-1/2},x\sqrt{h}+h)\), which is

$$\begin{aligned} K(y;x,h)=\frac{y^{\frac{1}{\sqrt{h}}-1}e^{-\frac{y}{x\sqrt{h}+h}}}{\Gamma \left( \frac{1}{\sqrt{h}}\right) (x\sqrt{h}+h)^\frac{1}{\sqrt{h}}}, \end{aligned}$$
(10)

as a kernel, and we define the new gamma kernel density “estimator” as

$$\begin{aligned} A_h(x)=\frac{\sum _{i=1}^{n}X_i^{\frac{1}{\sqrt{h}}-1}e^{-\frac{X_i}{x \sqrt{h}+h}}}{n\Gamma \left( \frac{1}{\sqrt{h}}\right) (x\sqrt{h}+h)^\frac{1}{\sqrt{h}}}, \end{aligned}$$
(11)

where n is the sample size, and h is the bandwidth.
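
A minimal sketch of Eq. (11) in Python, again on the log scale; the function name A_h and the use of SciPy's gammaln are our own choices.

```python
import numpy as np
from scipy.special import gammaln

def A_h(x, data, h):
    """The pre-estimator of Eq. (11), built from the
    Gamma(1/sqrt(h), x*sqrt(h) + h) kernel of Eq. (10)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    shape = 1.0 / np.sqrt(h)                    # gamma shape parameter
    scale = x * np.sqrt(h) + h                  # gamma scale parameter (depends on x)
    logk = ((shape - 1.0) * np.log(data[None, :])
            - data[None, :] / scale[:, None]
            - gammaln(shape)
            - shape * np.log(scale)[:, None])
    return np.exp(logk).mean(axis=1)
```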

Remark 2.1

Even though the formula in Eq. (11) can work properly as a density estimator, it is not our proposed method (that is why we put quotation marks around the word “estimator”). As we will state later, we need a further modification of Eq. (11) before our proposed estimator is obtained.

Next, we derive the bias and variance formulas of \(A_h(x)\); see the following theorem.

Theorem 2.2

Assuming A1 and A2, for the function \(A_h(x)\) in Eq. (11), its bias and variance are

$$\begin{aligned} Bias[A_h(x)]=\left[ f_X'(x)+\frac{1}{2}x^2f_X''(x)\right] \sqrt{h}+o(\sqrt{h}) \end{aligned}$$
(12)

and

$$\begin{aligned} Var[A_h(x)]= {\left\{ \begin{array}{ll} \frac{R^2\left( \frac{1}{\sqrt{h}}-1\right) f_X(x)}{2(x+\sqrt{h}) \sqrt{\pi (1-\sqrt{h})}R\left( \frac{2}{\sqrt{h}}-2\right) nh^{\frac{1}{4}}} +O\left( \frac{h^\frac{1}{4}}{n}\right) , &{} \frac{x}{h}\rightarrow \infty \\ \frac{R^2\left( \frac{1}{\sqrt{h}}-1\right) f_X(x)}{2(c\sqrt{h}+1) \sqrt{\pi (1-\sqrt{h})}R\left( \frac{2}{\sqrt{h}}-2\right) nh^{\frac{3}{4}}} +O\left( \frac{1}{nh^\frac{1}{4}}\right) , &{} \frac{x}{h}\rightarrow c, \end{array}\right. } \end{aligned}$$
(13)

for some positive number c, and

$$\begin{aligned} R(z)=\frac{\sqrt{2\pi }z^{z+\frac{1}{2}}}{e^z\Gamma (z+1)}. \end{aligned}$$
(14)

Remark 2.3

The function R(z) (Brown and Chen 1999) increases monotonically with \(\lim _{z\rightarrow \infty }R(z)=1\) and \(R(z)<1\), which means \(\frac{R^2\left( \frac{1}{\sqrt{h}}-1\right) }{R\left( \frac{2}{\sqrt{h}}-2\right) }\le 1\). From these facts, we can conclude that \(Var[A_h(x)]\) is \(O(n^{-1}h^{-1/4})\) when x is in the interior, and it is \(O(n^{-1}h^{-3/4})\) when x is near the boundary. Both orders are smaller than those of the variance of Chen’s gamma kernel estimator in the corresponding cases. Furthermore, instead of \(x^{-1/2}\), \(Var[A_h(x)]\) depends on \((x+\sqrt{h})^{-1}\), which means the variance does not blow up as x approaches 0.
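
A quick numerical check of these properties of R(z) (our own illustration), using SciPy's gammaln:

```python
import numpy as np
from scipy.special import gammaln

def R(z):
    """R(z) of Eq. (14), computed on the log scale."""
    return np.exp(0.5 * np.log(2.0 * np.pi) + (z + 0.5) * np.log(z)
                  - z - gammaln(z + 1.0))

# R(z) stays below 1 and increases toward 1 (approximately 1 - 1/(12z)):
for z in (1.0, 5.0, 50.0, 500.0):
    print(z, R(z))   # roughly 0.922, 0.983, 0.9983, 0.99983
```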

Even though we have succeeded in reducing the order of the variance, we now encounter a larger order of bias (\(\sqrt{h}\) instead of h). To avoid this problem, we use geometric extrapolation to bring the order of the bias back to h.

Theorem 2.4

Let \(A_h(x)\) be the function in Eq. (11). Assuming A1 and A2, if we define \(J_h(x)=E[A_h(x)]\), then

$$\begin{aligned}{}[J_h(x)]^2[J_{4h}(x)]^{-1}=f_X(x)+O(h). \end{aligned}$$
(15)

Remark 2.5

The function \(J_{4h}(x)\) is the expectation of the function in Eq. (11) with 4h as the bandwidth. Furthermore, the term after \(f_X(x)\) in Eq. (15) is of order h, which is the same as the order of the bias of Chen’s gamma kernel density estimator. This theorem leads us to the idea of modifying \(A_h(x)\). We present the explicit asymptotic form of the O(h) term in the appendices.

Theorem 2.4 gives us the idea to modify \(A_h(x)\) and to define our new estimator. Hence, we propose

$$\begin{aligned} {\widetilde{f}}_X(x)=[A_h(x)]^2[A_{4h}(x)]^{-1} \end{aligned}$$
(16)

as the modified gamma kernel density estimator, which is our proposed method. The idea is straightforward: it uses the fact that the expectation of a function of two statistics is asymptotically equal (in probability) to the same function of their expectations. Though we do not use any concept of convergence in probability in our proofs, the idea is still applicable when using Taylor expansion.
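
A sketch of Eq. (16) in Python, reusing the function A_h from the sketch after Eq. (11); the function name and the bandwidth value are ours.

```python
import numpy as np

def modified_gamma_kde(x, data, h):
    """The proposed estimator of Eq. (16): geometric extrapolation of A_h.
    A_h is the pre-estimator sketched after Eq. (11)."""
    return A_h(x, data, h) ** 2 / A_h(x, data, 4.0 * h)

# usage: estimate an Exp(1) density on a grid of nonnegative points
rng = np.random.default_rng(2)
sample = rng.exponential(size=50)
grid = np.linspace(0.01, 5.0, 200)
fhat = modified_gamma_kde(grid, sample, h=0.1)   # h = 0.1 is illustrative only
```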

For the bias of our proposed estimator, we have the following theorem.

Theorem 2.6

Assuming A1 and A2, the bias of the modified gamma kernel density estimator is

$$\begin{aligned} Bias[{\widetilde{f}}_X(x)]=-2\left[ b(x)-\frac{a(x)}{2f_X(x)}\right] h +o(h)+O\left( \frac{1}{nh^{\frac{1}{4}}}\right) , \end{aligned}$$
(17)

where

$$\begin{aligned} a(x)= & {} f_X'(x)+\frac{1}{2}x^2f_X''(x), \end{aligned}$$
(18)
$$\begin{aligned} b(x)= & {} \left( x+\frac{1}{2}\right) f_X''(x)+x^2\left( \frac{x}{3} +\frac{1}{2}\right) f_X'''(x). \end{aligned}$$
(19)

As expected, the leading term of the bias is actually the same as the explicit form of the O(h) term in Theorem 2.4 (see appendices). Its order of convergence is back to h, the same as the bias of Chen’s method. This is an important gain, because if we can keep the order of the variance the same as that of \(Var[A_h(x)]\), we can conclude that the MSE of our modified gamma kernel density estimator is smaller than that of Chen’s gamma kernel estimator. However, before turning to the calculation of the variance, we need the following theorem.

Theorem 2.7

Assuming A1 and A2, for the function in Eq. (11) with bandwidth h, namely \(A_h(x)\), and with bandwidth 4h, namely \(A_{4h}(x)\), their covariance equals

$$\begin{aligned} Cov[A_h(x),A_{4h}(x)]= & {} \frac{R\left( \frac{1}{\sqrt{h}}-1\right) R \left( \frac{1}{2\sqrt{h}}-1\right) }{2\sqrt{\pi }R\left( \frac{3}{2\sqrt{h}} -2\right) (3x+5\sqrt{h})}\frac{\left( \frac{3}{2}-2\sqrt{h}\right) ^{\frac{3}{2\sqrt{h}} -\frac{3}{2}}}{(2-2\sqrt{h})^{\frac{1}{\sqrt{h}}-\frac{1}{2}}(1 -2\sqrt{h})^{\frac{1}{2\sqrt{h}}-\frac{1}{2}}}\\&\times \left( \frac{x+\sqrt{h}}{3x+5\sqrt{h}}\right) ^{\frac{1}{2 \sqrt{h}}-1}\left( \frac{2x+4\sqrt{h}}{3x+5\sqrt{h}}\right) ^{\frac{1}{\sqrt{h}} -1}\frac{f_X(x)}{nh^{\frac{1}{4}}}+O\left( \frac{h^\frac{1}{4}}{n}\right) , \end{aligned}$$

when \(xh^{-1}\rightarrow \infty \), and

$$\begin{aligned} Cov[A_h(x),A_{4h}(x)]=\frac{R\left( \frac{1}{\sqrt{h}}-1\right) R \left( \frac{1}{2\sqrt{h}}-1\right) }{2\sqrt{\pi }R\left( \frac{3}{2\sqrt{h}} -2\right) (3c\sqrt{h}+5)}\frac{\left( \frac{3}{2}-2\sqrt{h}\right) ^{\frac{3}{2 \sqrt{h}}-\frac{3}{2}}}{(2-2\sqrt{h})^{\frac{1}{\sqrt{h}} -\frac{1}{2}}(1-2\sqrt{h})^{\frac{1}{2\sqrt{h}}-\frac{1}{2}}}\\ \times \left( \frac{c\sqrt{h}+1}{3c\sqrt{h}+5}\right) ^{\frac{1}{2\sqrt{h}}-1} \left( \frac{2c\sqrt{h}+4}{3c\sqrt{h}+5}\right) ^{\frac{1}{\sqrt{h}}-1} \frac{f_X(x)}{nh^{\frac{3}{4}}}+O\left( \frac{1}{nh^\frac{1}{4}}\right) , \end{aligned}$$

when \(xh^{-1}\rightarrow c>0\).

Theorem 2.8

Assuming A1 and A2, the variance of the modified gamma kernel density estimator is

$$\begin{aligned} Var[{\widetilde{f}}_X(x)]=4Var[A_h(x)]+Var[A_{4h}(x)]-4Cov[A_h(x),A_{4h}(x)] +o\left( \frac{1}{nh^{\frac{1}{4}}}\right) , \end{aligned}$$
(20)

where its orders of convergence are \(O(n^{-1}h^{-1/4})\) in the interior and \(O(n^{-1}h^{-3/4})\) in the boundary region.

Combining Theorems 2.6 and 2.8 with the MSE decomposition in Eq. (4), we have

$$\begin{aligned} MSE[{\widetilde{f}}_X(x)]= {\left\{ \begin{array}{ll} O(h^2)+O\left( \frac{1}{nh^{\frac{1}{4}}}\right) , &{} \frac{x}{h}\rightarrow \infty ,\\ O(h^2)+O\left( \frac{1}{nh^{\frac{3}{4}}}\right) , &{} \frac{x}{h}\rightarrow c. \end{array}\right. } \end{aligned}$$
(21)

The theoretical optimum bandwidths are \(h=O(n^{-4/9})\) in the interior and \(h=O(n^{-4/11})\) in the boundary region. As a result, the optimum orders of convergence are \(O(n^{-8/9})\) and \(O(n^{-8/11})\), respectively. Both of them are smaller than the optimum orders of Chen’s estimator, which are \(O(n^{-4/5})\) in the interior and \(O(n^{-2/3})\) in the boundary region. Furthermore, since the MISE is just the integration of MSE, it is clear that the orders of convergence of the MISE are the same as of the MSE.
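
For instance, the interior rate follows from balancing the squared-bias and variance orders in Eq. (21):

$$\begin{aligned} h^2\asymp \frac{1}{nh^{\frac{1}{4}}}\;\Rightarrow \; h\asymp n^{-\frac{4}{9}},\qquad MSE[{\widetilde{f}}_X(x)]\asymp h^2\asymp n^{-\frac{8}{9}}, \end{aligned}$$

and the boundary case is analogous with \(h^{-\frac{3}{4}}\) in place of \(h^{-\frac{1}{4}}\), giving \(h\asymp n^{-4/11}\) and an MSE of order \(n^{-8/11}\).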

Calculating the explicit formula of \(MISE({\widetilde{f}}_X)\) is nearly impossible because of the complexity of the formulas of \(Bias[{\widetilde{f}}_X(x)]\) and \(Var[{\widetilde{f}}_X(x)]\). However, there is one point worth discussing here. By an argument similar to that of Chen (2000), the contribution of the boundary region to the integrated variance is negligible. Thus, instead of computing \(\int _{boundary}Var[{\widetilde{f}}_X(x)]\mathrm {d}x+\int _{interior}Var[{\widetilde{f}}_X(x)]\mathrm {d}x\), it is sufficient to calculate \(\int _0^\infty Var[{\widetilde{f}}_X(x)]\mathrm {d}x\) using the interior formula of the variance. With that, computing

$$\begin{aligned} MISE({\widetilde{f}}_X)=\int _0^\infty Bias^2[{\widetilde{f}}_X(x)]\mathrm {d}x+\int _0^\infty Var[{\widetilde{f}}_X(x)]\mathrm {d}x \end{aligned}$$

can be approximated by using numerical methods (assuming \(f_X\) is known).
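
As one hedged illustration of such a numerical approximation (ours, not the procedure used in the paper), the MISE of any estimator can be approximated by averaging the integrated squared error over Monte Carlo replications when \(f_X\) is known:

```python
import numpy as np

def approx_mise(estimator, true_pdf, sampler, n, grid, n_rep=1000, seed=0):
    """Monte Carlo approximation of the MISE of `estimator` on a uniform grid."""
    rng = np.random.default_rng(seed)
    dx = grid[1] - grid[0]                         # uniform grid spacing
    ise = np.empty(n_rep)
    for r in range(n_rep):
        data = sampler(rng, n)                     # one simulated sample
        err2 = (estimator(grid, data) - true_pdf(grid)) ** 2
        ise[r] = np.sum(err2) * dx                 # integrated squared error
    return ise.mean()

# usage with Exp(1) data and the modified estimator sketched earlier:
# approx_mise(lambda x, d: modified_gamma_kde(x, d, h=0.1),
#             lambda x: np.exp(-x),
#             lambda rng, n: rng.exponential(size=n),
#             n=50, grid=np.linspace(0.01, 8.0, 400))
```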

3 Simulation study

In this section, we present the results of a simulation study conducted to assess the performance of our proposed method and to compare it with other estimators. The measures of error we use in this article are the MISE, the MSE, the bias, and the variance. Since we are working under assumptions A1, A2, and A3, the MISE of our proposed estimator is finite. We calculated the average integrated squared error (AISE), the average squared error (ASE), the simulated bias, and the simulated variance, with a sample size of \(n=50\) and 10000 repetitions for each case.

We compared four gamma kernel density estimators: Chen’s gamma kernel density estimator \({\widehat{f}}_C(x)\) in Chen (2000), two nonnegative bias-reduced Chen’s gamma estimators \({\widehat{f}}_{KI1}(x)\) and \({\widehat{f}}_{KI2}(x)\) (Igarashi and Kakizawa 2015, Eq. 10 and 11), and our modified gamma kernel density estimator \({\widetilde{f}}_X(x)\). We generated samples from several distributions for this study: the exponential distribution exp(1 / 2), the gamma distribution Gamma(2, 3), the log-normal distribution log.N(0, 1), the inverse Gaussian distribution IG(1, 2), the Weibull distribution Weibull(3, 2), and the absolute normal distribution abs.N(0, 1). The least-squares cross-validation technique was used to determine the bandwidths, as sketched below.
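
A hedged sketch of least-squares cross-validation (our own generic implementation; the paper does not specify these details): the criterion \(CV(h)=\int {\widehat{f}}_h^2(x)\mathrm {d}x-\frac{2}{n}\sum _{i=1}^n{\widehat{f}}_{h,-i}(X_i)\) is minimized over a grid of candidate bandwidths, where \({\widehat{f}}_{h,-i}\) denotes the estimator computed without \(X_i\).

```python
import numpy as np

def lscv_bandwidth(estimator, data, h_grid, x_grid):
    """Pick the bandwidth minimizing the least-squares cross-validation score.
    `estimator(x, data, h)` can be any of the estimators sketched above."""
    n = len(data)
    dx = x_grid[1] - x_grid[0]                     # uniform grid spacing
    scores = []
    for h in h_grid:
        fhat = estimator(x_grid, data, h)
        int_f2 = np.sum(fhat ** 2) * dx            # approximates the integral of fhat^2
        loo = np.array([estimator(np.array([data[i]]),
                                  np.delete(data, i), h)[0]
                        for i in range(n)])        # leave-one-out values at each X_i
        scores.append(int_f2 - 2.0 * loo.mean())
    return h_grid[int(np.argmin(scores))]
```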

Table 1 Comparison of the average integrated squared error (\(\times 10^5\))
Table 2 Comparison of the average squared error (\(\times 10^5\)) when \(x=0.01\)
Table 3 Comparison of the bias (\(\times 10^4\)) when \(x=0.01\)
Table 4 Comparison of the variance (\(\times 10^5\)) when \(x=0.01\)
Fig. 1 Comparison of point-wise bias, variance, and ASE of \({\widetilde{f}}_X(x)\), \({\widehat{f}}_C(x)\), \({\widehat{f}}_{KI1}(x)\), and \({\widehat{f}}_{KI2}(x)\) for estimating density of exp(1 / 2) with sample size \(n=150\)

Table 1 compares the AISEs, representing the general measure of error. As we can see, the proposed method outperformed the other estimators by giving the smallest errors (numbers in bold). Since one of our main concerns is eliminating the boundary bias problem, it is necessary to turn our attention to the values of the measures of error in the boundary region. Tables 2, 3, and 4 show the ASE, bias, and variance of those four estimators when \(x=0.01\). Once again, our estimator had the best results (bold numbers). Though the differences among the bias values were relatively small (Table 3), Table 4 clearly shows the effect of our variance reduction.

Fig. 2 Comparison of the point-wise bias, variance, and ASE of \({\widetilde{f}}_X(x)\), \({\widehat{f}}_C(x)\), \({\widehat{f}}_{KI1}(x)\), and \({\widehat{f}}_{KI2}(x)\) for estimating density of Gamma(2, 3) with sample size \(n=150\)

Fig. 3 Comparison of the point-wise bias, variance, and ASE of \({\widetilde{f}}_X(x)\), \({\widehat{f}}_C(x)\), \({\widehat{f}}_{KI1}(x)\), and \({\widehat{f}}_{KI2}(x)\) for estimating density of abs.N(0, 1) with sample size \(n=150\)

As further illustrations, we also provide graphs of the point-wise ASE, bias, squared bias, and variance to compare our estimator’s performance with that of the others. We generated samples from the exponential, gamma, and absolute normal distributions 1000 times to produce Figs. 1, 2, and 3.

In some cases, we found that the bias of our proposed estimator was farther from 0 than that of the other estimators (e.g., Fig. 1a around \(x=1\), Fig. 2a around \(x=4\), and Fig. 3a around \(x=0.2\)). Though this could reflect poorly on the proposed estimator, the variance panels (Figs. 1b, 2b, and 3b) show that our estimator always gave the smallest variance, confirming that we succeeded in reducing the variance with our method. Moreover, this variance reduction carries over to the point-wise ASE itself, as shown in Figs. 1d, 2d, and 3d. One may take note of Fig. 2d when \(x\in [1,4]\), where the estimators of Igarashi and Kakizawa (2015) slightly outperformed the proposed method. However, as x grew larger, \(ASE[{\widehat{f}}_{KI1}(x)]\) and \(ASE[{\widehat{f}}_{KI2}(x)]\) were slow to approach 0 (they do when x is large enough), while \(ASE[{\widetilde{f}}_X(x)]\) approached 0 almost immediately.

4 Conclusion

We proposed a new estimator for the density function of nonnegative data. First, we defined a “pre-estimator” by choosing suitable gamma kernel parameters to reduce the order of the variance, and we then used geometric extrapolation to bring the order of the bias from \(O(\sqrt{h})\) back to O(h). As a result, the MISE of the proposed method is smaller than that of Chen’s gamma kernel estimator. Moreover, the results of a simulation study demonstrated the superior performance of our proposed method. Establishing new estimators based on this kernel for other functions, such as the distribution function or the hazard rate function, would be a promising direction for future study.