Abstract
The integral \({\int }_{0}^{\infty }x^{m} e^{-\frac {1}{2}(x-a)^{2}}dx\) appears in likelihood ratios used to detect a change in the parameters of a normal distribution. As part of the mth moment of a truncated normal distribution, this integral is known to satisfy a recursion relation, which has been used to calculate the first four moments of a truncated normal. Use of higher order moments was rare. In more recent times, this integral has found important applications in methods of changepoint detection, with m going up to the thousands. The standard recursion formula entails numbers whose values grow quickly with m, rendering a low cap on computational feasibility. We present various aspects of dealing with the computational issues: asymptotics, recursion and approximation. We provide an example in a changepoint detection setting.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Barr D R, Sherrill E T (1999) Mean and variance of truncated normal distributions. Amer Statist 53:357–361
Burkardt J (2014) The truncated normal distribution. https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf
Cook JD (2010) Upper bounds on non-central chi-square tails and truncated normal moments. https://www.johndcook.com/non_central_chi_square.pdf
Dhrymes P J (2005) Moments of truncated (normal) distributions. http://www.columbia.edu/~pjd1/papers.html
Gordon L, Pollak M (1995) A robust surveillance scheme for stochastically ordered alternatives. Ann Statist 23:1350–1375
Horrace WC (2015) Moments of the truncated normal distribution. J Prod Anal 43:133–138
Krieger AM, Pollak M, Yakir B (2003) Surveillance of a simple linear regression. J Amer Statist Assoc 98:1–15
Lee A (1914) Table of the Gaussian “tail” tunctions; when the “tail” is larger than the body. Biometrika 10:208–214
Liquet B, Nazarathy Y (2015) A dynamic view to moment matching of truncated distributions. Statist Prob Letts 104(53):87–93
MATLAB (2014) MATLAB and statistics toolbox release 2014b. The Mathworks Inc., Natick
Toolbox (2016) MATLAB multiprecision computing toolbox (ADVANPIX Release 2016). The MathWorks, Inc., Natick. http://www.advanpix
O’Connor AN (2011) Probability distributions used in reliability engineering. Reliability information analysis center (RIAC)
Pearson K, Lee A (1908) On the generalized probable error in multiple normal correlation. Biometrika 6:59–68
Pollak M (1987) Average run lengths of an optimal method of detecting a change in distribution. Ann Statist 15:749–779
Pollak M, Siegmund D (1991) Sequential detection of a change in a normal mean when the initial value is unknown. Ann Statist 19:394–416
Pollak M, Croarkin C, Hagwood C (1993) Surveillance schemes with application to mass calibration. NISTIR 5158 Technical Report, Statistical Division, The National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
Quesenberry C (1991) SPC Q processes for start-up processes and short or long runs. J Qual Tech 23:213–224
van Dobben de Bruyn CS (1968) Cumulative Sum Tests: Theory and Practice. Griffin, London
Acknowledgements
This work was supported by grant 1450/13 from the Israel Science Foundation and by the Marcy Bogen Chair of Statistics at the Department of Statistics, The Hebrew University of Jerusalem. The authors would like to thank the referees for their salient comments, which greatly improved the paper.
Author information
Authors and Affiliations
Corresponding author
Appendices
Appendix A: Proof of Theorem 1
The case a = 0 is trivial. In the following we assume a≠ 0. Recall that
Hence by Stirling’s approximation
and so
We break the sum in Eq. 1 into pieces and analyze them separately.
Let γ > e and let \(M=a\sqrt {m}\gamma \). We first show that the sum from M to \(\infty \) in Eq. 1 is negligible. Again, by Stirling’s approximation
and so
Hence
For large enough m and \(j=M=|a|\sqrt {m}\gamma \), this expression equals
Let 0 < ε. For j ≥ M the derivative with respect to j of
equals
It follows that as \(m\rightarrow {\infty }\)
For j ≤ M
It follows that
Note that if X ∼ Poisson(λ) then \(P(X>y)\leq \frac {Ee^{tX}}{e^{ty}}=\frac {e^{\lambda (e^{t}-1)}}{e^{ty}}\) for all t > 0, which is minimal at \(t=\log (\frac {y}{\lambda })\); hence \(\log (P(X>y))<{e^{y-\lambda }}/{(\frac {y}{\lambda })^{y}}\). Recall that \(\frac {1}{2}<\eta <1\) and denote \(\lambda =a\sqrt {m}+\frac {1}{4}\gamma a^{2}\) , \(y=a\sqrt {m}+(a\sqrt {m})^{\eta }\). Thus
If X ∼ Poisson(λ) then \(P(X<y)\leq \frac {Ee^{-tX}}{e^{-ty}}=\frac {e^{\lambda (e^{-t}-1)}}{e^{-ty}}\) for all t > 0, which is minimal at \(t=-\log (\frac {y}{\lambda })\); hence \(\log (P(X<y))<{e^{y-\lambda }}/{(\frac {y}{\lambda })^{y}}\). Hence for \(j<a\sqrt {m}-(a\sqrt {m})^{\eta }\)
Finally, for \( a\sqrt {m}-(a\sqrt {m})^{\eta }\leq j\leq a\sqrt {m}+(a\sqrt {m})^{\eta }\)
Hence \(\frac {j^{2}}{m}=a^{2}[1+O((a\sqrt {m})^{\eta -1})]\) and
Combining Equations (1)–(6) accounts for Theorem 1.
Appendix B: Computing
This is a MATLAB (2014) program for calculating
Input: m > 2,a,c. Output: psi =\( \psi _{a,c}=\frac {{\int }_{c}^{\infty } x^{m} e^{-\frac {1}{2}(x-a)^{2}}dx}{ {\int }_{0}^{\infty } x^{m} e^{-\frac {1}{2}x^{2}}dx}e^{-a \sqrt {m}}\).
Remark 1
The program above is intended to cover all cases of c, includingc = 0.When n becomes large, \(\log ({\Gamma }(n/2))\)is calculated more precisely than Γ(n/2),so that when c≠ 0the last summands in both psi(n)and xi(n)should be completely calculated first on the log scale and only then exponentiated.
Appendix C: A Sequential Changepoint Detection Context
The need for calculating ψ0,a(m) appears in a number of changepoint problems (cf. Pollak et al. 1993 and Krieger et al. 2003). For example, consider independent normally distributed random variables observed sequentially whose mean may (or may not) increase at an unknown time ν and one monitors the sequence to detect such a change. Formally,
where X1,X2,… are independent. Consider the case that μ and σ are unknown and one considers an increase of δ standard deviations to be of import. Define:
The sequence {Zi} is invariant with respect to the unknown parameters μ and σ. Therefore, the likelihood ratio
of the first n − 1 invariant statistics Z2,…,Zn (for a change occurring at the ν = kth observation vs. no change ever occurring) can be calculated. Once done, the Shiryaev–Roberts statistic
can be used to declare that a change is in effect; an alarm would be raised if Rn crosses a pre-specified threshold A.
Since one cannot differentiate between the case that there is no change ever and the case that a change occurred at the very beginning, necessarily \({{\Lambda }_{1}^{n}}= 1\). Clearly,
Letting
a lengthy calculation (similar to that in Pollak et al. 1993) yields for 2 ≤ k ≤ n
(Note that \(\frac {g_{0,a}(m)}{g_{0,0}(m)}=\psi _{0,a}(m)e^{a\sqrt {m}}\). For the sake of precision, it is advisable to first calculate the log of the components of \({{\Lambda }_{k}^{n}}(\delta )\) and then exponentiate their sum.)
As an example where m can be very large, we consider the list of weights at birth of 196710 babies born at a large hospital in Israel between 13/10/2004 and 18/5/2017 whose weight at birth was between 2000 and 5000 grams. Figure 1 is a histogram of the data. A normal distribution seems to fit the data well.
During the last decades, worldwide, obesity and macrosomia have been on the rise. It would have been reasonable to monitor the weight of newborn infants for an increase in mean. Since a rise could be gradual, it would be reasonable to start by trying to detect a small change — for example’s sake, we choose δ = 0.1, an increase of one tenth of a standard deviation (ca. 46 grams). Figure 2 presents the sequence of surveillance statistics Rn for the first 5000 observations and Fig. 3 for the first 65000 observations.
The choice of the threshold A is made in light of the risk one is willing to take regarding a false alarm. Suppose one were willing to tolerate a false alarm on the average once in 20 years. With an average of roughly 15000 babies born each year, this would mean a false alarm on the average once in 300000 observations. Using a renewal-theoretic approximation for the ARL2FA (Pollak 1987), this means that the cutoff level A should be 300000/1.06 ∼ 283000. It is clear from Figs. 2 and 3 that such a change would not been detected within the first 65000 observations.
Figure 4 presents the sequence of surveillance statistics Rn for the entire sequence. Clearly, Rn exceeds A = 283000 a short while after the 70000th observation. In fact,the 70440th is the first to carry Rn over the threshold. This means that it would have been declared after the 70440th newborn that the mean weight has increased. Figure 5 depicts the loglikelihood function \(\log {\Lambda }^{70440}_{k}(\delta = 0.1)\), k = 1,…, 70440. This function attains its maximum at k = 65834, so 65834 can be regarded at the time of stopping (70400) as the maximum likelihood estimate of the changepoint. This could be interpreted as the increase being detected 4606 observations (ca. 4 months) after its occurrence.
In fact, the average weight of the first 65833 newborns is 3286 grams, whereas the average weight of newborns #65834 − #67000 is 3332 grams (an increase of approximately 0.1 standard deviations).
Continuing with the ensuing newborns, Rn drops again, and until the end of the sequence does not cross the A = 283000 level again. In fact, the average weight of the newborns from babies #67001 − 196710 is 3291 grams. So, it seems as if the increase was either temporary, or apparent. Was the alarm false? After observing 67000 newborns, it would have seemed that the increase was real. With hindsight, one may either believe that the increase was real, but a change (a decrease) took place thereafter and the mean weight reverted to its original level, or one may interpret the episode as having been a false alarm.
The calculation of Rn was done in the following way. R1,…,R5000 were calculated using the recursions of Section ??; ensuing Rn′s were calculated by applying Theorem 1. Rather than calculating \(\frac {g_{0,\delta U_{k}(n)}(n-2)}{g_{0,0}(n-2)}\) by the recursions separately for each δUk(n), calculation time drops immensely by creating a fine enough grid of \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\) and interpolating. Creation of such a grid (in our example for n = 1,…, 5000 and approximately 500 values of x, spaced so that between adjacent x’s the function \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\) is almost perfectly linear). This grid is calculated much faster by the recursions of Section ?? than by the method of Section ?? (after all, the recursion that generated \(\frac {g_{0,x}(5000)}{g_{0,0}(5000)}\) created all of \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\), n = 1,…, 5000 along the way, whereas by the method of Section ?? each \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\) has to be calculated separately.)
Appendix D: Remarks
Remark 1
A lower bound can be obtained by Jensen’s inequality:
Remark 2
Cook (2010) presented an upper bound
for c > 0.On the log scale, Cook’s upper bound has an asymptotic(\(m \rightarrow \infty \)) order of magnitudem2, whereas Theorem 1 positsan order of magnitude \(m\log {m}\).
Remark 3
A similar type of analysis can be done for\({\int }_{-\infty }^{\infty }x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\)(cf. Pollak et al. 1993). Note that
and (unless m = 0)that (depending on a) one of the two integrals is asymptotically negligible with respectto the other. Thus Theorem 1 can be applied to obtain an approximation to themthmoment of a (non-truncated) normal distribution.
Remark 4
A similar type of analysis can be done when the truncation is from both ends. To see this, it sufficesto consider \({{\int }_{0}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\)for c > 0.Clearly, \(\frac {{{\int }_{0}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx}{{{\int }_{0}^{c}} x^{m}dx}\rightarrow e^{-\frac {1}{2}(c-a)^{2}}\)as\(m \rightarrow \infty \). It followsthat
Without loss of generality,assume that c > |b|. A recursionfor \({{\int }_{b}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\)that builds on thisis
where\(h_{a,b,c}(m)=\frac {m + 1}{c^{m + 1}}{{\int }_{b}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\). (Notethat for |b| > c > b,\({{\int }_{b}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx= (-1)^{m} {\int }_{-c}^{-b} x^{m}e^{-\frac {1}{2}(x+a)^{2}}dx\).)
Remark 5
A similar recursion can be constructed for densities proportional to\(e^{{const} \times x^{2}}I(b<x<c)\)whereconst > 0andb,c are finite.
Remark 6
Tables relevant to the truncated normal distribution have been around for over a century(cf. Pearson and Lee 1908and Lee 1914). Even today it is considered as part and parcelof applied distributions (cf. O’Connor 2011) and papers have been devoted to estimation ofits parameters (cf. Barr and Sherrill 1999; Horrace 2015; Liquet and Nazarathy 2015). Formost practical purposes, the first four moments of the distribution (mean, variance, skewnessand kurtosis) have been of applied interest. Higher order moments may appear in quadraturemethods (cf. Burkardt 2014); their order of magnitude would seem to be in the tens at most.
Rights and permissions
About this article
Cite this article
Pollak, M., Shauly-Aharonov, M. A Double Recursion for Calculating Moments of the Truncated Normal Distribution and its Connection to Change Detection. Methodol Comput Appl Probab 21, 889–906 (2019). https://doi.org/10.1007/s11009-018-9622-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11009-018-9622-7