1 Introduction

The normal model, although very attractive, is not always appropriate for fitting a dataset, especially when the data present extreme or outlying observations. To address this problem, regression models that are less easily affected by extreme or outlying observations have been developed in the statistical literature. The symmetric family of distributions provides an extension of the normal distribution that includes distributions with both heavier and lighter tails than the normal, such as the Cauchy, Student-t, generalized Student-t, logistic I and II, generalized logistic, power exponential, Kotz and generalized Kotz distributions, among others. This family provides a vast source of alternative models for analyzing data containing outlying observations, and these models have been widely studied in the statistical literature. Several recent articles consider symmetric distributions; see Cordeiro et al. (2000), Cordeiro (2004), Cysneiros et al. (2010a), Cysneiros et al. (2010b), Vanegas et al. (2013), and Maior and Cysneiros (2016). Further details about the symmetric family of distributions can be found in Fang et al. (1990) and Fang and Anderson (1990).

When the dispersions are not constant over the observations, the inference strategies for the regression parameters are different. Thus, it is extremely important to test whether variable dispersion is present in the data. One way to do this is to model the dispersion parameter as a function of regressors and an unknown parameter vector in such a way that, for a specific value of the parameter vector, the function corresponds to constant dispersion. Following this approach, one can formulate a hypothesis test in which the null hypothesis leads to constant dispersion. To that end, the likelihood ratio test is commonly used. Under the null hypothesis, the likelihood ratio statistic (LR) is asymptotically chi-square (\(\chi ^2\)) distributed up to an error of order \(n^{-1},\) where n is the sample size. However, it is well known that for small samples this test can yield very distorted rejection rates, because the \(\chi ^2\) approximation to the null distribution of the LR statistic is not accurate. One way to improve this approximation, and consequently reduce the distortion, is to multiply the LR statistic by a Bartlett correction factor (Bartlett 1937), a method later generalized by Lawley (1956). The resulting statistic has a \(\chi ^2_k\) null distribution up to an error of order \(n^{-2}\), where k is the difference between the dimensions of the parameter spaces under the two hypotheses being tested.

Another factor that must be considered is the presence of nuisance parameters, which can have a profound impact on inference, and many approaches have been proposed to eliminate or reduce their impact. In the presence of nuisance parameters, it is generally feasible to perform inference based on the profile likelihood, obtained by replacing the nuisance parameters in the likelihood function with consistent estimates, resulting in a function that depends only on the parameters of interest. The profile likelihood has some properties of the usual likelihood (Pace and Salvan 1997, Chapter 4), but for problems with large numbers of nuisance parameters this procedure results in inconsistent or inefficient estimates. Another problem caused by the number of nuisance parameters is the poor \(\chi ^2\) approximation to the distribution of the LR statistic. To reduce the effect of the nuisance parameters, Cox and Reid (1987, 1992) proposed a modified profile likelihood. The modified profile likelihood ratio statistic (\(LR_m\)) has an asymptotic \( \chi ^ 2_k\) null distribution up to an error of order \(n^{-1}\), and it can also be Bartlett corrected (DiCiccio and Stern 1994), producing more accurate inference, as can be seen in Ferrari et al. (2004, 2005), Cysneiros and Ferrari (2006) and Melo et al. (2009).

To improve the large-sample \(\chi ^2\) approximation to the null distribution of the LR and \(LR_m\) statistics in many parametric models, the Bartlett correction is widely used. Although the Bartlett correction factors are somewhat complex to obtain, they can be readily implemented in computer programs. Moreover, this is a worthwhile practice, since Bartlett corrections generally provide a considerable improvement. In recent years, interest in Bartlett corrections has resurfaced and some articles have been published considering this issue. See for example Fujita et al. (2010), Lemonte et al. (2012), Bayer and Cribari-Neto (2013), Stein et al. (2014).

The main purpose of this article is to derive Bartlett correction factors to improve inference about the dispersion parameter based on the likelihood ratio statistic and the modified profile likelihood ratio statistic in the class of heteroscedastic symmetric nonlinear models (HSNLM), a class of models proposed by Cysneiros et al. (2010a), when the number of observations available is small. To that end, we follow the approach of modeling the dispersion parameter vector as a function of regressors and unknown parameters such that under the null hypothesis the function is constant, that is, the null hypothesis leads to the symmetric nonlinear regression model. Our results extend some of those obtained in Cordeiro (2004), since we consider a regression structure for the dispersion parameter, whereas that work assumes the dispersion parameter is a scalar in the class of symmetric nonlinear regression models. We also extend the results obtained in Ferrari et al. (2004), who improved likelihood-based tests for heteroscedasticity in linear regression models. In this work, we also consider the bootstrap Bartlett correction introduced by Rocke (1989) as a numerical alternative to the analytical Bartlett correction. A Monte Carlo simulation study is performed to evaluate the performance of the corrected tests and their uncorrected versions. We expected the proposed tests to deliver more trustworthy inference in small samples, and our Monte Carlo simulation results show that this is indeed the case. That is, the corrected likelihood ratio and corrected modified profile likelihood ratio tests proposed here are attractive alternatives to the usual likelihood ratio test in the HSNLM class when the sample size is small. We are unaware of any simulation study in the statistical literature comparing the performance of the proposed tests in HSNLM; this paper fills that gap.

The article is organized as follows. In Sect. 2, we define the model and present some inferential aspects such as estimation and hypothesis testing of regression parameters. In Sect. 3, we discuss Bartlett corrections to improve the usual likelihood ratio and modified profile likelihood ratio tests in HSNLM. We also present the bootstrap Bartlett correction for likelihood ratio statistic. A Monte Carlo simulation is performed in Sect. 4 to evaluate the performance of the studied tests. An application using real data is considered in Sect. 5. Conclusions about the results obtained are presented in Sect. 6. Finally, appendices with technical details are presented at the end.

2 Heteroscedastic symmetric nonlinear models

Let \(y_1,\ldots ,y_n\) be n independent random variables. Each \(y_\ell ,\,\,\ell =1,\ldots ,n,\) follows a continuous symmetric distribution with location parameter \(\mu _\ell \in {\mathbb {R}}\) and dispersion parameter \(\phi _\ell >0\) if its probability density function is of the form

$$\begin{aligned} \pi (y_\ell ;\mu _\ell ,\phi _\ell )=\frac{1}{\sqrt{\phi _\ell }}g(u_\ell ), \quad y_\ell \in {\mathbb {R}} \end{aligned}$$
(1)

where \(g:{\mathbb {R}} \rightarrow [0, \infty )\) is generally known as the density generator, such that \(\int _0^{\infty }g(u)du<\infty ,\) with \(u_\ell =(y_\ell -\mu _\ell )^2/\phi _\ell \). In what follows, we will denote \(y_\ell \sim S(\mu _\ell ,\phi _\ell ,g).\) The density generator \(g(\cdot )\) for some symmetric distributions is given in Table 1.
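As an illustration, density (1) can be evaluated directly once a density generator is chosen. The Python sketch below is ours (the paper's computations use Ox); it uses the standard normal and Student-t generators, whose functional forms are the usual ones.

```python
import math

def g_normal(u):
    # Normal density generator: g(u) = (2*pi)^(-1/2) * exp(-u/2)
    return math.exp(-u / 2.0) / math.sqrt(2.0 * math.pi)

def g_student_t(u, nu=5):
    # Student-t density generator with nu degrees of freedom
    c = math.gamma((nu + 1) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return c * (1.0 + u / nu) ** (-(nu + 1) / 2.0)

def symmetric_pdf(y, mu, phi, g):
    # Density (1): pi(y; mu, phi) = g(u) / sqrt(phi), u = (y - mu)^2 / phi
    u = (y - mu) ** 2 / phi
    return g(u) / math.sqrt(phi)
```

For example, `symmetric_pdf(0.0, 0.0, 1.0, g_normal)` recovers the standard normal density at zero, \(1/\sqrt{2\pi } \approx 0.3989\).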

Table 1 Density generation function \(g(\cdot )\) for some symmetric distributions

The heteroscedastic symmetric nonlinear regression model proposed by Cysneiros et al. (2010a) is defined as:

$$\begin{aligned} y_\ell = \mu _\ell + \sqrt{\phi _\ell }e_\ell , \quad \ \ell =1,\ldots ,n, \end{aligned}$$
(2)

where \(\mu _\ell =f(x_\ell ;{\varvec{\beta }})\) is a continuous, twice differentiable nonlinear regression structure with respect to the components of the vector of unknown regression parameters \({\varvec{\beta }}=(\beta _1,\ldots ,\beta _p)^\top \) \((p<n)\), \({\varvec{x}}_\ell =(x_{\ell 1},\ldots ,x_{\ell P})^\top \) is a vector of known explanatory variables associated with the \(\ell \)th observation and \(e_\ell \sim S(0,1,g)\). Moreover, we assume that \({\varvec{\beta }}\) is defined in a subset \(\varvec{\varOmega _{\beta }} \subset {\mathbb {R}}^p\) such that the \(n \times p\) matrix of derivatives of \({\varvec{\mu }}=(\mu _1, \ldots , \mu _n)^\top \) with respect to \({\varvec{\beta }},\) denoted by \(\tilde{\varvec{X}}=\partial {\varvec{\mu }}/\partial {\varvec{\beta }},\) has rank p for all \({\varvec{\beta }}.\) In addition, we consider \( \phi _\ell =\sigma ^2 m(\varvec{\omega }_\ell ^\top {\varvec{\delta }})\), where \(m(\cdot )>0\) is any known one-to-one continuously differentiable function, \(\varvec{\omega }_\ell =(\omega _{\ell 1},\ldots ,\omega _{\ell k})^\top \) is a vector of explanatory variables that may have components in common with \({\varvec{x}}_\ell \), \({\varvec{\delta }}=(\delta _1,\ldots ,\delta _k)^\top \) is a vector of unknown parameters to be estimated, and \(\sigma ^2 \in (0 , +\infty )\) is an unknown constant. We also assume that there exists a unique value \(\varvec{\delta _0}\) of \({\varvec{\delta }}\) such that \(m(\varvec{\omega _\ell }^\top \varvec{\delta _0})=1\) for all \(\ell .\) Consequently, \(\phi _\ell =\sigma ^2\) if \({\varvec{\delta }}=\varvec{\delta _0},\) which implies that the \(y_\ell \)'s have constant dispersion. Note that the concept of heteroscedasticity in this context refers to varying dispersion, i.e., we say the model is homoscedastic when all dispersion parameters \(\phi _1,\ldots , \phi _n\) are equal; otherwise we say the model is heteroscedastic.
To perform the procedure, an explicit form for m must be chosen. A possible and common choice is to consider \(m(\varvec{\omega _\ell }^\top {\varvec{\delta }})=\exp (\varvec{\omega _\ell }^\top {\varvec{\delta }}),\) since this functional form for m does not impose any restriction on the components of \(\varvec{\omega _\ell }\) (Cook and Weisberg 1983; Lin et al. 2009).

We are thus interested in assessing the constancy of the dispersion parameter in model (2) by testing the null hypothesis \(H_0:{\varvec{\delta }}=\varvec{\delta _0}\) against the alternative hypothesis \(H_1:{\varvec{\delta }}\ne \varvec{\delta _0}\), where \(\varvec{\delta _0}\) is a \(k \times 1\) vector of specified constants such that \(m_\ell =m(\varvec{\omega }_\ell ^\top \varvec{\delta _0})=1\) for \(\ell =1,\ldots ,n\). In other words, we are performing a test for heteroscedasticity in symmetric nonlinear regression models, since under the null hypothesis model (2) reduces to the aforementioned class of models. The number of parameters of interest is k and the number of nuisance parameters is \(p+1\). The total log-likelihood function for the parameter vector \({\varvec{\theta }}=({\varvec{\beta }}^\top , {\varvec{\delta }}^\top , \sigma ^2)^\top \) given \(y_1,\ldots , y_n\) in model (2) is expressed by

$$\begin{aligned} l(\varvec{y};{\varvec{\theta }})=-\frac{n}{2}\log \sigma ^2 -\frac{1}{2}\sum _{\ell =1}^n \log (m_\ell ) + \sum _{\ell =1}^n t(z_\ell ), \end{aligned}$$

where \(t(z_\ell )=\log g(z^2_\ell )\) with \(z_\ell =\sqrt{u_\ell }=\frac{(y_\ell - \mu _\ell )}{\sqrt{\phi _\ell }}\).
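For concreteness, the log-likelihood above can be coded directly. The Python sketch below is our illustration (the paper's simulations use Ox); it assumes the multiplicative form \(m_\ell =\exp (\varvec{\omega }_\ell ^\top {\varvec{\delta }})\) discussed earlier and a user-supplied generator g.

```python
import math

def g_normal(u):
    # Normal density generator: g(u) = (2*pi)^(-1/2) * exp(-u/2)
    return math.exp(-u / 2.0) / math.sqrt(2.0 * math.pi)

def total_loglik(y, mu, omega, delta, sigma2, g):
    """Total log-likelihood of model (2) with phi_l = sigma2 * m_l,
    where m_l = exp(omega_l' delta) (multiplicative dispersion)."""
    n = len(y)
    ll = -0.5 * n * math.log(sigma2)
    for yl, mul, wl in zip(y, mu, omega):
        m = math.exp(sum(wi * di for wi, di in zip(wl, delta)))
        phi = sigma2 * m
        z = (yl - mul) / math.sqrt(phi)          # standardized residual
        ll += -0.5 * math.log(m) + math.log(g(z * z))  # t(z) = log g(z^2)
    return ll
```

With \({\varvec{\delta }}=\varvec{0}\) and \(\sigma ^2=1\) all \(m_\ell =1\), and the function reduces to the homoscedastic log-likelihood, e.g. the sum of standard normal log-densities under `g_normal`.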

Note that to obtain the maximum likelihood estimator (MLE) of \({\varvec{\delta }}\) we maximize the profile log-likelihood function

$$\begin{aligned} l_p({\varvec{\delta }})=l\Big (\varvec{y};{\varvec{\delta }},\varvec{\hat{\beta }_\delta }, {\hat{\sigma }}^2_{{\varvec{\delta }}}\Big ), \end{aligned}$$

where \(\varvec{\hat{\beta }_\delta }\) and \({\hat{\sigma }}^2_{{\varvec{\delta }}}\) are the MLEs of \({\varvec{\beta }}\) and \(\sigma ^2\) given \({\varvec{\delta }}\), respectively. Under the usual regularity conditions, \(\varvec{\hat{\beta }}_{{\varvec{\delta }}}\) and \({\hat{\sigma }}^2_{{\varvec{\delta }}}\) are the solutions of the equations \(\varvec{U_\beta } = \varvec{0}\) and \(U_{\sigma ^2} = 0\), respectively, which cannot be obtained in closed form. Thus, \(\varvec{\hat{\beta }}_{{\varvec{\delta }}}\) and \({\hat{\sigma }}^2_{{\varvec{\delta }}}\) are obtained by iterative restricted maximization techniques. Further details of these techniques can be found in Nocedal and Wright (1999).

The likelihood ratio statistic (LR) for testing \(H_0\) can be written as

$$\begin{aligned} LR=2\{l_p(\varvec{\hat{\delta }})-l_p(\varvec{\delta _0}) \}, \end{aligned}$$

where \(\varvec{\hat{\delta }}\) is the MLE of \({\varvec{\delta }}\). Asymptotically and under the null hypothesis, LR has \(\chi ^2_k\) distribution.

When replacing the nuisance parameters by their maximum likelihood estimates, we are in a way treating them as known, and as a consequence the profile log-likelihood function may present biases in the score and information function (Ferrari et al. 2005). This procedure is also known to provide inconsistent or inefficient estimates in problems with large numbers of nuisance parameters. Cox and Reid (1987) proposed a modified version of the profile likelihood function in order to attenuate the impact of the number of nuisance parameters on the resulting inference. However, that version requires orthogonality between the parameters of interest and the nuisance ones. Therefore, in our case \({\varvec{\delta }}\) should be orthogonal to the remaining parameters. For this, a transformation \(({\varvec{\delta }}^\top , {\varvec{\beta }}^\top , \sigma ^2)^\top \rightarrow ({\varvec{\delta }}^\top , {\varvec{\beta }}^\top , \gamma )^\top \) is required such that \(E[-\partial ^2 l/\partial \delta _a \partial \gamma ]=0, \ a=1,\ldots ,k.\) Following Cox and Reid (1987, Eq. 4), the desired transformation is obtained by solving

$$\begin{aligned} \frac{n}{2 \sigma ^4} \frac{\partial \sigma ^2}{\partial \delta _a} = -\frac{1}{2 \sigma ^2} \sum _{\ell =1}^n \frac{\partial m_\ell }{\partial \delta _a} \frac{1}{m_\ell }, \end{aligned}$$

which has solution (Simonoff and Tsai 1994)

$$\begin{aligned} \sigma ^2 = \frac{\gamma }{(\prod _{\ell =1}^n m_\ell )^{1/n}}. \end{aligned}$$
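Indeed, this solution can be verified by direct substitution: writing \(\log \sigma ^2 = \log \gamma - n^{-1}\sum _{\ell =1}^n \log m_\ell \) and differentiating gives

$$\begin{aligned} \frac{\partial \sigma ^2}{\partial \delta _a} = -\frac{\sigma ^2}{n}\sum _{\ell =1}^n \frac{1}{m_\ell }\frac{\partial m_\ell }{\partial \delta _a}, \end{aligned}$$

so the left-hand side of the orthogonality equation becomes \(\frac{n}{2 \sigma ^4}\left( -\frac{\sigma ^2}{n}\right) \sum _{\ell =1}^n \frac{1}{m_\ell }\frac{\partial m_\ell }{\partial \delta _a} = -\frac{1}{2 \sigma ^2} \sum _{\ell =1}^n \frac{\partial m_\ell }{\partial \delta _a} \frac{1}{m_\ell },\) which is exactly the right-hand side.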

Considering the reparameterized model, the modified profile log-likelihood function for \({\varvec{\delta }}\) (Cox and Reid 1987) is given by

$$\begin{aligned} l_{CR}^{*}({\varvec{\delta }}) = l_p^{*}({\varvec{\delta }}) - \frac{1}{2} \log \{ \det [j^{*}({\varvec{\delta }}; \varvec{\hat{\beta }_{\delta }}, \hat{\gamma }_{{\varvec{\delta }}})] \}, \end{aligned}$$
(3)

where \(l^{*}_p({\varvec{\delta }})=l^*(\varvec{y};{\varvec{\delta }},\varvec{\hat{\beta }_{\delta }}, \hat{\gamma }_{{\varvec{\delta }}})= -\frac{n}{2} \log \gamma + \sum _{\ell =1}^n t(z_\ell )\) corresponds to the profile log-likelihood function for \({\varvec{\delta }}\), \(j^{*}({\varvec{\delta }};\varvec{\hat{\beta }_{\delta }}, \hat{\gamma }_{{\varvec{\delta }}})\) denotes the block of the observed information matrix for the nuisance parameters \(({\varvec{\beta }}^\top , \gamma )^\top \) evaluated at \(({\varvec{\delta }},\varvec{\hat{\beta }_{\delta }}, \hat{\gamma }_{{\varvec{\delta }}}),\) and \(\hat{\gamma }_{{\varvec{\delta }}}={\hat{\sigma }}^2_{{\varvec{\delta }}}\left( \prod _{\ell =1}^n m_\ell \right) ^{1/n}.\) The matrix \(j^{*}({\varvec{\delta }};\varvec{\hat{\beta }_{\delta }}, \hat{\gamma }_{{\varvec{\delta }}})\) is shown in Appendix A.

The modified profile likelihood ratio statistic (\(LR_m\)) for testing \(H_0\) against \(H_1\) is given by

$$\begin{aligned} LR_m = 2 \{ l_{CR}^{*}(\varvec{\hat{\delta }}) - l_{CR}^{*}(\varvec{\delta _0}) \}, \end{aligned}$$

where \(\varvec{\hat{\delta }}\) is the MLE of \({\varvec{\delta }}\). Under the null hypothesis, the \(LR_m\) statistic has asymptotic \(\chi ^2_k\) distribution.

3 Bartlett corrections

For large samples, the null distributions of the LR and \(LR_m\) statistics are approximated by the \(\chi ^2\) distribution. However, if the sample size is not large enough, it is well known that these approximations may not be satisfactory, leading to size-distorted tests. In order to improve them, correction factors for the LR and \(LR_m\) statistics have been proposed in the literature, yielding corrected test statistics whose null distributions are better approximated by the reference \(\chi ^2\) distribution; that is, the approximation error is reduced from order \(n^{-1}\) to \(n^{-2}.\) The ideas of transforming the LR and \(LR_m\) statistics to make their distributions better approximated by the chi-squared distribution are due to Bartlett (1937) and DiCiccio and Stern (1994), respectively. The correction factors are general in the sense that they are not tied to a particular parametric model, but they must be derived for each problem of interest.

3.1 Bartlett correction for the LR statistic

It is known that for large samples and under the null hypothesis, the LR statistic has a chi-square distribution up to an error of order \(n^{-1}\). Bartlett (1937) proposed multiplying the LR statistic by a correction factor, denoted by \((1+c/k)^{-1}\), resulting in a corrected statistic \(LR^*\) given by

$$\begin{aligned} LR^*=\frac{LR}{1+c/k}, \end{aligned}$$

where c is a constant of order \(n^{-1}\) that can be estimated under \(H_0\). Moreover, c can be written in terms of moments of likelihood derivatives up to the fourth order (see Lawley 1956). In particular, \(P(LR^*\le x)=P(\chi ^2_k\le x)+O(n^{-2})\) under the null hypothesis. Further details on Bartlett corrections can be seen in Cordeiro and Cribari-Neto (2014).
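As a quick numerical illustration (our own sketch, not part of the paper), the corrected statistic and its \(\chi ^2_k\) p value can be computed as follows; `chi2_sf` implements the standard regularized lower incomplete gamma series, which is adequate for the moderate statistic values arising in testing.

```python
import math

def bartlett_corrected_lr(lr, c, k):
    # Bartlett-corrected statistic: LR* = LR / (1 + c/k)
    return lr / (1.0 + c / k)

def chi2_sf(x, k, terms=300):
    # Survival function P(chi^2_k > x) via the regularized lower
    # incomplete gamma series P(s, z), with s = k/2 and z = x/2.
    s, z = k / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    term = 1.0 / s
    total = term
    for i in range(1, terms):
        term *= z / (s + i)
        total += term
    p_lower = total * math.exp(-z + s * math.log(z) - math.lgamma(s))
    return 1.0 - p_lower
```

For instance, `bartlett_corrected_lr(10.0, 0.5, 3)` gives approximately 8.571, and `chi2_sf(7.815, 3)` is approximately 0.05, matching the usual \(\chi ^2_3\) critical value at the 5% level.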

In what follows, we shall consider the case of heteroscedasticity with multiplicative effects, that is, the special case where \(m_\ell =\exp (\varvec{\omega _\ell }^\top {\varvec{\delta }})\). Thus, to test \(H_0: {\varvec{\delta }}=\varvec{\delta _0}\) against \(H_1: {\varvec{\delta }}\ne \varvec{\delta _0}\) in the class of HSNLM, the constant c from the Bartlett correction factor for the LR statistic can be expressed as

$$\begin{aligned} c = \epsilon _k({\varvec{\delta }})+\epsilon _{p,k}({\varvec{\beta }},{\varvec{\delta }})+\epsilon _{p,k}({\varvec{\delta }},\gamma )+\epsilon _{p,k}({\varvec{\beta }},{\varvec{\delta }},\gamma ), \end{aligned}$$
(4)

where

$$\begin{aligned} \epsilon _k({\varvec{\delta }})= & {} \frac{\varDelta _2}{4}\text{ tr }(\varvec{H_d^2}) + \frac{\varDelta _1^2}{6}\varvec{\iota }^\top \varvec{H^{(3)}}\varvec{\iota } + \frac{\varDelta _1^2}{4}\varvec{\iota }^\top \varvec{H}\varvec{H_d}\varvec{H}\varvec{\iota },\\ \epsilon _{p,k}({\varvec{\beta }},{\varvec{\delta }})= & {} - \frac{\varDelta _6}{4\delta _{(0,1,0,0,0)}}\varvec{\iota }^\top \varvec{Q}\varvec{H_d}\varvec{Z_{\beta d}}\varvec{\iota } - \frac{\varDelta _7}{4\delta _{(0,1,0,0,0)}}\varvec{\iota }^\top \varvec{H_d}\varvec{Z_{\beta d}}\varvec{\iota }\\&+ \frac{\delta _{(0,0,1,0,1)}}{2\delta _{(0,1,0,0,0)}}\varvec{\iota }^\top \varvec{Q}\varvec{H_d}\varvec{Z_{\beta d}}\varvec{\iota } + \varvec{\iota }^\top \varvec{Q}\varvec{H_d}\varvec{Z_{\beta d}}\varvec{\iota }\\&- \frac{\varDelta _6}{4\delta _{(0,1,0,0,0)}}\varvec{\iota }^\top \varvec{Q}\varvec{H_d}\varvec{Z_{\beta d}}\varvec{\iota } - \frac{\varDelta _7}{4\delta _{(0,1,0,0,0)}}\varvec{\iota }^\top \varvec{H_d}\varvec{Z_{\beta d}}\varvec{\iota }\\&+ \left( \frac{\varDelta _7^2}{2(\delta _{(0,1,0,0,0)})^2} - \frac{\varDelta _7}{\delta _{(0,1,0,0,0)}} \right) \varvec{\iota }^\top \varvec{Q}\varvec{Z_{\beta }}\odot \varvec{H}\odot \varvec{Z_{\beta }}\varvec{Q}\varvec{\iota }\\&+ \frac{\varDelta _7^2}{4(\delta _{(0,1,0,0,0)})^2}\varvec{\iota }^\top \varvec{Q}\varvec{Z_{\beta d}}\varvec{H}\varvec{Z_{\beta d}}\varvec{Q}\varvec{\iota }\\&+\frac{\varDelta _1 \varDelta _7}{2\delta _{(0,1,0,0,0)}}\varvec{\iota }^\top \varvec{Q}\varvec{Z_{\beta d}}\varvec{H}\varvec{H_d}\varvec{\iota },\\ \epsilon _{p,k}({\varvec{\delta }}, \gamma )= & {} -\frac{\varDelta _4\varDelta _8}{2} \text{ tr }(\varvec{H_d}) - \varDelta _1 \varDelta _4 \text{ tr }(\varvec{H_d})- \frac{\varDelta _1^2 \varDelta _4}{2}\varvec{\iota }^\top \varvec{H^{(2)}}\varvec{\iota }\\&- \frac{\varDelta _1^2 \varDelta _4}{4} [\text{ tr }(\varvec{H_d})]^2 + \left( \frac{\varDelta _1 \varDelta _5}{4} + \frac{\varDelta _1(\varDelta _5 - 4\varDelta _3)}{4} \right) \varDelta _4^2\,\text{ tr }(\varvec{H_d}) \quad \text{ and }\\ \epsilon _{p,k}({\varvec{\beta }},{\varvec{\delta }},\gamma )= & {} - \frac{\varDelta _1 \varDelta _4 \varDelta _7}{4\delta _{(0,1,0,0,0)}} \left[ (\varvec{\iota }^\top \varvec{W}\varvec{Z_{\beta d}} \varvec{\iota })\odot (\varvec{\iota }^\top \varvec{H_d}\varvec{\iota }) + \varvec{\iota }^\top \varvec{Q}\varvec{H_d}\varvec{Z_{\beta d}}\varvec{\iota } \right] , \end{aligned}$$

where \(\varDelta _\ell ,\,\,\ell =1,\ldots ,8\), are scalars expressed as

$$\begin{aligned} \varDelta _1= & {} -\frac{1}{8} \{\delta _{(0,0,1,0,3)} + 3\delta _{(0,1,0,0,2)} + \delta _{(1,0,0,0,1)}\},\\ \varDelta _2= & {} \frac{1}{16} \{ \delta _{(0,0,0,1,4)} + 6\delta _{(0,0,1,0,3)} + 7\delta _{(0,1,0,0,2)} + \delta _{(1,0,0,0,1)}\},\\ \varDelta _3= & {} - \frac{n}{2} \{2 + 3\delta _{(1,0,0,0,1)} + \delta _{(0,1,0,0,2)}\},\\ \varDelta _4= & {} \frac{4}{n \{2+\delta _{(0,1,0,0,2)} + 3\delta _{(1,0,0,0,1)} \}},\\ \varDelta _5= & {} -\frac{n}{8} \{8 + 15\delta _{(1,0,0,0,1)} + 9\delta _{(0,1,0,0,2)} + \delta _{(0,0,1,0,3)} \},\\ \varDelta _6= & {} \frac{1}{4} \{\delta _{(0,0,0,1,2)} + 3\delta _{(0,0,1,0,1)} \}, \\ \varDelta _7= & {} \frac{1}{2} \{\delta _{(0,0,1,0,1)} + 2\delta _{(0,1,0,0,0)} \} \ \text{ and } \\ \varDelta _8= & {} \frac{1}{16} \{ \delta _{(0,0,0,1,4)} +8\delta _{(0,0,1,0,3)} + 13\delta _{(0,1,0,0,2)} + 3\delta _{(1,0,0,0,1)} \}. \\ \end{aligned}$$

The \(\delta \)'s correspond to \(\delta _{(a,b,c,d,e)}=E\{t(z_\ell )^{(1)a}t(z_\ell )^{(2)b}t(z_\ell )^{(3)c}t(z_\ell )^{(4)d} z_\ell ^e \}\) for \(a,\,b,\,c,\,d,\,e \in \{0,1,2,3,4\}\) and \(t(z_\ell )^{(k)}=\frac{\partial ^k t(z_\ell )}{\partial z_\ell ^k}\) for \(k=1,2,3,4\). Values of the \(\delta \)'s for symmetric distributions studied in the literature can be found in Uribe-Opazo et al. (2008). In addition, \(\varvec{H}=\{ h_{\ell s} \} = - (\varvec{W} - {\bar{\varvec{W}}})[(\varvec{W} - {\bar{\varvec{W}}})^\top \varvec{V} (\varvec{W} - {\bar{\varvec{W}}})]^{-1}(\varvec{W} - {\bar{\varvec{W}}})^\top \), with \((\varvec{W} - {\bar{\varvec{W}}})=(\varvec{w}_1 - \bar{\varvec{w}},\ldots ,\varvec{w}_n - \bar{\varvec{w}})^\top \), \(\varvec{V}\) representing the diagonal matrix of order n with elements \(v_\ell =(1-\delta _{(0,1,0,0,2)})/4\) and \(\ell , s = 1,\ldots ,n.\) We also have \(\varvec{H^{(2)}}=(h_{\ell s}^2)\), \(\varvec{H^{(3)}}=(h_{\ell s}^3)\), \(\varvec{Q}=\text{ diag }(q_1,\ldots ,q_n)\), with \(q_\ell =\exp \{-(\varvec{\omega }_\ell - \bar{\varvec{\omega }})^\top {\varvec{\delta }} \}\) and \(\bar{\varvec{\omega }}=(\bar{\omega }_1,\ldots ,\bar{\omega }_k)^\top \), \(\varvec{Z_\beta }=\tilde{\varvec{X}}(\tilde{\varvec{X}}^\top \varvec{Q} \tilde{\varvec{X}})^{-1}\tilde{\varvec{X}}^\top ,\) an \(n \times k\) matrix \(\varvec{W}\) whose \(\ell j\)th element is \(\omega _{\ell j}\), and an \(n \times 1\) vector of ones represented by \(\varvec{\iota }.\) The subscript d in some matrices indicates that only the diagonal elements of those matrices are considered. Finally, the symbol \(\odot \) denotes the Hadamard (elementwise) product of matrices.
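To make these matrix quantities concrete, the sketch below (our illustration, in plain Python) builds \(\varvec{H}\) and its elementwise powers for the simplest case of a single dispersion covariate (\(k=1\)), where the bracketed inverse reduces to a scalar.

```python
def build_H(w, v):
    """H = -(W - Wbar)[(W - Wbar)' V (W - Wbar)]^{-1}(W - Wbar)' for a
    single dispersion covariate (k = 1): the bracketed term is the scalar
    sum(v_l * c_l^2), with c_l = w_l - wbar."""
    n = len(w)
    wbar = sum(w) / n
    c = [wl - wbar for wl in w]
    denom = sum(vl * cl * cl for vl, cl in zip(v, c))
    return [[-ci * cj / denom for cj in c] for ci in c]

def hadamard_power(H, r):
    # Elementwise power: H^(r) = (h_ls ** r)
    return [[h ** r for h in row] for row in H]

def trace_diag(H):
    # tr(H_d): sum of the diagonal elements of H
    return sum(H[i][i] for i in range(len(H)))
```

A useful sanity check: when all \(v_\ell \) equal a common value v, \(\text{ tr }(\varvec{H_d})=-1/v\), e.g. \(-4\) for \(v=1/4\).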

Therefore, the constant c involves only simple matrix operations and can be easily implemented in symbolic computation packages and programming languages that support basic linear algebra operations, such as Ox and R. Details about the derivation of the constant c are presented in Appendices B and C.

3.2 Bartlett correction for the \(LR_m\) statistic

The modified profile likelihood ratio statistic, \(LR_m\), like the usual likelihood ratio statistic LR, has a null asymptotic \(\chi ^2_k\) distribution up to an error of order \(n^{-1}\). DiCiccio and Stern (1994) proposed a Bartlett correction for \(LR_m\), reducing the error of the \(\chi ^2\) approximation to its null distribution to order \(n^{-2}\). The corrected statistic is defined by

$$\begin{aligned} LR_m^*=\frac{LR_m}{1+c_m/k}, \end{aligned}$$

where \(c_m\) is a constant of order \(n^{-1}\) such that, under the null hypothesis, the expected value of the corrected modified test statistic satisfies \(E(LR^*_m)=k + O(n^{-3/2})\). The general expression of \(c_m\) was given in DiCiccio and Stern (1994, Eq. 25). In this context, Ferrari et al. (2004, Eq. 5) obtained an expression for \(c_m\) in normal linear regression models that can be used in any class of models adopting the same partition of the parameter vector, provided parameter orthogonality holds. To test \(H_0\) in the HSNLM class under heteroscedasticity with multiplicative effects, the constant \(c_m\) of the correction factor for the \(LR_m\) statistic, presented in detail in Appendix D, can be written in matrix notation as

$$\begin{aligned} c_m= & {} \frac{1}{4} \varDelta _2 \text{ tr } (\varvec{H_d^2}) - \frac{1}{4} \varDelta _1^2 \varDelta _4 [\text{ tr }(\varvec{H_d})]^2 + \frac{1}{4} \varDelta ^2_1 \varvec{\iota }^\top \varvec{H_d H H_d} \varvec{\iota } + \frac{1}{6} \varDelta ^2_1 \varvec{\iota }^\top \varvec{H^{(3)}}\varvec{\iota } \nonumber \\&- \varDelta _1 \varDelta _4 \text{ tr }(\varvec{H_d}) - \varDelta _1 \varDelta _3 \varDelta _4^2 \text{ tr }(\varvec{H_d}) - \frac{1}{2} \varDelta ^2_1 \varDelta _4 \varvec{\iota }^\top \varvec{H^{(2)}}\varvec{\iota }. \end{aligned}$$
(5)

We can see that the constant \(c_m\) involves only simple matrix operations, like the constant c in (4) for the correction factor of the LR statistic. We also observe that \(c_m\) in (5) involves only the covariate matrix \(\varvec{W}\) (defined in Sect. 3.1), the number of unknown parameters in \(\phi _\ell \) and the number of observations; it does not depend on unknown parameters or on the number of nuisance parameters. Both c and \(c_m\) depend on the symmetric distribution considered, since the \(\delta \)'s change from one distribution to another.
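Given a precomputed \(\varvec{H}\) and the \(\varDelta \) scalars of the chosen symmetric distribution, constant (5) is only a few lines of code. The sketch below (ours, in plain Python) spells out each matrix quantity elementwise.

```python
def c_m_constant(H, d1, d2, d3, d4):
    """Constant c_m of Eq. (5), assuming a precomputed matrix H and the
    scalars Delta_1, ..., Delta_4 (d1..d4) of the chosen distribution."""
    n = len(H)
    Hd = [H[i][i] for i in range(n)]
    trHd = sum(Hd)                                   # tr(H_d)
    trHd2 = sum(h * h for h in Hd)                   # tr(H_d^2)
    sH2 = sum(H[i][j] ** 2 for i in range(n) for j in range(n))  # iota' H^(2) iota
    sH3 = sum(H[i][j] ** 3 for i in range(n) for j in range(n))  # iota' H^(3) iota
    # iota' H_d H H_d iota = sum over i, j of h_ii * h_ij * h_jj
    sHdHHd = sum(Hd[i] * H[i][j] * Hd[j] for i in range(n) for j in range(n))
    return (0.25 * d2 * trHd2 - 0.25 * d1 ** 2 * d4 * trHd ** 2
            + 0.25 * d1 ** 2 * sHdHHd + d1 ** 2 * sH3 / 6.0
            - d1 * d4 * trHd - d1 * d3 * d4 ** 2 * trHd
            - 0.5 * d1 ** 2 * d4 * sH2)
```

The function mirrors (5) term by term; a mechanical check with the \(1 \times 1\) matrix \(\varvec{H}=[-1]\) and all \(\varDelta \)'s equal to 1 returns \(13/12\).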

3.3 Bootstrap Bartlett correction

As an alternative to the asymptotic likelihood ratio and modified profile likelihood ratio tests, we can carry out inference based on a test whose critical values (p values) are obtained from the bootstrap technique introduced by Efron (1979). The bootstrap likelihood ratio test (\(LR_{boot}\)) offers reliable inference and does not involve complex calculations; however, it is computationally very intensive. Using the bootstrap as a numerical alternative to the analytical Bartlett correction factor, Rocke (1989) proposed the bootstrap Bartlett corrected likelihood ratio statistic (\(LR^*_{boot}\)), which is obtained as follows. Initially, we generate B bootstrap resamples \((y_1^*,\ldots ,y_B^*)\) from the assumed model under \(H_0,\) replacing the unknown parameter vector with its restricted estimates (i.e., the estimates obtained under the null hypothesis), calculated using the original sample \((y_1,\ldots ,y_n)\). After that, we calculate the LR statistic for each pseudo sample \(y_1^*,\ldots ,y^*_B\), denoted by \(LR^b_{boot}, \ b=1,\ldots ,B\). The bootstrap Bartlett corrected likelihood ratio statistic is obtained as

$$\begin{aligned} LR^*_{boot}=\frac{LR}{\overline{LR}^*_{boot}}k, \end{aligned}$$

where \(\overline{LR}^*_{boot}=\frac{1}{B}\sum ^B_{b=1}LR^b_{boot}\) and k is the number of restrictions imposed by \(H_0\). Under the null hypothesis, \(LR^*_{boot}\) has asymptotic \(\chi ^2_k\) distribution (Rocke 1989).

The \(LR_{boot}\) test does not rely on the \(\chi ^2\) distribution. Instead, it is performed as follows: for a fixed nominal level \(\alpha ,\) we compute the \(1-\alpha \) percentile of the bootstrap statistics, estimated by \(\hat{q}_{(1-\alpha )}\) such that \(\#\{LR^b_{boot} \le \hat{q}_{(1-\alpha )}\}/B= 1-\alpha ,\) with \(\#\) denoting the cardinality of the set. Then, we reject the null hypothesis if \(LR > \hat{q}_{(1-\alpha )}.\) Alternatively, the decision rule can be based on the bootstrap p value given by \(p^*= \#\{LR^b_{boot} \ge LR \}/B.\)

Recent works have developed inference based on these tests; see for example Bayer and Cribari-Neto (2013), Cribari-Neto and Queiroz (2014), and Loose et al. (2016). An advantage of the bootstrap Bartlett correction over the usual bootstrap test is its computational efficiency: obtaining a critical value with the bootstrap Bartlett correction requires considerably fewer resamples than the usual bootstrap technique, which makes it computationally more efficient (Rocke 1989).
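The scheme above is straightforward to implement. The Python sketch below is our illustration on a toy model, not the paper's HSNLM code: it tests \(H_0: \sigma ^2=1\) in a zero-mean normal sample, where the LR statistic has the closed form \(n({\hat{\sigma }}^2 - 1 - \log {\hat{\sigma }}^2)\) with \({\hat{\sigma }}^2 = n^{-1}\sum y_\ell ^2\) and \(k=1\).

```python
import math
import random

def lr_variance(y):
    # Closed-form LR statistic for H0: sigma^2 = 1, zero-mean normal sample
    n = len(y)
    s2 = sum(v * v for v in y) / n
    return n * (s2 - 1.0 - math.log(s2))

def bootstrap_bartlett_lr(y, B=500, k=1, seed=0):
    """Rocke's (1989) scheme: resample under H0, average the bootstrap LR
    statistics, and rescale the observed LR so its null mean matches k.
    Also returns the bootstrap p-value #{LR_b >= LR}/B."""
    rng = random.Random(seed)
    n = len(y)
    lr_obs = lr_variance(y)
    lr_boot = [lr_variance([rng.gauss(0.0, 1.0) for _ in range(n)])  # model under H0
               for _ in range(B)]
    lr_bar = sum(lr_boot) / B
    p_boot = sum(lrb >= lr_obs for lrb in lr_boot) / B
    return lr_obs * k / lr_bar, p_boot
```

For the HSNLM setting, `lr_variance` would be replaced by the profile-likelihood LR of Sect. 2 and the resamples generated from model (2) with the restricted estimates, but the correction logic is identical.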

Table 2 Null rejection rates: \(t_5\) model with \(p=3,5\), \(k=3\) and several values for n

4 Simulation results

In this section, we present Monte Carlo simulation results comparing the performance of seven tests in HSNLM in small and moderate-sized samples: the likelihood ratio test (LR); the modified profile likelihood ratio test (\(LR_m\)); the bootstrap likelihood ratio test (\(LR_{boot}\)); their corrected versions, denoted by \(LR^*\), \(LR^*_m\) and \(LR^*_{boot}\), respectively; and the score test (\(S_r\)). As is well known, an advantage of the \(S_r\) test over the others is its easy implementation, since it involves estimation under the null hypothesis only. The number of Monte Carlo replications was 10,000 and for each Monte Carlo replication we performed 1000 bootstrap replications. All simulations were performed using the programming language Ox (Doornik 2006). We considered the heteroscedastic symmetric nonlinear regression model given by

$$\begin{aligned} y_\ell = \beta _0 + \exp \{\beta _1 x_{\ell 1} \} + \sum _{s=2}^{p-1} \beta _s x_{\ell s} + \epsilon _\ell , \quad \ \ell =1,\ldots ,n, \end{aligned}$$

where \(\epsilon _\ell \sim S(0, \sigma ^2\exp \{\varvec{\omega _\ell }^\top {\varvec{\delta }}\}, g)\). The covariates \(x_1, \ldots , x_{p-1}\) and \(\omega _1,\ldots ,\omega _q\) were obtained as random draws from the uniform U(0, 1) distribution and their values were kept fixed during the simulations. We considered two symmetric distributions for the errors, namely Student-t with 5 degrees of freedom (\(\nu \)) and power exponential with shape parameter \(\kappa =0.3\). The hypothesis to be tested is \(H_0:\delta _1=\cdots =\delta _q=0\) against \(H_1:\delta _i \ne 0\) for at least one \(i, \ i=1,\ldots ,q\). The true values of the parameters for the simulations were taken as \(\beta _0=\cdots =\beta _{p-1}=1, \ \sigma ^2=1, \ \delta _1=0.1, \delta _2=0.3, \delta _3=0.5\) and \(\delta _4=\delta _5=1.0\). The null hypothesis was tested for sample sizes 30, 35, 40, 50 and 100, considering the three nominal levels \(\alpha =10, \ 5 \ \text{ and } \ 1\%\).

Table 3 Null rejection rates: power exponential model with \(\kappa =0.3\), \(p=3,5\), \(k=3\) and several values for n

The null rejection rates of the seven tests for different sample sizes are presented in Tables 2 and 3. These tables show that the likelihood ratio test is substantially oversized; for example, in Table 2, when \(p=5\), \(\alpha =5\%\) and considering all sample sizes (\(n=30,40,50\) and 100), the null rejection rates of the LR test are \(22.6, 16.8, 12.3 \ \text{ and } \ 7.7\%,\) respectively. The corrected version of the likelihood ratio test attenuates the oversized behavior of the usual likelihood ratio test, but still presents rejection rates above the nominal levels. For example, considering again \(p=5, \alpha =5\%\) and all sample sizes, the \(LR^*\) test presents the following null rejection rates: \(12.9, 8.8, 7.4 \ \text{ and } \ 5.8\%\), respectively. In general, for both tests, the distortion decreases as the sample size increases.

The simulation results for the \(t_5\) model shown in Table 2 indicate that the \(S_r\) test and the corrected tests \(LR^*_m\) and \(LR^*_{boot}\) present better results than the other ones. Furthermore, for \(p=5 \ \text{ and } \ \alpha =5\%\), the rejection rates for the \(S_r\) test considering all four sample sizes are, respectively, 5.5, 5.5, 5.4 and \(5.1\%\) and the corresponding rates for the \(LR^*_m\) test are 4.5, 5.2, 4.6 and \(5.2\%\) and for the \(LR^*_{boot}\) test are 5.8, 4.8, 5.2 and \(4.7\%\). Table 3 presents the results for the power exponential model. We can observe that, in general, the tests \(LR_{boot}\) and \(LR^*_{boot}\) perform better than the others, followed by the \(S_r\) test. For example, when \(p=5 \ \text{ and } \ \alpha =10\%\), the rejection rates for the \(LR_{boot}\) test considering all four sample sizes are 9.9, 10.6, 9.3 and \(9.6\%\), while for the \(LR^*_{boot}\) test they are 9.8, 10.7, 9.3 and \(9.7\%\), and for the \(S_r\) test they are 11.1, 10.4, 10.8 and \(10.3\%.\)

Table 4 Null rejection rates: the \(t_5\) model and power exponential \(\kappa =0.3\) with \(n=35\), \(k=3\) and multiple values for p

In Table 4 we present the null rejection rates of the tests and evaluate the effect of the number of nuisance parameters on their performance, fixing the sample size (\(n=35\)) and the number of parameters of interest (\(k=3\)) while varying the number of nuisance parameters (\(p= 2, 3, 4 \) and 5). For both models considered, the usual LR test and its corrected version are quite distorted, and the distortion of the \(LR \ \text{ and } \ LR^*\) tests increases with the number of nuisance parameters. In contrast, the other tests are essentially insensitive to the number of nuisance parameters, and the bootstrap tests \(LR_{boot}\) and \(LR^*_{boot}\) performed best. For example, for the \(t_5\) model with \(p=4 \ \text{ and } \ \alpha =5\%\), the null rejection rates of the tests are \(15.8\% \ (LR)\), \(8.8\% \ (LR^*)\), \(4.5\% \ (LR_m)\), \(4.7\% \ (LR^*_m)\), \(5.0\% \ (LR_{boot})\), \(4.8\% \ (LR^*_{boot})\) and \(4.3\% \ (S_r)\). In the same scenario, the power exponential model yields the following null rejection rates: \(14.7\% \ (LR)\), \(8.8\% \ (LR^*)\), \(3.4\% \ (LR_m)\), \(4.1\% \ (LR^*_m)\), \(5.2\% \ (LR_{boot})\), \(5.2\% \ (LR^*_{boot})\) and \(4.9\% \ (S_r)\).

Table 5 Null rejection rates: \(t_5\) model and power exponential \(\kappa =0.3\) with \(n=35\), \(p=3\) and different values of k

Table 5 reports results for the situation where \(n=35\), \(p=3\) and \(k=2,3,4 \ \text{ and } \ 5.\) The tests’ performances are similar to those shown in Table 4, with the \(LR^*_{boot}\) test outperforming the other ones, especially in the power exponential model. For example, considering the scenario where \(k=3\) and \(\alpha =1\%\), the null rejection rates for the power exponential model are \(3.9\% \ (LR)\), \(1.1\% \ (LR^*)\), \(0.7\% \ (LR_m)\), \(0.9\% \ (LR^*_m)\), \(1.1\% \ (LR_{boot})\), \(1.2\% \ (LR^*_{boot})\) and \(1.1\% \ (S_r)\). Considering the same scenario for the \(t_5\) model, we have \(4\% \ (LR)\), \(1.8\% \ (LR^*)\), \(0.8\% \ (LR_m)\), \(0.8\% \ (LR^*_m)\), \(1.1\% \ (LR_{boot})\), \(0.9\% \ (LR^*_{boot})\) and \(0.9\% \ (S_r)\).

The numerical results presented in Tables 2, 3, 4 and 5 show that the corrected tests outperform the uncorrected tests in small and moderate sample sizes, except for the \(S_r\) test, which performs as well as the corrected tests in the indicated scenarios. Moreover, in some cases the \(S_r\) test also outperforms the \(LR_m^*\) test, one of the best performing tests. The simulation results showed that the \(LR_{boot}^*\) and \(LR^*_m\) tests are the best performing corrected tests, followed by the \(S_r\) test. The \(LR^*\) test attenuates the oversized behavior of the LR test but still presents distorted rejection rates, especially as the number of parameters in the model increases.

Table 6 Non-null rejection rates: \(t_5\) model and power exponential \(\kappa =0.3\) with \(n=35\), \(\alpha =10\%\), \(p=3\) and \(k=3\)

Table 6 presents the tests’ non-null rejection rates, i.e., their power. The data were generated using different values of \(\delta .\) We only considered the tests \(LR_m\), \(LR_m^*\), \(LR_{boot}\), \(LR^*_{boot}\) and \(S_r;\) the usual and Bartlett-corrected likelihood ratio tests are not included in the power comparison since they are considerably size distorted. The results in Table 6 show the expected pattern: the tests become more powerful as \(\delta \) moves away from zero. The differences in power are very small. For the \(t_5\) model, the \(LR_{boot}\), \(LR^*_{boot}\) and \(S_r\) tests are slightly more powerful than the others; for the power exponential model, the \(LR_{boot}\) and \(LR^*_{boot}\) tests are slightly more powerful than the others.

Fig. 1 Quantile relative discrepancies for the \(t_5\) and power exponential models with \(p=3, \ k=3\) (panels a and c) and \(p=5, \ k=3\) (panels b and d)

Figure 1a–b shows the relative quantile discrepancies of the test statistics against the corresponding asymptotic (\(\chi ^2\)) quantiles for the \(t_5\) model, and Fig. 1c–d for the power exponential model, considering \(n=35, \ p=3,5 \ \text{ and } \ k=3\). The relative quantile discrepancy is defined as the difference between the exact quantile (estimated by simulation) and the asymptotic quantile, divided by the asymptotic quantile. We consider the score test and the corrected tests because of their performances. The closer a curve is to the zero ordinate, the better the test statistic’s null distribution is approximated by the reference \(\chi ^2\) distribution. For both models, it is clear that the null distribution of the corrected likelihood ratio statistic is not well approximated by the reference \(\chi ^2\) distribution in any of the considered scenarios. In contrast, the null distributions of the score statistic, the corrected modified profile likelihood ratio statistic and the bootstrap Bartlett corrected likelihood ratio statistic are well approximated by the reference \(\chi ^2\) distribution, since their quantile discrepancy curves lie very close to the zero ordinate, as reflected in the performance of these tests in Tables 2, 3, 4 and 5.
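The discrepancy curves of Fig. 1 can be computed directly from the simulated statistics by comparing empirical quantiles with the corresponding \(\chi ^2_k\) quantiles. A minimal sketch (function and variable names are ours):

```python
import numpy as np
from scipy.stats import chi2

def relative_quantile_discrepancy(stats, k, probs):
    """(exact quantile - asymptotic quantile) / asymptotic quantile,
    with the exact quantile estimated by the empirical quantile of the
    simulated statistics and the asymptotic quantile taken from chi2_k."""
    exact = np.quantile(np.asarray(stats), probs)
    asym = chi2.ppf(probs, df=k)
    return (exact - asym) / asym

# If the statistic really followed chi2_k, the curve would hover near zero.
rng = np.random.default_rng(1)
stats = rng.chisquare(df=3, size=200_000)
probs = np.linspace(0.05, 0.95, 19)
disc = relative_quantile_discrepancy(stats, k=3, probs=probs)
```

Plotting `disc` against `chi2.ppf(probs, df=3)` for each test statistic reproduces the kind of curves shown in the figure.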

5 An illustrative example

In this section, we apply the test methods presented in the previous sections to a real dataset. The computer code for computing these statistics can be requested from us by email. The data refer to the weight of the eye lens of the European rabbit (Oryctolagus cuniculus) in Australia, y,  in mg, and the age of the animal, x,  in days, in a sample of 71 observations. These data were analyzed by Wei (1998, Example 6.8), who verified the suspicion of two aberrant points under least squares estimation, indicating that the dataset supports errors with heavier tails than the normal. Cysneiros et al. (2005) also analyzed this dataset under the Student-t distribution with 4 degrees of freedom, the degrees of freedom having been chosen as those of the model with the lowest AIC. The residual plot (see Cysneiros et al. 2005, Sect. 3.2) shows that the residuals are not uniformly distributed around zero, giving evidence of heteroscedasticity. Motivated by this, we decided to consider a more general model than the one fitted by Cysneiros et al. (2005), introducing a regression structure for modeling the dispersion.

We consider the following heteroscedastic model

$$\begin{aligned} y_\ell = \exp \left( \beta _0 - \frac{\beta _1}{x_\ell + \beta _2} \right) e^{\epsilon _\ell }, \end{aligned}$$

where \(\epsilon _\ell \sim S(0,\sigma ^2 \exp \{\delta x_\ell \}, g)\), \(\ell =1,\ldots ,71.\) Our main goal now is to test \(H_0: \delta =0\) (homoscedasticity) against a two-sided alternative (heteroscedasticity).

To test \(H_0,\) the observed values of the test statistics are \(LR=8.368\) (p-value: 0.004), \(LR^*=7.865\) (p-value: 0.005), \(LR_m=8.919\) (p-value: 0.003), \(LR_m^*=8.871\) (p-value: 0.003), \(S_r=6.766\) (p-value: 0.009) and \(LR^*_{boot}=5.777\) (p-value: 0.016). The p-value of the bootstrap likelihood ratio test is 0.026. Hence, at the \(1\%\) nominal level, the bootstrap-based tests do not reject the null hypothesis, that is, they indicate that the dispersion is constant over the observations (homoscedasticity), while all other tests reject the null hypothesis, pointing to heteroscedasticity. Recalling the previous section and the literature, the LR and \(S_r\) tests are size distorted when we deal with small or even moderate-sized samples, so we should not rely on the inference they deliver here. Recall also from our simulation study that the bootstrap-based tests performed best in most scenarios and were not affected by the sample size or by the numbers of interest and nuisance parameters; hence, the bootstrap-based tests should be preferred.
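Both bootstrap-based decisions above rest on replicates of the LR statistic generated under \(H_0\). A hedged sketch of the two ingredients, assuming the vector of bootstrap replicates is already available (function names are ours, and the form of the bootstrap Bartlett rescaling shown is the standard one, not necessarily the paper's exact implementation):

```python
import numpy as np
from scipy.stats import chi2

def bootstrap_lr_pvalue(lr_obs, lr_boot):
    """Empirical bootstrap p-value: share of replicates (generated
    under H0) at least as large as the observed statistic."""
    return np.mean(np.asarray(lr_boot) >= lr_obs)

def bootstrap_bartlett(lr_obs, lr_boot, k):
    """Bootstrap Bartlett correction: rescale LR so that its mean
    matches the chi2_k mean k, then use the chi2_k upper tail."""
    lr_star = lr_obs * k / np.mean(lr_boot)
    return lr_star, chi2.sf(lr_star, df=k)

# Toy illustration with k = 1 (as in the test of H0: delta = 0),
# using synthetic replicates in place of real bootstrap output.
rng = np.random.default_rng(2)
lr_boot = rng.chisquare(df=1, size=5000)
p_boot = bootstrap_lr_pvalue(8.368, lr_boot)
lr_star, p_star = bootstrap_bartlett(8.368, lr_boot, k=1)
```

In practice `lr_boot` would be obtained by refitting the model to samples simulated from the fitted null (homoscedastic) model and recomputing LR for each.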

6 Concluding remarks

Symmetric models have received increasing attention in the statistical literature in recent years, largely because such models are less sensitive than the normal model to the presence of outlying observations in the data being modeled. Many works have addressed the class of symmetric models. Cysneiros et al. (2010a) proposed the HSNLM class, whose parameters are estimated by numerically maximizing the log-likelihood function, since the maximum likelihood estimators do not have closed form. In this class of models, hypothesis testing of model parameters is usually based on the likelihood ratio test, which relies on first-order asymptotic approximations. Thus, for small and even moderate sample sizes, the null distribution of the likelihood ratio statistic is not well approximated by the reference \(\chi ^2\) distribution and, as a consequence, the test shows distorted rejection rates, making it important to develop strategies that yield more accurate inferences when the sample size is not large.

In this paper, we presented Bartlett corrections to improve hypothesis tests based on the likelihood ratio and modified profile likelihood ratio statistics in the HSNLM class. Our work extends some results presented by Cordeiro (2004) and Ferrari et al. (2004), who obtained Bartlett correction factors for the likelihood ratio statistic in symmetric nonlinear models and for the modified profile likelihood ratio statistic in normal linear models, respectively. In order to compare the performance of the tests, we also considered the score test and the bootstrap likelihood ratio and bootstrap Bartlett corrected likelihood ratio tests.

Numerical results showed that the usual likelihood ratio test is somewhat oversized and that its corrected version, although attenuating this tendency, still has distorted rejection rates that increase with the number of parameters in the model. The simulation results also showed that inference based on the modified profile likelihood ratio test outperforms the usual test and is not affected by increases in the number of model parameters. Moreover, the numerical evidence showed the better performance of the corrected tests and of the uncorrected score test. The score test is very simple to compute, since it involves only estimation under the null hypothesis. In particular, the numerical results showed the superior performance of the corrected modified profile likelihood ratio test and of both bootstrap-based tests in small and moderate sample sizes. Thus, we encourage practitioners to use these tests in applications.