1 Introduction

The linear regression model with exact linear restrictions is widely used in applied econometrics and statistics. An example in financial econometrics is the simple linear model for the excess return of a security regressed on the excess return of the market portfolio. When the intercept is set equal to zero, this becomes the capital asset pricing model.

The method of restricted least squares (RLS) provides natural estimators of the regression coefficients in a linear model with exact restrictions. Basic distributional properties of the RLS estimator, efficiency comparisons, hypothesis tests and real-world applications can be found in Chipman and Rao (1964), Trenkler (1987), Ramanathan (1993), Greene (2007) and Wooldridge (2013). It is well known that the RLS estimator can be expressed in terms of the ordinary least squares (OLS) estimator. In particular, Magnus and Neudecker (1999) studied the RLS estimator in various specific situations. Gross (2003) and Rao et al. (2008) explored its relationship to a ridge estimator. However, few publications so far have treated the RLS estimator from the viewpoint of influence diagnostics or sensitivity analysis, even though such analyses are needed and have increasingly been used; see, for example, the monographs Cook and Weisberg (1982) for early studies on residuals and influence in regression, Chatterjee and Hadi (1988) and Rao et al. (2008) for sensitivity analysis in linear regression, Atkinson and Riani (2000) and Atkinson et al. (2004) for robust diagnostic regression analysis, and Kleiber and Zeileis (2008) for regression diagnostics incorporated in a recent textbook on applied econometrics. To our knowledge, only Liu and Neudecker (2007) provided a local sensitivity result for the RLS estimator. Two alternatives to the RLS method are the model reduction and Lagrange multiplier methods. As pointed out by Hocking (2003, p. 622), in the model reduction method it is difficult to assess the effect of adjoining restrictions because the reduced model lacks a specific expression. The Lagrange multiplier method is more amenable to such theoretical developments, and our approach in the present paper is therefore based on it. We deal with the estimation and diagnostic issues in a systematic manner, linking the unique solution shared by the restricted maximum likelihood (RML) and RLS estimators to the local influence method, both of which have proven very useful in practice.

The local influence method was introduced by Cook (1986). Subsequently, alternative approaches were suggested by, for example, Billor and Loynes (1993, 1999), Poon and Poon (1999) and Shi and Huang (2011). Unlike case deletion methods, which can suffer from masking effects, the local influence method is a powerful tool for detecting observations that can be influential in the estimation of model parameters. It has become a general method for assessing the influence of local departures from model assumptions on maximum likelihood (ML) estimates. The local influence method has been employed in several areas of applied econometrics and statistics. For example, there are a number of applications and studies in regression modelling and time series analysis; see Cook (1986), Galea et al. (1997), Liu (2000, 2002, 2004), Díaz-García et al. (2003), Galea et al. (2008) and Shi and Chen (2008) for studies in linear regression and time series models, de Castro et al. (2007) and Galea and de Castro (2012) for heteroskedastic errors-in-variables models, Leiva et al. (2007, 2014) for influence diagnostics with censored and uncensored data, Barros et al. (2010) for a Tobit model and Paula et al. (2012) for robust modelling applied to insurance data. In particular, the local influence method can play an important role in regression models involving restrictions. Paula (1993) used this method to handle the linear model with inequality restrictions; Cysneiros and Paula (2005) and Paula and Cysneiros (2010) considered parameter constraints in univariate symmetrical linear regression models; and Liu et al. (2009) studied a normal linear model with stochastic linear restrictions.

The objective of this paper is to provide a methodology to assess local influence in the possibly heteroskedastic linear regression model with exact linear restrictions. While influence diagnostics were studied for the spherical linear model by Cook (1986) and for the capital asset pricing model by Galea et al. (2008), no such studies have been carried out for the RLS estimator. Therefore, we fill this gap. In addition, heteroskedasticity is a classic issue that is widely encountered in practical situations; see, for example, Greene (2007) and Wooldridge (2013). It was investigated by de Castro et al. (2007) and Galea and de Castro (2012) in models with errors-in-variables, but not with linear restrictions. Furthermore, we present results which are relevant to but different from those considered by Cysneiros and Paula (2005), Liu and Neudecker (2007), Liu et al. (2009) and Paula and Cysneiros (2010). We use the ML method under normality of the model errors to estimate the corresponding parameters with exact restrictions. We deal not only with the general linear model for spherical disturbances, corresponding to the univariate models studied by Cysneiros and Paula (2005) and Paula and Cysneiros (2010), but also with the general linear model for non-spherical or heteroskedastic disturbances as an extension. Our approach differs from the quadratic penalty function approach considered in Cysneiros and Paula (2005) and Paula and Cysneiros (2010) in that we employ the Lagrange multiplier method with a linear penalty function, and in that the RML estimators are closely related to the RLS or generalized least-squares (GLS) estimators. In addition, we introduce three global influence statistics and compare our local influence statistics with them.

Note that exact restrictions in the model often arise from past experience, economic or financial theory, or the area under study, which must be treated as prior information. We treat such prior information as equivalent to the sample data, although we also allow the possibility that the linear restrictions on the coefficients are not really prior information, but just a null hypothesis, proposed to simplify the relationship between the response variable (called response hereafter) and explanatory variables (called covariables hereafter).

The rest of the paper is organized as follows. In Sects. 2 and 3, we propose a methodology to assess local influence in the normal linear regression model with restrictions. Specifically, in Sect. 2, we formulate and discuss the model, RML estimation, the information matrix and inference. In Sect. 3, we describe the local influence method with different perturbation matrices, and three global influence statistics for comparison purposes. In Sect. 4, we illustrate and validate the proposed methodology with real-world data. In Sect. 5, we make some concluding remarks about this work. The technical proofs of our results are included in the Appendix.

2 Modelling and estimation

In this section, we formulate the general linear regression model with restrictions in two cases: the first with spherical disturbances and the second with non-spherical disturbances. We then provide several estimators for the corresponding parameters, together with the corresponding information matrix.

2.1 Formulation of the model

Spherical disturbance Consider the general linear regression model given by

$$\begin{aligned} {\varvec{y}} = {\varvec{X}} {\varvec{\beta }} + {\varvec{u}}, \end{aligned}$$
(1)

where \({\varvec{y}} = (Y_1, \ldots , Y_n)^\top \) is an \(n \times 1\) response vector, \({\varvec{X}}\) is an \(n \times p\) known design matrix of rank p containing the values of the covariables, \({\varvec{\beta }}\) is a \(p \times 1\) vector of unknown parameters to be estimated, and \({\varvec{u}}\) is an \(n \times 1\) error vector with expectation \(\text {E}[{\varvec{u}}] = {\varvec{0}}_{n\times 1}\) and variance-covariance matrix \(\text {D}[{\varvec{u}}] = \sigma ^2 {\varvec{I}}_n\). Here, \({\varvec{0}}_{n\times 1}\) is an \(n \times 1\) zero vector, \(\sigma ^2 > 0\) is an unknown parameter to be estimated and \({\varvec{I}}_n\) is the \(n \times n\) identity matrix.

For the model given in (1), suppose we have prior information about \({\varvec{\beta }}\) in the form of a set of q independent exact linear restrictions expressed as

$$\begin{aligned} {\varvec{r}} = {\varvec{R}} {\varvec{\beta }}, \end{aligned}$$
(2)

where \({\varvec{R}}\) is a \(q \times p\) known matrix of rank \(q \le p\) and \({\varvec{r}}\) is a \(q \times 1\) vector of known elements. The \(k \times 1\) parameter vector in the formulation given in (1) and (2) is \({\varvec{\theta }} = ({\varvec{\beta }}^\top , \sigma ^2)^\top \), where \(k=p+1\).

Non-spherical disturbance To extend the spherical disturbance assumption for the model given in (1), we consider a groupwise heteroskedastic case. Without loss of generality, we split the data into two groups to correspond to the non-spherical disturbance by writing \({\varvec{y}} = ({\varvec{y}}_1^\top , {\varvec{y}}_2^\top )^\top \), \({\varvec{X}} = ({\varvec{X}}_1^\top , {\varvec{X}}_2^\top )^\top \) and \({\varvec{u}} = ({\varvec{u}}_1^\top , {\varvec{u}}_2^\top )^\top \), where \(n_1+n_2=n\). We extend the model given in (1) by assuming the variance-covariance matrix \(\text {D}[{\varvec{u}}] = \text {diag}(\sigma _1^2 {\varvec{I}}_{n_1}, \sigma _2^2 {\varvec{I}}_{n_2}) ={\varvec{V}}\) (say), with \(\sigma _g^2 > 0\), for \(g=1, 2\), and \(\sigma _1^2 \ne \sigma _2^2\) (non-spherical or heteroskedastic case). We assume the same prior information about \({\varvec{\beta }}\) as in (2). The \(k \times 1\) parameter vector is now \({\varvec{\theta }}_{\text {G}} = ({\varvec{\beta }}^\top , \sigma _1^2, \sigma _2^2)^\top \), where \(k=p+2\).

2.2 Estimation

Spherical disturbance The RLS estimators of the elements of the parameter vector \({\varvec{\theta }}\) in the formulation (1) and (2) are well known to be

$$\begin{aligned} \hat{{\varvec{\beta }}}_{\text {RLS}}= & {} {\varvec{b}} - ({\varvec{X}}^\top {\varvec{X}})^{-1} {\varvec{R}}^\top [{\varvec{R}}({\varvec{X}}^\top {\varvec{X}})^{-1} {\varvec{R}}^\top ]^{-1}({\varvec{R}} {\varvec{b}}- {\varvec{r}}), \end{aligned}$$
(3)
$$\begin{aligned} \hat{\sigma }^2_{\text {RLS}}= & {} {({\varvec{y}}-{\varvec{X}} \hat{{\varvec{\beta }}}_{\text {RLS}})^\top ({\varvec{y}}-{\varvec{X}} \hat{{\varvec{\beta }}}_{\text {RLS}}) \over n-p+q}, \end{aligned}$$
(4)

where \({\varvec{b}}=({\varvec{X}}^\top {\varvec{X}})^{-1} {\varvec{X}}^\top {\varvec{y}}\) is the (unrestricted) OLS estimator of \({\varvec{\beta }}\).
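As a computational illustration, the estimators given in (3) and (4) can be coded in a few lines; the following is a minimal sketch in Python with NumPy (the function and variable names are ours, introduced only for illustration), assuming \({\varvec{X}}\) has full column rank and \({\varvec{R}}\) has full row rank.

```python
import numpy as np

def rls(X, y, R, r):
    """Restricted least squares under the exact restrictions R beta = r.

    Returns the RLS coefficient estimate (3) and variance estimate (4).
    """
    n, p = X.shape
    q = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                     # unrestricted OLS estimator
    M = R @ XtX_inv @ R.T                     # q x q, invertible when rank(R) = q
    beta_rls = b - XtX_inv @ R.T @ np.linalg.solve(M, R @ b - r)
    resid = y - X @ beta_rls
    sigma2_rls = resid @ resid / (n - p + q)  # divisor n - p + q as in (4)
    return beta_rls, sigma2_rls
```

Replacing the divisor \(n-p+q\) by \(n\) in the last step yields the RML variance estimate given in (7) below, since the coefficient estimates in (3) and (6) coincide.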

If we add the normality assumption \({\varvec{u}} \sim \text {N}_n({\varvec{0}}_{n\times 1}, \sigma ^2 {\varvec{I}}_n)\) to the formulation given in (1) and (2), we can use the ML method to estimate \({\varvec{\theta }}\). In this case, the log-likelihood function for \({\varvec{\theta }}\) to be optimised subject to the restriction \({\varvec{r}} = {\varvec{R}} {\varvec{\beta }}\) (called the restricted log-likelihood function) is given by

$$\begin{aligned} \ell = \ell ({\varvec{\theta }}) = - \frac{n}{2} \log (2\pi ) - \frac{n}{2} \log (\sigma ^2) - \frac{1}{2 {\sigma }^2}({\varvec{y}}- {\varvec{X}} {\varvec{\beta }})^\top ({\varvec{y}} - {\varvec{X}} {\varvec{\beta }}) - \varvec{\uplambda }^\top ({\varvec{R}} {\varvec{\beta }} - {\varvec{r}}), \end{aligned}$$
(5)

where \(\varvec{\uplambda }\) is the \(q \times 1\) vector of Lagrange multipliers. In order to obtain the RML estimator, we use matrix calculus to take the differential of \(\ell \) given in (5) with respect to \({\varvec{\beta }}\) and \(\sigma ^2\) and equate it to \({\varvec{0}}_{k \times 1}\). We get

$$\begin{aligned} \hat{\varvec{\uplambda }} = {[{\varvec{R}}({\varvec{X}}^\top {\varvec{X}})^{-1} {\varvec{R}}^\top ]^{-1} ({\varvec{R}} {\varvec{b}} - {\varvec{r}}) \over \hat{\sigma }^2} \end{aligned}$$

and then the corresponding RML estimators are

$$\begin{aligned} \hat{{\varvec{\beta }}}= & {} {\varvec{b}} - ({\varvec{X}}^\top {\varvec{X}})^{-1}{\varvec{R}}^\top [{\varvec{R}}({\varvec{X}}^\top {\varvec{X}})^{-1}{\varvec{R}}^\top ]^{-1}({\varvec{R}} {\varvec{b}}-{\varvec{r}}), \end{aligned}$$
(6)
$$\begin{aligned} \hat{\sigma }^2= & {} {({\varvec{y}}-{\varvec{X}} \hat{{\varvec{\beta }}})^\top ({\varvec{y}}-{\varvec{X}} \hat{{\varvec{\beta }}}) \over n}, \end{aligned}$$
(7)

where \({\varvec{b}}\) is defined in (3).

Rao et al. (2008) noted that the RML and RLS estimators of \({\varvec{\beta }}\) given in (3) and (6) are the same, but those for \({\sigma }^2\) given in (4) and (7) are not the same. They considered a special case with exact knowledge of a subvector and stepwise inclusion of exact linear restrictions. From (6) we obtain

$$\begin{aligned} \text {E}(\hat{{\varvec{\beta }}})= & {} {\varvec{\beta }}, \end{aligned}$$
(8)
$$\begin{aligned} \text {D}(\hat{{\varvec{\beta }}})= & {} \sigma ^2 [({\varvec{X}}^\top {\varvec{X}})^{-1} - ({\varvec{X}}^\top {\varvec{X}})^{-1} {\varvec{R}}^\top [{\varvec{R}} ({\varvec{X}}^\top {\varvec{X}})^{-1}{\varvec{R}}^\top ]^{-1} {\varvec{R}} ({\varvec{X}}^\top {\varvec{X}})^{-1}]. \end{aligned}$$
(9)

Note from (8) and (9) that \(\hat{{\varvec{\beta }}}\) is unbiased and more efficient than \({\varvec{b}}\), which has variance \(\sigma ^2 ({\varvec{X}}^\top {\varvec{X}})^{-1}\).

Non-spherical disturbance In this case, the restricted log-likelihood function for \({\varvec{\theta }}_{\text {G}}\) is given by

$$\begin{aligned} \ell ({\varvec{\theta }}_{\text {G}})= & {} - \frac{n}{2} \log (2\pi ) - \sum _{g=1}^2 \frac{n_g }{2} \log \left( \sigma _g^2\right) \nonumber \\&- \sum _{g=1}^2 \frac{1}{2 {\sigma _g}^2}({\varvec{y}}_g- {\varvec{X}}_g {\varvec{\beta }})^\top ({\varvec{y}}_g - {\varvec{X}}_g {\varvec{\beta }}) - \varvec{\uplambda }^\top ({\varvec{R}} {\varvec{\beta }} - {\varvec{r}}), \end{aligned}$$
(10)

where \(\varvec{\uplambda }\) is again the \(q \times 1\) vector of Lagrange multipliers. We again use matrix calculus to find the RML estimators under groupwise heteroskedasticity, obtaining

$$\begin{aligned} \hat{{\varvec{\beta }}}_{\text {G}}= & {} {\varvec{b}}_{\text {G}} - ({\varvec{X}}^\top \hat{{\varvec{V}}}^{-1} {\varvec{X}})^{-1}{\varvec{R}}^\top [{\varvec{R}}({\varvec{X}}^\top \hat{{\varvec{V}}}^{-1} {\varvec{X}})^{-1}{\varvec{R}}^\top ]^{-1} ({\varvec{R}} {\varvec{b}}_{\text {G}}-{\varvec{r}}), \end{aligned}$$
(11)
$$\begin{aligned} \hat{\sigma }^2_g= & {} {({\varvec{y}}_g-{\varvec{X}}_g \hat{{\varvec{\beta }}}_{\text {G}})^\top ({\varvec{y}}_g-{\varvec{X}}_g \hat{{\varvec{\beta }}}_{\text {G}}) \over n_g}, ~g=1,2, \end{aligned}$$
(12)

where \({\varvec{b}}_{\text {G}} = ({\varvec{X}}^\top \hat{{\varvec{V}}}^{-1} {\varvec{X}})^{-1} {\varvec{X}}^\top \hat{{\varvec{V}}}^{-1} {\varvec{y}}\) is the GLS estimator of \({\varvec{\beta }}\) with \(\hat{{\varvec{V}}} = \text {diag}(\hat{\sigma }_1^2 {\varvec{I}}_{n_1}, \hat{\sigma }_2^2 {\varvec{I}}_{n_2})\).

From the formulas given in (11) and (12), the RML estimates must be calculated iteratively. We can employ a procedure that starts with the OLS estimate \({\varvec{b}}\) and then iterates between updating \(\hat{{\varvec{V}}}\) and computing \({\varvec{b}}_{\text {G}}\) and \(\hat{{\varvec{\beta }}}_{\text {G}}\). Under the usual assumptions, when the restrictions \({\varvec{R}} {\varvec{\beta }} = {\varvec{r}}\) are true, in large samples the distribution of \(\hat{{\varvec{\beta }}}_{\text {G}}\) can be approximated by a normal distribution with mean \({\varvec{\beta }}\) and a variance matrix which is consistently estimated by

$$\begin{aligned} ({\varvec{X}}^\top \hat{{\varvec{V}}}^{-1} {\varvec{X}})^{-1} - ({\varvec{X}}^\top \hat{{\varvec{V}}}^{-1}{\varvec{X}})^{-1} {\varvec{R}}^\top [{\varvec{R}} ({\varvec{X}}^\top \hat{{\varvec{V}}}^{-1}{\varvec{X}})^{-1}{\varvec{R}}^\top ]^{-1} {\varvec{R}} ({\varvec{X}}^\top \hat{{\varvec{V}}}^{-1} {\varvec{X}})^{-1}. \end{aligned}$$

For further ideas with examples and relevant asymptotic results, see, for example, Efron and Hinkley (1978), Judge et al. (1988) and Greene (2007). By assuming \(\sigma ^2_1 = \sigma ^2_2\), we note that the results given in (11) and (12) reduce to the RML estimators for \({\varvec{\beta }}\) given in (6) and for \({\sigma }^2\) given in (7). When \({\varvec{V}}\) in (11) is known, we obtain

$$\begin{aligned} \text {E}(\hat{{\varvec{\beta }}}_{\text {G}})= & {} {\varvec{\beta }},\\ \text {D}(\hat{{\varvec{\beta }}}_{\text {G}})= & {} ({\varvec{X}}^\top {\varvec{V}}^{-1} {\varvec{X}})^{-1} - ({\varvec{X}}^\top {\varvec{V}}^{-1}{\varvec{X}})^{-1} {\varvec{R}}^\top \\&\times [{\varvec{R}} ({\varvec{X}}^\top {\varvec{V}}^{-1}{\varvec{X}})^{-1}{\varvec{R}}^\top ]^{-1} {\varvec{R}} ({\varvec{X}}^\top {\varvec{V}}^{-1}{\varvec{X}})^{-1}. \end{aligned}$$
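To make the iterative calculation concrete, the following is a minimal sketch in Python with NumPy of the procedure described above; the starting value, tolerance and iteration cap are our own illustrative choices rather than prescriptions of the methodology.

```python
import numpy as np

def rml_groupwise(X1, y1, X2, y2, R, r, tol=1e-10, max_iter=100):
    """Iterative RML estimation under groupwise heteroskedasticity, (11)-(12).

    Starts from OLS, then alternates between updating V-hat and computing
    the restricted GLS coefficient estimate.
    """
    X = np.vstack([X1, X2])
    y = np.concatenate([y1, y2])
    n1, n2 = len(y1), len(y2)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS starting value
    for _ in range(max_iter):
        s1 = np.sum((y1 - X1 @ beta) ** 2) / n1      # sigma_1^2 update, (12)
        s2 = np.sum((y2 - X2 @ beta) ** 2) / n2      # sigma_2^2 update, (12)
        Vinv = np.concatenate([np.full(n1, 1 / s1), np.full(n2, 1 / s2)])
        A = np.linalg.inv(X.T @ (Vinv[:, None] * X))    # (X' V^-1 X)^-1
        b_g = A @ X.T @ (Vinv * y)                      # GLS estimator
        M = R @ A @ R.T
        beta_new = b_g - A @ R.T @ np.linalg.solve(M, R @ b_g - r)  # (11)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, s1, s2
```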

2.3 Information matrices

Spherical disturbance Using the Hessian matrix derived in Appendix 1 and given by

$$\begin{aligned} {\varvec{H}}({{\varvec{\theta }}}) = - \left( \begin{array}{lll} \displaystyle \frac{1}{ {\sigma ^2}} {\varvec{X}}^\top {\varvec{X}} &{} &{}\quad \displaystyle \frac{1}{{\sigma ^4}} {\varvec{X}}^\top ({\varvec{y}}-{\varvec{X}} {{\varvec{\beta }}})\\ \displaystyle \frac{1}{ {\sigma ^4}} ({\varvec{y}}-{\varvec{X}} {{\varvec{\beta }}})^\top {\varvec{X}} &{} &{}\quad - \displaystyle \frac{n}{2 {\sigma ^4}} + \frac{1}{ {\sigma ^6}} ({\varvec{y}}-{\varvec{X}} {{\varvec{\beta }}})^\top ({\varvec{y}}-{\varvec{X}} {{\varvec{\beta }}}) \end{array}\right) , \end{aligned}$$
(13)

we obtain the expected Fisher information matrix given by

$$\begin{aligned} {\varvec{I}}({{\varvec{\theta }}}) = \left( \begin{array}{lll} \displaystyle \frac{1}{ {\sigma ^2}} {\varvec{X}}^\top {\varvec{X}} &{} &{}\quad \displaystyle 0\\ \\ \displaystyle 0 &{} &{}\quad \displaystyle \frac{n}{2 {\sigma ^4}} \end{array}\right) . \end{aligned}$$

The observed Fisher information matrix is \({\varvec{J}}(\hat{{\varvec{\theta }}}) = - {\varvec{H}}(\hat{{\varvec{\theta }}}) = {\varvec{I}}(\hat{{\varvec{\theta }}})\).

Non-spherical disturbance In this case, the Hessian matrix is obtained as

$$\begin{aligned} {\varvec{H}}({{\varvec{\theta }}_{\text {G}}}) = - \left( \begin{array}{ccc} {\varvec{h}}_{\beta \beta } &{}\quad {\varvec{h}}_{\beta \sigma ^2_1} &{}\quad {\varvec{h}}_{\beta \sigma ^2_2} \\ {\varvec{h}}_{\beta \sigma ^2_1}^\top &{}\quad {\varvec{h}}_{\sigma ^2_1 \sigma ^2_1} &{}\quad {\varvec{h}}_{\sigma ^2_1 \sigma ^2_2} \\ {\varvec{h}}_{\beta \sigma ^2_2}^\top &{}\quad {\varvec{h}}_{\sigma ^2_1 \sigma ^2_2}^\top &{}\quad {\varvec{h}}_{\sigma ^2_2 \sigma ^2_2} \\ \end{array}\right) , \end{aligned}$$
(14)

where

$$\begin{aligned} {\varvec{h}}_{\beta \beta }= & {} {\varvec{X}}^\top {{\varvec{V}}}^{-1} {\varvec{X}} = \displaystyle \frac{1}{{\sigma ^2_1}} {\varvec{X}}_1^\top {\varvec{X}}_1 + \frac{1}{ {\sigma ^2_2}} {\varvec{X}}_2^\top {\varvec{X}}_2, \\ {\varvec{h}}_{\beta \sigma ^2_1}= & {} \displaystyle \frac{1}{ {\sigma ^4_1}} {\varvec{X}}_1^\top ({\varvec{y}}_1-{\varvec{X}}_1 {{\varvec{\beta }}}), \\ {\varvec{h}}_{\beta \sigma ^2_2}= & {} \displaystyle \frac{1}{ {\sigma ^4_2}} {\varvec{X}}_2^\top ({\varvec{y}}_2-{\varvec{X}}_2 {{\varvec{\beta }}}), \\ {\varvec{h}}_{\sigma ^2_1 \sigma ^2_1}= & {} - \displaystyle \frac{n_1}{2 {\sigma ^4_1}} + \frac{1}{{\sigma ^6_1}}({\varvec{y}}_1-{\varvec{X}}_1 {{\varvec{\beta }}})^\top ({\varvec{y}}_1-{\varvec{X}}_1 {{\varvec{\beta }}}), \\ {\varvec{h}}_{\sigma ^2_1 \sigma ^2_2}= & {} 0, \\ {\varvec{h}}_{\sigma ^2_2 \sigma ^2_2}= & {} - \displaystyle \frac{n_2}{2 {\sigma ^4_2}} + \frac{1}{{\sigma ^6_2}}({\varvec{y}}_2-{\varvec{X}}_2 {{\varvec{\beta }}})^\top ({\varvec{y}}_2-{\varvec{X}}_2 {{\varvec{\beta }}}). \end{aligned}$$

Then, the corresponding expected Fisher information matrix is given by

$$\begin{aligned} {\varvec{I}}({{\varvec{\theta }}_{\text {G}}}) = \left( \begin{array}{lllll} \displaystyle {\varvec{X}}^\top {\varvec{V}}^{-1} {\varvec{X}} &{} &{} \quad \displaystyle 0 &{} &{}\quad \displaystyle 0\\ \\ \displaystyle 0 &{} &{}\quad \displaystyle \frac{n_1}{2 {\sigma _1^4}} &{} &{}\quad \displaystyle 0 \\ \\ \displaystyle 0 &{} &{}\quad \displaystyle 0 &{} &{}\quad \displaystyle \frac{n_2}{2 {\sigma _2^4}} \end{array}\right) \end{aligned}$$

and we have \({\varvec{J}}(\hat{{\varvec{\theta }}}_{\text {G}}) = - {\varvec{H}}(\hat{{\varvec{\theta }}}_{\text {G}}) = {\varvec{I}}(\hat{{\varvec{\theta }}}_{\text {G}})\).

3 Influence diagnostics

In this section, we present the local influence method, the perturbation matrices for some different schemes, and then three global influence measures.

3.1 Local influence

Let \(\ell ({\varvec{\theta }})\) be the log-likelihood function for the model given in (1) and (2), which we call the postulated or non-perturbed model. Here \({\varvec{\theta }}\) is the \(k \times 1\) unknown parameter vector to be estimated and we denote its ML estimator by \(\hat{{\varvec{\theta }}}\). Let \({\varvec{w}} =(w_1,\ldots ,w_m)^\top \) denote an \(m \times 1\) perturbed vector and \({\varvec{\varOmega }}\) an open set of relevant perturbations such that \({\varvec{w}} \in {\varvec{\varOmega }}\). Then, let \(\ell ({\varvec{\theta }}|{\varvec{w}}) = \ell _{{\varvec{w}}}\) be the log-likelihood function for the perturbed model and \(\hat{{\varvec{\theta }}}_w\) be the corresponding ML estimator of \({\varvec{\theta }}\). Let \({\varvec{w}}_0 \in {\varvec{\varOmega }}\) denote an \(m \times 1\) non-perturbed vector with \({\varvec{w}}_0 =(0,\ldots ,0) ^\top \), or \({\varvec{w}}_0 =(1,\ldots ,1) ^\top \), or even a third choice, depending on the context, such that \(\ell ({\varvec{\theta }}) = \ell ({\varvec{\theta }}|{\varvec{w}}_0)\). Suppose that \(\ell ({\varvec{\theta }}|{\varvec{w}})\) is twice continuously differentiable in a neighborhood of \((\hat{{\varvec{\theta }}}, {\varvec{w}}_0)\). We are interested in comparing the parameter estimates \(\hat{{\varvec{\theta }}}\) and \(\hat{{\varvec{\theta }}}_w\) using the idea of local influence to detect how the inference is affected by the perturbation. Specifically, in the Cook local influence method, the likelihood displacement (LD) is given by

$$\begin{aligned} \text {LD}({\varvec{w}}) = 2(\ell (\hat{{\varvec{\theta }}})- \ell (\hat{{\varvec{\theta }}}_{{\varvec{w}}})), \end{aligned}$$

which can be used to assess the influence of the perturbation \({\varvec{w}}\). Here, large values of LD\(({\varvec{w}})\) indicate that \(\hat{{\varvec{\theta }}}\) and \(\hat{{\varvec{\theta }}}_w\) differ considerably in relation to the contours of the non-perturbed log-likelihood function \(\ell ({\varvec{\theta }})\). The idea is based on studying the local behaviour of \(\text {LD}({\varvec{w}})\) and the normal curvature \(C_{{\varvec{l}}}\) in a unit-length direction vector \({\varvec{l}}\), where \(||{\varvec{l}}||=1\). According to Cook (1986), the normal curvature is given by \(C_{{\varvec{l}}} = 2|{\varvec{l}} ^\top {\varvec{F}}({\varvec{\theta }}) {\varvec{l}}|\). The maximum curvature \(C_{\max } = \max _{||{\varvec{l}}||=1} C_{{\varvec{l}}}\) and the corresponding direction vector \({\varvec{l}}_{\max }\) may reveal those observations that exert the most influence on LD\(({\varvec{w}})\). To examine the magnitude of influence, it is useful to have a benchmark value for \(C_{\max }\) and for the elements of \({\varvec{l}}_{\max }\). For \(C_{\max }\), the value 2, or the q-value measure suggested by Shi and Huang (2011), can be used; for example, when the q-value is greater than 2, we can say the associated direction vector is significant for detecting influential observations. For the elements of \({\varvec{l}}_{\max }\), Poon and Poon (1999) suggested \(1/\sqrt{n}\) and Shi and Huang (2011) noted that \(2/\sqrt{n}\) can be more reasonable, where n is the sample size. In our case, we consider as influential those observations whose elements of \({\varvec{l}}_{\max }\) exceed \(2/\sqrt{n}\) in absolute value. To find \(C_{\max }\) and \({\varvec{l}}_{\max }\), we need to calculate the \(m \times m\) matrix \({\varvec{F}}({\varvec{\theta }})\) defined by

$$\begin{aligned} {\varvec{F}}({\varvec{\theta }})= - {\varvec{\varDelta }}({\varvec{\theta }})^\top {\varvec{H}}({\varvec{\theta }})^{-1} {\varvec{\varDelta }}({\varvec{\theta }}), \end{aligned}$$
(15)

where \({\varvec{\varDelta }}({\varvec{\theta }})\) is a \(k \times m\) matrix for the perturbed model (perturbation matrix), which is obtained from \(\text {d}^2_{{\varvec{\theta }} {\varvec{w}}} \ell _{{\varvec{w}}}\) and evaluated at \({\varvec{\theta }}=\hat{{\varvec{\theta }}}\) and \({\varvec{w}}={\varvec{w}}_0\). Here, \({\varvec{H}}({\varvec{\theta }})\) is obtained from (13) or (14). In order to detect local influence, one or both of the following two options can be considered; a computational sketch is given after the list:

  1. (i)

The vector \({\varvec{f}}=({\varvec{F}}_{11}, \ldots , {\varvec{F}}_{nn})^\top \), where \({\varvec{F}}_{ii}\) is the ith diagonal element of \({\varvec{F}}({\varvec{\theta }})\) given in (15), for \(i=1,\ldots ,n\). Clearly, \({\varvec{F}}_{ii}\) indicates the possible impact of the perturbation of the ith observation on the RML estimates of the model parameters. We consider the ith case as outstanding if \({\varvec{F}}_{ii} \ge 2\,\overline{{\varvec{F}}}\), where \(\overline{{\varvec{F}}} = {1 \over n} \sum _{i=1}^{n} {\varvec{F}}_{ii}\), similarly to \(C_i \ge 2\,\overline{C}\), where \(C_i\) is the ith total local influence corresponding to \({\varvec{F}}_{ii}\) and \(\overline{C} = {1 \over n} \sum _{i=1}^{n}C_i\), as given in, for example, Leiva et al. (2007) and Paula and Cysneiros (2010).

  2. (ii)

The vector \({\varvec{l}}_{\max }\), which is the eigenvector associated with the largest absolute eigenvalue of \({\varvec{F}}({\varvec{\theta }})\) and hence with \(C_{\max }\); see Cook (1986), Leiva et al. (2007), Liu et al. (2009) and Paula et al. (2012). Large absolute values of the elements of \({\varvec{l}}_{\max }\), that is, those greater than \(2/\sqrt{n}\), indicate the corresponding observations to be influential.
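Both options reduce to a few lines of linear algebra once \({\varvec{\varDelta }}\) and \({\varvec{H}}\) have been evaluated at \((\hat{{\varvec{\theta }}}, {\varvec{w}}_0)\); the following Python/NumPy sketch (our own illustrative code) computes \({\varvec{F}}\) of (15) and the diagnostics in (i) and (ii) with the cut-offs discussed above.

```python
import numpy as np

def local_influence(Delta, H, n):
    """Diagnostics (i) and (ii) from F = -Delta' H^{-1} Delta in (15).

    Delta : k x m perturbation matrix evaluated at (theta-hat, w0)
    H     : k x k Hessian of the restricted log-likelihood at theta-hat
    n     : sample size, used for the 2/sqrt(n) benchmark
    """
    F = -Delta.T @ np.linalg.solve(H, Delta)
    F = (F + F.T) / 2                        # symmetrise against round-off
    f = np.diag(F)                           # total local influence, option (i)
    eigval, eigvec = np.linalg.eigh(F)
    idx = np.argmax(np.abs(eigval))
    C_max = 2 * abs(eigval[idx])             # maximum normal curvature
    l_max = eigvec[:, idx]                   # direction of maximum curvature
    flagged_f = np.where(f >= 2 * f.mean())[0]               # F_ii >= 2 F-bar
    flagged_l = np.where(np.abs(l_max) > 2 / np.sqrt(n))[0]  # |l_max| > 2/sqrt(n)
    return f, C_max, l_max, flagged_f, flagged_l
```

Here \({\varvec{H}}\) comes from (13) or (14) and \({\varvec{\varDelta }}\) from one of the perturbation schemes of Sect. 3.2 below.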

3.2 Perturbation matrices

3.2.1 Model perturbation

Spherical disturbance We replace the normal distribution in (1) and (2) by \({\varvec{u}}_{{\varvec{w}}} \sim \text {N}_n({\varvec{0}}_{n\times 1}, \sigma ^2 {\varvec{W}}^{-1})\), where \({\varvec{W}}=\text {diag}(w_1, \ldots , w_n)\) is an \(n \times n\) diagonal matrix whose elements \(w_i\) are the perturbations or weights, for \(i=1,\ldots ,n\), and \({\varvec{W}}_0=\text {diag}(1, \ldots , 1)\) is the \(n \times n\) identity matrix of non-perturbed values. We also use \(\text {vec} {\varvec{W}} = {\varvec{S}} {\varvec{w}}\), where \({\varvec{S}}\) is the \(n^2 \times n\) selection matrix, \({\varvec{w}}=(w_1,\ldots ,w_n)^\top \) and \(\text {vec} {\varvec{W}}\) is the vectorization of \({\varvec{W}}\); see Neudecker et al. (1995) and Liu et al. (2014). In this perturbation scheme, the relevant part of the log-likelihood function subject to the restriction \({\varvec{r}} = {\varvec{R}} {\varvec{\beta }}\) is given by

$$\begin{aligned} \ell _{{{\varvec{w}}}_1}({\varvec{\theta }}) = - \frac{n}{2} \log (\sigma ^2) + \frac{1}{2} \log (|{\varvec{W}}|) - \frac{1}{2 \sigma ^2}({\varvec{y}}-{\varvec{X}} {\varvec{\beta }})^\top {\varvec{W}}({\varvec{y}}-{\varvec{X}} {\varvec{\beta }}) - \varvec{\uplambda }^\top ({\varvec{R}} {\varvec{\beta }} - {\varvec{r}}). \end{aligned}$$
(16)

Using the differential of \(\ell _{{{\varvec{w}}}_1}({\varvec{\theta }})\) given in (16) with respect to \({\varvec{\theta }}\) detailed in Appendix 2, we obtain \({\varvec{\varDelta }}(\hat{{\varvec{\theta }}})\) defined in (15) as

$$\begin{aligned} {\varvec{\varDelta }}(\hat{{\varvec{\theta }}}) = \left( \begin{array}{l} \frac{1}{\hat{\sigma ^2}} \left( \left( {\varvec{y}}-{\varvec{X}} \hat{{\varvec{\beta }}}\right) ^\top \otimes {\varvec{X}}^\top \right) {\varvec{S}}\\ \frac{1}{2 \hat{\sigma ^4}} \left( \left( {\varvec{y}}-{\varvec{X}} \hat{{\varvec{\beta }}}\right) ^\top \otimes \left( {\varvec{y}}-{\varvec{X}} \hat{{\varvec{\beta }}}\right) ^\top \right) {\varvec{S}} \end{array}\right) , \end{aligned}$$

where \(\otimes \) is the Kronecker product.
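Because \({\varvec{W}}\) is diagonal, the selection matrix \({\varvec{S}}\) reduces each Kronecker product above to one column per observation, so \({\varvec{\varDelta }}(\hat{{\varvec{\theta }}})\) can be assembled without forming \({\varvec{S}}\) explicitly; a sketch in the same Python/NumPy style as above (our own illustrative code):

```python
import numpy as np

def delta_model_perturbation(X, y, beta_hat, sigma2_hat):
    """Perturbation matrix for the variance-weight scheme (spherical case).

    Column i is (e_i x_i / sigma^2, e_i^2 / (2 sigma^4))', where e_i is the
    ith residual and x_i the ith row of X; this is the effect of applying
    the selection matrix S to the Kronecker products in the text.
    """
    e = y - X @ beta_hat                      # residuals at the RML estimate
    top = (X * e[:, None]).T / sigma2_hat     # p x n block for beta
    bottom = e ** 2 / (2 * sigma2_hat ** 2)   # row for sigma^2
    return np.vstack([top, bottom])           # (p + 1) x n
```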

Non-spherical disturbance In this case, writing \({\varvec{W}} = \text {diag}({\varvec{W}}_1, {\varvec{W}}_2)\), where \({\varvec{W}}_g\) is the \(n_g \times n_g\) diagonal matrix of perturbations for group g, with \(g=1,2\), the relevant part of the log-likelihood function subject to the restriction \({\varvec{r}} = {\varvec{R}} {\varvec{\beta }}\) is given by

$$\begin{aligned} \ell ({\varvec{\theta }}_{\text {G}})= & {} - \sum _{g=1}^2 \frac{n_g }{2} \log (\sigma _g^2) + \frac{1}{2}\log (|{\varvec{W}}|) - \sum _{g=1}^2 \frac{1}{2 {\sigma _g}^2}({\varvec{y}}_g- {\varvec{X}}_g {\varvec{\beta }})^\top {\varvec{W}}_g ({\varvec{y}}_g - {\varvec{X}}_g {\varvec{\beta }})\\&- \varvec{\uplambda }^\top ({\varvec{R}} {\varvec{\beta }} - {\varvec{r}}). \end{aligned}$$

We again use matrix calculus to get

$$\begin{aligned} {\varvec{\varDelta }}(\hat{{\varvec{\theta }}_{\text {G}}}) = \left( \begin{array}{ccc} \frac{1}{\hat{\sigma _1^2}} \left( \left( {\varvec{y}}_1-{\varvec{X}}_1 \hat{{\varvec{\beta }}_{\text {G}}}\right) ^\top \otimes {\varvec{X}}_1^\top \right) {\varvec{S}}_1 &{}&{} \frac{1}{\hat{\sigma _2^2}} \left( \left( {\varvec{y}}_2-{\varvec{X}}_2 \hat{{\varvec{\beta }}_{\text {G}}}\right) ^\top \otimes {\varvec{X}}_2^\top \right) {\varvec{S}}_2\\ \frac{1}{2 \hat{\sigma _1^4}} \left( \left( {\varvec{y}}_1-{\varvec{X}}_1 \hat{{\varvec{\beta }}_{\text {G}}}\right) ^\top \otimes \left( {\varvec{y}}_1-{\varvec{X}}_1 \hat{{\varvec{\beta }}_{\text {G}}}\right) ^\top \right) {\varvec{S}}_1 &{}&{} 0 \\ 0 &{}&{} \frac{1}{2 \hat{\sigma _2^4}} \left( \left( {\varvec{y}}_2-{\varvec{X}}_2 \hat{{\varvec{\beta }}_{\text {G}}}\right) ^\top \otimes \left( {\varvec{y}}_2-{\varvec{X}}_2 \hat{{\varvec{\beta }}_{\text {G}}}\right) ^\top \right) {\varvec{S}}_2 \end{array}\right) , \end{aligned}$$

where \({\varvec{S}}_g\) is the \(n_g^2 \times n_g\) selection matrix, for \(g=1,2\).

3.2.2 Response perturbation

Spherical disturbance We now assume \({\varvec{u}}_{{\varvec{w}}} \sim \text {N}_n({\varvec{0}}_{n\times 1}, \sigma ^2 {\varvec{I}}_n)\), where \({\varvec{u}}_{{\varvec{w}}} = {\varvec{y}}+ {\varvec{w}}-{\varvec{X}} {\varvec{\beta }}\) is based on the perturbed response \({\varvec{y}}+{\varvec{w}}\) instead of \({\varvec{y}}\) given in (1), \({\varvec{w}} = (w_1, \ldots , w_n)^\top \) is an \(n \times 1\) perturbed vector, and \({\varvec{w}}_0 = (0, \ldots , 0)^\top \) is an \(n \times 1\) non-perturbed vector. In this perturbation scheme, the relevant part of the log-likelihood function subject to the restriction \({\varvec{r}} = {\varvec{R}} {\varvec{\beta }}\) is given by

$$\begin{aligned} \ell _{{{\varvec{w}}}_2}({\varvec{\theta }}) = - \frac{n}{2} \log (\sigma ^2) - {1 \over 2 \sigma ^2}(\varvec{y}+\varvec{w}-\varvec{X\beta })^\top ({\varvec{y}}+ {\varvec{w}}-{\varvec{X}} {\varvec{\beta }}) - \varvec{\uplambda }^\top ({\varvec{R}} {\varvec{\beta }} - {\varvec{r}}). \end{aligned}$$
(17)

Taking the differential of \(\ell _{{{\varvec{w}}}_2}({\varvec{\theta }})\) given in (17) with respect to \({\varvec{\theta }}\) as detailed in Appendix 2, we get

$$\begin{aligned} {\varvec{\varDelta }}(\hat{{\varvec{\theta }}}) = \left( \begin{array}{c} \frac{1}{\hat{\sigma ^2}} {\varvec{X}}^\top \\ \frac{1}{\hat{\sigma ^4}} ({\varvec{y}} -{\varvec{X}} \hat{{\varvec{\beta }}})^\top \end{array}\right) . \end{aligned}$$
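This perturbation matrix is immediate to assemble; continuing the illustrative Python/NumPy sketches above (a hypothetical helper, not from the paper):

```python
import numpy as np

def delta_response_perturbation(X, y, beta_hat, sigma2_hat):
    """Perturbation matrix for the additive response scheme (spherical case)."""
    e = y - X @ beta_hat                      # residuals at the RML estimate
    return np.vstack([X.T / sigma2_hat,       # p x n block for beta
                      e / sigma2_hat ** 2])   # row for sigma^2
```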

Non-spherical disturbance In this case, the relevant part of the log-likelihood function subject to the restriction \({\varvec{r}} = {\varvec{R}} {\varvec{\beta }}\) is given by

$$\begin{aligned} \ell ({\varvec{\theta }}_{\text {G}})= & {} - \sum _{g=1}^2 \frac{n_g}{2} \log (\sigma _g^2) - \sum _{g=1}^2 \frac{1}{2 {\sigma _g}^2}({\varvec{y}}_g + {\varvec{w}}_g - {\varvec{X}}_g {\varvec{\beta }})^\top ({\varvec{y}}_g + {\varvec{w}}_g - {\varvec{X}}_g {\varvec{\beta }})\\&- \varvec{\uplambda }^\top ({\varvec{R}} {\varvec{\beta }} - {\varvec{r}}). \end{aligned}$$

By using matrix calculus, we get

$$\begin{aligned} {\varvec{\varDelta }}(\hat{{\varvec{\theta }}_{\text {G}}}) = \left( \begin{array}{ccc} \frac{1}{\hat{\sigma _1^2}} {\varvec{X}}_1^\top &{}&{} \frac{1}{\hat{\sigma _2^2}} {\varvec{X}}_2^\top \\ \\ \frac{1}{\hat{\sigma _1^4}} ({\varvec{y}}_1 -{\varvec{X}}_1 \hat{{\varvec{\beta }}_{\text {G}}})^\top &{}&{} 0 \\ \\ 0 &{}&{} \frac{1}{\hat{\sigma _2^4}} ({\varvec{y}}_2 -{\varvec{X}}_2 \hat{{\varvec{\beta }}_{\text {G}}})^\top \end{array}\right) . \end{aligned}$$

3.2.3 Covariable perturbation

Spherical disturbance We now assume \({\varvec{u}}_w \sim \text {N}_n({\varvec{0}}_{n\times 1}, \sigma ^2 {\varvec{I}}_n)\), where \({\varvec{u}}_w ={\varvec{y}}-({\varvec{X}}+{\varvec{W}} {\varvec{A}}) {\varvec{\beta }}\) is based on \({\varvec{X}}+{\varvec{W}} {\varvec{A}}\) instead of \({\varvec{X}}\) given in (1), with \({\varvec{W}}\) being an \(n \times p\) perturbed matrix, \({\varvec{W}}_0=\varvec{0}\) an \(n \times p\) non-perturbed matrix, \({\varvec{A}} = \text {diag}(a_1, \ldots , a_p)\) a \(p \times p\) diagonal matrix, and \(a_j\) the standard deviation of \({\varvec{x}}_j\) corresponding to the jth column of \({\varvec{X}}\), for \(j=1, \ldots , p\). In this perturbation scheme, the relevant part of the log-likelihood function subject to the restriction \({\varvec{r}} = {\varvec{R}} {\varvec{\beta }}\) is given by

$$\begin{aligned} \ell _{{{\varvec{w}}}_3}({\varvec{\theta }}) = - \frac{n}{2} \log (\sigma ^2) - \frac{1}{2 \sigma ^2}({\varvec{y}}-({\varvec{X}}+{\varvec{W}} {\varvec{A}}) {\varvec{\beta }})^\top ({\varvec{y}}-({\varvec{X}}+{\varvec{W}} {\varvec{A}}) {\varvec{\beta }}) - \varvec{\uplambda }^\top ({\varvec{R}} {\varvec{\beta }} - {\varvec{r}}). \end{aligned}$$
(18)

Based on the differential of \(\ell _{{{\varvec{w}}}_3}({\varvec{\theta }})\) given in (18) with respect to \({\varvec{\theta }}\) as detailed in Appendix 2, we obtain

$$\begin{aligned} {\varvec{\varDelta }}(\hat{{\varvec{\theta }}}) = \left( \begin{array}{c} \frac{1}{\hat{\sigma ^2}}\left( ({\varvec{y}}-{\varvec{X}} \hat{{\varvec{\beta }}})^\top \otimes {\varvec{A}}\right) {\varvec{K}}- \frac{1}{\hat{\sigma ^2}} \hat{{\varvec{\beta }}}^\top {\varvec{A}} \otimes {\varvec{X}}^\top \\ - \frac{1}{\hat{\sigma }^4} \hat{{\varvec{\beta }}}^\top {\varvec{A}} \otimes ({\varvec{y}}- {\varvec{X}} \hat{{\varvec{\beta }}})^\top \end{array}\right) , \end{aligned}$$

where \({\varvec{K}}\) is the \(np \times np\) commutation matrix such that \(\text {vec} {\varvec{W}}^\top = {\varvec{K}} \text {vec} {\varvec{W}}\) for the \(n \times p\) matrix \({\varvec{W}}\); see Magnus and Neudecker (1999) and Liu (2002) for the definition, properties and applications of \({\varvec{K}}\). In particular, if we perturb only \({\varvec{x}}_j\) to \({\varvec{x}}_j + a_j {\varvec{w}}\), where \({\varvec{w}} =(w_1,\ldots ,w_n)^\top \) is the \(n \times 1\) perturbed vector and \({\varvec{w}}_0 =(0,\ldots ,0)^\top \) is the \(n \times 1\) non-perturbed vector, we obtain

$$\begin{aligned} {\varvec{\varDelta }}(\hat{{\varvec{\theta }}})_j = a_j \left( \begin{array}{c} \frac{1}{\hat{\sigma ^2}} {\varvec{s}}_j ({\varvec{y}}-{\varvec{X}} \hat{{\varvec{\beta }}})^\top - \frac{1}{\hat{\sigma ^2}} \hat{\beta }_j {\varvec{X}}^\top \\ - \frac{1}{\hat{\sigma }^4} \hat{\beta }_j ({\varvec{y}}- {\varvec{X}} \hat{{\varvec{\beta }}})^\top \end{array}\right) , \end{aligned}$$

where \({\varvec{s}}_j\) is the \(p \times 1\) vector with a one in the jth position and zeros elsewhere, for \(j=1,\ldots ,p\).
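For a single perturbed column, \({\varvec{\varDelta }}(\hat{{\varvec{\theta }}})_j\) can also be coded directly; a sketch in the same style, where we take \(a_j\) as the sample standard deviation of the jth column of \({\varvec{X}}\) as in the text (the choice of divisor in the standard deviation is our assumption):

```python
import numpy as np

def delta_covariable_perturbation(X, y, beta_hat, sigma2_hat, j):
    """Perturbation matrix for perturbing only column j of X (spherical case)."""
    n, p = X.shape
    a_j = X[:, j].std(ddof=1)                 # sample standard deviation of x_j
    e = y - X @ beta_hat                      # residuals at the RML estimate
    s_j = np.zeros(p)
    s_j[j] = 1.0                              # selection vector s_j
    top = (np.outer(s_j, e) - beta_hat[j] * X.T) / sigma2_hat
    bottom = -(beta_hat[j] * e / sigma2_hat ** 2)[None, :]
    return a_j * np.vstack([top, bottom])     # (p + 1) x n
```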

Non-spherical disturbance In this case, the relevant part of the log-likelihood function subject to the restriction \({\varvec{r}} = {\varvec{R}} {\varvec{\beta }}\) is given by

$$\begin{aligned} \ell ({\varvec{\theta }}_{\text {G}})= & {} - \sum _{g=1}^2 \frac{n_g }{2} \log (\sigma _g^2) - \sum _{g=1}^2 \frac{1}{2 {\sigma _g}^2}({\varvec{y}}_g- ({\varvec{X}}_g + {\varvec{W}}_g {\varvec{A}}_g) {\varvec{\beta }})^\top \\&\times ({\varvec{y}}_g - ({\varvec{X}}_g + {\varvec{W}}_g {\varvec{A}}_g) {\varvec{\beta }}) - \varvec{\uplambda }^\top ({\varvec{R}} {\varvec{\beta }} - {\varvec{r}}), \end{aligned}$$

where \({\varvec{W}}_g\) is the \(n_g \times p\) perturbed matrix, \({\varvec{A}}_g = \text {diag}(a_{g1}, \ldots , a_{gp})\) is the \(p \times p\) diagonal matrix, and \(a_{gj}\) is the standard deviation of the jth column of \({\varvec{X}}_g\), for \(g=1,2\) and \(j=1,\ldots ,p\). We use matrix calculus to obtain

$$\begin{aligned} {\varvec{\varDelta }}(\hat{{\varvec{\theta }}}_{\text {G}}) = \left( \begin{array}{cc} \frac{1}{\hat{\sigma }_1^2}\left( \left( {\varvec{y}}_1-{\varvec{X}}_1 \hat{{\varvec{\beta }}}_{\text {G}}\right) ^\top \otimes {\varvec{A}}_1\right) {\varvec{K}}_1 - \frac{1}{\hat{\sigma }_1^2} \hat{{\varvec{\beta }}}_{\text {G}}^\top {\varvec{A}}_1 \otimes {\varvec{X}}_1^\top &{}\quad \frac{1}{\hat{\sigma }_2^2}\left( \left( {\varvec{y}}_2-{\varvec{X}}_2 \hat{{\varvec{\beta }}}_{\text {G}}\right) ^\top \otimes {\varvec{A}}_2\right) {\varvec{K}}_2 - \frac{1}{\hat{\sigma }_2^2} \hat{{\varvec{\beta }}}_{\text {G}}^\top {\varvec{A}}_2 \otimes {\varvec{X}}_2^\top \\ - \frac{1}{\hat{\sigma }_1^4} \hat{{\varvec{\beta }}}_{\text {G}}^\top {\varvec{A}}_1 \otimes ({\varvec{y}}_1- {\varvec{X}}_1 \hat{{\varvec{\beta }}}_{\text {G}})^\top &{}\quad 0 \\ 0 &{}\quad - \frac{1}{\hat{\sigma }_2^4} \hat{{\varvec{\beta }}}_{\text {G}}^\top {\varvec{A}}_2 \otimes ({\varvec{y}}_2- {\varvec{X}}_2 \hat{{\varvec{\beta }}}_{\text {G}})^\top \end{array}\right) , \end{aligned}$$

where \({\varvec{K}}_g\) is the \(n_g p \times n_g p\) commutation matrix such that \(\text {vec} {\varvec{W}}_g^\top = {\varvec{K}}_g \text {vec} {\varvec{W}}_g\) for the \(n_g \times p\) matrix \({\varvec{W}}_g\). In particular, if we perturb only \({\varvec{x}}_j\) to \({\varvec{x}}_j + a_j {\varvec{w}}\), we obtain

$$\begin{aligned} {\varvec{\varDelta }}(\hat{{\varvec{\theta }}}_{\text {G}})_j = a_j \left( \begin{array}{ccc} \frac{1}{\hat{\sigma }_1^2} {\varvec{s}}_j ({\varvec{y}}_1-{\varvec{X}}_1 \hat{{\varvec{\beta }}}_{\text {G}})^\top - \frac{1}{\hat{\sigma }_1^2} \hat{\beta }_{\text {G}j} {\varvec{X}}_1^\top &{}&{} \frac{1}{\hat{\sigma }_2^2} {\varvec{s}}_j ({\varvec{y}}_2-{\varvec{X}}_2 \hat{{\varvec{\beta }}}_{\text {G}})^\top - \frac{1}{\hat{\sigma }_2^2} \hat{\beta }_{\text {G}j} {\varvec{X}}_2^\top \\ \\ - \frac{1}{\hat{\sigma }_1^4} \hat{\beta }_{\text {G}j} ({\varvec{y}}_1- {\varvec{X}}_1 \hat{{\varvec{\beta }}}_{\text {G}})^\top &{}&{} 0 \\ \\ 0 &{}&{} - \frac{1}{\hat{\sigma }_2^4} \hat{\beta }_{\text {G}j} ({\varvec{y}}_2- {\varvec{X}}_2 \hat{{\varvec{\beta }}}_{\text {G}})^\top \end{array}\right) , \end{aligned}$$

where \({\varvec{s}}_{j}\) is the \(p \times 1\) vector with a one in the jth position and zeros elsewhere, for \(j=1, \ldots , p\).

3.3 Global influence

In addition to our local influence statistics \({\varvec{F}}_{ii}\) and \({\varvec{l}}_{\max }\) given in Sect. 3.1, using the case deletion method, we introduce three global influence statistics: LD, relative change (RC) and generalised Cook distance (GCD). These are defined by

  1. (a)

\(\text {LD}_i = 2[\ell (\hat{{\varvec{\theta }}}) - \ell (\hat{{\varvec{\theta }}}_{(i)})]\),

  2. (b)

    \(\text {RC}_i = ||\hat{{\varvec{\theta }}}_{(i)} - \hat{{\varvec{\theta }}}||/||\hat{{\varvec{\theta }}}||\),

  3. (c)

\(\text {GCD}_i = (\hat{{\varvec{\theta }}}_{(i)} -\hat{{\varvec{\theta }}})^\top {\varvec{J}}(\hat{{\varvec{\theta }}})(\hat{{\varvec{\theta }}}_{(i)} -\hat{{\varvec{\theta }}})/k\),

for measuring differences between fits with and without the ith observation, for \(i=1, \ldots , n\); a computational sketch is given below. In the spherical disturbance case, to calculate these global influence measures, \(\ell \) is given in (5), \(\hat{{\varvec{\theta }}}\) is obtained using all the observations, \(\hat{{\varvec{\theta }}}_{(i)}\) is computed using the data without the ith observation, \(\varvec{J}(\hat{{\varvec{\theta }}})\) is obtained from (13) and \(k=p+1\) is the dimension of \({\varvec{\theta }}\). In the non-spherical disturbance case, \(\ell \) is given in (10), \(\hat{{\varvec{\theta }}}_{\text {G}}\) is obtained using all the observations, \(\hat{{\varvec{\theta }}}_{\text {G}(i)}\) is calculated using the data without the ith observation, \(\varvec{J}(\hat{{\varvec{\theta }}}_{\text {G}})\) is obtained from (14) and \(k=p+2\) is the dimension of \({\varvec{\theta }}_{\text {G}}\).
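A sketch of the case-deletion loop in Python/NumPy, where fit, loglik and info are user-supplied callables (our assumed interface, introduced only for illustration):

```python
import numpy as np

def global_influence(X, y, fit, loglik, info):
    """Case-deletion statistics LD_i, RC_i and GCD_i, for i = 1, ..., n.

    fit(X, y)     -> RML estimate theta-hat under R beta = r (e.g. a closure
                     over R and r built from the formulas in Sect. 2.2)
    loglik(theta) -> restricted log-likelihood (5) or (10) on the full data
    info(theta)   -> observed information matrix J(theta)
    """
    n = len(y)
    theta = fit(X, y)
    J = info(theta)
    k = len(theta)
    LD, RC, GCD = np.zeros(n), np.zeros(n), np.zeros(n)
    for i in range(n):
        keep = np.arange(n) != i
        theta_i = fit(X[keep], y[keep])     # refit without observation i
        d = theta_i - theta
        LD[i] = 2 * (loglik(theta) - loglik(theta_i))
        RC[i] = np.linalg.norm(d) / np.linalg.norm(theta)
        GCD[i] = d @ J @ d / k
    return LD, RC, GCD
```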

4 Numerical illustration

In this section, we illustrate and validate the diagnostic methodology proposed in this work with three empirical examples of real-world data. The first two examples are for the spherical disturbance case and the third one is for the non-spherical disturbance case.

4.1 Example 1

We analyze the data for a response with six covariables observed in 40 metropolitan areas used in a study by Ramanathan (1993, Table 10.1) and Cysneiros and Paula (2005). The objective of this study is to regress the number (in thousands) of subscribers with cable TV (Y) against the number (in thousands) of homes in the area (\(X_1\)), the per capita income for each TV market with cable (\(X_2\)), the installation fee (\(X_3\)), the monthly service charge (\(X_4\)), the number of TV signals carried by each cable system (\(X_5\)) and the number of TV signals received with good quality without cable (\(X_6\)). As Y is a count, we take its square root to try to stabilize the variance. Thus, the model is

$$\begin{aligned} \sqrt{Y_i} = \beta _0 + \sum _{j=1}^6 x_{ij} \beta _j + \varepsilon _i, \end{aligned}$$

where \(\varepsilon _i \sim \text {N}(0, \sigma ^2)\) are mutually independent errors, for \(i=1,\ldots ,40\). It is reasonable to expect the effect of each coefficient to be unidirectional, as in Cysneiros and Paula (2005), so that the opposite direction is theoretically impossible. We may focus on assessing whether the number of subscribers changes as the monthly service charge changes, that is, whether \(\beta _4 = 0\) or not, which can be treated as an exact linear restriction. Similarly, we may be interested in assessing equality restrictions on the remaining covariables. Therefore, we use \({\varvec{R}} {\varvec{\beta }} = {\varvec{0}}\), where

$$\begin{aligned} {\varvec{R}} = \left( \begin{array}{ccccccc} 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{} \quad 0 &{}\quad 0 &{}\quad 1 &{} \quad 0 \end{array}\right) , \end{aligned}$$

and \({\varvec{\beta }} = (\beta _0, \beta _1, \ldots , \beta _6)^\top \), as in Cysneiros and Paula (2005). This corresponds to the exact linear restrictions \(\beta _2 = \beta _3 = \beta _4 = \beta _5 = 0\). For these data, Cysneiros and Paula (2005) found case #14 to be the most influential using the standardised residuals, and both cases #1 and #14 to be the most influential using the total local influence based on a quadratic penalty function. Employing our formulas provided in Sect. 3, we obtain the RML estimates and the plots of the diagonal elements of \({\varvec{F}}\) and \({\varvec{l}}_{\max }\). The plot of standardised residuals in Fig. 1 may indicate two extremal observations. However, the potentially influential observations in Fig. 2 include cases #21 and #16, identified by \(|{\varvec{F}}_{ii}|\), plus those found by Cysneiros and Paula (2005) and by our \({\varvec{l}}_\text {max}\). In addition, the three global influence statistics in Fig. 3 indicate that only cases #14 and #1 are influential.

Fig. 1

Plot of standardised residual versus index for data of Example 1 in the indicated perturbation scheme

Fig. 2

Plots of diagonal element of \({\varvec{F}}\) versus index (left) and of element of \({\varvec{l}}_{\max }\) versus index (right) for data of Example 1 in the indicated perturbation scheme

Fig. 3

Plots of the indicated global influence statistic versus index for data of Example 1

4.2 Example 2

We use the data considered in Paula and Cysneiros (2010, Application 1). The response (Y) is the weight-to-height ratio (scaled by a factor of 100) and the covariable (X) is age for 72 children from birth to 71.5 months. A restricted normal model is proposed in which the mean response is required to be cubic in x for \(x \le t_0\) and linear for \(x > t_0\) (\(t_0 = 16\) months), that is,

$$\begin{aligned} Y_i = \beta _0 + \sum _{j=1}^3 x_i^j \beta _j + \beta _4 (x_i - t_0)_+^3 + \varepsilon _i, \quad i=1,\ldots ,72, \end{aligned}$$

where \(\varepsilon _i \sim \text {N}(0, \sigma ^2)\) are mutually independent errors, and \((x_i - t_0)_+ = 0\) if \(x_i \le t_0\), or \((x_i - t_0)_+ = x_i - t_0\) if \(x_i > t_0\), with the restrictions \({\varvec{R}} {\varvec{\beta }} = {\varvec{0}}\) imposed to guarantee a linear tendency after \(t_0\). Thus,

$$\begin{aligned} {\varvec{R}} = \left( \begin{array}{ccccc} 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 &{}\quad -3 t_0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 1 \end{array}\right) , \end{aligned}$$

and \({\varvec{\beta }} = (\beta _0, \beta _1, \beta _2, \beta _3, \beta _4)^\top \).

Using a quadratic penalty function under a model perturbation scheme, Paula and Cysneiros (2010, Application 1) applied the total local influence method to \({\varvec{\beta }}\) and then to \(\sigma ^2\). They found two young children, cases #2 and #8, to have a large total local influence on \({\varvec{\beta }}\), and cases #8, #21 and #25 to have a large influence on \(\sigma ^2\). Using our formulas provided in Sect. 3, we obtain the RML estimates and the plots of the diagonal elements of \({\varvec{F}}\) and \({\varvec{l}}_\text {max}\). The plot of standardised residuals in Fig. 4 indicates four extremal observations. However, the potentially influential cases in Fig. 5 identified by \({\varvec{F}}_{ii}\) and \({\varvec{l}}_\text {max}\) include #1, #72 and #9, and even #2 and #8, in addition to those found by Paula and Cysneiros (2010, Application 1), with case #1 or #72 being the most influential in one or two perturbation schemes. In Fig. 6, the LD statistic may indicate that cases #8, #2, #1 and #21 are influential. The other two global influence statistics indicate that only cases #2 and #1 are influential.

Fig. 4

Plot of standardised residual versus index for data of Example 2

Fig. 5

Plots of diagonal element of \({\varvec{F}}\) versus index (left) and of element of \({\varvec{l}}_{\max }\) versus index (right) for data of Example 2 in the indicated perturbation scheme

Fig. 6

Plots of the indicated global influence statistic versus index for data of Example 2

4.3 Example 3

The objective of this example is to apply our results with restrictions in the non-spherical case. We use the data analysed in Examples 9.8 and 9.9 of Wooldridge (2013) to study whether R&D intensity increases with firm size when an equality restriction on a coefficient is involved. Following Wooldridge (2013), we suppose that R&D expenditures as a percentage of sales are related to sales and to profits as a percentage of sales by

$$\begin{aligned} Y_i = \beta _0 + x_{i1} \beta _1 + x_{i2} \beta _2 + \varepsilon _i, \quad i=1, \ldots , 32, \end{aligned}$$

where the response \(Y_i\) is the R&D intensity, and the covariables \(x_{i1}\) and \(x_{i2}\) are the firm sales (in millions) and the profit margin, respectively. For a simple illustration, we impose \({\varvec{R}}=(0, 1, 0)\) and \({\varvec{r}}=0.00005\), which agrees with the data analysis in Wooldridge (2013). After ordering the data by the sizes of \(x_{i1}\), that is, the sales, and splitting the data into two groups of 16 observations, we assume \(\varepsilon _i \sim \text {N}(0, \sigma _1^2)\), for \(i=1, \ldots , 16\), and \(\varepsilon _i \sim \text {N}(0, \sigma _2^2)\), for \(i=17, \ldots , 32\), are mutually independent errors. We fit the model and conduct the Goldfeld-Quandt test for heteroskedasticity, obtaining a \(p\text { value}\) of \(0.012 < 0.05 = \alpha \), which supports the appropriateness of a non-spherical disturbance model with \(\sigma _1^2 \ne \sigma _2^2\).
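For reference, the Goldfeld-Quandt statistic used here is the ratio of the group-wise residual variances from separate OLS fits; the following Python/SciPy sketch (our own one-sided implementation, not the exact code behind the reported p value) illustrates the calculation.

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(X1, y1, X2, y2):
    """Goldfeld-Quandt F test comparing error variances of two ordered groups."""
    def ssr_df(X, y):
        b = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS fit within the group
        e = y - X @ b
        return e @ e, len(y) - X.shape[1]          # SSR and residual df
    ssr1, df1 = ssr_df(X1, y1)
    ssr2, df2 = ssr_df(X2, y2)
    s1, s2 = ssr1 / df1, ssr2 / df2
    if s2 >= s1:                                   # larger variance on top
        F, dfn, dfd = s2 / s1, df2, df1
    else:
        F, dfn, dfd = s1 / s2, df1, df2
    p_value = stats.f.sf(F, dfn, dfd)              # one-sided p value
    return F, p_value
```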

We present a plot of the standardised residuals in Fig. 7. Note that this residual for case #1 is greater than 2. In Fig. 8, the local influence statistics find case #1 to be the most influential and cases #22, #30, #32 and #10 to be possibly influential. In Fig. 9, the global influence statistics suggest cases #1 and #30 to be influential, with cases #32 and #10 also worth noting. Using a dummy variable approach, Wooldridge (2013) found cases #10 and #1 to be individually influential. They are the largest firm (case #10) and the firm with the highest value of R&D intensity (case #1). We find that case #1 is more influential than case #10, and our local influence method, through \({\varvec{F}}_{ii}\) and even \({\varvec{l}}_\text {max}\), identifies additional observations, including cases #22 and #30, as potentially influential. These cases are not identified by the global influence or dummy variable approaches.

Fig. 7

Plot of standardised residual versus index for data of Example 3

Fig. 8

Plots of diagonal element of \({\varvec{F}}\) and \({\varvec{l}}_\text {max}\) versus index for data of Example 3 in the indicated perturbation scheme

Fig. 9

Plots of the indicated global influence statistic versus index for data of Example 3

5 Concluding remarks

We have established results for influence diagnostics in the possibly heteroskedastic linear regression model with exact linear restrictions. We have used the restricted maximum likelihood estimators with Lagrange multipliers for the linear penalty function to find the diagnostic matrix. On the one hand, the empirical examples have indicated that our results can be used to make findings similar to those on the same datasets provided in Cysneiros and Paula (2005) and Paula and Cysneiros (2010). On the other hand, we have seen that our results differ from those given in Cysneiros and Paula (2005) and Paula and Cysneiros (2010) in a number of ways. Our methodology may be used in the non-spherical disturbance case and in further extended models. For diagnostics, we have used not just the total local influence statistics, which are based on the diagonal elements of the diagnostic matrix, but also the direction eigenvector associated with its largest eigenvalue. We have compared the local influence statistics with three global influence statistics by examining the potentially influential observations identified by these sets of statistics. Our local influence statistics have identified more influential observations than the local influence statistics in Cysneiros and Paula (2005) and Paula and Cysneiros (2010) and our global influence statistics. These results are directly related to the restricted least squares estimators, which have been widely used in econometrics and statistics. They also complement the results for the linear model established by Liu and Neudecker (2007) using a sensitivity analysis approach. Our results can be implemented in a reasonably easy way in computer packages; our Matlab codes are available on request.