1 Introduction

In a model relating a response y to a factor t, inverse prediction refers to a process for inferring t from the observed response \(y_*\) for a subject with an unknown value of t. The inference is based on a model fit to training data from subjects at known values of t. The subject in question, with unknown t, will be termed the mystery specimen (MS) here, and its true (but unknown) value of t will be denoted \(t_*\). It is presumed that the training data are from a process that is credibly similar to the one that produced the mystery specimen.

In applications, usually there are other factors that affect the response and the relation between the response and the principal factor t. Models for the response would include potential effects of those factors. However, to simplify and focus the developments here, we shall not explicitly include these other factors in formulations.

Our investigation into this statistical problem is prompted particularly by the objective of estimating the time since death when a body is discovered and there is suspicion of criminal activity. Larvae of carrion-feeding flies grow and develop in regular patterns over time, and so their sizes can serve as a biological clock for estimating their ages. Their growth is affected by multiple conditions, of which temperature is usually a major factor. As an example corresponding to the setting described above, y might be length and t time since egg deposition. The MS found at the scene has length \(y_*\). Other dimensions of the size of the larva can be measured (weight, for example), and they may provide more information relevant to the age of the MS. Further, such larvae develop through identifiable stages, and so some information is categorical. Training data might already exist from rearing experiments on larvae of the same species, or they might be produced after the discovery, perhaps even from larvae or adults found at the scene.

In practice, this setting is complex; see Catts (1992). The temperature profile often is not known and must be estimated. How, and which, specimens are collected at the scene can affect inferences. Multiple other conditions affect the rate of decomposition and the presence and growth of insect larvae. Although the useful species seldom lay eggs on living bodies, the time elapsed between when the body was exposed and when the eggs were laid is not known. Thus, even if the age of a larva were known exactly, one could say only that the body had been exposed at least that long. In forensic terms, the age of a larva provides a minimum postmortem interval.

Despite these complexities, the crux of the statistical problem is the comparison of the response from the MS to the training data at any proposed time \(t_0\), in answer to the question, “Is it tenable to think that this specimen could be of age \(t_0\)?” We assume that the response variable is quantitative, as opposed to categorical. We consider it to be multivariate in this paper. LaMotte and Wells (2016) give a corresponding development for univariate responses.

The problem has been approached in three broad ways for univariate responses, and these approaches carry over into multivariate responses, denoted here by \({\varvec{y}}\). The first is to fit a family of functions of t, \(f(t;{\varvec{\beta }})\), to responses \({\varvec{y}}\) to get a representation like \(\hat{{\varvec{y}}}(t)\,=\,f(t; \hat{{\varvec{\beta }}})\), where the estimate \(\hat{{\varvec{\beta }}}\) of the vector of parameters \({\varvec{\beta }}\) results from fitting the model to the training data \({\varvec{y}}_1, \ldots , {\varvec{y}}_n\). Then estimate \(t_*\) by minimizing an appropriate norm of \(f(\hat{t}_*; \hat{{\varvec{\beta }}}) - {\varvec{y}}_*\) and construct an interval estimate in the form \(\hat{t}_* \pm q_\alpha \widetilde{\text {SE}}(\hat{t}_*)\), where \(q_\alpha \) is the \(\alpha \) quantile of the standard normal distribution or of a Student’s t distribution, and \(\widetilde{\text {SE}}(\hat{t}_*)\) is an approximate standard error of \(\hat{t}_*\). The second approach is to fit a family of functions of \({\varvec{y}}\), \(h({\varvec{y}}; {\varvec{\gamma }})\), to t, resulting in \(\hat{t}({\varvec{y}})\,=\,h({\varvec{y}};\; \hat{{\varvec{\gamma }}})\), and then estimate \(t_*\) as \(\hat{t}_* \,=\, h({\varvec{y}}_*; \hat{{\varvec{\gamma }}})\), along with a prediction interval as if t were the random response variable.

The third, like the first, fits a family of functions \(f(t;\; {\varvec{\beta }})\) of t to \({\varvec{y}}_1, \ldots , {\varvec{y}}_n\). Then, for each value \(t_0\) over a range of values of t, \({\varvec{y}}_*\) is tested as an outlier against \(f(t_0;\;\hat{{\varvec{\beta }}})\). If such tests can be had at the \(\alpha \) level of significance, then the range of values \(t_0\) not rejected constitutes a \(100(1-\alpha )\%\) confidence set on the true \(t_*\) (Lehmann 1959, p. 79). This is the basic approach taken here. It is the same as the approach that Oman and Wax (1984, p. 951) designate (ii) and attribute to Brown (1982), and it corresponds to Eq. (5.14) in Brown (1993, p. 88) and Eq. (2.11) in Sundberg (1999, p. 168). It is the direct extension of the test statistic for univariate \(y_*\) as an outlier at \(t_0\) in a linear regression of y on t. There, the confidence set comprises the range of values \(t_0\) such that a horizontal line at height \(y_*\) intersects the \(100(1-\alpha )\%\) prediction interval on y at \(t_0\).
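
To make the third approach concrete in its simplest form, the following is a minimal sketch (ours, not drawn from the cited papers) of the univariate case just described, assuming a simple linear regression of y on t with constant variance; the function name and grid are illustrative.

```python
import numpy as np
from scipy import stats

def outlier_pvalue(t_train, y_train, y_star, t0):
    """p value for testing y_star as an outlier at t0 in a simple linear
    regression of y on t with constant variance (univariate sketch)."""
    t = np.asarray(t_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    n = len(t)
    X = np.column_stack([np.ones(n), t])
    beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = rss[0] / (n - 2)                                      # residual variance estimate
    x0 = np.array([1.0, t0])
    pred_var = s2 * (1.0 + x0 @ np.linalg.solve(X.T @ X, x0))  # prediction variance at t0
    t_stat = (y_star - x0 @ beta) / np.sqrt(pred_var)
    return 2.0 * stats.t.sf(abs(t_stat), n - 2)

# The 100(1 - alpha)% confidence set on t_* comprises the t0 values not rejected:
# conf_set = [t0 for t0 in np.linspace(0, 10, 201)
#             if outlier_pvalue(t_train, y_train, y_star, t0) >= 0.05]
```

This is the same as inverting the usual prediction interval: \(y_*\) lies inside the \(100(1-\alpha )\%\) prediction interval at \(t_0\) exactly when the p value is at least \(\alpha \).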

The first two approaches yield interval estimates of \(t_*\). The third may yield an interval, or a range fixed at only one end, or a collection of intervals, or an empty set. Under assumptions of independence, homogeneous variance, and normality, the coverage probability of confidence sets by the third approach is the nominal \(100(1-\alpha )\%\). There is no such finite-sample probabilistic property of the other two approaches, although simulation results indicate that in some settings they perform satisfactorily. See, for example, Krutchkoff (1967) and other works spawned by that paper. LaMotte (2014) compares coverage rates of confidence sets on \(t_*\) for the second (t vs. \({\varvec{y}}\)) and third (\({\varvec{y}}\) vs. t) approaches. In simulation results reported there, the performance characteristics of confidence sets on \(t_*\) based on the second approach degraded when the variance–covariance matrix of bivariate \({\varvec{y}}\) was not constant in t, while the performance using the third approach did not.

There is a considerable literature on inverse prediction and calibration (a term also used for the same sort of methodology). See Osborne (1991) and Sundberg (1999) for comprehensive reviews and Brown (1993) for thorough development. For the most part, there is the tacit assumption that the variance or the variance–covariance matrix of the response is constant over t. The paper by Oman and Wax (1984) was one of the first to illustrate the application of multivariate calibration. Although they modeled the variance–covariance matrix as constant, they applied two adjustments to correspond to greater variance and lesser correlation as t increased. Clarke (1992) dealt with a multivariate response and a nonlinear model for the mean vector, along with a constant variance–covariance matrix. Liao (2005) devised formulations of inverse-prediction (IP) confidence sets based on mixed models that included random effects of batches; however, variances did not differ with the target of inference (t here). See also Brown (1982) for seminal methodological development, also with the variance assumed to be constant.

Sundberg (1999, pp. 168–169) summarizes the discussion provoked by a feature that Brown (1993, pp. 88–90) pointed out. The numerator sum of squares in the test statistic under the third approach can be expressed as the sum of two squared norms. One is of \(t_0 - \hat{t}_*\), with \(\hat{t}_*\) as described above for the first approach. The other, which Brown (1993) denotes by R, is the squared norm of \({\varvec{y}}_* - \hat{{\varvec{y}}}(\hat{t}_*)\), the difference between the MS \({\varvec{y}}_*\) and the predicted value of \({\varvec{y}}\) given by the model fit to training data, evaluated at \(\hat{t}_*\). This is the minimum value of the norm mentioned above in the description of the first approach to find \(\hat{t}_*\). It is 0 if the response is univariate and the model is linear in t with non-zero slope. Otherwise, it is a measure of “multivariate inconsistency” (a term attached by a reviewer), of the failure of the model to be able to match all components of \({\varvec{y}}_*\) simultaneously at a single value of t. Thus rejection of the proposed \(t_0\) as untenable in light of the data depends both on how close \(t_0\) is to the estimated true value \(\hat{t}_*\) and on the magnitude of this multivariate inconsistency. Brown (1993, p. 89) says, “This unsatisfactory behavior sullies the procedure’s exact confidence property.” Sundberg (1999, p. 169) says,

...if R is high enough the region will be empty. In principle it is OK that a confidence region is empty when data do not fit the model, but here the shrinkage of the region with increasing R is misleading when we think of the size of the region as reflecting the precision of the estimation procedure. A number of alternatives without this annoying feature have been proposed. ...The time does not yet, if ever, appear ripe for declaring one region superior to the others.

Point estimation of \(t_*\) is not an explicit goal in the third approach. It addresses the question of whether it is unreasonable to assert that \({\varvec{y}}_*\) might have come from \(t_0\). In our opinion, splitting the squared norm of the residual into these two parts is not an essential part of addressing that question. Further, we opine that “think[ing] of the size of the region as reflecting the precision of the estimation procedure,” in this context of inverse prediction, is itself off-target and slightly misleading. That interpretation is widespread, and it is at times a convenient shortcut. However, it conflates not rejecting a hypothesized value \(t_0\) with asserting that \(t_0\) could be the true value, which then comes out as saying that the true value is in the confidence set, and hence that the smaller the confidence set, the more precise the inference on \(t_*\). These interpretations are based on a single realization, not on the probabilistic properties of the methodology.

The primary statistical performance characteristic of a method to construct confidence sets is its coverage profile, which describes, for each true value \(t_*\) and each possible value t, the probability that the set contains t. Ideally, given \(t_*\), that probability would be the stated level of confidence at \(t\,=\,t_*\), and it would fall off monotonically, the steeper the better, as t recedes from \(t_*\). It is unclear how that performance might be related to the possibility that a set might be empty. And how a single realized set depends functionally on R does not say anything about this performance.

On the other hand, if, in some sense, R persistently accounts for a considerable part of the numerator sum of squares for the test statistic, then the assumed form of the model does not adequately mimic the path (in the univariate t) of the parameters of the distribution of the multivariate response. As both R and \(\hat{t}_*\) depend on \({\varvec{y}}_*\), this feature cannot be assessed with the training data alone. One diagnostic approach would be to treat each observation in the training data as a \({\varvec{y}}_*\), fit the model to the remaining data, and compute R and the numerator sum of squares to assess the influence of this multivariate inconsistency. This may indicate that changes to the model are needed. It must be stressed, though, that \(\hat{t}_*\) and R are not produced as part of the output for standard mixed-models programs, and so an effort like this would require special ad-hoc programming.
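
The leave-one-out diagnostic just described might be organized as follows. This is a sketch only; fit_training_model, estimate_t_star, and predicted_mean are hypothetical placeholders for whatever fitting code is actually used, since, as noted, standard mixed-models programs do not produce \(\hat{t}_*\) and R directly.

```python
import numpy as np

def inconsistency_diagnostic(t_train, Y_train, fit_training_model,
                             estimate_t_star, predicted_mean):
    """Leave each training case out in turn, refit the model to the rest,
    and compute R: the squared norm of the held-out response minus the
    fitted mean curve evaluated at the estimated t for that case."""
    Y = np.asarray(Y_train, dtype=float)
    R_values = []
    for i in range(len(t_train)):
        keep = [j for j in range(len(t_train)) if j != i]
        fit = fit_training_model([t_train[j] for j in keep], Y[keep])  # hypothetical helper
        t_hat = estimate_t_star(fit, Y[i])                              # hypothetical helper
        resid = Y[i] - predicted_mean(fit, t_hat)                       # hypothetical helper
        R_values.append(float(resid @ resid))
    return np.array(R_values)
```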

The methods we present here extend established methods that have known, exact properties under some conditions. We do not attempt to document their performance characteristics here, but because of their grounding, there is no a priori reason to think that their performance would not be satisfactory when compared to others. Furthermore, no comprehensive methodology has otherwise been proffered, as far as we have been able to find, for constructing confidence sets in the setting that has motivated our developments.

In the setting of forensic entomology mentioned above, sizes of fly larvae and their variances and covariances change greatly with age (t). Wells and LaMotte (1995) adapted Satterthwaite’s approximation to deal with t-dependent variances, by modeling both means and variances as linear interpolates between sampling points for a univariate response. LaMotte (2014) sketched how t-dependent variance–covariance matrices could be accommodated in linear-interpolation models for multivariate responses, using the Fai and Cornelius (1996) multivariate extension of Satterthwaite’s approximation. The developments in those two papers (1995, 2014) required sample means and sample variance–covariance matrices at each sampled t, and they did not consider models for the means and variance–covariance matrices beyond linear interpolation between adjacent t points.

The objective of this paper is to describe how IP can be performed with routine, default computations in widely available mixed-models programs, for a general setting in which \({\varvec{y}}\) is multivariate and its variance–covariance matrix is modeled as varying with t and other factors, as needed. Flexible models, using polynomial splines in t, are described for both means and variances.

2 IP in mixed models

The formulation of the general mixed linear model shown here follows the notational conventions used in the SAS PROC MIXED documentation (SAS 2012). Suppose the response is q-variate and that there are n subjects in the training data with response vectors \({\varvec{Y}}_1, \ldots , {\varvec{Y}}_n\), with observed values \({\varvec{y}}_1, \ldots , {\varvec{y}}_n\). The model for the nq-vector response \({\varvec{Y}} \,=\, ({\varvec{Y}}_1^{\prime }, \ldots , {\varvec{Y}}_n^{\prime })^{\prime }\) is

$$\begin{aligned} {\varvec{Y}} \,=\,X{\varvec{\beta }} + Z{\varvec{\gamma }} +{\varvec{\varepsilon }}. \end{aligned}$$

The \(nq\times qp_m\) matrix X and the \(nq\times qp_v\) matrix Z are fixed and known. The vector \({\varvec{\beta }}\) comprises unknown fixed-effects parameters. The random effects are the entries in \({\varvec{\gamma }}\), which is assumed to have mean vector \({\varvec{0}}\) and variance–covariance matrix \(\text {Var}({\varvec{\gamma }})\,=\, G\). The error term \({\varvec{\varepsilon }}\) is assumed to be independent of \({\varvec{\gamma }}\) and to have mean \({\varvec{0}}\) and \(\text {Var}({\varvec{\varepsilon }})\,=\,R\). Denote the vector of realized values of \({\varvec{Y}}\) by \({\varvec{y}}\). The matrices G and R generally are modeled as specified functions of a set \({\varvec{\theta }}\) of parameters, sometimes termed variance or covariance components, such that

$$\begin{aligned} \text {Var}({\varvec{Y}}; {\varvec{\theta }}) \,=\, ZG({\varvec{\theta }})Z^{\prime } + R({\varvec{\theta }}) \end{aligned}$$

is positive definite. Finally, \({\varvec{\gamma }}\) and \({\varvec{\varepsilon }}\) are assumed to follow jointly a multivariate normal distribution.

In the setting of inverse prediction (IP) in this framework, t is a real-valued variable, taking values \(t_1, \ldots , t_n\) in the training data, and entries in corresponding rows of X and Z are functions of t (and perhaps of other factors as well). Schematically, the training data are \(t_i, {\varvec{y}}_i\), \(i\,=\,1, \ldots , n\). Each \(t_i\) defines the \(p_m\) columns of the row vector \({\varvec{x}}_i^{\prime }\) and the \(p_v\) columns of the row vector \({\varvec{z}}_i^{\prime }\). The values \(t_1, \ldots , t_n\) may include repetitions for multiple observations at the same value of t. As illustrated here, columns of \({\varvec{x}}_i^{\prime }\) and \({\varvec{z}}_i^{\prime }\) are polynomial B-splines evaluated at \(t_i\). From these, the matrices X and Z are as shown in the following table. Transpose of \({\varvec{x}}\) is denoted \({\varvec{x}}^{\prime }\), and \(\otimes \) denotes the Kronecker product.

$$\begin{array}{cccc}
t & {\varvec{y}} & X & Z \\
\hline
t_1 & {\varvec{y}}_1 & \text {I}_q\otimes {\varvec{x}}_1^{\prime } & \text {I}_q\otimes {\varvec{z}}_1^{\prime } \\
\vdots & \vdots & \vdots & \vdots \\
t_n & {\varvec{y}}_n & \text {I}_q\otimes {\varvec{x}}_n^{\prime } & \text {I}_q\otimes {\varvec{z}}_n^{\prime }
\end{array}$$
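
A minimal sketch (ours) of this construction, assuming helper functions x_basis(t) and z_basis(t) that return the rows of spline values \({\varvec{x}}(t)^{\prime }\) and \({\varvec{z}}(t)^{\prime }\):

```python
import numpy as np

def design_matrices(t_values, x_basis, z_basis, q):
    """Stack the blocks I_q (x) x(t_i)' and I_q (x) z(t_i)' for a q-variate
    response; x_basis(t) and z_basis(t) return the 1 x p_m and 1 x p_v rows
    of spline values at t (placeholders for whatever basis is chosen)."""
    Iq = np.eye(q)
    X_blocks, Z_blocks = [], []
    for t in t_values:
        X_blocks.append(np.kron(Iq, np.atleast_2d(x_basis(t))))  # q x q*p_m block
        Z_blocks.append(np.kron(Iq, np.atleast_2d(z_basis(t))))  # q x q*p_v block
    return np.vstack(X_blocks), np.vstack(Z_blocks)
```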

The question that IP addresses is: given the observed response \({\varvec{y}}_*\) from a MS, from what value of t did it come? Or, for each potential value \(t_0\), is it tenable to think that \({\varvec{y}}_*\) came from the population at \(t_0\)? This can be addressed with a p value for the null hypothesis H\(_0: E({\varvec{Y}}_*) \,=\, (\text {I}_q\otimes {\varvec{x}}(t_0)^{\prime }){\varvec{\beta }}\) in the model above. This is the same as testing the observed value \({\varvec{y}}_*\) as an outlier at \(t_0\). This is a linear hypothesis, and its test statistic and p value can be had with appropriate statements and options within a general mixed linear models program. However, it is well-known that the same information can be had, usually more efficiently, from results computed by default in practically all statistical packages, as described next.

Formulate the model as above, but with \({\varvec{y}}_*\) as an additional observation at \(t\,=\,t_0\); this appends rows \({\varvec{y}}_*\), \(\text {I}_q\otimes {\varvec{x}}_0^{\prime }\), and \(\text {I}_q\otimes {\varvec{z}}_0^{\prime }\) to \({\varvec{y}}\), X, and Z, respectively, where \({\varvec{x}}_0\,=\,{\varvec{x}}(t_0)\) and \({\varvec{z}}_0\,=\,{\varvec{z}}(t_0)\). Create q dummy variables \(d_0\) that are 0 in all rows except the q \(t_0\) rows, where they are \(\text {I}_q\), and include them as q fixed-effect predictor variables in the model. In the resulting computations and output, the p value for testing H\(_0\) appears as the p value for the q regression coefficients \({\varvec{\delta }}_0\) of the dummy variables: that is, for H\(_0: {\varvec{\delta }}_0 \,=\, {\varvec{0}}\).
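
Continuing the sketch above, the augmentation with artificial cases and dummy variables might look like the following (again with illustrative names; a mixed-models package would do the equivalent through its data handling and model syntax):

```python
import numpy as np

def augment_with_grid(X, Z, Y, y_star, x_basis, z_basis, t_grid, q):
    """Append one artificial case (response y_star) for each grid value t0,
    plus q dummy columns per grid value that equal I_q in that case's rows
    and 0 elsewhere; the dummies join the fixed-effects design matrix."""
    g = len(t_grid)
    Iq = np.eye(q)
    X_rows, Z_rows, Y_rows = [X], [Z], [np.asarray(Y, dtype=float)]
    D_rows = [np.zeros((X.shape[0], q * g))]          # dummies are 0 for training rows
    for k, t0 in enumerate(t_grid):
        X_rows.append(np.kron(Iq, np.atleast_2d(x_basis(t0))))
        Z_rows.append(np.kron(Iq, np.atleast_2d(z_basis(t0))))
        Y_rows.append(np.asarray(y_star, dtype=float))
        d = np.zeros((q, q * g))
        d[:, k * q:(k + 1) * q] = Iq                  # I_q in the t0 rows only
        D_rows.append(d)
    X_aug = np.hstack([np.vstack(X_rows), np.vstack(D_rows)])
    return X_aug, np.vstack(Z_rows), np.concatenate(Y_rows)
```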

Repeating this over a grid of values of \(t_0\) produces a table of values of \(t_0\) and the corresponding p values. Further refinement can be had by interpolating between these points. The set of values of \(t_0\) for which the p value is not less than .05 (say) constitutes—approximately—a \(95\%\) confidence set on \(t_*\).
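
Given the resulting table of grid values and p values, the approximate confidence set can be read off directly. A small sketch (ours), using piecewise-linear interpolation of the p values between grid points:

```python
import numpy as np

def approximate_confidence_set(t0_grid, p_values, alpha=0.05):
    """Grid values with p >= alpha, plus the crossing points where the
    interpolated p value equals alpha between adjacent grid points."""
    t0 = np.asarray(t0_grid, dtype=float)
    p = np.asarray(p_values, dtype=float)
    kept = list(t0[p >= alpha])
    for i in range(len(t0) - 1):
        if (p[i] - alpha) * (p[i + 1] - alpha) < 0:   # p crosses alpha between grid points
            frac = (alpha - p[i]) / (p[i + 1] - p[i])
            kept.append(t0[i] + frac * (t0[i + 1] - t0[i]))
    return np.array(sorted(kept))
```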

The structure of the components of the model, augmented by the grid of artificial cases and dummy variables, is shown in the following table, where \(t_{01}, \ldots , t_{0g}\) constitute the grid of values of \(t_0\).

$$\begin{array}{ccccccc}
t & {\varvec{y}} & X & Z & d_{01} & \cdots & d_{0g} \\
\hline
t_1 & {\varvec{y}}_1 & \text {I}_q\otimes {\varvec{x}}_1^{\prime } & \text {I}_q\otimes {\varvec{z}}_1^{\prime } & 0_{q\times q} & \cdots & 0_{q\times q} \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\
t_n & {\varvec{y}}_n & \text {I}_q\otimes {\varvec{x}}_n^{\prime } & \text {I}_q\otimes {\varvec{z}}_n^{\prime } & 0_{q\times q} & \cdots & 0_{q\times q} \\
t_{01} & {\varvec{y}}_* & \text {I}_q\otimes {\varvec{x}}_{01}^{\prime } & \text {I}_q\otimes {\varvec{z}}_{01}^{\prime } & \text {I}_q & \cdots & 0_{q\times q} \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\
t_{0g} & {\varvec{y}}_* & \text {I}_q\otimes {\varvec{x}}_{0g}^{\prime } & \text {I}_q\otimes {\varvec{z}}_{0g}^{\prime } & 0_{q\times q} & \cdots & \text {I}_q
\end{array}$$

Because of the heteroscedasticity inherent in mixed models, the probability distributions of test statistics under the null hypothesis are approximated by known distributions (usually F-distributions). The accuracy of these approximations is unknown, and it differs with the setting. Several papers have examined this topic, and they indicate that the Kenward and Roger (1997) approximation provides reasonable accuracy of p values. It is clear, though, that asymptotics are not much comfort here, because \({\varvec{y}}_*\) is a single observation.

An advantage of this approach, in terms of general mixed linear models, is that there is no need to have training data repeated at each sampled value \(t_i\) of t, as was required in Wells and LaMotte (1995) and LaMotte (2014). Further, partial multivariate observations, in which some components of \({\varvec{y}}\) are unobserved, are incorporated naturally within the maximum likelihood algorithm, with no need for special remedies, other than the caveat that the missingness may be related to the factors under study and may therefore affect inferences.

3 Flexible models

The use of polynomial splines to model the mean of a univariate response as a function of t is well known and widely used. The same can be done for multivariate mean vectors. Choosing the degree of the polynomial (e.g., 1 for linear interpolation, 3 for cubic interpolation) and a set of values of t to serve as knots defines predictor variables \(x_1, \ldots ,x_{p_m}\) so that the mean vector \({\varvec{\mu }}(t)\) is modeled as

$$\begin{aligned} {\varvec{\mu }}(t) \,=\, x_1(t){\varvec{\eta }}_1 + \cdots + x_{p_m}(t){\varvec{\eta }}_{p_m}. \end{aligned}$$

For degree d and k knots, the number of spline functions is \(p_m\,=\,d+k+1\). There are multiple ways to formulate these predictor variables to produce polynomial interpolation. The most widely used functions are B-splines; they are the functions used here. They have the property that the values \(x_j(t)\) are non-negative and they sum to 1. This models the mean vector \({\varvec{\mu }}(t)\) as a weighted average of the parameter vectors \({\varvec{\eta }}_1, \ldots , {\varvec{\eta }}_{p_m}\).
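
As a small sketch (ours) of how such a basis might be computed, with the partition-of-unity property checked numerically; SciPy 1.8 or later is assumed for BSpline.design_matrix:

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(t_values, degree, interior_knots, lo, hi):
    """Clamped B-spline basis evaluated at t_values.  With degree d and k
    interior knots there are p = d + k + 1 basis functions; each row of the
    returned matrix is non-negative and sums to 1."""
    knots = np.r_[[lo] * (degree + 1), interior_knots, [hi] * (degree + 1)]
    return BSpline.design_matrix(np.asarray(t_values, dtype=float),
                                 knots, degree).toarray()

# e.g. cubic splines (d = 3) with one interior knot give p_m = 3 + 1 + 1 = 5 columns
B = bspline_basis(np.linspace(0, 10, 41), degree=3, interior_knots=[5.0], lo=0.0, hi=10.0)
assert np.all(B >= 0) and np.allclose(B.sum(axis=1), 1.0)   # partition of unity
```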

Within the context of a general mixed linear models package, the same approach can be taken to model the variance–covariance matrix. For the variance–covariance matrix \(\Sigma (t)\) of a q-variate observation on a single subject at time t, the model takes the form

$$\begin{aligned} \Sigma (t) \,=\, w_1(t)\Gamma _1 + \cdots + w_{p_v}(t)\Gamma _{p_v}, \end{aligned}$$

where \(w_1, \ldots , w_{p_v}\) are defined by B-splines in t specified by degree and knots. The variance–covariance matrix of the vector of all observations is block diagonal with \(\Sigma (t_i)\), \(i\,=\,1, \ldots , n\), on the diagonal. This is \(ZG({\varvec{\theta }})Z^{\prime }\) in the general form, and \(R({\varvec{\theta }})\) is not needed.

As with the means, the \(w_j(t)\)s are non-negative and they sum to 1. This property is particularly useful for modeling variance–covariance matrices, because nonnegative-definite matrices are closed under such convex combinations.
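
A corresponding sketch (ours) of the variance model, with w_basis(t) an assumed helper returning the row of spline weights and the \(\Gamma _j\) supplied as nonnegative-definite matrices:

```python
import numpy as np
from scipy.linalg import block_diag

def sigma_at(t, w_basis, Gammas):
    """Sigma(t) = sum_j w_j(t) * Gamma_j: a convex combination of the
    nonnegative-definite matrices Gamma_j, hence itself nonnegative definite."""
    w = np.asarray(w_basis(t), dtype=float).ravel()   # weights are >= 0 and sum to 1
    return sum(wj * G for wj, G in zip(w, Gammas))

def var_of_stacked_response(t_values, w_basis, Gammas):
    """Block-diagonal variance-covariance matrix of the stacked training
    response, with Sigma(t_i) as the i-th diagonal block."""
    return block_diag(*[sigma_at(t, w_basis, Gammas) for t in t_values])
```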

In the model for the mean vector, the coefficients \({\varvec{\eta }}_1, \ldots , {\varvec{\eta }}_{p_m}\) are unknown parameters, and they are components of \({\varvec{\beta }}\) in the general formulation. Similarly, the symmetric matrices \(\Gamma _1, \ldots , \Gamma _{p_v}\) are unknown parameters; they are components of \({\varvec{\theta }}\).

In concept, the steps in this process leading to confidence sets on \(t_*\) are clear. For the MS observation \({\varvec{y}}_*\), create multiple artificial cases with response \({\varvec{y}}_*\) at each value in a grid \(t_{01}, \ldots , t_{0g}\), and create the gq corresponding dummy variables. Choose polynomial degrees and knots for the models of the mean vector and variance–covariance matrix. Compute \(x_1(t), \ldots , x_{p_m}(t)\) and \(w_1(t), \ldots , w_{p_v}(t)\) for each value of t in the training-data set and the grid. Compose X and Z from these. The next step is the computation of maximum likelihood estimates and approximate p values for the coefficients \({\varvec{\delta }}_{01}, \ldots , {\varvec{\delta }}_{0g}\) of the dummy variables corresponding to the grid on \(t_0\).

By far the best way to accomplish the computations is to use a tested, stable, debugged program like PROC MIXED in SAS, which implements the Kenward–Roger approximation to variances and degrees of freedom, or corresponding programs in other packages. To use such programs requires specifying the particular model with the syntax of the program.

Commonly, programs communicate in terms of fixed effects, random effects, repeated measures, and factor and interaction effects. While the mathematical models, as described above, are clear, they must still be translated carefully into the program's syntax in order to specify exactly the desired model. In most applications, some data handling is necessary to compose X and Z, including the polynomial splines and extra rows and columns for the dummy variables. In order to get the desired model for the variance–covariance matrix, columns of Z are composed as square roots of the B-spline variables, \(z_j(t)\,=\,\sqrt{w_j(t)}\), and declared to represent random-coefficient effects. For a q-component response vector, the model must provide for different sets of coefficients in \({\varvec{\beta }}\) and different random coefficients for each component of the response, and the forms of variance–covariance matrices for the response and for the random coefficients must be specified. The details will differ from package to package.
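
To see why the square-root coding yields the intended covariance model (a short verification of ours, following directly from the formulation): if the random-coefficient vectors \({\varvec{\gamma }}_1, \ldots , {\varvec{\gamma }}_{p_v}\) for a subject are specified as mutually independent with \(\text {Var}({\varvec{\gamma }}_j)\,=\,\Gamma _j\), then the random part of that subject's response at t has variance

$$\begin{aligned} \text {Var}\left( \sum _{j=1}^{p_v} z_j(t)\,{\varvec{\gamma }}_j\right) \,=\, \sum _{j=1}^{p_v} z_j(t)^2\,\Gamma _j \,=\, \sum _{j=1}^{p_v} w_j(t)\,\Gamma _j \,=\, \Sigma (t), \end{aligned}$$

which is exactly the model for \(\Sigma (t)\) given above.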

4 Illustration

Following is an illustration. The response \({\varvec{y}}\) is bivariate. Data were simulated from a ‘true’ model in which the two components of the mean vector and the three components of the variance–covariance matrix changed smoothly with t going from 0 to 10.

The mathematical model from which the observations were generated was configured to resemble characteristics of size measurements (length and weight, for example) found in fly larvae. They start small, with small variances; go through a period of rapid growth in both dimensions, during which variances increase and correlations shift; and, as they approach pupation, they cease feeding, their growth slows, and their size may even decrease. The time scale has been shifted and scaled in this simulation to range between 0 and 10; in growth experiments, the first measurements would typically be taken soon after egg hatch, but not at \(t=0\).

The model for the mean vectors is in terms of cubic B-splines in t with a single interior knot at \(t=5\). This generates \(qp_m=2\times 5=10\) columns in X. The matrix Z has \(qp_v = 2\times 5=10\) columns, in five pairs, each with a pair of random coefficients in the model. Entries in Z are square roots of cubic B-splines with a single interior knot at \(t=5\). The training data set comprised 5 observations each at values 0, 1, 2, 5, 8, and 10 of t. Dummy variables \(d_{01}, \ldots , d_{0g}\) and corresponding artificial cases were created for \(t_0\) in increments of 0.25 between 0 and 10. In Table 1, the two components of \({\varvec{y}}\) are indexed by comp, which takes values 1 and 2 for components 1 and 2. The response \({\varvec{y}}_*\) for the MS was generated from the population at \(t_*=3\).
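
For readers who wish to set up a comparable example, the following is a sketch (ours) of how such training data might be generated. The particular mean and covariance functions below are invented for illustration only; they are not the model used to produce the results reported here.

```python
import numpy as np

rng = np.random.default_rng(2023)

def mean_vec(t):
    """Hypothetical smooth bivariate mean curve (illustration only): rapid
    growth in the mid-range of t, flattening as t approaches 10."""
    s = t / 10.0
    return np.array([1.0 + 8.0 * s**2 * (3 - 2 * s),      # 'length'
                     0.5 + 6.0 * s**3 * (4 - 3 * s)])     # 'weight'

def cov_mat(t):
    """Hypothetical variance-covariance matrix: standard deviations grow
    with t while the correlation shifts downward."""
    s = t / 10.0
    sd = np.array([0.2 + 1.0 * s, 0.1 + 0.8 * s])
    rho = 0.6 - 0.4 * s
    return np.array([[1.0, rho], [rho, 1.0]]) * np.outer(sd, sd)

# five observations at each of t = 0, 1, 2, 5, 8, 10, as in the illustration
t_train, Y_train = [], []
for t in [0, 1, 2, 5, 8, 10]:
    for _ in range(5):
        t_train.append(t)
        Y_train.append(rng.multivariate_normal(mean_vec(t), cov_mat(t)))
Y_train = np.array(Y_train)

# mystery specimen drawn from the population at t_* = 3
y_star = rng.multivariate_normal(mean_vec(3.0), cov_mat(3.0))
```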

Table 1 shows only the tests of the coefficients of the dummy variables from SAS PROC MIXED. Based on the p values in the right-most column, at the 5% level of significance, only \(t_0 = 3\) and 4 are tenable values of \(t_*\).

Figure 1 shows the data used in this example, along with estimates of the means and variance–covariance matrices (corresponding to the ellipses). It illustrates the p values by showing nominal 95% prediction ellipses at selected values of t.

Table 1 p values (Pr > F) for tests of \({\varvec{y}}_*\) as an outlier at each \(t_0=0, \ldots , 10\); results from PROC MIXED using the Kenward–Roger approximation and degrees of freedom

Fig. 1 Training data comprise 5 observations on bivariate \({\varvec{y}}\) at each of \(t\,=\,0, 1, 2, 5, 8, 10\), indicated by plotted numerals. The curved line is the locus of the estimated means. Hash marks indicate t between 0 and 10 in increments of 0.25. Ellipses depict 95% prediction regions at \(t\,=\,0, 2.25, 4.5,\) and 10, centered at the small circles. The MS is indicated by the large \(\oplus \). The 95% confidence set on its \(t_*\) extends from about \(t\,=\,2\) to about \(t\,=\,4.25\)