Abstract
In this paper we present a novel method for testing growth curves when the analysis is based on spline functions. The method rests on a spline approximation, for which an exact F-test is developed. The method also applies under certain types of correlation structure that are especially important in the analysis of repeated measures and growth data. We tested the method on the glucose data of Zerbe (J Am Stat Assoc 74:215–221, 1979) and also investigated it by simulation experiments. The new method proved to be a very powerful modeling and testing tool, especially in situations where the growth curve may not be easy to approximate using simple parametric models.
3.1 Introduction
Longitudinal research has an important role in various fields of science, for example in medicine, economics, social sciences, and engineering. The aim is to analyze the change caused, e.g., by growth, degradation, maturation, and ageing when individuals are followed over time or according to some other ordered sequence of measurements. In this paper the focus is on complete and balanced data. One of the most important statistical models for these data is the growth curve model of Potthoff and Roy (1964). The early development of this model was mainly based on the unstructured MANOVA assumption of the covariance matrix of independent random vectors (e.g., Khatri 1966 and Grizzle and Allen 1969). Later, however, more attention has been paid to modeling the covariance matrix by using parsimonious covariance structures (see, e.g., Azzalini 1987, Lee 1988 and Nummi 1997). For excellent reviews of the growth curve model we refer to the books by Kshirsagar and Smith (1995) and Pan and Fang (2002).
Our approach is to use cubic smoothing splines to model the mean growth curve. As is very well known, cubic smoothing splines are very flexible curves with interesting mathematical properties (see, e.g., Green and Silverman 1994). For an up-to-date summary of recent methods of smoothing splines and nonparametric regression we refer to Wu and Zhang (2006). Approximate inference with smoothing splines has been studied, e.g., in Eubank and Spiegelman (1990), Schimek (2000), and Cantoni and Hastie (2002). In their simulation study Liu and Wang (2004) compared six test statistics. Nummi et al. (2011) provided a test of a regression model against a spline alternative for correlated data. The main focus in these studies has been on testing the order of the polynomial model against a spline alternative. However, testing whether two or more splines are equal would be very important in many applications. Nummi and Koskela (2008) introduced some results for the estimation and rough testing of growth curves when the analysis is based on spline functions. However, very little research on testing the equality of smoothing splines, especially for correlated data, has been carried out so far. In this paper we focus on testing whether the progression in time is equal over the set of correlated observations.
In Sect. 3.2 we introduce the basic growth model and its estimation using cubic smoothing splines. In Sect. 3.3 a spline approximation is introduced and a test for mean curves is developed. In Sect. 3.4 a computational example with the glucose data is presented and the method is also investigated by simulation experiments.
3.2 Basic Spline Growth Model and Some Properties
One of the most important statistical models for balanced complete multivariate repeated measures data is the GMANOVA (Generalized Multivariate Analysis of Variance) model of Potthoff and Roy (1964). The model is often also referred to as the growth curve model. This model can be written as
where \(\mathbf{Y} = (\mathbf{y}_{1},\mathbf{y}_{2},\ldots,\mathbf{y}_{n})\) is the matrix of independent response vectors, T is a q ×p within-individual design matrix, A is an n ×m between-individual design matrix, B is an unknown p ×m parameter matrix to be estimated, and E is a q ×n matrix of random errors. It is assumed that the columns \(\mathbf{e}_{1},\ldots,\mathbf{e}_{n}\) of E are independently normally distributed as \(\mathbf{e}_{i} \sim N(\mathbf{0},\boldsymbol{\Sigma }),\ i = 1,\ldots,n.\) In the original model formulation \(\boldsymbol{\Sigma }\) was assumed to be an unstructured covariance matrix and the analyses were mainly based on the methods developed for linear models and multivariate analysis.
Often when analyzing growth data the true growth function is more or less unknown and there may not be any theoretical justification for any specific parametric form of the curve. Parametric models are then used for descriptive rather than interpretative purposes, to summarize the information in the development profile. A natural first choice in such situations is a low-order polynomial curve. However, in many cases these models may fail to reveal important features of the growth process, and more complicated models are therefore also needed.
As discussed in the introduction, our approach is to use cubic smoothing splines to model the mean growth curve. We can write the model (3.1) in a slightly more general form as (see also Nummi and Koskela 2008)
where \(\mathbf{G} = (\mathbf{g}_{1},\ldots,\mathbf{g}_{m})\) is the matrix of smooth mean growth curves at time points \(t_{1},t_{2},\ldots,t_{q}\). We assume that the covariance matrix \(\boldsymbol{\Sigma }\) takes a certain type of parsimonious structure \(\boldsymbol{\Sigma } {=\sigma }^{2}\mathbf{R}(\theta )\) with covariance parameters \(\theta\). In the sequel we refer to this model as the spline growth model (SGM). The growth curve model of Potthoff and Roy (1964) is now the special case G = T B. The smooth solution for G can be obtained by minimizing the penalized least squares (PLS) criterion
where we denote \(\dot{\mathbf{G}} = \mathbf{G}\mathbf{A}^{\prime}\) and \(\mathbf{H} ={ \mathbf{R}}^{-1}\), K is the so-called roughness matrix arising from the common roughness penalty \(RP =\int g^{\prime\prime}{(t)}^{2}\,dt\), and α is a fixed smoothing parameter. For cubic smoothing splines the roughness matrix is
where the nonzero elements of banded q ×(q − 2) and \((q - 2) \times (q - 2)\) matrices ∇ and \(\boldsymbol{\Delta }\), respectively, are
and
where \(h_{j} = t_{j+1} - t_{j},\ j = 1,2,\ldots,(q - 1)\), and \(k = 1,2,\ldots,(q - 2)\). It can be shown that Q can be rewritten in the alternative form
where c is a constant and (H + α K) is a positive definite matrix. The function Q is minimized for given α and H when \(\dot{\mathbf{G}} = {(\mathbf{H} +\alpha \mathbf{K})}^{-1}\mathbf{H}\mathbf{Y}.\) This gives the spline estimator
However, the covariance matrix H may not be known and therefore the estimator (3.8) may be difficult to use in practical situations. Fortunately, it can be shown that in certain important special cases the general spline estimator (3.8) simplifies to a simple linear function of the original observations Y. One obvious condition for such a simplification is
and since now K = K H, the spline estimator \(\tilde{\mathbf{G}}\) can be simplified as
where the smoother matrix is \(\mathbf{S} = {(\mathbf{I} +\alpha \mathbf{K})}^{-1}\). Covariance matrices satisfying the condition (3.9) have been studied in Nummi and Koskela (2008) and Nummi et al. (2011). Some important special cases of these structures useful for growth data are R = I, \(\mathbf{R} = \mathbf{I} +\sigma _{ d}^{2}\mathbf{1}\mathbf{1}^{\prime}\), \(\mathbf{R} = \mathbf{I} +\sigma _{ d^{\prime}}^{2}\mathbf{X}\mathbf{X}^{\prime}\) and \(\mathbf{R} = \mathbf{I} + \mathbf{X}\mathbf{D}\mathbf{X}^{\prime}\), where X = (1, x) and x is a vector of q measuring times.
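The banded construction of K and the simplified estimator above can be sketched in a few lines of numpy. The function names are ours, and the band entries follow the standard cubic-spline construction of Green and Silverman (1994):

```python
import numpy as np

def roughness_matrix(t):
    """Cubic smoothing spline roughness matrix K = nabla delta^{-1} nabla'
    built from the banded matrices of Green and Silverman (1994)."""
    q = len(t)
    h = np.diff(t)                     # h_j = t_{j+1} - t_j
    nabla = np.zeros((q, q - 2))       # q x (q-2) band matrix
    delta = np.zeros((q - 2, q - 2))   # (q-2) x (q-2) band matrix
    for k in range(q - 2):
        nabla[k, k] = 1.0 / h[k]
        nabla[k + 1, k] = -1.0 / h[k] - 1.0 / h[k + 1]
        nabla[k + 2, k] = 1.0 / h[k + 1]
        delta[k, k] = (h[k] + h[k + 1]) / 3.0
        if k < q - 3:
            delta[k, k + 1] = delta[k + 1, k] = h[k + 1] / 6.0
    return nabla @ np.linalg.solve(delta, nabla.T)

def spline_fit(Y, A, t, alpha):
    """Group mean curves: S Y A (A'A)^{-1} with smoother S = (I + alpha K)^{-1}."""
    q = Y.shape[0]
    S = np.linalg.inv(np.eye(q) + alpha * roughness_matrix(t))
    return S @ Y @ A @ np.linalg.inv(A.T @ A)
```

Note that K annihilates constant and linear trends (K1 = 0 and Kt = 0), which is why the spline fit shrinks toward a straight line as α grows.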
If we apply the result \(\mbox{ vec}(\mathbf{A}\mathbf{B}\mathbf{C}) = (\mathbf{C}^{\prime} \otimes \mathbf{A})\mbox{ vec}(\mathbf{B})\), where the vec operation rearranges the columns of a matrix underneath each other, we can write the basic model (3.2) in a vector form
where y = vec(Y) and g = vec(G). If the spline estimates are written in vector form we have
and the smoother of the whole data is
where we denote \(\mathbf{P}_{a} = \mathbf{A}{(\mathbf{A}^{\prime}\mathbf{A})}^{-1}\mathbf{A}^{\prime}\) and \(\mathbf{S}_{{\ast}} = (\mathbf{P}_{a} \otimes \mathbf{S})\). The effective degrees of freedom of the smoother can now be given as
where edf = tr(S) is the effective degrees of freedom of the smoother S. It is further easy to see that the generalized cross-validation criterion for choosing the smoothing parameter α takes the form
where y i and \(\hat{y}_{i}\) are individual elements of the observed and smoothed vectors y and \(\hat{\mathbf{y}}\), respectively.
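A minimal sketch of this criterion, with our own function name and taking a precomputed smoother S as input, uses the fact that the fitted values are Ŷ = S Y P_a and that edf\(_*\) = tr(P_a ⊗ S) = tr(P_a) tr(S); the exact normalization of the printed formula may differ slightly:

```python
import numpy as np

def gcv_score(Y, A, S):
    """GCV score for a given smoother S = (I + alpha*K)^{-1} (a sketch):
    Yhat = S Y P_a, edf_* = tr(P_a) tr(S) = m tr(S),
    score = (RSS / N) / (1 - edf_*/N)^2 with N = n*q observations."""
    q, n = Y.shape
    N = n * q
    Pa = A @ np.linalg.inv(A.T @ A) @ A.T   # projection onto columns of A
    Yhat = S @ Y @ Pa                       # matrix form of (P_a x S) vec(Y)
    edf_star = np.trace(Pa) * np.trace(S)
    rss = np.sum((Y - Yhat) ** 2)
    return (rss / N) / (1.0 - edf_star / N) ** 2
```

In practice one evaluates `gcv_score` over a grid of α values (with S recomputed for each α) and picks the minimizer, exactly as done for the glucose data in Sect. 3.4.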
3.3 Testing of Mean Curves
It is very well known that exact tests may be difficult to develop when making statistical inference based on smoothing splines. Our interest here focuses on testing whether the progression in time is the same in the treatment groups considered. In this study an exact test for growth curves, based on a spline approximation, is developed.
3.3.1 Spline Approximation
It has been demonstrated by Nummi et al. (2011) that the approximation discussed in this paper is quite good for relatively smooth data. More detailed consideration of spline approximations can be found, e.g., in Hastie (1996). In the general case the smoother matrix S is not a projection matrix, and therefore certain results developed for general linear models, e.g. in testing, are not directly applicable. Our approach is to utilize an approximation of the smoother matrix S that has the properties of a projection matrix. As discussed by Hastie (1996) the smoother matrix can be written as
where M is the matrix of q orthogonal eigenvectors of K and \(\Lambda \) is a diagonal matrix of the corresponding q eigenvalues. It is easily seen that K and S share the same set of eigenvectors \(\mathbf{m}_{1},\mathbf{m}_{2},\ldots,\mathbf{m}_{q}\) and the eigenvalues are connected such that the eigenvalues of S are \(\gamma _{j} = 1/(1 +\alpha \lambda _{j})\). In the sequel we assume that the eigenvectors \(\mathbf{m}_{1},\mathbf{m}_{2},\ldots,\mathbf{m}_{q}\) are ordered according to the eigenvalues of S. It is well known that the sequence of eigenvectors appears to increase in complexity like a sequence of orthogonal polynomials. The first two eigenvalues of S are always 1. We can set \(\mathbf{m}_{1} = \mathbf{1}/\sqrt{q}\) and \(\mathbf{m}_{2} = \mathbf{t}_{{\ast}},\) where \(\mathbf{t}_{{\ast}} = (\mathbf{t} -\bar{ t}\mathbf{1})/S_{t},\) \(\bar{t}\) is the mean and \(S_{t} = \sqrt{\sum _{i=1 }^{q }{(t_{i } -\bar{ t})}^{2}}\) is the square root of the sum of squared deviations of the time points \(t_{1},\ldots,t_{q}\). Therefore the first two eigenvectors m 1 and m 2 span the subspace corresponding to the straight-line model. In the mixed model formulation of the spline solution (e.g. Verbyla et al. 1999) this corresponds to the fixed part of the model. It is also easily observed that as the value of the smoothing parameter α increases the fit approaches the straight-line model, and the fitted line (fixed part) is not influenced by any specific choice of α.
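These spectral facts are easy to verify numerically. The sketch below uses a simple second-difference penalty as a stand-in for K on equally spaced time points (it shares the two-dimensional null space of constant and linear trends); the variable names are ours:

```python
import numpy as np

q, alpha = 8, 1.5
t = np.arange(q, dtype=float)

# Second-difference penalty: annihilates constants and straight lines,
# a stand-in for the cubic-spline roughness matrix K on equispaced points.
D2 = np.diff(np.eye(q), n=2, axis=0)          # (q-2) x q band matrix
K = D2.T @ D2

lam, M = np.linalg.eigh(K)                    # K = M Lambda M'
gamma = 1.0 / (1.0 + alpha * lam)             # eigenvalues of S = (I + alpha K)^{-1}
S = np.linalg.inv(np.eye(q) + alpha * K)

order = np.argsort(-gamma)                    # order eigenvectors by eigenvalues of S
gamma, M = gamma[order], M[:, order]
```

The two leading eigenvalues γ₁ = γ₂ = 1 correspond to the straight-line subspace, which therefore passes through the smoother untouched for every choice of α.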
Clearly, one obvious approximation of the spline fit (3.10) is the spline model
where \(\mathbf{P}_{m} = \mathbf{M}_{{\ast}}\mathbf{M}_{{\ast}}^{\prime}\) and M ∗ contains the first c( ≤ q) eigenvectors in M. This corresponds to minimizing the least squares (LS) criterion
where \(\dot{\mathbf{G}} = \mathbf{G}\mathbf{A}^{\prime}\). Note that the smoother matrix S and the smoothing parameter need not be computed here. However, the number of eigenvectors c from K used in the approximation needs to be estimated. This is easily done by, for example, using a modified generalized cross-validation criterion
where \(\hat{y}_{i}\) is now computed using the formula (3.11) with S replaced by P m .
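A sketch of the projection fit and of the modified criterion follows (function names ours; M holds the eigenvectors of K ordered as above, and the effective degrees of freedom of the projection smoother P_a ⊗ P_m equal m·c):

```python
import numpy as np

def fit_approx(Y, A, M_star):
    """Approximate spline fit: P_m Y A (A'A)^{-1} with P_m = M_* M_*'."""
    return M_star @ (M_star.T @ Y) @ A @ np.linalg.inv(A.T @ A)

def choose_c(Y, A, M, c_grid):
    """Pick the number of eigenvectors c by a modified GCV score."""
    q, n = Y.shape
    N = n * q
    Pa = A @ np.linalg.inv(A.T @ A) @ A.T
    best_c, best_score = None, np.inf
    for c in c_grid:
        Ms = M[:, :c]
        Yhat = (Ms @ Ms.T) @ Y @ Pa        # projection smoother P_a x P_m
        edf = np.trace(Pa) * c             # tr(P_a x P_m) = m c
        score = (np.sum((Y - Yhat) ** 2) / N) / (1.0 - edf / N) ** 2
        if score < best_score:
            best_c, best_score = c, score
    return best_c
```

Since each candidate fit is a plain projection, scanning a grid of c values is cheap; no smoothing parameter enters the search.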
3.3.2 Constructing a Test for Mean Spline Curves
First, consider the set of fitted spline curves
As discussed in the previous section we may use the approximation
where we denote \(\hat{\Omega } = \mathbf{M}_{{\ast}}^{\prime}\mathbf{Y}\mathbf{A}{(\mathbf{A}^{\prime}\mathbf{A})}^{-1}\). All the relevant information for testing mean profiles is now in the matrix \(\hat{\Omega }\), which can be considered an unbiased estimate of the unknown parameter matrix of the statistical model \(E(\mathbf{Y}) = \mathbf{M}_{{\ast}}\Omega \mathbf{A}^{\prime}\). Therefore in the sequel we confine ourselves to testing linear hypotheses of the form
where C and D are known ν ×c and m ×g matrices with ranks ν and g, respectively. Since \(\mbox{ vec}(\mathbf{C}\Omega \mathbf{D}) = (\mathbf{D}^{\prime} \otimes \mathbf{C})\mbox{ vec}(\Omega )\), the vector form of H 0 is given by
where \(\omega = \mbox{ vec}(\Omega )\). If we take the vector form of \(\hat{\Omega }\), we get
It is now easily seen that the covariance matrix of \(\hat{\omega }\) is
If we denote \(\mathit{Var}[(\mathbf{D}^{\prime} \otimes \mathbf{C})\hat{\omega }] = \mathbf{W}\), it is then obvious that under the null hypothesis
and
By using the result \(\mbox{ tr}(\mathbf{A}\mathbf{Z}^{\prime}\mathbf{B}\mathbf{Z}\mathbf{C}) = (\mbox{ vec}\ \mathbf{Z})^{\prime}(\mathbf{C}\mathbf{A} \otimes \mathbf{B})\mbox{ vec}\ \mathbf{Z}\), it is further easy to see that Q ∗ can be rewritten as
If σ 2 is estimated by
it can be shown that \(n(q - c){\hat{\sigma }}^{2}{/\sigma }^{2} \sim \chi _{n(q-c)}^{2}\) and, since Q ∗ and \({\hat{\sigma }}^{2}\) are independent, testing can be based on the F-ratio. Then under the null hypothesis
Testing can then be based on the quantiles of the F-distribution. In practical situations, however, the matrix R contains unknown parameters that need to be estimated, and therefore the distribution of F is in general only approximate. However, if we are only interested in progression in time we can drop the first eigenvector m 1 corresponding to the constant term in the approximation model (see Sect. 3.3.1). Therefore we can take C = [0, I], and if we assume the uniform covariance model \(\mathbf{R} = {d}^{2}\mathbf{1}\mathbf{1}^{\prime} + \mathbf{I}\), it can be shown that
where \(\mathbf{e}_{1} = (1,0,\ldots,0)^{\prime}\). Therefore the term Q ∗ simplifies to
which does not contain unknown parameters of the covariance matrix, and therefore for this special case the distribution of the F-statistic is exact. This is an important result since the uniform covariance model is quite common and a good approximation in many situations. The F-test proposed here provides a means to test whether the progression in time is the same over treatment groups when the models are based on spline curves. Following the same kind of considerations it would be easy to develop an exact F-statistic to test whether the progression around the fitted straight line (the so-called random part in the mixed model formulation) is the same over treatment groups under the more general assumption of a linear correlation structure \(\mathbf{R} = \mathbf{X}\mathbf{D}\mathbf{X}^{\prime} + \mathbf{I}\).
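Under the assumptions of this special case — M ∗ orthonormal with first column proportional to 1, C = [0, I], and the uniform covariance model — the whole procedure fits in a short function. This is a sketch with our own names, returning the statistic and its degrees of freedom for comparison with F-quantiles:

```python
import numpy as np

def spline_f_test(Y, A, M_star, C, D):
    """F-test of H0: C Omega D = 0 in the exact special case of Sect. 3.3.2:
    under R = I + d^2 11' with C = [0, I] we have C M_*' R M_* C' = I, so
    Q_* is free of covariance parameters.  Assumes the columns of M_star
    are orthonormal, the first being proportional to the vector of ones."""
    q, n = Y.shape
    c = M_star.shape[1]
    AtA_inv = np.linalg.inv(A.T @ A)
    Omega_hat = M_star.T @ Y @ A @ AtA_inv               # coefficient estimate
    COD = C @ Omega_hat @ D
    Q_star = np.trace(COD @ np.linalg.inv(D.T @ AtA_inv @ D) @ COD.T)
    resid = Y - M_star @ (M_star.T @ Y)                  # residuals of the fit
    df1 = C.shape[0] * D.shape[1]                        # nu * g
    df2 = n * (q - c)
    sigma2 = np.sum(resid ** 2) / df2                    # residual variance
    return (Q_star / df1) / sigma2, df1, df2
```

The returned value is compared with the quantile \(F_{1-\alpha }(\nu g,\ n(q - c))\), as in the glucose example of Sect. 3.4.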
3.4 Computational Examples
3.4.1 Standard Glucose Tolerance Test
As the first computational example we consider the glucose data of Zerbe (1979). In these data glucose tolerance tests were administered to 13 control and 20 obese patients. Plasma inorganic phosphate measurements were determined from blood samples drawn 0, 0.5, 1, 1.5, 2, 3, 4, and 5 h after a standard oral glucose dose. The curves for the control and obese patients are plotted in Fig. 3.1, where two features are quite obvious. First, there is considerable variation in the patients' individual levels. Secondly, the functional form of the dependency between plasma inorganic phosphate and time is quite complicated and possibly different for control and obese patients. In Zerbe (1979) a polynomial of degree 4 was used to model this relationship.
To set up the spline growth model the between-individual design matrix A was first defined. For the 13 control patients the rows of A are \((1,0),\ i = 1,\ldots,13\), and for the 20 obese patients the rows of A are \((0,1),\ i = 14,\ldots,33\). The minimum value GCV = 0.4484152 is obtained at α = 0.09410597. This gives the total effective degrees of freedom edf ∗  = 9.310273. The fitted curves are plotted in Fig. 3.1. It can be observed that the fitted spline curves very nicely depict the mean performance of measurements in both groups.
To test if the progression in time is the same in both groups we first determined the dimension c needed in the spline approximation. Minimizing the modified generalized cross-validation criterion gives c = 5. To test the null hypothesis we took
Next we calculated the estimate \(\mathbf{C}\hat{\Omega }\mathbf{D}\). This yields
and the residual variance estimate for this setup is \({\hat{\sigma }}^{2} = 0.09408348\). For the covariance matrix R we assumed the uniform correlation model, and therefore the exact version of the test statistic can be used. Then the value of Q ∗ is given as
and the value of the test statistic is then
If this is compared to the critical value \(F_{0.95}(4,99) = 2.447\), the null hypothesis of equal progression in mean plasma inorganic phosphate for control and obese patients is clearly rejected.
3.4.2 A Simulation Study
In order to demonstrate the advantages of the presented methodology we conducted a simulation study. In this study two models were tested
with \(t = 1,\ldots,10\) and independent random errors ε i ∼ N(0, 1). The coefficient a takes the values 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, and 0.7. The first group of growth curves consists of 100 random vectors generated from model (3.27) and the second of 100 random vectors generated from model (3.28). So, for each value of a these two sets of growth curves were generated. The mean growth curves are then tested against the null hypothesis that the progression in time is the same in both groups. Two methods were compared: the spline test presented in this paper and the basic parametric least squares fit of a third-degree polynomial model. The power was estimated at the significance level 0.05 by counting the percentage of rejections in 1,000 repetitions.
The results are shown in Fig. 3.2. Clearly, the spline test presented in this paper performed better than the test based on the least squares fit of the third degree polynomial. This is obviously due to the fact that the fit provided by the splines better depicts the peculiarities of the unknown growth function.
3.5 Concluding Remarks
Traditional analyses of growth curves are often based on simple parametric curves, which may not satisfactorily depict all the features of the growth process during the period of study. The method presented in this paper is based on cubic smoothing splines, which provide a very flexible modeling tool for the analysis. However, very little research on the statistical inference (especially testing) of cubic smoothing splines for correlated data has been carried out. The novel test presented in this paper seems to provide a good alternative, especially when more accurate modeling of the growth process is required.
References
Azzalini, A. (1987). Growth curves analysis for patterned covariance matrices. In M. Puri, J. P. Vilaplana, & W. Wertz (Eds.), New perspectives in theoretical and applied statistics (pp. 63–70). New York: Wiley.
Cantoni, E., & Hastie, T. (2002). Degrees-of-freedom tests for smoothing splines. Biometrika, 89(2), 251–263.
Eubank, R.L., & Spiegelman, C.H. (1990). Testing the goodness of fit of a linear model via nonparametric regression techniques. Journal of the American Statistical Association, 85(410), 387–392.
Green, P.J., & Silverman, B.W. (1994). Nonparametric regression and generalized linear models. London: Chapman and Hall.
Grizzle, J.E., & Allen, D.M. (1969). Analysis of growth and dose response curves. Biometrics, 25, 357–381.
Hastie, T. (1996). Pseudosplines. Journal of the Royal Statistical Society, Series B, 58, 379–396.
Khatri, C.G. (1966). A note on a MANOVA model applied to problems in growth curves. Annals of the Institute of Statistical Mathematics, 18, 75–86.
Kshirsagar, A.M., & Smith, W.B. (1995). Growth curves. New York: Marcel Dekker.
Lee, J.C. (1988). Tests and model solution for general growth curve model. Biometrics, 47, 147–159.
Liu, A., & Wang, Y. (2004). Hypothesis testing in smoothing spline models. Journal of Statistical Computation and Simulation, 74, 581–597.
Nummi, T. (1997). Estimation in random effects growth curve model. Journal of Applied Statistics, 24(2), 157–168.
Nummi, T., & Koskela, L. (2008). Analysis of growth curve data using cubic smoothing splines. Journal of Applied Statistics, 35, 1–11.
Nummi, T., Pan, J., Siren, T., & Liu, K. (2011). Testing for cubic smoothing splines under dependent data. Biometrics, 67(3), 871–875. DOI: 10.1111/j.1541-0420.2010.01537.x.
Pan, J., & Fang, K. (2002). Growth curve models and statistical diagnostics, Springer series in Statistics. New York: Springer.
Potthoff, R.F., & Roy, S.N. (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 51, 313–326.
Schimek, M.G. (2000). Estimation and Inference in partially linear models with smoothing splines. Journal of Statistical Planning and Inference, 91, 525–540.
Verbyla, A.P., Cullis, B.R., Kenward, M.G., & Welham, S.J. (1999). The analysis of designed experiments and longitudinal data by using smoothing splines (with discussion). Journal of the Royal Statistical Society, Series C, 48, 269–311.
Wu, L., & Zhang, J.T. (2006). Nonparametric regression methods for longitudinal data analysis. New Jersey: Wiley.
Zerbe, G.O. (1979). Randomization analysis of the completely randomized design extended to growth and response curves. Journal of the American Statistical Association, 74, 215–221.
© 2013 Springer Science+Business Media New York
Nummi, T., Mesue, N. (2013). Testing of Growth Curves with Cubic Smoothing Splines. In: Dasgupta, R. (eds) Advances in Growth Curve Models. Springer Proceedings in Mathematics & Statistics, vol 46. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6862-2_3