Subgroup Analysis with Partial Linear Regression Model

Zhou, Yizhao; Yuan, Ao; Tan, Ming T.

doi:10.1007/978-3-030-40105-4_11


Part of the book series: Emerging Topics in Statistics and Biostatistics (ETSB)


Abstract

In clinical trials, a treatment commonly has different effects on different subjects. This motivates precision medicine, whose goal is to identify treatment-favorable and treatment-unfavorable subgroups, if they exist, and to classify subjects into one of the subgroups based on their covariate values. In practice, some covariates are known to affect the response non-linearly, in which case the existing linear model is not adequate. To address this issue, we use a partial linear model in which the effect of a specific covariate is a non-linear monotone function, along with a linear part for the rest of the covariates. This approach makes the model more flexible than the parametric linear model, and more interpretable and efficient than the fully nonparametric model. The Wald statistic is used to test for the existence of subgroups, and the Neyman-Pearson rule is used to classify the subjects. Simulation studies are conducted to evaluate the performance of the method, which is then used to analyze data from a real clinical trial.


1 Introduction

In clinical studies, the treatment effect is often not uniform across patients: some subgroup of patients may benefit significantly from the treatment while others may not. Thus one goal of precision medicine is to determine whether such subgroups exist and, if so, to identify them according to the patients' covariate values. For example, in IBCSG (2002), patients with ER-negative tumors were likely to benefit from chemotherapy, while those with ER-positive tumors were not.

Subgroup analysis has recently become a very active research area; see, e.g., Sabine (2005), Song and Chi (2007), Ruberg et al. (2010), Foster et al. (2011), Lipkovich et al. (2011), Friede et al. (2012), Shen and He (2015), Fan et al. (2017), and Ma and Huang (2017). Rothmann et al. (2012) discussed issues in subgroup testing and analysis. Fokkema et al. (2018) used the generalized linear mixed-effects model tree (GLMM tree) algorithm to detect treatment-subgroup interactions in clustered datasets. Yuan et al. (2018, 2020) proposed semiparametric methods for this problem.

Existing methods for this problem often use a linear model. In practice, it is sometimes known that a covariate has a non-linear effect on the response, and incorporating this information can improve the quality of the analysis. Here we consider this case and apply a more flexible partial linear model to test for the existence of subgroups and, if their existence is confirmed, to classify the subjects into the subgroups. The model assumes a monotone non-linear effect of one covariate and linear effects of the remaining covariates. First, a partial linear model is formulated with individual subgroup membership as a latent variable and with a covariate whose effect is known to be non-linear. The regression parameters are estimated with the expectation-maximization (E-M) algorithm, and isotonic regression is used to compute the maximum likelihood estimate of the nonparametric non-linear part. Then the null hypothesis that no subgroups exist is tested with the Wald statistic. If the existence of subgroups is confirmed, we use the Neyman-Pearson rule to classify each subject, so that the misclassification error for the treatment-favorable subgroup is controlled while the misclassification error for the other subgroup is minimized.

The rest of the chapter is organized as follows. Section 11.2 describes the model and parameter estimation, Sect. 11.3 presents the testing and classification methods, and Sect. 11.4 reports the simulation study and the real data analysis.

2 The Method

The observed data are denoted by $D_n = \{(y_i, \boldsymbol{x}_i, z_i): i = 1, \ldots, n\}$, where $y_i \in R$ is the response of the i-th subject, $\boldsymbol{x}_i = (x_{i1}, \ldots, x_{id})' \in R^d$ is a vector of covariates, and $z_i$ is another covariate that is known to have a non-linear monotone effect on the response. All subjects receive the same treatment, and we assume that a larger value of the response corresponds to a better treatment effect. We want to test whether there are treatment-favorable and treatment-unfavorable subgroups among the patients. If subgroups exist, we need to classify each subject into the corresponding subgroup based on his/her covariate profile. In this chapter, we assume that there are only two potential subgroups: a treatment-favorable and a treatment-unfavorable subgroup. We first specify the model and estimate its parameters, and then perform the hypothesis test and the classification of subjects.

2.1 The Semiparametric Model Specification

We specify the semiparametric partial linear model as

$$ \begin{aligned}y_i = \boldsymbol{\beta}'\boldsymbol{x}_i+g(z_i)+\delta_i\eta+\epsilon_i, \qquad \epsilon_i\sim N(0,1),\qquad g\in \mathcal{G},\end{aligned}$$

where $\delta_i$ is a latent indicator of whether subject i belongs to the treatment-favorable subgroup ($\delta_i = 1$) or not ($\delta_i = 0$), $\boldsymbol{\beta}$ is a d-vector of unknown parameters, and $\eta$ is the effect of the treatment-favorable subgroup. The constraint $\eta \geq 0$ is imposed so that $\eta$ is identifiable separately from the intercept term in $\boldsymbol{\beta}$. The covariate $z_i$ is assumed to have a non-linear effect $g(\cdot)$ on the response $y_i$; we only know that $g(\cdot )\in \mathcal {G}$, the collection of all monotone increasing functions on R.

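Denote a generic i.i.d. copy of the $(y_i, \boldsymbol{x}_i, z_i, \delta_i, \epsilon_i)$'s by $(y, \boldsymbol{x}, z, \delta, \epsilon)$. Let $\lambda = P(\delta = 1)$ and let $\boldsymbol{\theta} = (\boldsymbol{\beta}', \eta, \lambda)'$ be the vector of all the Euclidean parameters. Conditional on $(\boldsymbol{x}, z)$, the density of y is the mixture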

$$ \begin{aligned}h(y|\boldsymbol{x},z,\boldsymbol{\theta}) = \lambda \phi\Big(y-\boldsymbol{\beta}'\boldsymbol{x}-g(z)-\eta\Big)+(1-\lambda)\phi\Big(y-\boldsymbol{\beta}'\boldsymbol{x}-g(z)\Big),\end{aligned} $$

where ϕ(⋅) is the density function of the standard normal distribution. The log-likelihood of the observed data is

$$\displaystyle \begin{aligned}\ell(\boldsymbol{\theta},g|D_n) =& \sum_{i=1}^n \log \Big(\lambda \phi(y_i-\boldsymbol{\beta}'\boldsymbol{x}_i-g(z_i)-\eta)+(1-\lambda)\phi(y_i-\boldsymbol{\beta}'\boldsymbol{x}_i-g(z_i))\Big), \\ &\boldsymbol{\theta}\in\boldsymbol{\Theta},~~g\in \mathcal{G}. \end{aligned} $$

(11.1)
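
As a minimal illustration of the observed-data log-likelihood (11.1), the following Python sketch evaluates the mixture density and sums its logarithm over the sample. It is not the authors' code: the function names, and the convention of passing the values $g(z_i)$ as a plain list `gzs`, are assumptions made for this example.

```python
import math


def normal_pdf(u):
    """Standard normal density phi(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def mixture_density(y, x, gz, beta, eta, lam):
    """Mixture density h(y | x, z, theta), with g(z) supplied as the number gz."""
    resid = y - sum(b * xj for b, xj in zip(beta, x)) - gz
    return lam * normal_pdf(resid - eta) + (1.0 - lam) * normal_pdf(resid)


def log_likelihood(ys, xs, gzs, beta, eta, lam):
    """Observed-data log-likelihood: sum of log mixture densities."""
    return sum(
        math.log(mixture_density(y, x, gz, beta, eta, lam))
        for y, x, gz in zip(ys, xs, gzs)
    )
```

When `eta` is 0 the two mixture components coincide, so the density no longer depends on `lam`; this is the non-identifiability of λ under the null hypothesis that is discussed later in the chapter.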

Direct computation of the maximum likelihood estimate (MLE) from the mixture log-likelihood (11.1) is not convenient, especially in the presence of the nonparametric component g(⋅), whereas the E-M algorithm (Dempster et al. 1977) is typically easy to use for such models. For this, we treat the latent variables $\delta_i$ as missing data, with $\delta_i = 1$ if the i-th subject belongs to the treatment-favorable subgroup and $\delta_i = 0$ otherwise. The likelihood based on the 'complete data' $D^c_n=\{(y_i,\boldsymbol {x}_i,z_i,\delta _i): i=1,\ldots ,n\}$ is

$$ \begin{aligned}L(\boldsymbol{\theta},g|D^c_n)=\prod_{i=1}^n \bigg(\lambda \phi(y_i-\boldsymbol{\beta}'\boldsymbol{x}_i-g(z_i)-\eta)\bigg)^{\delta_i} \bigg((1-\lambda) \phi(y_i-\boldsymbol{\beta}'\boldsymbol{x}_i-g(z_i))\bigg)^{1-\delta_i},\end{aligned}$$

the corresponding log-likelihood is

$$\displaystyle \begin{aligned}\ell (\boldsymbol{\theta},g|D^c_n)=\sum_{i=1}^n \Big(&\delta_i\log \phi(y_i-\boldsymbol{\beta}'\boldsymbol{x}_i-g(z_i)-\eta) + (1-\delta_i)\log \phi(y_i-\boldsymbol{\beta}'\boldsymbol{x}_i-g(z_i)) \\ &+\delta_i\log\lambda +(1-\delta_i)\log(1-\lambda)\Big).\end{aligned} $$

(11.2)

The semiparametric MLE $(\hat {\boldsymbol {\theta }}_n,\hat {g}_n)$ of the true parameter $(\boldsymbol{\theta}_0, g_0)$ is given by

$$\displaystyle \begin{aligned}(\hat{\boldsymbol{\theta}}_n,\hat{g}_n) = \arg\max_{(\theta,g)\in (\Theta,\mathcal{G})} \ell(\boldsymbol{\theta},g|D^c_n). {}\end{aligned} $$

(11.3)

2.2 Estimation of Model Parameters

As the $\delta_i$'s are missing, $(\hat {\boldsymbol {\theta }}_n,\hat {g}_n)$ in (11.3) cannot be computed directly, so the EM algorithm is used instead. Given a starting value $\boldsymbol{\theta}^{(0)}$ of $\boldsymbol{\theta}$, we find $g^{(1)}(\cdot ) \in \mathcal {G}$ as the maximizer of $\ell (\boldsymbol {\theta }^{(0)},g|D^c_n)$; then, with $g^{(1)}$ fixed, we find $\boldsymbol{\theta}^{(1)} \in \boldsymbol{\Theta}$ as the maximizer of $\ell(\boldsymbol{\theta}, g^{(1)}|D^c_n)$, and so on, until the sequence $\{(\boldsymbol{\theta}^{(r)}, g^{(r)})\}$ converges. The likelihood increases at each iteration, and the sequence converges to at least a local maximum of $\ell(\boldsymbol{\theta}, g|D^c_n)$. The increasing likelihood property is immediate, since for every integer r,

$$\displaystyle \begin{aligned}\ell(\boldsymbol{\theta}^{(r+1)},g^{(r+1)}|D^c_n) \geq \ell(\boldsymbol{\theta}^{(r)},g^{(r+1)}|D^c_n) \geq \ell(\boldsymbol{\theta}^{(r)},g^{(r)}|D^c_n).\end{aligned}$$

A formal justification of the convergence of the above iterative algorithm follows from the theory of block coordinate descent methods in Bertsekas (2016).

Our algorithm is a semiparametric version of the EM algorithm; see also Tan et al. (2009, Chap. 2) for biomedical applications of this algorithm. Semiparametric and nonparametric EM algorithms have been used widely in the literature, for example in Muñoz (1980), Campbell (1981), Hanley and Parnes (1983), and Groeneboom and Wellner (1992, Section 3.1); see the argument there (pp. 67–68) for the convergence of such algorithms. Chen et al. (2002) applied the EM algorithm to a semiparametric random effects model, and Bordes et al. (2007) applied it to a semiparametric mixture model, using simulation studies to justify the convergence of the algorithm. Balan and Putter (2019) developed an R package implementing the EM algorithm for semiparametric shared frailty models.

We now give the details of the algorithm. At each iteration r, do the following:

Step 0. For fixed $(g^{(0)}, \boldsymbol{\theta}^{(0)})$, compute $\{\delta _i^{(0)}\}$ with the E-step of the E-M algorithm.
Step 1. For fixed $(g^{(r)}, \boldsymbol{\theta}^{(r)})$, compute
$$\displaystyle \begin{aligned}H_n(\boldsymbol{\theta},g|\boldsymbol{\theta}^{(r)}, g^{(r)}) &= E_{\boldsymbol{\delta}}[\ell(\boldsymbol{\theta},g|D^c_n)|D_n,\boldsymbol{\theta}^{(r)}, g^{(r)}] \\ &= \sum_{i=1}^n\Big(\delta^{(r)}_i \log \phi(y_i-\boldsymbol{\beta}'\boldsymbol{x}_i-g(z_i)-\eta) +\delta^{(r)}_i\log\lambda \\&\quad + (1-\delta^{(r)}_i)\log \phi(y_i-\boldsymbol{\beta}'\boldsymbol{x}_i-g(z_i)) +(1-\delta^{(r)}_i)\log(1-\lambda)\Big),\end{aligned} $$
(11.4)
where the expectation is taken with respect to the missing $\boldsymbol{\delta}$, as if the data were generated with parameters $(\boldsymbol{\theta}^{(r)}, g^{(r)})$. In particular, the r-th step estimates of the $\delta_i$'s (for i = 1, …, n; r = 0, 1, 2, …) are
$$\begin{aligned} \delta_i^{(r)} &=E(\delta_i|y_i,\boldsymbol{x}_i,z_i,g^{(r)},\boldsymbol{\theta}^{(r)})=P(\delta_i=1|y_i,\boldsymbol{x}_i,z_i,g^{(r)},\boldsymbol{\theta}^{(r)})\\ &=\frac{P(y_i|\delta_i=1,\boldsymbol{x}_i,z_i,g^{(r)},\boldsymbol{\theta}^{(r)})P(\delta_i=1|\boldsymbol{x}_i,z_i,g^{(r)},\boldsymbol{\theta}^{(r)})}{P(y_i|\boldsymbol{x}_i,z_i,g^{(r)},\boldsymbol{\theta}^{(r)})}\\ &=\frac{\lambda^{(r)}\phi\Big(y_i-\boldsymbol{\beta}^{(r)\prime}\boldsymbol{x}_i-g^{(r)}(z_i)-\eta^{(r)}\Big)}{\lambda^{(r)}\phi\Big(y_i-\boldsymbol{\beta}^{(r)\prime}\boldsymbol{x}_i-g^{(r)}(z_i)-\eta^{(r)}\Big)+(1-\lambda^{(r)})\phi\Big(y_i-\boldsymbol{\beta}^{(r)\prime}\boldsymbol{x}_i-g^{(r)}(z_i)\Big)}. \end{aligned} $$
Step 2. In the M-step for θ, compute
$$\displaystyle \begin{aligned}\boldsymbol{\theta}^{(r+1)} = \arg\sup_{\theta\in\Theta} H_n(\boldsymbol{\theta}, g^{(r)}|\boldsymbol{\theta}^{(r)}, g^{(r)}) .\end{aligned}$$
This step can be carried out with standard optimization packages. In particular,
$$\displaystyle \begin{aligned}\lambda^{(r+1)} = \frac{1}{n}\sum_{i=1}^n\delta_i^{(r)}.\end{aligned} $$
Step 3. For fixed $(\boldsymbol {\theta }^{(r+1)},\delta _i^{(r)})$, compute
$$\displaystyle \begin{aligned}g^{(r+1)}(\cdot)= \arg\max_{g\in \mathcal{G}} H_n(\boldsymbol{\theta}^{(r+1)}, g|\boldsymbol{\theta}^{(r)}, g^{(r)}).\end{aligned} $$
This step computes the nonparametric maximum likelihood estimate $\hat {g}$ under the shape restriction, which is non-trivial; we describe it below.
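
The E-step weights $\delta_i^{(r)}$ and the closed-form update of λ in Step 2 can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation; the names `e_step` and `update_lambda`, and the list `gzs` holding the current values $g^{(r)}(z_i)$, are hypothetical.

```python
import math


def normal_pdf(u):
    """Standard normal density phi(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def e_step(ys, xs, gzs, beta, eta, lam):
    """Posterior probabilities delta_i = P(delta_i = 1 | y_i, x_i, z_i)."""
    deltas = []
    for y, x, gz in zip(ys, xs, gzs):
        resid = y - sum(b * xj for b, xj in zip(beta, x)) - gz
        favorable = lam * normal_pdf(resid - eta)
        unfavorable = (1.0 - lam) * normal_pdf(resid)
        # For extreme residuals both densities can underflow to zero;
        # a production version would work on the log scale instead.
        deltas.append(favorable / (favorable + unfavorable))
    return deltas


def update_lambda(deltas):
    """M-step update of lambda: the average of the E-step weights."""
    return sum(deltas) / len(deltas)
```

The updates of β and η in Step 2 are deliberately left out, since the text only states that they are obtained with a standard optimizer.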

2.2.1 Computation of g ^(r+1)

The pool adjacent violators algorithm (PAVA; see, for example, Best and Chakravarti (1990)) is a convenient computational tool for such order-restricted maximization or minimization, and is available in R. Patrick et al. (2009) give a review of the history and computational aspects of the algorithm. In particular, $\hat {g}(z_i)=\hat {g}_i$ is computed as follows.

$$\displaystyle \begin{aligned} g^{(r+1)}(\cdot)&= \arg\max_{g\in \mathcal{G}} H_n(\boldsymbol{\theta}^{(r+1)}, g|\boldsymbol{\theta}^{(r)}, g^{(r)})\\ &= \arg\min_{g\in \mathcal{G}} \sum_{i=1}^{n}\Big(\delta_i^{(r)}\big(y_i-\boldsymbol{\beta}^{(r+1)\prime}\boldsymbol{x}_i-\eta^{(r+1)}-g_i\big)^2\\&\quad + (1-\delta_i^{(r)})\big(y_i-\boldsymbol{\beta}^{(r+1)\prime}\boldsymbol{x}_i-g_i\big)^2\Big)\\ &=\arg\min_{g\in \mathcal{G}} \sum_{i=1}^{n}\big(y_i-\boldsymbol{\beta}^{(r+1)\prime}\boldsymbol{x}_i-\eta^{(r+1)}\delta_i^{(r)}-g_i\big)^2. \end{aligned} $$
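
The last equality above can be checked by expanding the squares. Writing $u_i = y_i-\boldsymbol{\beta}'\boldsymbol{x}_i-g_i$ and dropping the iteration superscripts (shorthand introduced only for this remark), we have

$$\displaystyle \begin{aligned}\delta_i(u_i-\eta)^2+(1-\delta_i)u_i^2 &= u_i^2-2\delta_i\eta u_i+\delta_i\eta^2 \\ &= (u_i-\delta_i\eta)^2+\delta_i(1-\delta_i)\eta^2 .\end{aligned}$$

The remainder $\delta_i(1-\delta_i)\eta^2$ does not involve $g$, so the two criteria differ by a constant and have the same minimizer over $\mathcal{G}$.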

Generally, let $v_i = y_i - \boldsymbol{\beta}'\boldsymbol{x}_i - \delta_i\eta$ and $w_i = 1$; then

$$\displaystyle \begin{aligned}\hat{g}=\arg\min_{g\in \mathcal{G}} \sum_{i=1}^{n}w_i(v_i-g_i)^2\end{aligned}$$

The above is the standard form of isotonic regression procedure, and $\hat {g}$ can be computed using the R-function isoreg(⋅).
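
For readers who want to see what such an isotonic regression routine does, here is a minimal pure-Python sketch of the weighted pool adjacent violators algorithm. It is only an illustration under simple assumptions (no special handling of ties in z), not the R implementation; the function names `pava` and `isotonic_fit` are hypothetical.

```python
def pava(values, weights=None):
    """Weighted least-squares fit of a non-decreasing sequence to `values`."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool adjacent blocks while they violate the monotone order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def isotonic_fit(z, v, weights=None):
    """Fitted values g_i, monotone in z_i, returned in the original order."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    w_sorted = None if weights is None else [weights[i] for i in order]
    fit_sorted = pava([v[i] for i in order], w_sorted)
    fitted = [0.0] * len(z)
    for rank, i in enumerate(order):
        fitted[i] = fit_sorted[rank]
    return fitted
```

In the setting above, `v` would hold the working responses $v_i$ and `z` the covariate values $z_i$; for instance, `pava([1, 3, 2, 4])` pools the two middle points into their average.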

2.3 Asymptotic Results of the Estimates

Zhou et al. (2019) derived asymptotic results for $\hat {\boldsymbol {\theta }}$ and $\hat {g}(\cdot )$, as presented below. Detailed regularity conditions and proofs can be found there.

Theorem 11.1

Under regularity conditions, as n →∞

$$\displaystyle \begin{aligned}\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}_0\| \overset{a.s.}{\to} 0,~~~~\int |\hat{g}(z)-g_0(z)|dz \overset{a.s.}{\to} 0.\end{aligned}$$

Let $\stackrel {D}{\to }$ denote convergence in distribution.

Theorem 11.2

Under regularity conditions, as n →∞,

$$\displaystyle \begin{aligned}\sqrt{n}(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}_0) \overset{D}{\to} N(\mathbf{0}, I^{*-1}(\boldsymbol{\theta}_0|g_0)),\end{aligned}$$

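where $I^{*}(\boldsymbol{\theta}_0|g_0) = E[\ell^{*}(X, Z|\boldsymbol{\theta}_0, g_0)\,\ell^{*\prime}(X, Z|\boldsymbol{\theta}_0, g_0)]$ is the efficient Fisher information matrix of $\boldsymbol{\theta}$ for fixed $g_0$, and $\ell^{*}(X, Z|\boldsymbol{\theta}_0, g_0)$ is the efficient score for $\boldsymbol{\theta}$.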

Let $\mathbb {B}(\cdot )$ be the two-sided Brownian motion originating from zero: a mean zero Gaussian process on R with $\mathbb {B}(0)=0$, and $E\big (\mathbb {B}(s)-\mathbb {B}(h)\big )^2=|s-h|$ for all s, h ∈ R.

Theorem 11.3

Denote ${\dot g}_0(z)=d g_0(z)/dz$, and let q(z) be the density of z. Assume q(z) > 0. Under regularity conditions, as n →∞,

$$\displaystyle \begin{aligned}n^{1/3}(\hat{g}_n(z)-g_0(z)) \stackrel{D}{\to} \Big(\frac{4{\dot g}_0(z)}{q(z)}\Big)^{1/3} \arg\max_{h\in R}\{\mathbb{B}(h)-h^2\}.\end{aligned}$$

3 Testing the Null Hypothesis and the Classification Rules

3.1 Test the Null Hypothesis

After the model parameters are estimated, we need to test for the existence of subgroups, which is formulated as testing the null hypothesis $H_0: \eta = 0$ vs. the alternative $H_1: \eta \neq 0$. For parametric models, commonly used test statistics include the likelihood ratio statistic, the score statistic, and the Wald statistic; the three are asymptotically equivalent and chi-squared distributed. However, in our case, when η = 0 the parameter λ is not identifiable, although the other parameters are still identifiable and estimable. In this case the likelihood ratio statistic cannot be applied, so we use the Wald statistic.

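Write $\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ with $\dim(\boldsymbol{\theta}) = d$ and $\dim(\boldsymbol{\theta}_1) = d_1$, and let $\hat {\boldsymbol {\theta }}=(\hat {\boldsymbol {\theta }}_1,\hat {\boldsymbol {\theta }}_2)$ be the MLE of $\boldsymbol{\theta}$ under the full model. Consider the null hypothesis $H_0: \boldsymbol{\theta}_1 = \boldsymbol{\theta}_{1,0}$. The Wald test statistic is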

$$\displaystyle \begin{aligned}W_n= (\hat{\boldsymbol{\theta}}_1-\boldsymbol{\theta}_{1,0})'Var^{-1}(\hat{\boldsymbol{\theta}}_1) (\hat{\boldsymbol{\theta}}_1-\boldsymbol{\theta}_{1,0}).\end{aligned}$$

If $Cov(\hat {\boldsymbol {\theta }}_1)$ is known, then asymptotically $W_n \sim \chi ^2_{d_1}$. If it is estimated, then asymptotically $W_n/d_1 \sim F_{d_1,n-d}$. For our problem, $\boldsymbol{\theta}_1 = \eta$ and $\boldsymbol{\theta}_{1,0} = 0$, and we treat $Cov(\hat {\eta })$ as known, so $W_n=\hat {\eta }_nVar^{-1}(\hat {\eta }_n)\hat {\eta }_n \sim \chi ^2_1$ asymptotically. $H_0$ is rejected if $W_n > \chi ^2_1(1-\alpha )$, the (1 − α)-th quantile of the $\chi ^2_1$ distribution.
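
For the scalar case used here, the Wald test can be computed without any statistical library, because a $\chi^2_1$ variable is the square of a standard normal variable. The sketch below assumes that an estimate `var_eta_hat` of $Var(\hat{\eta}_n)$ is already available; that name and the function name are hypothetical.

```python
import math


def wald_test_eta(eta_hat, var_eta_hat, alpha=0.05):
    """Wald test of H0: eta = 0 based on the chi-square(1) approximation."""
    w = eta_hat ** 2 / var_eta_hat
    # P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)) for standard normal Z.
    p_value = math.erfc(math.sqrt(w / 2.0))
    # Rejecting when p_value < alpha is the same as W_n > chi2_1(1 - alpha).
    return w, p_value, p_value < alpha
```

For example, `wald_test_eta(1.96, 1.0)` gives a statistic of about 3.84 and a p-value just below 0.05.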

3.2 The Classification Rule

After the existence of subgroups is established, i.e., the null hypothesis above is rejected, we need to classify the subjects. There are different classification rules. In subgroup analysis, correct classification of the treatment-favorable subgroup is of significant clinical importance, so we use the Neyman-Pearson rule in Yuan et al. (2018, 2020), as it can control the misclassification error for the treatment-favorable subgroup.

To be specific, for each subject i, define the i-th likelihood ratio

$$\displaystyle \begin{aligned}LR(y_i,\boldsymbol{x}_i,z_i)=\frac{f(y_i,\boldsymbol{x}_i,z_i|\hat{\boldsymbol{\theta}},\delta=1)}{f(y_i,\boldsymbol{x}_i,z_i|\hat{\boldsymbol{\theta}},\delta=0)} \approx\frac{\phi(y_i-\hat{\boldsymbol{\beta}}'\boldsymbol{x}_i-\hat{g}(z_i)-\hat{\eta})} {\phi(y_i-\hat{\boldsymbol{\beta}}'\boldsymbol{x}_i-\hat{g}(z_i))}.\end{aligned}$$

This parallels the Neyman-Pearson (NP) uniformly most powerful test procedure for testing the simple hypothesis $H_0: \eta = 0$ vs. $H_1: \eta \neq 0$. For a given significance level α, the optimal classification rule is: classify the i-th subject to subgroup $S_1$ if

$$\begin{aligned}LR(y_i,\boldsymbol{x}_i,z_i) \geq K(\alpha),~\mbox{with }K(\alpha)\mbox{ determined by}~P_{H_0}\big(LR(Y,\boldsymbol{X},Z) \geq K(\alpha) \big)=\alpha,\end{aligned}$$

or, with $\epsilon_i = y_i-\hat {\boldsymbol {\beta }}'\boldsymbol {x}_i-\hat {g}(z_i)$ generated under $H_0$,

$$\displaystyle \begin{aligned}P_{H_0}\Big(\frac{\phi(y_i-\hat{\boldsymbol{\beta}}'\boldsymbol{x}_i-\hat{g}(z_i)-\hat{\eta})} {\phi(y_i-\hat{\boldsymbol{\beta}}'\boldsymbol{x}_i-\hat{g}(z_i))} \ge K(\alpha)\Big)=\alpha.\end{aligned}$$
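
As a sanity check on this rule, note that the ratio of normal densities has a simple closed form. Treating the estimates as fixed and writing $\epsilon$ for the residual (an idealization made only for this remark),

$$\displaystyle \begin{aligned}\frac{\phi(\epsilon-\hat{\eta})}{\phi(\epsilon)} = \exp\Big(\hat{\eta}\epsilon-\frac{\hat{\eta}^2}{2}\Big),\end{aligned}$$

which is increasing in $\epsilon$ when $\hat{\eta}>0$. The rule $LR \geq K(\alpha)$ is then equivalent to $\epsilon \geq \log K(\alpha)/\hat{\eta}+\hat{\eta}/2$, and if $\epsilon \sim N(0,1)$ exactly under $H_0$, the cut-off would be

$$\displaystyle \begin{aligned}K(\alpha) = \exp\Big(\hat{\eta}\,z_{1-\alpha}-\frac{\hat{\eta}^2}{2}\Big),\end{aligned}$$

where $z_{1-\alpha}$ denotes the $(1-\alpha)$-th quantile of the standard normal distribution. In practice the residuals are estimated, which is one reason to prefer empirical estimates of $K(\alpha)$.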

We can find an approximate solution for K(α). For simulated data, let $\{LR_j: j = 1, \ldots, n_0\}$ be the likelihood ratios of the patients from the treatment-unfavorable subgroup (for simulated data, the subgroup memberships are known). Then K(α) is estimated by the (1 − α)-th quantile of $LR_1,\ldots ,LR_{n_0}$; it is the cut-off beyond which patients are classified to the treatment-favorable subgroup even though they are from the treatment-unfavorable subgroup.

However, for real data $\{(y_i, \boldsymbol{x}_i, z_i): i = 1, \ldots, n\}$ the subgroup memberships are unknown, so we cannot use the above method to determine K(α); instead we obtain it as follows. Set $LR_i = \phi (\epsilon _i-\hat {\eta })/\phi (\epsilon _i)$, where $\epsilon_i = y_i-\hat {\boldsymbol {\beta }}'\boldsymbol {x}_i-\hat {g}(z_i)$ is the i-th residual, and let

$$\displaystyle \begin{aligned}Q_n(t)= \sum_{i=1}^nw_{ni}I(LR_i\leq t),~~~w_{ni}=(1-\hat{\delta}_i)/\sum_{j=1}^n(1-\hat{\delta}_j)\end{aligned}$$

be a weighted empirical distribution of the $LR_i$'s under the null hypothesis. Note that $1-\hat {\delta }_i$ is the estimated probability that subject i belongs to group 0, which corresponds to the null hypothesis, and scaling $1-\hat {\delta }_i$ by $\sum _{j=1}^n(1-\hat {\delta }_j)$ makes the $w_{ni}$'s a proper set of weights summing to one. So, intuitively, $Q_n(\cdot)$ is a reasonable estimate of the distribution of the $LR_i$'s under the null hypothesis. We set $K(\alpha )=Q_n^{-1}(1-\alpha )$, the (1 − α)-th quantile of $Q_n$.
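
The weighted quantile $Q_n^{-1}(1-\alpha)$ can be computed by sorting the likelihood ratios and accumulating the weights. The sketch below takes the generalized inverse $Q_n^{-1}(p)=\inf\{t: Q_n(t)\geq p\}$, which is an assumption about how the quantile is defined; the function name is hypothetical.

```python
def weighted_null_quantile(lr_values, delta_hat, alpha=0.05):
    """Estimate K(alpha) as the (1 - alpha) quantile of the weighted ECDF Q_n."""
    weights = [1.0 - d for d in delta_hat]  # estimated membership in group 0
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all subjects have zero weight for group 0")
    pairs = sorted(zip(lr_values, weights))
    cumulative = 0.0
    for lr, w in pairs:
        cumulative += w / total
        if cumulative >= 1.0 - alpha:
            return lr  # smallest t with Q_n(t) >= 1 - alpha
    return pairs[-1][0]  # guards against rounding error in the running sum
```

Subject i is then classified to the treatment-favorable subgroup when its likelihood ratio is at least the returned cut-off.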

For a new patient with covariates $(\boldsymbol{x}, z)$ but without response y, we define

$$\displaystyle \begin{aligned}LR(\boldsymbol{x},\boldsymbol{z}) = E_{H_0}\Big(\frac{\phi(y-\hat{\boldsymbol{\beta}}'\boldsymbol{x}-\hat{\eta})}{\phi(y-\hat{\boldsymbol{\beta}}'\boldsymbol{x})}\Big|\boldsymbol{x},z\Big) \approx \frac{1}{n_0}\sum_{i=1}^{n_0} \frac{\phi(y_i-\hat{\boldsymbol{\beta}}'\boldsymbol{x}-\hat{g}(z_i)-\hat{\eta})}{\phi(y_i-\hat{\boldsymbol{\beta}}'\boldsymbol{x}-\hat{g}(z_i))},\end{aligned}$$

where $y_i$ (i = 1, …, n₀) are the responses of the subjects already in the trial who have been classified to group 0. This patient is classified to group 1 if $LR(\boldsymbol{x}, z) > K(\alpha)$, with K(α) given above.

4 Simulation Study and Application

4.1 Simulation Study

We simulate four examples with a non-linear effect of $z_i$ on $y_i$. In each, we simulate n = 1000 i.i.d. observations with a one-dimensional response $y_i$ and covariates $\boldsymbol{x}_i = (x_{i1}, x_{i2}, x_{i3})'$. We first generate the covariates: the $\boldsymbol{x}_i$'s are sampled from the 3-dimensional normal distribution with mean vector μ = (3.1, 1.8, −0.5)′ and a given covariance matrix Γ, and the $z_i$'s are sampled from the normal distribution with mean 0 and variance σ² = 1. The $\epsilon_i$'s are also sampled from the normal distribution with mean 0 and variance 1. Below we display estimation results for four different choices of $\boldsymbol{\theta}_0 = (\boldsymbol{\beta}_0, \eta_0, \lambda_0)$ and four choices of $g_0(\cdot)$. In addition, we fix the non-linear effect to pass through the point (0, 0), i.e., $g_0(0) = 0$.

Example 1

g₀(z) = 6 × Exponential(z + 2) − 6 × Exponential(0 + 2);

Example 2

g₀(z) = 5 × Beta((z + 2)∕4, 5, 1) − 5 × Beta((0 + 2)∕4, 5, 1);

Example 3

g₀(z) = 6 × I(z < 0) × (N(z, 0, 0.5) − N(0, 0, 0.5)) + 6 × I(z ≥ 0) × (N(z, 0, 0.2) − N(0, 0, 0.2));

Example 4

g₀(z) = 3 × I(z < 0) × (Beta((z + 2)∕4, 0.2, 0.2) − Beta((0 + 2)∕4, 0.2, 0.2)) + 7 × I(z ≥ 0) × (Beta((z + 2)∕4, 0.7, 0.7) − Beta((0 + 2)∕4, 0.7, 0.7)).
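
To make the data-generating mechanism concrete, the following Python sketch simulates from the model with the non-linear effect of Example 1. Several details are assumptions, since the text does not specify them: "Exponential(u)" is read as the distribution function of the unit-rate exponential distribution, the covariance matrix Γ is taken to be the identity, and the values of β, η and λ passed to `simulate` are left to the user.

```python
import math
import random


def g0_example1(z):
    """Example 1, reading Exponential(u) as the Exp(1) CDF (an assumption)."""
    def exp_cdf(u):
        return 1.0 - math.exp(-u) if u > 0 else 0.0
    return 6.0 * exp_cdf(z + 2.0) - 6.0 * exp_cdf(0.0 + 2.0)


def simulate(n, beta, eta, lam, mu=(3.1, 1.8, -0.5), seed=0):
    """Draw n records (y, x, z, delta) from the partial linear mixture model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Assumption: independent unit-variance components (Gamma = identity).
        x = [rng.gauss(m, 1.0) for m in mu]
        z = rng.gauss(0.0, 1.0)
        delta = 1 if rng.random() < lam else 0  # latent subgroup indicator
        eps = rng.gauss(0.0, 1.0)
        linear_part = sum(b * xj for b, xj in zip(beta, x))
        y = linear_part + g0_example1(z) + delta * eta + eps
        data.append((y, x, z, delta))
    return data
```

Because the latent indicators are returned alongside the observations, a sketch like this also allows the classification error to be computed directly on simulated data.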

The estimated $\hat {g}$ and $g_0$ are shown in Fig. 11.1.

The parameter estimates from the proposed model are displayed in Tables 11.1, 11.2, 11.3 and 11.4, along with the estimates from the commonly used linear model for comparison. The estimated standard errors are displayed as [se].

Table 11.1 Parameter estimates under two models (example 1)

Table 11.2 Parameter estimates under two models (example 2)

Table 11.3 Parameter estimates under two models (example 3)

Table 11.4 Parameter estimates under two models (example 4)

The hypothesis testing results from both partial linear and linear model are given in Table 11.5, and the classification results using the partial linear model are in Table 11.6.

Table 11.5 Hypothesis test using the partial linear and linear models (example 4)

Table 11.6 Classification results using partial linear model (simulated data)

From Table 11.5 we see that the partial linear model gives reasonable results, while those from the linear model are not reasonable, possibly because the linear model seriously over-estimates the effect η when its true value is small.

From Table 11.6, we see that the misclassification error for the treatment-favorable subgroup is well controlled around the specified level α = 0.05, and that the overall classification error depends on the effect size η: it is small when η is large, and vice versa. Note that for η = 0.95 and 1.70 the N-P error is larger than 0.05; this is because the estimate of η is less accurate when the true value of η is small.

Interpretation of the Results

From Tables 11.1, 11.2, 11.3 and 11.4, we see that when the effect η of the treatment-favorable subgroup is small, the biases of the estimates from the linear model are much larger than those from the proposed partial linear model. This also explains the hypothesis testing results for the linear model: when the effect of the treatment-favorable subgroup is small, the linear model tends to give an estimate of η with positive bias, so its type I error is large and its type II error is small. When the effect of the treatment-favorable subgroup is large, the partial linear model and the linear model tend to give similar parameter estimates.

4.2 Application to Real Data Problem

We now analyze the ACTG 175 data with the proposed method. The trial was conducted by the AIDS Clinical Trials Group (ACTG), which was supported by the National Institute of Allergy and Infectious Diseases (NIAID). Participants were enrolled into the study between December 1991 and October 1992, and received treatment through December 1994. Follow-up and final evaluations of participants took place between December 1994 and February 1995.

The purpose of the study was to investigate whether treatment of HIV infection with one drug (monotherapy) was the same as, better than, or worse than treatment with two drugs (combination therapy) in patients under some conditions. Three different drugs were used in this study: (1) zidovudine (AZT), (2) didanosine (ddI), and (3) zalcitabine (ddC). The three drugs are nucleoside analogues that act as reverse transcriptase inhibitors (RT-inhibitors). The original study noted no clear differences between the ddI and AZT + ddI treatments; both appeared to be approximately equally effective in preventing HIV progression. Treatment with AZT + ddC provided no additional benefit over continued treatment with AZT. However, the results of ACTG 175, together with the results from earlier studies, demonstrate that antiretroviral therapy is beneficial to HIV-infected people who have fewer than 500 CD4+ T cells/mm³. This study also showed, for the first time, that an improvement in survival can be achieved in a sub-population.

We analyze these data using the proposed method for the combination therapy arm (ZDV + ddI, where ZDV denotes zidovudine, i.e., AZT). The number of patients is 522. The response variable is the CD4 count after 20 weeks of treatment, and the covariates are age, baseline CD4 count, Karnofsky score, and number of days of previously received antiretroviral therapy. We assume that the effect of the baseline CD4 count on the response is non-linear.

The analysis results are presented in Tables 11.7 and 11.8. We see that the null hypothesis of no subgroup is rejected, and that there is a treatment-favorable subgroup comprising about 5% of all patients. This is consistent with the result in Yuan et al. (2020). This case is of particular interest for hypothesis generation in developmental therapeutics: we can examine the small group of patients who are not benefiting from the treatment, identify the underlying reasons, and study them.

Table 11.7 Parameter estimates under two models (scaled real data)

Table 11.8 Classification results (under scaled real data)

5 Conclusion

A partial linear model is proposed for subgroup analysis in clinical trials, for the case in which one of the covariates has a monotone non-linear effect on the response. The non-linear part is modeled by a monotone function, alongside a linear part for the other covariates. Semiparametric maximum likelihood is used to estimate the model parameters. A simulation study is conducted to evaluate the performance of the proposed method, and the results show that the proposed model performs much better than the linear model, especially when the treatment effect is relatively small. The model is then applied to analyze a real data set.

References

Balan TA, Putter H (2019) frailtyEM: an R package for estimating semiparametric shared frailty models. J Stat Softw 90(7)
Bertsekas DP (2016) Nonlinear programming, 3rd edn. Athena Scientific, Nashua
Best MJ, Chakravarti N (1990) Active set algorithms for isotonic regression; a unifying framework. Math Program 47:425–439
Bordes L, Chauveau D, Vandekerkhove P (2007) A stochastic EM algorithm for a semiparametric mixture model. Comput Stat Data Anal 51:5429–5443
Campbell G (1981) Nonparametric bivariate estimation with randomly censored data. Biometrika 68:417–422
Chen J, Zhang D, Davidian M (2002) A Monte Carlo EM algorithm for generalized linear mixed models with flexible random effects distribution. Biometrics 3(3):347–360
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Fan A, Song R, Lu W (2017) Change-plane analysis for subgroup detection and sample size calculation. J Am Stat Assoc 112:769–778
Fokkema M, Smits N, Zeileis A et al (2018) Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees. Behav Res Methods 50:2016–2034
Foster JC, Taylor JMC, Ruberg SJ (2011) Subgroup identification from randomized clinical trial data. Stat Med 30:2867–2880
Friede T, Parsons N, Stallard N (2012) A conditional error function approach for subgroup selection in adaptive clinical trials. Stat Med 31:4309–4320
Groeneboom P, Wellner J (1992) Information bounds and nonparametric maximum likelihood estimation. Birkhäuser Verlag, Basel
Hanley JA, Parnes MN (1983) Nonparametric estimation of a multivariate distribution in the presence of censoring. Biometrics 39:129–139
International Breast Cancer Study Group (IBCSG) (2002) Endocrine responsiveness and tailoring adjuvant therapy for postmenopausal lymph node-negative breast cancer: a randomized trial. J Natl Cancer Inst 94:1054–1065
Lipkovich I, Dmitrienko A, Denne J, Enas G (2011) Subgroup identification based on differential effect search (SIDES): a recursive partitioning method for establishing response to treatment in patient sub-populations. Stat Med 30:2601–2621
Ma S, Huang J (2017) A concave pairwise fusion approach to subgroup analysis. J Am Stat Assoc 112:410–423
Muñoz A (1980) Nonparametric estimation from censored bivariate observations. Technical Report, Department of Statistics, Stanford University
Patrick M, Kurt H, Jan DL (2009) Isotonic optimization in R: pool-adjacent-violators algorithm (PAVA) and active set methods. J Stat Softw 32(5):1–24
Rothmann MD, Zhang J, Lu L, Fleming TR (2012) Testing in a pre-specified subgroup and the intent-to-treat population. Drug Inf J 46(2):175–179
Ruberg SJ, Chen L, Wang Y (2010) The mean doesn’t mean as much any more: finding sub-groups for tailored therapeutics. Clin Trials 7:574–583
Sabine C (2005) AIDS events among individuals initiating HAART: do some patients experience a greater benefit from HAART than others? AIDS 19:1995–2000
Shen J, He X (2015) Inference for subgroup analysis with a structured logistic-normal mixture model. J Am Stat Assoc 110:303–312
Song Y, Chi GY (2007) A method for testing a pre-specified subgroup in clinical trials. Stat Med 26:3535–3549
Tan M, Tian G-L, Ng KW (2009) Bayesian missing data problems: EM, data augmentation and non-iterative computation. Chapman and Hall/CRC, London/Boca Raton
Yuan A, Chen X, Zhou Y, Tan MT (2018) Subgroup analysis with semiparametric models toward precision medicine. Stat Med 37(2):1830–1845
Yuan A, Zhou Y, Tan MT (2020) Subgroup analysis with a nonparametric unimodal symmetric error distribution. Comm Statist Theory Methods. Published online
Zhou Y, Yuan A, Tan MT (2019) Subgroup analysis with semiparametric partial linear regression model. Submitted to Statistical Methods in Medical Research

Author information

Authors and Affiliations

Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC, USA
Yizhao Zhou, Ao Yuan & Ming T. Tan

Corresponding author

Correspondence to Yizhao Zhou.

Editor information

Editors and Affiliations

Biostatistics & Data Sciences, Boehringer Ingelheim Corporation, Ridgefield, CT, USA
Naitee Ting
Pfizer Inc, Groton, CT, USA
Joseph C. Cappelleri
UCB Biosciences Inc., Raleigh, NC, USA
Shuyen Ho
School of Social Work, University of North Carolina, Chapel Hill, NC, USA
(Din) Ding-Geng Chen

About this chapter

Cite this chapter

Zhou, Y., Yuan, A., Tan, M.T. (2020). Subgroup Analysis with Partial Linear Regression Model. In: Ting, N., Cappelleri, J., Ho, S., Chen, D.G. (eds) Design and Analysis of Subgroups with Biopharmaceutical Applications. Emerging Topics in Statistics and Biostatistics. Springer, Cham. https://doi.org/10.1007/978-3-030-40105-4_11

DOI: https://doi.org/10.1007/978-3-030-40105-4_11
Published: 02 May 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-40104-7
Online ISBN: 978-3-030-40105-4
eBook Packages: Mathematics and Statistics (R0)
