
1 Introduction

The Women’s Health Initiative (WHI) is a large-scale epidemiologic research program focused on the prevention of chronic disease among postmenopausal women. A total of 161,808 postmenopausal women, in the age range 50–79, were enrolled at 40 U.S. Clinical Centers during 1993–1998. The centerpiece of the WHI is a multifaceted clinical trial of four preventive interventions, in a partial factorial design [1]. A total of 10,739 post-hysterectomy women were randomized to the E-alone trial of 0.625 mg/day of conjugated equine estrogens (Premarin) or placebo; 16,608 women with a uterus were randomized to the E + P trial of this same estrogen preparation plus 2.5 mg/day medroxyprogesterone acetate (Prempro) or placebo; and 48,835 women were randomized to a low-fat dietary pattern (40%) or usual diet (60%). At the one-year anniversary of their randomization into either or both of the hormone therapy (HT) and dietary modification (DM) components, participating women were given the opportunity for further randomization to a dietary supplementation trial of 1,000 mg/day calcium carbonate plus 400 international units of vitamin D3 or placebo, and 36,282 women did so. The WHI program is strengthened by the inclusion of a companion cohort study among 93,676 postmenopausal women in the same age range, recruited from essentially the same catchment populations, with much commonality with the clinical trial in methodology, and in data and biospecimen collection.

Table 1 shows key findings from the hormone therapy trials [2, 3], with findings for the designated primary coronary heart disease (CHD) outcome and the designated primary adverse outcome highlighted. The E + P trial was stopped early in 2002 when health risks were judged to exceed benefits over a 5.6-year average intervention period. The risks included an early elevation in CHD incidence, the primary trial outcome for which an important risk reduction had been hypothesized, and elevations in stroke and venous thromboembolism incidence. An elevation in breast cancer incidence and a reduction in fracture incidence were also observed, as was hypothesized in trial design [1]. A global index, defined as the time to the earliest of the outcomes listed above it in Table 1, was in the unfavorable direction, and contributed to early stopping considerations. The E-alone trial also stopped early, in 2004, after an intervention period that averaged 7.1 years, substantially because of a stroke elevation similar in magnitude to that observed for E + P, though health risks and benefits and the global index were rather balanced in this trial.

Table 1 Clinical outcomes in the WHI postmenopausal hormone therapy trials

Analyses beyond the summary hazard ratios (HRs) shown in Table 1 took place for each clinical outcome, as well as for some additional important outcomes (e.g., cognition and dementia). These included analyses of HR form as a function of time, analyses of HRs among women adherent to their assigned intervention, and various subgroup analyses (with appropriate caveats). Participating women were actively followed beyond the cessation of intervention, giving rise to a range of additional analyses of public health importance [4, 5].

To cite but one example, the more detailed studies of breast cancer incidence in the E + P trial showed an HR that increased unfavorably and approximately linearly to about 1.6 following 5 years of use, and dropped back to basal levels by 2–3 years following trial stoppage. When analyses focused on adherent women, a more dramatic increase to an HR of about 2.5 after 5 years of use was estimated, again with dissipation by 2–3 years following cessation of use [6]. Approximately six million women were using this estrogen plus progestin preparation in the USA, and about 70% of them stopped shortly after initial trial results [1] were announced. Combined with these HR patterns, this change in usage projected a national reduction in breast cancer incidence of about 15,000 women per year, in agreement with subsequent U.S. breast cancer incidence rates [7].

The WHI low-fat dietary pattern trial had dietary intervention goals of no more than 20% of energy from fat, five or more fruit and vegetable servings/day, and six or more grain servings/day, with breast and colorectal cancer as primary outcomes, and with ovarian and endometrial cancer as additional diet-related cancers that may benefit from this intervention. Table 2 shows principal cancer incidence results from this trial, which proceeded to its planned termination with an 8.1-year average intervention period. The trial design projected a reduction in breast cancer risk with an overall HR of 0.87. The principal targeted dietary change was a reduction in percent of energy from fat, but only about 70% of the hypothesized change was achieved. Correspondingly, the estimated breast cancer HR of 0.91 [8] differed from unity by about 70% of the projected amount, but was not significantly different from one (weighted logrank p = 0.09). The corresponding contrast for ovarian cancer incidence [9] was nominally significant (p = 0.03), providing an important lead for a disease having few known modifiable risk factors. For both breast [8] and ovarian cancer [9], there was a significant interaction between baseline percent of energy from fat and HR, with stronger evidence for an intervention effect among women having a high fat content in their customary diet. These women made a comparatively larger reduction in percent of energy from fat, if assigned to the dietary intervention group.
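As a quick check of the "about 70% of that projected" statement, the departure from unity can be compared using only the two hazard ratios quoted above. Measuring the departure on the HR scale itself is an assumption made here for illustration:

$$ \frac{1 - 0.91}{1 - 0.87} = \frac{0.09}{0.13} \approx 0.69. $$

On the log-HR scale the same comparison gives \( \log(0.91)/\log(0.87) \approx 0.68 \), so the conclusion does not depend much on the scale chosen.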

Table 2 Comparison of cancer incidence rates between intervention and comparison groups in the Women’s Health Initiative (WHI) dietary modification trial

The calcium and vitamin D trial did not provide significant evidence of a treatment effect, either for its primary hip fracture outcome [10] or for secondary outcomes (colorectal cancer, other fractures).

2 Biomarkers and Variations in Clinical Trial Intervention Effects

Even though it is good clinical trial practice to focus primarily on overall treatment effects as opposed to those in subsets of a study population, it needs to be recognized that hazard ratios may, and often do, vary according to specific characteristics of the study population. Notably, HRs provide but one way of summarizing a treatment effect over time, and lack of variation on an HR scale need not imply lack of variation on other assessment scales. That said, HRs and other ratio measures (e.g., odds ratios) seem particularly useful in leading to simple models, in that the joint association of treatment and study subject characteristics or exposures with clinical outcomes often seems to depart little from a multiplicative model.

For example, for breast cancer incidence, no interacting demographic or clinical variables were found for E + P [11], whereas for E-alone a suggested reduction in risk seemed to be largely confined to lower risk women, specifically those without benign breast disease or a family history of breast cancer [12].

Several nested case–control studies within the HT trial cohorts were conducted in an attempt to identify biomarkers that may interact with hormone therapy HRs, or that may mediate the observed intervention effects on clinical outcomes. These studies primarily focused on biochemical and genetic markers that were recognized risk indicators for the clinical outcomes under study. For example, a Cardiovascular Disease Biomarker Study focused on markers of inflammation, coagulation/thrombosis, lipids and lipoproteins, and related genetic variants, for each of coronary heart disease, stroke, and venous thromboembolism. These studies [13, 14] generally confirmed associations with disease risk, but there were few interacting factors identified and none of the observed biomarker changes following intervention activities appeared to meaningfully mediate the observed treatment effects, a topic that will be discussed further below. As an example of an interacting variable, women having a relatively high baseline low-density lipoprotein (LDL) cholesterol who were assigned to active hormone therapy evidently experienced a comparatively larger early elevation in coronary heart disease risk [13].

Some high-dimensional genotype biomarker studies were also conducted, in an attempt to understand more of the biology related to observed clinical effects in the WHI trials. For example, a breast cancer nested case–control study involved the genotyping of 9,039 single nucleotide polymorphisms (SNPs) for 2,166 women who developed breast cancer during the trial intervention period. A randomized trial context is well suited to genotype by treatment interaction testing in that “case-only” analyses, which require genotype data only on study subjects developing disease, have efficiency about the same as if genotyping had been conducted on the full cohort.

More specifically, let V = 1 and V = 0 denote active and control randomization assignments, and z = 0, 1, or 2 denote the number of minor alleles of an SNP. A simple Cox model that stratifies on SNP genotype, and allows a separate HR parameter for treatment at each value of z, can be written

$$ \uplambda(\mathrm{t}; \mathrm{V}, \mathrm{z}) = \uplambda_{0\mathrm{z}}(\mathrm{t}) \exp\left\{ \mathrm{V}\left[ \upbeta_0 \mathrm{I}(\mathrm{z} = 0) + \upbeta_1 \mathrm{I}(\mathrm{z} = 1) + \upbeta_2 \mathrm{I}(\mathrm{z} = 2) \right] \right\}, $$

where I(·) denotes an indicator variable, and \( \mathrm{e}^{\upbeta_{\mathrm{z}}} \) is the HR for women having SNP genotype z, for z = 0, 1, or 2. From this expression,

$$ \mathrm{logit}(\mathrm{V} | \mathrm{X} = \mathrm{t}, \mathrm{z}) = \mathrm{logit}(\mathrm{V} | \mathrm{X} \geqslant \mathrm{t}, \mathrm{z}) + \sum\limits_{\mathrm{i} = 0}^{2} \upbeta_{\mathrm{i}} \mathrm{I}(\mathrm{z} = \mathrm{i}), $$

where X = t denotes disease occurrence at time t following randomization, and logit(V | ·) denotes the log odds that V = 1 given the conditioning event. The important feature here is that V is orthogonal to z by virtue of randomization so that if the disease is rare one has, to a good approximation

$$ \mathrm{logit}(\mathrm{V} | \mathrm{X} \geqslant \mathrm{t}, \mathrm{z}) = \log\left\{ \mathrm{q}/(1 - \mathrm{q}) \right\}, $$

where q = pr(V = 1) is the randomization fraction for the active treatment group. It follows that one can estimate intervention HRs at each SNP genotype by ordinary logistic regression of the randomization indicator V on indicator variables for the number of minor SNP alleles, with log {q/(1 − q)} as an “offset.”

Breast cancer analyses of this type yielded nominally significant variations in the intervention HR with an SNP (rs3750817) in intron 2 of the fibroblast growth factor receptor 2 (FGFR2) gene on chromosome 10, for both E + P and E-alone [15], and for the dietary modification intervention in the subset of women (denoted DMQ) who were in the upper quartile of percent of energy from fat in their baseline diet [16]. These analyses also drew attention (nominal p < 0.05) to an SNP (rs7705343) in the mitochondrial ribosomal protein S30 region of chromosome 5 [17] for each of E-alone, DMQ, and the calcium and vitamin D intervention (for which there was no breast cancer “main” effect). For either SNP the more favorable intervention effects were evidently localized among women who were homozygous for the SNP minor allele (TT genotype for rs3750817; AA for rs7705343). A challenge with these types of suggestive findings is the identification of a research strategy for replication. Observational studies pertinent to these intervention topics may be limited in their assessment of related exposures (e.g., hormonal or dietary exposures), and may be subject to important confounding or measurement biases. In general, methods for identifying and evaluating genotype by environmental factor interactions are at an early stage of development, and large-scale clinical trial settings have much to offer in this arena.
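The case-only estimate described above can be sketched in a few lines. Because the model has a separate parameter for each genotype and no other covariates, the logistic regression with offset has a closed-form solution: the log HR at genotype z is the log ratio of active to control cases at that genotype, minus log {q/(1 − q)}. The function name `case_only_hr`, the Wald interval, and the case counts below are illustrative assumptions, not WHI data or the authors' code.

```python
import math

def case_only_hr(n_active, n_control, q):
    """Case-only HR estimate for one genotype stratum.

    n_active, n_control: numbers of cases randomized to active and control.
    q: randomization fraction pr(V = 1), entering as the offset log{q/(1-q)}.
    Returns (hr, lower, upper), with a 95% Wald interval on the log scale.
    """
    offset = math.log(q / (1 - q))
    beta = math.log(n_active / n_control) - offset   # saturated-model MLE
    se = math.sqrt(1 / n_active + 1 / n_control)      # SE of a log odds
    return (math.exp(beta),
            math.exp(beta - 1.96 * se),
            math.exp(beta + 1.96 * se))

# Hypothetical case counts (active, control) by number of minor alleles z.
cases = {0: (120, 80), 1: (90, 75), 2: (20, 25)}
estimates = {z: case_only_hr(n1, n0, q=0.5) for z, (n1, n0) in cases.items()}

for z, (hr, lower, upper) in estimates.items():
    print(f"z={z}: HR={hr:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

Only cases enter the calculation, which is why genotyping can be restricted to women who developed disease. The sketch relies on the rare disease approximation stated in the text.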

3 Biomarkers of Intervention Adherence and Exposure

As mentioned above, the Dietary Modification trial evidently achieved only about 70% of its projected intervention versus control group difference for the principal dietary intervention target, percent of energy from fat. Even this 70% assessment relies on self-reported dietary information from participating women. Specifically, based on food frequency questionnaire (FFQ) assessments, intervention group women reported a percent of energy from fat that was lower than that of the control group by an average of 10.7% at 1 year, 9.5% at 3 years, and 8.1% at 6 years after randomization. However, the FFQ data also indicate a differential total energy consumption of about 100 kilocalories/day, which is not consistent with the weight changes experienced by trial participants (2–3 kg greater weight loss in the intervention group compared to the usual diet control group at 1 year, which mostly dissipated over the subsequent 5 years). If the greater underreporting of energy by intervention group women pertained disproportionately to fat calories, then percent of energy from fat would also be differentially reported and the power of the DM trial accordingly affected.

In fact, the dietary assessment measurement issue is even more acute in observational nutritional epidemiology studies, where systematic and random assessment errors could well distort the very associations under study, rather than simply reducing study power as in the intervention trial setting. In either context, however, biomarkers provide important avenues for strengthening the research agenda.

Two nutritional biomarker substudies have been conducted in WHI cohorts, and a controlled human feeding study that aims to develop biomarkers for additional nutrients and foods is currently underway. The first, the Nutrient Biomarker Study (NBS), included energy [18] and protein [19] biomarkers and a concurrent FFQ, among a representative sample of 544 weight-stable women in the DM trial. The second, the Nutrition and Physical Activity Assessment Study, included these biomarkers and self-reports from dietary frequencies, records, and recalls, along with a biomarker of activity-related energy expenditure and three types of physical activity self-report, among a representative sample of 450 women from the WHI Observational (cohort) Study. By simple linear regression of log-biomarker assessments on corresponding log self-reports and on readily available study subject characteristics (body mass index, age, and ethnicity), calibrated consumption estimates were developed for energy, protein, and percent of energy from protein. For example, for energy, even though the log self-report data explained only a few percent (e.g., 3–4% for the FFQ) of the variation in log-biomarker values, the inclusion of these other factors in the regression equation raised this percentage to the 40–45% range. After the temporal variation in the biomarker was removed, this increased to about 70% of the variation in average daily energy consumption over a 1-year study period [20]. The NBS equations were used to develop calibrated energy, protein, and percent of energy from protein estimates throughout the WHI cohorts, and positive associations of energy with several major cancers [21], coronary disease [22], and diabetes [23] were found that were not apparent without calibration. The role of body mass index in these analyses is complex [24], and the associations just mentioned seemed substantially, if not entirely, explained by body fat accumulation over time.
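The calibration step described above can be written schematically as a linear regression on the log scale. The symbols W (biomarker assessment), Q (corresponding self-report), and the coefficients b are notation introduced here for illustration only. The exact covariate coding used in the WHI analyses is not given in this chapter.

$$ \log W = b_0 + b_1 \log Q + b_2\,\mathrm{BMI} + b_3\,\mathrm{age} + b_4^{\top}\,\mathrm{ethnicity} + \varepsilon. $$

Under this sketch, a woman's calibrated consumption estimate is the fitted value from the equation, evaluated at her own self-report and characteristics. The percentages quoted above correspond to the fraction of variation in log W explained with and without the terms other than log Q.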

This nutritional epidemiology research area and the similarly important physical activity epidemiology area are ripe for further development, with a substantial use of biomarkers providing a logical next step in the overall research agenda.

The energy biomarker data indicate severe underreporting using the FFQ, by about 30% overall, with much greater underreporting among overweight and obese women, and greater underreporting by younger compared to older postmenopausal women. These analyses also suggest some energy underreporting by intervention compared to control group women, by about 100 kcal/day, allowing the weight change data mentioned above to align with corresponding calibrated energy consumption in the DM trial.

4 Biomarkers as Mediators of Clinical Trial Intervention Effects

Again let V = 1 or 0 denote active and control randomization assignments in a clinical trial, but now let z denote a biomarker change following some period of intervention activities. Analyses may aim to understand the extent to which intervention effects on the clinical outcome are mediated by z. A traditional mediation analysis would compare the coefficient of V in a regression analysis that doesn’t include z with a corresponding analysis including z, with evidence for mediation if the coefficient for V moves substantially toward the null when z is added to the regression model.

A key statistical difference is evident between the type of interaction analysis discussed in Sect. 2, where V and z are independent by study design, and mediation analyses where V and z may be highly correlated. This is a critical point. In the extreme, for example, if the biomarker doesn’t change in the control group (z = 0) and changes by exactly the same amount (z = c, for some c ≠ 0) among all study subjects in the intervention group, then z and V will be perfectly correlated, and it will not be possible to carry out an analysis that simultaneously models V and z. As a plausible departure from this scenario, suppose that z is constant in both groups, but that z is assessed with some technical measurement error. As amplified below, the regression analyses will then be possible, but z may then appear not to mediate, even though the biomarker change in question may be central to explaining the intervention effects on clinical outcomes.

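To elaborate just a little, consider baseline, \( \mathrm{x}_0 \), and post-intervention, \( \mathrm{x}_1 \), biomarker values, and suppose that the biomarker fully mediates an intervention effect in a linear model for a quantitative response Y. Hence, \( \mathrm{E}(\mathrm{Y}; \mathrm{x}_0, \mathrm{x}_1, \mathrm{V}) = \mathrm{a} + \mathrm{a}_0\mathrm{x}_0 + \mathrm{a}_1\mathrm{x}_1 \), with \( \mathrm{a}_1 \ne 0 \). Under a bivariate normal model for \( (\mathrm{x}_0, \mathrm{x}_1) \) with mean \( (\upmu_0, \upmu_1 + \mathrm{dV}) \), common variance \( \upsigma^2 \) and correlation ρ, one can derive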

$$ {\mathrm{E}}\left( {{\mathrm{Y} };{ }{{\mathrm{x}}_0},{\mathrm{V} }} \right) = {\mathrm{a} {^{\prime}}} + { }{{{\mathrm{a} {^{\prime}}}}_0}{{\mathrm{x}}_0} + \left( {{{\mathrm{a}}_{{1}}}{\mathrm{d} }} \right){\mathrm{V} }, $$

where \( {\mathrm{a} {^{\prime}}} \) and \( {{{\mathrm{a} {^{\prime}}}}_0} \) are simple functions of the response and biomarker parameters, so that a mediation analysis would compare an estimate of a1d to an estimate of the coefficient of V (zero) when x1 is added to the regression model.
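The step to this expression uses only the conditional mean of a bivariate normal pair. A short derivation under the model just stated, with the baseline value independent of V by randomization, runs as follows:

$$ \mathrm{E}(\mathrm{x}_1; \mathrm{x}_0, \mathrm{V}) = \upmu_1 + \mathrm{dV} + \uprho(\mathrm{x}_0 - \upmu_0), $$

so that

$$ \mathrm{E}(\mathrm{Y}; \mathrm{x}_0, \mathrm{V}) = \mathrm{a} + \mathrm{a}_0\mathrm{x}_0 + \mathrm{a}_1\{\upmu_1 + \mathrm{dV} + \uprho(\mathrm{x}_0 - \upmu_0)\} = \{\mathrm{a} + \mathrm{a}_1(\upmu_1 - \uprho\upmu_0)\} + (\mathrm{a}_0 + \mathrm{a}_1\uprho)\mathrm{x}_0 + (\mathrm{a}_1\mathrm{d})\mathrm{V}. $$

Matching terms gives \( \mathrm{a}' = \mathrm{a} + \mathrm{a}_1(\upmu_1 - \uprho\upmu_0) \) and \( \mathrm{a}'_0 = \mathrm{a}_0 + \mathrm{a}_1\uprho \). The regression slope of the post-intervention value on the baseline value equals ρ because the two variances are equal.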

Now suppose that \( \mathrm{x}_0 \) and \( \mathrm{x}_1 \) incorporate classical normal measurement error, so that one measures \( \mathrm{w}_0 = \mathrm{x}_0 + \mathrm{e}_0 \) and \( \mathrm{w}_1 = \mathrm{x}_1 + \mathrm{e}_1 \), where \( \mathrm{e}_0 \) and \( \mathrm{e}_1 \) are independent mean zero normal variates having variance \( {\upsigma}_{\mathrm{e}}^{{2}} \). It is straightforward to show that the coefficient of V in \( \mathrm{E}(\mathrm{Y}; \mathrm{w}_0, \mathrm{V}) \) is again \( \mathrm{a}_1\mathrm{d} \), unaffected by measurement error owing to the independence between \( \mathrm{x}_0 \) and V, but that the coefficient of V in \( \mathrm{E}(\mathrm{Y}; \mathrm{w}_0, \mathrm{w}_1, \mathrm{V}) \) is a complicated function of model parameters that approaches \( (\mathrm{a}_0 + \mathrm{a}_1)\mathrm{d}/(2 + \updelta^2) \) as ρ → 1, where \( {{{\updelta }}^{{2}}} = {\upsigma}_{\mathrm{e}}^{{2}}/{{{\upsigma }}^{{2}}} \). This limiting coefficient can be very far from zero even if \( \updelta^2 \) is small! This rather counter-intuitive result arises because of the diminishing ability to distinguish the biomarker effect from the overall treatment effect on Y, as ρ → 1. It follows that careful modeling of the biomarker and its measurement process may be needed to reliably assess mediation, or more generally, to assess treatment effects after allowing for certain biomarker changes.
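The qualitative point above can be illustrated numerically without simulation. Under the stated model the conditional mean of Y is linear in the regressors, so the coefficient of V equals the population least-squares coefficient, which follows from the covariances of the measured baseline value, the measured post-intervention value, and V. The sketch below computes that coefficient with and without measurement error. The function names and all parameter values are illustrative assumptions, not quantities taken from the WHI analyses.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def v_coefficient(a0, a1, d, rho, sigma2, sigma_e2, q=0.5, include_w1=True):
    """Coefficient of V when Y = a + a0*x0 + a1*x1 is regressed on
    (w0, w1, V), or on (w0, V) if include_w1 is False."""
    vq = q * (1 - q)                       # variance of the indicator V
    # Covariance matrix of the regressors, in the order (w0, w1, V).
    cov = [[sigma2 + sigma_e2, rho * sigma2, 0.0],
           [rho * sigma2, sigma2 + sigma_e2 + d * d * vq, d * vq],
           [0.0, d * vq, vq]]
    # Covariance of each regressor with Y (measurement errors drop out).
    cov_y = [a0 * sigma2 + a1 * rho * sigma2,
             a0 * rho * sigma2 + a1 * (sigma2 + d * d * vq),
             a1 * d * vq]
    keep = [0, 1, 2] if include_w1 else [0, 2]
    A = [[cov[i][j] for j in keep] for i in keep]
    return solve(A, [cov_y[i] for i in keep])[-1]   # V is last

# Illustrative values: full mediation with a1 = 1, d = 1, rho = 0.95.
params = dict(a0=0.2, a1=1.0, d=1.0, rho=0.95, sigma2=1.0)
unadjusted = v_coefficient(sigma_e2=0.1, include_w1=False, **params)
error_free = v_coefficient(sigma_e2=0.0, **params)
with_error = v_coefficient(sigma_e2=0.1, **params)
print(unadjusted, error_free, with_error)
```

With an error-free biomarker the coefficient of V drops from its unadjusted value to zero, as full mediation requires. With a measurement error variance only one tenth of the biomarker variance, a substantial share of the treatment coefficient remains, so a traditional mediation analysis would understate the role of the biomarker.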

As noted above, none of the candidate biomarkers studied appeared to mediate HT effects on cardiovascular diseases in the WHI hormone therapy trials. We undertook additional “discovery” research to identify blood biomarkers that are risk markers for these diseases, and that are affected by hormone therapy. This work focused on protein expression, using an Intact Protein Analysis System (IPAS) [25] capable of quantitatively comparing concentrations between pairs of specimens for about 350–400 proteins across a substantial dynamic range. Specifically, concentrations based on blood collected 1 year after randomization were compared to corresponding baseline concentrations for 50 women adherent to active E-alone, and 50 women adherent to active E + P, over the first year of HT trial participation. For throughput reasons, IPAS analyses were based on pools formed from equal volumes of serum from 10 women. A total of 378 proteins were quantified for change. Of these, a remarkable 44.7% (169/378) had evidence of change (p < 0.05) following intervention with E-alone or E + P [26, 27]. The protein changes were mostly quite similar for E-alone and E + P, and included proteins in multiple biological pathways relevant to observed clinical effects, including inflammation, coagulation, immune function, cell adhesion, growth factors, and osteogenesis, among others.

Corresponding analyses were then conducted, using the same proteomic platform, to compare baseline blood protein concentrations between women who went on to develop CHD or stroke and corresponding matched controls, with cases and controls drawn from the WHI Observational Study. This time larger pools of size 100 were employed. There were eight such pool pairs for each of these diseases, as well as for breast cancer. From the resulting data [28] there were 37 proteins having nominal p < 0.05 for a CHD case versus control difference compared to 17.3 expected by chance; and 47 for stroke compared to 18.3 expected by chance. Several of these had estimated false discovery rates <0.05 and most of these were among the proteins evidently affected by E-alone and/or E + P. These provide novel candidates for mechanistic effects of HT on these cardiovascular diseases.

We are still at an early stage of evaluating these candidates for mediation in the HT trials. An initial evaluation involving beta-2 microglobulin, a highly ranked protein for CHD association, and insulin-like growth factor binding protein 4 (IGFBP4), a highly ranked protein for stroke association, confirmed the association of these proteins with disease incidence in the WHI trials. Once again, however, change in protein concentration following hormone therapy treatment did not seem to mediate intervention effects on these diseases, at least not without explicit account of the biomarker measurement error process. Further analyses with these analytes revealed that HT hazard ratios, in the presence of baseline and 1-year biomarker measurements, were quite sensitive to the ratio \( \updelta^2 \) of the measurement error variance to the underlying biomarker variance for both E-alone and E + P, with larger \( \updelta^2 \) values consistent with full mediation.

These preliminary analyses reinforce the need for enhanced statistical methods for identifying the important biological intermediaries of intervention effects in clinical trials. Adequate modeling of the underlying biomarker process, and of the departure of such models from corresponding measured biomarker values, may typically require biomarker assessments at more than two time points in conjunction with large case and control sample sizes. This topic, and methods for correcting treatment hazard ratio estimates for the biomarker measurement process, will be discussed in more detail elsewhere.

5 Biomarkers for Intervention Development

In recent years biomarkers have come to play a rather central role in treatment development, particularly in the therapeutics area. For example, the molecular characteristics of patient tumors may identify key therapeutic targets for disruption by potential treatments. High-dimensional data, including gene expression profiles, may help to focus developmental efforts toward therapeutic benefit or toward the avoidance of certain adverse effects.

The development and initial testing of preventive interventions is a rather underdeveloped aspect of the chronic disease prevention research agenda. Sometimes it is attractive to move interventions from therapeutics to primary prevention. Examples include statins for heart disease prevention; tamoxifen, SERMs, or aromatase inhibitors for breast cancer prevention; or bisphosphonates for fracture prevention. However, this approach seems unlikely to lead to the behavioral changes, for example, in the diet and physical activity area, that arguably provide the ultimate preventive approaches needed. Observational epidemiology has much to offer for identifying preventive approaches, but findings may lack the needed specificity and force to fuel needed behavioral, regulatory, or policy changes. For example, one can contrast the influence of the rather extensive body of observational research on postmenopausal hormones with that of the comparatively few clinical trials that eventually were able to be conducted.

Intermediate outcome trials, which have outcomes on putative pathways between treatments and clinical outcomes of interest, have considerable potential to add to these other data sources for preventive intervention development. For example, a trial of moderate size, called the Postmenopausal Estrogen Progestin Intervention (PEPI) trial, was initiated in advance of the WHI trials to compare various hormone regimens with respect to cardiovascular disease risk factors, uterine hyperplasia, and other intermediate outcomes [29, 30]. This trial had an influence on the choice of regimens studied in the WHI trials, but it did not, for example, give warning of the early elevation in CHD, or the sustained elevation in stroke, that emerged in the WHI hormone therapy trials.

Intermediate outcome trials that combine changes in major risk factors for clinical outcomes of interest with more agnostic, possibly high-dimensional, changes in blood or other biospecimens, may offer a more comprehensive approach to the selection and initial evaluation of preventive interventions. For example, the agnostic aspect could entail study of potential intervention effects on the plasma proteome and metabolome. These data, whether for a few emergent candidates or for a high-dimensional set of changes, could then be merged with observational analyses relating the entire set of intermediate variables to clinical outcomes of interest, to develop projections of intervention effects on each such outcome. This approach could be considered for behavioral as well as chemopreventive interventions. The two data sources to be combined would each involve studies having a small fraction of the cost of a full-scale prevention trial. While not sufficient in itself, such an approach could augment the value of intermediate outcome trials and, in particular, may help to filter intervention options arising from the traditional data sources mentioned above, thereby permitting a focus on the more strongly justified concepts for full-scale trial consideration with clinical outcomes.

6 Discussion and Summary

Biomarkers have potential to play several important roles in the development, conduct, analysis, and reporting of clinical trials. Specifically, biomarkers may permit stratification of study subjects according to the magnitude of beneficial or adverse treatment effects, possibly leading to the identification of persons for whom the treatment can be particularly recommended or should be avoided.

Though not much emphasized here, biomarkers typically play a key role in the assessment of adherence to intervention goals, and in the assessment of adherence-adjusted treatment effects. Biomarkers also provide the principal approach to identifying the important biological pathways whereby a treatment may influence a clinical outcome of interest. The utility of biomarkers for each of these purposes, but especially for the elucidation of disease mechanisms, can be expected to depend strongly on the properties of the biomarker measurement process, and on the ability to adequately model and correct for measurement error in data analyses.

The adherence-adjustment and mediation applications of biomarkers are sometimes posed using a potential outcomes and principal stratification formulation [31]. The principal stratification “framework,” however, seems too restrictive to be very useful in this type of biomedical research context [32, 33], and the important measurement error issue discussed here does not seem to have been addressed in the potential outcomes context.

Finally, high-dimensional biomarkers from discovery platforms evidently have an important role to play in intervention development, though this concept has yet to be explored in any depth for preventive intervention development.