Introduction

Biogerontology is the study of normal aging and age-associated diseases, both of which contribute to lifespan, the decline in quality of life at older ages, and the cost of health care in the final years. Interventions that can slow the aging process, delay the onset of age-associated diseases, and promote maintenance of strength and vitality will be of great benefit to the elderly. Unfortunately, the quest for anti-aging therapies has produced a marketplace full of unproven remedies. It is imperative that the scientific community address this need with rigorous, reproducible studies on candidate interventions to identify those with potential for a positive impact on healthy aging.

One problem encountered in analyzing the research literature on interventions for aging, aside from its paucity, is that different laboratories testing the same intervention often report different results. There are many factors contributing to the variability, including the model organism chosen as the test subject, the design of the intervention study, and the location of the testing labs. For example, the lifespan of a single strain of mice varies among published studies, as shown by the median lifespan of untreated, ad libitum fed, male C57BL/6 mice from the nine published reports presented in Fig. 1. The longest median survival age is 22% greater than the shortest. The preliminary results of the ITP, discussed in more detail later in this paper and in Miller et al. (2007), showed a similar site-to-site variation in the lifespan of the HET mice used in this program. For male control mice at the three sites, the mean median lifespan was 799 days, with a range (minimum to maximum) of 137 days, or 17% of the mean. For females, the mean median lifespan was 881 days, with a range of 51 days, or 6% of the mean.
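The percentages quoted for the ITP control mice follow directly from the stated means and ranges. A minimal sketch of that arithmetic is given below; the function name is illustrative and not part of any ITP software.

```python
def range_as_percent_of_mean(mean_median_days: float, range_days: float) -> float:
    """Express a minimum-to-maximum range as a percentage of the mean."""
    return 100.0 * range_days / mean_median_days


# Values quoted in the text for control mice at the three ITP sites.
male_pct = range_as_percent_of_mean(799, 137)
female_pct = range_as_percent_of_mean(881, 51)

print(f"males: {male_pct:.0f}% of mean; females: {female_pct:.0f}% of mean")
# prints "males: 17% of mean; females: 6% of mean"
```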

Fig. 1

Variation in median survival of C57BL/6 mice. Median lifespan of control, untreated male C57BL/6 mice from published studies is presented (from: 1 Talan and Ingram 1985; 2 Harrison and Archer 1987; 3 Goodrick et al. 1990 (three independent cohorts); 4 Bronson and Lipman 1991; 5 Blackwell et al. 1995; 6 Pugh et al. 1999; 7 Turturro et al. 1999; 8 Forster et al. 2003; and 9 Ikeno et al. 2005)

When using a rodent model to test interventions, the choice of species, strain and sex can have a large effect on the outcome. This is illustrated by studies on caloric restriction (CR) in rodents. Two independent studies reported a significant extension of lifespan by CR in male C57BL/6 mice but not in male DBA/2 mice (Forster et al. 2003; Turturro et al. 1999). Female DBA/2 mice did show a significant lifespan extension in response to CR, illustrating why it is important to test any intervention in both sexes (Turturro et al. 1999). An earlier study demonstrated that DBA/2f male mice did not exhibit an extension in lifespan in response to CR but did when given a protein-restricted, isocaloric diet (Fernandes et al. 1976). BALB/c mice also appear to be resistant to the benefits of CR (N.L. Nadon, unpublished data). Had the first CR studies been done only in DBA/2 males and BALB/c mice, this very informative line of research might not have been pursued.

This paper outlines the design of the National Institute on Aging (NIA) Interventions Testing Program (ITP) and discusses the challenges faced in setting up large-scale, multi-site testing protocols. We present the importance of standard operating procedures (SOPs), some issues unique to long-term interventions studies and approaches to address these issues. Lastly, we provide a brief summary of early results obtained in the ITP as they support the importance of SOPs in multi-site studies.

The NIA interventions testing program

The ITP was developed to test candidate diets, supplements and compounds for their ability to extend lifespan and promote healthy aging in a mouse model. A detailed description of the mission of the ITP is available on the ITP website (http://www.nia.nih.gov/ResearchInformation/ScientificResources/InterventionsTestingProgram.htm). Testing is performed in triplicate at three independent sites: The Jackson Laboratory, the University of Michigan, and the University of Texas Health Sciences Center—San Antonio. The importance of testing in triplicate will be discussed in more detail later, as our early results demonstrate site-to-site variation even in the control mice. Candidate interventions include pharmaceuticals, food supplements, special diet compositions, plant extracts and other additives. We require that all data be repeatable, so interventions must be chemically defined well enough for replication of experiments. The compounds chosen for testing tend to be those with either preliminary data showing beneficial effects on aging in mice or strong theoretical justification backed up by evidence of increased life span in short-lived models such as nematodes or fruit flies or both. The ITP is not funded to carry out dose response studies for every compound chosen, but rather tests one dose for which there is strong evidence to support its potential to improve healthy aging and extend lifespan in a mouse model.

The ITP was developed after a planning workshop held in 1999 that included experts from diverse fields of study relating to aging (Warner et al. 2000). Many model systems were discussed, ranging from flies to large mammals, and the genetically heterogeneous (HET) mouse model was deemed to be the best fit. Mice are economical and there is a huge volume of literature on the genetics and genomics, physiology, behavior and biochemistry of the mouse. HET mice were chosen because they provide reproducible genetic variation within the population, and because HET mice are not as likely to exhibit the strain-specific characteristics and pathology observed in inbred strains. We believe that this model is more reflective of the general characteristics of the laboratory mouse and will provide more robust findings than would be seen in a single inbred strain.

The specific stock of HET mice used in the ITP is a cross of four inbred strains, generated by breeding two F1 hybrids, CB6F1 X C3D2F1. They are equivalent to the UM-HET3 mice described by Miller and Chrisp (1999). Pathology at death in UM-HET3 mice includes a wide range of lesions and conditions, and previous studies have demonstrated that a combination of data on early life body weight gain, T cell subset levels, and leptin and thyroid hormone levels is predictive of lifespan in HET mice (Miller et al. 2002; Lipman et al. 2004; Harper et al. 2004). These genetically heterogeneous mice have also proven valuable in genetic association studies, adding another level of information that can be mined from these studies (Harper et al. 2003; Volkman et al. 2003; Wisser et al. 2004; Wolf et al. 2004).

The importance of genetic heterogeneity is brought home by the strain-specific differences that have been reported for many aspects of the physiology and biochemistry of mice. For example, significant differences have been reported in the cardiovascular changes that occur during rapid eye movement (REM) sleep: in C57BL/6 and C3H mice, the arterial blood pressure decreases during REM sleep and there are no changes in heart rate; in BALB/c mice arterial blood pressure and heart rate increase; and in DBA/2 mice arterial blood pressure increases and heart rate decreases (Campen et al. 2002). Another example of strain differences is seen through tests of exercise performance. In a forced endurance treadmill test, C57BL/6 performed poorest of the strains tested, yet on a voluntary treadmill test, C57BL/6 mice performed best, staying on the longest and running the fastest and the greatest distance of the strains tested (Lerman et al. 2002).

Responses to toxic exposures also illustrate underlying physiological and biochemical differences among strains. Hepatocarcinogenesis can be promoted by high fat diet and by phenobarbital, and while C57BL/6 mice are resistant to both promoters, C3H mice are sensitive to both promoters and DBA/2 mice are resistant to high fat diet but sensitive to phenobarbital (Ahotupa et al. 1993). The induction of antioxidant pathways mirrored the sensitivity to hepatocarcinogenesis, with C3H mice showing the greatest induction of glutathione and catalase in response to a high-fat diet (Ahotupa et al. 1993). Strain-specificity has been reported in the induction of neurodegeneration by excitotoxins, with C57BL/6 mice much more resistant to both kainic acid and quinolinic acid than DBA/2 mice, and in the induction of beta cell dysfunction by high glucose, where again DBA/2 mice are more sensitive (McLin et al. 2006; Zraika et al. 2006). Additional strain differences in physiology are reviewed in Nadon (2006).

To add another example relevant to many animal colonies, the response to Helicobacter infection varies greatly by strain. Helicobacter hepaticus infection is innocuous in most mouse strains, but can cause hepatitis in some, including BALB/c and C3H/HeNCr (Zenner 1999). Because Helicobacter is common in many animal colonies, because the number of known Helicobacter species is still growing, and because the full extent of pathology caused by Helicobacter species is probably not yet known, it is important for long-term studies to be performed in animal colonies of known pathogen status, if possible under specific pathogen free (SPF) conditions.

Development of standard operating procedures

The ITP developed SOPs to ensure that environmental conditions at the three sites are as uniform as possible. Lighting, temperature, housing, and diet are controlled and monitored. It is particularly challenging to provide uniform environmental conditions at independent institutions, as each institution has its own policies and procedures, but the interim results presented later in this paper illustrate how important it is to develop SOPs to which all test sites can adhere.

In the ITP, HET mice for each year’s set of interventions are bred over a period of 6–8 months. The mice are weaned at 19–21 days of age and are group-housed, typically three males or four females per cage, using corn cob bedding. The first litter is not used for ITP testing so that all litters are the product of experienced mothers. At 42 days of age, ID chips are implanted and the mice are randomly assigned to experimental or control groups. More details are provided in Miller et al. (2007).

A power analysis was performed (by A. Galecki, University of Michigan, and S. Pletcher, Baylor College of Medicine) to determine the number of mice needed per experimental group to be able to detect a 10% change in lifespan as compared to control groups, at 80% power, as described in Miller et al. (2007). Each experimental group includes 36 females and 44 males per site, for a total of 108 females and 132 males for each intervention tested. The males are over-represented because of the expectation that some cages will have to be culled due to fighting. The control lifespan group is twice the size of the experimental groups, and all experimental groups are compared to the same control group.
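The group sizes stated above can be tallied across the three sites as a quick consistency check. The sketch below uses only the per-site numbers given in the text; the control-group totals are derived from the statement that the control group is twice the size of an experimental group, and the names are illustrative.

```python
SITES = 3
PER_SITE_EXPERIMENTAL = {"female": 36, "male": 44}
CONTROL_MULTIPLIER = 2  # control group is twice the size of an experimental group


def totals(per_site: dict, sites: int = SITES, multiplier: int = 1) -> dict:
    """Scale per-site group sizes by a multiplier and sum them over all sites."""
    return {sex: n * multiplier * sites for sex, n in per_site.items()}


experimental_total = totals(PER_SITE_EXPERIMENTAL)
control_total = totals(PER_SITE_EXPERIMENTAL, multiplier=CONTROL_MULTIPLIER)

print("per intervention:", experimental_total)
print("shared controls: ", control_total)
```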

In addition to the experimental and control lifespan groups, two other groups of mice have been set up. One is a pair-fed group, equal in size to the experimental groups, and used to control for any change in food intake due to taste or other characteristics of the test compounds/diets. The mice are weighed at biweekly intervals for the first 2 months on the experimental diets, and the pair-fed mice are fed to maintain their weight at the same level as the lightest experimental group. Once it is clear that there is no effect on body weight for any of the agents tested in a given year’s group, the pair-fed group is removed from the study. The other group is labeled the monitor mice. They are treated the same as the experimental groups but they are not included in the lifespan study so they can be used for invasive experiments needed to demonstrate the bioactivity or effectiveness of the compounds/diets.

Differences in diet are one of the confounding factors when comparing lifespan studies from different laboratories. Turturro et al. (1999) reported differences in both body weight and lifespan when C57BL/6 mice were fed different diets. It was particularly striking that female C57BL/6 mice fed NIH31 diet had a median survival age that was about 25% greater than females fed Emory-Morse911 diet, which uses casein as the primary protein source rather than the fish meal and grains used in NIH31. The ITP mice are fed NIH31 diet with 4% fat, available ad libitum, from the time the interventions are initiated. The food is prepared and sterilized, and then the test compounds are added and the food is re-pelleted without use of heat. The food is prepared in bulk every 4 months and the same batches are used by all three sites. Most treatment protocols begin feeding the experimental diets at 4 months of age, although some have been started later in life. The ITP mice are also provided acidified water, pH 2.5, ad libitum.

Accurate evaluation of interventions requires full life spans

Lifespan in any model system is influenced by a complex set of interactions among normal aging processes, incidence and age-of-onset of age-associated diseases, and development of non-age-associated diseases. To determine the effect of an aging intervention, it is important to look at the full lifespan, not just the median lifespan. Some deaths during the first half of the life span are related to early acting genetic defects, accidents, and age-independent disease. For example, mouse urinary syndrome (MUS) is a potentially lethal condition that does not exhibit age-dependence typical of senescence-related diseases; it commonly occurs in the 4-way cross mice used in this program and could potentially account for a significant portion of the male deaths before the median survival age, but likely none of the male deaths after the median survival age (Miller et al. 1998). It should be noted that the environmental conditions that promote or inhibit MUS are poorly understood and to date MUS has been virtually absent from the ITP mice.

An example of how reliance on median survival only to define lifespan can give misleading results is presented in Fig. 2, which shows lifespan results for the effects of dietary restriction in wild-type mice and obese, leptin-deficient ob/ob mutant mice (from Harrison et al. 1991). The data for median life span (left panel) give two false conclusions: (1) that dietary restriction did not improve survival of normal B6 mice; and (2) dietary restriction improved survival of ob/ob mice only to the level of ad libitum-fed lean B6 mice. Following the entire life span (right panel) illustrates the fallacy of these two conclusions because (1) dietary restriction actually had a significant effect on the life span of normal mice, and (2) it increased the life span of ob/ob mice beyond normal, to that of calorie-restricted lean B6 mice. Thus, focusing on median life span may systematically exclude interventions that retard aging, if those interventions fail to prevent mice from dying before the median survival age. Comparison of two groups for the proportion of mice reaching very old age provides more information about aging rate than tests based on median lifespan alone (Klebanov and Harrison 2002). The age at which the last mouse in a cohort dies is also of limited value since it is a single data point. The ITP has decided to evaluate questions about maximum lifespan operationally by comparing the proportion of mice still alive in each group at each age when the pooled population has reached the 90% mortality point, in order to capture the effect of the treatments on the breadth of the lifespan.
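The contrast between median survival and late-life survival can be illustrated with a small numerical sketch. The lifespans below are hypothetical values chosen for illustration, not data from Harrison et al. (1991) or from the ITP, and the helper names are assumptions.

```python
import math
from statistics import median


def pooled_mortality_age(groups, fraction=0.9):
    """Age at which the given fraction of the pooled population has died."""
    pooled = sorted(age for group in groups for age in group)
    # round() guards against floating-point noise before taking the ceiling
    k = math.ceil(round(fraction * len(pooled), 9))
    return pooled[k - 1]


def fraction_alive_after(ages_at_death, cutoff):
    """Proportion of a group that outlived the cutoff age."""
    return sum(age > cutoff for age in ages_at_death) / len(ages_at_death)


# Hypothetical ages at death (days): identical medians, different late-life survival.
control = [500, 600, 700, 800, 850, 900, 950, 1000, 1050, 1100]
treated = [500, 600, 700, 800, 850, 900, 1100, 1200, 1250, 1300]

cutoff = pooled_mortality_age([control, treated])
control_alive = fraction_alive_after(control, cutoff)
treated_alive = fraction_alive_after(treated, cutoff)

print("medians:", median(control), median(treated))
print("pooled 90% mortality age:", cutoff)
print("fraction alive past cutoff:", control_alive, treated_alive)
```

In this toy example the two medians are equal, so a median-only comparison would miss the difference that appears in the proportion of mice surviving past the pooled 90% mortality age.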

Fig. 2 a, b

A change in median lifespan does not always reflect effects in late life. Survival curves for ad libitum-fed and calorie-restricted (CR) C57BL/6 (B6), and obese/obese (ob/ob) mutant mice are presented (reprinted from Harrison et al. 1991). On the left (a) is the curve to the median survival point, while the entire survival curve on the right (b) shows a very different trend in the last half of the lifespan

While lifespan studies are important, lifespan is not the sole, or even necessarily the best, measurement of the effectiveness of a compound to promote healthy aging. Other physiological, biochemical and behavioral parameters are important adjuncts to the lifespan study. However, the cost of long-term interventions studies using statistically relevant numbers is significant and for this reason the ITP design is a two-phase program. Phase I studies include determination of lifespan and a few measurements of overall health status such as activity monitoring, measurement of T cell subsets, and select hormone quantifications. Animals dying in the study are preserved so that necropsy information can be obtained for those interventions that significantly increase or decrease life span. Those interventions that show promise in Phase I will go on to Phase II, which will include a larger array of ancillary studies in addition to a replicate lifespan study.

Evaluation of candidate compounds and diets

The ITP solicits proposals from the extramural research community for compounds, supplements and diets that have potential to extend lifespan and promote healthy aging. The proposals go through a two-tier review and prioritization process, using experts from the research community to evaluate the scientific merit, feasibility and priority of the proposals. The types of compounds/diets proposed to date have been quite varied and have included pharmaceuticals, modifications in micronutrient content, and compounds with antioxidant and/or anti-inflammatory activity. While it is usually considered preferable to test purified compounds, sometimes it is worthwhile to test a natural extract if there is potential for bioactivity of multiple compounds in the extract. In such cases, added measures must be taken to ensure consistency of the extract over the testing period.

One of the biggest challenges the ITP has faced has been evaluating the preliminary data supporting the prioritization of interventions and determining appropriate doses in the mouse model. Preliminary data have come from studies in a variety of model systems ranging from invertebrates to humans. The ITP often conducts short-term studies to evaluate bioavailability, efficacy (activity), and toxicity of compounds before investing in the large-scale study. Some pilot studies are needed because, while less complex organisms have proven vital for understanding basic biology, interventions often have different effects when given to complex organisms. This may result from differences in metabolism, differences in target organs or both. For a specific example, in nematodes, reduction-of-function mutations in genes of the dauer pathway increase life span. The dauer pathway genes are orthologous to genes of the insulin and IGF-1 signal transduction pathways in mammals. Because diminished signaling in these pathways increases life span in worms, fruit flies, and mammals, these pathways appear to govern an evolutionarily conserved, genetic clock of aging (Tatar et al. 2003). However, the insulin and the IGF-1 pathways in mammals are more complex than the dauer pathway in invertebrates. There are two problems with simply testing mutations that reduce functions of the insulin and IGF-1 pathways in mice. First, reduction-of-function mutations in the insulin pathway produce insulin resistance and promote diabetes, thus masking potential beneficial effects on aging (LaMothe et al. 1998). Second, reduction-of-function mutations in the IGF-1 pathway elevate growth hormone (GH, the trophic modulator of IGF-1) by removing its negative feedback signal. GH antagonizes insulin, so elevated circulating GH can cause insulin resistance, which might mask any beneficial effects of diminished IGF-1 signaling (Lembo et al. 1996).

Interventions currently in testing

In the ITP, test agents are incorporated into irradiation-sterilized diet (5LG6, Purina, St. Louis, MO), which is based on the NIH 31 formulation with 4% fat (Rader et al. 1986). Table 1 lists the compounds in testing as of May 2007. Aspirin and nitroflurbiprofen (NFP), two non-steroidal anti-inflammatory agents that inhibit both COX-1 and COX-2, were tested because chronic inflammation is associated with many chronic diseases of aging, and may contribute to those diseases. Aspirin also has anti-thrombotic and anti-oxidant properties (Shi et al. 1999; Vane 2000; Weissmann 1991). At therapeutic doses, it suppresses production of TNFα by an effect on NF-κB signals (Shackelford et al. 1997). Aspirin may also have indirect effects on oxidative damage to proteins by acetylation of ε-amino residues, thus blocking inter- and intra-protein crosslinks as well as glycation reactions (Shi et al. 1999; Caballero et al. 2000; Jones and Hothersall 1993; Weber et al. 1995).

Table 1 Compounds in testing as of May 2007. Each compound is given in the diet at the levels indicated (parts per million by weight)

Nitroflurbiprofen (NFP) is a nitric oxide (NO)-releasing flurbiprofen derivative (Ongini and Bolla 2006). It crosses the blood–brain barrier, and so in principle might protect against chronic inflammation in the central nervous system (CNS). The NO-releasing moiety of NFP protects against GI toxicity, protects the cardiovascular system and enhances the anti-inflammatory effect, therefore making the drug safer during chronic administration. Furthermore, 6–12 months of exposure of mice to NFP did not produce overt signs of toxicity, and it was also safe in early human studies (Brunelli et al. 2007; Van and Kadish 2005; Ongini and Bolla 2006).

Nordihydroguaiaretic acid (NDGA) has both anti-oxidant and anti-inflammatory properties (Wood et al. 2004). NDGA suppresses pro-inflammatory gene expression and prostaglandin E2 (PGE2) production (West et al. 2004). NDGA was reported to delay death in Wistar rats and slow disease progression and increase lifespan in SOD1 mutant mice (Buu-Hoi and Ratsimamanga 1959; West et al. 2004).

PBN (α-phenyl-N-tert-butyl nitrone) is a nitrone, known to stabilize free radicals, and protects against age-related diseases including stroke and cancer (Floyd 1990; Zhao et al. 1994; Floyd et al. 2002; Nakae et al. 2003). PBN extended lifespan of short-lived SAM-P8 mice, and has been reported to increase life spans in both C57BL/6J mice and Sprague-Dawley rats when started at about 2 years of age (Edamatsu et al. 1995; Saito et al. 1998; Sack et al. 1996). The principal metabolite of PBN is its 4-hydroxy derivative, 4-OH-PBN, which is being tested in the ITP.

Caffeic acid phenethyl ester (CAPE), a product of the propolis of honeybee hives, possesses strong antioxidant, anti-inflammatory, and immunomodulatory capabilities (Bhimani et al. 1993; Sud’ina et al. 1993; Borrelli et al. 2002). Even low doses of CAPE inhibit oxidant production and formation of oxidized bases in mouse skin DNA and in HeLa cells exposed to phorbol ester tumor promoters (Bhimani et al. 1993; Frenkel et al. 1993). Topical applications of CAPE suppress TPA-induced tumor promotion in mouse skin, suggesting that CAPE possesses anti-carcinogenic properties (Frenkel et al. 1993).

Enalapril maleate is an angiotensin converting enzyme (ACE) inhibitor. ACE inhibitors have been reported to modulate hypertension, obesity, diabetes, and congestive heart failure in both aged humans and rodent models (Ferder et al. 1992, 1994; Inserra et al. 1995, 1996; de Cavanagh et al. 1997, 1999; Gambassi et al. 2000; Kuno et al. 2003). ACE inhibitors may regulate metabolic function, decrease oxidative stress in many tissues, and reduce age and disease related chronic inflammation (de Cavanagh et al. 1997, 1999). Rats injected with enalapril for 6 months had reduced blood pressures, better physical performance in various tasks than controls, and reduced fat mass, but not lean mass, with age (Carter et al. 2002).

Rapamycin is an antifungal, immunosuppressive, potential anticancer drug that acts by inhibition of the protein kinase TOR (target of rapamycin) (Garber 2001; Guba et al. 2002). Its immunosuppressive actions are due to inhibition of proliferation of helper T cells (Lorberg and Hall 2004). TOR proteins may play key roles in nutrient response systems, and abrogation of TOR in Caenorhabditis elegans adults extends lifespan (Vellai et al. 2003). In current models, nutrient signaling through mTOR kinase is integrated with insulin/growth factor signaling so that, in mammals, rapamycin may affect nutritional, mitogenic and insulin metabolic signaling in a coordinated fashion, in some ways reminiscent of dietary restriction (Manning and Cantley 2003; Kim et al. 2003).

Simvastatin inhibits HMG-CoA (3-hydroxy-3-methylglutaryl coenzyme A) reductase. Inhibitors of this enzyme, referred to as “statins,” are clinically effective in reducing cardiovascular disease (Bonetti et al. 2003). Statins appear to have effects independent of reducing cholesterol, including anti-oxidant and anti-inflammatory properties, and stimulation of bioavailability of nitric oxide (Bonetti et al. 2003).

Resveratrol has been shown to increase life span in yeast, nematodes, fruit flies, and killifish, and it extends median life spans in C57BL/6J mice that are fed a high fat diet (Howitz et al. 2003; Wood et al. 2004; Baur et al. 2006; Valenzano et al. 2006). Resveratrol is a potent sirtuin stimulator and it also reduces acute inflammation in vivo and free radicals and mutations in vitro, inhibits tumor generation in response to the DMBA carcinogen, and protects against early death from injected neuroblastoma cells (Jang et al. 1997; Chen et al. 2004). In a summary of current knowledge, Baur and Sinclair (2006) outline several possible mechanisms by which resveratrol might retard mammalian aging, including stimulation of sirtuin protein deacetylases as well as effects on the estrogen receptor, cyclooxygenase, toll receptors, and cytochrome P450 enzymes.

Issues unique to long-term intervention studies

In addition to the scientific merit of proposals, there are several important issues to consider when choosing which long-term interventions to test. They include: (1) the ease of administration; (2) the stability of the compound; (3) the rate at which the agent is eliminated or metabolized in mice; (4) whether the compound has efficacy at the chosen dose; and (5) the cost of administration. The ITP is reluctant to accept any treatment that requires repeated subcutaneous or intraperitoneal injections or oral gavage, because of the expense involved in treating so many animals daily over a life time, the added cost for a separate control group subjected to mock injections or similar daily handling, and the concern that repeated injections might be harmful to the mice. Delivery of compounds by osmotic minipump or repeated administration of a slow-release preparation would have similar disadvantages, including the need for a separate set of control animals. Delivery of compounds in drinking water is possible, although only for compounds that are water soluble and stable in solution at room temperature. Although this method is easier than oral dosing by gavage, it is still much more labor intensive than administration in food and there is the risk of reduced uniformity across centers since the solutions must be made up at each center. To date, all interventions tested in the ITP are compounds that can be administered in the food.

Before beginning a study, it is sometimes necessary to verify that the compound retains activity after incorporation into mouse chow and that therapeutic blood levels of the drug can be achieved. Our experience with rapamycin illustrates the kind of problems that can be encountered in studies of the effects of dietary additives. In an initial pilot study, we analyzed food and blood levels of rapamycin at several starting doses. The results revealed that rapamycin was not stable in food but that therapeutic blood levels could be achieved if the mice received a sufficient amount of rapamycin in food.

To solve the problem of instability in food, rapamycin was sent to the Southwest Research Institute (San Antonio, TX) for microencapsulation by dissolving the rapamycin in an organic solvent containing a dissolved enteric coating (Eudragit S100). This methacrylate polymer is stable at pH levels below 7 and thus protects the rapamycin from the acidic conditions of the stomach; the protective coating dissolves in the small intestine, permitting absorption of the active agent. Samples of encapsulated and unencapsulated rapamycin were incorporated into commercial mouse chow at a concentration of 7 ppm and the levels of rapamycin in the food were assayed (Fig. 3). The encapsulated rapamycin survived the process of incorporation into the chow better than the unencapsulated rapamycin, as demonstrated by the 3-fold higher concentration of rapamycin detected in the diet made with encapsulated rapamycin than in the diet made with unencapsulated rapamycin. Diets made from encapsulated and unencapsulated rapamycin were fed to mice for 3 weeks and concentrations of rapamycin in 200 µl whole blood samples were determined using HPLC with UV detection. The average blood level observed after feeding the encapsulated rapamycin was greater than 25 ng/ml, which compares favorably with therapeutic levels in human treatment protocols of at least 12 ng/ml (Fig. 4). By contrast, mice fed the diet prepared with unencapsulated rapamycin had less than 2.5 ng/ml, which is the detection limit of the assay. As a result of the pilot study data and unpublished work by the sponsors, the dose was increased to 14 ppm in the diet for the longevity studies, as shown in Table 1.
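Dietary concentrations in parts per million by weight can be translated into an approximate daily dose per unit body weight, which is a convenient way to compare the 7 ppm pilot diet with the 14 ppm longevity diet. The sketch below is illustrative only: the daily food intake (5 g) and body weight (30 g) are assumed round numbers, not values reported for the ITP mice, and the function name is hypothetical.

```python
def daily_dose_mg_per_kg(ppm_in_diet: float,
                         food_g_per_day: float,
                         body_weight_g: float) -> float:
    """Approximate daily dose (mg per kg body weight) from a dietary concentration.

    1 ppm by weight corresponds to 1 mg of compound per kg of food,
    i.e. 0.001 mg per g of food.
    """
    mg_per_day = (ppm_in_diet / 1000.0) * food_g_per_day
    return mg_per_day / (body_weight_g / 1000.0)


# ASSUMED values for illustration only (not taken from the ITP protocol).
FOOD_G_PER_DAY = 5.0
BODY_WEIGHT_G = 30.0

pilot_dose = daily_dose_mg_per_kg(7, FOOD_G_PER_DAY, BODY_WEIGHT_G)
study_dose = daily_dose_mg_per_kg(14, FOOD_G_PER_DAY, BODY_WEIGHT_G)

print(f"7 ppm  -> about {pilot_dose:.2f} mg/kg/day under the assumed intake")
print(f"14 ppm -> about {study_dose:.2f} mg/kg/day under the assumed intake")
```

This conversion describes the nominal amount offered in the diet. As the pilot study shows, the amount actually delivered also depends on how much of the compound survives food preparation and storage.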

Fig. 3

Encapsulation of rapamycin improves stability in laboratory chow. Rapamycin was added to commercially prepared lab chow at 7 ppm and the food was then assayed for rapamycin content. Rapamycin levels are less than expected, suggesting that rapamycin degraded during preparation or storage of the food (open bar). Microencapsulation of the rapamycin reduced degradation (shaded bar)

Fig. 4

Rapamycin is detectable in whole blood after feeding diet containing encapsulated or unencapsulated rapamycin. Encapsulated and unencapsulated rapamycin (7 ppm) were fed to mice for 3 weeks and the blood assayed for rapamycin levels. Encapsulation resulted in significantly higher blood levels of rapamycin than observed using unencapsulated rapamycin

ITP interim results and evaluation for year 1 cohorts

The principal goal of Phase I elements of the ITP is to identify compounds that delay or decelerate the aging process, using the commonly accepted endpoint of an increase in maximal longevity as a surrogate for an anti-aging effect attributable to a diet, drug, or genetic manipulation. In practice, it is often preferable to substitute a test based on the number of animals still alive at some arbitrary percentile, for example the age at which 90% mortality is observed. Tests that depend on the mean age at death of the longest-lived 10% of each population, though they frequently appear in the published literature, have been shown to be invalid, in the sense that they have a substantially higher Type I error rate than the nominal value chosen by the investigator (Wang et al. 2004). Wang et al. (2004) have presented, validated, and documented the power of a number of alternate “quantile” tests, based upon comparing the proportion of live mice in each of two (or more) test groups at the age at which 90% of the mice in the combined group have died; the ITP has adopted this statistical approach. Since the outcome of the test will not depend at all on the age at death of mice surviving past this point, it is not necessary to wait until the death of the longest-lived mouse to evaluate the hypothesis of interest.
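A comparison of proportions alive at the pooled 90% mortality age reduces to a 2 x 2 table (group by alive/dead). One simple way to analyze such a table is Fisher's exact test. The sketch below implements a two-sided version with the standard library only. The choice of Fisher's exact test is an assumption made for illustration and is not necessarily the specific variant from Wang et al. (2004) used by the ITP, and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. treated, control); columns are alive/dead at the
    cutoff age. The p-value sums the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))  # tolerance for float ties
    return min(1.0, total)


# Hypothetical counts at the pooled 90% mortality age:
# treated: 2 alive, 8 dead; control: 0 alive, 10 dead.
p_value = fisher_exact_two_sided(2, 8, 0, 10)
print(f"two-sided p = {p_value:.3f}")
```

With groups this small the test has little power, which is one reason group sizes need to be set by a power analysis before a study begins.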

The cohort I test agents are aspirin, NFP, NDGA, and 4-OH-PBN. Because about 30% of cohort I test mice remained alive at the time of the interim analysis, it was not possible to evaluate our principal hypothesis that one or more of the test agents extends measures of maximum longevity (Miller et al. 2007). The data at hand did, however, permit us to test another hypothesis, one of lesser but still substantial interest, i.e., whether any of the test compounds diminishes mortality risks at earlier points in the survival distribution. To avoid the inflation of Type I error rates that accompanies repeated, sequential testing of related hypotheses on the same test population, we decided in advance to conduct an analysis on the date at which at least 50% of the male control mice had died at each site, i.e., at the median mortality point at the site at which males were longest-lived. A report of this interim analysis has been published, so here we present a synopsis of the report and some of its implications for study design and interpretation (Miller et al. 2007).
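The pre-specified timing rule can be made concrete with a short sketch. It assumes, for simplicity, that timing is expressed as age in days rather than calendar date and that cohorts are synchronized across sites; the function names, site labels, and data layout are hypothetical.

```python
def age_at_fraction_dead(ages_at_death, n_enrolled, percent=50):
    """Age at which `percent` % of the enrolled mice had died, or None if not yet reached."""
    needed = (percent * n_enrolled + 99) // 100  # integer ceiling
    deaths = sorted(ages_at_death)
    if len(deaths) < needed:
        return None
    return deaths[needed - 1]


def interim_analysis_age(sites, percent=50):
    """Earliest age at which every site has reached `percent` % control mortality.

    `sites` maps a site label to (ages at death so far, number of controls enrolled).
    The rule waits for the slowest site, i.e. the site where controls live longest.
    """
    per_site = {
        name: age_at_fraction_dead(deaths, n, percent)
        for name, (deaths, n) in sites.items()
    }
    if any(age is None for age in per_site.values()):
        return None, per_site  # at least one site has not reached the threshold
    return max(per_site.values()), per_site
```

Fixing this rule before looking at the data means the interim comparison is performed once, at a single pre-determined point, rather than repeatedly as deaths accumulate.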

The first unexpected finding was the extent of variation in the survival data amongst the sites. At two of the test sites, University of Texas (UT) and the Jackson Laboratory (TJL), mortality risks for male mice were significantly higher than for female mice (P < 0.0001), but there was no significant difference in survival patterns between males and females at the third test site, University of Michigan (UM). The difference between males and females in calculated median survival was 77 days at TJL and 137 days at UT, but only 36 days at UM. Conversely, there was a highly significant difference in survival patterns of males across the three sites (P < 0.0001), but no difference among the sites in female survival. Median survival for males at UM was 116 days longer than the median survival at the other two sites combined.

We do not have a ready explanation for this disparity among test sites in median survival, or for its restriction to male mice. The ITP protocols were designed to try to minimize inter-site differences, including the use of the same stock of test mice, the same supplier of bedding, the same source of drug-containing rodent chow, and agreed-upon protocols for many other aspects of the test procedures. Although each site used the same source of food starting at the age at which mice were placed on the experimental diets, there were differences among the sites in the sources and formulations of food used for the breeding cages and for weanling mice prior to imposition of the experimental diets, typically at age 4 months. This is a correctable source of variation that will be removed in future test cohorts. There are other differences among sites which might alter survival patterns, relating perhaps to differences in noise levels, proximity to other mice, traffic levels, details of air-flow or water quality, etc., but at present we do not know which of these or perhaps other unspecified factors must be addressed to achieve uniformity of survival curves across test sites.

The interim analysis shows that NDGA leads to a significant decline in mortality risk in the first half of the lifespan. The primary statistical method was the log-rank test with stratification by site, so that each of the five treatment groups (four experimental diets plus the control) was compared to the pooled survival experience at each test site. The result showed a highly significant effect of treatment (P < 0.005), indicating that at least some of the groups differed in survival rate from one another. A follow-up analysis, using data for each test site separately, showed significant effects at TJL (P < 0.01) and at UT (P < 0.04), but not at UM, the site with the fewest deaths at the time of analysis. Each of the four experimental diets was then compared, again with stratification by test site, to the control group. This calculation showed a significant difference for NDGA versus controls, at P = 0.0004 for the pooled data, and with P = 0.02 and P = 0.005 for TJL and UT males, respectively. The results for the mice at UM showed a similar trend, with better survival in NDGA-treated males than in controls, but the effect was not statistically significant. We believe this may be the first documented example in which a test agent has diminished mortality risks, at least through the median life span, at multiple test sites for mice on a standard diet. The use of genetically heterogeneous mice for the ITP should increase the chances that the findings will prove replicable on a range of genetic backgrounds.
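As an illustration of what stratification by site means in a log-rank comparison, the following sketch computes a two-group (treated versus control) stratified log-rank statistic. It is a simplified textbook form and not the ITP analysis code: the record layout and function names are hypothetical, and the five-group omnibus test described above would need a multi-group extension.

```python
from math import erfc, sqrt


def logrank_components(records):
    """Return (observed - expected deaths in treated group, variance) for one stratum.

    `records` is a list of (age, died, treated) tuples; died=False marks a
    mouse still alive (censored) at the time of analysis.
    """
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({age for age, died, _ in records if died}):
        n = sum(1 for age, _, _ in records if age >= t)              # at risk
        n1 = sum(1 for age, _, trt in records if age >= t and trt)   # treated at risk
        d = sum(1 for age, died, _ in records if age == t and died)  # deaths at t
        d1 = sum(1 for age, died, trt in records if age == t and died and trt)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, variance


def stratified_logrank(strata):
    """Chi-square (1 df) and p-value, summing components across strata (sites)."""
    total_oe = 0.0
    total_var = 0.0
    for records in strata.values():
        oe, var = logrank_components(records)
        total_oe += oe
        total_var += var
    if total_var == 0:
        raise ValueError("no informative death times")
    chi2 = total_oe ** 2 / total_var
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value
```

Treated and control mice are only ever compared with mice at the same site, so a site-wide shift in survival (such as the longer-lived UM males) does not by itself create or mask a treatment effect.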

While NDGA did not have a significant effect on survival of female mice, at the time of the interim analysis fewer than 50% of the females had died at each site, thus diminishing statistical power. It will be of interest to evaluate effects of all test agents on females when the final analysis is performed. None of the other test compounds led to a significant difference, for data pooled across sites, in comparison to survival of the control animals for male mice. There was a trend towards improved survival among aspirin-treated males (P = 0.07 by the log-rank test). A re-analysis using the Wilcoxon-Breslow statistic, which gives greater weight to early deaths, produced P = 0.03 for the pooled data set, stratified by site. Analysis of the completed data set, when it becomes available, will produce more definitive information on the effects of aspirin and the other tested agents.
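The difference between the log-rank and Wilcoxon-Breslow statistics lies in how each death time is weighted. The sketch below assumes the common Gehan-Breslow form, in which each death time is weighted by the number of mice at risk, so that early deaths (when many mice are still at risk) count more. It is unstratified for brevity, and the function name and record layout are hypothetical.

```python
from math import erfc, sqrt


def weighted_logrank(records, weight="logrank"):
    """Two-group weighted log-rank test.

    `records` is a list of (age, died, treated) tuples. With weight="logrank"
    every death time has weight 1; with weight="gehan" the weight is the
    number at risk, which emphasizes early deaths.
    """
    u = 0.0
    v = 0.0
    for t in sorted({age for age, died, _ in records if died}):
        n = sum(1 for age, _, _ in records if age >= t)
        n1 = sum(1 for age, _, trt in records if age >= t and trt)
        d = sum(1 for age, died, _ in records if age == t and died)
        d1 = sum(1 for age, died, trt in records if age == t and died and trt)
        w = float(n) if weight == "gehan" else 1.0
        u += w * (d1 - d * n1 / n)
        if n > 1:
            v += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if v == 0:
        raise ValueError("no informative death times")
    chi2 = u * u / v
    return chi2, erfc(sqrt(chi2 / 2))  # chi-square with 1 df
```

Running both weightings on the same data shows whether an apparent effect is concentrated early or late in the survival curve, which is the kind of contrast reported for aspirin above.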

Another surprising observation was the site-to-site variation in weight gain trajectories. Mice were weighed at each site at the ages of 6, 12, 18, and 24 months, and at some sites also at earlier ages, in part to test the idea that effects of the tested compounds might be attributable to alterations in caloric intake or metabolic efficiency. NDGA-treated males at UT and TJL showed no difference in average weight compared to control males. NDGA-treated males at UM were significantly, but only slightly, lower in weight than controls (1.2 grams, about 3% of mean weight) at 12 months of age; this difference could not be attributed to NDGA because these two groups of mice differed in mean weight by about 13% at 6 months of age, i.e., prior to NDGA administration. Thus it seems very unlikely that the effect of NDGA on male survival was due to an effect on appetite or caloric intake. Unexpectedly, there was a substantial, and statistically significant, difference in the weight of both male and female mice among the test sites, with mice at UM lighter in weight than mice at the other two sites at all time points. It is plausible that uncontrolled differences in the source and formulation of food provided to breeder mice and weanlings may have caused this long-term alteration in weight trajectories, and production of mice for future ITP tests will need to employ a standardized approach to weanling production.

In sum, the NIA ITP demonstrates the power of multi-site studies on interventions for aging, as well as the many concerns that must be addressed in the development of such a program. It is hoped that published results from the ITP will stimulate further studies by other investigators, on the mechanisms of action of effective interventions and on their efficacy in other animal models and human populations. Phase II testing is being designed to broaden the characterization of the effects of Phase II interventions on age-associated conditions and to facilitate collaborative studies with laboratories outside the ITP.