Introduction

In the first chapter of a book on updates in gerontology [1], a comprehensive review of centenarians was presented, together with evidence of the worldwide increase in longevity and lifespan during the last century. The number of centenarians is still limited, but it represents the tip of an iceberg mainly made up of “the elderly”. The determinants of longevity are partly known and include genetics, lifestyle, environment, and metabolic and endocrine functions. However, extensive documentation of personal characteristics measured in a cohort early enough to serve as potential predictors of longevity several decades ahead is rare, if it exists at all.

In the present analysis, we report data on all-cause mortality and age at death (AD: survival) in a cohort of middle-aged men who were examined in the middle of the last century, with measurement of many personal characteristics, and then followed up for 61 years until the practical extinction of the cohort.

Materials and methods

Study population and measurements

The study was conducted in the Italian Rural Areas (IRA) of the Seven Countries Study (SCS), which comprised 1712 men aged 40–59 at enrollment in 1960, representing 98.7% of the selected population samples. More details can be found elsewhere [2].

A large number of measurements were taken at the entry examination, covering personal and social characteristics, lifestyle behaviors, risk factors, and major prevalent diseases, for a total of over 50 variables. Among them, 35 variables were selected after exclusion of those creating collinearity problems in multivariate analysis (for example, systolic versus diastolic blood pressure), those that were included in composite variables (for example, height and weight as components of body mass index), and a few others of minimal interest and/or uncertain interpretation. They are listed in Table 1 together with procedures and criteria, mean values, and references for measurement techniques [3,4,5,6,7,8,9,10,11]. About 3% of the values were missing and were imputed by a multivariate normal procedure.

Table 1 Risk factors measured at entry, definitions, unit of measurement, and mean levels and selection

Some extra information is needed about the variables dealing with diet and physical activity, since the complex procedures are described only in the bibliographic references reported in Table 1. A dietary history was collected and 18 food groups were identified; these were used to produce an a-posteriori dietary score through Principal Component Analysis, which yielded an individual factor score. This score had an inverse gradient with certain events (initially coronary heart disease) and was then divided into three classes (of equal size, to be conservative) that showed differences in the intake of various food groups. The upper class (33.3%) had a higher intake of cereals, bread, potatoes, vegetables, and fish and a lower intake of meat, milk, butter, sugar, pastry, and alcohol, compared with the lower class (33.3%). The intermediate class (33.4%) had intermediate intakes of these food groups [5, 6]. The labels given to the three classes (Mediterranean, Prudent, and Non-Mediterranean) were an arbitrary choice, and any other terms could have been equivalent. The score was not based on an a-priori procedure and, therefore, was not influenced by the opinions of the investigators. The multivariate coefficients of both the Cox and the multiple linear regression (MLR) models confirmed that the Mediterranean diet was beneficial versus the reference (Non-Mediterranean) and that the Prudent diet had a similar effect, but to a lesser extent.
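The derivation of an a-posteriori score and its division into three equal classes can be illustrated with a minimal sketch. It is not the procedure of references [5, 6]: the intakes are hypothetical, only three food groups are used instead of 18, the first principal component is obtained by simple power iteration, and the sign of the component is fixed by an assumption of this sketch (the first food group loads positively), whereas in the study the orientation came from the gradient with events. All function names are illustrative.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit (sample) standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def first_component(z, orient_col=0, iterations=200):
    """Leading eigenvector of the correlation matrix via power iteration.

    The sign of a principal component is arbitrary; here it is fixed so that
    column `orient_col` loads positively (an assumption of this sketch).
    """
    n, p = len(z), len(z[0])
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]
    # Start from the unit vector of the orienting column.
    v = [1.0 if j == orient_col else 0.0 for j in range(p)]
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if v[orient_col] < 0:
        v = [-x for x in v]
    return v

def dietary_classes(rows,
                    labels=("Non-Mediterranean", "Prudent", "Mediterranean")):
    """Score each man on the first component and split the ranks into thirds."""
    z = standardize(rows)
    v = first_component(z)
    scores = [sum(zi * vi for zi, vi in zip(r, v)) for r in z]
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    classes = [None] * len(scores)
    for rank, i in enumerate(order):
        classes[i] = labels[rank * 3 // len(scores)]
    return scores, classes

# Hypothetical intakes (g/day) of cereals, vegetables, and meat for six men.
intakes = [
    [400, 300, 50], [380, 280, 60],
    [300, 200, 100], [310, 210, 110],
    [200, 100, 200], [210, 120, 190],
]
scores, classes = dietary_classes(intakes)
```

With these invented intakes, the two men with the highest cereal and vegetable intake and the lowest meat intake fall in the upper class, mirroring the pattern described in the text.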

Physical activity was classified on the basis of occupational engagement combined with a few questions, giving three classes: sedentary, moderate, and vigorous. This rough classification was validated by ergonometric tests and by the energy expenditure derived from the dietary history [4, 5].

The study population was followed up for life status and mortality for 61 years. Data collection started before the era of the Helsinki Declaration, with consent implied by participation in the baseline examination, while in subsequent surveys verbal or written consent was obtained for the collection of follow-up data.

Statistical analysis

A Kaplan–Meier survival curve was computed to describe the overall survival trend. A Cox proportional hazards model was fitted with all-cause death as the end-point and the 35 personal characteristics as covariates (32 variables plus 3 reference variables for specific groups of them). A multiple linear regression (MLR) model was also fitted using AD (the age at the time of death or when the subject was last seen alive) as the dependent variable (survival) and the same personal characteristics used in the Cox model as covariates. Men still alive or lost to follow-up (4 men in total) were treated as censored in the Cox model.
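As an illustration of the first step, the product-limit (Kaplan–Meier) estimate can be sketched in a few lines. This is a generic textbook formulation, not the software used in the study; the function name and the toy data are assumptions introduced only for the example.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time of each subject (e.g., years since entry)
    events : 1 if the subject died at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct death time.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all subjects who leave observation at the same time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Toy data: five subjects, two of them censored (event = 0).
curve = kaplan_meier([5, 10, 10, 20, 30], [1, 1, 0, 1, 0])
```

Censored subjects contribute to the number at risk until they leave observation but never lower the curve themselves, which is how the 4 censored men enter the estimate.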

For both models, standardized coefficients, which allow the predictive power of the risk factors to be compared, were computed and partially reported alongside the original ones. Each was computed as the coefficient multiplied by the standard deviation of its variable and divided by the standard deviation of the events. The risk factors that were significant in both the Cox and the MLR analyses were then used to solve, separately for the two approaches, reduced models, and the coefficients produced by the full models were compared with those of the reduced models.
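The standardization described above amounts to a single rescaling. The sketch below assumes that "the standard deviation of the events" is the sample standard deviation of the outcome variable; the function name and the data are illustrative only.

```python
import statistics

def standardized_coefficient(coef, x_values, y_values):
    """Rescale a raw coefficient by SD(x) / SD(y).

    coef     : raw regression coefficient of the risk factor
    x_values : observed values of the risk factor
    y_values : observed values of the outcome (assumed here to be what the
               text calls "the events")
    """
    return coef * statistics.stdev(x_values) / statistics.stdev(y_values)

# Toy example: y is exactly 2 * x, so the standardized coefficient is 1.
beta_std = standardized_coefficient(2.0, [1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Because the result is unit-free, coefficients of variables measured on different scales (for example, mmHg and mg/dl) can be ranked against each other.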

A side analysis consisted of solving another MLR model with the addition of the square of Body Mass Index to test the hypothesis of a curvilinear relationship with AD.

Using the intercept and the coefficients of the reduced model of MLR, estimates of AD were computed for a small and arbitrary number of combinations of different levels of the various risk factors.

Finally, baseline mean levels of entry risk factors were tabulated for men divided into three classes of AD, i.e., < 70 years, 70 to 80 years, and > 80 years.

Results

The personal characteristics listed in Table 1 reflect the levels found in middle-aged men living in rural environments in Italy in the middle of the last century.

During 61 years of follow-up, in the original cohort of 1712 men, there were 1708 deaths (99.8%), 3 survivors aged between 102 and 106 years, and 1 lost to follow-up after 50 years when he was aged 91 years.

The Kaplan–Meier curve is reported in Fig. 1. The decline in survival showed a moderate increase in mortality during the first 10 years, which then tended to accelerate until about year 40, when the acceleration faded out.

Fig. 1

Kaplan–Meier estimate of survival of the study population during 61 years of follow-up

The Cox proportional hazards model with all-cause mortality as the end-point and 32 risk factors (plus 3 reference factors) is reported in detail in Table 2. Out of the 32 variables, 18 were statistically significant (p ≤ 0.05), as shown by the hazard ratios and their 95% confidence limits. Based on the standardized coefficients, the five factors most directly related to events were: age, systolic blood pressure, cancer, serum cholesterol, and laterality/linearity index. The five most inversely related were: vital capacity, Mediterranean diet, vigorous physical activity, never smoker, and Prudent diet.

Table 2 Cox proportional hazard model with all-cause mortality as dependent variable and 32 risk factors (plus 3 references) as covariates. Units of measurement as in Table 1

The MLR model (Table 3) had AD as the end-point and was based on the same risk factors, but the algebraic signs of the coefficients were expected to be the inverse of those of the Cox model. The overall R of the model was 0.42 (R² = 0.18), and 19 risk factors were significant (p ≤ 0.05). For this model, we present the original coefficients instead of hazard ratios, because these coefficients directly represent the number of years gained or lost (depending on the algebraic sign) per unit of measurement of the corresponding risk factor.

Table 3 Multiple linear regression model with age at death as dependent variable and 32 risk factors (plus 3 references) as covariates. Units of measurement as in Table 1

Again, using the standardized coefficients, the five most inversely related factors were: systolic blood pressure, cancer, serum cholesterol, body mass index, and early death of the mother. The five most directly related factors were: subscapular skinfold, vital capacity, arm circumference, Mediterranean diet, and never smoker. Among them, subscapular skinfold, usually considered an indicator of obesity, was unexpectedly directly related to AD. Among the top five risk factors favoring longevity and inversely related to mortality, those present in both models were: Mediterranean diet, never smoker, and vital capacity. Among the top five risk factors directly related to mortality and inversely related to longevity, those present in both models were: cancer, systolic blood pressure, and serum cholesterol.

The Cox model can also be solved using AD as the time variable. In a test, we found that the coefficients of the covariates were not significantly different from those of the standard Cox model, except for that of age in its role as a covariate. The number of significant coefficients was the same (i.e., 19).

In a side analysis, we re-computed the multiple linear regression model adding the square of Body Mass Index to test the hypothesis of a possible curvilinear relationship with AD. Both the linear and the quadratic terms were statistically significant, and the outcome made it possible to produce the curve depicted in Fig. 2. A Body Mass Index of around 26 units was associated with the highest survival, which declined at both higher and lower levels.

Fig. 2

Curvilinear association of body mass index with age at death, adjusted for 33 other risk factors in a 61-year follow-up derived from a dedicated MLR model
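The turning point of such a curve follows directly from the two Body Mass Index coefficients: if AD contains the terms b1 × BMI + b2 × BMI², with b2 negative, the maximum lies at BMI = −b1 / (2 × b2). The sketch below uses hypothetical coefficients, chosen only so that the maximum falls near the reported level of about 26 units; they are not the coefficients estimated in the study.

```python
def bmi_at_maximum_ad(b_linear, b_quadratic):
    """Vertex of the parabola b_linear * BMI + b_quadratic * BMI**2.

    A maximum exists only when the quadratic coefficient is negative.
    """
    if b_quadratic >= 0:
        raise ValueError("no maximum: quadratic coefficient must be negative")
    return -b_linear / (2.0 * b_quadratic)

# Hypothetical coefficients (years per unit and per squared unit of BMI).
best_bmi = bmi_at_maximum_ad(1.04, -0.02)
```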

Comparing the Cox model with the MLR model, we found that the same 18 risk factors were significant in both; they are marked in Tables 2 and 3. The coefficients of the reduced models based on those 18 risk factors were compared, for each approach, with those of the corresponding full model, and no significant differences were detected. Overall, the reduced Cox model performed worse than the full model, since the difference between their log-likelihood statistics, after taking into account the different number of covariates, was statistically significant (p of difference = 0.0027). The R of the reduced MLR model, instead, was not significantly different from that produced by the full model (p = 0.6301).
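A comparison of nested models of this kind is commonly carried out as a likelihood-ratio test, and the sketch below shows that generic test under stated assumptions. The log-likelihood values are hypothetical, since the text does not report them, and the degrees of freedom are taken as the difference in the number of covariates (presumably 32 − 18 = 14 here). The chi-square tail probability uses a closed form that is valid only for even degrees of freedom, which keeps the example free of external libraries.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, even df only.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return (statistic, p-value) for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf_even_df(statistic, df)

# Hypothetical log-likelihoods of a full and a reduced model, df = 14.
stat, p_value = likelihood_ratio_test(-11000.0, -11017.0, 14)
```

A small p-value indicates that dropping the extra covariates worsens the fit more than chance alone would explain.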

The MLR model had other advantages over the Cox model, since it dealt with a continuous end-point ranging from 41 to 106 years and offered a simple interpretation of the multivariate coefficients of the risk factors, which represent the number of years gained (if positive) or lost (if negative). This reading is immediate for dichotomous variables, while the coefficients of continuous variables refer to one unit of measurement and must simply be multiplied by the desired difference. For example, the coefficient of systolic blood pressure has a value of −0.01046 and refers to 1 mmHg, but multiplied by a difference of 20 mmHg it becomes −2.92, which is the number of years lost for an increase of 20 mmHg. In this model, the coefficient of age is positive, which means that being older at entry increases the chances of reaching an even older age.

The two multivariate models included all deaths that occurred in 61 years, but it is probable that some causes of death had nothing to do with the risk factors considered in the analysis. Therefore, we computed two new models (Cox and MLR) after excluding deaths due to infectious diseases and violence of any type. The outcome was very similar, as could be expected, since the excluded events accounted for only 5% of all deaths.

The contribution of the various risk factors to AD can be evaluated by considering some blocks of them. Combining the three best behaviors (never smoker, vigorous physical activity, and Mediterranean diet) gives a benefit of 7.9 years. Combining the four major diseases gives an adverse effect of 28.7 years, although in the original data no man had more than two major diseases. Combining an increase of 1 standard deviation in systolic blood pressure and serum cholesterol with a decrease of 1 standard deviation in vital capacity gives an adverse effect of 4.6 years compared with the original mean values (not to mention the possible effect on incident diseases).

Estimates of AD as a function of different levels of risk factors produce, theoretically, an enormous matrix with infinite combinations, owing to the presence of some continuous variables. An arbitrary series of combinations of risk factors and their effect on AD, using selected variables and options, is presented in Table 4, based on the reduced MLR model including 18 risk factors. Note that these, like those quoted in the previous paragraph, are only theoretical estimates of AD based on the multiple linear regression, and not a description of facts. They correspond to the estimates of cardiovascular risk produced by the many software tools developed for risk prediction, the only difference being that here we estimate AD. Therefore, even an apparently impossible combination may make sense and can be met in practical applications.

Table 4 Theoretical estimates of age at death based on multiple regression model with 18 risk factors using arbitrary combinations of risk factors levels

The rationale behind the choice of the nine combinations and their interpretation are given below:

  • Age was kept fixed at 50 years except when explicitly mentioned, while a number of arbitrary variations were hypothesized for the other factors;

  • Combination 1. This is only theoretical, since the dichotomous variables were expressed as proportions, and not as 0 or 1; as a consequence, the estimate is meaningful for a population group but not necessarily for single individuals;

  • Combination 2. This represents the effect of having the same levels of risk factors as in Combination 1 but at age 60, which offers an advantage, since the coefficient for age is positive and favors reaching older ages, with an expected increase of 2.4 years;

  • Combination 3. This represents the effect of having the same levels of risk factors as in Combination 1 but at age 70, which offers an advantage, since the coefficient for age is positive and favors reaching older ages, with an expected increase of 5.8 years;

  • Combination 4. This still refers to the overall population situation (as in Combination 1) but is probably more realistic, since the dichotomous variables were expressed using their mode;

  • Combination 5. This is the effect of healthy lifestyle behaviors; it is actually an underestimate, since such behaviors should be accompanied by beneficial levels of some other risk factors;

  • Combination 6. This is the effect of healthy lifestyle behaviors accompanied by good levels of major risk factors;

  • Combination 7. This is the adverse effect of the presence of two major and rather common diseases together with mean levels of the other risk factors;

  • Combination 8. This is the effect of bad lifestyle, and bad levels of 3 major risk factors;

  • Combination 9. This combination produces one of the worst possible effects, due to the early death of the mother, 2 major diseases, bad lifestyle habits, presence of clinical signs of dyslipidemia, and adverse levels of some major risk factors, ultimately associated with a loss of more than 20 years in AD.
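The way each combination is turned into an estimate of AD can be sketched as the intercept plus the sum of coefficient × level over the risk factors. The intercept, the coefficients, and the three risk factors below are hypothetical and far fewer than the 18 of the reduced model; they only show how a proportion (population estimate) or a 0/1 value (individual profile) enters the same formula.

```python
def estimate_age_at_death(intercept, coefficients, levels):
    """Linear prediction: intercept + sum(coefficient * level).

    coefficients : dict mapping risk-factor name to its MLR coefficient
    levels       : dict mapping the same names to the chosen level
                   (0/1 or a proportion for dichotomous factors)
    """
    missing = set(coefficients) - set(levels)
    if missing:
        raise KeyError(f"levels missing for: {sorted(missing)}")
    return intercept + sum(coefficients[k] * levels[k] for k in coefficients)

# Hypothetical intercept and coefficients, NOT those of Table 4.
INTERCEPT = 75.0
COEFS = {"age": 0.25, "never_smoker": 2.0, "systolic_bp": -0.125}

# Population-style profile: dichotomous factor entered as a proportion.
population = estimate_age_at_death(
    INTERCEPT, COEFS, {"age": 50, "never_smoker": 0.25, "systolic_bp": 140})

# Individual profiles: dichotomous factor entered as 0 or 1.
smoker = estimate_age_at_death(
    INTERCEPT, COEFS, {"age": 50, "never_smoker": 0, "systolic_bp": 140})
never_smoker = estimate_age_at_death(
    INTERCEPT, COEFS, {"age": 50, "never_smoker": 1, "systolic_bp": 140})
```

The difference between two profiles that differ in a single factor equals that factor's coefficient times the difference in level, which is why a positive age coefficient raises the estimate for men who entered the study at an older age.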

In general, when the estimates include mean levels of discrete variables, the value of the estimated AD is more theoretical than practical and does not apply to single individuals. An easier, simpler, rough but more realistic way to look at the same problem is reported in Table 5 where baseline mean levels of entry risk factors were tabulated for men distributed into three classes of age at death, i.e. < 70 years, 70–80 years, and > 80 years. It is clear that differences, although small, in mean levels of most risk factors are associated in a graded way with AD.

Table 5 Average levels of risk factors in three classes of age at death

ANOVA for continuous variables and the chi-squared test for discrete variables gave significant p values (≤ 0.05) for 21 risk factors, largely coinciding with those identified as significant in the multivariate models. However, this more understandable presentation lacks the adjustment across variables that only multivariate models can provide, and this explains some of the differences versus the Cox and MLR models.
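The tabulation behind Table 5 can be sketched as a simple grouping of baseline values by class of AD. The class boundaries below assume that ages of exactly 70 and 80 years fall in the middle class, which the text does not specify; the data are invented for the example.

```python
from collections import defaultdict

def ad_class(age_at_death):
    """Assign one of the three classes of age at death."""
    if age_at_death < 70:
        return "<70"
    if age_at_death <= 80:  # assumption: 70 and 80 belong to the middle class
        return "70-80"
    return ">80"

def mean_by_ad_class(ages_at_death, baseline_values):
    """Mean baseline level of one risk factor within each AD class."""
    groups = defaultdict(list)
    for ad, value in zip(ages_at_death, baseline_values):
        groups[ad_class(ad)].append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}

# Invented ages at death and baseline systolic blood pressure (mmHg).
means = mean_by_ad_class([65, 68, 72, 80, 85, 90],
                         [160, 150, 145, 135, 130, 120])
```

In this toy example the mean level falls from class to class, the kind of graded, unadjusted association that the table displays.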

Discussion

This analysis represents, at least partially, a replication of those published in 2013 and 2018 [3, 12] on the same material with 50 instead of 61 years of follow-up. The use of the Cox model was based on its traditional role in estimating the probabilities of events as a function of possible determinants, although in this case the survivors were too few and the outcome documented only the strength of the risk factor coefficients. The use of AD, as an expression of longevity, as the dependent variable in an MLR model was more rational and promising. Incidentally, AD is presently considered a valuable metric for the study of longevity when cohorts are close to extinction [13,14,15,16]. The two models performed in a relatively similar way in identifying the predictive power of risk factors, but for practical purposes the MLR was more valuable.

We used a smaller number of variables than in previous analyses dealing with the 50-year follow-up, and the procedure was slightly different, but the basic findings were very similar. Perhaps this outcome could be expected, since the number of deaths in the last 11 years increased by only 39 units (from 1669 to 1708) and the cumulative death rate rose from 97.5% to 99.8%. Many risk factors acted as determinants of AD and longevity, although their individual strength, expressed by the coefficients of the multiple linear regression, was not exceedingly high, except in the case of cancer. It was confirmed that some behavioral risk factors, some traditional, mainly cardiovascular, risk factors, and the prevalence of major diseases provide an important contribution to the prediction of AD, that is, of life expectancy starting at middle age for men living around the middle of the last century. They were: being a never smoker, engaging in vigorous physical activity, following a Mediterranean diet, having low blood pressure and serum cholesterol, having high arm circumference and vital capacity, and being free from clinical signs of dyslipidemia and from a few major diseases.

The theoretical outcome was clearly more favorable when several beneficial determinants were combined together, as shown in Table 4, with gains in AD up to 10 years or more.

A limitation of this study is that only men were studied and that the several examinations repeated during the follow-up included only a small subgroup of the measurements taken at baseline. During the follow-up, one of the most relevant changes was the decline in the prevalence of smokers, which after 30 years (when the survivors were 39% of the original cohort) had dropped to 16% from the initial 61%. Despite that, the adverse role of smoking habits remained present for the entire follow-up period. The corresponding changes in systolic blood pressure were, on average, from 143 to 160 mmHg, and those in serum cholesterol from 201 to 208 mg/dl. In addition, this study did not consider a number of measurements that became available, or were standardized, only many years after the entry examination.

Finally, this cohort probably benefited little from modern drugs, sophisticated surgical procedures, and preventive measures. This is only an opinion that cannot be demonstrated, but if we take the year 1990 as an arbitrary date for the spread of “modern medicine”, at that time the survivors were less than 40% of the cohort and were aged 70–90 years; therefore, we doubt that much attention in terms of prevention, or even advanced therapeutic approaches, was provided to them during their residual years of life.

The literature offers few contributions of any type that mention a very long follow-up, say 50 years or so [17,18,19,20,21,22,23,24,25]. Among them, however, only exceptionally does the structure of the analysis include a baseline measurement of possible determinants that are then tested for their role in predicting long-term events. The best example was the British Doctors Smoking study [24].

Other, more specific studies related to all-cause mortality showed a variety of characteristics [26,27,28,29,30,31,32,33,34,35,36,37,38]. One study was limited to evaluating the influence on longevity of the quality of medical care plus some behavioral risk factors [33]. Many others dealt almost exclusively with classical lifestyle behaviors such as smoking, dietary habits, and physical activity, where non-smoking, being physically active, and following a healthy diet were accompanied, mainly when combined together, by reduced all-cause mortality in follow-up periods that usually did not exceed 25 years [27,28,29,30,31, 33, 34, 36, 37].

In the reported literature, a few papers are more similar to our analysis, involving more risk factors of different types [26, 35, 38]. In an old Chicago Study in Industry [26], combining several population groups of adults of both genders with a follow-up of 12 to 30 years, low levels of traditional cardiovascular risk factors, including smoking habits, blood pressure, serum cholesterol, and a low intake of dietary cholesterol, were associated with increased longevity. In a study run in Sweden on business people of both genders, followed up for 16 years, a healthy intake of alcohol, fish, fruit, vegetables, and processed meat (low), engaging in vigorous physical activity, and refraining from smoking, all adjusted for educational level and body mass index, were associated with clear benefits in terms of all-cause mortality [35]. In a large US study [38], the risk factors associated with lower probabilities of reaching the age of 90 years were genetic liability to type 2 diabetes, genetically predicted systolic and diastolic blood pressure, body mass index, LDL cholesterol, and smoking habits; on the other hand, genetically increased HDL cholesterol and high educational levels were associated with higher probabilities of reaching the age of 90 years.

During the last decades, a great deal of knowledge has accumulated on the genetics of longevity [39], but clearly our study could not exploit this novel approach, since in 1960 valid and standardized tests were not available for field operations.

Nevertheless, our study offered a unique opportunity to evaluate the role of many risk factors and personal characteristics in their association with longevity. Among the significant ones there are, beyond age, some aspects of the family health profile (likely closely related to genetics), lifestyle behaviors, anthropometric measurements, circulatory and respiratory measurements, traditional risk factors, clinical signs of dyslipidemia, and major chronic diseases. All this was projected over an extremely long and unusual follow-up of 61 years, by which time the study population was practically extinct.