1 Introduction

One reason that mortality risk plays such a key role in individual decision making is that individuals may take life expectancy into account when making economic choices such as when to retire, how much to save for old age, whether to purchase a life insurance policy, and how much money to set aside for bequests. It is as well a consideration in late life decisions like whether or when to move to a nursing home. As a result, theoretical economic models of life cycle behavior have shown the importance of allowing for lifetime uncertainty for economic outcomes (e.g., Hurd 1989) and since the 1990s, several household surveys have begun including probabilistic questions (Manski 2004) that can be used to measure individual lifetime uncertainty. These so-called subjective survival probabilities (SSPs) elicited from survey respondents are promising because unlike standardized tools such as life tables, they provide individual variation in survival chances and use an interpersonally comparable numerical scale of probabilities (Hurd 2009). Nevertheless, the usefulness of SSPs hinges largely on the assumption that respondents are able to answer these questions in a meaningful way.

Our paper thus focuses on the steps that must be taken before developing and estimating economic models that incorporate SSPs. In particular, by comparing average objective mortality in a group with average SSPs (cf. Hurd 2002; Khwaja et al. 2007; Delavande and Rohwedder 2011), we investigate the extent to which individual beliefs about survival relate to actual mortality (see Footnote 1). There is, however, a general limitation of this validation exercise which is worth mentioning at the beginning. Although SSPs are broadly in line with objective values, some notable discrepancies suggest either that individuals’ underlying beliefs are biased or that SSPs do not accurately measure beliefs. If the latter is correct, then either SSPs are inherently useless (at least for some groups) or the presently used question formats are not working efficiently. Hence, before any understanding can be reached of why differences exist between objective mortality and SSPs, these discrepancies must be documented. This paper aims to provide such documentation for the Netherlands.

Earlier research shows that life cycle models that measure mortality based on life tables rather than SSPs are unable to explain several common data anomalies. These include retirees having large amounts of assets even at advanced ages (Poterba et al. 2011; Van Ooijen et al. 2014), inadequate savings before retirement (Hurd and McGarry 2002), a falling retirement age as life expectancy increases (O’Donnell et al. 2008), and the annuitization puzzle (Teppa and Lafourcade 2013). SSPs may shed light on these issues because they contain information about individuals’ survival beliefs, which can differ from actuarial survival probabilities (see Footnote 2). For instance, Teppa and Lafourcade (2013), after finding that annuity demand is higher for those who expect to live longer, argued that individual underestimation of life expectancy may be a reason why only a small fraction of Dutch individuals buy life annuities. Another related question is whether SSPs predict individuals’ economic decisions, especially given US evidence that they explain savings and consumption behaviors better than do life table probabilities (Gan et al. 2004; Salm 2010).

The novelty of our paper is the use of Dutch data and the inclusion of 25- to 49-year-olds in the analysis. To our knowledge, this is the first study which investigates the relation between SSP and actual mortality for a sample that includes individuals under 50. Much previous research (e.g., Smith et al. 2001; Hurd and McGarry 2002; Siegel et al. 2003; Perozek 2008; Elder 2013) has relied on US Health and Retirement Study (HRS) data for those aged 50 and over. The goals of the paper are twofold. First, we test an observation made by several other researchers (Smith et al. 2001; Hurd and McGarry 2002; Siegel et al. 2003) by exploring whether these Dutch respondents’ beliefs about their survival chances (SSPs) contain predictive power for their own deaths even when a large set of health indicators are controlled for. Any predictive power for realized mortality in the presence of health indicators would suggest that SSPs convey information not contained in an individual’s current health status, such as potential disease risks or parental longevity (Hurd and McGarry 2002).

The overall research evidence on the accuracy of SSPs, however, is inconclusive. On the one hand, Perozek (2008) investigated whether the Social Security Actuary’s (SSA) revision to the longevity gender gap could be predicted by the gender gap implied by the subjective cohort life tables based on individuals’ SSPs. She found that the latter predicts a smaller gender difference in life expectancy than did the SSA life tables and, thus, claimed that SSPs are better predictors than actuarial forecasts. Elder (2013), however, argued that SSPs are much worse predictors than actuarial forecasts as the implied age profiles are too flat. We contribute to this debate by providing new evidence from another country and a broader age range.

Our second goal is to expand on previous US-based research into whether individuals over- or underestimate their survival chances (e.g., Schoenbaum 1997; Khwaja et al. 2007; Delavande and Rohwedder 2011; Gerking and Khaddaria 2011; Bissonnette et al. 2014), thereby taking into account certain personal characteristics related to socioeconomic status, health, and behavioral risk indicators. Khwaja et al. (2007), for instance, showed that on average SSPs are very close to their objective counterparts, with current smokers being optimistic and those who never smoked being pessimistic in their assessments. Delavande and Rohwedder (2011) subsequently demonstrated that actual mortality and subjective survival expectations are similarly associated with wealth, income, and education.

2 Data

This paper combines Dutch survey data, which include respondents’ survival expectations, with individual-level administrative data that contain the respondents’ date of death. The survival expectations, expressed as SSPs, are taken from the 1995 and 1996 waves of the DNB Household Survey (DNB-HS), an ongoing longitudinal panel survey formerly known as the CentER Savings Survey. Although initiated in 1993, the survey did not adopt the SSP questions until 1995; hence, our data set includes only the 1995 and 1996 survey data because these two waves alone are linkable to our administrative data.

The DNB-HS database, compiled using an Internet survey of around 2550 Dutch households, includes detailed information on respondent age, income, health, education, labor market status, assets and liabilities, and psychological state (see Alessie et al. 2002 for a detailed description). Each year, all household members aged 16 or over are interviewed online. Those who do not have a computer and/or Internet access are provided with these tools by the survey agency. The DNB-HS consists of two different panels: a nationwide panel of around 1900 households and a high-income panel of around 650 households representing the top 10% of the income distribution. This sampling method could make our baseline sample non-random. As discussed by Delavande and Rohwedder (2011), non-randomness of the sample does not affect the validity of our empirical results as our analysis compares subjective and objective mortality for a given population. We nevertheless check whether sample non-randomness affects our results by repeating our analysis with a sample restricted to households in the representative panel survey (see Sect. 4).

The DNB-HS measures subjective survival probabilities using the following survey question: ‘How big do you think is the chance that you will attain (at least) the age of T?’ where T ϵ {75, 80, 85, 90, 95, 100} is a target age dependent on the respondent’s current age. That is, respondents aged 25 through 65 report their survival probability to ages 75 and 80; those aged 65 through 70, to ages 80 and 85; and those aged 70–75, 75–80, and 80–85, to ages 85 and 90, 90 and 95, and 95 and 100, respectively. The responses are measured on a 10-point scale from 0, ‘no chance at all,’ to 10, ‘absolutely certain.’ In our analysis, we follow Hurd and McGarry (1995) by assuming that once divided by 10, the responses can be interpreted as probabilities conditional on being alive at a certain age. To construct our main variable of interest, median remaining life duration, we use the survival probabilities for the two different target ages reported by each individual.
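The mapping from current age to target ages and the conversion of answers into probabilities can be sketched as follows. This is only an illustration: the function names are hypothetical, and the half-open age bands are an assumption, because the bands quoted above overlap at their boundaries (for example, age 65 appears in two bands).

```python
# Sketch only: half-open bands [lower, upper) are an assumption; the text's
# bands ("25 through 65", "65 through 70", ...) overlap at the edges.
TARGET_AGE_BANDS = [
    (25, 65, (75, 80)),
    (65, 70, (80, 85)),
    (70, 75, (85, 90)),
    (75, 80, (90, 95)),
    (80, 85, (95, 100)),
]


def target_ages(current_age):
    """Return the two target ages asked of a respondent of this age."""
    for lower, upper, targets in TARGET_AGE_BANDS:
        if lower <= current_age < upper:
            return targets
    raise ValueError(f"no SSP question defined for age {current_age}")


def to_probability(response):
    """Convert a 0-10 survey answer into a probability in [0, 1]."""
    if not 0 <= response <= 10:
        raise ValueError("response must lie between 0 and 10")
    return response / 10
```

For instance, a 45-year-old who answers 7 and 5 would, under these assumptions, be recorded with survival probabilities of 0.7 to age 75 and 0.5 to age 80.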

The actual mortality data for the survey respondents were obtained from the Dutch causes of death registry (Doods Oorzaken, DO), which recorded the death dates of all residents deceased during the 1995–2010 period. These data were provided by medical examiners, who are legally obliged to submit them to Statistics Netherlands. The DO data set also assigns a personal identifier that matches the personal identifier in the DNB-HS, thereby allowing determination of whether individuals in the 1995 or 1996 DNB-HS waves were still alive at the end of the observation period (December 31, 2010) or whether they had died and on which date.

2.1 Sample Selection

If respondents are in both the 1995 and 1996 waves, we use only the earlier response to avoid any potential influence of repeated interviewing on respondent behavior (Lazarsfeld 1940; Sturgis et al. 2009). This method eliminates any possible learning effect such as respondents asked about survival probability in 1995 seeking more information about their survival chances before responding in 1996. As a result, we have essentially a cross-sectional sample comprising one observation per individual in either 1995 or 1996. We then further exclude individuals under 25 who may still be enrolled in education and have no individual income data available, for a sample size of 5747.

Before starting our analysis, we assess the extent to which DNB-HS respondents were willing and able to respond to SSP questions by reporting the response rates, fraction of consistent answers, and fraction of focal point responses (i.e., answers that cluster around 0, 0.5, or 1). As regards response willingness, earlier research has revealed that the non-response rate to probabilistic questions is as low as that to traditional questions on the same subjects (see Manski 2004 for a review of the US findings). The response rate in our sample is about 86%, considerably lower than the approximately 98% and 90% reported for the HRS (Hurd and McGarry 1995) and SHARE (Peracchi and Perotti 2011) surveys, respectively. Nevertheless, this response rate suggests that a majority of the respondents were willing to answer SSP questions.

It is also well documented in the US literature that most respondents are able to provide SSP responses that are consistent with the laws of probability (e.g., Hurd and McGarry 1995; Perozek 2008). For example, if a respondent indicates a survival probability to age 75 that is less than or equal to the survival probability to age 80, the answer violates the strict monotonicity assumption. In fact, the survival probability to age 75 should be greater than that to age 80 because reaching age 80 requires survival to 75. Moreover, because respondents face a non-negligible mortality risk between the ages of 75 and 80, those who provide the same answer to these two questions are being internally inconsistent, an issue discussed in more detail below.

The fraction of respondents providing answers that satisfied the strict monotonicity assumption was about 67%. Around 32% reported equal survival probabilities for the two target ages, while only about 0.63% indicated a survival chance to the earlier target age that was less than that to the later target age. As noted by Perozek (2008), respondents with equal survival probabilities can still give valuable information about the shape of individual survivor functions. Since answers to the survival probability questions in the DNB-HS range from 0 to 10, respondents who provide equal survival probabilities for the two target ages may be rounding their true survival probabilities to the nearest tenth. Thus, following Perozek (2008), we retain respondents with equal probabilities in our sample by assuming a 10% shift between the equal responses (see Footnote 3).
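The three response types distinguished here (strictly monotone, equal, and violating monotonicity) can be tabulated with a few lines of code. The sketch below is illustrative only; the function names are hypothetical, and it stops at classification because this section does not spell out exactly how the 10% shift between equal responses is applied.

```python
from collections import Counter


def classify_ssp_pair(early, late):
    """Classify raw 0-10 answers for the earlier and later target age."""
    if early > late:
        return "strictly monotone"
    if early == late:
        return "equal"
    return "violation"  # later target age rated more likely than the earlier one


def consistency_shares(pairs):
    """Return the share of each category among (early, late) answer pairs."""
    counts = Counter(classify_ssp_pair(e, l) for e, l in pairs)
    return {label: n / len(pairs) for label, n in counts.items()}
```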

Similarly, the coarseness of the rating scale offered could lead individuals providing focal point answers to SSPs to round their true probabilities to the nearest tenth. Fifty–fifty (50%) answers in particular, rather than reflecting individuals’ true assessments of their survival probabilities, could be driven purely by uncertainty about time of death (see Fischhoff and Bruine de Bruin 1999; Hudomiet and Willis 2013). In our sample, the percentage of 50% focal point SSP answers is high, between 23 and 30%, depending on the target age in the question. In light of the Kleinjans and van Soest (2013) evidence that when rounding and 50% focal point answers are taken into account, the coefficient estimates on the determinants of subjective survival probabilities do not change significantly, we do not treat focal responses any differently.

The baseline sample, then, includes 5747 observations, 1313 (22.9%) of which we are forced to drop because either respondents’ reported survival chances to the earlier target age are less than that to the later target age (0.63% of the sample) or information is missing for one of the covariates, most often household income. Our final sample thus includes 4434 individuals aged 25 and over, 463 (10.4%) of whom died during the 15 years of follow-up.

2.2 Covariates

The main variable of interest in the mortality hazard model is subjective remaining life duration, which, as demonstrated in Sect. 3.3, is calculated based on two SSPs. The other variables are gender, birth year, a year dummy for 1996, educational attainment, household income, marital status, whether or not the individual has a chronic illness, self-rated health, and other health-related factors such as smoking, heavy drinking, and being overweight or obese. Smokers are individuals who smoke cigarettes every now and then or every day; heavy drinkers are those who consume at least four drinks a day. Overweight and obese are defined as a body mass index (BMI) greater than or equal to 25 but less than 30, and greater than or equal to 30, respectively.
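The weight categories defined above translate directly into code. A minimal sketch, with hypothetical function names and the thresholds taken from the definitions in the text:

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def weight_category(bmi_value):
    """Apply the cutoffs from the text: obese if BMI >= 30, overweight if 25 <= BMI < 30."""
    if bmi_value >= 30:
        return "obese"
    if bmi_value >= 25:
        return "overweight"
    return "not overweight"
```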

As Table 1 shows, women, with a 47% share, are slightly underrepresented in our sample. The mean respondent age at the time of interview is about 47 years. About 87.9% of the respondents are married, 31.9% are current smokers, 8% are heavy drinkers, 33.2% are overweight, and 6% are obese; 82.9% self-rate their health as good or excellent, and only 23.7% report a chronic condition such as a long-term illness, disorder, or disability. The sample consists mostly of individuals with pre-university education or junior/senior vocational training. To measure income, we adopt the Statistics Netherlands (SN) definition of standardized household income as the sum of the net annual incomes of all household members divided by SN’s equivalence scale (Siermann et al. 2004). The average annual standardized household income for our sample is f43,349 (€19,724). To mitigate the effects of outliers, we follow Delavande and Rohwedder (2011) and divide the income distribution into three parts of equal size, with a tercile dummy indicating to which income tercile the respondent belongs.

Table 1 Variable means, medians, and standard deviations

The role that each covariate could play in the objective and subjective mortality models is as follows. Gender takes into account that women on average live longer than men. To account for a possible time effect, we add a year dummy for 1996. Birth year controls not only for the increase in life expectancy over generations but also for the negative correlation between subjective remaining life duration and baseline age. Controlling for the latter is of particular importance as the scale parameter of the hazard function should be age invariant. We capture the socioeconomic differences in mortality by educational attainment, household income, and marital status (Van Kippersluis et al. 2009; Kalwij et al. 2013). Finally, we include self-rated health, behavioral factors such as smoking, heavy drinking, and BMI, and whether or not the individual has a chronic illness to account for mortality differences across individuals with different current health statuses. In addition to those covariates, we also estimate the shape parameter of the objective and subjective hazard functions, \(\gamma\), to capture an increase in mortality risk with age.

3 Estimation Methodology

Most previous literature has relied on proportional hazard rate models to estimate the risk of death during a specific time period, with life duration modeled either by parametric distributions like those of Gompertz and Weibull or semiparametric methods like the Cox proportional hazard model (Gompertz 1825; Cox 1972; Wilson 1994; Siegel et al. 2003; Perozek 2008; Bissonnette et al. 2014). In this paper, we adopt a parametric approach to be able to find estimates of a mortality risk model based on both subjective and objective survival information. In contrast to a semiparametric Cox model, in parametric models the shape of the hazard function is determined by the functional form assumptions about the duration dependence. As demonstrated in the next two subsections, choosing a parametric hazard rate function allows comparison between the estimated parameters of the subjective and objective mortality models. Because we observe only two SSPs for each individual, we can estimate the subjective model by assuming a functional form for individual survival functions. The two parameters of, for instance, the Gompertz or Weibull distributions are exactly identified given two points on the survival function. According to Bissonnette et al. (2014), both these distributions lead to quantitatively similar estimates of objective and subjective mortality models. In our analysis, however, based on its extensive demographic use to model human mortality, we opt for the Gompertz distribution, whose survival function tends to fit the survival data of humans aged 10 to at least 85 better than the Weibull survival function (Wilson 1994; see Footnote 4).

As to the role of individual characteristics, because they are observed only in the baseline year, our model assumes that, other than age, they are all time-invariant. In other words, we estimate only how objective and subjective mortality risks are associated with current socioeconomic characteristics and health status. The next subsections explain our statistical models and empirical specifications in more detail.

3.1 Objective Mortality Model

In our objective mortality model, respondent i is aged \(t_{0,i}\) at the start of the observation period (in 1995 or 1996) and aged \(t_{i}\) at the end of the observation period (December 2010) or at the time of death, whichever comes first. The respondent’s characteristics are denoted by \({\mathbf{x}}_{{\mathbf{i}}}\) and measured at the start of the observation period (in 1995 or 1996). T is a random variable representing the respondent’s age at death (life duration), which is assumed to follow a Gompertz distribution, such that the hazard function can be written as

$$\theta (t_{i} \left| {{\mathbf{x}}_{{\mathbf{i}}} } \right.;{\varvec{\upbeta}}^{obj} ,\gamma^{obj} ) = \exp \left( {\gamma^{obj} t_{i} } \right)\exp \left( {{\mathbf{x}}_{{\mathbf{i}}} {\varvec{\upbeta}}^{obj} } \right) .$$
(1)

The maximum likelihood estimates of the model parameters are then given by

$$\left( {{\hat{\varvec{\upbeta }}}^{obj} ,\hat{\gamma }^{obj} } \right) = \mathop {\text{argmax}}\limits_{{{\varvec{\upbeta}}^{obj} ,\gamma^{obj} }} \log \prod\limits_{i = 1}^{N} {\exp \left\{ { - \int\limits_{{t_{0,i} }}^{{t_{i} }} {\theta (s\left| {{\mathbf{x}}_{{\mathbf{i}}} } \right.;{\varvec{\upbeta}}^{obj} ,\gamma^{obj} )ds} } \right\}\left( {\theta (t_{i} \left| {{\mathbf{x}}_{{\mathbf{i}}} ;{\varvec{\upbeta}}^{obj} ,\gamma^{obj} } \right.)} \right)^{{d_{i} }} } ,$$
(2)

where N is the number of individuals in our sample, and \(d_{i}\) is a dummy variable equal to 1 if the respondent has died at time \(t_{i}\) and 0 otherwise. The age gradient, hereafter mostly referred to as the shape parameter, is determined by parameter \(\gamma\).
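Because the Gompertz hazard in Eq. (1) integrates in closed form, the log-likelihood in Eq. (2) can be evaluated without numerical integration: the integral of \(\exp(\gamma s)\exp({\mathbf{x}}_{{\mathbf{i}}}{\varvec{\upbeta}})\) from \(t_{0,i}\) to \(t_{i}\) equals \(\exp({\mathbf{x}}_{{\mathbf{i}}}{\varvec{\upbeta}})[\exp(\gamma t_{i}) - \exp(\gamma t_{0,i})]/\gamma\). The sketch below illustrates this under the assumption \(\gamma \ne 0\); the function names and the record layout are hypothetical, and it is not the estimation code used for the paper.

```python
import math


def integrated_hazard(t0, t, xb, gamma):
    """Closed form of the integral of exp(gamma*s) * exp(xb) over s in [t0, t]."""
    return math.exp(xb) * (math.exp(gamma * t) - math.exp(gamma * t0)) / gamma


def log_likelihood(records, beta, gamma):
    """Log-likelihood of Eq. (2) for a Gompertz hazard with delayed entry at t0.

    records: iterable of (t0, t, died, x), where x is a list of covariates
    (include a 1.0 for the constant), and died is 1 for a death, 0 if censored.
    """
    total = 0.0
    for t0, t, died, x in records:
        xb = sum(b * v for b, v in zip(beta, x))
        total -= integrated_hazard(t0, t, xb, gamma)  # log survival from t0 to t
        if died:
            total += gamma * t + xb  # log hazard at the age of death
    return total
```

Maximizing this function over `beta` and `gamma` with any general-purpose optimizer would give the estimates defined in Eq. (2); censored respondents contribute only the survival term.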

3.2 Subjective Mortality Model

For this model, we use subjective mortality information to estimate a set of parameters analogous to those of the objective mortality model. As before, we assume a Gompertz hazard function given by

$$\theta (t\left| {{\mathbf{x}}_{{\mathbf{i}}} } \right.;{\varvec{\upbeta}}^{subj} ,\gamma^{subj} ) = \exp \left( {\gamma^{subj} t} \right)\exp \left( {{\mathbf{x}}_{{\mathbf{i}}} {\varvec{\upbeta}}^{subj} } \right)$$
(3)

Given the resulting survivor function, target ages, and SSPs, we can then estimate the following system of linear equations (see the Appendix for details):

$$\gamma_{i}^{*} = \gamma^{subj} + \varepsilon_{1i}$$
(4)
$$\ln \left( {\lambda_{i}^{*} } \right) = {\mathbf{x}}_{i}^{'} {\varvec{\upbeta}}^{subj} + \varepsilon_{2i}$$
(5)

where \(\gamma_{i}^{*}\) and \(\lambda_{i}^{*}\) are the estimated parameters of the individual survival functions, and the error terms \(\varepsilon_{1i}\) and \(\varepsilon_{2i}\) are allowed to be correlated with each other. We obtain the \({\hat{\mathbf{\beta }}}^{subj}\) and \(\hat{\gamma }^{subj}\) estimates using the seemingly unrelated regression (SUR) estimator (Zellner 1962).
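As a sketch of how the individual parameters can be recovered, consider the survivor function implied by the Gompertz hazard in Eq. (3), conditional on being alive at baseline age \(t_{0,i}\). The notation \(p_{1i}, p_{2i}\) for the two reported SSPs and \(T_{1i} < T_{2i}\) for the corresponding target ages is introduced here for illustration only, and \(\lambda_{i}^{*}\) is read as the individual-level scale term; the exact expressions used in the paper are those in the Appendix.

$$S\left( {T\left| {t_{0,i} } \right.} \right) = \exp \left\{ { - \int\limits_{{t_{0,i} }}^{T} {\lambda_{i}^{*} \exp \left( {\gamma_{i}^{*} s} \right)ds} } \right\} = \exp \left\{ { - \frac{{\lambda_{i}^{*} }}{{\gamma_{i}^{*} }}\left[ {\exp \left( {\gamma_{i}^{*} T} \right) - \exp \left( {\gamma_{i}^{*} t_{0,i} } \right)} \right]} \right\}$$

Setting \(p_{ki} = S(T_{ki} \left| {t_{0,i} } \right.)\) for \(k = 1, 2\) and taking the ratio of the logarithms eliminates \(\lambda_{i}^{*}\):

$$\frac{{\ln p_{2i} }}{{\ln p_{1i} }} = \frac{{\exp \left( {\gamma_{i}^{*} T_{2i} } \right) - \exp \left( {\gamma_{i}^{*} t_{0,i} } \right)}}{{\exp \left( {\gamma_{i}^{*} T_{1i} } \right) - \exp \left( {\gamma_{i}^{*} t_{0,i} } \right)}}$$

This equation has no closed-form solution for \(\gamma_{i}^{*}\) but can be solved numerically, after which

$$\lambda_{i}^{*} = \frac{{ - \gamma_{i}^{*} \ln p_{1i} }}{{\exp \left( {\gamma_{i}^{*} T_{1i} } \right) - \exp \left( {\gamma_{i}^{*} t_{0,i} } \right)}}$$

Two points on the survivor function therefore exactly identify the two Gompertz parameters, provided that \(0 < p_{2i} < p_{1i} < 1\).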

3.3 Empirical Specification

The objective of our empirical analysis is to show how well Dutch beliefs about survival relate to actual mortality. To do so, we first check whether SSPs are good predictors of actual mortality even when a large set of health indicators are controlled for, an assessment tested in Sect. 4.2 by estimating the objective mortality model using subjective remaining life duration as a covariate while again controlling for health indicators. This first step, however, although it suggests whether or not SSPs are useful measures, does not identify which groups are relatively better or worse at predicting their life expectancies. Hence, in a second step, we compare the estimated coefficients of subjective and objective mortality models and document the observed discrepancies. This analysis both differentiates groups of people based on socioeconomic status, health, and behavioral risk indicators (e.g., smoking and drinking) and distinguishes those who are relatively correct in predicting their life expectancy from those who over- or underestimate it.

We construct our main variable of interest in the objective mortality model based on the information contained in the respondents’ answers to the two SSPs. To do so, we must solve for the two unknown parameters of the Gompertz distribution using the two SSPs for each individual. We then use the derived survival function parameters to compute the subjective median remaining life duration conditional on baseline age for each individual in the sample [Eq. (12) in the Appendix]. We compute median life expectancy rather than expected or average life expectancy because the former has a closed-form solution, whereas the latter requires a discrete approximation of the continuous distribution (see, e.g., van Santen 2013). Likewise, because respondents report their SSPs knowing that they have survived up to their current age, we calculate remaining life duration conditional on baseline age. In addition to subjective median remaining life duration, we include in the scale parameter of the objective hazard function [Eq. (1)] the covariates described in Sect. 2.2, which we also use to estimate the scale parameter of the subjective hazard function [Eq. (3)].
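The two steps described in this paragraph, solving for the Gompertz parameters from two SSPs and then computing the median remaining life duration, can be illustrated with a short numerical sketch. It assumes the hazard \(\lambda \exp(\gamma t)\) from Eq. (3), survival probabilities conditional on baseline age, and \(0 < p_2 < p_1 < 1\) with \(\gamma \ne 0\). The function names, the bisection solver, and its bounds are choices made for this illustration and are not necessarily those behind Eq. (12) in the Appendix.

```python
import math


def _ratio(gamma, t0, age1, age2):
    """expm1(gamma*(age2-t0)) / expm1(gamma*(age1-t0)), with its gamma -> 0 limit."""
    if abs(gamma) < 1e-12:
        return (age2 - t0) / (age1 - t0)
    return math.expm1(gamma * (age2 - t0)) / math.expm1(gamma * (age1 - t0))


def fit_gompertz(t0, age1, p1, age2, p2, lo=-1.0, hi=1.0):
    """Return (gamma, lam) that reproduce two conditional survival probabilities."""
    if not (0 < p2 < p1 < 1 and t0 < age1 < age2):
        raise ValueError("need 0 < p2 < p1 < 1 and t0 < age1 < age2")
    target = math.log(p2) / math.log(p1)
    if not _ratio(lo, t0, age1, age2) < target < _ratio(hi, t0, age1, age2):
        raise ValueError("gamma is not bracketed by [lo, hi]")
    for _ in range(200):  # bisection works because the ratio increases in gamma
        mid = 0.5 * (lo + hi)
        if _ratio(mid, t0, age1, age2) < target:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    lam = -math.log(p1) * gamma / (
        math.exp(gamma * t0) * math.expm1(gamma * (age1 - t0))
    )
    return gamma, lam


def survival(age, t0, gamma, lam):
    """Probability of surviving to `age`, conditional on being alive at t0."""
    return math.exp(-lam / gamma * (math.exp(gamma * age) - math.exp(gamma * t0)))


def median_remaining_life(t0, gamma, lam):
    """Solve survival(t0 + m | t0) = 0.5 for m in closed form."""
    return math.log1p(gamma * math.log(2) / (lam * math.exp(gamma * t0))) / gamma
```

As a check on the logic, a 45-year-old reporting probabilities of 0.7 to age 75 and 0.5 to age 80 should obtain a median remaining life duration of 35 years, since the second answer places the median age at death at exactly 80. If the fitted \(\gamma\) is negative, a median may not exist, in which case `median_remaining_life` raises an error.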

We report our key findings both for the main sample of individuals aged 25+ and separately for an older sample of 50+ individuals. We check the results’ sensitivity to the main sample’s non-randomness by excluding the 27% of individuals in the high-income panel. We also conduct additional sensitivity checks to test whether the results are robust to different assumptions about equal probabilities. After first assuming flatter survival functions than before—a 5% shift between equal probabilities rather than a 10% one—we exclude all equal answers from the estimation.

4 Empirical Results

4.1 Descriptive Analysis

A comparison of the subjective and objective remaining life durations across age categories and gender (see Table 2) reveals that for males, the subjective values are marginally lower than the objective values up to age 70 but then slightly exceed their objective counterparts. These differences, although small, are statistically significant at a 5% level for all male age groups except those 65–69. For females, in contrast, the subjective values are substantially lower than their objective counterparts at all ages, and all these differences are significant at the 5% level except for those aged 75–79. On average, the difference between subjective and objective remaining life duration is much smaller for males than for females. Moreover, as the table shows, male and female subjective remaining life durations are quite close to each other at all ages even though the objective values are different. Taken together, these results suggest that males are slightly pessimistic about their survival chances at younger ages but tend to be slightly optimistic at older ages, while females appear overly pessimistic at all ages.

Table 2 Means of objective and subjective remaining life durations across age categories and gender (in years)

4.2 Predictive Power of SSPs for Actual Mortality

To assess whether SSPs predict actual mortality, we first estimate our objective mortality risk model and then include, stepwise, sets of covariates (socioeconomic variables and health indicators) in addition to the subjective (median) remaining life duration. Because a likelihood ratio test does not reject the pooling of the male and female samples at the 5% significance level for any model, we include a gender control variable instead of reporting separate results for men and women. The signs of the estimates are informative about whether mortality hazard increases or decreases with respect to individual characteristics, while the exponential of the coefficients shows the size of the relative change in mortality risk.

The first model in Table 3 explains mortality risk only as a function of birth year, year dummy, age, and subjective remaining life duration. The coefficient of the latter is negative and statistically significant, suggesting that those who expected to live longer in the baseline year experienced a lower mortality risk than those who expected a shorter life. In the second model, which includes additional controls for gender and education, the predictive power of subjective remaining life duration remains unchanged and significant. The model does suggest, however, that, as might be expected, men have a higher mortality risk than women, and the more highly educated have, on average, a lower mortality risk than those with medium-level education. The p value of the Wald test for education (bottom of the table) also indicates that the coefficients of high and low education are jointly significant at a 1% level.

Table 3 Objective mortality: estimations

In the third model, which adds in standardized household income terciles and marital status as control variables, the predictive power of subjective remaining life duration is the same as before, but household income is only weakly associated with mortality risk and the respondent’s marital status has no significant effect on actual mortality. Adding health indicators into the set of covariates in the fourth model slightly reduces the magnitude of the effect, but it is still statistically significant at the 1% level. In fact, the size of the coefficient suggests that a 1-year increase in subjective remaining life duration is associated with a 2.3% lower mortality risk or, on average, a 0.6 years longer realized remaining life duration (see Footnote 5). Among the health indicators, the statistically significant determinants of mortality risk are smoking, heavy drinking, obesity, having chronic illnesses, and being in good health, with the coefficients of obesity and overweight being jointly significant. The coefficient corresponding to the year dummy for 1996, however, is insignificant. As is to be expected, in all models, mortality risk increases significantly with age.
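The percentage interpretation follows from the proportional hazard form of Eq. (1): raising a covariate by one unit multiplies the hazard by the exponential of its coefficient. The sketch below shows the arithmetic. The coefficient value of -0.0233 is purely illustrative, backed out from the 2.3% figure quoted above rather than read from Table 3, and the function name is hypothetical.

```python
import math


def percent_change_in_hazard(coefficient, delta=1.0):
    """Relative change (in %) in the hazard when a covariate rises by `delta`."""
    return (math.exp(coefficient * delta) - 1.0) * 100.0


# Illustrative coefficient only (not taken from Table 3): a value near -0.0233
# corresponds to roughly a 2.3% lower mortality risk per additional year of
# subjective remaining life duration.
change = percent_change_in_hazard(-0.0233)
```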

Table 4 illustrates the association between subjective remaining life duration and actual mortality when all controls are included under alternative specifications. In the first column, we restrict the sample to 50+ individuals for whom the predictive power of subjective remaining life duration remains unchanged albeit less precisely estimated. The second column reports the estimates when the main sample is restricted to the representative DNB-HS panel, which excludes high-income respondents. In this restricted sample, the magnitude of the coefficient on subjective remaining life duration is slightly reduced, although it remains significant at the 10% level. The third column then lists the outcomes for the main sample under the assumption of a 5% shift between equal probabilities. Again, the magnitude of the effect is quite close to that obtained under an assumption of a 10% shift between equal answers. Finally, the last column shows the results when all individuals who answered the same number for both probability questions are excluded, under which specification the predictive power of subjective remaining life duration completely disappears. It should be noted, however, that discarding equal responses eliminates 32% of the main sample, possibly producing an endogenous sample selection that could influence the estimates. Nevertheless, because equal responses are still informative about the shape of individual survivor functions, we conclude that a correct specification should include these responses.

Table 4 Objective mortality: sensitivity checks

4.3 Objective and Subjective Mortality Risk Models

As a second step in our validation exercise, we estimate the subjective and objective mortality risk models while including the same socioeconomic variables and health indicators in each model. If the respondents are able to predict their remaining lifetimes correctly, then the signs and magnitudes of the estimates obtained from the objective mortality model should coincide with those obtained from the subjective mortality model. We can then use the estimated coefficients to calculate the predicted median remaining life durations in order to assess which subgroups over- or underestimate their life expectancies [see Eqs. (13) and (14), respectively, in the Appendix].

The first two models reported in Table 5 explain subjective and objective mortality risk as a function of birth year, the year dummy, age effects, and socioeconomic variables. In both models, the shape parameter of the hazard function has the same sign and is statistically significant, although the age gradient is steeper in the subjective than in the objective mortality model. The coefficient on female in the subjective model, however, is insignificant even though the objective model predicts that women have a lower mortality risk than men. This finding suggests that males and females have similar beliefs about survival probabilities, but females do not, on average, expect to live longer than males. In fact, according to the Wald test, the female coefficient is significantly higher in the objective model at the 5% level (not reported in Table 5). We also find that highly educated individuals expect to die sooner than the medium educated even though the objective model suggests the opposite. However, the Wald test (not reported in Table 5) indicates that the equality of the education parameters in the two models cannot be rejected.

Table 5 Subjective and objective mortality risk: estimations

The third and the fourth models show the associations between health indicators and subjective and objective mortality risk. Here, smoking, heavy drinking, obesity, having chronic illnesses, and being in good health explain subjective mortality risk, but their associations with subjective survival chances are not as strong as those with objective survival. In addition, a Wald test (not reported in Table 5) indicates that the smoking coefficient in the objective model is significantly higher than that in the subjective model at the 10% level, suggesting that individuals underestimate the health risks of smoking. For the other health indicators, we cannot reject the null hypotheses of equal associations in the subjective and objective models.

In the last two models in Table 5, which control for both socioeconomic variables and health indicators, the coefficients on smoking, heavy drinking, good health, obesity, chronic illnesses, and the shape parameter of the hazard function have the same sign and are all significant. However, the results of a Wald test (not reported in Table 5) indicate that the shape parameter and the coefficients on female and smoking in the subjective model are statistically different from those in the objective model. On the other hand, we find no significant differences between the parameters for the other variables.

Table 6 reports the predicted median remaining life durations by gender and by current health status when all other characteristics are held constant. We restrict the analysis to these characteristics because they have significant associations with actual mortality risk (see column 6, Table 5). As a reference individual (or group), we take a 45-year-old married man, born in 1950, living in a middle-income household, a non-smoker, non-heavy drinker, of normal weight and medium education, and in good health with no chronic illnesses, who reported his SSPs in 1995. According to the table, for this reference male, the difference between subjective and objective predicted remaining life durations is about −3.5 years, which is not significantly different from zero. We then change one characteristic of this reference individual at a time and report the results in the remaining rows. Of particular interest, we find a significant difference of −10.4 years for a 45-year-old woman, suggesting that women substantially underestimate their remaining life duration.

Table 6 Comparison of objective and subjective predicted remaining life durations (means)

Overall, the differences between subjective and objective remaining lifetimes for obese, unhealthy, heavy-drinking men, and men with chronic illnesses are relatively small and statistically insignificant, indicating that these subgroups tend to predict their remaining life durations accurately. Smoking men, on the other hand, significantly overestimate their remaining life duration by about 2 years, which implies that smokers underestimate the health risks of their smoking habit (see Footnote 6).

5 Discussion

Overall, our use of a sample of Dutch respondents aged 25+ to analyze how well individual beliefs about survival chances predict actual mortality reinforces the findings of the US literature, which has relied primarily on individuals aged 50+. Our study has shown, for example, that in line with HRS-based evidence from Smith et al. (2001), Hurd and McGarry (2002), and Siegel et al. (2003), SSPs in the DNB-HS strongly predict actual mortality even when controls are in place for a large set of health indicators, including self-reported health. This finding suggests that SSPs contain information other than an individual’s current health status. For example, although the age at which a parent died may not affect respondents’ current health, it may alter their expectations about the onset of a genetically linked disease (Hurd and McGarry 2002).

The above finding underscores the importance of eliciting SSPs in household surveys since the traditional questions on health status cannot provide this information on individual expectations. Moreover, as argued by Hurd (2009), probabilistic expectations have an advantage over non-probabilistic expectations in that the former are properly scaled, making them more easily comparable across different individuals. Non-probabilistic expectation measures, on the other hand, involve verbal statements such as ‘likely’ or ‘unlikely,’ which could be interpreted differently by different individuals. Another advantage of SSPs is that they can be used directly in life cycle models of consumption, savings, and retirement in which individuals’ decisions depend on their own mortality expectations. Because life table survival probabilities are homogeneous across individuals with different socioeconomic and health status, SSPs are a better instrument for measuring individual mortality risk in these models.

Our analysis has also confirmed the associations between actual mortality and income, education, and marital status found by Van Kippersluis et al. (2009) and Kalwij et al. (2013) for the Netherlands, albeit here less precisely estimated, possibly because of the relatively smaller sample. We have also shown that on average, male survival expectations conform closely to realized mortality, whereas females are overly pessimistic about their survival chances, a finding noted by Perozek (2008) for the US based on fitted survival functions, and Teppa and Lafourcade (2013) for the Netherlands based on actuarial survival probabilities. Additionally, in line with the findings of Khwaja et al. (2007), Hurd (2009), and Bissonnette et al. (2014), smokers in the DNB-HS sample exhibit optimistic survival beliefs, an overestimation of life expectancy that may result from SSPs containing expectations about quitting smoking in the near future (Khwaja et al. 2007). On the other hand, in contrast to Elder (2013), we found that SSPs yield a steeper age profile of mortality risk than actual mortality, suggesting that the old overestimate their mortality risk.

Our analysis also identified discrepancies between SSPs and objective values that may result from individuals’ biased survival beliefs, which may in turn influence economic decisions. For instance, pessimistic survival expectations may lead to a higher probability of early retirement, a higher incidence of purchasing life insurance (see Footnote 7), or inadequate savings for retirement. They might also have some implications for public health. For example, women with highly pessimistic survival beliefs may underestimate the years they will need health care. In particular, they may underestimate the years they will live without a spouse and/or depend on caretakers outside the home, which would increase health care expenditures and reduce individual well-being at advanced ages.

These findings imply that future research should examine whether incorrect beliefs about the remaining lifetime translate into welfare-reducing behavior in the context of financial decisions, retirement timing, health care needs, and similar situations in which lifetime uncertainty is relevant. More investigation is also needed into why women underestimate their life expectancy far more than men do. One possible explanation, according to Elder (2013), is that SSPs have a mean-reverting property; that is, probability answers are biased toward 50%. If female probabilities are farther away from 50% than male probabilities, female SSPs will be more biased. Another explanation is a higher prevalence of pain (Banks et al. 2009) and/or disability (Nusselder and Looman 2004) among women in the Netherlands, which, despite our controlling for health indicators to some extent, may still affect our results. If women incorrectly believe that their high rates of pain and disability forecast a higher mortality risk, they are likely to be more pessimistic than men about their survival chances. Nonetheless, Elder (2013), like Hudomiet and Willis (2013), calls for caution when interpreting and using SSPs because of the risk of measurement error, which must also be taken into account.
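The mean-reversion argument can be made concrete with a small numerical sketch. It assumes, purely for illustration, that a reported probability is a weighted average of the true probability and 50%; this linear form, the weight `pull_to_half`, the function names, and the example probabilities are all hypothetical and are not taken from Elder (2013) or from our data.

```python
def reported_probability(true_probability, pull_to_half):
    """Reported answer under an ASSUMED linear pull toward 50%."""
    if not 0.0 <= true_probability <= 1.0:
        raise ValueError("true_probability must lie in [0, 1]")
    if not 0.0 <= pull_to_half <= 1.0:
        raise ValueError("pull_to_half must lie in [0, 1]")
    return pull_to_half * 0.5 + (1.0 - pull_to_half) * true_probability


def reporting_bias(true_probability, pull_to_half):
    """Reported minus true probability; negative values mean pessimism."""
    return reported_probability(true_probability, pull_to_half) - true_probability


if __name__ == "__main__":
    # Hypothetical true survival probabilities: women farther from 50% than men.
    for label, true_p in [("men", 0.70), ("women", 0.80)]:
        print(label, round(reporting_bias(true_p, pull_to_half=0.3), 3))
```

Under this assumed form the bias equals `pull_to_half * (0.5 - true_probability)`, so with an identical pull toward 50% the group whose true probability lies farther from 50% shows the larger bias.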

Finally, it would also be worth investigating whether individuals’ assessments of their own life expectancy are influenced by their knowledge of actuarial survival probabilities. Some individuals, for example, may have no idea how long people of their own age and sex live on average or may form incorrect beliefs about population life expectancy. If those who underestimate population life expectancy do indeed tend to underestimate their own life expectancy, then a public policy that aims at informing the public about population life expectancies should help them give more accurate answers to probabilistic survival questions.