1 Introduction

Measures of beliefs about survival probability are frequently available in individual survey data. These measures are of potential interest because they may be used in models of health investments that examine individuals’ choices of health-related behaviours over the life cycle. However, for each individual in the population, a self-assessment of survival probability might be largely influenced by the individual’s perceptions of certain health risks. These perceptions depend on how individuals evaluate the costs and benefits of their behaviours to their current health and their future mortality risk. Perceptions of health risks might influence individuals’ incentives to adopt healthy lifestyles. This paper contributes to the extant literature on smoking and risk perception that addresses the notion of information constraints that may cause the costs and benefits associated with health-related behaviours to be estimated erroneously (see Cawley and Ruhm 2011). The existing empirical studies that link smoking and individuals’ subjective perception of their health status and mortality risk have largely been based on US data and have generated mixed results.Footnote 1 In this work, we use European data to examine how individuals’ perceptions of the health effects of smoking influence their survival expectations and subjective health.

Over the past century, American and European populations have experienced positive trends in longevity, although the former is characterised by lower life expectancy. This gap largely reflects the higher prevalence of non-communicable diseases, the main causes of premature mortality in elderly populations, in the US than in Europe; in the middle-aged American population, these diseases are frequently related to various risk factors such as smoking and obesity (Michaud et al. 2011). However, the most recent estimates of daily tobacco consumption reveal that, in 2008, the prevalence of smoking was about 15.6 % in the US but much higher in Europe, where the highest prevalences were observed in Greece and Austria (44.3 and 39.8 %, respectively) and the lowest in Italy and Belgium (19.6 and 19.8 %, respectively). Cross-country variation in smoking can be attributed in part to differences in beliefs among countries with respect to the health consequences of smoking (Cutler and Glaeser 2009). During the past decade, many European countries have exhibited a commitment to both tobacco control action plans and smoke-free legislative initiatives. A failure to fully consider individual perceptions of the consequences of current and former smoking habits could account for the low levels of success of these interventions. This provides a motive for studying perceptions of the health and mortality risks that are associated with smoking in Europe.

Furthermore, it has been well documented that smoking may produce immediate side effects, such as increases in pulse rate, blood pressure and weight, and that mortality risks are greater for smokers than for never smokers because the probability of the onset of various health issues increases as a result of the prolonged consumption of tobacco. Therefore, in this work, we seek to assess the risk perception of current, former and never smokers and to examine perceptions of the short- and long-term health effects of smoking, estimating a model for individual survival expectations, subjective health and smoking habit. Our approach allows us to understand whether individuals believe that the detrimental short- and long-term effects of smoking are reversible. A belief in the reversibility of these effects would suggest that the true health-related effects of smoking would be underestimated for individuals who eventually quit smoking.Footnote 2 This issue is particularly concerning because anti-tobacco campaigns often disseminate the information that smoking is bad for your health but that quitting cancels out the long-term risks.

Our analysis of heterogeneity in risk perception among smokers must be interpreted in the context of the myopic and rational models of smoking behaviour, which derive from the traditional economic approach and treat smoking as an addictive good.Footnote 3 Myopic and rational models have different implications. Rational individuals smoke only if the benefits of smoking outweigh the costs of smoking, whereas present-oriented (myopic) individuals may be more addicted to smoking than rational individuals. Furthermore, rational addicts smoke more if they expect future prices to fall, whereas myopic addicts do not adjust their behaviours in this manner. Empirical results strongly reject the hypothesis of myopic behaviour and support the model of rational addiction (Becker et al. 1994; Chaloupka 1991). Assuming that the myopic model can simply be nested within the rational model and controlling for unobservable heterogeneity, Arcidiacono et al. (2007) find that both models predict that smoking rates decrease with age; as individuals become older, their health worsens, illnesses occur more frequently, and smoking becomes less attractive. The rational model predicts the occurrence of a sharp decline in smoking rates after the age of 62 that is followed, however, by a significant upward trend in smoking behaviour after a cut-off age of 80. This end-of-life effect can be defined as a ‘rationally myopic’ attitude of older individuals, who expect to die soon and are, therefore, less concerned about the future effects of smoking.

First, in this paper, we specify a simultaneous recursive model for survival expectations, subjective health and smoking behaviour conditional on observed individual characteristics, including valuable information about optimism. We propose a finite mixture approach to account for unobservable factors that might simultaneously influence subjective assessments and smoking, thereby addressing the issues of reverse causation and endogeneity in the relationship between smoking and individuals’ beliefs. Second, we use data on elderly Europeans who responded to the Survey of Health, Ageing and Retirement in Europe (SHARE) and for whom unique information about survival expectations at different target ages has been available since the SHARE data were released. By contrast, most of the existing evidence about the risk perception of smokers is based on data from the US. To account for heterogeneity in the risk perceptions of respondents, various types of smokers are differentiated by smoking status at the time of the survey and by the duration of smoking habit over the life cycle; this approach defines the appropriate scope for analyses of the hazard of quitting smoking.

We identify two classes, or types, of individuals in the population; these types differ with respect to both observed and unobserved characteristics. On average, compared with the second type, the first type of individuals has higher survival expectations and lower subjective health; it includes a smaller proportion of smokers, and these smokers have smoked for a shorter duration. For the individuals of the first type, smoking is an important predictor of survival expectations, and smokers incorporate the (long-term) risks of smoking duration into their assessments of survival probabilities. However, for both current and former smokers, this effect vanishes as the duration of smoking habit increases. One important result, attributable to present-oriented (myopic) behaviour, is that for both classes, former smokers appear to perceive the harmful consequences of smoking as reversible and, particularly in the second class, overestimate both survival probability and health status. Subjective health assessments are significantly less positive for current smokers than for other survey respondents in both classes.

2 Data and variables

We use data from the first wave (2004) of the SHARE, which is designed in accordance with the approaches of the Health and Retirement Study (HRS) and the English Longitudinal Study of Ageing (ELSA).Footnote 4 The target population of this survey is non-institutionalised individuals aged 50 and older. Spouses are also interviewed. The SHARE provides rich information about not only respondents’ health and lifestyles but also their survival expectations; previous European surveys have not collected survival expectation information. The complete sample consists of 31,115 individuals and features a response rate of about 85 %. For the purpose of our study and because of item non-response in the variables of interest, the sample used in the analysis consists of 20,285 respondents, who are between 50 and 85 years of age, from northern (Denmark and Sweden), central (Austria, Belgium, Germany, France and the Netherlands) and southern Europe (Italy, Spain and Greece).Footnote 5

2.1 The measures of survival expectations, subjective health and smoking

Survival expectations are measured by a numerical indicator of subjective survival probability (SSP), derived from responses to the following question: ‘What are the chances that you will live to be age T or more?’. Responses are guided by a card that reports a sequence of numbers from 0 (‘absolutely no chance’) to 100 (‘absolutely certain’). SSP is, therefore, a continuous random variable, bounded between 0 and 100. The different target ages (T) depend on the age category of each respondent and reflect the fact that life table survival rates do not decline monotonically but increase within each age class. In this paper we consider target ages of 75, 80, 85, 90 and 95.Footnote 6

The SSP question is the ninth out of eleven questions about the predicted probabilities of future events.Footnote 7 Although a warm-up question is asked to help respondents feel at ease with the notion of probabilities, we further increase the reliability of the SSP responses by excluding individuals who provided non-coherent responses to two questions about the probability that standards of living will be better or worse in the future. These individuals are ‘hidden outliers’ who could bias our estimates because they most likely provide unreliable subjective assessments of any future events.Footnote 8

The elicitation of survival expectations through the use of probabilities is typically preferred to the alternative of utilising qualitative responses. Probabilities allow for a better comparison across individuals than qualitative responses; in addition, qualitative responses (such as ‘likely’ or ‘very likely’) may vary based on cognitive, linguistic and cultural differences and typically suffer from response bias. Furthermore, the internal consistency of probabilities can be assessed (Dominitz and Manski 1997; Manski 2004). Another advantage of using a quantitative measure of survival expectations is that it is comparable with both observed mortality data and probabilities computed from life tables. We estimate a probit model for the probability of dying between waves of the SHARE to examine how average SSP varies between survivors and deceased individuals in our sample.Footnote 9 Results, reported in Table B.1 (available in the electronic supplementary material), suggest that survivors report higher SSP (of about 63) than individuals who die between waves (about 44) and confirm that individuals with higher SSP are more likely to survive to the next wave.Footnote 10 SSP is generally considered to be a better predictor of future mortality than objective life table hazard rates (Peracchi and Perotti 2010; Hurd and McGarry 1995; Hurd et al. 1999; Hurd and McGarry 2002). We compare SSP with the Human Mortality Database period life tables for 2004 and find that the average SSP is lower than the average survival probability calculated from life tables at ages 75, 80 and 85 but higher than that calculated at ages 90 and 95 (see Fig. B.1 in the electronic supplementary material). This might capture the fact that SSP most likely depends on both observable and unobservable individual characteristics, not included in life tables, that influence beliefs (such as an individual’s level of optimism).

The primary disadvantage of using SSP is that heaping occurs at certain focal values, such as 0, 50, 100 and values that end with a zero, as shown in Fig. B.1. About 4.1 % of respondents report having no chance to survive to the target age; 25.2 % report an SSP equal to 50, and 15.8 % of respondents claim that they are certain to survive to the target age. Responses are more concentrated at high values (60, 70, 80 and 90). We address this issue in the econometric modelling by selecting an appropriate parametric distribution for SSP.

Survival beliefs are strictly related to individuals’ perception of their own health. In the SHARE questionnaire, respondents are asked ‘How is your health?’ and can answer ‘excellent’, ‘very good’, ‘good’, ‘fair’ or ‘poor’. These categorical responses are assumed to correspond to a continuous latent variable that measures subjective (or perceived) health. The indicator of self-assessed health (SAH) that can be derived from this question has commonly been used as a measure of general health status (see Deaton and Paxson 1998) and is known to be both a good indicator of morbidity and a powerful predictor of future health and mortality (Doorslaer and Gerdtham 2003). The recent literature has focused on the issue of reporting heterogeneity in SAH, which should be accounted for in measurements of health-related socioeconomic inequalities. In this work, we use a binary version of SAH that takes a value of 1 if reported health is excellent, very good or good and 0 otherwise.Footnote 11

Information on smoking habit in the SHARE mainly derives from the questions ‘Have you ever smoked cigarettes, cigars, cigarillos or a pipe daily for a period of at least one year?’ and ‘Do you smoke at the present time?’, which allow us to build three binary indicators that indicate whether respondents have never smoked, are current smokers or are former smokers at the time of the interview. The question ‘For how many years did you smoke?’ provides the required information to construct a duration time variable that indicates the number of years that each respondent has spent smoking. This variable, which is right-censored at the time of the interview for current smokers (i.e. complete spells of smoking are observed only for former smokers), provides us with the appropriate scope to analyse the hazard of quitting.
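The construction of the smoking variables described above can be sketched in a few lines of Python. This is only an illustration: the function and argument names are hypothetical and do not correspond to SHARE variable names, and the symbols s (ever started smoking) and q (has quit) anticipate the notation of the model section.

```python
def smoking_record(ever_smoked, smokes_now, years_smoked):
    """Build the indicators s, q and the duration t for one respondent.

    Hypothetical argument names: ever_smoked and smokes_now stand for the
    yes/no answers to the two smoking questions, years_smoked for the
    reported number of years spent smoking.
    """
    s = 1 if ever_smoked else 0                    # ever started smoking
    q = 1 if (s == 1 and not smokes_now) else 0    # former smoker
    t = float(years_smoked) if s == 1 else 0.0     # years spent smoking
    return {
        "never": 1 - s,
        "current": s * (1 - q),
        "former": s * q,
        "duration": t,
        # The spell is still in progress at the interview, so only a lower
        # bound on the completed duration is observed for current smokers.
        "right_censored": s == 1 and q == 0,
    }
```

For a current smoker the flag `right_censored` is true, which is what later allows the likelihood to use the survival function rather than the density for that respondent.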

2.2 Descriptive statistics

As reported in Table 1, where the variables used in the analysis are defined, the average SSP is 62 %; about 72.5 % of respondents report that they are in excellent, very good or good health. Current smokers, who have been smoking for an average of 36.6 years, comprise 20 % of the sample. About 28.8 % of smokers, who had smoked for an average of 22.5 years, have quit by the time of the interview.

Table 1 Sample means and variable definitions

Since a 50 % SSP might reflect ‘epistemic uncertainty’ rather than probabilistic thinking (Bruine de Bruin et al. 2002), or it could represent a genuine survival expectation (Hill et al. 2005), Table 1 also compares individuals with SSP values that are equal to 50 % with individuals who report lower and higher probabilities. The proportion of respondents in better health increases monotonically as we move from the sub-sample with SSPs lower than 50 to the sub-sample with higher SSP values. The same trend is found for disability measures (i.e. gali, adl and iadl). Moreover, as we move from the sub-sample with SSPs lower than 50 to the sub-samples with higher SSPs, the proportions of never smokers, sedentary individuals and obese individuals decrease, whereas the proportion of drinkers increases. However, current smokers are more concentrated in the sub-samples with SSPs that are equal to or greater than 50 %, whereas former smokers are concentrated in the sub-sample with the highest SSPs. These figures might reflect a form of cognitive dissonance that causes smokers to underestimate the negative effects of smoking on their survival (Chapman et al. 1983). As expected, longer smoking durations are associated with lower survival expectations for both current and former smokers. Individuals with low socioeconomic status are concentrated in the sub-sample of individuals who report SSPs lower than 50. The latter individuals have experienced a higher proportion of deaths of their mothers, fathers and spouses than individuals with higher expectations. However, individuals who report higher SSP values had spouses who died at a younger age. Overall, the observable differences across these three sub-groups suggest that a response of a 50 % SSP is likely to reflect a genuine answer.

The observed variation in survival expectations should be in accordance with epidemiological evidence regarding the relationships among mortality risk, health status, smoking and socioeconomic status. Table 2 indicates that for each target age T, the average SSP becomes dramatically lower as the health level decreases and as age increases (that is, as the target age moves from T \(=\) 75 to T \(=\) 95). Surprisingly, relative to never smokers, the average SSP at T \(=\) 75, 80 or 85 is higher for former smokers but lower for current smokers. The average SSP at T \(=\) 90 is lower for former smokers than for current and never smokers. In the overall sample, current and never smokers report higher average SSPs than former smokers. As expected, average SSP increases with income and education.

Table 2 Subjective survival probability in different categories of self-assessed health, smoking status, income and education

Additional analysis, reported in the electronic supplementary material, shows that subjective health varies as expected with survival expectations; however, the evidence on the link between smoking habit, survival expectations and subjective health is less clear-cut (Table B.2). A marked relationship between parental death and the respondents’ SSP, as suggested by Hurd and McGarry (1995, 2002), emerges (Table B.3). Observed SSPs are also high for individuals with spouses who are still alive (or died during their fifties). These relationships provide motivation for including parental and spouse mortality as explanatory variables in our empirical model.

3 Model and estimation strategy

We propose a simultaneous recursive (triangular) model for survival expectations \((SE_{i})\), subjective health \((H_{i})\) and smoking behaviour \((S_{i})\); in this model, \(SE_{i}\) at any specific age depends on \(H_{i}\) and \(S_{i}\), whereas \(H_{i}\) depends on \(S_{i}\):

$$\begin{aligned} SE_i&= f_E \left( {H_i, S_i, X_i, \mu _E}\right) \\ H_i&= f_H \left( {S_i, X_i, \mu _H}\right) \\ S_i&= f_S \left( {X_i, \mu _S}\right) \end{aligned}$$

Our model, building on Grossman (1972) and Carbone et al. (2005), assumes that individuals assess their survival by weighing up both the direct effects of smoking on mortality risk (the long-term effects) and the indirect effects of smoking on their health (the short-term effects). We focus on two structural equations for reporting survival expectations and subjective health and a reduced-form equation for smoking duration. These three processes also depend on exogenous individual characteristics \((X_i)\) and unobserved factors (\(\mu _E, \mu _H\) and \(\mu _S\), where the latter vector includes unobserved factors which influence survival expectations, subjective health and individual utility). To address unobservable heterogeneity in the econometric model, it is important to account for reverse causation and for endogeneity biases in the relationships among smoking, health perception and survival beliefs. As a result of reverse causation, the perception of particularly poor health and high mortality risks might decrease the probability and duration of smoking behaviours. The perception of smoking risks can be overestimated if this issue is not appropriately addressed. Subjective health and smoking behaviour can be endogenous to survival expectations, and smoking behaviour can be endogenous to subjective health. Endogeneity might arise from unobservable heterogeneity. In fact, unobserved factors (such as genetics, time preferences, risk aversion and the awareness of the health and mortality risks of smoking) might simultaneously affect the formation of survival expectations, the reporting of health and smoking behaviour.

Our approach to identification relies on the recursive triangular structure of the model, which imposes restrictions on parameters by construction, and on the non-linearity of the functional form of each equation. Exclusion restrictions, namely the omission of at least one variable from one equation, can be used to achieve more robust identification. We impose exclusion restrictions on each equation. In particular, from the two structural equations for survival expectations and subjective health we exclude an indicator of household composition measuring the number of children who live in the household.Footnote 12 This strategy is discussed in detail in the following pages where the smoking duration model is described.

We model survival expectations using a beta regression modelFootnote 13:

$$\begin{aligned} f\left( {y_1 \left| y_2,t,s,q,x_1,\mu \right. }\right) =\frac{\Gamma \left( {\omega +\tau }\right) }{\Gamma \left( \omega \right) \Gamma \left( \tau \right) }y_1^{\left( {\omega -1}\right) }(1-y_1)^{(\tau -1)} \end{aligned}$$
(1)

where \(y_1\) is (rescaled) SSP, \(t\) is smoking duration, \(s\) is a binary indicator that takes a value of 1 if the individual has ever started to smoke, \(q\) is a binary indicator that takes a value of 1 if she has quit smoking and \(\mu \) represents unobservable heterogeneity. \(\Gamma \) is the gamma function, and both \(\omega \) and \(\tau \) are shape parameters that define the precision parameter \(\varphi \). Maximum Likelihood (ML) estimation is used (Paolino 2001). The expected value of SSP is approximated by a logistic model: \(E\left( \mathrm{{SSP}}\right) =\frac{\omega }{\omega +\tau }=\frac{\hbox {exp} (z_1\beta )}{1+\hbox {exp}(z_1\beta )}\), where \(z_1\) includes subjective health, smoking and exogenous covariates \(\left( {x_1}\right) \). In particular, we use two dummy variables to distinctly identify current and former smokers and the interactions of these statuses with the number of years spent smoking.Footnote 14 Exogenous covariates include standard socioeconomic and demographic variables; indicators of lifestyle and objective health; and an indicator of optimism and country dummies. Lifestyle choices (drinking, physical exercise and obesity) can be regarded as health investment decisions that might mitigate smokers’ risk perceptions and, therefore, affect perceived health and survival beliefs.Footnote 15 Our objective measures of health are hospital admissions and tobacco-related diseases in the previous twelve months; they are assumed to influence both SSP and SAH. The SHARE provides valuable information about personality traits that allows us to use a binary indicator of pessimism.Footnote 16 Optimism appears to be a central determinant of not only smoking patterns but also perceptions of the short- and long-term effects of smoking at different stages of the life cycle (more optimistic personalities typically tend to underestimate the risks of smoking).
Other specific controls include indicators of parental and spouse mortality and an indicator of numeracy that captures cognitive ability. Moreover, to account for systematic differences in reporting expectations and the fact that respondents are not asked to evaluate their chances of survival for the same number of years, a continuous indicator of the difference between an individual’s current age and target age is included.
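To make the structure of Eq. (1) concrete, the following Python sketch evaluates the log of the beta density for one observation under a logistic mean. It is a minimal illustration under stated assumptions, not the estimation code behind the reported results: the precision is taken to be \(\varphi =\omega +\tau \) (the usual mean-precision parameterisation), the rescaling of SSP into the open unit interval is a commonly used compression that may differ from the one adopted here, and all function names are hypothetical.

```python
import math

def rescale_ssp(ssp, n):
    """Map SSP in [0, 100] into the open interval (0, 1).

    Assumption: the (y * (n - 1) + 0.5) / n compression, with n the sample
    size; it keeps the focal answers 0 and 100 away from the boundaries.
    """
    y = ssp / 100.0
    return (y * (n - 1) + 0.5) / n

def beta_logpdf(y1, z1, beta, phi):
    """Log of the density in Eq. (1) for one observation.

    The mean follows the logistic form exp(z1'beta) / (1 + exp(z1'beta)).
    Assumption: omega = mean * phi and tau = (1 - mean) * phi, so that
    omega / (omega + tau) equals the mean and omega + tau equals phi.
    """
    index = sum(z * b for z, b in zip(z1, beta))
    mean = 1.0 / (1.0 + math.exp(-index))
    omega = mean * phi
    tau = (1.0 - mean) * phi
    return (math.lgamma(omega + tau) - math.lgamma(omega) - math.lgamma(tau)
            + (omega - 1.0) * math.log(y1)
            + (tau - 1.0) * math.log(1.0 - y1))
```

Summing `beta_logpdf` over respondents gives the contribution of the survival expectations equation to the log-likelihood for a given value of the unobserved heterogeneity term.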

Subjective health is modelled with a probit model that describes the probability of an individual reporting that she is in excellent, very good or good health:

$$\begin{aligned} \Pr \left( {y_2 =1\left| t,s,q,x_2,\mu \right. }\right) =\Phi (z_2 \beta ) \end{aligned}$$
(2)

where \(y_2\) is SAH and \(z_2\) includes smoking and exogenous covariates \((x_2)\).Footnote 17 The exogenous covariates are the same standard variables as in Eq. (1). Disability indicators (i.e. gali, adl and iadl) are included because they might have a direct effect on the perception of health quality and an indirect effect (through SAH) on survival beliefs, as individuals typically adapt quickly to the onset of disabilities and to sudden health changes.
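As an illustration of Eq. (2), the sketch below computes the probit probability of reporting good health and the average partial effect of a binary covariate as the mean difference in predicted probabilities when the covariate is switched from 0 to 1. This is one standard way of computing such effects and is assumed here for illustration only; the function names and the data layout (one list of covariates per respondent) are hypothetical.

```python
import math

def std_normal_cdf(v):
    """Standard normal distribution function Phi(v)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def probit_prob(z2, beta):
    """Pr(y2 = 1 | z2) = Phi(z2'beta), as in Eq. (2)."""
    return std_normal_cdf(sum(z * b for z, b in zip(z2, beta)))

def ape_binary(rows, beta, j):
    """Average partial effect of binary covariate j.

    For each respondent the covariate is set to 1 and then to 0, holding
    the other covariates at their observed values; the differences in
    predicted probabilities are averaged over the sample.
    """
    total = 0.0
    for row in rows:
        on = list(row)
        off = list(row)
        on[j] = 1.0
        off[j] = 0.0
        total += probit_prob(on, beta) - probit_prob(off, beta)
    return total / len(rows)
```

With a constant in the first column and a current-smoker dummy in the second, `ape_binary(rows, beta, 1)` would return the average change in the probability of good SAH associated with current smoking.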

Smoking behaviour is modelled using a two-part specification of the duration model, which implies splitting the sample according to the decision to start smoking (Douglas and Hariharan 1994; Forster and Jones 2001):

$$\begin{aligned}&[\Pr \left( {s=1\left| x_3,\mu \right. }\right) f(t\left| x_4, \mu \right. )]^{s\cdot q} [\Pr \left( {s=1\left| x_3,\mu \right. }\right) S(t\left| x_4, \mu \right. )]^{s\cdot (1-q)}\nonumber \\&\quad [1-\Pr \left( {s=1\left| x_3, \mu \right. }\right) ]^{(1-s)} \end{aligned}$$
(3)

where a probit model for the probability of starting to smoke, \(\Pr \left( {s=1\left| x_3, \mu \right. }\right) =\Phi \left( {x_3\beta }\right) \), describes the first part and a Weibull distribution, with density \(f\left( {t\left| x_4, \mu \right. }\right) =\lambda \alpha t_i^{\left( {\alpha -1}\right) }\exp \left( {-\lambda t_i^{\alpha }}\right) \) and survival function \(S\left( {t\left| x_4, \mu \right. }\right) =\exp \left( {-\lambda t_i^{\alpha }}\right) \), describes the second part of the model, namely the hazard of quitting smoking.Footnote 18 Here \(\alpha \) is the duration dependence parameter, whereas \(\lambda \) is \(\exp \left( {-x_4\beta }\right) \), a function of covariates.Footnote 19 In Eq. (3), \(x_3\) includes only income, education and demographic variables, which are assumed to reflect prior socioeconomic characteristics that influence tobacco consumption; \(x_{4}\) includes standard exogenous variables and an indicator of household composition that measures the number of children who live in each respondent’s household.Footnote 20 It is well known that the presence of children in a household might affect individuals’ decisions about their smoking behaviours (see, e.g. Jarvis 1996); furthermore, it is reasonable to conjecture that the presence of children does not influence health perception and survival beliefs. Instruments for smoking behaviour should be correlated with smoking but uncorrelated with SAH and SSP. In the absence of information about past parental smoking habits, which could be a possible candidate for an instrument as discussed by Balia and Jones (2011), it appears difficult to find a good instrument for smoking behaviour over the life cycle (see Adda and Lechene 2012).
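The three cases in Eq. (3) can be written out as a log-likelihood contribution for a single respondent. The Python sketch below is a hedged illustration with hypothetical function and argument names: never smokers contribute the probability of not starting, former smokers (complete spells) contribute the Weibull density, and current smokers (right-censored spells) contribute the Weibull survival function. The unobserved heterogeneity term is omitted for simplicity.

```python
import math

def std_normal_cdf(v):
    """Standard normal distribution function Phi(v)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def smoking_loglik(s, q, t, x3, beta3, x4, beta4, alpha):
    """Log of the Eq. (3) contribution for one respondent.

    s: 1 if ever started smoking; q: 1 if quit; t: years spent smoking.
    beta3 and beta4 are hypothetical names for the coefficient vectors of
    the starting equation and of the quitting hazard, respectively.
    """
    p_start = std_normal_cdf(sum(x * b for x, b in zip(x3, beta3)))
    if s == 0:
        # Never smoker: probability of not starting.
        return math.log(1.0 - p_start)
    lam = math.exp(-sum(x * b for x, b in zip(x4, beta4)))
    log_surv = -lam * t ** alpha          # log S(t) = -lambda * t^alpha
    if q == 1:
        # Former smoker: complete spell, Weibull density f(t).
        log_dens = (math.log(lam) + math.log(alpha)
                    + (alpha - 1.0) * math.log(t) + log_surv)
        return math.log(p_start) + log_dens
    # Current smoker: right-censored spell, survival function S(t).
    return math.log(p_start) + log_surv
```

With \(\alpha =1\) the Weibull reduces to the exponential distribution, which provides a convenient check of the density and survival terms.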

The sample likelihood of our recursive model, which combines Eqs. (1)–(3), is as follows:

$$\begin{aligned} L_i&= f\left( {y_1\left| y_2,t,s,q,x_1,\mu \right. }\right) \cdot \Pr \left( {y_2 =1\left| t,s,q,x_2, \mu \right. }\right) \cdot \left[ {f\left( {t\left| x_4, \mu \right. }\right) }\right. \nonumber \\&\left. {\Pr \left( {s=1\left| x_3,\mu \right. }\right) }\right] ^{s\cdot q}\cdot \left[ {S\left( {t\left| x_4, \mu \right. }\right) \Pr \left( {s=1\left| x_3, \mu \right. }\right) }\right] ^{s\cdot (1-q)}\cdot \nonumber \\&\left[ {1-\Pr \left( {s=1\left| x_3, \mu \right. }\right) }\right] ^{(1-s)} \end{aligned}$$
(4)

In the presence of unobservable heterogeneity \(\left( \mu \right) \) this likelihood is analytically intractable and an appropriate estimation approach is needed. We propose a finite mixture (FM) model that represents unobservable heterogeneity in terms of a finite number of latent classes, namely the types of individuals in the population from which the observed data are drawn (McLachlan and Peel 2000).Footnote 21 Given class membership, response variables are assumed to be independent of one another, so that a single response per individual is sufficient for identifying the model with cross-sectional data. Handling endogeneity in non-linear models can be challenging (Wooldridge 2002; Cameron and Trivedi 2005); FM models have been widely applied for this purpose because they assume the existence of correlated unobservable heterogeneity (see, e.g. Mroz 1999; Gilleskie and Strumpf 2005; Deb and Trivedi 2006).Footnote 22 A maximum likelihood (ML) estimation of our FM model is achieved through the use of the expectation-maximisation (EM) algorithm (see Appendix A in the electronic supplementary material). In principle, the non-linearity of the functional forms allows for each component of the mixture to be identified.
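The role of the latent classes can be illustrated with a short Python sketch of a generic finite mixture computation. It assumes that the class-conditional log-likelihood of Eq. (4) has already been evaluated for each respondent and class, and it shows only the mixing step and the update of the class probabilities; it is not the algorithm of Appendix A, and the function names and the data layout are hypothetical.

```python
import math

def mixture_loglik_and_posteriors(class_logliks, class_probs):
    """Mix class-conditional likelihoods over the latent classes.

    class_logliks[i][k] is the log of L_i in Eq. (4) for respondent i,
    evaluated at the parameters of class k; class_probs[k] is the prior
    probability of belonging to class k. Returns the sample
    log-likelihood and the posterior class probabilities (the E-step).
    """
    total = 0.0
    posteriors = []
    for logliks in class_logliks:
        terms = [math.log(p) + ll for p, ll in zip(class_probs, logliks)]
        m = max(terms)  # log-sum-exp shift for numerical stability
        log_mix = m + math.log(sum(math.exp(v - m) for v in terms))
        total += log_mix
        posteriors.append([math.exp(v - log_mix) for v in terms])
    return total, posteriors

def update_class_probs(posteriors):
    """M-step for the class probabilities: average posterior weights."""
    n = len(posteriors)
    k = len(posteriors[0])
    return [sum(row[j] for row in posteriors) / n for j in range(k)]
```

In a full EM iteration the posterior weights would also be used to re-estimate the class-specific parameters of each equation by weighted maximum likelihood; that step is omitted here.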

4 Results

4.1 Survival expectations

We discuss results obtained from the two-class model. Estimated beta regression coefficients and significance levels, reported in Table 3, are transformed into odds ratios, which are interpreted as the percentage change from the average SSP of the baseline individual.Footnote 23 Table 4 shows that the baseline individual expects to have an 82 % probability of survival at age 95 in the first latent class (class 1) and 44 % in the second latent class (class 2). Pessimism diminishes baseline survival probabilities, particularly in class 2 where pessimistic individuals exhibit average decreases in SSP of about 35 % (only 15 % among class 1 individuals). As expected, the average SSP decreases in both classes as the distance between the current and target age increases, and increases as the age class becomes younger. In class 1, the average SSPs are 139, 101 and 20 % higher than the baseline SSP for individuals who evaluate their survival at ages 75, 85 and 90; this effect is amplified in class 2 (the average SSPs are about 287, 184 and 33 % higher than the baseline SSP). We also find evidence of a statistically significant gender effect in reporting survival expectations: the baseline SSP decreases by about 8 % for males in both classes.

Table 3 Estimated coefficients from a Finite Mixture Model with two classes \((K = 2)\) for subjective survival probability, subjective health and smoking
Table 4 Estimated odds ratios for subjective survival probability and average partial effects for subjective health

As expected, we find that SAH explains most of the observed variation in SSP. Better perceived health produces SSPs that are greater than in the baseline case by about 38 % for class 1 and 78 % for class 2. This large variation might be attributed to a greater diversity of true states of health for individuals in class 2. In class 1, the average SSP is significantly higher for former smokers (by 14.7 %) than for the baseline individual (a never smoker). The indicators of time spent smoking are statistically significant and negative in class 1, implying that SSP decreases with smoking duration but at a diminishing rate for each additional year spent smoking. This effect is larger for current (\(-\)6 %) than for former smokers (\(-\)4 %). For longer durations, differences between current and former smokers tend to disappear: e.g. for durations of longer than 20 years the percentage change in SSP becomes very small (about \(-\)0.2) for both types of smokers, thus indicating the cognitive dissonance of the most addicted smokers.

Furthermore, the positive effect of quitting dominates the negative effect of smoking duration for former smokers. This might suggest that former smokers do not internalise the negative effects of their prior smoking on perceived mortality risk (an indication of a myopic attitude); instead, they reward themselves for quitting by assuming that they have better chances of living for a longer duration than their past smoking behaviours would appear to indicate. This result appears to suggest that the long-term effects of smoking are regarded as reversible. In class 2, only the indicator of smoking duration for former smokers has a statistically significant and positive coefficient. This finding can again be interpreted as a sign of the cognitive dissonance of smokers, who tend to think that lifestyles do not influence their mortality risk. The odds ratios measure a 4.4 % increase in the average SSP of former smokers relative to the baseline individual (a never smoker), but this effect diminishes as the duration of time spent smoking increases. Although the other smoking indicators are not significant, the sign of the coefficients suggests that the occurrence of any prior smoking behaviour decreases the chances of reporting a low SSP.

4.2 Subjective health

The impact of smoking and other individual characteristics on perceived health is measured by the average partial effects (APEs) of covariates on the probability of reporting a good, very good or excellent health status.Footnote 24 Table 4 indicates that pessimistic personalities tend to report poor perceived health in both latent classes. However, the effect of pessimism is stronger in class 2: pessimistic individuals are 11 % less likely than other individuals to report that their SAH is good (6.4 % in class 1). The probability of reporting good SAH increases monotonically with income and education and is higher for retired and employed individuals than for unemployed individuals; the APEs of income quartiles are larger in class 1, whereas the effects of education and occupational status are larger in class 2. Disability indicators have the expected negative effect and the APEs are larger in class 2, where, for example, the probability that SAH \(=\) 1 decreases by 33 % as the number of gali limitations increases. Individuals who have been hospitalised during the prior year tend to report worse health than other individuals: this effect is larger in class 2 (11 %) than in class 1. For both classes, the presence of tobacco-related diseases is related to a 10 % lower probability of reporting good health. All lifestyle indicators are statistically significant in class 1; only alcohol consumption is associated with a higher probability of good SAH. In both classes, obesity and lack of physical exercise are associated with lower probabilities of good SAH. The effect of a sedentary lifestyle is larger in class 1 than in class 2.
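As an illustration of how an APE for a binary covariate can be obtained, the following minimal sketch averages, over a sample, the change in the predicted probability when a dummy variable is switched from 0 to 1. It assumes a probit link for the SAH equation, which this section does not confirm; the function names, the linear indexes and the coefficient are hypothetical and are not taken from the estimates in Table 4.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ape_binary(indexes_without_dummy, dummy_coefficient):
    """Average partial effect of switching a dummy covariate from 0 to 1.

    indexes_without_dummy: linear index x'b of each individual, evaluated
    with the dummy set to 0 (hypothetical values, assumed probit link).
    """
    effects = [
        normal_cdf(xb + dummy_coefficient) - normal_cdf(xb)
        for xb in indexes_without_dummy
    ]
    return sum(effects) / len(effects)


# Hypothetical sample of three individuals and a negative coefficient,
# e.g. for an indicator such as pessimism.
sample_indexes = [0.2, 0.6, 1.0]
ape = ape_binary(sample_indexes, dummy_coefficient=-0.3)
```

Because the partial effect is computed for every individual and then averaged, the APE depends on the distribution of the other covariates in the sample, which is one reason why the same covariate can show different APEs in the two latent classes.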

The probability of reporting good health is significantly lower for current smokers (by about 10 % in both classes). By contrast, this probability is higher for former smokers (5 % in class 1 vs. 9 % in class 2). Time spent smoking is significant and negative only when interacted with past smoking behaviour, producing a reduction of about 3 % in the probability that SAH \(=\) 1 relative to never smokers. Because the relationship between subjective health and smoking duration is non-linear, this effect decreases as smoking duration increases. This evidence can again be interpreted as a signal that former smokers hold myopic attitudes and believe that the short-term effects of smoking are offset by the decision to quit (in other words, that the effects of smoking are reversible). The estimated APEs indicate that this finding is slightly stronger in class 2 than in class 1.

Table 5 Post-estimation predictions in hypothetical scenarios

Table 5 shows that the average predicted SSP from the FM model is higher in class 2 than in class 1 (66 vs. 58 %). However, the average predicted probability of SAH \(=\) 1 is slightly higher in class 1 than in class 2 (73 vs. 71 %). We would have expected to find, instead, better SAH in the class of individuals with higher SSP. For each latent class, average predicted SSP and SAH are compared with predictions calculated for hypothetical scenarios involving various smoking behaviours. In the ‘smoking-free scenario’, nobody has ever smoked; in the ‘quitting scenario’, all individuals have quit smoking; and in the ‘smoking scenario’, all individuals are currently smoking. Regardless of the smoking scenario, the average predicted SSP is always higher in class 2, with the highest value (66.9) in the ‘smoking-free scenario’. In class 1, the highest average value of predicted SSP (57.9) is associated with the ‘quitting scenario’. For both classes, the highest average predicted probability of SAH \(=\) 1 is calculated for the ‘quitting scenario’. In particular, this probability is about 13 and 17 percentage points greater in the ‘quitting scenario’ than in the ‘smoking scenario’, and about 6 and 10 percentage points greater than in the ‘smoking-free scenario’, for class 1 and class 2, respectively. For both classes, the lowest average predicted probability of SAH \(=\) 1 occurs in the ‘smoking scenario’. These simulations appear to suggest that quitting smoking alters the risk perception of smokers: those who have quit tend to report better evaluations of survival expectations and health status than current smokers.
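The logic of these scenario comparisons can be sketched as follows: the smoking indicators of every individual are overwritten with the values implied by a scenario, all other covariates are left unchanged, and the predicted probabilities are averaged. The sketch is only illustrative. It uses a toy logistic prediction rule in place of the FM model, ignores smoking duration, and all variable names, coefficients and data rows are hypothetical.

```python
import math

# Assumed mapping from each scenario to the smoking indicators it imposes.
SCENARIOS = {
    "smoking-free": {"current_smoker": 0, "former_smoker": 0},
    "quitting": {"current_smoker": 0, "former_smoker": 1},
    "smoking": {"current_smoker": 1, "former_smoker": 0},
}


def predicted_probability(row, coefs):
    """Toy logistic prediction of SAH = 1 (a stand-in for the FM model)."""
    index = coefs["const"] + sum(coefs[name] * row[name] for name in row)
    return 1.0 / (1.0 + math.exp(-index))


def scenario_average(rows, coefs, overrides):
    """Average prediction after imposing the scenario on every individual."""
    counterfactual = [{**row, **overrides} for row in rows]
    total = sum(predicted_probability(r, coefs) for r in counterfactual)
    return total / len(counterfactual)


# Hypothetical coefficients and sample, for illustration only.
coefs = {"const": 0.5, "current_smoker": -0.4, "former_smoker": 0.3, "obese": -0.5}
rows = [
    {"current_smoker": 1, "former_smoker": 0, "obese": 0},
    {"current_smoker": 0, "former_smoker": 1, "obese": 1},
    {"current_smoker": 0, "former_smoker": 0, "obese": 0},
]
averages = {name: scenario_average(rows, coefs, o) for name, o in SCENARIOS.items()}
```

With these hypothetical coefficients the ‘quitting scenario’ yields the highest average prediction and the ‘smoking scenario’ the lowest, which mirrors the qualitative ordering reported for SAH but says nothing about the magnitudes in Table 5.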

4.3 Smoking behaviour

Results in Table 3 show that the propensity to become a smoker is higher for men in both classes. In class 1, this propensity is higher for individuals from the youngest age cohort and is positively related to income and education, but coefficients do not show a clear-cut socioeconomic gradient. Estimates reflect heterogeneity across classes with respect to the hazard of quitting. Smoking duration is predicted to be significantly shorter for richer, retired and employed smokers, particularly in class 1. Educated individuals, notably in class 2, smoke for longer durations but this positive effect is predicted to decrease as education levels increase. In both classes, smokers with an unhealthy lifestyle in terms of drinking and exercise tend to quit later, whereas smokers with bad dietary habits quit more quickly. The effects of obesity and physical exercise are higher in class 2. Class 1 smokers who report that they have experienced tobacco-related diseases tend to quit earlier, but this effect is not observed in class 2. Marriage appears to protect smokers from a long history of tobacco use and this effect is more prominent in class 1. The indicator of pessimism is statistically significant and positive only in class 1, suggesting that pessimistic smokers, who face a low opportunity cost of smoking, are predicted to quit later in life. As reported in Table 5, the predicted probability of starting to smoke is higher in class 2 (0.52) than in class 1 (0.47), and the estimated average smoking duration is also longer in class 2 (39.3 years) than in class 1 (38 years).

The FM model is first estimated with two latent classes \((K=2)\); subsequently, the number of classes is increased, and the statistical fit of the alternative models is compared using information criteria. The FM models are also compared with the standard ML model, which relaxes the assumption of unobservable heterogeneity. Table 6 shows that the FM model with \(K=2\) has a better fit than the ML model. Only the Akaike information criterion (AIC) suggests that the FM model performs better with \(K=3\). The consistent AIC (CAIC) and the Bayesian information criterion (BIC), which penalise the number of estimated parameters more severely than the AIC, are lower in the model with \(K=2\), thus supporting our choice.
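To make the comparison concrete, the sketch below computes the three criteria from a model's log-likelihood using their textbook definitions (AIC \(= -2\ln L + 2p\), BIC \(= -2\ln L + p\ln N\), CAIC \(= -2\ln L + p(\ln N + 1)\)). These definitions are assumed here because the exact formulas are not stated in this section, and the log-likelihoods, parameter counts and sample size are hypothetical values chosen only to reproduce a pattern in which the AIC favours \(K=3\) while the BIC and CAIC favour \(K=2\).

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC and CAIC for a fitted model (lower values are better)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1.0),
    }


# Hypothetical fits, not the values reported in Table 6.
two_class = information_criteria(log_likelihood=-10000.0, n_params=60, n_obs=5000)
three_class = information_criteria(log_likelihood=-9950.0, n_params=90, n_obs=5000)

for name in ("AIC", "BIC", "CAIC"):
    preferred = "K=2" if two_class[name] < three_class[name] else "K=3"
    print(f"{name}: K=2 {two_class[name]:.1f}, K=3 {three_class[name]:.1f}, prefers {preferred}")
```

The per-parameter penalty is 2 for the AIC but \(\ln N\) and \(\ln N + 1\) for the BIC and CAIC, so in any sample with more than a handful of observations the latter two require a larger improvement in the log-likelihood before an additional latent class is preferred.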

Table 6 Comparison of standard Maximum Likelihood model and Finite Mixture models using information criteria

4.4 Posterior analysis

The estimation results reveal that there are two unobservable populations that differ in the hazard of quitting and in the ways in which they formulate survival expectations and report subjective health. We can use the estimated class membership probability to describe the types of individuals who might belong to each class. We utilise a cut-off probability of 0.5 to assign each individual to the class associated with the larger posterior probability and then define a binary indicator of class membership \((d_{i})\) that takes a value of 1 if the posterior probability of belonging to class 1 is above the cut-off probability. Because \(\sum _{k=1}^{K} \pi _{ik} = 1\), this approach is equivalent to stating that individual \(i\) belongs to latent class 1 if \(\pi _{i1}\) is larger than \(\pi _{i2}\). We estimate a probit model for the class membership probability conditional on the examined covariates and outcomes.Footnote 25 Table 7 shows that individuals who report higher SSPs and current or former smokers tend to have a lower probability of belonging to class 1 (which generally represents less addicted smokers than class 2). Current smokers have a 13 % lower probability of being in class 1 than never or former smokers. The following characteristics also appear to be associated with a significantly lower probability of membership in class 1: a pessimistic personality; retirement; married or separated marital status; a lack of regular exercise; at least one hospital admission in the last year; and a deceased father. Individuals who report better SAH but experience limitations in their usual activities due to health problems (gali) are more likely to belong to class 1. The probability of belonging to class 1 increases significantly with age. This effect is smaller for individuals in age classes 71–75 and 76–80, thus implying that the oldest individuals are more likely to belong to the class of more addicted smokers.
This finding, which is in line with Adda and Lechene (2012), could be interpreted as representative of ‘rationally myopic’ behaviour. For elderly individuals, who have relatively little remaining lifespan, the opportunity cost of smoking is low due to the minimal effects on mortality risk.
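The class-assignment rule used in this posterior analysis can be sketched as follows. The posterior probabilities are obtained here with the standard Bayes-rule formula for finite mixtures, \(\pi _{ik} = p_{k} f_{k}(y_{i}) / \sum _{j} p_{j} f_{j}(y_{i})\), which is assumed rather than taken from this section; the function names, mixing weights and likelihood values are hypothetical.

```python
def posterior_probabilities(mixing_weights, class_likelihoods):
    """Posterior class probabilities for one individual via Bayes' rule.

    mixing_weights: prior class probabilities p_k (sum to 1).
    class_likelihoods: likelihood of the individual's outcomes in each class.
    """
    joint = [p * f for p, f in zip(mixing_weights, class_likelihoods)]
    total = sum(joint)
    return [value / total for value in joint]


def class_indicator(posteriors, cutoff=0.5):
    """Return d_i = 1 if the posterior probability of class 1 exceeds the cut-off."""
    return 1 if posteriors[0] > cutoff else 0


# Hypothetical individual: class 1 is more common a priori, but the observed
# outcomes are more likely under class 2.
posteriors = posterior_probabilities([0.6, 0.4], [0.02, 0.05])
d_i = class_indicator(posteriors)
```

With two classes the posteriors sum to one, so comparing \(\pi _{i1}\) with 0.5 gives the same assignment as comparing \(\pi _{i1}\) with \(\pi _{i2}\); in this hypothetical case the individual is assigned to class 2.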

Table 7 Class membership probability

5 Conclusion

This work aims to assess how smokers’ risk perceptions affect survival expectations and subjective health. The analysis investigates perceptions of the short- and long-term effects of smoking and examines whether smokers believe that the detrimental effects of smoking can be reversed. The empirical literature that addresses this topic, primarily based on US surveys, provides mixed evidence regarding the relationship between smoking and risk perception. In this study, we use data on elderly Europeans from the SHARE to explore the formation of individuals’ self-assessments of their own mortality risk and health. We estimate a joint recursive model for survival expectation, subjective health and smoking duration within a finite mixture approach, which addresses the potential endogeneity of the health and smoking variables and the reverse causation that arise from unobservable heterogeneity. We provide evidence of heterogeneity in assessing subjective survival probability and reporting subjective health, and show that two unobservable types of individuals can be identified in the examined population. After controlling for observed characteristics, including optimism, it appears that the remaining differences in both the smoking patterns and perceptions of smoking effects between the two types of individuals may reasonably depend on unobservable factors, such as information constraints, which may be regarded as limited awareness of the consequences of smoking; risk aversion; time preferences; life experiences; genetics; true health and true mortality risk (unobservable frailty).

We find that the population of the first type is characterised by lower average survival expectation and higher average probability of reporting good health and includes a smaller proportion of smokers than the population of the second type. Because of longer smoking histories, we can refer to the latter population as the individuals who are more addicted to smoking. For the population of the first type, smoking is an important predictor of survival expectations, and smokers incorporate the (long-term) risks of smoking into their assessments of survival probabilities. However, for both current and former smokers, this effect vanishes as smoking duration increases (i.e. for the most addicted individuals). Furthermore, we find that among individuals of the first type, former smokers are present-oriented (myopic) and believe that smoking effects are reversible. In fact, the negative effect of smoking duration on survival beliefs is counterbalanced by a decision to quit smoking; as a result, former smokers largely overestimate their survival probabilities. By contrast, individuals of the second type do not incorporate the negative effects of smoking duration into their survival beliefs; among these individuals, former smokers significantly overestimate their survival expectations, although this occurs to a diminishing degree as smoking duration increases. In both unobservable types of individuals, subjective health is significantly lower for current smokers. Time spent smoking produces a reduction in subjective health only for former smokers, but this effect goes to zero as duration increases. Furthermore, for former smokers, the short-term effects of smoking duration are subjectively offset by the decision to quit smoking. This evidence is common to both classes and is stronger for individuals in the second population type. 
The observed attitudes regarding risk perception might relate to the fact that smoking involves low opportunity costs for individuals with lower life expectancies; this consideration may be particularly salient for the oldest individuals, who are more concentrated in the second population.

Policy makers who are concerned with the prevention of health problems and the promotion of healthy lifestyles might be interested in knowing whether and to what extent individuals understand the morbidity and mortality consequences of smoking. This paper demonstrates that despite the existence of various national anti-smoking campaigns in Europe, smokers are not necessarily fully aware of the true health risks that smoking produces. This raises the question of whether information about smoking should be more widely disseminated or whether, to borrow a concept from behavioural economics, it should be made more salient in order to strengthen its impact on reducing smoking.

Furthermore, the perception of the reversibility of the smoking effects reflects the myopic behaviours of former smokers. This issue presents a second question of whether the dissemination of more detailed information is required. Results suggest that the targeting of heterogeneous types of individuals can be an important element of smoking-related policy interventions. Our analysis also reveals that the numerical indicator of subjective survival probability is a complete measure of survival expectations that captures both observable and unobservable factors that influence individual beliefs. The use of this indicator in models of health investments over the life-cycle might be preferred to the use of mortality hazard rates estimated from life tables; this prospect provides ample scope for future research work.