1 Introduction

The current evidence regarding the interplay between social isolation and health indicates that such isolation, sometimes the subjective side of it in the form of loneliness, can be devastating to a person’s health. According to recent studies, feeling lonely and being socially isolated (i.e., lacking social connections) are at least as bad for one’s health as being obese or a heavy smoker (Holt-Lunstad et al. 2015). Ever since social sciences (Berkman and Syme 1979) revealed the importance of social ties as predictors of survival for persons aged under 70 at baseline, the impact of loneliness and social isolation on health and mortality has been increasingly investigated in public health, medicine, epidemiology, gerontology, and other health-related disciplines. The bulk of evidence has pointed at social isolation and loneliness being linked to a variety of physical and mental conditions such as high blood pressure, cardiovascular diseases, obesity, a weakened immune system, anxiety, depression, fatigue, pain, cognitive decline, Alzheimer’s disease, and even death (Steptoe et al. 2013; Cohen et al. 1997; Shankar et al. 2013; Heffner et al. 2011; Yu et al. 2020; Teguo et al. 2016; Powell et al. 2021). In contrast, the economics literature on the topic has remained relatively scarce.

According to one meta-analysis of the scientific literature on the subject (Holt-Lunstad et al. 2015), social isolation, that is, having few network ties, increases the risk of death over 7 years by approximately 30%, while the effect of loneliness is estimated at approximately 26%, and living alone appears to be the strongest risk factor, with a weighted average effect of 32% (footnote 1). That meta-analysis also reviews a number of previous studies showing that individuals with less social connection have disrupted sleep patterns, altered immune systems, more inflammation, and higher levels of stress hormones. Another recent meta-analysis (Valtorta et al. 2016) found that social isolation increases the risk of heart disease by 29% and stroke by 32%.

These meta-analyses report results from a variety of articles that do not share a common level of rigor. For example, out of the 70 studies analyzed in Holt-Lunstad et al. (2015), 31 are fully “unadjusted,” meaning that they include no control of any sort, and 20% of the remaining studies fail to control for baseline health, which, according to the meta-analyses, radically changes the findings. The remaining multivariate analyses that do control for baseline health and other factors rarely have background data on individuals and are seldom based on random samples since participants are often recruited from a medical setting. Even when studies recruit participants from the general community, they rarely collect as much information as gathered in multi-disciplinary surveys such as the data used in our study and cannot claim to be fully representative.

We rely on longitudinal data on a large representative population across 21 European countries to investigate the association between social isolation and mortality and health (footnote 2). Data from the Survey of Health, Ageing and Retirement in Europe (SHARE) enable us to follow individuals across time and mitigate part of the endogeneity concerns; they provide us with a comprehensive set of health indicators and social isolation and loneliness variables, which are observed every 2 years over 18 years, from 2004 to 2021. A few studies have exploited similar data, such as the American Health and Retirement Study (HRS) and its UK equivalent, the English Longitudinal Study of Ageing (ELSA), to look at correlations between social isolation, loneliness, and mortality or a specific health outcome. One noteworthy study, by Steptoe et al. (2013), uses ELSA to investigate how social isolation and loneliness at baseline are associated with mortality over a 7-year follow-up period.

Relative to those studies that use a representative longitudinal dataset, we make several contributions. First, we present a rigorous analysis of the effects of social isolation (SI) at baseline on mortality over a long follow-up period, in a harmonized multi-country framework, and produce novel findings on the heterogeneous effects of SI across country groups.

Second, in addition to the baseline health controls included in the main mortality analysis, we explore health behaviors, healthcare utilization, loneliness, and a poor social network as additional potential mediators of the relationship between SI and mortality.

Third, we run linear regressions to study the dynamic “value-added” effects of SI on health and other mediator outcomes, that is, the effect of baseline SI on all observable dimensions of health, health behavior, and healthcare utilization in each future wave when controlling for their baseline levels. The Cox analysis is then combined with the dynamic regressions to compute a simple metric of how much of the SI effect on mortality can be attributed to the one-wave-ahead detrimental effect of baseline SI on each dimension of health, health behavior, and healthcare utilization. This metric can serve as a guide toward understanding where it is most necessary to intervene in order to curb the detrimental effects of social isolation.

We find a significant and robust association between our social isolation index and mortality. A major finding is that the effects of social isolation on mortality are heterogeneous across countries. The impact of social isolation on the elderly may possess a cultural and/or institutional dimension, which should be examined in a cross-national framework. A much stronger association is found between social isolation and mortality in Eastern countries. While all of our pooled-country estimates range from a 20% to a 30% increase in the mortality hazard for the socially isolated, this number jumps to 45% for Eastern countries. The same objective measure of social isolation does not produce the same health consequences across countries, even though the data are harmonized, which suggests that public health policies have a role to play in moderating the health risks posed by social isolation.

Remarkably, controlling for loneliness barely weakens the relationship between our social isolation index and mortality, which is also true in Steptoe et al. (2013). This suggests that loneliness cannot be the only mechanism through which social isolation affects health. While we find that socially isolated individuals are more likely to adopt a worse lifestyle (particularly in terms of physical inactivity), the inclusion of unhealthy behavior measures at baseline in our regressions does not affect the coefficient on social isolation. Likewise, healthcare utilization does not appear as a major channel for the effect of social isolation on future health. On the one hand, we find that socially isolated individuals do not use more healthcare services than their non-socially-isolated counterparts, with the exception of prescription drug consumption. This holds in spite of the fact that the health of the socially isolated continues to worsen, which suggests that social isolation might inhibit the use of some healthcare services. However, on the other hand, the inclusion of measures of current healthcare use in our regressions incurs no effect on the coefficient on social isolation.

We also find that social isolation at baseline correlates with worsening health in the subsequent waves along all the dimensions under observation, and these effects are persistent. Up to 13% of the effect of baseline social isolation on mortality can be attributed to the combined one-wave-ahead impact of social isolation on increased frailty, reduced cognitive function, and increased smoking.

Given our careful empirical strategy and multiple robustness checks, we believe that a causal interpretation of our findings is plausible. However, we acknowledge the need to be cautious regarding causality because our study is observational and does not make use of any so-called natural experiment (footnote 3). As an additional check to provide support for a causal relationship, we use Oster’s test for selection on unobservables (Oster 2019). We also use education as a benchmark for the health effects of social isolation. We find that the education gradient in mortality is smaller than the social isolation gradient, but the association of education with future health is stronger than the association we find for social isolation in dynamic value-added regressions.

The rest of the paper is structured as follows: Section 2 establishes a link between social isolation at baseline and future mortality while controlling for a rich set of socioeconomic and health indicators that cover physical, functional, mental, and cognitive health at baseline. In Section 3, additional potential mediators of the social isolation-mortality association are explored by adding them as baseline controls in the Cox regressions. Section 4 presents the dynamic regressions of health and other mediating variables on the SI index, with the same baseline controls as in the Cox regression and the baseline value of the mediating variable. We present a metric to measure how much of the SI effects on mortality can be attributed to the effect of SI at baseline on each dimension of health, health behaviors, and healthcare utilization, one survey wave ahead. Section 5 discusses the causality challenge, Oster’s test of selection on unobservables, and education as a benchmark of the SI effects on mortality and health outcomes. Lastly, Section 6 provides the conclusions.

2 Social isolation and mortality

2.1 Data

We use longitudinal survey data from the Survey of Health, Ageing and Retirement in Europe (SHARE) that cover 8 waves from 2004 to 2019, plus the two “Corona” waves of Spring 2020 and Summer 2021, across 20 European countries plus Israel. This multidisciplinary and cross-national panel database holds micro data on health, socio-economic status, and social and family networks on approximately 140,000 individuals aged 50 and over (around 530,000 interviews). Our sample consists of 67,676 non-institutionalized individuals (i.e., those not living in a nursing home, at least at baseline), corresponding to 243,515 observations, whom we observe at least twice over the 10 waves (the second observation might be an “exit” interview, i.e., a post-mortem interview), and for whom information on the set of variables used herein is available (see Table 2 for more details) (footnote 4). Individuals enter the study in any wave between wave 1 (in 2004) and wave 6 (in 2015) and are followed for a minimum period of 24 months, due to the minimum follow-up restriction imposed (footnote 5). The maximum follow-up time is 207 months (i.e., 17 years and 3 months). The median follow-up time is 79 months (i.e., 6 years and 7 months). Over the course of our study, 9802 deaths are observed, which corresponds to 14% of our sample.

In order to carry out our empirical strategy, we create a set of health indicators that covers physical, functional, mental, and cognitive health. Physical health is investigated along several lines: objective (i.e., number of diagnosed chronic diseases) and subjective (self-assessed health status); focusing on functional health (Activities of Daily Living (ADLs) and Instrumental Activities of Daily Living (IADLs)); and constructing an index of frailty (Fried et al. 2001), which aggregates unintentional weight loss, self-reported exhaustion, weakness (grip strength), difficulties in walking, and low physical activity. Mental health is summarized by the EURO-D score, which is the sum of 12 items that can be relied on to diagnose depression in older adults, such as suicidal thoughts, sadness, no hopes for the future, excessive guilt, sleep issues, fatigue, irritability, loss of appetite, tearfulness, concentration issues, lack of enjoyment, and difficulties in maintaining interest in things. Cognitive functioning is an average of immediate and delayed word recall (i.e., the number of words an individual is able to remember out of a list of ten words). We also make use of the so-called “exit interviews,” which allow us to keep track of respondents’ death.

Regarding the key regressor, following Steptoe et al. (2013), we create a social isolation index including information on whether the individual lives alone, has infrequent/no social contact with their children (less than weekly contact, or does not have children), and does not participate in any social activities (including political, sports, educational, and voluntary work activities) (footnote 6). The resulting index lies between 0 and 3, with a mean of 1.03 and a standard deviation of 0.79 for our sample at baseline. Higher values indicate increasing social isolation. Table 1 displays the distribution of our SI index. In our sample, 20% of individuals live alone, 61% participate in no social activities, and 20% have infrequent/no contact with their children (13% due to not having children). More than half of our sample is massed at SI = 1, and very few individuals have the maximum value of 3 (4%). Interestingly, the jump between a 0 and a 1 value of the social isolation index is largely driven by non-participation in social activities, while the jump from 1 to 2 is due to a shift in both of the other two components, namely living alone and having infrequent/no contact with children (or not having any children).
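The construction of the index can be illustrated with a minimal sketch. The record layout and field names below are hypothetical (they are not SHARE variable names), and the coding of each item is an assumption based only on the verbal definition above: one point for living alone, one for less-than-weekly contact with children or having no children, and one for participating in no social activity.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    """Hypothetical record; field names are illustrative, not SHARE variables."""
    lives_alone: bool
    has_children: bool
    weekly_contact_with_children: bool  # ignored when has_children is False
    participates_in_social_activity: bool


def si_index(r: Respondent) -> int:
    """Count the isolation items that apply (0 = none, 3 = all three)."""
    # Childless respondents are coded as having infrequent/no contact.
    infrequent_contact = (not r.has_children) or (not r.weekly_contact_with_children)
    no_activity = not r.participates_in_social_activity
    return int(r.lives_alone) + int(infrequent_contact) + int(no_activity)


def si_binary(r: Respondent) -> int:
    """Binary version: 1 if the respondent has at least one isolation point."""
    return int(si_index(r) > 0)


examples = [
    Respondent(False, True, True, True),    # no item applies
    Respondent(False, True, True, False),   # only non-participation applies
    Respondent(True, False, False, False),  # alone, childless, no activity
]
scores = [si_index(r) for r in examples]
flags = [si_binary(r) for r in examples]
print(scores, flags)
```

Because childless respondents always score at least one point under this coding, the binary indicator equals 1 for all of them.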

Table 1 Distribution of the social isolation (SI) index and its components
Table 2 Descriptive statistics on observables, by social isolation status

The short form of the R-UCLA loneliness scale is also utilized. This was created by aggregating 3 items (how much of the time they felt a sense of being left out, the lack of companionship, and isolation) into one single measure of loneliness (footnote 7). The effects on mortality, both of objective social isolation and of the perception of social isolation, can therefore be studied.

Table 2 provides descriptive statistics at baseline (when individuals enter the data) on the SI-related variables, the aforementioned health outcomes, and the socio-demographic controls that are to be used in our analyses. It also highlights the major differences between two populations: those who are not socially isolated at all according to our index, and those with at least 1 social isolation point. The more socially isolated exhibit worse health measurements along all dimensions (note that cognitive functioning is the only health measure where a higher value means better health), and are more likely to be female, less educated, and childless than the non-socially-isolated. Regarding gender and social isolation, the reality is more complex, since women become socially isolated at a higher rate than men as they grow older (past 60), which is likely to reflect gender imbalances in the probability of being widowed after the age of 60. We also add an employment indicator since whether one is working or retired may explain a major part of social isolation for the over-50s population.

Figures 1 and A2 (the latter in the Appendix) show that there is a great deal of variation in social isolation across countries, whereby Eastern and Southern European countries have the highest average levels of social isolation, and Western and Northern European countries the lowest. Since the correlation between loneliness and social isolation at the individual level is a mere 24%, it seems that objective and subjective measures of social isolation capture different aspects of social experience, as suggested in Hughes et al. (2004). As a consequence, several countries, such as Italy, Greece, and Israel, have very high levels of loneliness compared to other countries with a similar level of social isolation, while others, such as Switzerland, Austria, and Denmark, have very low rates of loneliness in comparison with other countries with a similar level of social isolation.

Fig. 1 Social isolation and loneliness across Europe (18 countries)

2.2 Main results from Cox models

We first look at how social isolation at baseline is associated with future mortality, by estimating Cox proportional hazards regression models, from the date an individual enters the data (February 2004 at the earliest) until that individual dies or is followed up in subsequent surveys up to July 2021. Out of the 67,676 followed individuals, 9802 died over the period. Although our longitudinal data would allow us to let our explanatory variables vary across time, we keep them fixed at baseline in order to introduce some distance between the measurement of covariates, more particularly social isolation, and the outcome of mortality. In our preferred specification, a minimum of 24 months of follow-up is imposed between the moment social isolation is measured, and mortality, following the robustness checks performed in Steptoe et al. (2013): “we repeated the analysis excluding deaths within 24 months of baseline, and the results were very similar results to those for the full cohort, suggesting that existing terminal illness is not the primary explanation.” This specification helps to alleviate concerns about reverse causality.
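The minimum follow-up restriction can be sketched as a simple filter. The tuple layout below is hypothetical, and dropping everyone with fewer than 24 months of follow-up is one possible reading of the restriction described above, not necessarily the exact procedure used for the estimates.

```python
MIN_FOLLOW_UP_MONTHS = 24  # threshold stated in the text


def apply_min_follow_up(records, min_months=MIN_FOLLOW_UP_MONTHS):
    """Keep individuals whose follow-up lasts at least `min_months`.

    Each record is a hypothetical tuple (person_id, follow_up_months, died).
    Individuals who die or leave the panel before the threshold are dropped,
    so every remaining death occurs at least `min_months` after baseline.
    """
    return [rec for rec in records if rec[1] >= min_months]


toy = [
    ("a", 10, True),   # early death: excluded
    ("b", 30, True),   # death after 24 months: kept
    ("c", 79, False),  # censored at 79 months: kept
    ("d", 18, False),  # censored before 24 months: excluded
]
kept = apply_min_follow_up(toy)
kept_ids = [rec[0] for rec in kept]
print(kept_ids)
```

Excluding early deaths removes the cases in which an undetected terminal illness at baseline is most likely to have caused both the isolation and the death.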

A potential concern may arise as to how stable our measure of social isolation is across time: if SI were to vary widely from one wave to another, then picking its first observation ad hoc might lead us to overestimate (or underestimate) the health effects of SI, if that observation was particularly low (or high) that precise year.

Evidence in favor of the stability over time of the SI index is presented in Fig. A1, where each line represents the average SI over time for individuals who were followed for two waves, three waves, and so on, up to seven waves. Out of the 67,676 individuals who enter our survival analysis, 60,454 are represented in this graph: those who are not represented have an SI value at baseline and (an)other value(s) at some future waves. Table A1 provides more detail on how many individuals have only a baseline SI index, how many have an SI index at \(t_{0}\) and \(t_{0}+1\), how many at \(t_{0}\), \(t_{0}+1\), and \(t_{0}+2\), and so on until \(t_{0}\) to \(t_{0}+6\). We can observe at most 7 values of the SI index, that is, Waves 1 to 8 inclusive, except for wave 3. The items that are part of the SI index were not present in the questionnaires of the two “Corona” waves.

This graph informs us about two things: (i) Those who “disappear” earlier have a (slightly) higher social isolation index than those who are followed over 4, 5, and 6 consecutive waves. Since higher social isolation leads to higher mortality, it is reasonable that the “survivors” exhibit lower social isolation; (ii) nevertheless, for both categories, the SI index seems quite stable over time.

Another way to ascertain the stability of the (binary) SI index over time is by looking at transitions into and out of being socially isolated between t and \(t+1\), as shown in Table A2. Those who are socially isolated at t remain socially isolated at \(t+1\) with an almost 90% probability, while those who are not socially isolated at t remain in that state with an almost 70% probability. The transition from not being socially isolated to being socially isolated stands at approximately 30%, while the transition from being socially isolated to not being socially isolated is lower, at around 11%.
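The transition probabilities discussed above can be estimated by counting pairs of consecutive observations. The sketch below uses a made-up toy panel, not the SHARE data; the dictionary layout and identifiers are assumptions for illustration.

```python
from collections import Counter


def transition_probabilities(panel):
    """Estimate P(state at t+1 | state at t) for a binary indicator.

    `panel` maps a hypothetical person id to that person's sequence of
    binary SI values in consecutive waves (1 = socially isolated).
    """
    counts = Counter()
    for states in panel.values():
        # Pair each observation with the one in the following wave.
        for current, following in zip(states, states[1:]):
            counts[(current, following)] += 1
    probs = {}
    for current in (0, 1):
        total = counts[(current, 0)] + counts[(current, 1)]
        if total == 0:
            continue  # this state never appears as a starting point
        for following in (0, 1):
            probs[(current, following)] = counts[(current, following)] / total
    return probs


toy_panel = {
    "a": [1, 1, 1, 1],
    "b": [0, 0, 1, 1],
    "c": [1, 0, 0],
    "d": [0, 1],
}
probs = transition_probabilities(toy_panel)
print(probs)
```

Each row of the resulting matrix sums to one, which is why the probabilities of staying isolated and of leaving isolation reported in the text are complements of each other.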

We estimate the following Cox model:

$$\begin{aligned} h(i,t|SI_{i,t_{0}},X_{i,t_{0}})=h_{0}(t)\exp (SI_{i,t_{0}} \beta + X_{i,t_{0}} \gamma ) \end{aligned} \tag{1}$$

The hazard of dying, h, at time t is a function of a fully flexible baseline hazard \(h_{0}\) common to all individuals in our sample, which is shifted proportionally upwards or downwards by social isolation SI at baseline and by the individual characteristics \(X_{i}\) introduced in the model at baseline \(t_{0}\). A total of 5 models are fitted, each adding further constraints to the relationship between social isolation and mortality.
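As a reading aid, the proportionality property can be made explicit. For two individuals with the same characteristics \(X_{i,t_{0}}\) whose baseline SI values are \(s+1\) and \(s\) (the symbol \(s\) is introduced here for illustration only), Eq. (1) implies that the ratio of their hazards does not depend on \(t\):

$$\begin{aligned} \frac{h(i,t|SI_{i,t_{0}}=s+1,X_{i,t_{0}})}{h(i,t|SI_{i,t_{0}}=s,X_{i,t_{0}})}=\frac{h_{0}(t)\exp ((s+1)\beta + X_{i,t_{0}} \gamma )}{h_{0}(t)\exp (s\beta + X_{i,t_{0}} \gamma )}=\exp (\beta ) \end{aligned}$$

The hazard ratios discussed in this section correspond to \(\exp (\beta )\); a hazard ratio of 1.13, for instance, corresponds to \(\beta = \ln 1.13 \approx 0.12\).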

Our results in Table 3 fall closely in line with those of Steptoe et al. (2013), who find a hazard ratio for a comparable discrete social isolation index that varies between 1.50 and 1.26. Our first two models suffer from obvious omitted-variable bias since no background information on the individual is included in Model (1), and no health information is included at baseline in Model (2), while gender, age, and health at baseline are potentially correlated with both social isolation at baseline (it is hard to be socially connected to people when in bad health, for instance) and future mortality. The results are displayed in columns (1) and (2). The point estimates of the hazard ratio for the social isolation index, in its continuous version from 0 to 3, decrease from 1.21 to 1.13 when controlling for all health indicators at baseline, since part of the association between social isolation and future health passes through initial health conditions, although the coefficient remains both large and strongly significant. Models (4) and (5) restrict follow-up to individuals who are still alive after the first 24 months, as a way to mitigate reverse causality, thereby ensuring that no life-threatening conditions of our respondents had been overlooked by our observable covariates at baseline. This restriction poses no challenge to our estimate. The comparison of individuals with at least 1 point of social isolation to those with zero points (column (5), where the social isolation index is a binary variable) leads to a higher hazard ratio than looking at the effect of one extra point of social isolation (column (4)), meaning that a change from 0 to 1, 0 to 2, or 0 to 3 has more impact than the average increase of one point of SI over all the possibilities of SI increasing by 1. In all the following analyses, we adopt the specification of column (5), with a binary SI index and the exclusion of follow-up times shorter than 24 months.

Table 3 Cox models: effect of social isolation at baseline on mortality up to Summer 2021 (SHARE second Covid-19 survey)

2.3 Sensitivity checks

A number of sensitivity checks are performed: (1) to open the black box of the SI index and check whether a specific component is driving most of the SI effects on mortality; (2) to make sure that our results are not driven by a specific subsample; and (3) to verify whether marital status is a confounder in our analysis (this variable has so far been excluded due to its high correlation with living alone).

Since the three components that make up the SI index may reflect different dimensions of social interaction and support, evidence regarding the relative importance of each component is provided. Table A3 (in the Appendix) displays the results of Cox regressions that include each of the eight possible combinations of the three items. The only item that does not have any significant impact on mortality by itself is “living alone.” Even considered jointly with one or two other SI items, living alone does not seem to add (at least significantly) to the effect of the other items. For instance, living alone and not participating in any associations jointly lead to a 24% increase in the mortality hazard compared with not being socially isolated at all (SI=0), although the single factor of not participating in any associations leads to a 28% greater hazard. When associated with infrequent contact with children, although living alone seems to have a greater impact than when considering “few contacts” alone, there is no statistical difference between the two estimated hazard ratios. Both non-participation in associational activities and infrequent contact with children (or no children), considered separately, exert a major impact on mortality, with non-participation in associational activities possibly carrying even greater weight and being the most widespread across the sample (41% of our sample possess only that item). When these two factors are considered jointly, whether this means a value of 2 or of 3 for the SI index, their impact on mortality is even stronger (with the hazard being shifted upwards by 42%).
This exercise is informative as to the contribution to the index of each component and hence is considered useful in terms of policy implications since it enables a better understanding of which variables should be targeted by policymakers: a policy aiming at incentivizing older individuals to participate in associational activities (on which local authorities, for instance, have agency, in contrast to the frequency of contact with one’s children), might lead to curbing at least one of the two most harmful dimensions of SI for individuals’ health.

Second, in order to ensure our results are not driven by a specific subsample, we estimate our specification (column (5) of Table 3) on several subgroups. The main concern is related to the construction of the social isolation index: if a particular population is more likely to have less contact with their children (say, males), to live alone (unmarried individuals), or to participate in social activities (working versus retired individuals), then the results found across the whole representative sample could be misleading. Since these characteristics are already controlled for in our regressions, this is less of a concern, but the results of this sensitivity analysis are still displayed in Table A4 Panel A (in the Appendix). Apart from a few exceptions, such as the employed, who have a higher mortality risk associated with social isolation than the non-employed (hazard ratios of 1.49 against 1.22), the hazard ratio remains remarkably stable at approximately 1.22 to 1.29 across almost all subsamples. Individuals with no children are mechanically assigned to the “socially isolated” group when the SI index is binary and defined as “\(SI>0\),” since the item “infrequent contact with children” is set to 1 for individuals with no children. Hence, the coefficient of the (binary) SI index cannot be identified for the childless.

One solution towards ascertaining whether SI affects mortality differentially for the childless and those with children is to re-run these regressions using the continuous index of social isolation: we learn how a one-unit increase of the SI index affects the two groups, albeit that the estimate of the impact of the SI index cannot incorporate the effect of going from 0 to 1 for the childless group. Table A4 Panel B shows the impact of the (continuous) SI index is remarkably stable across the childless and “with children” subsamples. One difference with the binary case is the difference between the married and non-married individuals; when the continuous measure is used, those married seem to be more at risk of mortality when more socially isolated than those who are not married. Again, this could be due to the married being much less socially isolated than the unmarried (an SI average of 0.79 vs. 1.71), partly due to the very high correlation between one of the SI components, that of living alone, and the SI index.

Hitherto we have not controlled for marital status due to its very high correlation with “living alone” (between 75 and 83% depending on the definition of marital status). Marital status might still constitute a major confounder, and hence, we check this by re-running the main Cox estimation: (1) using marital status instead of “living alone” in the definition of the SI index (columns (2) and (3) of Table A5); (2) controlling for marital status in the main regression (columns (4) and (5)); and (3) adding marital status together with the other two items in a regression (columns (6) and (7)). We define being married as “married and living with spouse” or “in a registered partnership,” versus “married, not living with spouse,” “never married,” “divorced,” and “widowed”; we define “in a couple” as being in a couple and living with one’s partner regardless of the official marital status.

As shown in Table A5 in the Appendix, our SI index (column (1)) seems to have a greater impact on mortality than when replacing “living alone” with marital status (columns (2) and (3)), which was not as straightforward as it seemed since the “living alone” item had no particular relevance in the SI index. Second, controlling for either version of marital status does not change anything regarding the estimated impact of our SI index. Third, hardly any evidence can be found of the well-known protective effect of being married on individuals’ health, which we interpret as evidence that our baseline controls do a good job at capturing health at baseline. When dropping these (columns (8) and (9)), married individuals face a 7% lower hazard of death over the follow-up period, which is close to the lower bound of the effects of marital status found in the literature (12% according to a meta-analysis consisting of more than 250,000 elderly subjects (Manzoli et al. 2007)), as one could expect given the richness of the set of baseline controls used herein. All in all, marital status does not seem to drive the effect of social isolation on mortality.

2.4 Country heterogeneity

A distinctive feature of the SHARE dataset is that it is harmonized across Europe, and therefore, we can study how social isolation affects health and mortality differentially across countries. We group countries into four culturally and geographically consistent subgroups: Western (Austria, Germany, the Netherlands, France, Switzerland, Belgium, and Luxembourg); Northern (Sweden and Denmark); Southern (Spain, Italy, Greece, and Portugal); and Eastern countries (Czech Republic, Poland, Hungary, Slovenia, Estonia, and Croatia). Ireland and Israel are excluded from this part of the analysis.

A first look at heterogeneity across these four groups of countries (see Table 4, columns (1) and (2)) suggests that the hazard ratio found for the socially isolated against the non-socially-isolated conceals major differences across countries. While in Western, Northern, and Southern countries, social isolation (defined as having at least one social isolation point) is associated with a 1.19 hazard ratio (Northern countries exhibit a higher HR, but the difference is not statistically significant at conventional thresholds), the social isolation HR is much higher in Eastern countries (\(1.45 \approx 1.19 \times 1.22\)). In other words, social isolation has a similar impact in Western, Southern, and, to a certain extent, in Northern countries, but there is a very strong and significant difference between these countries and Eastern countries. Columns (3) to (5) introduce each country group against the other three in order to ascertain whether any other pattern would appear when pooling more countries together in the reference group. Again, only in Eastern countries are the socially isolated at a greater risk of mortality compared with other countries, and strikingly so: in all non-Eastern countries, being socially isolated is associated with a 23% higher hazard of dying over the follow-up period; in Eastern countries, that hazard increases by a further 18%.
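The arithmetic behind these figures follows from the multiplicative structure of the Cox model: coefficients add on the log-hazard scale, so the hazard ratio for an interacted group is the product of the main-effect and interaction hazard ratios. The short sketch below reproduces this with the rounded figures quoted above; the function name is illustrative only.

```python
def combined_hazard_ratio(main_hr, interaction_hr):
    """Hazard ratio for the interacted group in a Cox model.

    Coefficients add on the log-hazard scale, so the corresponding
    hazard ratios multiply.
    """
    return main_hr * interaction_hr


# Rounded hazard ratios quoted in the text.
four_group = combined_hazard_ratio(1.19, 1.22)       # Eastern vs. reference groups
eastern_vs_rest = combined_hazard_ratio(1.23, 1.18)  # Eastern vs. all other countries
print(round(four_group, 2), round(eastern_vs_rest, 2))
```

Both specifications therefore imply a hazard roughly 45% higher for the socially isolated in Eastern countries, up to rounding of the inputs.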

Table 4 Cox models: Country heterogeneity of the impact of social isolation at baseline on mortality up to Summer 2021 (SHARE second Covid-19 survey)

This finding is important to the extent that if the same level of social isolation is associated with different mortality hazards across countries, then there may be room for public policies to weaken that association. Our results provide no indication of what makes people in Eastern countries more vulnerable when they are socially isolated, but our model enables us to rule out several hypotheses: at the individual level, we control for income and wealth quartiles within country and also for educational attainment. The focus is therefore on the effect of social isolation for individuals with a similar socio-economic status. Furthermore, indicator variables are included for country groups (or even for each country when not looking specifically at the effect of a certain group of countries), and hence, whatever may make individuals more or less healthy in a country (such as aggregate economic conditions, and generosity of the healthcare system) is already captured by these indicators. If these country-specific characteristics are still reflected in the interaction term, then it would mean that socially isolated individuals are more vulnerable to aggregate economic or healthcare conditions than non-socially-isolated individuals, even when controlling for their income and health.

We attempt to shed light on the specific aspects of Eastern countries that could explain this result by first re-estimating the same model separately for each country: despite the lack of power to find significant results, this re-estimation enables us to ascertain whether a particular country might be driving the results of its group. As shown in Table A6 (in the Appendix), Poland, for instance, where the socially isolated are 83% more likely to die over the period than the non-socially-isolated, could simply be an outlier. With a significant hazard ratio of 0.59, Portugal could also be the reason why the socially isolated in Eastern countries suffer a higher mortality hazard over the period than do the socially isolated in Southern countries. The Cox regressions are then re-estimated with interactions presented in Table 4 (more specifically Column (2)) by excluding one country at a time, in order to check whether the hazard ratio for the interaction is stable or whether an outlier country might be driving our result for Eastern countries. Table A7 (in the Appendix) confirms that regardless of which country is excluded from the analysis, the mortality pattern of the socially isolated in Western, Southern, and Northern countries does not differ, while Eastern countries face an additional mortality hazard of between 17 and 27% for the socially isolated.
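The leave-one-country-out check described above follows a simple pattern, sketched here in Python under strong simplifying assumptions. The data are invented, and the estimator is a crude risk ratio standing in for the Cox interaction term, which is not reproduced here; the names `crude_risk_ratio`, `leave_one_country_out`, and `make_rows` are hypothetical.

```python
def crude_risk_ratio(rows):
    """Stand-in estimator (NOT a Cox model): share of deaths among the
    socially isolated divided by the share of deaths among the others."""
    def death_share(flag):
        group = [died for _, isolated, died in rows if isolated == flag]
        return sum(group) / len(group)
    return death_share(1) / death_share(0)


def leave_one_country_out(rows, estimate):
    """Re-estimate once per country, each time dropping that country."""
    countries = sorted({country for country, _, _ in rows})
    return {c: estimate([r for r in rows if r[0] != c]) for c in countries}


def make_rows(country, deaths_isolated, deaths_other, n=4):
    """Build invented (country, isolated, died) records for one country."""
    rows = [(country, 1, 1 if i < deaths_isolated else 0) for i in range(n)]
    rows += [(country, 0, 1 if i < deaths_other else 0) for i in range(n)]
    return rows


# Invented toy data for three hypothetical countries A, B, and C.
data = make_rows("A", 2, 1) + make_rows("B", 3, 1) + make_rows("C", 1, 1)

for country, ratio in leave_one_country_out(data, crude_risk_ratio).items():
    print(f"excluding {country}: ratio = {ratio:.2f}")
```

If the estimate moved sharply when one particular country was dropped, that country would be a candidate outlier; a stable range across exclusions is what supports the group-level result.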

In addition to studying differences in the observables between the four groups of countries in Table A8, we also look into potential cultural and policy factors, using data from the European Social Survey (ESS). Regarding social isolation, Eastern and Southern countries are much more socially isolated than Northern and Western countries, mostly due to their high rate of non-participation in associational activities; they are also lonelier and less connected in terms of social networks (using the scale defined in Section 3).

Both Eastern and Southern countries also perform worse than Western and Northern countries across all health dimensions, with Eastern countries in a worse situation, especially in terms of self-assessed health, number of chronic diseases, and number of limitations, but they do much better than Southern countries in terms of cognitive functioning. Notably, our samples of Eastern and Southern countries differ greatly along the education dimension, with older Eastern Europeans being much more educated.

Eastern countries seem to differ from the rest of the countries due to a combination of high social isolation and bad health, although they remain similar to Southern countries in many features. One potential explanation for these heterogeneous mortality effects of SI is that, conditional on older individuals suffering from poorer health in both Southern and Eastern countries, Eastern healthcare systems might be worse than those of Southern countries.

Table A9 shows suggestive evidence that this could be the case: while the proportion of people who declare they have suffered from symptoms of depression that lasted at least 2 weeks remains remarkably stable across the four groups of countries (approximately 26%), the share of those who have ever been treated for depression by a doctor or psychiatrist (amongst those who have ever been depressed) is much lower in Eastern countries (40% against 54% in Southern countries). The European Social Survey (ESS) also points in the same direction, with individuals from Eastern countries rating the “state of health services in [their] countries nowadays” as worse than in the rest of the countries.

Another possibility is that our social isolation index lacks the friendship dimension that is present in Steptoe et al. (2013) and that the frequency of meeting friends is positively correlated with other items of our index. The proportion of individuals who never socialize with their friends, relatives, or colleagues is much higher in Eastern Europe (24%) than in Southern Europe (19%), Western (10%), and Northern countries (5%). If this constitutes an important dimension of social isolation and if it correlates with, for instance, participation in associational activities or frequency of contact with children, then what we capture in the interaction with Eastern Europe could actually be due to this precise dimension.

3 Pathways from social isolation to death: loneliness, social connectedness, healthcare utilization, and health behavior

The association we uncovered between social isolation and mortality was found to be robust not only to several definitions of the social isolation index, but also to the inclusion of all the confounders suspected of being correlated with both social isolation and mortality (such as health at baseline, education, income and wealth, and country of residence), to the exclusion of the first 24 months after baseline, and to restriction to several subgroups.

Once this relationship is established, the main question becomes that of the underlying mechanisms. Loneliness, social connectedness, health behavior, and healthcare utilization are all studied in turn as potential mediators in the relationship between social isolation and health.

How might social isolation lead to adverse health outcomes? Apart from the biological pathways, whereby the inflammatory and antiviral processes are suspected to be at the very core of this association (see Leschak and Eisenberger (2019)), health behavior (such as smoking, drinking, and a sedentary lifestyle) and healthcare utilization (in the form of doctor visits, preventive screenings, etc.) might explain why social isolation is so monotonically associated with worse health. These two mechanisms are in turn very much linked with one’s social network, as “individuals who are socially engaged and connected are exposed to stronger normative pressures from and control by friends and loved ones to perform healthy behaviors and to access healthcare when needed” (Cacioppo and Hawkley 2003). We therefore create an index of “connectedness,” which takes into account both the quantity and quality of one’s social network, and check whether any of these three pathways mediates the association between social isolation and mortality.Footnote 8 It is also often argued that perceived social isolation, which is also referred to as the feeling of loneliness, could be the channel through which objective social isolation influences mortality. Another question is: what contributes the most to an individual’s (bad) health, the objective or the subjective dimension of social isolation? We hence test whether loneliness mediates or mitigates this relationship.

We first estimate our preferred specification with the R-UCLA scale of loneliness added as a control. Regardless of whether we include it as a binary or continuous variable (columns (1) to (3) of Table 5), the hazard ratio of the social isolation index remains unchanged, even though loneliness by itself is positively and significantly associated with mortality. If we allow the loneliness measure to vary over time (column (4)), instead of being fixed at baseline, then the impact of the SI index drops from 27 to 20%, but reverse causality is also more of a concern, as less time elapses between the measurement of loneliness and the observation of death. It therefore seems that loneliness absorbs little of the explanatory power of social isolation.

Table 5 Cox models: does loneliness mediate the association between social isolation and mortality?

A social connectedness scale is subsequently constructed to test whether a poor social network could mediate the relationship between social isolation and mortality. We use the rich information from the social network modules introduced in waves 4 and 6, in which respondents are asked to name up to 7 confidants, or “people with whom [they] most often discussed important things,” and to provide information about their relationship to each of them. Following Malter and Börsch-Supan (2017), our measure of social connectedness uses (1) the number of people cited (network size); (2) the number of cited social network members living within a proximity of 25 km; (3) the number of cited persons with weekly or more contact (contact frequency); (4) the number of cited persons with very or extremely close emotional ties (support); and (5) the number of different types of relationships present within the network (diversity).Footnote 9 The connectedness scale is then reversed into a “disconnectedness” scale. The resulting scale lies between 0 and 4 (with a mean of 2), with higher values indicating a poorer social network. As shown in Table 6, similarly to the loneliness scale, social network disconnectedness is associated with higher mortality, but does not seem to mediate the relationship between social isolation and mortality, since it takes away barely anything from the impact of SI on mortality.

Table 6 Cox models: does social disconnectedness mediate the association between social isolation and mortality?

Health behavior is then explored as a potential pathway between social isolation and higher mortality: socially isolated individuals may have worse lifestyles in terms of smoking and drinking more and moving less, which could put them at a higher risk of mortality. When adding health behavior information into the model (column (2) of Table 7), the hazard ratio of the social isolation index drops slightly from 1.23 to 1.19. All three variables come up as highly significant: smoking at baseline increases the mortality hazard by 83%, while being sedentary at baseline (i.e., never engaging in vigorous or moderate activity) raises this rate by 14%. Our measure of alcohol consumption, on the other hand, fails to capture the harmful effects of alcohol, since it is associated with lower mortality. This is likely due either to the definition used, that is, having drunk any alcohol over the last 3 months or the last 7 days depending on the wave, or to the fact that, once health and socio-economic status are controlled for, alcohol can be associated with positive outcomes. Had we been able to observe more extreme forms of alcohol consumption, such as binge drinking and alcoholism (which is included from Wave 4), then the result might have been different.

Table 7 Cox models: do healthcare and health behavior mediate the association between social isolation and mortality?

Regarding healthcare utilization as another potential pathway, the addition of healthcare utilization information (column (3)) appears not to alter the SI coefficient whatsoever. Neither the number of medicines an individual takes at baseline nor the number of doctor visits over the last 12 months seems to matter once health is taken into account (when health controls are not included in the equation, the number of medicines does). In contrast, having stayed overnight at a hospital over the last 12 months is associated with a 13% higher hazard, even though all observable dimensions of health are controlled for at baseline. This latter finding highlights one dimension that is not well captured by our health controls, that is, the severity of one’s condition: although self-assessed health and the number of chronic diseases are controlled for, the severity of an individual’s illness might be better proxied by adding the number of overnight stays in hospital in the past 12 months.

Lastly, when these potential “mediators” are allowed to vary wave by wave, they tend to take away more of the SI impact on mortality, since sedentarism and hospital stays gain greater importance, which is consistent with them capturing some unobserved part of the health deterioration process. Although loneliness, social disconnectedness, and health behavior show some correlation with both social isolation and mortality, none of these appears to be a major channel of the association between social isolation and mortality. The following section studies how social isolation affects the dynamics of health (in its observable dimensions), health behavior, and healthcare utilization, and assesses how much of its impact on mortality is directed through each of these dimensions.

4 The dynamic impact of social isolation on health, health behavior, and healthcare utilization outcomes

4.1 Health outcomes

As a second step in digging into the potential pathways from social isolation to health, the dynamics of the association between social isolation and all relevant dimensions of health are explored, some of which should show a significant decline (since social isolation leads to heightened mortality). As before, the focus is on major health indicators belonging to both the objective and subjective health spectrum and summarizing all relevant dimensions of health: physical (including frailty and functional health/limitations), mental, and cognitive health. Our sample is exactly the same as before, but it should be borne in mind that the number of observations decreases over time: 41,821 individuals in our sample are observed through 2 consecutive waves, while only 7,506 are observed 6 waves after their entry (which does not imply participating in all waves in between).

We estimate the following equations and plot the relevant coefficient, \(\alpha _{1}\), in the dynamic graphs displayed in Fig. 2:

$$\begin{aligned} Health_{i,t+j}=\alpha _{0}+\alpha _{1} SI_{i,t}+\alpha _{2} Health_{i,t}+\alpha _{3} X_{i,t}+\epsilon _{i,t} \end{aligned}$$
(2)

where \(j=1,2,\ldots ,6\). At \(j=0\) the coefficient of the SI index is mechanically 0, which is why it is not shown and the coefficients are plotted from \(j=1\). We regress each future health outcome Health for individual i at time \(t+j\) on social isolation SI at baseline t (the binary indicator that is equal to 1 whenever the index is non-null) and on the exact same set of baseline characteristics used in the Cox model. These are summarized in Table 2, including the complete vector of health characteristics (amongst which is the outcome at baseline). Hence, the \(\alpha _{1}\) coefficient measures the correlation between SI at baseline and the deterioration (or the change), rather than the level, of health. Again, covariates are fixed at baseline in order to introduce some distance between the covariates and the outcomes.
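The estimation loop behind Eq. (2) can be sketched as follows in plain Python. This is a toy illustration rather than the code used for the paper: the data are invented and noise-free, the vector of controls is reduced to a single covariate `x`, fixed effects are omitted, and the helper names `ols` and `dynamic_coefficients` are hypothetical.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def dynamic_coefficients(records, horizons=range(1, 7)):
    """One regression per horizon j; returns {j: alpha_1}, the SI coefficient."""
    alphas = {}
    for j in horizons:
        rows = [r for r in records if j in r["future_health"]]  # still observed at t+j
        X = [[1.0, r["si"], r["health0"], r["x"]] for r in rows]
        y = [r["future_health"][j] for r in rows]
        alphas[j] = ols(X, y)[1]
    return alphas


# Invented data in which the true SI effect at horizon j is 0.05 * j.
records = []
for i in range(12):
    si, health0, x = i % 2, (i % 5) - 2, (i * i) % 7
    last_wave = 6 if i < 8 else 2   # attrition: fewer respondents at later horizons
    future = {j: 0.5 + 0.05 * j * si + 0.8 * health0 + 0.2 * x
              for j in range(1, last_wave + 1)}
    records.append({"si": si, "health0": health0, "x": x, "future_health": future})

for j, alpha in dynamic_coefficients(records).items():
    print(f"j = {j}: alpha_1 = {alpha:.3f}")
```

Because the baseline outcome `health0` enters every regression, the coefficient on `si` picks up the change in health between t and t+j rather than its level, which is the interpretation given above.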

Fig. 2

Social isolation at baseline and health dynamics. Note: the figures display the coefficients and 95% confidence intervals for the effect of social isolation on all (standardized) health outcomes. “Time” indexes future waves: 1 is wave \(t+1\), 2 is wave \(t+2\), etc. Regressions are performed separately for each outcome and lag. All regressions include the health and socio-demographic variables displayed in Table 2 and wave (a dummy for which wave is baseline) and country FE, and control for the outcome at baseline

For each health outcome, six regressions are run. There are more than 6 waves, but, in contrast with mortality, which is known at each wave and for which the date of death is known even if it occurs between two waves, here, several outcomes are not provided at all waves, such as at wave 3 for depression and frailty (SHARELIFE), and during the two “Corona” waves. “Time” indexes future waves, 1 for wave t + 1, and so on, up to 6, for wave t + 6 (individuals are observed at most from wave 1 to wave 7). In all our regressions, the outcome is measured at one of these future waves, while the rest of the variables are fixed at baseline. All health outcomes are standardized so that their mean is 0 and standard deviation is 1, which facilitates the comparison of the graphical representations of our regressions. They are coded so that higher values indicate worse health and are represented using the same scale on all graphs. As mentioned before, sample size shrinks over time, and hence confidence intervals become larger over time. A clear picture is still attained of how social isolation correlates with health over time.
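The standardization and recoding step can be illustrated with a short Python sketch. The scores are invented, the function name `standardize` is ours, and the use of the population standard deviation is an assumption, since the text does not say which variant is used.

```python
from statistics import mean, pstdev


def standardize(values, higher_is_worse=True):
    """Return z-scores oriented so that higher values mean worse health."""
    oriented = list(values) if higher_is_worse else [-v for v in values]
    center, spread = mean(oriented), pstdev(oriented)
    return [(v - center) / spread for v in oriented]


# Invented raw scores for five respondents.
chronic_diseases = [0, 1, 1, 2, 4]   # more diseases = worse health
words_recalled = [8, 6, 5, 4, 2]     # more words = better health, so the sign is flipped

z_chronic = standardize(chronic_diseases)
z_recall = standardize(words_recalled, higher_is_worse=False)

print([round(z, 2) for z in z_chronic])
print([round(z, 2) for z in z_recall])
```

After this step, a coefficient of a given size means the same thing on every graph: a shift of that many standard deviations toward worse health.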

Social isolation is undeniably associated with worsening health. Nevertheless, there is heterogeneity across outcomes: cognitive health (recall test) starts worsening in association with social isolation after one wave and the effect of social isolation remains at that same level after two waves, in line with Shankar et al. (2013), who find poorer cognitive functioning amongst the socially isolated four years after baseline using the ELSA data. Since we are controlling for baseline cognitive functioning, our results indicate a higher rate of decline for the socially isolated, in accordance with Ertel et al. (2008), who find a higher rate of memory loss using word recall for individuals with lower social integration (which is very close to our measure of social isolation) using HRS data. Other outcomes, such as frailty and self-assessed health, follow a similar trend, while certain outcomes return to their initial level, such as depression after four waves. It therefore appears that social isolation worsens both physical and cognitive health in the short and long run, but its association with mental health is only transitory. Functional health, when measured as “suffering at least one limitation,” instead of the number of limitations as we had done so far, becomes increasingly correlated with social isolation over time, before possibly going back to the baseline level (the precision of the estimates does not allow us to derive any conclusion after 6 waves of follow-up). Nonetheless, the relationship between functional health and social isolation is sensitive to the definition of functional health: when it is defined as the sum of limitations with ADLs, our estimates are much closer to being statistically indistinguishable from 0, in line with Shankar et al. (2017), who do not find a significant association between the number of ADLs and social isolation using two waves of the ELSA data.

This empirical exercise enables a relationship between social isolation and the deterioration of health to be revealed in almost all its facets. Even when considering mental health, for which the association does not persist in the long run, there is still a deterioration occurring two waves after baseline. Moreover, a high correlation at baseline between social isolation and poor mental health could be associated with higher mortality but would not be revealed in our estimates. In other words, if the socially isolated at baseline suffer from more symptoms of depression, even in the absence of further deterioration, poor mental health that remains poor over the follow-up period could also be a channel leading to higher mortality, which should be borne in mind when interpreting the results.

4.2 Health behavior and healthcare utilization outcomes

Apart from the “biological” channel, the literature puts forward health behavior and healthcare utilization as potential pathways from social isolation to worsened health and mortality. In the mortality section, health behavior has already been implied as playing a role in the SI-mortality relationship, whereas healthcare utilization has not. In the present section, we investigate whether there is a specific pattern of the socially isolated in terms of health behavior and healthcare. The same dynamic analysis is applied to the set of health behavior and healthcare utilization variables employed in Section 3. Again, in each regression, the outcome is controlled for at baseline, and hence looking at how social isolation is associated with smoking in later waves is equivalent to studying how it correlates with changes in smoking.

As shown in Fig. 3, there is no significant relationship between social isolation and smoking (when controlling for smoking at baseline), except after one and three waves, but the dynamic pattern remains unclear. If anything, social isolation seems to be associated with less drinking. The one important behavior that is increasingly and significantly associated with social isolation over time is that of sedentarism, defined as “hardly ever or never” engaging in vigorous (e.g., sports) or moderate (e.g., gardening, going for a walk) physical activity. Sedentarism may then play a role in how socially isolated individuals become sicker, but it is also reasonable that as individuals get sicker they would engage less in physical activity. Shankar et al. (2011) and Kobayashi and Steptoe (2018) find similar results on inactivity, and a more clear-cut association with smoking and drinking, without controlling for these variables at baseline, and they conclude that loneliness and social isolation may affect health independently through their effects on health behavior. One way to ascertain whether sedentarism is a mechanism per se is to control for the health factor at future waves as well, in addition to at baseline. When doing so, the trend looks the same, but the coefficient is no longer significant, and hence the association between social isolation and sedentarism could also be spurious due to their common correlation with a worsened health status.

Fig. 3

Social isolation at baseline and health behavior dynamics. Note: the figures display the coefficients and 95% confidence intervals for the effect of social isolation on different health behavior outcomes. “Time” indexes future waves: 1 is wave \(t+1\), 2 is wave \(t+2\), etc. Regressions are performed separately for each outcome and lag. All regressions include the health and socio-demographic variables displayed in Table 2 and wave (a dummy for which wave is baseline) and country FE, and control for the outcome at baseline

Regarding the relationship between social isolation and healthcare utilization, there are two (or more) possible directions in which social isolation might affect healthcare utilization: (1) since social isolation is associated with worse health (see previous subsection) and higher mortality (see Section 2.2), the socially isolated might use more healthcare; (2) social isolation might make individuals less inclined to use healthcare, as their loved ones “pressure” them less to undergo medical checkups, prompt them less often to seek medical help when needed, and even fail to accompany them to a visit to the doctor.Footnote 10 Our results help to shed light on this discussion: Fig. 4 indicates that socially isolated people use neither more nor less healthcare than non-socially-isolated individuals. Regardless of whether future health is included as a control or not in the regressions, socially isolated individuals do not use more medicine than the non-socially-isolated, and neither do they visit their physician any more or less. If anything, socially isolated individuals spend more nights at the hospital, after a few waves, but this relationship is weak, both in terms of significance and magnitude.Footnote 11

Fig. 4

Social isolation at baseline and healthcare utilization dynamics. Note: the figures display the coefficients and 95% confidence intervals for the effect of social isolation on different healthcare utilization outcomes. “Time” indexes future waves: 1 is wave \(t+1\), 2 is wave \(t+2\), etc. Regressions are performed separately for each outcome and lag. All regressions include the health and socio-demographic variables displayed in Table 2 and wave (a dummy for which wave is baseline) and country FE, and control for the outcome at baseline

How is this “null” result compatible with the two aforementioned directions? First, as mentioned for the case of health behavior, any baseline correlation between social isolation and healthcare utilization levels is already factored in the baseline controls. Second, mechanisms (1) and (2) could theoretically cancel each other out. The fact that no positive effect of SI on healthcare utilization can be found due to health deterioration (i.e., direction (1)) is striking and consistent with mechanism (2) being a potential channel.

4.3 Quantifying the contribution of each health dimension to the social isolation-mortality relationship

Having estimated the mortality effects of SI, we now dive into the dynamics of the social isolation effects in order to answer how social isolation might lead to higher mortality. In this section, two questions are asked: (1) Which mediator has the biggest impact on mortality hazards? (2) Which mediator is the most affected by social isolation? The answers to the two questions are then combined to compute how much of the SI effects on mortality can be attributed to the effect of SI at baseline on each dimension of health, in the subsequent period. In Table 8 we present the results of the same Cox regression as in the main table (Table 3, column (5)) but displaying the hazard ratios that correspond to the health controls. In column (1), the variables are introduced as in the descriptive statistics shown in Table 2, while in column (2) all the health variables are standardized to facilitate the comparison of their effects. In answer to the first question, we find that self-assessed health, frailty, and cognitive functioning exert a major impact on mortality and are of a similar magnitude to that of social isolation: 28, 22, and 16%, respectively. When controlling for all other health dimensions, depression is not associated with higher mortality. This is to be expected, since depression has a correlation of 57% with the frailty dimension and of 39% with self-assessed health. On removing these two dimensions from the equation, depression is significantly associated with higher mortality. Neither the number of chronic diseases nor the number of limitations in ADLs and IADLs has a sizeable effect. These coefficients are reported (as coefficients instead of hazard ratios this time) as \(\beta (H)\) in Table 9 (first row), along with \(\alpha (S)\) in the second row, that is, the coefficient of social isolation in the health outcome regression (at \(t+1\), i.e., the first point plotted on each graph of Fig. 2). The second row provides an answer to the second question: self-assessed health, frailty, and cognitive health, which are the same as in question 1, are the health dimensions most affected by social isolation, at least at wave \(t+1\).

Table 8 Cox models: How does health impact mortality?
Table 9 Chain-rule effect: How much of the SI impact on mortality hazards can be explained through dynamic health effects?

These coefficients \(\beta (H)\) and \(\alpha (S)\) are then multiplied by each other in order to obtain a coefficient that is generated by SI at \(t+1\), through the health channels between t and \(t+1\) (see third row). Last, we compare this coefficient with \(\beta (S)\), the coefficient of SI at baseline in the Cox model, which is equal to 0.227. In the last row, we show how much of the SI effect on mortality can be attributed to the dynamic effect of social isolation on health one wave ahead: while chronic diseases, functional health (limitations), and depression account for less than 1% of the SI effect on mortality, the impact of SI on self-assessed and cognitive health at the next wave, scaled by the effect of these health variables on mortality, stands at approximately 6–7% of the SI effect (4% in the case of frailty). Although this does not seem like a very high figure, it is informative with respect to policy, that is, where and how to intervene in order to curb the SI effects on mortality. A policy designed to target social isolation directly, for instance promoting associational activities for older people, could therefore be viewed not only as a way to allow those who would otherwise be socially isolated to live longer, but also as a way to slow their cognitive decline. Reduction of social isolation could therefore become part of the recommendations, such as those issued by the WHO in order to reduce the risk of dementia and cognitive decline (Chowdhary et al. 2021), and help “Understand the influence and interactions of non-modifiable (e.g., gender, genetics, age) and modifiable (e.g., physical activity, diet, and cognitive stimulation) risk and protective factors for dementia in population-based samples.”
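The “chain rule” accounting can be made concrete with a small Python sketch. The value of \(\beta (S)\) and the 28% hazard ratio for self-assessed health come from the text; we assume that this 28% refers to the standardized specification, and the value of \(\alpha (S)\) below is purely hypothetical, chosen only to show the mechanics (the estimated values are those reported in Table 9).

```python
import math

beta_S = 0.227      # Cox coefficient of SI at baseline (from the text)
hr_health = 1.28    # hazard ratio of self-assessed health (from the text; assumed per SD)
beta_H = math.log(hr_health)   # back to the coefficient scale

alpha_S = 0.06      # HYPOTHETICAL coefficient of SI on standardized health at t+1

indirect = beta_H * alpha_S    # SI -> health at t+1 -> mortality
share = indirect / beta_S      # fraction of the SI coefficient passing through health

print(f"beta(S) expressed as a hazard ratio: {math.exp(beta_S):.2f}")
print(f"indirect coefficient beta(H) x alpha(S): {indirect:.4f}")
print(f"share of the SI effect: {share:.1%}")
```

With these inputs the share falls in the 6–7% range mentioned above; a different \(\alpha (S)\) would scale the share proportionally.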

On reproducing the same exercise for health behaviors and healthcare utilization (see Table 10), we find that a small part of the impact of SI on mortality passes through the impact of SI on increased smoking (3%). On the other hand, the coefficient of smoking in the Cox regression is as big as that of self-assessed health, and hence, our “chain rule” accounting exercise still yields a non-null percentage for smoking. By comparison, none of the impact of SI on mortality passes through its impact on sedentarism at \(t+1\).

Table 10 Chain-rule effect: how much of the SI impact on mortality hazards can be explained through health behavior and healthcare utilization?

5 Discussion: the causality challenge and the education benchmark

5.1 Causality and other caveats

We uncovered a strong association between social isolation and mortality, in the form of a 25% increase in the mortality hazard rate for individuals who are socially isolated at baseline. The magnitude of this association is close to the estimates found in the literature for the causal impact of education on mortality. How causal can this association be proven to be? In order to make a stronger case for a causal association, all the people who die in the 24 months following the baseline (when social isolation is observed) are discarded from the sample. In this way, our sample is guaranteed not to suffer any life-threatening health condition that would fail to be captured in our health controls and that would still provide a reason for one to be socially isolated. This restriction, coupled with a very long follow-up period (up to 17.25 years, with a median follow-up of 79 months), makes reverse causality highly unlikely.

The main concern involves the potential existence of omitted variables that would affect both social isolation and mortality (or health outcomes). It is not straightforward to come up with potential confounders that are not controlled for in our regressions and would be correlated with both baseline social isolation and future health: not only is health already being controlled for extensively at baseline, but also socio-economic status, through income, wealth, and education, and other observable characteristics that could be related to both the main explanatory variable and the outcome, such as gender, whether individuals are working, and whether they have children. We also control for housing variables, as there could be a link between living in a rural area (vs. urban), and in a house (vs. in a building), and future health, whereby it may be harder for older individuals living in a rural area to seek medical attention when needed. At the same time, everything else being equal, living in a house with no neighbors, or in an isolated area, could also be correlated with social isolation.

An example of an unobservable factor that could determine social isolation is that of personality: Cacioppo et al. (2000) show that individuals from the lowest quintile on the UCLA loneliness scale were found to score lower on neuroticism and higher on surgency (extraversion), conscientiousness, and social agreeableness than individuals from the highest quintile, who in turn did not differ on any of these dimensions from individuals from the middle quintile. Introverted or neurotic individuals are probably more at risk of social isolation, since, for instance, they would be less prone to participate in social activities. Regarding personality and health, the latest evidence using SHARE’s “Big 5” data that was collected for the first time at wave 7 indicates personality as being associated with an array of health indicators in older European adults. In particular, the strongest and most consistent personality-level correlates of good health are high conscientiousness and low neuroticism (Shemesh et al. 2019).Footnote 12 Nevertheless, these would be factored into our health indicators at baseline in our main Cox model. Since neuroticism, for instance, could also be correlated with a more acute deterioration of health, personality traits should become part of our controls. Their incorporation would not be without problems since these traits are measured at wave 7, while our baseline measures of health and social isolation might come from previous waves. Arguing for the stability of (at least a fraction of) the Big 5 over time in later life, we include these traits in our Cox regressions.Footnote 13 Table A10 shows that the same traits that correlated highly with several health indicators, that is, high conscientiousness (both as a continuous and binary variable) and low neuroticism (only as a binary variable) are associated with lower mortality, even when controlling for health at baseline. Individuals who have low conscientiousness (i.e., who are in the first quintile of that measure) are exposed to a 25% higher mortality; conscientiousness supposedly captures a high propensity to be self-controlled and to delay gratification, and to be task- and goal-oriented, organized, efficient, precise, and deliberate (John et al. 1999).Footnote 14 Even such a high correlation fails to remove any effect from our social isolation index.

One potentially omitted confounder that unfortunately cannot be included in our analysis is genetics. What if the same genes that are overexpressed in socially isolated individuals are also responsible for activating the immune system and the inflammation mechanism in the body? This has been found by Cole et al. (2007) amongst a sample of 230 Americans aged 50–67, a finding that explains why lonely people suffer from chronic inflammation in spite of their high levels of cortisol and are vulnerable to microbes, viruses, and other sources of tissue damage.Footnote 15 Genetics could therefore be a relevant source of omitted-variable bias in our study, if linked to both social isolation and mortality.

Having established a list of potential confounders, what more can be done to assess whether the relationship is causal? In practice, several approaches can help mitigate omitted-variable concerns. The most straightforward way consists of including an appropriate set of observable controls (Angrist and Pischke 2010), as we do when including frailty and chronic diseases along with functional, self-assessed, mental, and cognitive health in order to capture the true health status of the individual. An additional approach that has been widely used in the empirical literature relies on demonstrating the stability of the key coefficient when additional controls are included. Table A11 shows that the coefficient of the social isolation index remains stable across specifications when a different subset of controls is added in each specification. For this table, we chose to regress self-assessed health six waves ahead on social isolation at baseline (at entry into the study) and other controls, such that the coefficient in column (4) corresponds to the point at time = 6 in the “self-assessed health” graph in Fig. 2, although the same stability could be shown for the other outcomes at other times.

As argued in Oster (2019), this idea, despite being very intuitive, relies on the selection on observables being informative regarding the selection on unobservables, which does not necessarily follow from the assumptions of the linear model. We therefore appeal to Oster’s use of coefficient stability as a test for selection on unobservables. The test considers both the stability of the estimated social isolation treatment effect when adding key observables and the importance of these factors in explaining health outcomes. The estimate of the coefficient of proportionality proposed by Oster as a summary of the robustness of results is 1.37, which exceeds the threshold of 1 that Oster proposes as a standard for robustness. This value implies that unobservables would need to be 37% more important than the observables in explaining the social isolation “treatment” in order for the treatment effect of social isolation to be zero, which seems unlikely given the richness of our data.Footnote 16 We are therefore more confident that our finding of a significant social isolation effect is not an artifact of omitted-variable bias.
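To make the logic of the coefficient of proportionality concrete, the following minimal sketch computes an approximate value of delta from two regressions. It relies on the simplified bias-adjustment formula in Oster (2019), beta* ≈ beta_full − delta × (beta_short − beta_full) × (R2_max − R2_full) / (R2_full − R2_short), solved for the delta at which beta* equals a target value (zero here). This is an approximation and not necessarily the computation behind the 1.37 reported above: Oster's full estimator solves a more involved expression. The function name `oster_delta`, the numerical inputs, and the choice R2_max = 1.3 × R2_full (a rule of thumb suggested by Oster) are assumptions made for illustration only and are not taken from our tables.

```python
def oster_delta(beta_short, r2_short, beta_full, r2_full, r2_max, beta_target=0.0):
    """Approximate Oster (2019) coefficient of proportionality.

    beta_short, r2_short: coefficient and R-squared without the key controls.
    beta_full, r2_full:   coefficient and R-squared with the key controls.
    r2_max:               assumed R-squared if unobservables were also included.
    beta_target:          value of the true effect being tested (zero by default).
    """
    coef_movement = beta_short - beta_full   # how much the controls move the coefficient
    r2_gain = r2_full - r2_short             # explanatory power added by the controls
    r2_headroom = r2_max - r2_full           # explanatory power left for unobservables
    if coef_movement == 0.0:
        raise ValueError("coefficient does not move; delta is undefined")
    if r2_headroom <= 0.0:
        raise ValueError("r2_max must exceed the R-squared of the full model")
    return (beta_full - beta_target) * r2_gain / (coef_movement * r2_headroom)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
beta_short, r2_short = 0.20, 0.10
beta_full, r2_full = 0.15, 0.30
r2_max = min(1.0, 1.3 * r2_full)  # assumed rule of thumb, not taken from the paper

delta = oster_delta(beta_short, r2_short, beta_full, r2_full, r2_max)
print(f"approximate delta = {delta:.2f}")
```

A value of delta above 1 means that, under this approximation, unobservables would have to matter more than the included observables to drive the estimated effect to zero.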

5.2 Education as a benchmark of social isolation effects on mortality

Education and mortality

The association we find between social isolation and mortality in Table 3 appears both significant and robust to many checks. In order to gauge the magnitude of this association, the effects of social isolation on mortality are compared to those of education, which seems an ideal benchmark candidate since a large literature establishes significant positive associations between education and several dimensions of adult health and negative associations between education and mortality. Whether these associations can be qualified as “causal” remains subject to debate, and diverging conclusions have been reached by several studies. Using changes in education legislation as a source of exogenous variation in educational attainment, Lleras-Muney (2005) in the US and Crespo et al. (2014) in Europe found support for a causal link. No such causal effect is found in Behrman et al. (2011), who use comparisons of twins in Denmark. In contrast, Halpern-Manners et al. (2020), with a more recent assessment of this causal relationship based on representative US “twins data,” support a causal interpretation of the education-mortality gradient.

We reproduce the same specification as before (see Eq. (1)) and display both the hazard ratio associated with our social isolation index and those associated with our education controls (see Panel A of Table A12 in the Appendix). Education is introduced as four categories, with the highest educational attainment group omitted as the reference group (the “other” category was dropped here to make the interpretation more straightforward, hence the slight discrepancies in sample sizes with respect to Table 3). When controlling for socio-demographic information at baseline, lower education levels are associated with higher hazard ratios (ranging from 1.35 for the lowest education level to 1.22 for the upper secondary group), which are greater in magnitude than the hazard ratio corresponding to social isolation (1.21).

Note that when adding baseline health as a control, most of the effect of lower education disappears, and the magnitude of the effect becomes similar to that of social isolation. This attenuation is weaker for the upper secondary education group. This suggests that most of the effect of education on mortality has already been channeled into individuals’ health status when they enter the study, except for those who are the closest to the higher education group. Accordingly, the more controls and horizon constraints are added to the model, the less significant the association between education and mortality becomes. The only remaining significant difference lies between the upper secondary and the tertiary education group. In short, the education-mortality gradient has a magnitude similar to that between social isolation and mortality, but is less robust to the inclusion of the same additional controls and constraints. This result does not question the causal relationship between education and health obtained from the quasi-experimental studies cited above, which do not control for health when looking at education and mortality. When social isolation is removed from the equation and years of education are considered instead of education categories (see Panel B of Table A5), in order to more closely resemble the study by Lleras-Muney (2005), we find that 1 year of education is associated with a 2.4% drop in the hazard of dying (over the period), which lies within the 1.3–3.6 interval found in the aforementioned study over a 10-year period. On adding baseline health into the equation, this result no longer holds.
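As a reading aid for the magnitudes discussed in this subsection, the short sketch below converts hazard ratios into percentage differences in the hazard and into the underlying Cox coefficients (the logarithm of the hazard ratio). The hazard ratios 1.35, 1.22, and 1.21 are those quoted above; the value 0.976 is simply the hazard ratio implied by a 2.4% drop per year of education. The function names are ours, and the compounding over several years of education is only an illustration that assumes proportional hazards and a log-linear effect of years of education.

```python
import math


def pct_change_in_hazard(hazard_ratio):
    """Percentage difference in the hazard relative to the reference group."""
    return (hazard_ratio - 1.0) * 100.0


def cox_coefficient(hazard_ratio):
    """Cox model coefficient implied by a hazard ratio (its natural log)."""
    return math.log(hazard_ratio)


def hazard_ratio_for_years(hr_per_year, years):
    """Hazard ratio for several extra years, assuming a log-linear effect."""
    return hr_per_year ** years


# Hazard ratios quoted in the text (specification without baseline health).
reported = {
    "lowest education": 1.35,
    "upper secondary": 1.22,
    "social isolation": 1.21,
}
for label, hr in reported.items():
    print(f"{label}: HR = {hr:.2f}, "
          f"hazard difference = {pct_change_in_hazard(hr):+.0f}%, "
          f"coefficient = {cox_coefficient(hr):.3f}")

# One year of education: a 2.4% drop in the hazard corresponds to HR = 0.976.
hr_year = 0.976
print(f"one year: {pct_change_in_hazard(hr_year):+.1f}%")
print(f"four years (illustrative): "
      f"{pct_change_in_hazard(hazard_ratio_for_years(hr_year, 4)):+.1f}%")
```

Because hazard ratios compound multiplicatively, the implied difference for several years of education is slightly smaller in absolute value than the per-year percentage multiplied by the number of years.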

Dynamic effects of education on health outcomes

Again, in order to gauge the magnitude of the effects of social isolation on future health, we construct the same graphs as before, but display the coefficient of the lowest education category (with respect to the highest category) instead of that of the social isolation index (see Fig. A3). As before, controlling for baseline health implies we are looking at the effect of education on changes in health (or on future health given health at entry), so one possibility would be that all of the impact of education is already factored into health at entry and is not reflected in deviations of health from that point on. We find a pattern very similar to that in Fig. 2, except that the magnitude of the coefficients is much larger: for example, the coefficients are twice those of social isolation for self-assessed health, and six times those of social isolation for cognitive health (word recall). Looking at the health factor that summarizes all the observed dimensions of health, being socially isolated takes a toll on individuals’ health that is approximately half the toll of being in the lowest education group.

6 Conclusions

In this paper, we find a strong association between social isolation and future mortality, which is not solely mediated by concurrent loneliness, health behavior, and healthcare utilization, nor solely by social connectedness. We also explore the dynamics of the health impact of social isolation and find that social isolation correlates with a persistent worsening of all the facets of health considered herein (self-assessed, frailty, cognitive, mental, functional).

Although previous studies have identified similar associations, we contribute to the existing literature by looking at health across many dimensions instead of focusing on a single health outcome. Our study is carried out in a harmonized multi-country longitudinal framework that enables us to employ a long follow-up period and alleviates endogeneity concerns. We investigate heterogeneity in the social isolation-health relationship across numerous countries and find a much stronger association between social isolation and mortality in Eastern countries. The fact that this very same objective measure of social isolation does not produce the same health consequences across countries, albeit using harmonized data, suggests that public health policies might have a role to play in moderating the health risks posed by social isolation.

We test several potential underlying mechanisms and find that socially isolated individuals do not resort to more healthcare use in subsequent waves than non-socially-isolated individuals, although their health does worsen across all dimensions. This suggests that healthcare utilization might be a channel underlying the relationship between social isolation and health. When our mortality, health, and health behavior models are combined in an accounting exercise, we find that up to 13% of the effect of baseline social isolation on mortality can be attributed to the combined one-wave-ahead impact of social isolation on increased frailty, reduced cognitive functioning, and increased smoking.

Last but not least, we provide evidence in favor of a causal interpretation of our estimates using Oster’s test for selection on unobservables. We also compare the social isolation-health gradient to the much-studied education-health gradient. Using the same models, the education gradient in mortality is less steep than the social isolation gradient, but the association of education with future health is stronger than that found for social isolation in dynamic value-added regressions.

In terms of public policy, our research shows that older individuals could benefit from public health initiatives that would focus on the strengthening of social connections through participating in clubs and associations. Furthermore, having established the existence of a pathway between social isolation and the deterioration of health for older individuals, it becomes relevant to investigate, using post-pandemic data, how the COVID-19 pandemic has impacted social isolation in the medium and longer run.