Introduction

During the 2000s, many European countries saw increases in the employment of older workers, in parallel with the implementation of new policies aimed at stimulating older workers to extend their working lives and postpone the transition from work to retirement. There is a widespread belief that the tendency toward later retirement is attributable to the new policies (OECD 2017). An evaluation of the true effect of policy measures, however, should take into account that not only policies but also other factors have changed. In particular, the characteristics of successive cohorts of older workers have undergone profound changes.

For one, more recent generations of workers born after World War II are better educated and have a greater sense of control over their lives (Flynn 1987; Drewelies et al. 2018), which in turn is associated with a higher level of employment (Boot et al. 2014; Thijs et al. 2019). Furthermore, female employment in particular has increased in the older age groups, and this change in the gender composition of the workforce may also help explain increases in employment among older workers in general. Moreover, increases in the employment of older women may have led older male workers to postpone their retirement as well when their spouse, who is typically a few years younger, is also working (Eismann et al. 2019; Ho and Raymo 2009).

This study aims to assess the impact of policy measures to extend working lives (PEW) by comparing the employment of two successive samples of older workers between two countries that differ in the extent of implementation of pertinent policies, while sharing many other characteristics. In one country, the Netherlands, rigorous policy measures were implemented in 2006. These involved the abolition of financial support for early retirement and, at the same time, the restriction of early exit through the alternative pathways of unemployment and disability (van Oorschot 2007). Whereas the average age of leaving employment had remained stable at about 61 years up to 2006, a gradual rise was observed immediately following this year, up to 63 years in 2010. This rise is among the steepest in EU countries (OECD 2017). In contrast to the Netherlands, Norway did not change its employment policies in the 2000s. Yet, employment of older workers in Norway rose as well, although to a lesser extent (Fig. 1).

Fig. 1 Participation in paid work, men and women aged 55–64, the Netherlands and Norway, 2000–2010. The vertical line indicates the year of implementation of policies to extend working lives in the Netherlands (2006). Source: Eurostat (2021)

Meanwhile, as Norway is a country with many similarities to the Netherlands in terms of economic development and institutional arrangements, it is to be expected that individual characteristics have undergone similar changes in Norway as in the Netherlands. Thus, Norway and the Netherlands can be assumed to be similar, with one important difference: Norway did not change its work and pension policies during the 2000s. This makes Norway a suitable country for comparison of the effect of PEW on the employment of older workers in a quasi-experimental design. The analytical strategy of comparing similar entities differing in one key characteristic is known as the method of ‘most similar systems’ (or Mill’s method of difference) in comparative research (Anckar 2008).

Our main research question is: What is the impact of policies to extend working lives on the employment of older workers? We address this research question in three stages. First, we examine to what extent the rise in employment in older workers was faster in the Netherlands than in Norway. Second, when accounting for secular changes in individual characteristics, we examine if the increase in employment remained faster in the Netherlands than in Norway. If this is the case, we can conclude with greater likelihood that at least part of the Dutch rise in employment can be attributed to other factors such as PEW. If, however, a difference in the rate of increase in employment is no longer observed, we can conclude that the Dutch rise in employment is fully explained by secular changes in individual characteristics and unobserved factors that are assumed to be similar in both countries. Third, given secular change, and provided we find a faster rise in employment in the Netherlands than in Norway, we estimate the amount of extra rise in employment in the Netherlands that may be attributable to PEW. We perform the analyses for men and women separately, because women showed a steeper rise in employment than men in both countries, and because different factors may drive employment in each gender (Loretto and Vickerstaff 2013).

Methods

Samples

Data are used from two same-aged population-based samples in each country, thus including people both in and out of the workforce. Participants were newly recruited both in 2002–03 and in 2012–13 in the Dutch Longitudinal Aging Study Amsterdam (LASA), and both in 2002–03 and in 2007–08 in the NORwegian Longitudinal study on Aging and Generations (NorLAG). For better readability, we will indicate the years of measurement by the year in which most measurements took place, i.e., by 2003 and 2013 for LASA and by 2002 and 2007 for NorLAG.

LASA is an ongoing, prospective cohort-sequential study, which addresses the determinants and consequences of changes in functioning with ageing in different domains: physical, emotional, cognitive, and social functioning. Older adults aged 55–85 years were selected in three socio-culturally different regions in the Netherlands, ensuring a nationally representative sample. The participants were randomly selected from the municipal registries. Since the start in 1992, measurement waves have been repeated every three years. New cohorts of participants aged 55–64 years were recruited in 2003 and in 2013, using the same sampling frame. More detailed information on the sampling and data collection has been described elsewhere (Huisman et al. 2011; Hoogendijk et al. 2016). The data used in this study are derived from the baseline interview of the 2003 and 2013 cohorts (n = 985 and 981, respectively).

NorLAG is also an ongoing prospective study, which aims to provide knowledge about life-course changes by studying behavior and transitions related to work and retirement, family relations, health and care, and quality of life. Participants aged 40–79 years were randomly selected from the non-institutionalized population living in 30 communities in the national registry of legal residents in 2002. In 2007, a follow-up measurement of the 2002 sample was realized and a refreshment sample was added. The data include survey instruments in combination with various national registers. More detailed information on the sampling and data collection has been described elsewhere (Slagsvold et al. 2012). The current study uses data collected in 2002 and 2007 among participants aged 55–64 years (n = 1466 and 1398, respectively, 603 of whom participated in both 2002 and 2007).

Measures

Dependent variable

Employment of one or more hours per week is the outcome variable. Employment includes participation in paid work as an employee and self-employment.

Independent variables

Time

Time is measured from 1 January 2002 up to the exact date of each interview, as conducting the interviews for one measurement period could take more than one year (ranging from 0.65 to 1.75 years, depending on study and period). The mean time between the two measurements is 5.13 years for Norway and 10.22 years for the Netherlands. In the analyses, we divided the time variable by 5 to obtain more readable regression coefficients; one unit in the time variable thus indicates a 5-year change.
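The construction of the time variable can be illustrated with a minimal sketch. The function name, the 365.25-day year, and the example interview date are assumptions made for illustration; the text specifies only the origin (1 January 2002) and the division by 5.

```python
from datetime import date

ORIGIN = date(2002, 1, 1)   # time origin stated in the text
DAYS_PER_YEAR = 365.25      # assumption: average year length
UNIT_YEARS = 5              # one unit of the time variable = 5 years


def time_variable(interview_date: date) -> float:
    """Elapsed time since the origin, expressed in 5-year units."""
    years = (interview_date - ORIGIN).days / DAYS_PER_YEAR
    return years / UNIT_YEARS


# Hypothetical interview date, for illustration only.
print(round(time_variable(date(2007, 1, 1)), 3))
```

With this scaling, a regression coefficient for time reads directly as the change in the probability of employment per five years.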

Explanatory variables

The explanatory variables are selected based on earlier evidence of an association with employment and increases in employment at older ages (Boissonnault et al. 2020; Boot et al. 2014) and on comparability between LASA and NorLAG. They include demographics, health, and psychosocial characteristics.

Sex and age are derived from the population registry in both studies.

Educational level was based on self-reports in LASA and register information in NorLAG, and categorized according to the International Standard Classification of Education (ISCED) guidelines for both sources: (1) low (elementary not completed, elementary, lower vocational, general intermediate), (2) intermediate (intermediate vocational, general secondary), and (3) high (higher vocational, college, and university) (UNESCO 2012). To obtain a more sensitive measure, the ISCED codes were also recalculated into the number of years normally required to achieve each level of education.

Partner status and partner’s work status were self-reported. Both variables were dichotomous. Partner status was coded (0) no partner, (1) partner; partner’s work status was coded (0) no partner or partner not doing paid work ≥ 1 h/week, (1) partner doing paid work ≥ 1 h/week.

Self-rated health was assessed using the question: ‘How is your health in general?’ This question had response options: (1) ‘very good’ to (5) ‘poor’.

Functional limitations were self-reported. In LASA, it was based on six activities: climbing up and down a staircase, walking outside for 5 min, dressing and undressing, cutting own toenails, getting up from and sitting down in a chair, and using own or public transportation. Response categories were (0) able without difficulty, (1) able with some difficulty, (2) able with much difficulty, (3) not able without help, and (4) cannot do (Van Sonsbeek 1988). Participants who had at least some difficulty with an activity were coded as (1). The summed score ranged from 0 to 6. In NorLAG, the functional limitations measure was derived from two Short Form-12 items (Ware et al. 1996): performing activities such as moving a table, vacuuming, walking or gardening, and going up stairs of several floors. Response categories were (0) ‘health limits a lot,’ (1) ‘health limits to some extent,’ and (2) ‘no limitation due to health,’ and the summed score has values from 0 (no limitations on both items) to 4 (yes, a lot of limitations on both items). To achieve harmonization with LASA, the NorLAG scale was extended so that it had the same standard deviation.

Sense of mastery. In both studies, sense of mastery was derived from the Pearlin Mastery Scale (Pearlin and Schooler 1978). This scale consists of seven questions: five negative and two positive items, with response categories ranging from (1) strongly disagree to (5) strongly agree. In LASA, the five negative items were selected and their reverse scores were summed. Sum scores ranged from 5 to 25. In NorLAG, one negative and one positive item were selected: ‘I have little control over what happens to me’ (reversed), and ‘What happens to me depends on myself.’ The summary scale ranged from 2 to 10. In both versions of the scale, higher scores reflect a higher sense of mastery. To achieve harmonization, both scales were standardized to mean 0 and standard deviation 1.
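The standardization step can be sketched as follows. This is an illustration only: the scores are made-up values within each scale's range, and the use of the population standard deviation, computed within each study separately, is an assumption that the text does not specify.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 (z-scores)."""
    m = mean(scores)
    sd = pstdev(scores)  # assumption: population SD within one study
    return [(s - m) / sd for s in scores]


# Hypothetical sum scores, for illustration only.
lasa_scores = [5, 12, 18, 25]    # LASA scale range: 5 to 25
norlag_scores = [2, 5, 7, 10]    # NorLAG scale range: 2 to 10

lasa_z = standardize(lasa_scores)
norlag_z = standardize(norlag_scores)
```

After this step, a one-unit difference means one standard deviation in either study, which is what makes the two versions of the scale comparable in a pooled model.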

Statistical analysis

The two country-specific datasets, each including data from two measurement periods, were pooled, and a dummy variable for country was created with Norway as the reference. The basic analytic model consisted of employment as the outcome variable and time as the main independent variable; the time variable indicates the change in employment over time. Because age proved to be a suppressor of the association between time and employment, age was included in the basic model. As logistic regression models for dichotomous variables present a problem of interpretation of the coefficients when variables are added to the model, linear regression models were used to circumvent this problem (Mood 2010). We tested our assumption of a steeper rise in employment for women than for men on the basis of the interaction term sex*time. As this interaction term was highly significant, we stratified all analyses by sex. First, descriptive statistics for all individual characteristics were presented. Next, a series of regression models was evaluated using a difference-in-differences approach.

The difference-in-differences approach assumes that the outcome variable would follow the same time trend in both countries if there had been no policy change in one country (Fredriksson and Magalhães de Oliveira 2019). This assumption can be tested by examining the parallelism of changes in employment prior to the policy change, i.e., the year 2006. As our data consist of only one measurement point prior to 2006, we had to resort to labor force data published by Eurostat, as shown in Fig. 1. Visual inspection of the trends prior to 2006 leads to the conclusion that for men, trends in employment in both countries may be considered similar, but that for women, parallelism is not tenable. Employment in women rose faster in the Netherlands than in Norway. In the case of violation of the parallel trends assumption, a lagged-dependent-variable approach has been recommended (O’Neill et al. 2016). This method adjusts for differences in ‘pre-treatment’ outcomes in the regression model and thus reduces the bias in our estimates for women. Again, as our data include no prior measurements of employment for any of our samples, we had to resort to proxies, based on Eurostat labor force data when our samples were five years younger, as suggested by Morris (2021). These data represent the aggregated, cohort-specific employment history of our participants. In a sensitivity analysis, we accounted for these earlier employment rates in our analyses by creating a new independent variable, assigning the percentage employed five years earlier to each study participant, based on year of interview and 5-year age group, for each sex and country.
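The construction of the proxy variable can be sketched as a simple lookup. Everything named here is hypothetical: the table keys, the function names, and especially the rates, which are arbitrary placeholders and not Eurostat figures.

```python
# All rates below are arbitrary placeholders, NOT Eurostat figures.
# Key: (country, sex, interview_year, age_group at interview)
# Value: % employed in the same cohort five years before the interview
PLACEHOLDER_RATES = {
    ("NL", "F", 2003, "55-59"): 10.0,
    ("NL", "F", 2003, "60-64"): 20.0,
    ("NO", "F", 2002, "55-59"): 30.0,
    ("NO", "F", 2002, "60-64"): 40.0,
}


def age_group(age: int) -> str:
    """Map an age at interview to its 5-year age group."""
    if 55 <= age <= 59:
        return "55-59"
    if 60 <= age <= 64:
        return "60-64"
    raise ValueError(f"age {age} is outside the 55-64 range")


def cohort_employment_5y_earlier(country, sex, interview_year, age,
                                 rates=PLACEHOLDER_RATES):
    """Assign the aggregate cohort employment rate to one participant."""
    return rates[(country, sex, interview_year, age_group(age))]


print(cohort_employment_5y_earlier("NL", "F", 2003, 57))
```

Every participant in the same cell receives the same value, so the variable captures cohort-level history rather than individual employment history.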

To examine the first part of our research question, whether a steeper rise in employment was observed in the Netherlands than in Norway during the full study period, the interaction term country*time was included in the basic regression model, using Norway in 2002 as the reference, in addition to the main effects of time, country, and age. The size of this interaction term indicates the difference in rise in employment in the Netherlands compared to Norway, corresponding to the difference-in-differences approach.
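As a reading aid, the basic model described above can be written out as a linear probability model, estimated separately for men and women. The symbols are our own notation and not taken from the original analysis: E_i is employment (0/1), T_i is time in 5-year units, and NL_i equals 1 for the Netherlands and 0 for Norway.

```latex
E_i = \beta_0 + \beta_1 T_i + \beta_2 \mathrm{NL}_i
      + \beta_3 \left( \mathrm{NL}_i \times T_i \right)
      + \beta_4 \mathrm{Age}_i + \varepsilon_i
```

Under this notation, \beta_1 is the rise in employment per five years in Norway, \beta_3 is the difference-in-differences estimate (the extra rise in the Netherlands), and \beta_1 + \beta_3 is the rise per five years in the Netherlands.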

For the second part of our research question, to estimate the effect of secular change, the individual characteristics were included in this model to examine their explanatory value for the change over time. Covariates with significance p < 0.20 were kept in the model. The explanatory value of the individual characteristics for the effect of time was derived for each country from the reduction in the coefficient of time in the full model compared to this coefficient in the basic model. In the full model, the interaction term indicates the country difference in the rise in employment that can be attributed neither to secular change in the individual characteristics investigated in this study, nor to unobserved factors that are assumed to be at work equally in Norway and the Netherlands, according to the ‘similar systems’ approach. As a sensitivity analysis, we added the constructed variable ‘cohort employment 5 years earlier’ to the model for women to alleviate the bias that might have resulted from the violation of the parallel trends assumption.

To address our third research question, we estimated the share of the total rise in employment in the Netherlands that may be attributable to PEW, by dividing the extra rise in the Netherlands, net of the effects of individual and unobserved factors, by the total employment rise in the Netherlands from the basic model (Morris 2021). For women, we calculated this proportion using both the full model and the model that also included the constructed variable ‘cohort employment 5 years earlier’.

In NorLAG, 27% of the participants participated in both 2002 and 2007. Thus, the test statistics were also estimated when adjusting for the non-independence of observations for those participating in both years, using the cluster option in Stata (cf. Cameron and Miller 2015). This adjustment did not change the results.
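The 27% figure can be reproduced from the sample sizes reported in the Samples section, on the assumption that the percentage refers to unique persons. The variable names are ours.

```python
n_2002 = 1466   # NorLAG participants aged 55-64 in 2002
n_2007 = 1398   # NorLAG participants aged 55-64 in 2007
n_both = 603    # participants observed in both years

# Persons counted once, whether they took part in one or both years.
n_unique = n_2002 + n_2007 - n_both

share_repeated = n_both / n_unique
print(f"{n_unique} unique persons, {share_repeated:.1%} observed twice")
```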

Results

Descriptives

Initially, the employment of older workers was higher in Norway than in the Netherlands (Table 1). In both countries, employment rose subsequently. This rise was clearly stronger in the Netherlands than in Norway, and particularly so among Dutch women. The increase over 10 years amounted to 16.9 percentage points for Dutch men and to as much as 23.8 percentage points for Dutch women. These increases were 3.2 and 5.5 percentage points for Norwegian men and women over five years, respectively.

Table 1 Descriptive characteristics of the Dutch and Norwegian samples, first and last measurement time. The total numbers represent the participants with valid data on work status and all covariates

Several changes over time in individual characteristics were observed in both countries. The level of education increased. There was no clear change in partner status. Among individuals with a partner, the partner was more often employed. There was not much change in health, except that functional limitations decreased among Norwegian women. In both countries and both sexes, sense of mastery increased.

Difference in employment over time

In the pooled dataset, including both countries and both measurement periods, we examined the country difference in rise in employment over time in the basic model including time, country (with reference Norway), and the interaction between country and time, with only age as a covariate. The time coefficient shows that employment in Norway rose by 3.6% points over five years in men and by 5.7% points in women, net of age (Table 2, upper part). The interaction term shows the difference in rise in employment for the Netherlands compared to Norway, amounting to 5.5% points and 6.7% points over five years for men and women, respectively. For both sexes, the interaction term was significant, indicating that the rise in employment was significantly greater in the Netherlands than in Norway. The total rise in employment in the Netherlands, then, was derived by adding the coefficients for time and country*time. This yielded 9.1% points for men and 12.4% points for women over five years, net of age.
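The derivation of the total Dutch rise from the basic model is a simple sum of two coefficients, sketched here with the values quoted above. The dictionary layout and function name are illustrative choices.

```python
# Basic-model coefficients per 5 years, as quoted in the text (Table 2, upper part).
basic_model = {
    "men":   {"time": 0.036, "country_x_time": 0.055},
    "women": {"time": 0.057, "country_x_time": 0.067},
}


def dutch_total_rise(coefs):
    """Rise per 5 years in the Netherlands: Norwegian rise plus the extra Dutch rise."""
    return coefs["time"] + coefs["country_x_time"]


for sex, coefs in basic_model.items():
    print(sex, f"{dutch_total_rise(coefs) * 100:.1f} percentage points")
```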

Table 2 Regression model of the association of time with employment in data pooled across countries, by sex; interaction effect of country*time before and after inclusion of individual characteristics

The role of individual characteristics

We next examined the role of individual characteristics regarding the rise in employment for each country and sex, comparing the basic regression model with the model also including the other individual characteristics (Table 2, lower part). Inclusion of the other individual characteristics yielded a clear increase in variance explained, which reached approximately the same value for men and women: 27.4% and 29.7%, respectively. The individual characteristics together explained substantial amounts of the effect of time for Norway: in men, 47% (1 - (0.019/0.036) = 0.47); in women, 74% (1 - (0.015/0.057) = 0.74). For the Netherlands, these amounts were: in men, 22% (1 - (0.071/0.091) = 0.22); in women, 27% (1 - (0.090/0.124) = 0.27). Thus, the smaller increase in employment in Norway could be attributed to individual characteristics for a much larger part than the larger increase in employment in the Netherlands.
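The explained shares follow from comparing the time coefficient before and after adjustment. A minimal sketch using the coefficients quoted in this paragraph; the function and variable names are ours.

```python
def share_explained(basic_coef, full_coef):
    """Proportion of the time effect accounted for by the added covariates."""
    return 1 - full_coef / basic_coef


# (basic-model coefficient, full-model coefficient) per 5 years, from the text
time_effects = {
    ("Norway", "men"): (0.036, 0.019),
    ("Norway", "women"): (0.057, 0.015),
    ("Netherlands", "men"): (0.091, 0.071),
    ("Netherlands", "women"): (0.124, 0.090),
}

for (country, sex), (basic, full) in time_effects.items():
    print(country, sex, f"{share_explained(basic, full):.0%}")
```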

All individual characteristics were significantly associated with employment in both men and women, with one exception. Having a partner who did not have paid work was significantly negatively associated with employment only in women.

After accounting for the individual characteristics, the coefficient for time indicates the rise in employment in Norway, net of the individual characteristics studied (Table 2, lower part). In both men and women, this coefficient was no longer significant but still not equal to zero, indicating that unobserved factors also play a role in the rise in employment in Norway. The ‘similar systems’ method assumes that such unobserved factors are at work similarly in both countries. The interaction term country*time shows that the difference in rise in employment over five years between the two countries amounted to 5.2% points and 7.5% points in men and women, respectively, net of both the individual characteristics studied and unobserved factors as indicated by the coefficient of time.

As the parallel trends assumption was violated for women, we performed a sensitivity analysis accounting for this violation by using a modified version of the lagged-dependent-variable approach (Supplement Table S1). Introducing the constructed variable ‘cohort employment 5 years earlier’ into the adjusted model for women did not change the variance explained (now: 29.8%), but substantially reduced the coefficients for time and country as well as the interaction term country*time. In this model, the difference in rise in female employment over five years between the two countries amounted to 5.1% points.

Share of the total employment rise in the Netherlands that can be possibly attributed to PEW

Recalling that the total rise in employment in the Netherlands, as derived from the basic model, was 9.1% for men and 12.4% for women, we compared the rise net of the effects of individual and unobserved characteristics to this total rise. For men, the share of total employment rise constitutes a proportion of 5.2/9.1 = 57.1% and for women, 7.5/12.4 = 57.3%. The latter proportions may be considered as the maximum that can be attributed to the new Dutch policies to extend working lives. Using the model that includes the variable ‘cohort employment 5 years earlier,’ the share of female employment rise possibly attributable to PEW is reduced and amounts to 5.1/12.4 = 41%.
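The share calculation is a ratio of the extra Dutch rise to the total Dutch rise. The sketch below applies it to the figures quoted for men and for the women's sensitivity model; the function name is an illustrative choice.

```python
def share_attributable(extra_rise, total_rise):
    """Extra Dutch rise (net of covariates) as a proportion of the total Dutch rise."""
    return extra_rise / total_rise


# Percentage points per 5 years, as quoted in the text.
men_share = share_attributable(5.2, 9.1)
women_sensitivity_share = share_attributable(5.1, 12.4)

print(f"men: {men_share:.1%}")
print(f"women, sensitivity model: {women_sensitivity_share:.0%}")
```

Because the numerator still contains any country-specific influences that the model does not capture, the ratio is best read as an upper bound rather than a point estimate of the policy effect.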

Discussion

The employment of older workers rose both in the Netherlands and in Norway during the 2000s. Many other European countries experienced similar developments. Research and policy documents have attributed increases in employment to the implementation of policy measures to extend working lives (e.g., OECD 2017). As such, increases in employment have been considered as proof of the effectiveness of these policies. This study’s aim was to test this seemingly obvious explanation by exploring whether alternative explanations related to secular changes in individual characteristics may also explain the increases in employment among older workers. We used a quasi-experimental design, comparing two countries with many similarities in terms of economic development and institutional arrangements and both showing a rise in the employment of older workers: one country with clear PEW (the Netherlands) and another country without such policy (Norway) in the study period.

The Netherlands experienced a steep rise in the employment of older workers. Individual characteristics (beyond age) could explain about one quarter of this increase in both men and women. In Norway, a much smaller rise was observed, and individual characteristics explained about half of the increase in the employment of men and about three quarters of this increase in women. The ‘most similar systems’ approach adopted in our study involves a direct comparison between the increases in employment of older workers in the two countries, assuming that unobserved factors are similarly at work in both countries. Application of this approach, in addition to adjustment for individual variables, indicated that new policies introduced in the Netherlands may have led to an estimated maximum share of 57% of the total increase during 2003–2013 in the employment of both men and women.

Our analyses required that the assumption of parallel trends in employment held prior to the year of implementation of PEW in the Netherlands. Although this assumption held for men, it turned out to be violated for women. As recommended by O’Neill et al. (2016), we alleviated the bias in our estimate due to this violation by including a new variable indicating the rate of employment in the female cohorts when they were 5 years younger as a proxy for a lagged dependent variable. A true lagged employment variable was not available in our data at the individual level, and moreover, the Eurostat data were only available from 1995 onward. This precluded the construction of cohort-specific employment 10 years earlier, which would have been preferable because it would have avoided any overlap with our study period. As such, the 5-year lag for the second Dutch sample overlapped with the first years following the implementation of PEW in the Netherlands (2007 and 2008) and thus, its inclusion may have led to an overestimation of the role of secular change. Therefore, we prefer to consider the findings derived from Table 2 as our most valid ones.

It is important to consider that changes in individual factors that explain changes in employment in the Netherlands could have been induced by the new policies, so that, strictly speaking, we should have limited our analyses of secular trends to the period up to 2006, when the new policies were first implemented. However, policy measures take time to be implemented, and many workers still had access to early retirement regulations for several years after 2006. Moreover, long-term planning by individual workers and normative expectations regarding the appropriate behavior (including retirement) of older workers likely require several years before being fully adapted to the new policies. In fact, the national data on retirement timing show only a gradual increase in retirement age from 2006, which continued up to 2013 (Statistics Netherlands 2017).

The individual characteristic that is most likely to have changed as a consequence of the PEW implemented in the Netherlands in 2006 is the work status of the partner. Thus, including work status of the partner likely overestimates the explanatory value of individual characteristics. In order to estimate the extent of potential overestimation, we assessed our regression models without this variable (Supplement Table S2). In men, instead of a maximum of 57%, a maximum of 69% can then be attributed to the new policies. In women, the difference is smaller: instead of 57%, it is 64%. However, the rise in Dutch women’s employment up to 2006 has been shown to be related to the increase in norms supporting gender egalitarianism (Thijs et al. 2019), which is an aspect of secular change. Therefore, we believe that by excluding work status of the partner, we might overestimate the role of PEW especially in men.

In addition to secular change in individual characteristics, there is evidence that work characteristics also changed over the study period. Societal developments such as advances in technology and digitalization have changed the content and organization of work. Task requirements changed within jobs, many jobs disappeared, and other jobs were newly created (Cassidy 2017). Jobs more often required mental rather than physical effort. Thus, more recent generations of workers have jobs characterized by different job demands than those held by earlier generations of workers. To illustrate, physical demands such as work in awkward postures decreased, as did psychosocial strains such as low job control, whereas psychosocial demands such as cognitively intense work increased (Burr et al. 2003; Romeu Gordo and Skirbekk 2013). Our own data (see Supplement Table S3) show that in the Netherlands, job level, time pressure, task variation, and autonomy increased. In Norway, time pressure and autonomy changed in similar ways as in the Netherlands. These increases generally correspond to a shift from physical to mental job demands. This shift leaves room for the suggestion that the secular change in work characteristics contributed to the rise in employment, in addition to the PEW and the secular changes in individual characteristics that we studied. We can conclude that, particularly in the Dutch case, a smaller part of the rise in employment of older workers should be attributed to policy measures than the maximum that we were able to estimate based on the secular change in individual characteristics only.

Some further uncertainties are related to the findings from this study. Our full model comparing increases in employment in the Netherlands and Norway includes a selection of individual characteristics that are associated with employment and changes in employment over time and were available in both country-specific datasets. Other important characteristics might include adherence to gender roles. Our argument for highlighting this characteristic stems from the observation that female employment increased in an earlier period in Norway than in the Netherlands and started to flatten out over the study period, in contrast to the Netherlands, where female employment continued to increase during the early 2000s. Hence, some of the steeper rise in the employment of older women in the Netherlands could be attributed to more general changes in attitudes toward gender roles that took place later in this country than in Norway (Inglehart et al. 2020). In all, our estimates of the contribution of individual characteristics are most likely on the conservative side, which may imply an overestimation of the impact of PEW in the Netherlands.

Another uncertainty relates to developments in macro-factors other than policies to extend working lives. Factors such as the generosity of state pensions and the cost of living show differences between the countries, but there are no indications that these differences have changed substantially during the study period. We recall that Norway was chosen as the comparison country because it did not have any major changes in pension and work policies during the study period, while being comparable in terms of economic development and institutional arrangements. Even if Norway did not change its pension and work policies during the study period, there could still be some effects of changing attitudes toward the employment of older workers, arising from policy discourses among various elites, that may have led older workers to value continued employment as more economically and morally important during this period (Hagelund and Grødem 2019). Such changes in the value of continued work are difficult to investigate empirically, however. Another macro-development is the business cycle. Whereas the great economic crisis of 2008 fell outside the study period in Norway, it took place in the second half of the study period in the Netherlands. It is possible that employment would have increased more if the crisis had not occurred. Indeed, according to national unemployment statistics (Statistics Netherlands 2020), between 2003 and 2013 unemployment in the Netherlands showed an increase from 3.4 to 6.8% in the age group 55–64 years, the largest increase of which occurred after 2012.

A final uncertainty concerns the compatibility of the two datasets. First, the time period covered by the Norwegian data is only half that covered by the Dutch data, and thus, half of the (potential) secular change may be missed. Unfortunately, the Norwegian study does not have data around 2012–13. On the other hand, if we assume secular change to be linear within the time period of 10 years, a longer observation period for Norway would have led to the same results. Indeed, the secular change in Norway was linear for the important characteristic of educational level (Supplement figure S4). We see no reasons to expect deviation from linear change for other important individual factors in our study, as earlier studies showed that the secular change in functional limitations and sense of mastery was gradual (Drewelies et al. 2018; Galenkamp et al. 2013). Regardless, the difference in length of study period may affect our conclusions in unknown ways. Second, the LASA design consists of two independent samples, whereas the NorLAG design is basically longitudinal with addition of a refreshment sample at follow-up. However, due to the restricted age range of 55–64 years, 56% of the participants at T1 did not participate at T2 and 58% of those participating at T2 did not participate at T1, resulting in only 27% of participants with non-independent data. As stated in the Statistical Analysis section, accounting and not accounting for non-independence of the observations in NorLAG did not make a difference in the results.

Conclusion

This study showed that at ages 55–64 years, employment rose faster in the Netherlands than in Norway in the 2000s, and that secular change in individual characteristics explained a substantially larger share of the smaller rise in Norway than of the larger rise in the Netherlands. Changes in work conditions and changes in attitudes toward gender roles may explain even more of the observed rise. Furthermore, accounting for available individual characteristics and unobserved factors that were assumed to show similar change in the Netherlands and Norway, we calculated that a proportion of less than 57% of the rise in Dutch employment is likely to be attributable to policy measures. This leads to the conclusion that a great part, but certainly not all, of this rise is attributable to policy measures. In addition, there remains room for other factors at the meso- and macro-level that we could not address, to account for the rest of the change in employment.