1 Introduction

Labour market and living conditions of older individuals have become key policy issues in all European countries. Poverty is more prevalent among the elderly than among other age groups, particularly in several Southern European countries (Tsakloglou 1996). Lack of economic resources makes elderly people vulnerable to poor quality of life (Grundy 2006). Downward income mobility is larger among older age groups, particularly among widows and those with an unemployment history, which suggests a need for policies that strengthen the social safety net and protect against unemployment and its consequences for economic welfare (Zaidi et al. 2005). Population ageing has led to more pressure on pension and old age benefit systems, and policies aimed at increasing the labour force participation of older individuals are required in order to preserve the sustainability of pension systems and old age social security. To design such policies, it is important to assess the determinants of retirement. Since job satisfaction is an important factor driving retirement decisions (Kosloski et al. 2001), studying job satisfaction among older workers is particularly relevant.

In this paper, using data on individuals of age 50 and older from 11 European countries, we analyze two economic aspects of subjective well-being of older Europeans: satisfaction with household income, and job satisfaction. Both contribute substantially to overall well-being (satisfaction with life or happiness). For example, Ferrer-i-Carbonell and Van Praag (2002) and Van Praag et al. (2003) analyze how satisfaction with life of adult Germans is determined by satisfaction with domains of life (satisfaction with job, finances, housing, health, leisure, and the environment) and find that, together with health satisfaction, job satisfaction and satisfaction with the financial situation are the most important determinants. Similarly large effects of financial and job satisfaction on satisfaction with life are found for the general UK population (not only the older part) by Van Praag and Ferrer-i-Carbonell (2008, p. 91), though they find even larger effects of satisfaction with leisure use and social life.

Satisfaction with household income has often been studied in the context of household equivalence scales; see, e.g., Van Praag and Van der Sar (1988), Van Praag and Warnaar (1997), Charlier (2002), or Van Praag and Ferrer-i-Carbonell (2008). The economic literature on satisfaction with life emphasizes the role of income (cf., e.g., Clark et al. 2008), but often analyzes the role of income for life satisfaction directly, without considering satisfaction with income (see, for example, Schyns 2002). An exception is the work of Van Praag and co-authors (e.g., Van Praag et al. 2003), who introduced a two-stage model where satisfaction with life is a function of satisfaction with several domains, including satisfaction with income or the financial situation, and where domain-specific satisfaction variables are determined by socio-economic characteristics including income. This two-stage approach shows the importance of income satisfaction relative to satisfaction with other domains. Van Praag and Ferrer-i-Carbonell (2008) also compare income satisfaction in several countries. Kapteyn et al. (2008) compare income satisfaction of the adult population in the US and the Netherlands. Hsieh (2004) analyzes income satisfaction among older Americans. We are not aware of studies that focus specifically on international comparisons of income satisfaction among older individuals.

Job satisfaction has traditionally been studied in sociology and psychology, but has more recently also been shown to provide useful information about economic life that should not be ignored (Hamermesh 1977; Freeman 1978; Borjas 1979; Clark and Oswald 1996). For example, it appears to have predictive value for observable phenomena such as quit rates (Freeman 1978; Clark et al. 1998) or absenteeism (Clegg 1983). The determinants of job satisfaction have been studied extensively for populations of all adult workers; see, for example, Clark (1997), Clark et al. (1998), and Hamermesh (2001). Sousa-Poza and Sousa-Poza (2000) and Kristensen and Johansson (2008) compare job satisfaction and satisfaction with various job characteristics across countries. We do not know of studies that focus specifically on international comparisons of job satisfaction among older workers.

An important issue underlying the cross-country comparison of self-reported well-being or satisfaction with different domains of life is that individuals from different countries or socio-demographic backgrounds may use different response scales, referred to as differential item functioning (DIF) in the psychology literature (Holland and Wainer 1993). Indeed, if individuals use the same scale, differences in self-reported satisfaction reflect “true” differences across countries or groups of individuals that may also affect behaviour, whereas if differences are due to response scale differences only, they do not influence behaviour and adjustments are required to compare true satisfaction across individuals. Van Praag et al. (2003) use panel data models with (quasi-) fixed effects, capturing persistent differences in response scales (Footnote 1). This allows them to identify how changes in satisfaction respond to changes in characteristics but does not help to identify cross-country differences in satisfaction levels that keep response scales constant. Specifically for the latter purpose, King et al. (2004) have proposed to use anchoring vignettes: respondents are asked to evaluate hypothetical situations described in the survey question. This additional information helps to identify interpersonal differences in response scales, even with cross-section data.

Anchoring vignettes have been used to analyze cross-country differences in various subjective measures of well-being, such as political efficacy (King et al. 2004), health (Salomon et al. 2004; Bago d’Uva et al. 2008a, b), life satisfaction (Angelini et al. 2011; Kapteyn et al. 2010), or work disability (Kapteyn et al. 2007). Kapteyn et al. (2008) use anchoring vignettes to compare income satisfaction between the Netherlands and the US. They find that the distribution of self-reported income satisfaction differs substantially across countries, but correcting for response scale differences makes the distributions much more similar. Kristensen and Johansson (2008) analyse job satisfaction across seven European countries using anchoring vignettes and find evidence of cultural differences in reporting job satisfaction. They show that correcting for such differences alters the country ranking.

The aim of this paper is to compare income and job satisfaction of older individuals (50+) across European countries, correcting for differences in respondents' reporting styles by using anchoring vignettes. The results of Bago d’Uva et al. (2008b) and Kapteyn et al. (2007) suggest that differences in reporting styles across countries and socio-economic groups are important for older age groups, though it is not clear whether they are systematically larger or smaller than for younger age groups.

The remainder of this paper is organized as follows. Section 2 presents the econometric model and motivates the use of anchoring vignettes. Section 3 presents the data and descriptive statistics. Estimation results are presented in Sect. 4. Section 5 presents some simulations of counterfactual distributions, showing how income and job satisfaction compare across countries when response scales are kept constant. Section 6 concludes.

2 The Model

The methodology of anchoring vignettes to measure subjective ordinal responses taking into account differences in the reporting styles across individuals was first introduced by King et al. (2004). We follow their parametric model, the conditional hopit (chopit) model. Define a latent self-satisfaction variable (\( s_{i}^{*} \)) as:

$$ s_{i}^{*} = X_{i} \beta + \varepsilon_{i} , $$
(1)

where \( X_{i} \) is a vector of explanatory variables such as country dummies, gender, years of education, and household income, and \( \beta \) is a vector of parameters. The error term \( \varepsilon_{i} \) is assumed to be standard normally distributed and independent of \( X_{i} \). Reported satisfaction (\( s_{i} \)) is a 5-point-scale ordered categorical variable based upon the underlying latent variable \( s_{i}^{*} \):

$$ \begin{gathered} s_{i} = j\quad {\text{if}}\,\tau_{i}^{j - 1} < s_{i}^{*} \le \tau_{i}^{j} , \hfill \\ {\text{For}}\,j = 1, \ldots ,5\quad{\text{and}}\quad\tau_{i}^{0} = - \infty ,\quad\tau_{i}^{5} = + \infty . \hfill \\ \end{gathered} $$
(2)

If the thresholds between categories are the same for all respondents (\( \tau_{i}^{j} = \tau^{j} \) for all i, j) then this gives the ordered probit model, a standard model for ordered response dependent variables. The main distinguishing feature compared to this standard case is that all thresholds can vary with observed respondent characteristics:

$$ \begin{aligned} \tau_{i}^{1} = & X_{i} \gamma^{1} , \\ \tau_{i}^{j} = & \tau_{i}^{j - 1} + \exp (X_{i} \gamma^{j} ),\quad j = 2,3,4, \\ \end{aligned} $$
(3)

where the \( \gamma^{j} ,j = 1,2,3,4 \), are vectors of parameters. Without additional information, \( \gamma^{1} \) and \( \beta \) are not separately identified. Imposing \( \gamma^{1} = 0 \) leads to a generalized ordered probit model in which the distances between cut-off points can vary with \( X_{i} \); the exponential function is taken to guarantee that the distances are always positive. We are particularly interested, however, in allowing for non-zero \( \gamma^{1} \), since this means that a change in the characteristics leads to a parallel shift in all cut-off points, with the intuition that some respondents use more positive evaluations than other respondents. To identify \( \gamma^{1} \), additional information is used in the form of vignette evaluations \( V_{i}^{k} \) (k = 1,…, K), where K is the number of different vignettes evaluated by the respondents. The vignette equivalence assumption implies that there exists a common “true” (objective) actual level of satisfaction \( \theta^{k} \) underlying the situation described by a given vignette k; the vector of all these is denoted by \( \theta = (\theta^{1} , \ldots ,\theta^{K} ) \). The vignette evaluations are modelled as follows:

$$ \begin{aligned} V_{i}^{*k} = & \theta^{k} + \nu_{i}^{k} , \\ V_{i}^{k} = & j\quad {\text{if}}\,\tau_{i}^{j - 1} < V_{i}^{*k} \le \tau_{i}^{j} , \\ \end{aligned} $$
(4)

where \( V_{i}^{k} \) is the evaluation of vignette k by respondent i, and the \( \nu_{i}^{k} \) are errors, assumed to be normally distributed with mean 0 and variance \( \sigma_{v}^{2} \), independent of each other, of \( \varepsilon_{i} \), and of \( X_{i} \) (Footnote 2).
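To make Eqs. 2 and 3 concrete, the following minimal Python sketch computes the thresholds for one respondent and maps a latent value to a reported category. The function names (`thresholds`, `category`), the two-element covariate vector (a constant and a hypothetical country dummy), and all coefficient values are assumptions chosen for illustration; they are not estimates from this paper.

```python
import math

def thresholds(x, gammas):
    """Return the four finite cut-off points tau^1..tau^4 of Eq. 3.

    x      : list of covariate values for one respondent
    gammas : four coefficient lists [gamma^1, gamma^2, gamma^3, gamma^4]
    """
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    taus = [dot(x, gammas[0])]                       # tau^1 = X gamma^1
    for g in gammas[1:]:
        # tau^j = tau^{j-1} + exp(X gamma^j), so distances stay positive
        taus.append(taus[-1] + math.exp(dot(x, g)))
    return taus

def category(latent, taus):
    """Map a latent value to a category 1..5 (Eq. 2: tau^{j-1} < s* <= tau^j)."""
    return 1 + sum(latent > t for t in taus)

# Hypothetical coefficients: only gamma^1 depends on the country dummy,
# which shifts all cut-off points in parallel.
gammas = [[-1.0, 0.5], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
same_latent = 0.2
print(category(same_latent, thresholds([1.0, 0.0], gammas)))  # 3
print(category(same_latent, thresholds([1.0, 1.0], gammas)))  # 2
```

In this illustration two respondents with the same latent satisfaction report different categories purely because their response scales differ, which is the DIF problem the vignettes are meant to address.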

The model consisting of Eqs. 1–4 is estimated by maximum likelihood, combining the information in the self-assessments with the information in the vignette evaluations. The likelihood contribution of a given respondent consists of a self-assessment part and a vignette part:

$$ L(\beta ,\theta ,\gamma \mid s,V) = L_{s} (\beta ,\gamma \mid s) \times L_{V} (\theta ,\gamma \mid V), $$
(5)

where \( L_{s} (\beta ,\gamma \mid s) \) is the likelihood component for the self-assessment:

$$ L_{s} (\beta ,\gamma \mid s) = \prod\limits_{i = 1}^{N} \prod\limits_{j = 1}^{5} \left[ \Phi (\tau_{i}^{j} \mid X_{i} \beta ) - \Phi (\tau_{i}^{j - 1} \mid X_{i} \beta ) \right]^{I(s_{i} = j)} , $$
(6)

and \( L_{V} (\theta ,\gamma \mid V) \) is the likelihood component for the vignette part:

$$ L_{V} (\theta ,\gamma \mid V) = \prod\limits_{i = 1}^{N} \prod\limits_{k = 1}^{K} \prod\limits_{j = 1}^{5} \left[ \Phi (\tau_{i}^{j} \mid \theta^{k} ,\sigma_{v}^{2} ) - \Phi (\tau_{i}^{j - 1} \mid \theta^{k} ,\sigma_{v}^{2} ) \right]^{I(V_{i}^{k} = j)} $$
(7)

The parameters \( \gamma = (\gamma^{1} , \ldots ,\gamma^{4} ) \) drive both components of the likelihood contributions, which is why the additional information in the vignette evaluations helps for identification. The main identifying assumptions in this model are twofold. The first is “response consistency”: a given respondent uses the same scales \( \tau_{i}^{j} \) for self-reports and vignettes. King et al. (2004) and Van Soest et al. (2011) found support for this hypothesis for vignettes on vision and drinking behaviour, by comparing vignette corrected self-reports and more objective measures. The second assumption is called “vignette equivalence”: there should be no systematic differences in the interpretation of a given vignette between respondents with different characteristics \( X_{i} \) (so that \( V_{i}^{*k} \) does not vary with \( X_{i} \)).
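The likelihood contribution of a single respondent in Eqs. 5–7 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the estimation code used for this paper; the helper names (`thresholds`, `cell_prob`, `loglik_respondent`) and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def thresholds(x, gammas):
    """Cut-off points tau^1..tau^4 of Eq. 3 for covariates x."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    taus = [dot(x, gammas[0])]
    for g in gammas[1:]:
        taus.append(taus[-1] + math.exp(dot(x, g)))
    return taus

def cell_prob(j, taus, mean, sd=1.0):
    """P(tau^{j-1} < Z <= tau^j) for Z ~ N(mean, sd^2), j = 1..5."""
    upper = 1.0 if j == 5 else PHI((taus[j - 1] - mean) / sd)
    lower = 0.0 if j == 1 else PHI((taus[j - 2] - mean) / sd)
    return upper - lower

def loglik_respondent(s, v, x, beta, gammas, theta, sigma_v):
    """Log of one respondent's contribution to Eq. 5.

    s : own reported category (1..5)
    v : list of vignette ratings (1..5), one per vignette
    """
    taus = thresholds(x, gammas)              # same scale for both parts
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    ll = math.log(cell_prob(s, taus, xb))     # self-assessment part, Eq. 6
    for k, vk in enumerate(v):                # vignette part, Eq. 7
        ll += math.log(cell_prob(vk, taus, theta[k], sigma_v))
    return ll

# Hypothetical parameter values, for illustration only
x = [1.0, 0.0]
beta = [0.3, 0.2]
gammas = [[-1.0, 0.5], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
theta = [-0.5, 1.0]
print(loglik_respondent(4, [2, 4], x, beta, gammas, theta, sigma_v=1.0))
```

Because the same `taus` enter both the self-assessment and the vignette terms, the vignette ratings carry information about \( \gamma \) that the self-assessment alone cannot separate from \( \beta \). A full estimator would sum this quantity over respondents and maximize it numerically.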

3 Data and Descriptive Statistics

The empirical analysis is based on data from the COMPARE sample, which is part of the second wave (2006–2007) of the Survey of Health, Ageing and Retirement in Europe (SHARE). SHARE includes rich information about health, employment, financial situation, family contacts, and social activities of a representative sample of the 50+ populations in a number of European countries (Börsch-Supan et al. 2005, 2008). The COMPARE sample consists of random subsamples of the complete SHARE samples in 11 countries. Respondents in these subsamples did the complete face-to-face SHARE interview and then completed a drop-off questionnaire with self-assessed satisfaction with various domains of life and with vignette evaluations for the same domains; see Van Soest (2008). SHARE respondents in the other subsamples received a completely different drop-off questionnaire. Response rates to the main survey and the drop-off were similar for the COMPARE sample and the remaining SHARE sample. The COMPARE sample includes about 7,000 individuals aged 50+ from eleven European countries: Belgium, Czech Republic, Denmark, France, Germany, Greece, Italy, the Netherlands, Poland, Spain, and Sweden.

3.1 Income Satisfaction and Anchoring Vignettes

Objective measures of economic poverty across countries are typically based upon household income or household consumption expenditures corrected for purchasing power differences and differences in household composition. Such measures, however, may provide only a partial measure of poverty, since whether people can make ends meet may also depend on other factors such as access to cheap housing, availability of help from family, friends, or neighbours, or the availability of free public goods and services such as health care. A more general assessment of living standard is the answer to the income satisfaction question:

  • How satisfied are you with the total income of your household?

  • Very dissatisfied/Dissatisfied/Neither satisfied, nor dissatisfied/Satisfied/Very satisfied

The distribution of income satisfaction among individuals aged 50+ across countries is presented in Table 1. The ranking of the countries varies with the chosen cut-off point. For example, the percentage of individuals who are satisfied or very satisfied with their income is higher in Spain than in France, but the percentage who are very dissatisfied or dissatisfied is slightly lower in France than in Spain.

Table 1 Distribution of reported own income satisfaction by country (in %)

Figure 1 compares the complete income satisfaction distributions and shows whether an unambiguous ranking across subsets of countries can be obtained. It is based upon the numbers in Table 1 and compares the cumulative distribution of reported satisfaction with income across countries by stacking percentages of each outcome. For example, the left hand bars indicate that in Poland, 14% are very dissatisfied, 45% are very dissatisfied or dissatisfied, 77% are at most “neither satisfied nor dissatisfied,” etc. The countries are ranked on the basis of the latter percentages: Poland has the largest percentage at most “neither satisfied nor dissatisfied,” so that Polish respondents report the worst income satisfaction if we set the cut-off between “neither satisfied nor dissatisfied” and “satisfied”. The graph shows, however, that Poland does worst whichever cut-off we use. For example, the percentage very dissatisfied or dissatisfied is higher in Poland (45%) than in any other country. In other words, reported income satisfaction is unambiguously worse in Poland than in all other countries. Such an unambiguous ranking of pairs of countries is not always possible. For example, if the cut-off is put between satisfied and “neither satisfied nor dissatisfied,” Spain does better than France or the Czech Republic, but this reverses if the cut-off is between dissatisfied and “neither satisfied nor dissatisfied.” The figure also shows that Denmark, the Netherlands, and Sweden unambiguously rank first, second and third, respectively, followed by Germany and Belgium.

Fig. 1 Distribution of reported income satisfaction by country
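The notion of an "unambiguous" ranking used above can be made operational: one country ranks above another if its cumulative share is no larger at every cut-off point. The sketch below checks this for pairs of distributions. The function names and the four distributions (labelled A to D) are hypothetical and are not taken from Table 1.

```python
from itertools import accumulate

def cumulative(shares):
    """Cumulative percentages at the four cut-offs of a 5-category distribution."""
    return list(accumulate(shares))[:-1]

def dominates(a, b):
    """True if distribution a is unambiguously better than b:
    a has no more mass at or below every cut-off, and the two differ."""
    ca, cb = cumulative(a), cumulative(b)
    return all(x <= y for x, y in zip(ca, cb)) and ca != cb

# Hypothetical shares (in %) from "very dissatisfied" to "very satisfied"
dist = {
    "A": [5, 10, 20, 45, 20],
    "B": [15, 30, 30, 20, 5],
    "C": [10, 25, 15, 40, 10],
    "D": [6, 22, 30, 35, 7],
}
print(dominates(dist["A"], dist["B"]))  # True: A is better at every cut-off
print(dominates(dist["C"], dist["D"]))  # False
print(dominates(dist["D"], dist["C"]))  # False: the ranking depends on the cut-off
```

The pair C and D illustrates the case where the cumulative distributions cross, so that the ranking reverses between cut-off points.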

Figure 2 compares income satisfaction and equivalent monthly household income by country, using the modified OECD equivalence scale [1 + 0.5*(adult-1) + 0.3*child, where adult is the number of adults (15 years and older) in the household and child is the number of children (at most 14 years old)] (Footnote 3). Like Table 1, this figure is based upon reported income satisfaction, and therefore does not take into account the fact that individuals from different countries may use different response scales. The horizontal axis gives the country-specific mean of equivalent monthly net household income corrected for purchasing power parity (PPP) differences, while the vertical axis gives the percentage of individuals who are satisfied or very satisfied with their income. The figure suggests a strong positive (and linear) relationship between income and income satisfaction, except that France does not seem to fit this relationship. While France has quite high household income, it performs poorly in terms of income satisfaction.

Fig. 2 Household income (PPP corrected) and reported income satisfaction of the 50+ across COMPARE countries
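As an illustration of the modified OECD equivalence scale used for Fig. 2, the following short sketch converts household income into equivalent income. The function names and the example household are hypothetical; only the formula 1 + 0.5*(adult-1) + 0.3*child is taken from the text.

```python
def oecd_modified_scale(adults, children):
    """Modified OECD scale: 1 + 0.5*(adults - 1) + 0.3*children.

    adults   : household members aged 15 or older (at least one)
    children : household members aged 14 or younger
    """
    if adults < 1 or children < 0:
        raise ValueError("need at least one adult and no negative counts")
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children

def equivalent_income(household_income, adults, children):
    """Household income divided by the equivalence scale."""
    return household_income / oecd_modified_scale(adults, children)

# Hypothetical household: two adults, two children aged 14 or younger,
# monthly net income of 3,000 (PPP-corrected units)
print(round(equivalent_income(3000, 2, 2), 2))  # 1428.57
```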

While the subjective income satisfaction measure has the advantage of encompassing many aspects of economic well-being, it has the drawback that it may suffer from differential item functioning (DIF): individuals in different countries may use different response scales and give different answers although they are economically equally well off. Vignettes describing hypothetical people in given economic circumstances are used in order to correct for these response scale differences. In the COMPARE sample, the vignette questions about income satisfaction are the following:

Vignette 1:

Jim is married and has two children; the total after tax household income of his family is €1,500 per month. How satisfied do you think Jim is with the total income of his household?

Very dissatisfied/Dissatisfied/Neither satisfied nor dissatisfied/Satisfied/Very satisfied

Vignette 2:

Anne is married and has two children; the total after tax household income of her family is €3,000 per month. How satisfied do you think Anne is with the total income of her household?

Very dissatisfied/Dissatisfied/Neither satisfied nor dissatisfied/Satisfied/Very satisfied

The amounts used for net household income (Footnote 4) in the above vignettes, i.e., €1,500 and €3,000, are the amounts used in the vignette questions in France, Belgium and the Netherlands, where the purchasing power of one euro was almost identical. In other countries, PPP adjusted amounts were used in local currencies (Footnote 5). The underlying assumption here, necessary for vignette equivalence, is that the living standard that income satisfaction is trying to measure is not affected by the distribution of income in the country of residence. This distribution may affect the answers to the income satisfaction question, but only because it changes the social norms and therefore the response scales, not because it makes someone genuinely better or worse off (Footnote 6). The chosen amounts (€1,500 and €3,000) place vignettes 1 and 2 between the 20th and 25th and between the 70th and 75th percentiles of the actual equivalized income distribution pooled over all countries. Because of the large cross-country differences in real incomes, the country-specific positions vary from the lowest to the highest decile.

Tables 2 and 3 display the distribution of responses to the two vignette questions by country. As expected, the income satisfaction assigned to Vignette 1 is always much lower than for Vignette 2. For both vignettes, there are substantial differences across countries, pointing at systematic differences in response styles across European countries. For example, the low-income vignette in Table 2 is rated as satisfactory or very satisfactory by about 61% of the older individuals in Poland, by only 12% in France, 11% in Sweden and by no one in Greece. The high-income vignette in Table 3 is rated as “very satisfied” by 52% of older individuals in Poland, compared to only 14% in Greece.

Table 2 Distribution of reported income satisfaction Vignette 1 by country (in %)
Table 3 Distribution of reported income satisfaction Vignette 2 by country (in %)

3.2 Job Satisfaction and Anchoring Vignettes

Job satisfaction is measured in the COMPARE survey by a single satisfaction question asked to all respondents (ages 50 and over):

  • How satisfied are you with your daily activities (for example, your job, if you work)?

  • Very dissatisfied/Dissatisfied/Neither satisfied, nor dissatisfied/Satisfied/Very satisfied

For this paper, we only consider the responses of 50–64 year old respondents who do paid work; satisfaction with other daily activities is beyond the scope of the current study. Table 4 presents the frequency distributions in each country. On average, older workers are satisfied with their job: 80% of the workers in the total sample report either “satisfied” or “very satisfied.” The differences across countries are substantial, however.

Table 4 Distribution of reported own job satisfaction by country (in %)

Figure 3, constructed in the same way as Fig. 1, presents the cumulative distribution of job satisfaction by country. Again, Denmark outperforms all other countries, followed by Sweden and the Netherlands. At the other end of the country ranking, we find Greece, France, and the Czech Republic. Interestingly, the ranking of Poland depends crucially on the cut-off point: looking at the proportion of satisfied or very satisfied individuals, Poland does quite well and ranks fourth, but Poland is also the country with the lowest proportion of very satisfied workers.

Fig. 3 Distribution of reported job satisfaction by country

This cross-country ranking in job satisfaction is largely consistent with international comparisons that include younger workers: Sousa-Poza and Sousa-Poza (2000), based on data on Work Orientations from the 1997 International Social Survey Program (ISSP), and Kristensen and Johansson (2008), based on data collected in seven European countries in 2004. In line with our study, they find that workers in Northern countries, especially the Danes, are the most satisfied with their job, while the French and Greeks rate their job satisfaction quite low.

To correct for potential differences in response scales in the job satisfaction assessments, each respondent younger than 65 years in the COMPARE sample also received two job satisfaction vignettes describing hypothetical workers with given job characteristics (Footnote 7). Respondents are asked to rate the job satisfaction of these hypothetical workers on the same scale used to measure their own job satisfaction. The following two vignette questions are asked:

Vignette 1:

Mike works full-time, five days per week; in principle, he can organize his work in his own way but is still often under a lot of pressure to meet deadlines. He works for a big company and feels that his job is quite secure. How satisfied do you think Mike is with his job?

Very dissatisfied/Dissatisfied/Neither satisfied, nor dissatisfied/Satisfied/Very satisfied

Vignette 2:

Sally works four days per week and does not experience her job as stressful; she has little say over what she is doing, this is decided by her boss. She feels it is a very secure job. How satisfied do you think Sally is with her job?

Very dissatisfied/Dissatisfied/Neither satisfied, nor dissatisfied/Satisfied/Very satisfied

These vignettes only describe a subset of all possible job characteristics (hours of work, whether the job is stressful, control over activities, job security) but not, for example, the wage. Ideally, vignettes should be complete, but there is a trade-off between being as complete as possible and the drawbacks of long stories that many respondents will not read seriously. Whether the current vignettes are sufficient remains a topic for future research.

Tables 5 and 6 present the frequency distributions of the job satisfaction vignette assessments by country. The job in Vignette 2 is seen as less satisfactory than the job in Vignette 1. Differences across countries are again substantial. Danish respondents are quite positive about the first vignette in particular (with 78% evaluating it as satisfied or very satisfied), while Spanish respondents are very critical of this vignette (52% satisfied or very satisfied). On the other hand, the Swedes are particularly critical about the job in Vignette 2.

Table 5 Distribution of reported job satisfaction Vignette 1 by country (in %)
Table 6 Distribution of reported job satisfaction Vignette 2 by country (in %)

3.3 Explanatory Variables

In addition to country dummies, the regressors in the econometric model include socio-demographics such as gender, age, marital status, years of education, dummies for employment status, and the logarithm of net household income last month, adjusted for PPP differences across countries (Footnote 8). We also include two health indicators: the numbers of self-reported symptoms and chronic diseases. See Appendix, Table 9, for variable definitions and sample statistics, revealing large differences across countries in many of the explanatory variables.

The job satisfaction model also includes variables describing job conditions, such as workload, recognition, job security, monthly net labour income and usual hours worked per week. Job conditions are measured by asking whether respondents strongly agree, agree, disagree, or strongly disagree with the statements: “My job is physically demanding”; “I am under constant time pressure due to a heavy workload”; “I have very little freedom to decide how I do my work”; “I have an opportunity to develop new skills”; “I receive adequate support in difficult situations”; “I receive the recognition I deserve for my work”; “My job promotion prospects/prospects for job advancement are poor”; “My job security is poor”. For each statement, a dummy is created which is equal to one either when the respondent agrees or strongly agrees for positive job characteristics or when the respondent disagrees or strongly disagrees for negative job characteristics. See Appendix, Table 10 for details and sample statistics, again showing large differences across countries.
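The coding rule for the job condition dummies can be sketched as follows. The classification of each statement as a positive or negative job characteristic is our reading of the wording and should be treated as an assumption, as are the short variable names; the exact coding is documented in Appendix, Table 10.

```python
AGREE = {"strongly agree", "agree"}
DISAGREE = {"disagree", "strongly disagree"}

# Assumed classification: True = positive characteristic, False = negative
STATEMENTS = {
    "physically_demanding": False,
    "time_pressure": False,
    "little_freedom": False,
    "develop_new_skills": True,
    "adequate_support": True,
    "recognition": True,
    "poor_prospects": False,
    "poor_job_security": False,
}

def condition_dummy(answer, positive):
    """1 if the answer signals a favourable job condition, else 0.

    Positive statements count when the respondent (strongly) agrees,
    negative statements when the respondent (strongly) disagrees.
    """
    answer = answer.strip().lower()
    if answer not in AGREE | DISAGREE:
        raise ValueError(f"unexpected answer: {answer!r}")
    return int(answer in AGREE) if positive else int(answer in DISAGREE)

# Hypothetical respondent
answers = {"recognition": "agree", "poor_job_security": "strongly disagree",
           "time_pressure": "agree"}
dummies = {k: condition_dummy(a, STATEMENTS[k]) for k, a in answers.items()}
print(dummies)  # {'recognition': 1, 'poor_job_security': 1, 'time_pressure': 0}
```

With this coding, a value of one always indicates the favourable side of a job condition, which makes the signs of the coefficients directly comparable across statements.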

4 Estimation Results

4.1 Income Satisfaction

Table 7 presents the parameter estimates of the main equation for the model with identical thresholds for everyone (the baseline model, column (i); these estimates are virtually identical to those of a simple ordered probit model) and the estimates of the (conditional) hopit model [column (ii) to column (vi)] taking account of differences in response scales (DIF). The results for the baseline model are in accordance with most findings in the literature. As expected, household income has a strong positive effect on income satisfaction, while household size has a substantial negative effect. In terms of equivalence scales, the estimates imply that an increase in family size from one to two household members would require an increase in household income of almost 29% to keep income satisfaction constant (Footnote 9), that is, an estimated equivalence scale of 1.29. This is comparable to the results of Van Praag and Van der Sar (1988, Table 3), who find equivalence scales between 1.15 and 1.35 for eight out of nine countries. The estimate of Van Praag and Ferrer-i-Carbonell (2008, Table 3.1.4) for the UK is 1.31, also very similar to what we find.
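The step from coefficients to an equivalence scale can be written out as follows. This is a sketch under an assumed specification, not necessarily the exact one used in Table 7: suppose the latent index in Eq. 1 contains the terms \( \beta_{y} \ln y_{i} + \beta_{n} g(n_{i}) \), where \( y_{i} \) is household income, \( n_{i} \) is household size, and \( g( \cdot ) \) is whatever increasing transformation of household size enters the model (the symbols \( \beta_{y} \), \( \beta_{n} \), and \( g \) are introduced here for illustration only). Keeping satisfaction and all other covariates constant when moving from one to two household members then requires

$$ \beta_{y} \ln y_{2} + \beta_{n} g(2) = \beta_{y} \ln y_{1} + \beta_{n} g(1)\quad \Rightarrow \quad \frac{{y_{2} }}{{y_{1} }} = \exp \left( { - \frac{{\beta_{n} [g(2) - g(1)]}}{{\beta_{y} }}} \right). $$

With \( \beta_{y} > 0 \) and \( \beta_{n} < 0 \) the ratio exceeds one, and a required income increase of 29% corresponds to an exponent of \( \ln 1.29 \approx 0.255 \).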

Table 7 Baseline model and Hopit model of income satisfaction

Conditional on income (and other covariates), higher educated individuals are more satisfied with their income. This is consistent with results of Kapteyn et al. (2008), who point out it may be due to the fact that higher educated people have higher permanent income, or to the fact that our measure of income is imperfect so that education is a proxy for the deviation between self-reported income and actual income. The estimated effect of an additional year of education is about the same as the effect of a 2% rise in household income.

Women tend to report higher income satisfaction than men. Age has a positive effect (Footnote 10), while poor health (number of symptoms and number of chronic diseases) reduces income satisfaction. Keeping other variables constant, we find no significant differences in income satisfaction between workers, retirees, or disability benefit recipients, but unemployed individuals experience a significantly lower income satisfaction than workers, while inactive persons are more satisfied than workers.

Country dummies indicate that, conditional on income and other covariates, French respondents report the lowest income satisfaction level while Danish respondents report the highest level. Interestingly, keeping the other covariates constant, Polish respondents report about the same level of income satisfaction as German respondents. The fact that Polish respondents report low income satisfaction (Table 1) is therefore mainly explained by the characteristics of the Polish respondents, particularly their low income and large family size.

Allowing for DIF substantially modifies the estimates of the satisfaction equation (column (ii) in Table 7). The likelihood-ratio test strongly rejects the constrained model of no DIF against the more general model allowing for DIF (LR = 2256; 84 degrees of freedom). The coefficient on household income is much higher once we control for DIF, suggesting that individuals with higher income are more “demanding”: they evaluate a given income as less satisfactory than low income individuals with the same other characteristics. The effect of family size also increases, and this approximately compensates the increased income effect so that the equivalence scale does not change much compared to the baseline model: a two-person household needs 32% more than a one-person household according to the model with DIF, compared to 29% in the baseline model. The effects of education and gender are also much higher than in the baseline model, suggesting that women and higher educated individuals use more negative response scales. On the other hand, the effects of other socio-economic variables (age, employment status, health) do not change much or even decrease.
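The likelihood-ratio comparison reported above can be checked against a chi-square critical value. The sketch below uses the Wilson-Hilferty approximation to the chi-square quantile so that it runs with the Python standard library only; this approximation and the function names are our choices for illustration, not the procedure used to produce the reported test.

```python
from statistics import NormalDist

def chi2_critical(df, alpha=0.01):
    """Approximate upper-alpha critical value of a chi-square(df) variable
    using the Wilson-Hilferty cube-root normal approximation."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3

def lr_statistic(loglik_restricted, loglik_unrestricted):
    """LR = 2 * (log-likelihood of general model - log-likelihood of restricted model)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)

# Values reported in the text: LR = 2256 with 84 degrees of freedom
lr, df = 2256.0, 84
crit = chi2_critical(df, alpha=0.01)
print(f"1% critical value (approx.): {crit:.1f}")
print("reject no-DIF model:", lr > crit)
```

The reported statistic exceeds the approximate 1% critical value by a wide margin, consistent with the strong rejection described in the text.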

Many of the socio-economic characteristics significantly affect the thresholds, particularly the first threshold [see column (iii)]. The differences between effects on income satisfaction in the two models can be explained by the effects of the same background variables on the thresholds. For example, income has a positive effect on the first threshold, implying that higher income respondents will more often assess a given income as very unsatisfactory. This is in line with the notion that higher income makes people more demanding; see, for example, Van Praag and Van der Sar (1988), who find that the (stated) income required to achieve a given utility level increases with actual income. Our model specification implies that a shift in the first threshold also leads to a parallel shift in all other thresholds, and our estimates of the income coefficients in \( \gamma^{1} ,\gamma^{2} ,\gamma^{3} \), and \( \gamma^{4} \) imply that higher income respondents are more critical at all cut-off points, not only the first.

Thresholds also significantly depend on the country dummies. Italians, for example, use higher thresholds (i.e., tend to give more negative assessments) than Germans throughout the scale. As was already clear from Tables 2 and 3, Greek respondents tend to give quite negative vignette evaluations, translating into an unusually high first threshold. As a consequence, the coefficients on the country dummies in the income satisfaction equation turn out to be quite different in the hopit and the baseline model. Polish respondents tend to evaluate the vignettes quite positively, and when this is corrected for, they are worse off than respondents in any other country with the same income and other characteristics. The ranks of the Czech Republic, the Netherlands, and Germany also worsen substantially when correcting for DIF: respondents in all these countries use relatively optimistic evaluation scales and are worse off when this is corrected for. The opposite is found for Greek respondents: for given income and other characteristics, they are in 10th place in the model without DIF, but correcting for their very negative evaluations moves them to 2nd place. Correcting for DIF also improves the position of Italy and Spain.

4.2 Job Satisfaction

Table 8 presents the results for the ordered probit model [columns (i) and (iii)] and the hopit model [columns (ii) and (iv)] for job satisfaction among 50–64 year-old workers. The first two columns show the results without taking into account job conditions other than hours worked and earnings, while the last two columns add a richer set of job characteristics.Footnote 11 As for income satisfaction, a likelihood-ratio test strongly rejects the constrained model without DIF against the more general model allowing for DIF for both specifications: LR = 256.2 (df = 68) for the model without the set of job characteristics (in either Eq. 1 or Eq. 3) and LR = 302.0 (df = 100) for the specification including them (in Eqs. 1 and 3).

Table 8 Job satisfaction among 50–64 year-old workers: Baseline model and Hopit model

The ordered probit model suggests that, keeping individual and job characteristics constant, women report being more satisfied with their job than men. This is in accordance with many other studies on job satisfaction (Clark 1997; Kaiser 2007). Once DIF is corrected for, however, the difference between women and men is no longer significant, suggesting that women report being more satisfied with their job because they have different response scales. A reason for this may be that they have lower work expectations than men and are therefore less demanding (Phelan 1994).

Age has a significant positive effect on job satisfaction in both models. Note also that the age effect may reflect a selection process if less satisfied workers retire earlier than more satisfied workers. Years of education has no significant effect on job satisfaction whichever model is considered. Health symptoms have a significant negative effect on job satisfaction in both models. Their effect is lower when the larger set of job characteristics is included in the model, since health problems are associated with unattractive (reported) job characteristics.

Higher earnings have a positive effect on job satisfaction, but this effect is insignificant when more job conditions are included, suggesting that attractive job characteristics (that are correlated with high wages) are more important than the wage itself. Clark and Oswald (1996) find a negative relationship between working hours and job satisfaction in the general adult UK population. All our models suggest that, keeping monthly earnings constant, there is no significant relation between job satisfaction and working hours of older workers in Europe.

The final two columns show that most job characteristics significantly affect job satisfaction with the expected sign. The magnitudes of some of the coefficients change when DIF is controlled for, but signs and significance levels do not change much. A heavy workload has a negative effect while the opportunity to develop new skills, receiving adequate support in difficult situations, recognition for the job, job advancement opportunities, and job security all have a positive influence on job satisfaction. The largest impact on overall job satisfaction comes from recognition for the job and from receiving support in difficult situations. Opportunities for developing new skills and future job advancement are also important. This may seem surprising given the fact that the sample consists of older workers who are approaching retirement age. Whether the job is physically demanding and (in the hopit model) freedom at work have no significant effect. These results support the hypothesis that non-pecuniary job characteristics are important for job satisfaction, confirming findings for broader age groups (Clark 2005; Skalli et al. 2008).

The coefficients of the country dummies reflect ceteris paribus differences between respective countries and Germany, keeping constant individual characteristics and job characteristics [earnings and hours only in columns (i) and (ii), or the larger set of job characteristics in columns (iii) and (iv)]. Some of them are strongly significant and which ones these are varies across the four model specifications. Correcting for differences in response scales mainly affects the position of Denmark, Sweden, and France. Compared to Germans, Danish and French workers tend to use the more positive and more negative responses, respectively (cf. Tables 5 and 6); once this is taken into account in the models with DIF, their job satisfaction levels are not significantly different from those of German workers with the same characteristics. Swedish workers evaluate a given job more negatively than German workers (cf. Tables 5 and 6) and when this is corrected for in the models with DIF, their job satisfaction levels are actually higher than those of similar Germans.Footnote 12

In the final model in the last column of Table 8, the only countries which are significantly different from Germany are Greece and Sweden. In all other countries, keeping response scales, individual characteristics, and the rich set of job characteristics constant, job satisfaction levels are not significantly different from those in Germany. Greek workers are less satisfied than Germans with similar jobs. Only Swedish workers are significantly more satisfied, possibly pointing to attractive unobserved job characteristics that are particularly relevant in Sweden, such as a more positive attitude towards older workers than in other countries. This would be in line with Wadensjö (2006), who argues that Swedish firms are willing to share the responsibility of society to increase employability of older workers and sees this as one of the explanations of the success of the Swedish partial retirement program.

5 Counterfactuals

To understand the implications of our approach, we simulate the distribution of income or job satisfaction in each country twice: once using the actual thresholds used by each respondent, and once using the thresholds that the average respondent in the benchmark country (Germany)Footnote 13 would use. The former simulation (own thresholds) almost exactlyFootnote 14 reproduces the observed distribution of reported satisfaction levels in each country, presented in Tables 1 and 4 and Figs. 1 and 3. The simulation of interest, however, which uses each country’s own parameters in the satisfaction equation but the threshold parameters for Germany, produces a counterfactual distribution without observational equivalent. Comparing these counterfactual simulations across countries shows how much of the difference between each country and the benchmark country remains when differences due to DIF are eliminated.
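The logic of the two simulations can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for our results: the latent satisfaction indices, the respondent-specific thresholds, and the benchmark thresholds are all hypothetical numbers, the error term is assumed standard normal as in an ordered probit, and the function names (`category_probs`, `simulated_distribution`) are introduced here for illustration only.

```python
import math


def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(latent_mean, taus):
    """Ordered-probit probabilities of the response categories (lowest first)."""
    cuts = [-math.inf] + list(taus) + [math.inf]
    cdf = [normal_cdf(c - latent_mean) for c in cuts]
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) - 1)]


def simulated_distribution(latent_means, thresholds_per_person):
    """Average the predicted category probabilities over all respondents."""
    n = len(latent_means)
    k = len(thresholds_per_person[0]) + 1
    totals = [0.0] * k
    for mu, taus in zip(latent_means, thresholds_per_person):
        for j, p in enumerate(category_probs(mu, taus)):
            totals[j] += p / n
    return totals


# Hypothetical sample from one country: latent satisfaction indices predicted
# with that country's own satisfaction-equation parameters.
latent_means = [-0.2, 0.4, 0.9]

# Hypothetical respondent-specific thresholds (own response scales).
own_thresholds = [
    [-0.5, 0.5, 1.5, 2.5],
    [-0.3, 0.7, 1.7, 2.7],
    [-0.4, 0.6, 1.6, 2.6],
]

# Hypothetical thresholds of the average respondent in the benchmark country.
benchmark_thresholds = [-1.0, 0.0, 1.0, 2.0]

own = simulated_distribution(latent_means, own_thresholds)
counterfactual = simulated_distribution(
    latent_means, [benchmark_thresholds] * len(latent_means)
)

# Share in the two highest categories ("satisfied" or "very satisfied").
share_own = own[3] + own[4]
share_counterfactual = counterfactual[3] + counterfactual[4]
print("own thresholds      :", round(share_own, 3))
print("benchmark thresholds:", round(share_counterfactual, 3))
```

Because the hypothetical own thresholds lie above the benchmark thresholds, the counterfactual share of satisfied respondents exceeds the share obtained with own thresholds; only the thresholds change between the two runs, while the latent indices are held fixed.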

5.1 Income Satisfaction

Figure 4 is similar to Fig. 2 but uses the counterfactual simulation to construct the values along the vertical axis. It presents, for each country, the proportion of individuals who would report being satisfied or very satisfied with their income if they used German benchmark thresholds. The horizontal axis gives the corresponding equivalent monthly household income, as in Fig. 2. Compared to Fig. 2, income satisfaction in France is now much more in accordance with income satisfaction in other countries with a similar income level. The low proportion of individuals reporting being satisfied with their income in France that we saw in Fig. 2 apparently was partly due to DIF. Greece moves from a relatively low satisfaction (given its actual income level) to a relatively high satisfaction country. Correcting for response scale differences makes the difference between Poland and the other countries even larger than before. All in all, the correction brings the ranking of the countries more in line with the ranking of their income levels. The Spearman rank correlation coefficient is equal to 0.66 when DIF is taken into account while it is equal to 0.64 in the raw data; the Pearson correlation coefficient increases from 0.74 to 0.84 when we control for DIF.

Fig. 4 Household income (PPP corrected) and income satisfaction among the 50+ individuals across COMPARE countries using German thresholds

Figure 5 presents the complete counterfactual cumulative income satisfaction distribution for all countries using German benchmark thresholds. It confirms that correcting for DIF has important effects on the country ranking. First, the ranking between Sweden and the Netherlands is reversed, a consequence of correcting for the fact that Swedish respondents tend to assess vignettes with a given income level more negatively than Dutch respondents. Second, there is hardly any difference left between Belgium, Italy and Germany once DIF is eliminated. As in Fig. 4, one of the most salient changes due to eliminating DIF concerns France. Using German scales, French respondents would be much more satisfied with their incomes than their actual reports (based upon the French scales) suggest, and France becomes an “average country.” As expected given the estimation results and Fig. 4, Greece does much better after correcting for DIF than before. Finally, the cumulative distribution function of income satisfaction in Spain no longer crosses that of the Czech Republic. Spain does unambiguously better than the Czech Republic.

Fig. 5 Predicted distribution of income satisfaction using German thresholds

5.2 Job Satisfaction

The counterfactual cumulative distributions of job satisfaction assuming that all individuals use the German benchmark thresholds are presented in Fig. 6. It is based upon the final model in Table 8 [column (iv)], including the rich set of job characteristics. The country ranking differs substantially from the one in Fig. 3. Once differences in response scales are eliminated, Sweden becomes the country with the highest level of job satisfaction, with Denmark in second place, but at a substantial distance. Greece is the country with the worst job satisfaction in both figures, but the difference with the other countries is much larger once DIF is corrected for. As for income satisfaction, job satisfaction in France increases when German rather than French thresholds are used.

Fig. 6 Predicted distribution of job satisfaction using German thresholds

Accounting for DIF reduces the cross-country association between job and income satisfaction: the cross-country rank correlation between country specific percentages of working respondents younger than 65 who are (at least) satisfied with their income and with their jobs decreases from 0.80 for reported satisfaction to 0.43 for the counterfactual rates using the German thresholds.Footnote 15 An interpretation is that response scales in different domains are positively correlated: respondents who tend to give negative evaluations in one domain will often do the same in another domain. For example, French respondents assign low satisfaction to the income vignettes as well as the job satisfaction vignettes compared to respondents in other countries. This illustrates that correcting for DIF may also be important to analyze the relation between satisfaction levels in various domains of life.

6 Conclusion

This paper analyses two important components of economic well-being among the 50+ in 11 European countries: satisfaction with household income and job satisfaction. The first one is important in order to assess the overall economic welfare of the elderly. The results highlight a large variation in self-reported income satisfaction. The lowest is found in Poland and the highest in Denmark. Differences across countries are partly explained by differences in response scales. Once these differences are eliminated, the cross-country differences are much better in line with differences in an objective measure of purchasing power of household income. Correcting for differences in response scales also alters the ranking across countries. The most striking change is for France, where respondents tend to use negative assessments more often than in other countries.

An important motivation for this paper is that how a country compares to other countries in terms of living standard is an important input for public policy on old age social security and pensions and combating poverty and social exclusion among the older part of the population. We have shown that it matters whether the country comparison is done with or without correcting for response scale differences (DIF). So should policy makers use the cross-country comparison that corrects for DIF? Under the assumptions that we have made, the answer is a clear yes: assuming differences in vignette evaluations purely reflect differences in the way terms like “very satisfied” and “not satisfied” are used, correcting self-assessments for such differences seems a good thing if the aim is to compare genuine living standards. This leads to the conclusion that living standard comparisons come much closer to objective comparisons of equivalized and PPP corrected average household incomes than the subjective income satisfaction reports would suggest.

There is an alternative interpretation of the differences in vignette evaluations, however. If, for example, goods are publicly provided (free of charge) in one country and not in another, or poor households can do more with a given income in one country than in another country, because of differences in, e.g., housing subsidies or health insurance, then a given income amount may lead to different living standards in different countries. In that case vignette equivalence would not be satisfied and our corrections would take away genuine differences in living standards. We do not think this can explain much of our results. For example, the fact that French respondents give negative assessments would then suggest that the French get less public support than residents of similar countries, which seems implausible. A similar conclusion is drawn by Kapteyn et al. (2008) on the basis of comparing evaluations of vignettes with low and high incomes. Moreover, the tendency to give less positive evaluations in France is also found for other subjective well-being measures such as life satisfaction (Angelini et al. 2011), further supporting the notion of cultural differences in thresholds.

Older workers in Europe are generally satisfied with their jobs. Cross-country differences are not as large as for income satisfaction. Being able to develop new skills and having job advancement opportunities contribute substantially to job satisfaction, though recognition for the job is the most important factor. Keeping job characteristics as well as response scales constant, Swedish workers are more satisfied than workers in all other countries considered, possibly due to a more positive attitude of employers towards older workers in Sweden than elsewhere. Sweden remains the country where job satisfaction among older workers is highest if cross-country variation in job characteristics is taken into account and only the response scales are kept constant. The raw data, however, do not reveal this, since the actual job satisfaction reports are also affected by response scale variation, leading to lower reported satisfaction in Sweden and higher satisfaction in Denmark, for example. As for income satisfaction, correcting for response scale differences changes the ranking of the countries. Now that financial incentives for early retirement have been or are being removed, and other factors like job characteristics and job satisfaction are gaining importance for the decision to work longer, this seems an important message for national policy makers who compare the situation in their own country to that in other countries. Whereas looking at the raw data would suggest that Denmark is the European role model for job satisfaction of older workers, Sweden becomes the best performing country when controlling for the Danish tendency to use positive scales and the Swedish tendency to be more negative.

There are common features in the response scale differences in job satisfaction and income satisfaction. French respondents tend to be critical in both assessments, while Danish and Dutch respondents are always on the optimistic end of the spectrum. The tendency to give negative evaluations in France and the Danish tendency to use very positive qualifications seem rather general; Angelini et al. (2011) and Bonsang and Van Soest (2011), for example, also find them for satisfaction with life and social contacts, respectively, suggesting that the correction accounts for differences in cultural norms relevant for reporting behavior. As a consequence, correcting for DIF decreases the cross-country association between average income and job satisfaction among workers younger than 65.

The fact that correcting for DIF brings subjective and objective evaluations closer to each other can be seen as support for the validity of the vignettes approach as a tool for improving cross-country comparisons. It is in line with the finding of King et al. (2004) that correcting for DIF using anchoring vignettes increases the cross-country correlation between objective and subjective measures of health. Still, more work is needed to test the validity of the vignette approach in the domains considered and establish the robustness of the results. The main underlying assumptions are response consistency and vignette equivalence, which have been studied in other domains (e.g., Van Soest et al. 2011) but not for income and job satisfaction. Response consistency requires that respondents evaluate the hypothetical situations on the same scale that they use to evaluate themselves; this could be violated, for example, if self-assessments are affected by social desirability bias but vignette evaluations are not. We do not think this is particularly problematic in our case. Vignette equivalence means that respondents in different countries interpret the vignettes in the same way. As discussed above, this is not an innocuous assumption, particularly in the context of income satisfaction, but we have also explained why we think our results are not due to violation of vignette equivalence. Still, validating the use of vignettes and testing these assumptions remains an important issue for future research.