Common interpretation problems in psychosocial work environment research

It is not known to what extent self-reported assessments of the psychosocial work environment, on which the majority of research reports have been based, reflect individual characteristics (which may distort the perception of reality) and to what extent they reflect true environmental conditions. Critics argue that “subjectivity bias” may explain most of the observed associations between psychosocial working conditions and health (Wainwright and Calnan 2002; Mc Leod and Davey Smith 2003). This is indeed a classical problem in this research. We will discuss some aspects of it, but the reader interested in the whole range of assessment problems is referred to other articles (for instance Zapf et al. 1996).

The problem with self-reported data is particularly prominent when both psychosocial environment and health are described by means of self-reports (common method variance) and when both are recorded at the same time in a cross-sectional study. Such studies are relatively cheap and easy to perform and are therefore abundant. The source of error that first comes to mind is a possible tendency among subjects who “complain about everything” to exaggerate problems in the environment as well as in their own health. This is frequently labelled negative affectivity (for discussion of negative affectivity in psychosocial research see Cooper 2000; Judge et al. 2000; Payne 2000; Spector et al. 2000). At the other end there are subjects who complain about nothing and therefore underreport both environmental and health problems (denial). Together these two groups could cause severe interpretation difficulties. They may inflate relationships in studies of representative working populations that include both participants with negative affectivity and participants with denial. In populations with a high proportion of denying participants, spuriously small associations may be found (underestimation of risk). The opposite may be the case if negative affectivity is prominent (overestimation of risk). It should also be pointed out that both denial and exaggeration may be due to external pressures, for instance in work sites where employees want to get rid of a superior, or in work sites where they are forced to conceal bad conditions and fear punishment if they complain.
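The inflation mechanism described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model taken from the cited studies: the true effect size, the normal distributions, and the names `simulate`, `true_effect` and `reporting_weight` are all hypothetical. Each simulated participant has a reporting style (positive values exaggerate, negative values deny) that shifts both self-reports in the same direction.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate(n=5000, true_effect=0.2, reporting_weight=0.0, seed=1):
    """Correlation between self-reported environment and self-reported health.

    All parameters are illustrative assumptions. reporting_weight controls how
    strongly an individual's reporting style colours BOTH self-reports.
    """
    rng = random.Random(seed)
    reported_env, reported_health = [], []
    for _ in range(n):
        env = rng.gauss(0, 1)                          # true adverse conditions
        health = true_effect * env + rng.gauss(0, 1)   # true ill health
        style = rng.gauss(0, 1)                        # >0 exaggerates, <0 denies
        reported_env.append(env + reporting_weight * style)
        reported_health.append(health + reporting_weight * style)
    return pearson(reported_env, reported_health)


# Same seed, so the underlying "true" data are identical in both runs.
r_without_style = simulate(reporting_weight=0.0)
r_with_style = simulate(reporting_weight=1.0)
print(f"no reporting style:   r = {r_without_style:.2f}")
print(f"with reporting style: r = {r_with_style:.2f}")
```

With these assumed parameters the correlation between the two self-reports is considerably larger when reporting style varies across participants than when it does not, although the underlying effect is identical in both runs.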

The next source of error is that illness could cause secondary changes in the psychosocial work environment: a long-lasting illness may cause a subject to change jobs, or to perceive increased demands or other changes in the psychosocial work environment. Illnesses that have lasted for a long time are particularly problematic from this point of view. It could be impossible to know whether the illness causes the environment or the environment causes the illness. A similar problem is that the duration of exposure may make a difference. If a cross-sectional study is based upon working conditions and illness recorded at the same point in time, it is impossible to know whether the assessed exposure represents something that has been going on for many years or only for a couple of days. In chronic illness development this may be crucial. After a short exposure there may not be a sufficient basis for illness development. Both of these problems represent the time dimension in the assessment.

Methodological examples from research using the demand-control model

There have been efforts to solve these assessment problems in work environment research. Research on psychosocial work environment and cardiovascular disease using the demand-control model may be used as an example. Although there are many psychosocial theories, this particular model has been used extensively in research during recent years. Its two main dimensions, psychological demands and decision latitude, represent very different examples and a wide diversity of assessment problems. Therefore, our points may also be applicable to the assessment of other kinds of theoretical models in the field.

According to the demand-control model, high psychological work demands may increase risks of ill health, particularly if there is a low level of decision latitude for the employees (Karasek and Theorell 1990). An extensive literature deals with the testing of this hypothesis in relation to risk of myocardial infarction. This is our starting point because of the objective character of myocardial infarction as an outcome.

Subjectivity versus objectivity

Most of the research on the demand-control model in relation to myocardial infarction has been performed with self-rated assessments of demand and decision latitude (Belkic et al. 2004). There are exceptions however. In the American HES and HANES studies (Karasek et al. 1988), in the Hawaii study (Reed et al. 1989), in the Swedish SHEEP study (Theorell et al. 1998) and in the Whitehall II study (Bosma et al. 1997), various measures of a more objective nature (or at least non-subjective) have been used and tested in relation to myocardial infarction or coronary heart disease (Whitehall II) outcomes.

In the first group of studies, an “imputed” assessment of working conditions has been used. This could be regarded as a “more objective” kind of assessment in the sense that the individual's own perception does not distort the assessment. Imputation means that an assessment of averages in specific groups has been obtained from population surveys. A group is defined according to occupational group, gender, age and duration of work in the occupation. All subjects belonging to a specific group defined in this way are assumed to have the same “imputed” work condition. The basis for the imputation is a “psychosocial job exposure matrix” constructed from a national work environment survey. Such a matrix has been constructed in several countries, for instance the USA, Sweden, France and Great Britain. In most such studies there were significant relationships between the more objective measure of work conditions and myocardial infarction risk. One of the few exceptions is a study in Hawaii (Reed et al. 1989), which had a relatively old cohort of subjects, many of whom retired during the follow-up period. Still, this kind of measurement should be regarded as a poor approximation of real conditions and it has recently been criticised for its lack of precision (Netterstrom 2004). A further criticism of imputation is that it is itself based upon averages of subjective assessments.
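The imputation procedure described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure of any cited study: the field names (`occupation`, `gender`, `age_band`, `tenure_band`, `decision_latitude`), the function names and the survey rows are hypothetical, and real job exposure matrices are built from large national surveys.

```python
from collections import defaultdict

# Hypothetical cell definition: the four grouping variables named in the text.
CELL_KEYS = ("occupation", "gender", "age_band", "tenure_band")


def cell_of(row):
    """Return the matrix cell (group) that a survey respondent or subject belongs to."""
    return tuple(row[key] for key in CELL_KEYS)


def build_job_exposure_matrix(survey_rows, score="decision_latitude"):
    """Average the self-reported score within each cell of the survey."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in survey_rows:
        cell = cell_of(row)
        sums[cell] += row[score]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def impute(subject, matrix):
    """Assign the cell average to a subject; None if the cell is not in the matrix."""
    return matrix.get(cell_of(subject))


# Invented survey rows, for illustration only.
survey = [
    {"occupation": "bus driver", "gender": "M", "age_band": "45-54",
     "tenure_band": "10+", "decision_latitude": 2.0},
    {"occupation": "bus driver", "gender": "M", "age_band": "45-54",
     "tenure_band": "10+", "decision_latitude": 3.0},
    {"occupation": "teacher", "gender": "F", "age_band": "35-44",
     "tenure_band": "5-9", "decision_latitude": 4.0},
]
matrix = build_job_exposure_matrix(survey)

subject = {"occupation": "bus driver", "gender": "M",
           "age_band": "45-54", "tenure_band": "10+"}
imputed_score = impute(subject, matrix)
print(imputed_score)
```

The sketch makes two properties of imputation visible: every subject in a cell receives the same score regardless of his or her own working conditions, and the score itself is an average of survey self-reports.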

In the Whitehall II study, the objective measure was based upon expert assessments of the work sites. With this measure there was also a significant relationship between low decision latitude and high myocardial infarction risk. It is obvious that expert assessments not only introduce an objective element in the assessment—they also introduce the subjectivity of the expert, which is an additional source of error.

In studies with “more objective assessments” the relationship with myocardial infarction risk was weaker (and in some cases less significant) than the one based upon self-report, particularly after adjustment for biological risk factors and social class. In assessing the importance of this, it should be pointed out that evaluations of a work site or imputations of scores from national averages for specific groups are not only more “objective” measures than self-reports. They also capture less of the individual’s objective work conditions. The more “subjective assessments” partly reflect “true” differences in the objective micro-environment. Therefore, both the “subjective” and the “objective” assessments are embedded in interpretation difficulties and neither can serve as a gold standard for the other.

More detailed observations of the individual working conditions have also been performed in research and this seems to be a very fruitful area. Such assessments may capture the “objective micro environment” referred to above. Greiner et al. (2004), in their studies of bus drivers in San Francisco, have been able to show that objectively recorded adverse conditions (hindrances at work) in the working day of the bus driver are much more clearly related to blood pressure elevation than are self-reports. Recent German research by Rau et al. (2001) and Rau (2004) has indicated that the relationships between objective working conditions and blood pressure regulation may be even more clear than the ones between self-reported conditions and risk of having blood pressure elevation.

We should remember that subjective exposure assessment is much more of a problem in cross-sectional studies when the outcome is also self-rated, because of the common method variance problem outlined above.

The time dimension and cumulative aspects: how long has the person been exposed to a particular psychosocial work environment?

Some studies have addressed the question of cumulative effects of exposure to adverse psychosocial working conditions, using either more objective or more subjective measures. Hammar et al. (1994) studied the prospective relationship between imputed scores of demand and decision latitude and myocardial infarction in men who had the same job classification in 1970 and 1975, which is a way of recording cumulative exposure to adverse job conditions. Those whose jobs were classified in both 1970 and 1975 as high on the demand scale and low on the decision latitude scale had significantly higher myocardial infarction risk during follow-up than other men. This was more pronounced than in those who had had this adverse job classification during only one of these years. Johnson et al. (1996) studied the cumulative effects of low decision latitude, measured by means of the imputed method, in more detail. It was found that subjects who had continuously worked in jobs high on the decision latitude scale were increasingly protected from dying from cardiovascular disease. This effect was seen progressively up to 10 years of exposure, after which the cumulative effect was attenuated. Both of these studies were based upon indirect assessments and they were not adjusted for biological risk factors. The Whitehall II study by Bosma et al. (1997) showed an accumulated negative effect of low decision latitude (stronger effects among subjects who reported low decision latitude twice within a 3-year interval) with subjective assessments and with adjustment for other accepted risk factors. The outcome in this case was total number of ischaemic heart disease episodes in subjects previously free of heart disease. There is accordingly some crude evidence for accumulated effects both with self-reports and with more objective measures.

Social class and gender

Comparisons of several kinds of “objective assessments” with self-reports were made in the WOLF study in Stockholm (Hasselhorn et al. 2004). There, expert ratings performed by the occupational health care teams were compared with self-reports and group means of self-reports as well as with imputations from national population surveys. For decision latitude, correlations were quite high (2275 men, see Fig. 1), and a further analysis indicated that they were higher in blue-collar workers than in white-collar workers. As expected, the correlation was highest when group means were compared, as this reduces the influence of individual characteristics (Pearson correlation coefficients from 0.73 to 0.78). The respective relationships were much weaker for psychological demands. All these findings were closely paralleled in women although, as in other studies, the decision latitude scores were in general lower in women than in men. Interestingly, although the correlations between objective and subjective assessments were stronger for blue-collar workers, the mean levels of expert and self-reported decision latitude differed substantially in this occupational group, which was not the case for white-collar workers. This can be seen in Table 1, which shows that the mean scores for decision latitude are nearly always lower in the expert ratings than in self-ratings. The ratios between self-rated and expert means are higher for manual workers and for skilled service workers than for other groups, both among men and women (1.6-2.0 vs. 1.0-1.4 for other groups). Thus, blue-collar workers and skilled service workers rated their own decision latitude systematically higher than did the experts. These findings indicate that there are systematic differences between white-collar and blue-collar workers with regard to associations between expert ratings and self-ratings of decision latitude. They also confirm what has been known for a long time: that women’s decision latitude is both self-rated and expert-rated as lower than that of men. This is also very important to bear in mind in cross-sectional studies.

Fig. 1 Correlations between four different kinds of assessment of decision latitude in participants of the Swedish WOLF study (2275 men, all associations p < 0.01)

Table 1 Self-rated and externally rated demand scores for 2275 men and 1505 women by differentiated socioeconomic group

Work demands particularly problematic

Findings on comparisons between subjective and objective assessments in general indicate fairly good agreement for decision latitude, but not for psychological demands. The low correlation for demands may be due to a phenomenon observed and described by Johnson and Stewart (1993): in contrast to decision latitude, actual work demands may vary relatively rapidly across individuals and time in the same work place. Kristensen et al. (2005) have pointed out that psychological demands are heterogeneous and should be divided into several groups of demands. These problems may explain why, in a number of investigations with assessments of both “objective” decision latitude and demands (mostly inferring methods), stronger associations have been found between outcome and decision latitude than between outcome and demands (Alfredsson et al. 1982, 1985; Alterman et al. 1994; Hammar et al. 1998; Bosma et al. 1997).

Subjective and objective measures are of course different and very high correlations could not be expected. Even the highest correlation (0.78) in the figure explains only 61% of the variance. In the Whitehall II and SHEEP studies the associations between work environment and heart disease risk were weaker for the objective assessments than for the subjective ones although they were in the predicted direction.
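The figure of 61% follows from squaring the correlation coefficient. A minimal sketch of the arithmetic (the function name `variance_explained` is our own, chosen for illustration), applied to the range of coefficients reported for the group-mean comparisons:

```python
def variance_explained(r):
    """Share of variance in one measure accounted for by the other: r squared."""
    return r ** 2


for r in (0.73, 0.78):
    print(f"r = {r:.2f} -> {variance_explained(r):.0%} of the variance")
```

Even at the upper end of the reported range, well over a third of the variance in one assessment is therefore not shared with the other.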

Relationships to biological risk factors—an example

Besides the high correlation for decision latitude between expert and subjective ratings another finding in the WOLF study is interesting and relevant in this context: the associations between the job strain categories and biochemical risk factors for cardiovascular disease were more pronounced when expert ratings were used for estimating the psychosocial work exposure of the workers than when the workers’ own ratings were used (Tsutsumi et al. 1999; Hasselhorn et al. 2002). These findings would indicate that real working conditions are more important for the workers’ cardiovascular health than the workers’ perception of the working conditions. However, these findings need to be treated with caution since imputed measures are crude proxy assessments of the real conditions. In addition, biochemical and physiological assessments represent another “brain domain” than questionnaire responses (limbic system versus cortex).

Differences in reporting tendency such as negative affectivity

In the Whitehall II studies, a number of psychological dimensions that could possibly influence the description of working conditions, such as hostility and negative affectivity, have been recorded and used in confounding analyses. Despite these efforts to falsify the relationships there were clearly significant relationships remaining between working conditions and risk of developing new coronary heart disease during follow-up (Bosma et al. 1998). A study of the relationship between job strain (high demand and low decision latitude) and high blood pressure during daily activities by Landsbergis et al. (1992) showed that adjustment for a number of psychological attributes did not substantially alter the relationship between job strain and blood pressure regulation in working men in New York. A study by Haynes (1991) based upon women in the Framingham cohort showed that adjustment for type A behaviour increased (rather than decreased) the magnitude of the association between a self-reported assessment of decision latitude and subsequent risk of developing new episodes of coronary heart disease. The conclusion so far has been that several personality dimensions as well as negative affectivity do not seem to invalidate the associations between job control and myocardial infarction risk.

Dynamic versus static assessments

Loss of decision latitude in comparison to the surrounding working population could be another point of departure in testing the general hypothesis that low decision latitude is related to myocardial infarction risk. This represents a dynamic test of the low decision latitude hypothesis: if the hypothesis is correct, a lowered decision latitude would be associated with increased risk of developing a myocardial infarction. In a cross-sectional study of decision latitude loss there may be difficult interpretation problems, since the person's interpretation of change may be influenced by the fact that he/she became ill. An inflated relationship may arise. An example of an examination in which this problem was tackled is the SHEEP study (Theorell et al. 1998), in which loss of decision latitude was studied by means of imputed measures in relation to myocardial infarction risk. Data regarding job titles for all myocardial infarction cases and for referents were collected for the 10-year period preceding myocardial infarction (and corresponding referent period). Decision latitude scores were imputed from national surveys and the most unfavourable quartile with regard to the development of decision latitude was identified. The risk of developing a myocardial infarction in subjects exposed to this situation was computed after adjustment for preceding chest pain, social class and accepted biological cardiovascular risk factors. In this case there was a clear significant relationship between loss of control defined in this way and myocardial infarction in men. The risk was particularly pronounced in men below 55 years of age and in blue-collar workers.
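The general idea of such a dynamic exposure measure can be sketched as follows. This is a hedged illustration only: the text does not specify how the development of decision latitude was operationalised, so defining it as the difference between the last and the first imputed score is our assumption, and the job titles, scores, subject identifiers and function names are invented.

```python
import statistics

# Hypothetical job exposure matrix: job title -> imputed decision latitude score.
matrix = {"assembler": 3.0, "driver": 4.0, "clerk": 5.0,
          "technician": 6.0, "foreman": 7.0}

# Hypothetical job histories over a 10-year period: (year, job title).
histories = {
    "s1": [(1983, "technician"), (1992, "assembler")],
    "s2": [(1983, "clerk"), (1992, "assembler")],
    "s3": [(1983, "clerk"), (1992, "driver")],
    "s4": [(1983, "clerk"), (1992, "clerk")],
    "s5": [(1983, "driver"), (1992, "driver")],
    "s6": [(1983, "driver"), (1992, "clerk")],
    "s7": [(1983, "driver"), (1992, "technician")],
    "s8": [(1983, "driver"), (1992, "foreman")],
}


def latitude_change(job_history, matrix):
    """Imputed score of the last job minus that of the first (assumed definition)."""
    ordered = sorted(job_history)  # tuples sort by year first
    return matrix[ordered[-1][1]] - matrix[ordered[0][1]]


def most_unfavourable_quartile(changes):
    """Subjects whose change is at or below the first quartile cut point."""
    q1 = statistics.quantiles(list(changes.values()), n=4)[0]
    return {sid for sid, change in changes.items() if change <= q1}


changes = {sid: latitude_change(h, matrix) for sid, h in histories.items()}
exposed = most_unfavourable_quartile(changes)
print(sorted(exposed))
```

Because both the job titles and the imputed scores come from sources other than the subject's own judgement, the exposure classification cannot be coloured by the experience of having become ill.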

Is it possible to analyse recall bias in a cross-sectional study?

Very often the researcher has to accept a cross-sectional design and is aware that differential recall bias may arise. Is it possible to assess the size of this problem? In their study of imputed and self-reported assessments of demand and decision latitude in first myocardial infarction cases and referents (SHEEP), Theorell et al. (1998) did not find evidence of recall bias for decision latitude self-reports in the myocardial infarction cases. The analytical strategy was to compare assessments of decision latitude made by means of self-reports and by imputation, and to do this both in the myocardial infarction group and in the referent group. The myocardial infarction cases were interviewed within a month after the myocardial infarction. The relationship between the more subjective and the more objective measure of decision latitude did not differ between the groups. If there had been significant recall bias caused by the experience of the disease onset, the patterns of relationships would have been different. The analysis of demands was more difficult since the relationship between self-reports and imputed scores was, again, in general much weaker than for decision latitude.
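One common way to test whether the relationship between self-reported and imputed scores differs between cases and referents is to compare the two correlation coefficients with Fisher's z transformation. The sketch below is offered as an illustration of that general technique; the text does not state which statistical test was used in the SHEEP analysis, and the correlations and group sizes in the example are invented.

```python
import math


def compare_correlations(r_cases, n_cases, r_referents, n_referents):
    """Fisher z test for the difference between two independent correlations.

    Returns the z statistic and the two-sided p value.
    """
    z_cases = math.atanh(r_cases)          # Fisher transformation
    z_referents = math.atanh(r_referents)
    se = math.sqrt(1.0 / (n_cases - 3) + 1.0 / (n_referents - 3))
    z = (z_cases - z_referents) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p under the standard normal
    return z, p


# Invented numbers: similar self-report/imputation correlations in both groups.
z, p = compare_correlations(r_cases=0.45, n_cases=800,
                            r_referents=0.47, n_referents=1000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

A small z statistic with a large p value, as in this invented example, would be consistent with the absence of differential recall bias, although it cannot prove it.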

Other models

The effort-reward imbalance model has been subjected to assessments of methodological bias as well. An interesting recent study based upon the large GAZEL cohort provided cross-sectional as well as prospective data on the relationship between effort-reward imbalance and self-rated health (Niedhammer et al. 2004). The cross-sectional part of that study carries a risk of overestimating the association because of the common method variance problem outlined above. As expected, the prospective relationships were weaker than the cross-sectional ones, albeit still significant. If many such comparisons are published, we may in the future use them to infer the possible magnitude of overestimation in cross-sectional studies without prospective data.

Dorman and Zapf (1999) have performed a three-wave longitudinal study of the effects of lack of social support and other work stressors on depressive symptoms. They have used an important statistical technique which may throw light upon these difficulties: linear structural modelling. By means of this technique it is possible to find out how much variance is accounted for by different levels in the work situation, for instance the company, the work place and the individual. Such methods may also provide indirect possibilities to explore whether observed relationships are causal or reversed.

Conclusion

We conclude that cross-sectional self-report assessments of psychosocial conditions and health have an important role in stress research. Often, they are the first step in the identification of risk exposures and risk groups, yet their findings must be interpreted with caution as problems like subjectivity and common method variance arise.

However, in a number of studies subjective assessments of psychosocial work characteristics have been “validated” by objective assessments (triangulation), be it external expert rating or the use of inferred data such as job exposure matrices. This has been done by (a) showing a high degree of convergence of subjective and objective ratings (e.g. for decision latitude, but not for psychological demands) and (b) demonstrating an association between objective assessment of psychosocial exposure at work and health outcomes (especially cardiovascular disease and risk factors).

The associations between objective indicators and health outcomes are usually weaker than those found when subjective indicators are used. But more important than the degree of association between adverse psychosocial exposure and poor health is the fact that it is repeatedly being confirmed. Whether or not the scientific findings should have practical consequences (i.e. introduction of preventive measures) should depend on the validity of the measures rather than on the exact strength of the association.

Finally, we argue that over the past three decades a variety of subjective and objective assessment methods have been developed in stress research. They are being used in cross-sectional studies and also in other study designs. None of the research methods has explicitly been proven to be incorrect. It would be too easy to condemn certain assessment methods simply because other methods (especially longitudinal and intervention studies) may be more valid. The result would be that much important research in this field could no longer be carried out. As long as the study results are interpreted with adequate consideration of the research method, they should be used. This, however, is an appeal not only to the researcher but also to the reader. In the psychosocial field it is important that prospective studies, and in particular intervention studies, are performed since they may provide more solid evidence. Even in such studies it is important that self-reported outcomes are supplemented with more objective outcome measures.