Introduction

Within the field of medicine, quality management is not only en vogue but also a laudable undertaking: It implies a self-critical attitude as a condition for the improvement of medical care. Within quality management, investigating patient satisfaction is of major importance. Here, a bipolar continuum between satisfaction and dissatisfaction is usually assumed, such that dissatisfaction is equivalent to low scores of satisfaction.

It has been repeatedly commented that this conceptualization is inadequate and that patient (dis)satisfaction is more than simply the discrepancy between expectations and experience [1–3]. Instead, patients’ ratings are usually also influenced by attribution processes. Therefore, only those negative experiences attributable to derelictions of duty by the medical staff lead to negative satisfaction ratings [4]. Negative experiences that can be explained by, for example, the staff being overworked do not inevitably reduce indications of satisfaction [5]. This attributional reformulation of the concept of (dis)satisfaction provides one possible explanation of a much replicated finding: Common measurements reveal extraordinarily high satisfaction scores [6–8], possibly because negative experiences are attributed to the difficult circumstances with which the staff are faced and are therefore excused.

As valuable as theoretical explications of the concept of satisfaction might be [9], it is at the same time both useful and necessary to consider methodological problems of the measurement procedures. Aside from potentially influential demographic variables (age, sex, education, marital status of the patient) [7, 10], the majority of general problems relevant to the construction and administration of questionnaires also apply. For example, negatively formulated questions might be ignored [11]; orally administered questionnaires lead to artificially higher scores of satisfaction than written ones [12], and the same holds true for surveys carried out during the period of treatment as compared to those conducted after termination of treatment [13]. In sum, relevant biases must be considered for the collection as well as the interpretation of the data [14].

Among these biases, social desirability is likely to be of major importance [15]. The normal tendency to adapt one’s attitudes and judgments to social norms is plausibly even stronger in patients. This can be explained by the fact that they will usually try to be “good patients” and adjust to the medical staff, from whom they hope to receive help [16]. This tendency is probably particularly evident during hospitalization, which usually implies a severe illness. Thus, we are in this case not dealing with social desirability as a trait but rather as a state variable. Here, we are concerned with the pressure to conform, which results from the patient’s subjective experience of psycho–physical dependency upon the medical staff and can lead to unrealistically high satisfaction ratings [17, 18].

Standard measures of social desirability as a trait variable are not useful for the assessment of the influence of social desirability as a state variable because the results will be affected by the situation in addition to the individual disposition. The assessment during hospitalization is not an option either, because, once more, the answers would be influenced by social desirability. For the same reason, obtaining data across various time measurements from a single sample of patients does not seem optimal because the purpose of the study would be too obvious. The least reactive procedure is the comparison of two different samples at different times.

The comparison of satisfaction ratings in a sample of patients during hospitalization with those in a different sample of patients after their release is thus an indirect but relatively safe method of examining the potential bias which social desirability might create. The unusually high pressure of social desirability during hospitalization should be reduced to a normal level when patients give judgments based on their memories. It is, however, also possible that a positive memory bias makes the experiences in hospital appear more pleasant than they were actually perceived at the time. This tendency to gloss over negative experiences should primarily result in negative emotions being forgotten [19–21]. Such memory distortions would lead to retrospectively higher rates of satisfaction in comparison with ratings given during the stay in hospital. Their effect would therefore be opposite to the hypothesized effect of social desirability. If, nevertheless, the comparison of satisfaction ratings during hospitalization with those obtained 1 year after release reveals more negative evaluations for those provided retrospectively, this would strongly suggest that, in the situation of hospitalization, social desirability leads to positively biased appraisals. Because other potential influences work against this hypothesis, this finding would provide a valid argument for the social desirability hypothesis. It is thus possible to examine the effects of social desirability on patients’ satisfaction ratings by obtaining these ratings at two different points in time: during and 1 year after treatment in hospital. Evaluations of anxiety levels, however, should not be affected by social desirability caused by the situation: We predict that there will be no difference between anxiety ratings provided during and after hospitalization.

Materials and methods

Between June and September 2003, patients who had undergone a coloproctological operation were asked to complete a survey including ratings of satisfaction and evaluations of anxiety on the last day of their stay in hospital (N = 86). Data were collected anonymously in order to avoid artificially increasing the influence of social desirability. At the same time, patients who had undergone an operation in the same department in 2002 received the same survey by mail and were asked to complete it based on their memories of the stay in hospital approximately 1 year earlier (N = 728; of these, 48 had died or moved away; 328 surveys were returned, a response rate of 48.2%). The patients’ diagnoses are summarized by ICD-10 category, which shows the proportion of minor and major procedures (see Table 1).
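
The response rate reported for the mailed survey can be retraced with a short calculation. The sketch below uses only the counts given above; the variable names are illustrative, and the assumption that the rate refers to the patients who could still be reached (rather than to all 728) is inferred from the figures.

```python
# Counts taken from the text; variable names are illustrative.
mailed = 728        # surveys sent to patients operated on in 2002
unreachable = 48    # patients who had died or moved away
returned = 328      # completed surveys received

# Assumption: the rate is computed against patients who could be reached.
eligible = mailed - unreachable
response_rate = returned / eligible

print(f"eligible patients: {eligible}")
print(f"response rate: {response_rate:.1%}")
```

Computed against all 728 mailed surveys, the rate would be lower, so the denominator matters when comparing return rates across studies.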

Table 1 Diagnoses summarized according to ICD-10

Two scales of satisfaction from a measure developed by Hildebrandt et al. [8] were included in this questionnaire: a scale measuring satisfaction with the medical staff (scale D.1, 11 items) and a scale measuring satisfaction with the nursing staff (scale D.2, six items). These are very carefully developed scales which are frequently used in Germany. They focus upon those aspects of satisfaction for which situational pressure towards social desirability is assumed due to the dependency on medical staff as described above. Answers were obtained on a five-point rating scale (from “very good” through “mediocre” to “very bad”). Self-assessments of anxiety were provided in five comparable answer categories. The items of this measure had been developed specifically for the sample of proctologic patients and belonged to one of the following five theoretically postulated scales (for sample items see “Appendix”): causes of anxiety, reasons for anxiety reduction, peak of anxiety, postoperative worries, and possibilities of reducing anxiety. In addition, there were three single anxiety items regarding anxiety during admission, appropriateness of anxiety, and general level of anxiety. Other questionnaires that were used in the study, e.g., the Fragebogen zur Erhebung von Kontrollüberzeugungen zu Krankheit und Gesundheit (Questionnaire for the assessment of control beliefs regarding illness and health) [22], are not relevant to the methodological questions dealt with here.

Step one consisted of a factor analysis to test whether the two factors of satisfaction (with medical and with nursing staff) could be replicated for our sample. The comparability of the two samples (during versus after hospitalization) regarding demographic variables (age, sex, education, type of illness, etc.) was tested using t-tests for independent samples.

A multivariate analysis of variance was computed in order to test the central methodological question: whether satisfaction ratings and anxiety ratings are reduced in the second observation period. Because the sample of patients who completed the survey while hospitalized was much smaller than the sample of patients who completed it after their release from hospital, some of the demographic characteristics differed between the two subsamples. Therefore, in the following analyses, the variables which revealed significant differences between the two groups were controlled statistically with regard to their influence on the dependent variables (satisfaction ratings and anxiety ratings). For this purpose, an analysis of covariance was computed, in which the proportion of variance accounted for by the covariates was partialled out, so that the remaining differences can be attributed more confidently to the independent variable. With a large sample size, even numerically rather minor differences can yield a statistically significant result. Therefore, in addition to the test of significance, a measure of effect size (eta squared) was computed in order to estimate whether the effects were substantial. The structure of the results was additionally tested using a discriminant analysis and calculating the correspondence of the observed group membership with a predictive model. This allows an examination of whether the differences (within the dependent variables, satisfaction ratings and anxiety ratings) which were significant in the analysis of covariance also emerge as the main predictors of the observed group membership.
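
The logic of partialling out a covariate and then expressing the group effect as a proportion of variance can be illustrated with a minimal sketch. It is deliberately simplified: it handles one dependent variable and a single covariate, whereas the analysis described above is multivariate with several covariates and was run in SPSS. The function names and the example ratings are hypothetical and are not data from the study.

```python
from statistics import fmean


def _cross_products(xs, ys):
    """Sum of cross-products of deviations from the two means."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def ancova_partial_eta_squared(groups):
    """Partial eta squared of the group effect, adjusted for one covariate.

    groups: list of (covariate_values, outcome_values), one pair per group.
    """
    # Full model: one intercept per group plus a pooled covariate slope.
    w_xx = sum(_cross_products(x, x) for x, _ in groups)
    w_xy = sum(_cross_products(x, y) for x, y in groups)
    w_yy = sum(_cross_products(y, y) for _, y in groups)
    ss_error = w_yy - w_xy ** 2 / w_xx

    # Reduced model: covariate only, group membership ignored.
    all_x = [v for x, _ in groups for v in x]
    all_y = [v for _, y in groups for v in y]
    ss_reduced = (_cross_products(all_y, all_y)
                  - _cross_products(all_x, all_y) ** 2
                  / _cross_products(all_x, all_x))

    # Variance attributable to group once the covariate is accounted for.
    ss_effect = ss_reduced - ss_error
    return ss_effect / (ss_effect + ss_error)


# Hypothetical example: age as covariate, satisfaction rating as outcome.
during = ([34, 47, 52, 61, 68], [4.6, 4.4, 4.8, 4.5, 4.7])
after = ([29, 41, 55, 63, 72], [4.0, 4.3, 4.1, 4.4, 4.2])
print(round(ancova_partial_eta_squared([during, after]), 3))
```

The model-comparison form (residual variance without the group factor minus residual variance with it) makes explicit that the effect size refers only to variance left over after the covariate has been taken into account.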

The calculations were computed using the statistical package SPSS 11.0.

Results

In order to create a measure of satisfaction, we used items drawn from the factors “satisfaction with medical staff” and “satisfaction with nursing staff” from the comprehensive measure of Hildebrandt et al. [8]. By means of a confirmatory factor analysis (principal component analysis with varimax rotation; N = 354), we tested whether it is possible to replicate these factors for our sample. Two factors were extracted, which accounted for 51.76% of the variance (see Table 2).

Table 2 Rotated component matrix of satisfaction items

As Table 2 reveals, the structure of factor loadings was successfully replicated. All items with the exception of one (No. 7) loaded onto the postulated factors, with factor loadings between 0.401 and 0.821. For further calculations concerning the two measures of satisfaction, each subject’s factor scores were used, in which the individual items of each factor are weighted according to their factor loadings.

Following theoretical and conceptual considerations, the proposed dimensional differentiation of the anxiety scale was maintained, and a factor analysis was not computed for these items. The internal consistency, i.e., the reliability of the scales (Cronbach’s alpha: causes of anxiety α = 0.827; reasons for anxiety reduction α = 0.796; peak of anxiety α = 0.593; postoperative worries α = 0.796; possibilities of reducing anxiety α = 0.835), was sufficient for our purpose, which was a methodological comparison of the two groups. It was not necessary to reduce the number of items inductively. The ratings of all items on each scale were summed, and the mean was computed in order to obtain separate scale values.
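
For readers unfamiliar with the reliability coefficient reported above, the following sketch shows how Cronbach’s alpha is computed from item ratings using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The function name and the example ratings are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: one list per item, each holding the ratings of all
    respondents in the same respondent order.
    """
    k = len(item_scores)
    # Sum score of each respondent across all items of the scale.
    totals = [sum(person) for person in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings of five respondents on a three-item scale.
ratings = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
print(round(cronbach_alpha(ratings), 3))
```

Alpha rises when the items covary strongly relative to their individual variances, which is why a short or heterogeneous scale such as “peak of anxiety” can yield a noticeably lower value.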

Data from the sample of patients still in hospital treatment, however, had been collected only for the purpose of controlling for social desirability. Accordingly, this sample was ultimately much smaller than the posttreatment sample (86 hospitalized patients as compared to 328 released patients). Given such large differences between the sample sizes, differences in demographic variables etc. might occur. Therefore, we tested the samples for their comparability regarding the following variables: sex, age, education, and type of illness. The variable “education” was measured by the level of school education (five levels). Concerning the variable “type of illness,” a distinction was made between (less serious) operations on the rectum and (more serious) colon operations. The differences between the two samples were tested with t-tests for independent samples, which revealed significant differences regarding the variables age, sex, and type of illness (cf. Table 3).

Table 3 Descriptive values of the two groups and results of the t-tests

These differences between the two subsamples must be statistically controlled when evaluating the original differences of interest, namely differences in satisfaction ratings and anxiety ratings.

Table 4 shows these differences (satisfaction ratings and anxiety ratings during treatment in hospital and after release from hospital). A multivariate analysis of variance reveals significant effects for three factors, namely satisfaction with medical staff, satisfaction with nursing staff, and reasons for anxiety reduction. The effects occur in the expected direction: The intensity of the ratings is reduced for the group in which the data were collected after their release from hospital, as compared to the group in which the survey was completed during the stay in hospital, when higher rates of satisfaction and anxiety reduction were indicated. For all other anxiety scales, there were no significant differences between the two groups.

At the beginning of this article, the methodological question was posed whether there is a danger of satisfaction ratings being biased due to the influence of social desirability. Before it is possible to draw a sufficiently reliable conclusion based on this data set, two problems must be taken into account. First of all, differences in satisfaction ratings and anxiety ratings between the two measurement times might (partially) be due to the variability among the two samples, as described above. This can be tested by controlling for variables in which a significant difference exists—here: age, sex, and type of illness—in an analysis of covariance. However, it is possible that not all relevant differences between the two subsamples have been identified and controlled for. There may well be other variables which correlate with proneness to situational social desirability. In this respect, the lack of difference in the level of education provides relative safety because differences regarding proneness to social desirability are primarily due to education [23, 24]. Therefore, we may assume that the computed analysis of covariance controls for the relevant sample differences. Secondly, relatively small differences can become significant in large samples. Thus, the substantiality of the effects must be tested using a measure of effect size (qua explanation of variance); in this case, the partial Eta square was used [25]. Table 5 shows the results of the corresponding multivariate analysis of covariance, in which the three major differences between the two measurement times (satisfaction with medical staff and nursing staff as well as reasons for anxiety reduction) yielded significant results and a substantial effect size.

Table 4 Descriptive values of anxiety and satisfaction scores; results of multivariate analysis of variance (N = 331; Pillai’s trace method) and the corresponding univariate analyses of variance

According to Cohen [26], the single effects, ranging from 0.018 (satisfaction with medical staff) through 0.046 (reasons for anxiety reduction) to 0.076 (satisfaction with nursing staff), must be classified as small to medium effects. The overall value of 0.141, however, corresponds to a large effect.
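
The classification of the reported effect sizes can be made explicit with a small sketch. It assumes the commonly cited benchmarks for eta squared (0.01 small, 0.06 medium, 0.14 large); these cut-offs are an assumption here, since the text only refers to Cohen [26] without stating the thresholds it applied. The function name is illustrative.

```python
def classify_eta_squared(value, small=0.01, medium=0.06, large=0.14):
    """Label an eta-squared value using assumed conventional cut-offs."""
    if value >= large:
        return "large"
    if value >= medium:
        return "medium"
    if value >= small:
        return "small"
    return "negligible"


# Partial eta-squared values as reported in the text.
reported = {
    "satisfaction with medical staff": 0.018,
    "reasons for anxiety reduction": 0.046,
    "satisfaction with nursing staff": 0.076,
    "overall (multivariate)": 0.141,
}
for name, value in reported.items():
    print(f"{name}: {value} -> {classify_eta_squared(value)}")
```

Under these assumed cut-offs, the value for satisfaction with nursing staff lies just above the medium threshold, while the other two single effects fall in the small range.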

Finally, we tested whether, complementarily, the variables of interest are also significant predictors of membership in the two samples (during hospitalization vs. after release from hospital). The corresponding statistical test, carried out by means of discriminant analysis, yields the results reported in Table 6.

Table 5 Multivariate analysis of covariance (N = 323; Pillai’s trace method) and corresponding univariate results

The prediction was indeed significant (z = 5.194; p < 0.001). At 78.9%, the correspondence between the cross-validated prediction model and the observed group membership is satisfactory (Tables 6 and 7). The variables that were significant in the analysis of variance were confirmed as the major predictors.
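
How a classification table translates into a hit rate and a test against chance can be sketched as follows. The text does not state which test produced the reported z value, so the comparison with the proportional chance criterion shown here is only one common approach and not necessarily the authors’ procedure. The counts in the example table are hypothetical and do not reproduce Table 7; the function names are illustrative.

```python
from math import sqrt


def hit_rate(table):
    """Share of correctly classified cases.

    table[i][j]: number of cases from observed group i that the model
    assigned to group j.
    """
    total = sum(sum(row) for row in table)
    hits = sum(table[i][i] for i in range(len(table)))
    return hits / total


def proportional_chance(table):
    """Hit rate expected by chance, given the observed group sizes."""
    total = sum(sum(row) for row in table)
    return sum((sum(row) / total) ** 2 for row in table)


def z_against_chance(table):
    """z statistic comparing the observed hits with the chance expectation."""
    total = sum(sum(row) for row in table)
    hits = hit_rate(table) * total
    chance = proportional_chance(table)
    return (hits - total * chance) / sqrt(total * chance * (1 - chance))


# Hypothetical classification table (rows: observed, columns: predicted).
example = [
    [30, 20],    # observed "during hospitalization"
    [10, 140],   # observed "after release"
]
print(f"hit rate: {hit_rate(example):.1%}")
print(f"chance criterion: {proportional_chance(example):.1%}")
print(f"z: {z_against_chance(example):.2f}")
```

With unequal group sizes, the chance criterion is well above 50%, so a hit rate should be judged against that baseline rather than against a coin flip.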

Table 6 Discriminant analysis for separating the groups during and after hospitalization (N = 331)
Table 7 Cross-validated table of group membership (prior probabilities estimated from sample sizes)

All in all, the results clearly reveal a reduction in the intensity of the investigated satisfaction ratings, even when controlling for relevant confounding demographic variables. The results of the model applying analysis of variance and of the model using discriminant analysis are consistent, and both reveal a significant decrease between the first and the second measurement time. Thus, the pattern of results can be considered both stable and substantial.

Discussion

The two scales of satisfaction (regarding the medical and the nursing staff) reveal the expected decrease of rating scores after release from hospital, compared to the results of measurement during hospitalization. This reduction indicates a bias caused by effects of social desirability because potential memory distortions would have had the reverse effect. It is highly unlikely that the disease recurred or worsened in the period after release from hospital in the case of coloproctological operations, even given cancerous diseases. Therefore, it is implausible that events subsequent to release are responsible for the decreased satisfaction scores. Furthermore, such complications concerning the illness would be attributed to the physicians, and the satisfaction ratings would thus decrease even more for the medical staff than for the nursing staff. However, the contrary is the case (cf. Tables 4 and 5), so that the present satisfaction ratings can almost certainly be traced back to social desirability bias.

Anxiety ratings, however, were not expected to be influenced by social desirability to different extents at the two times of observation. This proved true for all anxiety scales, with the exception of the scale “reasons for anxiety reduction.” This result does not imply that the social desirability bias is less influential. A precise examination of the items shows that the scale is, after all, a scale measuring anxiety reduction. Asking about reduction is, de facto, inquiring about positive emotionality, for which the social desirability bias should also be of relevance. Therefore, due to the reverse coding, which implicitly asks about positive emotionality, the results of the scale measuring “reasons for anxiety reduction” in fact provide additional evidence of a social desirability bias. Because of the choice of experimental design, the estimation of the extent of this bias is rather conservative, as its influence cannot be entirely ruled out at the second observation time (after release from hospital). The pattern of results was obtained in a specialized clinic for coloproctology. To our knowledge, however, this clinic does not differ significantly from other surgical hospitals. Therefore, external validity of the results can be assumed, at least for stays in surgical hospitals. Thus, it can be stated that social desirability leads to a stable and substantial bias for ratings of satisfaction and emotion obtained with general scales.

Conclusion

The methodological consequences of this conclusion are considerably far reaching. First of all, the conceptualization of a bipolar continuum of patient satisfaction and dissatisfaction, which has already been criticized from a theoretical point of view [27], should be renounced. If the self-critical approach towards an improvement of medical care as described at the start of the present study is taken seriously, then putting this into practice cannot simply entail a theoretical elaboration of the concept of patient dissatisfaction but rather must also include methodological considerations.

This should be achieved, at least as a first step, by not relying solely on general rating scales, because these are particularly prone to bias factors such as social desirability. Rather, a multistage approach should be favored, just as it is used for other attitude and evaluation questionnaires: By first using qualitative measurement steps and starting with individual cases, relevant aspects of judgment, problem areas, etc. can be determined, as was done at least rudimentarily through the application of group discussions [28–30].

In the next step, these can be translated into quantitative research instruments for general survey studies. One method which appears to be particularly useful for the qualitative and heuristic research phase is the critical incident technique [31]. In this method, participants are asked, in the course of a narrative interview, to describe particularly memorable situations (in this case, situations evoking dissatisfaction) [32, 33]. This method has been applied within the field of medicine in order to develop and carry out training for nursing staff [34, 35]. So far, the potential of this technique for research on patient dissatisfaction has not been fully recognized [36]. As soon as a multitude of situations evoking dissatisfaction have been systematically assessed in this manner, it will be possible to group them into typical categories by means of a systematic content analysis [37, 38]. In a third step, these categories could be used to develop questionnaires, which could then be administered in representative survey studies. Questionnaires developed according to these three steps will no longer target unspecific and general evaluation ratings. Instead, they will inquire about types of situations at a medium level of abstraction and concreteness. For this type of item, the influence of common bias factors, such as social desirability, should be less severe.

It is our opinion that only when researchers of quality management within the field of medicine can be convinced to apply such an ambitious procedure, combining qualitative and quantitative research methods, will this domain also be able to optimize its potential for change and, furthermore, gain credibility and trust beyond the field of medicine.