Analysis of the validity of the vignette approach to correct for heterogeneity in reporting health system responsiveness

Rice, Nigel; Robone, Silvana; Smith, Peter

doi:10.1007/s10198-010-0235-5

Original Paper
Published: 28 March 2010

Volume 12, pages 141–162 (2011)
The European Journal of Health Economics

Nigel Rice¹, Silvana Robone¹ & Peter Smith¹,²

Abstract

Despite the growing popularity of the vignette methodology to deal with self-reported, categorical data, the formal evaluation of the validity of this methodology is still a topic of research. Some critical assumptions need to hold in order for this method to be valid. In this paper we analyse the assumption of “vignette equivalence” using data on health system responsiveness contained within the World Health Survey. We perform several tests to check the assumption of vignette equivalence. First, we use a test based on the global ordering of the vignettes. A minimal condition for the assumption of vignette equivalence to hold is that individual responses are consistent with the global ordering of vignettes. Secondly, using the hierarchical ordered probit (HOPIT) model on the pool of countries, we undertake sensitivity analyses, stratifying countries according to the Inglehart–Welzel scale and the Human Development Index. The results of this analysis are robust, suggesting that the vignette equivalence assumption is not contradicted. Thirdly, we model the reporting behaviour of the respondents through a two-step regression procedure to evaluate whether the vignette construct is perceived by respondents in different ways. Overall, across the analyses the results do not contradict the assumption of vignette equivalence and accordingly lend support to the use of the vignette methodology when analysing self-reported data and health system responsiveness.

Introduction

In recent years the concept of responsiveness has been promoted as a desirable measure to evaluate the performance of health systems. Responsiveness relates to a system’s ability to respond to the legitimate expectations of potential users about non-health enhancing aspects of care [1]. In broad terms, it can be defined as the way in which individuals are treated and the environment in which they are treated, and encompasses the notion of an individual’s experience of contact with the health system [2].

One of the most ambitious attempts to implement a cross-country comparative instrument aimed at measuring health system performance is the World Health Survey (WHS), which includes modules on the responsiveness of a system to user preferences. Respondents are asked to rate their experiences of health systems using a 5-point categorical scale (ranging from “very good” to “very bad”). A common problem with such data is that individuals, when faced with the instrument, are likely to interpret the meaning of the response categories in a way that differs systematically across populations or population sub-groups according to their preferences and norms (for example, see Salomon et al. [3]). Accordingly, the response categories will not be comparable across populations if they do not correspond to the same underlying level of the responsiveness construct. We refer to this phenomenon as “reporting heterogeneity”.

Recently, the use of anchoring vignettes has been promoted as a means of controlling for reporting heterogeneity across populations or population sub-groups. Vignettes represent hypothetical descriptions of a fixed level of a latent construct, such as responsiveness. Since these are fixed and predetermined, systematic variation across individuals in the rating of the vignettes can be attributed to differences in reporting behaviour [4]. The idea is to use information from the vignettes to adjust self-reported experiences of health system performance to increase cross-population comparability by removing the influence of reporting heterogeneity.

In recent years anchoring vignettes have been utilised to address the issue of heterogeneous reporting behaviour in many studies regarding, for example, health and health-related behaviours (e.g. [3–7]), health system responsiveness [2, 8–10], happiness and job satisfaction [11, 12], national identity [13] and state effectiveness [14]. Despite the growing popularity of the vignette methodology to address the issue of reporting heterogeneity, the formal evaluation of the validity of the approach remains a topic of research [15–18]. Two critical assumptions need to hold in order for the method to be valid. The first, termed response consistency, implies that individuals classify the vignettes in a way that is consistent with the rating of their own experiences of health system responsiveness. This implies that the mapping used from the latent levels of responsiveness given by the vignettes to the response categories is the same as the mapping used to translate latent responsiveness of own experiences of contact with health services to the available response categories. The second assumption, termed vignette equivalence, implies that “the level of the variable represented by any one vignette is perceived by all respondents in the same way and on the same unidimensional scale” [7, p. 194]. This assumption implies that, conditional on the socio-economic characteristics that determine reporting behaviour, for each vignette there is an actual (unobserved) level of responsiveness that all individuals agree to, irrespective of their country of residence, their socio-demographic characteristics or the level of responsiveness they actually face.

In this paper, we focus attention on the assumption of vignette equivalence.^{Footnote 1} A limited number of other studies have tried to assess the validity of this assumption. These were focussed on self-reports of the ratings of work disability [5], mobility [19], visual acuity and political efficacy [7, 21], job satisfaction [11] and life satisfaction for income [21], largely making use of non-parametric methods using tests based on the global ordering of the vignettes. Our study explores the validity of the vignette equivalence assumption, making reference to the concept of responsiveness and using data from the WHS. Moreover, we adopt several strategies to assess the validity of the vignette equivalence assumption, using both non-parametric and parametric methods. The use of a two-step regression procedure to evaluate whether a vignette construct is perceived in the same way across respondents is novel in this context.

Data and methods

Data

To assess the validity of the vignette equivalence assumption we use data from the WHS. The WHS is an initiative launched by the WHO in 2001 aimed at strengthening national capacity to monitor critical health outputs and outcomes through the fielding of a valid, reliable and comparable household survey instrument (see Ustun et al. [22]). The basic survey mode was an in-person interview, consisting of either a 90-min in-household interview (53 countries), a 30-min face-to-face interview (13 countries) or a computer-assisted telephone interview (4 countries). In total, 70 countries participated in the WHS 2002–2003. All surveys were drawn from nationally representative frames with known probability resulting in sample sizes of between 600 and 10,000 respondents across the countries surveyed. Data collection was on a modular basis covering different aspects of health and health systems, including information on health state valuation, health system responsiveness and health system goals. Samples have undergone extensive quality assurance procedures, including the testing of the psychometric properties of the responsiveness instrument [23], and close attention has been paid to the issue of comparability [22].

The WHS responsiveness module gathers basic information on health care utilisation for both inpatient and outpatient services. In the analysis that follows we make reference only to inpatient services. The measurement of responsiveness was obtained by asking respondents to rate their most recent experience of contact with the health system within a set of eight domains by responding to set questions. The domains consist of “autonomy” (involved in decisions), “choice” (of health care provider), “clarity of communication” (of health care personnel), “confidentiality” (e.g. talk privately), “dignity” (respectful treatment and communication), “prompt attention” (e.g. waiting times), “quality of basic facilities” and “access to family and community support”.^{Footnote 2} The following five response categories were available to respondents when rating their experience of health systems: “very good”, “good”, “moderate”, “bad”, and “very bad”.

The WHS further contains information on respondent characteristics. We make use of age, gender, level of education and income. These variables have been extensively used in studies investigating differential reporting behaviour in self-reported measures of health [2, 4, 19] and health-related disabilities [5]. Level of education is a continuous variable measuring the number of years in education. Gender is a dummy variable coded 0 for women and 1 for men. Income is derived from a measure of permanent income based on information on the physical assets owned by households. The approach to its measurement, which relies on a variant of the hierarchical ordered probit model (HOPIT) to improve cross-country comparability, is provided by Ferguson et al. [24]. We construct dummy variables to indicate the tertiles of the within-country distribution of household permanent income to which individuals belong. For the analysis presented here, the first income tertile is considered as the base category.
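The construction of within-country income tertile dummies described above can be sketched as follows. This is only an illustration: the function name, the toy incomes and the rank-based rule for assigning tertiles are assumptions, not the procedure used with the WHS data.

```python
def income_tertile_dummies(incomes_by_country):
    """Return, per country, a list of (middle, top) tertile dummies.

    The bottom tertile is the base category, so it is coded (0, 0).
    Tertiles are assigned by rank within each country (an assumed convention).
    """
    out = {}
    for country, incomes in incomes_by_country.items():
        n = len(incomes)
        # Indices of respondents sorted from lowest to highest income.
        order = sorted(range(n), key=lambda i: incomes[i])
        tertile = [0] * n
        for pos, i in enumerate(order):
            tertile[i] = 1 + (3 * pos) // n  # 1, 2 or 3
        out[country] = [(int(t == 2), int(t == 3)) for t in tertile]
    return out


# Hypothetical permanent-income values for two hypothetical countries.
toy_incomes = {"A": [10, 50, 20, 40, 30, 60], "B": [1000, 3000, 2000]}
dummies = income_tertile_dummies(toy_incomes)
```

Because the tertiles are computed separately for each country, a respondent in the top tertile of one country need not have a higher absolute income than a respondent in the bottom tertile of another.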

The WHS contains a number of vignettes describing the experiences of hypothetical individuals within each of the eight domains of responsiveness. The vignettes have been divided into four sets (A–D) with each set containing five vignettes for each item present across two domains. For example, Set A contains five vignettes for each of the two items in the domain of “Dignity” and five vignettes for each of the two items in “Prompt Attention”. Due to constraints of interview length, each respondent in the survey rated the vignettes present in only one of the sets. Therefore, each vignette has been rated by approximately 25% of survey respondents. The response scale available to respondents answering the vignettes is the same as the scale available when reporting their own experiences of health system responsiveness. Examples of the WHS vignettes are provided in Table 1 for the domains “Confidentiality”, “Choice”, “Clarity of communication” and “Quality of basic facilities”.

Table 1 Examples of vignettes for the domain of confidentiality, choice, communication and quality of basic facilities

We attempt to take into consideration the different levels of socio-economic development of countries to assess whether this influences the perception of the vignettes by making use of the Human Development Index (HDI) to stratify the countries into high, medium and low HDI groups. The HDI is a composite index of human development that combines indicators of life expectancy, educational attainment and income [25]. We also try to take into account the presence of different values and norms in different countries and evaluate whether those values and norms affect the way individuals perceive the vignettes. To do this, we stratify our sample on the basis of the Inglehart–Welzel Cultural Map of the World, represented in Fig. 1 (http://www.worldvaluessurvey.org) [26].^{Footnote 3} This map reflects the presence of a strong correlation between a large number of basic values common to several countries. If we focus on European countries only, according to the Inglehart–Welzel map it is possible to identify three sets of countries that share similar social norms and values: Catholic countries, Protestant countries and ex-communist countries. At a broader level, if we consider all countries across the world, the basic values can be represented across two major dimensions of cross-cultural variation: Traditional/Secular-rational and Survival/Self-expression values (http://www.worldvaluessurvey.org). The first dimension reflects the contrast between societies in which religion is considered as an important element of life and those in which it is not. The second dimension reflects the contrast between industrial and post-industrial societies. In the former societies emphasis is given to economic and physical security while in the latter societies there is an increasing emphasis on subjective well-being, self-expression and quality of life. We follow this stratification in the analysis that follows.^{Footnote 4}

Methods

Consistent and near-consistent ordering of vignettes

We assess the vignette equivalence assumption by first considering the global ordering of the vignettes. A minimal condition for the assumption of vignette equivalence to hold is that individual responses are consistent with the global ordering of vignettes. The global ordering for a domain can be obtained by pooling all the responses across countries and considering the average categorical response for each vignette [19]. Similar tests of the vignette equivalence assumption based on the global ordering of vignettes, but for health-related disabilities, job satisfaction and self reported measures of health, have been undertaken by [5, 9, 11, 21]. Due to the presence of stochastic measurement errors we cannot expect all individuals to order the vignettes in exactly the same way as each other. Adopting the approach of Murray et al. [19], we define a consistent ordering as “a set of categorical vignette ratings that could be consistent with the global ordering in the latent variable space, if ambiguities were resolved in favour of the global ordering” [p. 373].^{Footnote 5} Accordingly, for each domain and for each country we compute the percentage of respondents that gave an ordering of vignettes consistent with the global ordering, or had an ordering where only one vignette moved one or two ranks or two vignettes moved one rank each. Further, we compute the average percentage of respondents in each country that gave an ordering of vignettes consistent or near-consistent with the global ordering, where countries have been stratified by HDI groups and by the Inglehart–Welzel map groups.^{Footnote 6}
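The global-ordering test described above can be illustrated with a minimal Python sketch. It assumes ratings are coded 1 (“very good”) to 5 (“very bad”) and treats an ordering as consistent when, following the global ordering, a respondent's ratings never improve from one vignette to the next (ties are resolved in favour of the global ordering). The function names and toy data are hypothetical, and the near-consistent case (one or two vignettes moved) is deliberately not implemented.

```python
from statistics import mean


def global_ordering(ratings):
    """Order vignettes by their average categorical rating over all respondents."""
    ids = sorted({v for r in ratings for v in r})
    avg = {v: mean(r[v] for r in ratings if v in r) for v in ids}
    return sorted(ids, key=lambda v: avg[v])


def is_consistent(respondent, order):
    """True if the ratings are weakly increasing along the global ordering."""
    seq = [respondent[v] for v in order if v in respondent]
    return all(a <= b for a, b in zip(seq, seq[1:]))


def percent_consistent(ratings, order):
    """Percentage of respondents whose ordering is consistent with `order`."""
    return 100.0 * sum(is_consistent(r, order) for r in ratings) / len(ratings)


# Hypothetical ratings of five vignettes by four respondents (1 = very good).
toy_ratings = [
    {"v1": 1, "v2": 2, "v3": 3, "v4": 4, "v5": 5},
    {"v1": 1, "v2": 1, "v3": 2, "v4": 4, "v5": 4},  # ties, still consistent
    {"v1": 2, "v2": 1, "v3": 3, "v4": 5, "v5": 4},  # two reversals
    {"v1": 1, "v2": 2, "v3": 2, "v4": 3, "v5": 5},
]
toy_order = global_ordering(toy_ratings)
toy_share = percent_consistent(toy_ratings, toy_order)
```

In an application along the lines of the text, the global ordering would be computed once on the pooled sample and the percentage then evaluated separately for each country and domain.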

Spearman rank order correlation coefficient

Individuals’ ordering of the vignettes might differ due either to measurement errors (caused, for example, by incorrect phrasing, translation or implementation of the vignette questions) or to problems of multidimensionality and variation in the cultural construct of a domain [19].^{Footnote 7} An analysis of the more common alternative patterns of vignette ordering can provide information about the relative importance of the problem of measurement error versus the problems of multidimensionality and variation in the cultural construct of a domain. Measurement error is generally associated with a large number of alternative orderings (due to chance). The prevalence of multidimensionality or cultural variation in a construct should, however, lead us to observe a limited number of alternative orderings, “reflecting some other weighting of the components of a multidimensional construct or alternative cultural constructs” [19, p. 376]. Multidimensionality of the responsiveness construct provides evidence of a violation of the vignette equivalence assumption. The Spearman rank order correlation coefficient (SROCC), which quantifies the extent to which an ordering is consistent with the global ordering of vignettes, has been suggested as a means to investigate the relative importance of the two sources of difference in ratings of vignettes [19].^{Footnote 8} For each domain we compute the SROCC between the vignette rankings of each respondent and the global ranking.

We calculate the frequency distribution, together with several descriptive statistics, of the SROCCs across all individuals in the WHS dataset for the eight domains considered.^{Footnote 9} First, for each domain, we compute the percentage of individuals whose SROCC with the global ordering of vignettes is positive and the percentage of individuals for whom it is larger than 0.5. Secondly, following Murray et al. [19], we report the number of different rank order correlation coefficients observed in each domain and the number that occur with a frequency greater than 1%. The greater the number of different rank order correlation coefficients observed in a domain, and the smaller the number of these occurring with a large frequency, the higher the probability that alternative orderings are due to measurement errors rather than to multidimensionality or cultural variation. We also show the median SROCC for each domain and the average SROCC across domains for each country.^{Footnote 10}
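As an illustration of these descriptive statistics, the sketch below computes a Spearman coefficient between each respondent's ratings and the global ranking and then summarises the coefficients. Tied ratings are given average ranks, which is an assumed convention rather than one stated in the text, and all names and data are hypothetical.

```python
from collections import Counter
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # undefined when a respondent rates all vignettes equally
    return cov / (vx * vy) ** 0.5


def summarise(coefs):
    """Descriptive statistics of the kind reported by domain."""
    valid = [round(c, 3) for c in coefs if c is not None]
    counts = Counter(valid)
    return {
        "pct_positive": 100.0 * sum(c > 0 for c in valid) / len(valid),
        "pct_above_half": 100.0 * sum(c > 0.5 for c in valid) / len(valid),
        "n_distinct": len(counts),
        "n_freq_above_1pct": sum(n / len(valid) > 0.01 for n in counts.values()),
    }


# Hypothetical global ranking and respondent ratings for one domain.
global_rank = [1, 2, 3, 4, 5]
respondents = [[1, 2, 3, 4, 5], [2, 1, 3, 4, 5], [5, 4, 3, 2, 1], [1, 1, 2, 4, 4]]
coefs = [spearman(r, global_rank) for r in respondents]
stats = summarise(coefs)
```

With only five vignettes per item the coefficient can take a limited set of values, which is why counting distinct coefficients is informative about how many alternative orderings occur.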

The HOPIT model

An alternative way to check the vignette equivalence assumption involves estimating a model for responsiveness that takes into account possible biases due to reporting heterogeneity. This approach, adopted by Kristensen and Johansson [11] when considering self-reported job satisfaction, consists of firstly estimating a model on a pool of countries. Secondly, the sample is split into groups of countries according to the values, social norms, economic development, etc. that characterise these countries. Models are then estimated on the sub-samples and the coefficients are compared to those obtained from the pooled sample. If the model is robust and the vignette equivalence assumption is not violated, then we would expect the coefficients to be similar in the pooled sample and the sub-samples. However, if the differences in culture and values across the country groups lead individuals to interpret the meaning of vignettes differently (and thus to violate the vignette equivalence assumption), we should observe very different estimated coefficients across the country groups [11].

Since the data on responsiveness in the WHS are self-reported and categorical, we use the HOPIT model developed by Tandon et al. [27] (also see Terza [28]), to adjust for reporting behaviour. The model can be specified in two parts. The first part draws on the use of the anchoring vignettes to provide a source of information that enables the thresholds to be modelled as functions of relevant covariates (reporting behaviour equation). The second part maps the relevant covariates to underlying self-reported health system responsiveness while controlling for differences in reporting behaviour obtained through the first step (responsiveness equation). A more formal description of the two parts of the model is reported in “Appendix” (also see Rice et al. [9]). The use of vignettes to identify reporting heterogeneity relies on the assumptions of response consistency and vignette equivalence described in the Introduction.
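For orientation, the two parts of the model can be written in a commonly used form. The notation below is an assumption made for illustration and may differ from that of the Appendix; errors are taken to be normally distributed and the normalisations needed for identification are omitted.

```latex
\begin{align*}
% Reporting behaviour equation: vignette j rated by respondent i
V_{ij}^{*} &= \alpha_j + u_{ij}, &
v_{ij} &= k \iff \tau_i^{k-1} \le V_{ij}^{*} < \tau_i^{k}, \\
% Thresholds modelled as functions of covariates Z_i
\tau_i^{k} &= Z_i'\gamma^{k}, &
\tau_i^{0} &= -\infty,\ \ \tau_i^{5} = +\infty, \\
% Responsiveness equation: own experience of respondent i
Y_i^{*} &= X_i'\beta + \varepsilon_i, &
y_i &= k \iff \tau_i^{k-1} \le Y_i^{*} < \tau_i^{k},
\end{align*}
\qquad k = 1, \dots, 5 .
```

In this form, vignette equivalence corresponds to the vignette level \(\alpha_j\) carrying no respondent index, and response consistency corresponds to the same thresholds \(\tau_i^{k}\) appearing in both the vignette and the own-experience equations.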

As a preliminary analysis, we apply the HOPIT model across the pool of 27 European countries present in the WHS, using the domain “Dignity”. For the purposes of our model, we use the dummies for country of residence together with individual specific characteristics (age, gender, level of education and income) as relevant covariates in both the reporting behaviour and the responsiveness equation. Austria is taken as the baseline country. We then stratify the European countries into three groups according to the Inglehart–Welzel map to reflect similar cultures, social norms and values. We finally re-estimate the HOPIT model for each of the three groups of countries.

We further extend the analysis by considering all the countries present in the WHS.^{Footnote 11} Mexico, which has the largest sample size, is taken as the baseline country. Countries are stratified into four groups according to the Inglehart–Welzel map (“Self-Traditional”, “Self-Secular”, “Survival-Traditional”, “Survival-Secular”) and the HOPIT model is estimated separately for each of these groups of countries.

We also consider the possibility that differences in the level of socioeconomic development of countries might induce individuals to interpret the meaning of vignettes differently. Accordingly, we stratify the countries in the WHS according to their level of HDI and again apply the HOPIT model for each of these groups of countries.

Assessment of multidimensionality of the constructs represented by vignettes

An analysis of the characteristics of individuals described in the vignettes offers a further tool to check the vignette equivalence assumption. If the person described in a vignette is characterized by specific socio-demographic characteristics, it is possible that respondents are influenced by these characteristics, which may induce them to perceive the vignettes differently to other respondents. This would represent a violation of the vignette equivalence assumption. As an example, consider a vignette about “Autonomy” representing an elderly person. Some respondents may feel that elderly people are incapable of making appropriate decisions about treatments and may have lower expectations about the level of autonomy afforded to elderly individuals. Other respondents, however, could consider elderly people equally able to be involved in decisions about treatments as young people and hence would have the same expectations about the level of autonomy for elderly and young people. Specifying the age of the person described in the vignette may therefore induce some respondents to perceive the construct as representing “autonomy for elderly people” and others to perceive it as “autonomy” in general.

Information on the characteristics of the individual described in the vignette has been used to assess vignette equivalence in a study by Kapteyn et al. [5]. The authors use responses obtained from two internet surveys on work disability conducted in the Netherlands and in the US. Vignettes were presented to respondents by randomly using either a female or a male name (i.e. faced with the same vignette, some respondents rated the health condition for a woman while others rated the same condition for a man). Variability across the ratings allowed the authors to model the reporting behaviour of respondents as a function of the gender of the individual described in the vignette by explicitly including this variable as a regressor in the HOPIT model.^{Footnote 12} They reported that “for a given vignette description, a male vignette person is seen as more work disabled than a female vignette person, by both male and female respondents” [5, p. 469]. In a similar vein, we evaluate whether individuals judge vignettes differently according to the gender of the person presented in a vignette and whether the person suffers from physical pain. We choose these individual characteristics for two reasons. First, on practical grounds, vignettes tend to represent “neutral” individuals, with little information on personal characteristics. Gender and pain are two of a very limited set of characteristics we can identify in the 20 vignettes considered. Secondly, while Kapteyn et al. [5] suggest that respondents tend to judge the vignettes differently according to whether the person in the vignette is female or male, Bago d’Uva [4] suggests that the elderly and the young interpret the construct of a vignette differently where the vignette describes a situation of physical pain.

For our analysis, we consider the pool of countries present in the WHS and, for illustration, make reference to the set of vignettes contained in the domains of “Dignity” and “Prompt attention”.^{Footnote 13} This set comprises 20 vignette questions answered by 858,570 individuals across all countries. Unfortunately, in the WHS there is no variability within a vignette in the gender of the individual described. The gender of the individual represented in each vignette is fixed and, accordingly, we are unable to adopt the methodology of Kapteyn et al. [5]. However, since within each domain of responsiveness in the WHS respondents are asked to evaluate a set of vignettes, we can exploit the variability in gender that is present across the vignettes within a given domain. To exploit this variability we perform a two-stage analysis using an estimated dependent variable regression model (EDV), as described by Lewis and Linzer [29]. In the first stage we model the reporting behaviour of respondents using a standard ordered probit model. We regress respondent ratings of the vignettes on the socio-demographic characteristics of the respondents and on a set of vignette-specific dummy variables [30, p. 61].^{Footnote 14} We then “store” the coefficients of the vignette-specific dummy variables.^{Footnote 15} In the second stage we regress the coefficients of the vignette-specific dummies on a dummy variable indicating if the person in the vignette is female, and on a dummy indicating if the person is in pain. Given the small sample size of the data we use in the second step regression, we correct for the potential presence of heteroskedasticity using the Efron robust standard error estimator [31], as suggested by Lewis and Linzer [29].
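The second stage of this procedure can be sketched in plain Python. The sketch takes the first-stage vignette-dummy coefficients as given (the values below are invented for illustration, not estimates from the WHS) and regresses them on female and pain dummies by ordinary least squares. The robust standard errors use the jackknife-style HC3 form, with squared residuals scaled by (1 - leverage)^2; treating this as the Efron estimator referred to in the text is an assumption, as are all function and variable names.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols_hc3(X, y):
    """OLS coefficients and HC3-type heteroskedasticity-robust standard errors."""
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    beta = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    # Leverage h_i = x_i' (X'X)^{-1} x_i for each observation.
    lev = [sum(row[a] * xtx_inv[a][b] * row[b] for a in range(k) for b in range(k))
           for row in X]
    w = [(e / (1.0 - h)) ** 2 for e, h in zip(resid, lev)]
    meat = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
            for a in range(k)]
    cov = matmul(matmul(xtx_inv, meat), xtx_inv)
    se = [max(cov[a][a], 0.0) ** 0.5 for a in range(k)]
    return beta, se


# Hypothetical first-stage vignette coefficients with (female, pain) indicators.
vignette_coefs = [-1.20, -0.95, -1.05, -0.80, 0.30, 0.55, 0.40, 0.75]
traits = [(0, 0), (0, 0), (1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
design = [[1.0, float(f), float(p)] for f, p in traits]  # intercept, female, pain
beta, se = ols_hc3(design, vignette_coefs)
```

In this set-up, a coefficient on the female or pain dummy that is large relative to its robust standard error would suggest that respondents rate vignettes differently according to that characteristic of the person described.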

Results

Consistent and near-consistent ordering of vignettes

Using the data on health system responsiveness contained in the WHS, Table 2 reports the percentage of respondents for each domain in each country that gave an ordering of vignettes consistent with the global ordering, or had an ordering where only one vignette moved one or two ranks or two vignettes moved one rank each.^{Footnote 16} For each domain, there was no substantial variation across countries. For all countries (with few exceptions) more than 90% of respondents report consistent or near-consistent vignette orderings. For each domain, this percentage is equal to or greater than 95% in at least 52 countries. These preliminary results provide support for the assumption of vignette equivalence.

Table 2 Percent of consistent and near-consistent ordering by domain and country

Table 3 presents the average percentage of respondents in each country that gave an ordering of vignettes consistent or near-consistent with the global ordering, where countries are stratified by HDI groups and by the Inglehart–Welzel map groups. Average percentages are reported for each domain. In general, the average percentages are slightly higher for High HDI countries compared to Medium and Low HDI countries, and for countries characterised by “Secular-Rational” values compared to “Traditional” ones. However, the variation across HDI groups and across the Inglehart–Welzel grouping of countries is very small. These results provide further evidence that individuals across different countries tend to interpret the vignettes in a consistent way.

Table 3 Average percent consistent and near-consistent ordering, by Human Development Index (HDI) groups and by Inglehart–Welzel map groups

Spearman rank order correlation coefficient

Table 4 provides frequency distributions for the SROCCs for an illustrative domain, “Clarity of Communication”, and Table 5 provides descriptive statistics across all domains. For each domain, the majority of individuals report an ordering of vignettes that is positively and highly correlated with the global ordering (the percentage of individuals whose SROCC is positive is between 87 and 95%, and the percentage of individuals with a SROCC larger than 0.5 is between 64 and 90%). The number of different rank order correlation coefficients reported in each domain appears to be high, and varies quite substantially (between 59 and 145) across domains. Accordingly, in some domains there is a large number of alternative orderings (e.g. “Prompt Attention” and “Quality of Facilities”), while for others the number of orderings is small (e.g. “Clarity of communication”, “Autonomy” and “Social Support”). The number of SROCCs that occur with a frequency greater than 1% does not appear to be particularly large (on average 19) and it varies across domains much less than the number of alternative orderings.^{Footnote 17} Overall, the results suggest that inconsistencies in vignette ordering are more likely to occur because of measurement errors than because of multidimensionality or cultural variation in the constructs of a domain. However, the possibility of some problem of multidimensionality appears to be higher in some domains (those presenting a smaller number of alternative orderings, e.g. “Autonomy”) than in others.

Table 4 Spearman’s rank order correlation coefficient between individual ordering of vignettes and the global ordering, for the domain “Clarity of Communication”

Table 5 Descriptive statistics about the Spearman rank order correlation coefficient, by domain

Figure 2 shows the median SROCC across the data for each domain.^{Footnote 18} For most domains the vignettes appear to work well, with the median correlation assuming values between 0.85 and 0.95. Only the domains “Confidentiality” and “Choice” appear to have a slightly worse performance, presenting a median correlation that varies between 0.75 and 0.80. Figure 3 shows the median value of the SROCC across domains in each country. This value ranges from very high levels observed for Bangladesh and Comoros Islands (1.00 each) to more moderate values for Cote d’Ivoire and Namibia (0.84 and 0.74, respectively). However, the coefficient is greater than 0.90 in the majority of countries. The high values presented by the average SROCCs imply that cultural differences in the interpretation of vignettes across countries may not be of great concern.^{Footnote 19}

Table 6 provides the average SROCCs across all countries for individuals belonging to different socioeconomic groups. We perform this analysis following the suggestion of King et al. [7, p. 200], that “the key in detecting multidimensionality [of the vignette construct] is searching for inconsistencies that are systematically related to any measured variable”. In particular, Table 6 provides the SROCC between the ordering of vignettes defined at global level and the median ordering given by individuals within different education groups. The same information is provided for individuals stratified according to their level of income and gender. The vignettes appear to be ordered in a similar way across the different socio-economic groups. The exception is individuals with a high level of education for the domain “Confidentiality”. For these individuals the ordering of the vignettes is less close to the global ordering, since the SROCC takes values below 0.8.

Table 6 Average Spearman rank order correlation coefficient (SROCC) across all surveys

The HOPIT model

Table 7 presents the results from the responsiveness and reporting behaviour equations of the HOPIT model estimated on the pool of the 27 European countries present in the WHS. For brevity, only the results related to the first cut point (the cut point separating the response category “very bad” from “bad”) are presented in the table. Results relating to other cut points are available on request. Belonging to the top income tertile, compared to the bottom, appears to be significantly related to experiencing a high level of responsiveness, while being a woman is negatively related to responsiveness (although this effect does not attain statistical significance). Elderly people and more educated people appear to face higher levels of responsiveness, but only for the former is the association statistically significant. On average, individuals in Eastern European countries appear to face lower levels of responsiveness than in Austria, while we cannot draw general conclusions for individuals in western European countries.

Table 7 European countries: coefficients and standard errors for the responsiveness equation and the reporting behaviour equation (first cut point) of the hierarchical ordered probit (HOPIT) model, for the domain “Dignity”, for the pool of countries and for countries stratified by the Inglehart–Welzel value map

We stratify the European countries into three groups, according to the Inglehart–Welzel map, to reflect similar cultures, social norms and values. When we estimate the HOPIT model for each of the three groups of European countries separately (Catholic, Protestant and ex-communist), the coefficients for the country dummy variables are very robust both in the responsiveness equation and in the reporting behaviour equation. The coefficients retain the same sign as the coefficients for the model where all the European countries are pooled together. Further, few of them change substantially. These results lend further support to the assumption of vignette equivalence.

Table 8 presents the results of the HOPIT model estimated across the full pool of countries and on “Self-Traditional”, “Self-Secular”, “Survival-Traditional”, “Survival-Secular” countries separately. Again, the coefficients for the country dummy variables, both in the responsiveness and in the reporting behaviour equation, appear robust. Similar results, presented in Table 9, are obtained when the HOPIT model is estimated separately for countries stratified according to their level of HDI.^{Footnote 20} For both the responsiveness equation and the reporting behaviour equation, the coefficients for the country dummy variables again appear robust. These results provide further evidence in favour of the assumption of vignette equivalence.

Table 8 All countries: coefficients and standard errors for the responsiveness equation and the reporting behaviour equation (first cut point) of the HOPIT model, for the domain “Dignity”, for the pool of countries and for countries stratified by the Inglehart–Welzel value map

Table 9 All countries: coefficients and standard errors for the responsiveness equation and the reporting behaviour equation (first cut point) of the HOPIT model, for the domain “Dignity”, for the pool of countries and for countries stratified by HDI group

Test for multidimensionality of the constructs represented by vignettes

When we perform the two-stage analysis described in the section “Assessment of multidimensionality of the constructs represented by vignettes”, neither the regressors nor the constant term in the second-step regression are statistically significant at the 95% confidence level.^{Footnote 21} This result suggests that the gender of the person represented in the vignettes and his/her condition of pain do not influence the way respondents judge the vignettes.^{Footnote 22} Again, these results provide support for the vignette equivalence assumption.

Conclusion and discussion

Despite the growing popularity of the vignette methodology to address the issue of systematic reporting heterogeneity in self-reported data, the formal evaluation of the validity of this methodology has remained a topic of research. Two critical assumptions need to hold in order for the method to be valid. This paper presents analyses to assess the validity of the assumption of vignette equivalence using data on health system responsiveness contained within the WHS.

We first performed non-parametric analyses based on the global ordering of the vignettes. Secondly, after estimating a HOPIT model for responsiveness on the pool of countries, we performed sensitivity analyses stratifying the countries in our sample on the basis of the Inglehart–Welzel map and HDI groupings. Thirdly, we adopted a two-step regression procedure to evaluate the possibility that an individual’s perceptions of the construct described by a vignette differ according to the characteristics of the person described in the vignette. The results derived from our analysis do not contradict the assumption of vignette equivalence. Accordingly, they lend support to the use of the vignette methodology to correct for the presence of reporting heterogeneity.

A potential limitation of our analysis is that, for brevity, only a limited set of domains of responsiveness were used. For the analysis in the section on “The HOPIT model” we considered only “Dignity”, while in “Test for multidimensionality of the constructs represented by vignettes”, we refer to “Dignity” and “Prompt Attention”. Some caution is, therefore, required in generalising our results to other domains of the responsiveness construct.

The results refer only to the assumption of vignette equivalence and do not consider response consistency. Recent literature has tried to assess the validity of the latter assumption [6, 17]. The majority of these studies test this assumption by comparing self-reported data to objective data (for example, comparing self-reported data on health to objectively measured levels of health). Unfortunately, the WHS does not contain objective measures of the level of responsiveness faced by respondents. Hence, we are currently unable to test this assumption in the WHS.

Our study provides an original contribution to the literature on anchoring vignettes by exploring the validity of the vignette equivalence assumption with reference to the concept of responsiveness. We adopt several strategies to assess the validity of the vignette equivalence assumption, employing both non-parametric and parametric methods. Overall, our results do not provide strong evidence to suggest that the assumption does not hold and, accordingly, support the use of the anchoring vignette approach to adjust self-reported data for systematic differences in reporting behaviour.

Notes

Other studies focus on the assumption of response consistency when trying to assess the validity of the anchoring vignettes methodology; see [6, 15, 17].
The long-form questionnaire uses two question items per domain, while the short-form questionnaire uses only one. We use the eight items that are common to the long- and short-form questionnaires.
This map has been utilised to assess the validity of the vignette equivalence assumption also by Kristensen and Johansson [11].
“Self Secular” = Austria, Belgium, Denmark, Germany, Spain, Finland, France, Great Britain, Greece, Israel, Italy, Luxemburg, Netherlands, Slovenia, Sweden. “Self-Traditional” = Brazil, Dominican Republic, Ecuador, Guatemala, Ireland, Portugal, Uruguay. “Survival-Traditional” = United Arab Emirates, Burkina Faso, Bangladesh, Chad, Cote d’Ivoire, Congo, Comoros, Ethiopia, Ghana, India, Kenya, Lao, Sri Lanka, Malaysia, Mauritania, Malay, Morocco, Myanmar, Mauritius, Malawi, Namibia, Nepal, Pakistan, Philippines, Senegal, Swaziland, Tunisia, South Africa, Zambia, Zimbabwe. “Survival Secular” = Bosnia, China, Croatia, Czech Republic, Georgia, Hungary, Kazakhstan, Latvia, Russia, Slovakia, Ukraine, Vietnam.
For an example of consistent vignette ordering, consider Murray et al. [19], Fig. 30.3.
The average is computed assigning the same weight to each country within a group.
As an example, “running a marathon” could be viewed as a multidimensional construct. Some individuals may view running a marathon as evidence of a high level of mobility and some as a result of exceptional talent. Others might consider it as an attribute related to health, whilst others might consider it as an attribute related to sport [19].
Perfect agreement of the rankings leads to a coefficient of 1, perfect disagreement −1, and independence 0.
We do not include in the analysis individuals who gave the same evaluation of all the vignettes (e.g. they judge all the vignettes as excellent responsiveness). For these individuals it is not possible to compute the Spearman rank order correlation coefficient between their ranking and the global ordering. However, we perform a robustness check that includes in the sample the respondents who gave the same evaluation of all the vignettes. Referring to the domain “Confidentiality”, we perform the robustness check by moving just one vignette by one rank, in a way consistent with the global ordering. The results obtained including these observations are extremely similar to those not including them.
The average SROCCs have been computed assuming equal weight for each individual.
We exclude only Australia, Norway and Turkey since data on “Dignity” are not available for these countries.
See pp. 463–464 of Kapteyn et al. [5] for a formal description of the model estimated by the authors.
This set of vignettes is coded as Set A in the WHS. We are unable to perform our analysis on a pool of all the vignettes contained in the responsiveness module, since each set is evaluated by a different group of respondents.
The first vignette of the set (q7501) is assumed to be the base category.
The strategy adopted by STATA (the software we utilize for the empirical estimates) for identification in the ordered probit model is to set the constant term to zero. Therefore, we assume the coefficient of the base reference vignette-dummy to be equal to zero.
Australia, Turkey and Guatemala are excluded from the analysis since data on vignettes are not reported for all the domains considered.
The coefficient of variation of the number of alternative orderings is 14.35, while for the number of SROCCs that occur with a frequency greater than 1% it is 0.91.
For each domain, we have computed the median SROCC on the basis of tables analogous to Table 4.
We are not aware of any study that explicitly defines a threshold of acceptability for the rank order correlation coefficient above which we can assume that vignette equivalence holds. However, according to Murray et al. [19], a rank order correlation coefficient greater than 0.9 strongly corroborates the assumption of vignette equivalence.
Only the result related to the first cut point in the reporting bias equations is reported in Tables 8 and 9. Results related to the other cut points are available on request.
The results of the first and second step regression are available on request.
The results are not affected by the distribution of the gender of individuals across vignettes, since both women and men are represented in vignettes describing high and low levels of responsiveness.

References

1. Murray, C., Frenk, J.: A framework for assessing the performance of health systems. Bull. World Health Org. 78, 717–731 (2000)
2. Valentine, N., De Silva, A., Kawabata, K., Darby, C., Murray, C.J.L., Evans, D.: Health system responsiveness: concepts, domains and operationalization. In: Murray, C.J.L., Evans, D.B. (eds.) Health systems performance assessment: debates, methods and empiricism, pp. 573–596. World Health Organization, Geneva (2003)
3. Salomon, J., Tandon, A., Murray, C.J.L., World Health Survey Pilot Study Collaborating Group: Comparability of self-rated health: cross sectional multi-country survey using anchoring vignettes. Brit. Med. J. 328(258), 258–261 (2004)
4. Bago d’Uva, T., van Doorslaer, E., Lindeboom, M., O’Donnell, O.: Does reporting heterogeneity bias the measurement of health disparities? Health Econ. 17(3), 351–375 (2008)
5. Kapteyn, A., Salomon, J., van Soest, A.: Vignettes and self-reports of work disability in the US and the Netherlands. AER 97(1), 461–473 (2007)
6. Van Soest, A., Delaney, A., Harmon, C., Kapteyn, A., Smith, J.P.: Validating the use of vignettes for subjective threshold scales. Discussion Paper, Tilburg University (2007)
7. King, G., Murray, C.J.L., Salomon, J., Tandon, A.: Enhancing the validity and cross-cultural comparability of measurement in survey research. Am. Polit. Sci. Rev. 98(1), 184–191 (2004)
8. Rice, N., Robone, S., Smith, P.C.: Analysis of the validity of the vignette approach to correct for heterogeneity in reporting health system responsiveness. HEDG Working Paper, University of York (2009)
9. Rice, N., Robone, S., Smith, P.C.: International comparison of public sector performance: the use of anchoring vignettes to adjust self-reported data. Evaluation 16(1), 81–101 (2010)
10. Sirven, N., Santos-Eggimann, B., Spagnoli, J.: Comparability of health care responsiveness in Europe using anchoring vignettes from SHARE. IRDES working paper DT15 (2008)
11. Kristensen, N., Johansson, E.: New evidence on cross-country differences in job satisfaction using anchoring vignettes. Labour Econ. 15, 96–117 (2008)
12. Hsee, C.K., Tang, J.N.: Sun and water: on a modulus-based measurement of happiness. Emotion 7, 213–218 (2007)
13. Javaras, K.N., Ripley, B.D.: An “unfolding” latent variable model for Likert attitude data: drawing inferences adjusted for response style. JASA 102(478), 454–463 (2007)
14. Grzymala-Busse, A.: Rebuilding Leviathan: party competition and state exploitation in post-communist democracies. Cambridge University Press, New York (2007)
15. Bago d’Uva, T., Lindeboom, M., O’Donnell, O., van Doorslaer, E.: Slipping anchor? Testing the vignettes approach to identification and correction of reporting heterogeneity. HEDG Working Paper, University of York (2009)
16. Hopkins, D., King, G.: Improving anchoring vignettes: designing surveys to correct interpersonal incomparability. Public Opin. Q. (2010) (in press)
17. Gupta, N., Kristensen, N., Pozzoli, D.: External validation of the use of vignettes in cross-country health studies. IZA Discussion Paper (2008)
18. Wand, J.: Credible comparisons using interpersonally incomparable data: ranking self-evaluations relative to anchoring vignettes or other common survey questions. Mimeo (2007)
19. Murray, C.J.L., Ozaltin, E., Tandon, A.J., Salomon, J.: Empirical evaluation of the anchoring vignettes approach in health surveys. In: Murray, C.J.L., Evans, D.B. (eds.) Health systems performance assessment: debates, methods and empiricism, pp. 369–399. World Health Organization, Geneva (2003)
20. King, G., Wand, J.: Comparing incomparable survey responses: new tools for anchoring vignettes. Polit. Anal. 15(1, Winter), 46–66 (2007)
21. Kapteyn, A., Salomon, J., van Soest, A.: Are Americans really less happy with their incomes? Rand Working Paper (2008)
22. Üstün, T.B., Chatterji, S., Mechbal, A., Murray, C.: The world health surveys. In: Murray, C.J.L., Evans, D.B. (eds.) Health systems performance assessment: debates, methods and empiricism, pp. 762–796. World Health Organization, Geneva (2003)
23. Valentine, N., Prasad, A., Rice, N., Robone, S., Chatterji, S.: Health systems responsiveness—a measure of the acceptability of health care processes and systems. In: Mossialos, E., Smith, P., Leatherman, S. (eds.) Performance measurement for health system improvement: experiences, challenges and prospects, pp. 256–305. WHO European Regional Office, London (2009)
24. Ferguson, B.D., Tandon, A., Gakidou, E., Murray, C.J.L.: Estimating permanent income using indicator variables. In: Murray, C.J.L., Evans, D.B. (eds.) Health systems performance assessment: debates, methods and empiricism, pp. 748–760. World Health Organization, Geneva (2003)
25. United Nations Development Programme: Human Development Report. New York (2006)
26. Inglehart, R.: Inglehart–Welzel cultural map of the World. http://www.worldvaluessurvey.org
27. Tandon, A., Murray, C.J.L., Salomon, J.A., King, G.: Statistical models for enhancing cross-population comparability. In: Murray, C.J.L., Evans, D.B. (eds.) Health systems performance assessment: debates, methods and empiricism, pp. 727–746. World Health Organization, Geneva (2003)
28. Terza, J.V.: Ordinal probit: a generalization. Commun. Stat. 14(1), 1–11 (1985)
29. Lewis, J.B., Linzer, D.A.: Estimating regression models in which the dependent variable is based on estimates. Polit. Anal. 13, 345–364 (2005)
30. Jones, A.M., Rice, N., D’Uva, T.B., Balia, S.: Applied health economics. Routledge, New York (2007)
31. Efron, B.: The Jackknife, the Bootstrap and other resampling plans. Society for Industrial and Applied Mathematics, Philadelphia (1982)

Acknowledgments

This research was funded by the Economic and Social Research Council under the Public Services Programme, grant number RES-166-25-0038, and under the ESRC Large Grant Scheme, grant number RES-060-25-0045. We would like to thank the World Health Organization for providing access to the World Health Survey and, in particular, Somnath Chatterji, Amit Prasad, Nicole Valentine and Emese Verdes. We are also grateful to the Health, Econometrics and Data Group Seminar Series at the University of York and to the Health Economics Seminars, Erasmus University for helpful comments on an earlier draft.

Author information

Authors and Affiliations

Centre for Health Economics, University of York, Alcuin Block A, Heslington, York, YO10 5DD, UK
Nigel Rice, Silvana Robone & Peter Smith
Business School, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
Peter Smith

Corresponding author

Correspondence to Silvana Robone.

Appendix: The HOPIT model

Reporting behaviour equation

To identify the thresholds as a function of respondent covariates, let $ R_{ik}^{v * } $ represent the underlying health system responsiveness for vignette k, rated by individual i. Given that each vignette is fixed and unrelated to a respondent’s characteristics, it is assumed that the expected value of the underlying latent scale depends solely on the corresponding vignette, such that:

$$ R_{ik}^{v * } = K_{ik} \eta_{k} + \varepsilon_{ik}^{v} ,\quad \varepsilon_{ik}^{v} |K_{ik} \sim N\left( {0,1} \right) $$

(1)

where $ K_{ik} $ is the vector of vignettes, $ \eta_{k} $ is a conformably dimensioned vector of parameters and $ \varepsilon_{ik}^{v} $ is an idiosyncratic error term. $ R_{ik}^{v * } $ is unobservable to the researcher and instead we observe the vignette rating, $ r_{ik}^{v} $, on a five-point scale ranging from ‘very bad’ to ‘very good’. We assume the observed category of $ r_{ik}^{v} $ is related to $ R_{ik}^{v * } $ through the following mechanism:

$$ r_{ik}^{v} = j\quad {\text{if}}\;\mu_{i}^{j - 1} \le R_{ik}^{v * } < \mu_{i}^{j} \quad {\text{for}}\;\mu_{i}^{0} = - \infty ,\;\mu_{i}^{5} = \infty ,\;\forall \,i,\,k;\quad j = 1, \ldots ,5 $$

(2)
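The mapping in Eq. 2 can be sketched in a few lines. The function name and the cut-point values below are hypothetical and chosen only for illustration; the sketch shows how two respondents with different thresholds can report different categories for the same latent level of responsiveness, which is the reporting heterogeneity the vignettes are meant to identify.

```python
from bisect import bisect_right


def observed_category(latent, cutpoints):
    """Return j such that mu^{j-1} <= latent < mu^{j}.

    `cutpoints` holds the finite thresholds mu^1..mu^4 in increasing order;
    mu^0 = -inf and mu^5 = +inf are implied, so the result lies in 1..5.
    """
    # bisect_right counts the thresholds that are <= latent, so a value
    # exactly equal to mu^{j-1} is assigned to category j, as in Eq. 2.
    return bisect_right(cutpoints, latent) + 1


# Hypothetical cut points for two respondents (not estimates from the paper).
lenient = [-1.5, -0.5, 0.2, 1.0]
strict = [-1.0, 0.0, 0.8, 1.6]

same_latent_value = 0.3
rating_lenient = observed_category(same_latent_value, lenient)  # category 4
rating_strict = observed_category(same_latent_value, strict)    # category 3
```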

Should the thresholds represent fixed constants, $ \mu^{j} $, common to all individuals, then the above mapping is that of the standard ordered probit model. For the HOPIT model the thresholds are assumed to be functions of covariates, $ X_{i} $, such that:

$$ \mu_{i}^{j} = X_{i} \gamma^{j} $$

(3)

where the $ \gamma^{j} $ for the finite thresholds, $ j = 1, \ldots ,4 $, are parameters to be estimated along with $ \eta_{k} $. Further, we assume an ordering of the thresholds such that $ \mu_{i}^{1} < \mu_{i}^{2} < \cdots < \mu_{i}^{5} . $ If we impose the restriction that the covariates affect all thresholds by the same magnitude, then we have parallel cut-point shift. However, if the degree of reporting heterogeneity varies across thresholds such that it is greater at some levels of responsiveness than others, we refer to this as non-parallel shift [30].
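The distinction between parallel and non-parallel cut-point shift can be made concrete with a small sketch of Eq. 3. All names and coefficient values are hypothetical: the covariate vector is assumed to contain a constant and a single binary characteristic, and the four coefficient vectors stand in for the finite thresholds.

```python
def thresholds(x, gammas):
    """Compute mu_i^j = X_i gamma^j for each finite threshold j."""
    return [sum(x_m * g_m for x_m, g_m in zip(x, gamma_j)) for gamma_j in gammas]


def is_ordered(mu):
    """Check the strict ordering mu^1 < mu^2 < ... assumed for the thresholds."""
    return all(lo < hi for lo, hi in zip(mu, mu[1:]))


def cutpoint_shift(x_base, x_alt, gammas):
    """Per-threshold difference between two covariate profiles."""
    base = thresholds(x_base, gammas)
    alt = thresholds(x_alt, gammas)
    return [a - b for a, b in zip(alt, base)]


def is_parallel(shifts, tol=1e-9):
    """Parallel shift: every threshold moves by the same amount."""
    return max(shifts) - min(shifts) <= tol


# Hypothetical coefficients: (constant, binary covariate) for thresholds 1..4.
parallel_gammas = [(-1.5, 0.2), (-0.5, 0.2), (0.4, 0.2), (1.3, 0.2)]
non_parallel_gammas = [(-1.5, 0.4), (-0.5, 0.2), (0.4, 0.0), (1.3, -0.1)]

reference = (1.0, 0.0)   # covariate switched off
comparison = (1.0, 1.0)  # covariate switched on

parallel_case = cutpoint_shift(reference, comparison, parallel_gammas)
non_parallel_case = cutpoint_shift(reference, comparison, non_parallel_gammas)
```

In the first case every threshold moves by the same amount, so the covariate shifts the whole response scale; in the second the lower thresholds move more than the upper ones, so the covariate stretches or compresses the scale.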

Responsiveness equation

Underlying health system responsiveness faced by individual i can be expressed as:

$$ R_{i}^{s * } = Z_{i} \beta + \varepsilon_{i}^{s} ,\quad \varepsilon_{i}^{s} |Z_{i} \sim N\left( {0,\,\sigma^{2} } \right) $$

(4)

where $ Z_{i} $ represents a set of regressors predictive of responsiveness. As with the vignettes, $ R_{i}^{s * } $ represents an unobserved latent variable and we assume that the observed categorical response, $ r_{i}^{s} $, relates to $ R_{i}^{s * } $ in the following way:

$$ r_{i}^{s} = j\quad {\text{if}}\;\mu_{i}^{j - 1} \le R_{i}^{s * } < \mu_{i}^{j} \quad {\text{for}}\quad\mu_{i}^{0} = - \infty ,\;\mu_{i}^{5} = \infty ,\;\forall \,i;\quad j = 1, \ldots ,5 $$

(5)

where $ \mu_{i}^{j} $ are defined by (3) with $ \gamma^{j} $ fixed and it is assumed that $ R_{ik}^{v * } $ and $ R_{i}^{s * } $ are independent for all $ i = 1, \ldots ,N $ and $ k = 1, \ldots ,V. $ Note that $ \sigma^{2} $ in Eq. 4 is identified because the thresholds are fixed through the reporting behaviour equation. It follows that the probabilities associated with each of the five categories are given by:

$$ \Pr \left( {r_{i}^{s} = j} \right) = \Phi \left( {\frac{{\mu_{i}^{j} - Z_{i} \beta }}{\sigma }} \right) - \Phi \left( {\frac{{\mu_{i}^{j - 1} - Z_{i} \beta }}{\sigma }} \right),\quad j = 1, \ldots ,5 $$

(6)

where $ \Phi ( \cdot ) $ is the standard normal cumulative distribution function and $ \sigma $ is the standard deviation of $ \varepsilon_{i}^{s} $ in Eq. 4.
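The category probabilities can be evaluated numerically with the following sketch. The function names and input values are hypothetical. The scale parameter `sigma` is included because Eq. 4 allows the error variance to be $ \sigma^{2} $; with `sigma = 1` the argument of the normal distribution function reduces to the plain difference between the threshold and $ Z_{i} \beta $.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probabilities(index, cutpoints, sigma=1.0):
    """Probabilities of the response categories for a given linear index.

    `index` is Z_i beta, `cutpoints` the finite thresholds mu^1..mu^4 in
    increasing order; mu^0 = -inf and mu^5 = +inf are added here.
    """
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [norm_cdf((b - index) / sigma) for b in bounds]
    # Probability of category j is the mass between thresholds j-1 and j.
    return [cdf[j] - cdf[j - 1] for j in range(1, len(bounds))]


# Hypothetical inputs: a linear index of 0 and symmetric cut points.
probs = category_probabilities(0.0, [-1.5, -0.5, 0.5, 1.5])
```

Because the outermost thresholds are infinite, the five probabilities sum to one by construction, which is a convenient check on any implementation.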

About this article

Cite this article

Rice, N., Robone, S. & Smith, P. Analysis of the validity of the vignette approach to correct for heterogeneity in reporting health system responsiveness. Eur J Health Econ 12, 141–162 (2011). https://doi.org/10.1007/s10198-010-0235-5

Received: 21 September 2009
Accepted: 04 March 2010
Published: 28 March 2010
Issue Date: April 2011
DOI: https://doi.org/10.1007/s10198-010-0235-5

JEL Classification

I10
