Introduction

Calculation of Quality-Adjusted Life Years (QALYs) in cost-utility analysis requires description and subsequent valuation of health states characterising a disease area [1]. Generic preference-based measures (PBMs), such as EQ-5D [2], SF-6D [3], and HUI3 [4], are widely used for this purpose. These instruments consist of a general health state descriptive system and an algorithm converting each health state into a utility value.

Generic PBMs may be inappropriate or insensitive in capturing Health-Related Quality of Life (HRQoL) in some medical conditions [5]. On the other hand, the majority of available condition-specific measures (CSMs) are not preference-based. One solution to this problem has been the “mapping” from CSMs directly onto generic PBMs (e.g. [6, 7]); however, this process may result in limited performance in terms of model fit and ability to predict values where the overlap between the generic measure and the CSM is poor [8, 9]. For this reason, there has been an increased interest in the development of PBMs directly from existing CSMs.

CSMs normally consist of a large number of items capturing multiple dimensions of health. Inclusion of all items in a PBM would often result in the description of a massive number of potential health states that would be impractical to use and complicated to value. The main approach to dealing with this situation is to develop health state classifications by selecting 1–2 items from each dimension represented in a CSM, thus defining a concise set of health states. This approach was first applied to the generic SF-36 in the development of the SF-6D preference-based index [3] and has since been used in the development of PBMs from a number of CSMs [10–13]. Factor analysis can be used in such cases to assess the dimensional structure of a measure, explore potential correlations between dimensions, and suggest appropriate reductions in dimensions [14]. Items may be selected based on classical psychometric criteria, such as internal consistency and responsiveness to change. Rasch analysis has also been used in the development of health state classifications from existing CSMs, in order to select items within dimension and reduce item response levels [15, 16].

Ideally, health state classifications should have a multi-dimensional structure with little or no correlation between items. This requirement results from the demands of the valuation stage, where a sample of states is selected for valuation since it is not practical to value all states. For instruments like EQ-5D and SF-6D that employ statistical inference, statistical designs such as orthogonal arrays and balanced designs are used to estimate additive models in order to predict the values for all potential health states. For the HUI3 that uses multi-attribute utility theory, ‘corner’ states must be valued where one dimension is at the worst level and all others are at the best level. A major problem arises when items in a health state classification tap the same or highly correlated dimensions and therefore cannot be treated independently, as separate statements. In such cases, some of the health states may include combinations of statements that are not plausible (e.g. ‘I feel happy most of the time’ and ‘I often feel like crying’). This problem is most likely to arise in the case of CSMs with high correlation between dimensions.

One approach for developing plausible health states from measures characterised by high correlation across their items has been described by Sugar et al. [17]. The authors conducted cluster analysis in order to identify distinct groups of patients with depression based on their mental and physical health composite scores on SF-12. The resulting patient groups corresponded to 6 distinct health state descriptions for depression that were clinically meaningful since they were derived from actual cases observed in clinical practice. This approach can therefore be employed for the development of PBMs from CSMs with few and highly correlated dimensions, where conventional statistical approaches for generating health states (such as orthogonal arrays) are not appropriate.

In this paper, we propose an alternative approach for constructing plausible health state descriptions from CSMs with high correlation between their dimensions, using Rasch analysis. Rasch analysis has already been used in order to select appropriate items and response levels from existing multidimensional CSMs [15, 16]. Here, we take advantage of another property of Rasch models relevant to our context, that is, the ability of Rasch analysis to assign respondents to different points of severity along the latent variable, based on their responses, and to subsequently generate groups of respondents of different symptom severity [18]. We have used this attribute of Rasch models in order to develop plausible health states from the Clinical Outcomes in Routine Evaluation Outcome Measure (CORE-OM).

Methods

The Clinical Outcomes in Routine Evaluation-Outcome Measure (CORE-OM)

The CORE-OM is an instrument measuring common mental health problems that has been developed to evaluate the effectiveness of psychological therapies across multidisciplinary services in the United Kingdom [19]. It consists of 34 items, each with 5 levels of response: ‘not at all’, ‘only occasionally’, ‘sometimes’, ‘often’, and ‘most or all the time’. The items tap 4 conceptual domains: ‘subjective well-being’ (4 items), ‘problems’ (4 items on depression, 4 items on anxiety, 2 items on physical symptoms, and 2 items on trauma), ‘functioning’ (4 items on general functioning, 4 items on close relationships, and 4 items on social relationships), and ‘risk’ (4 items on risk-to-self and 2 items on risk-to-others). Eight of the items are positively worded. The dimensional structure of CORE-OM is presented in Table 1.

Table 1 The dimensional structure of the CORE-OM

The CORE-OM is a valid, reliable, and acceptable effectiveness measure across a wide range of practice settings offering psychological therapies [20, 21]. It has been routinely used to evaluate psychological therapies and counselling services in primary and secondary settings in the United Kingdom [19, 22] and is a widely used patient-based tool for measuring mental health outcomes in the British National Health Service (NHS) [23, 24]. Based on these characteristics and given the scepticism about use of generic PBMs in mental health and the arguments favouring the development of a mental health-specific PBM [25–27], CORE-OM was selected as the basis for constructing a PBM specific to common mental health problems.

With 34 items having 5 levels each, CORE-OM may form a practically unmanageable number of 5^34 health states (approximately 5.8 × 10^23). Previously undertaken exploratory factor analysis revealed that the 34 items loaded on 3 components, one including mainly the negatively worded items, one made up of the positively worded items, and a risk items component [20]. The same study examined the correlation across the instrument domains and demonstrated that the domains of ‘subjective well-being’, ‘problems’, and ‘functioning’ were highly correlated with each other (in pairwise examinations of the 3 domains the Spearman’s ρ value exceeded 0.70 in both clinical and non-clinical populations); the ‘risk’ items also showed high though somewhat lower correlation with the non-risk items (Spearman’s ρ value = 0.64 in a clinical sample; 0.44 in a non-clinical sample) [20]. Thus, generating states from the health state classification using a standard statistical design would not be appropriate in this case, as it would most likely result in implausible health states. For this reason, a new method using Rasch analysis was applied, aiming at the construction of credible health states from CORE-OM.

The CORE-OM dataset used in Rasch analysis

Data analysed in this study were derived from a database containing patient information from 33 NHS primary care services in the United Kingdom. Data included CORE-OM scores, as well as patients’ age, gender, and ethnicity. Details on the database and the data collection procedures are available in Evans et al. [22]. A sample of 1,500 primary care clients formed the dataset for the work presented in this paper [N1500]. Conventional psychometric tests were conducted on the whole dataset [N1500]. A random sample of 400 respondents [N400a] out of the whole dataset [N1500] was used in Rasch analysis. Use of a smaller sample size for Rasch analysis was dictated by evidence that some Rasch fit statistics for polytomous scales (like CORE-OM) are highly dependent on sample size, which translates into a higher possibility for type I errors with increased sample size [28]. The findings of Rasch analysis on [N400a] were validated on another random sample of 400 respondents [N400b] out of the whole dataset [N1500].
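The sampling scheme described above can be sketched as follows. This is a minimal illustration only: the text does not state whether [N400a] and [N400b] were allowed to overlap, so drawing them without overlap is an assumption, and the function name, seed, and integer identifiers are hypothetical.

```python
import random

def draw_samples(respondent_ids, size=400, seed=0):
    """Draw a development and a validation sample of equal size.

    Assumption: the two samples do not overlap (not stated in the text).
    """
    rng = random.Random(seed)
    # Sampling 2 * size ids without replacement and splitting the result
    # guarantees that no respondent appears in both samples.
    drawn = rng.sample(respondent_ids, k=2 * size)
    return drawn[:size], drawn[size:]

ids = list(range(1500))  # stand-in for the [N1500] respondent identifiers
n400a, n400b = draw_samples(ids)
print(len(n400a), len(n400b), len(set(n400a) & set(n400b)))
```

Fixing the seed makes the split reproducible, which is convenient when the same samples must be reused across analysis steps.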

Use of Rasch analysis to select items and identify plausible health states amenable to valuation

The Rasch model is underpinned by the principles of unidimensionality and local independence of items [29]. Rasch analysis can therefore be used in the development of unidimensional PBMs derived from existing CSMs with no clear multidimensional structure. The objective of this study was to use Rasch analysis in order to construct a health state descriptive system from CORE-OM that is amenable to valuation. This process, which resulted in the development of a measure able to describe health states for common mental health problems across 6 conceptual domains (named ‘CORE-6D’), followed 4 major steps illustrated in the flow diagram shown in Fig. 1.

Fig. 1 Flow diagram of the process of developing a health state descriptive system and plausible health states using Rasch analysis as a primary tool

Step 1

Use of Rasch analysis, conventional psychometric tests, and expert opinion in order to exclude items and develop a unidimensional scale

Rasch analysis was undertaken in order to identify and subsequently exclude CORE-OM items not fitting the Rasch model. Conventional psychometric tests provided additional indication of an item’s suitability for inclusion in a PBM and were considered when deciding which item to exclude first, when two or more items met exclusion criteria in Rasch analysis. The criteria used to exclude items from Rasch analysis and judge their classical psychometric properties have been described and justified in previous related studies [15, 16]. In summary, the following criteria were considered in the development of the new instrument:

Rasch analysis criteria for item exclusion

  • Item level ordering: item-threshold maps were inspected to investigate whether respondents were able to distinguish between adjacent response levels. When items had disordered thresholds (i.e. when an item score was likely to decrease as respondent’s symptom severity increased), then visual inspection of respective category probability curves determined which adjacent responses to merge. If the only way to order an item’s thresholds was by merging adjacent responses that were not clinically meaningful, then this item was excluded. For example, a new response level that was formed by merging response levels ‘sometimes’ and ‘often’ was deemed to be not clinically meaningful in any of the items; in addition, response levels ‘not at all’ and ‘only occasionally’ could not be merged in the case of ‘severe’ items, such as the risk items, as the two response levels, in this case, indicated significantly different levels of severity.

  • Goodness of fit following threshold re-ordering: overall and item fit statistics were examined to assess whether the whole instrument and individual items fit into the Rasch model. Items with fit residuals beyond ± 2.5 and/or significant χ² statistics (at the 0.01 level after Bonferroni adjustment) were excluded.

  • Differential item functioning (DIF): items demonstrating significant DIF (that is, responses depended on patients’ age, gender, or ethnicity) were excluded from further consideration for two reasons: first because DIF can be a source of misfit in the Rasch model; second because items forming a PBM need ideally to constitute a universal measure, expressing the same aspects of HRQoL across the whole patient population (and capturing the same preferences by the valuing population), and not to distinguish significantly among sub-groups with different baseline characteristics. For this reason, although uniform DIF can be dealt with by splitting for DIF and separating the item into the different sub-groups where DIF has been identified, this was not attempted in this analysis.
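The goodness-of-fit criterion in the list above can be expressed as a simple screening rule. The sketch below is illustrative rather than a reproduction of the RUMM2020 procedure: it assumes that the Bonferroni adjustment divides the 0.01 level by the number of items tested, and the item names and statistics are hypothetical.

```python
def items_to_exclude(item_stats, residual_limit=2.5, alpha=0.01):
    """Flag items that misfit the model.

    item_stats maps an item name to (fit_residual, chi_square_p_value).
    Assumption: Bonferroni adjustment = alpha divided by the number of items.
    """
    adjusted_alpha = alpha / len(item_stats)
    flagged = []
    for item, (residual, p_value) in item_stats.items():
        # An item is flagged on either criterion: extreme residual or significant chi-square.
        if abs(residual) > residual_limit or p_value < adjusted_alpha:
            flagged.append(item)
    return flagged

# Hypothetical statistics for three items
stats = {"item_a": (0.8, 0.40), "item_b": (3.1, 0.20), "item_c": (-1.2, 0.0001)}
print(items_to_exclude(stats))
```

In the analysis itself, flagged items were removed one at a time and the model was re-estimated after each removal, so a rule of this kind would be re-applied at every iteration rather than once.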

Conventional psychometric tests

  • Responsiveness to treatment, measured by the standardised response mean (SRM), which is estimated as the mean change in score of an item before and after treatment, divided by the standard deviation of the change score.

  • Percentage of missing data

  • Correlation with total CORE-OM score, expressed by Spearman’s non-parametric ρ values

Results of conventional psychometric tests were used to compare the performance of items for inclusion in the final measure; thus, no formal threshold values were set to determine item exclusion.
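As an illustration of the SRM defined above, the following sketch computes it for a single item. The scores are hypothetical, the change is taken as the score after treatment minus the score before treatment, and the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which form was used.

```python
from statistics import mean, stdev

def standardised_response_mean(before, after):
    """Mean change in item score divided by the standard deviation of the change."""
    changes = [a - b for b, a in zip(before, after)]
    # stdev is the sample standard deviation (n - 1 denominator); this is an assumption.
    return mean(changes) / stdev(changes)

# Hypothetical item scores (0-4) for five respondents before and after treatment
before = [3, 4, 2, 4, 3]
after = [1, 2, 2, 1, 2]
srm = standardised_response_mean(before, after)
print(round(srm, 2))
```

With this sign convention, a negative SRM on a negatively worded item indicates a reduction in symptom severity after treatment; a larger absolute value indicates greater responsiveness.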

In addition, at early stages of the analysis, expert opinion was sought to judge the appropriateness and relevance of certain CORE-OM items for inclusion in a PBM expressing people’s perceptions of their own HRQoL relating to common mental health problems.

Items not fitting into the Rasch model (i.e. meeting one of the criteria for item exclusion described above) were excluded one at a time followed by Rasch analysis on the remaining items and subsequent testing of Rasch statistics. The person-separation index was constantly checked to ensure that the model had good ability to discriminate amongst different respondent groups. This process was repeated until all remaining items fit into the Rasch model.

Step 2

Selecting items for the new measure

After several items were excluded, a scale fitting the Rasch model was constructed that, nevertheless, still contained a high number of items. However, there is evidence that respondents can receive, process, and remember about 7 ± 2 pieces of information, depending on the complexity of the statements [30]. For this reason, and because our purpose was to develop a concise PBM manageable in a valuation survey, further exclusion of items was undertaken, after testing different item combinations and applying the following criteria:

  • The final instrument should consist of items representing the various conceptual domains of the CORE-OM. In order to keep the number of items concise, at most one item per domain should be selected for the final instrument.

  • Overall model statistics should demonstrate best possible fit of the measure in the Rasch model, indicating the unidimensionality of the new scale.

  • Response levels should be the same for all items

  • Respective threshold locations for all items (the points where the probabilities of adjacent levels of response are equally likely) should ideally increase with increasing ‘difficulty’ of the item (as expressed by its average location). This was checked by visual inspection of the item threshold map and ensured a ‘smooth’ transition of responses from milder to more severe health states.

  • The final instrument should be well-targeted to the patient population, covering the whole range of patients’ symptom severity.

At the end of this step, an extra post hoc test was undertaken to confirm the unidimensionality of the new scale, as proposed by Smith [31]; this test has been recommended for this purpose in the Rasch literature [29, 32]. The first stage of this test is to undertake principal component analysis of the item fit residuals, in order to identify the first residual factor that primarily contributes to the variance of the data after the ‘Rasch factor’ has been taken into account. Subsequently, the correlation between the items and the first residual factor is examined in order to define 2 subsets of items (i.e. positively and negatively correlated). These two ‘divergent’ sets of items, which are most likely to breach the assumption of unidimensionality, are used to estimate two separate scores for each respondent. If the content of the whole scale is unidimensional, then each respondent should produce similar scores in the two subsets. Thus, independent t-tests are undertaken for each pair of scores on each respondent in order to estimate the proportion of significant tests at the P = 0.05 level in the study sample. The percentage of significant tests should not exceed 5%. A confidence interval for a binomial test of proportions is calculated for the observed number of significant tests, and this interval should overlap the 5% expected value for the scale to be considered unidimensional [29].
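The final stage of this test reduces to a check on a binomial proportion. The sketch below assumes a normal-approximation (Wald) 95% confidence interval, which may differ from the interval computed by the Rasch software; the function name and the counts in the example are hypothetical.

```python
import math

def unidimensionality_check(n_significant, n_tests, expected=0.05, z=1.96):
    """Return the proportion of significant t-tests, its CI, and the verdict.

    Assumption: Wald (normal-approximation) interval for a binomial proportion.
    The scale is treated as unidimensional when the lower bound of the
    interval does not exceed the expected 5%.
    """
    p = n_significant / n_tests
    half_width = z * math.sqrt(p * (1 - p) / n_tests)
    lower = max(0.0, p - half_width)
    upper = min(1.0, p + half_width)
    return p, (lower, upper), lower <= expected

# Hypothetical counts: 4 significant tests among 300 respondents
proportion, interval, is_unidimensional = unidimensionality_check(4, 300)
print(round(proportion, 4), is_unidimensional)
```

For small counts of significant tests, an exact binomial interval would be a more defensible choice than the Wald approximation used here for brevity.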

Step 3

Deriving health states for utility measurement

The item threshold map was visually inspected after all the above criteria had been satisfied, to identify the most likely item response combinations expected across the continuum of patients’ symptom severity. The most likely item response combinations at each location across the scale represented frequently observed, plausible health states experienced by the study population.
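The reading of the item threshold map can be mimicked numerically. Under the partial-credit Rasch model, the probability of response level k for a person at location θ is proportional to exp(Σ_{j≤k}(θ − τ_j)), where τ_j are the item's thresholds, and the most likely response is the level with the highest probability. The sketch below illustrates this with hypothetical threshold values; it is not the RUMM2020 implementation.

```python
import math

def category_probabilities(theta, thresholds):
    """Partial-credit model probabilities for response levels 0..len(thresholds)."""
    cumulative = [0.0]  # level 0 has an empty sum in the exponent
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def most_likely_state(theta, item_thresholds):
    """Concatenate the modal response level of each item at location theta."""
    state = []
    for thresholds in item_thresholds.values():
        probs = category_probabilities(theta, thresholds)
        state.append(str(probs.index(max(probs))))
    return "".join(state)

# Hypothetical thresholds (in logits) for two 3-level items
hypothetical_thresholds = {"item_a": [-1.0, 0.5], "item_b": [0.0, 2.0]}
print(most_likely_state(1.0, hypothetical_thresholds))
```

At θ = 1.0 the person lies above both thresholds of the first item but only above the first threshold of the second, so the sketch returns the state 21.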

Step 4

Validation of the new measure

The new measure was validated on the random sample [N400b]: the scale was tested for overall and item fit statistics and DIF. The post hoc unidimensionality test was repeated and the item threshold map was inspected to identify the most likely item response combinations in the validation sample [N400b].

Conventional psychometric tests were undertaken using SPSS 11.5 [33]. Rasch analysis was performed on RUMM2020 [34].

Results

Rasch analysis, conventional psychometric tests, and expert opinion for exclusion of items

Rasch analysis of CORE-OM on [N400a] revealed that 26 out of the 34 CORE-OM items had disordered thresholds. Threshold ordering was achieved by merging adjacent response levels following visual inspection of item category probability curves. Since no common re-scoring could be applied to all 34 items, we chose to use the partial-credit Rasch model for our analysis. After all thresholds were ordered, goodness of fit was assessed by examining overall model and individual item statistics. The CORE-OM did not fit into the Rasch model, with 11 items showing misfit (either a fit residual beyond ± 2.5 or a χ² probability significant at the 0.01 level). Moreover, 5 items demonstrated DIF.

Results of initial Rasch analysis on all items are shown in Table 2. Table 3 provides the results of conventional psychometric tests. Based on the results of Rasch analysis, a number of items were consecutively excluded from further analysis according to our exclusion criteria, until a good model fit was achieved. Conventional psychometric test results were consulted as an extra indication of items’ psychometric properties.

Table 2 Results of initial Rasch analysis of CORE-OM (all items included)
Table 3 Results of conventional psychometric tests on CORE-OM

At an early stage of this process, it was decided to exclude items 6 (I have been physically violent to others) and 22 (I have threatened or intimidated another person). These items were judged not to be relevant to a preference-based measure, as they expressed external behaviour and not people’s perceptions of their HRQoL. Moreover, both items had very low correlation with the total CORE-OM score and demonstrated low responsiveness to treatment. Item 34 was also excluded as its wording was judged to be ambiguous.

Successive Rasch analyses led to the exclusion of items 3, 8, 9, 19, 23, 24, 27, and 31, which persistently (in the initial and all consecutive analyses) misfitted the Rasch model, after also considering the results of conventional psychometric tests. For example, item 19 had relatively low responsiveness, low correlation with the initial instrument, and the highest percentage of missing data. Items 3, 8, and 31 had low correlation with CORE-OM. Item 9 had relatively low responsiveness. Items 14 and 29 were excluded because they demonstrated persistently significant DIF. Items 5, 18, 28, and 30, although they fit into the Rasch model in the initial analysis, demonstrated high fit residuals (beyond ± 2.5) at later stages and were eventually excluded from further consideration. On the other hand, items 1 and 17, which showed misfit to the model at initial stages of analysis, appeared to fit into the model at later stages, following deletion of other items, and were thus retained in the analysis.

Item 8 (I have been troubled by aches, pains, physical problems) was excluded in early stages of analysis due to misfit, which was expected, since the item expressed physical symptoms and therefore clearly belonged to a different dimension from items measuring, in their majority, emotional symptoms. Nevertheless, physical symptoms were judged to constitute an important dimension in its own right that should be captured by the final PBM; hence, although item 8 was excluded from Rasch analysis, it was decided to combine it with the final (unidimensional) product of Rasch analysis, thus creating a 2-dimensional measure tapping emotional and physical symptoms.

The 17 items of CORE-OM that fit into the Rasch model and the respective Rasch statistics are presented in Table 4. The 17-item scale had a good fit (total χ² probability 0.275) with an excellent ability to discriminate amongst different groups of respondents (person-separation index 0.898).

Table 4 Results of Rasch analysis with the 17 items of CORE-OM fitting into the Rasch model

Selecting items for the emotional component of CORE-6D

The purpose of this stage of analysis was to further remove items so as to derive a concise unidimensional measure that would be manageable in a valuation exercise but at the same time would capture major conceptual domains of CORE-OM.

All items belonging to the conceptual domains ‘symptoms—depression’, ‘symptoms—physical’, and ‘risk/harm to others’ had already been excluded at a previous stage. Expert judgement concluded that the conceptual domains ‘symptoms—anxiety’ (represented by items 2, 11, 15, 20 in the 17-item scale), ‘functioning—general’ (items 7, 12, 21, 32 in the 17-item scale), ‘functioning—close relationships’ (items 1, 26 in 17-item scale), ‘functioning—social relationships’ (items 10, 25, 33 in the 17-item scale) and ‘risk/harm to self’ (item 16 in the 17-item scale) reflected major domains in people with common mental health problems and should be represented in the final construct. The ‘subjective well-being’ domain (items 4 and 17 in the 17-item scale) covered the overall perception of person’s HRQoL rather than distinct symptoms/problems of the study population. Indeed, this domain had been previously found to highly correlate with items in the overall ‘problems’ domain [20]. Regarding the ‘symptoms—trauma’ domain (item 13 in the 17-item scale), this was considered less relevant for this HRQoL measure. More importantly, attempts to include items of these last two domains (‘subjective well-being’ and ‘symptoms—trauma’) in the final measure resulted in a scale not satisfying the criterion of ‘smooth’ transition of the response thresholds from milder to more severe health states, set in step 2. Consequently, it was decided to exclude these domains from the final measure.

Items were excluded one at a time and Rasch statistics as well as the person separation index were constantly checked. Finally, various combinations of 5 items (of those included in the 17-item scale), one from each of the 5 CORE-OM conceptual domains considered for the emotional component of the new measure, were tested for their fit into the Rasch model, in order to construct a final scale that would meet the set criteria for this step.

Testing of various item combinations resulted in a measure consisting of 5 items (1, 15, 16, 21, 33), each with 3 levels of response, common to all items (‘not at all’, ‘only occasionally or sometimes’, and ‘often, most or all the time’). The 5 items belonged to 5 major CORE-OM conceptual domains, respectively. The scale demonstrated good model fit (χ² probability 0.69). All items fit into the model, as shown in Table 5; no DIF was observed. The person-separation index reached 0.659. Respective threshold locations increased with increasing item difficulty, as shown in the item threshold map (Fig. 2). The item map demonstrates that the instrument is well targeted to the study population as it is able to capture the whole range of severity of mental symptoms, with minimal floor or ceiling effects and good spread of items across the full range of respondents’ scores (Fig. 3).

Table 5 Rasch statistics of the emotional component of CORE-6D
Fig. 2 Item threshold map of the emotional component of the CORE-6D illustrating the plausible health states obtained by Rasch analysis

Fig. 3 Item map of the emotional component of the CORE-6D

According to the results of the post hoc test proposed by Smith [31], the proportion of independent t-tests that were significant at the 0.05 level was 1.34% (well below 5%), thus confirming the unidimensionality of the emotional component of CORE-6D.

Deriving plausible health states from the emotional component of CORE-6D for utility measurement

Derivation of plausible health states was based on the item threshold map (Fig. 2). The map illustrates the most likely combinations of item responses expected to be obtained by the study population at various levels (locations) of symptom severity. Items have been ordered from the easiest (item 1) to the most difficult one (item 16), as indicated by their average location in the Rasch model. Shaded areas 0 (black), 1 (dark grey) and 2 (light grey) correspond to the 3 response levels ‘not at all’, ‘only occasionally or sometimes’, and ‘often, most or all the time’, respectively, with the exception of item 21, which is positively worded and therefore response levels are reversed. Threshold locations between response levels 0–1 and 1–2 increase (that is, they move from the left to the right) with increasing difficulty of the item, thus ensuring a smooth transition of responses from milder to more severe symptoms. The item threshold map allows prediction of the most likely responses at various levels of severity. For example, a person whose symptom severity corresponds to location +1 on the logit scale is expected to most likely respond 22210 (to items 1, 15, 33, 21 and 16, respectively).
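The way in which the item threshold map yields a finite set of states can be sketched as follows. When an item's thresholds are ordered, its most likely response at a given location equals the number of its thresholds lying below that location, so each threshold crossed along the continuum changes exactly one item response. The threshold values below are hypothetical (the actual values are those plotted in Fig. 2), and all items are coded so that a higher level means greater severity.

```python
def states_along_continuum(item_thresholds):
    """List the distinct most likely states from mild to severe.

    item_thresholds: one ordered list of threshold locations (logits) per item.
    """
    cuts = sorted(tau for thresholds in item_thresholds for tau in thresholds)
    # Probe one location below all thresholds, one between each adjacent pair,
    # and one above all thresholds.
    probes = [cuts[0] - 1.0]
    probes += [(lo + hi) / 2 for lo, hi in zip(cuts, cuts[1:])]
    probes.append(cuts[-1] + 1.0)
    states = []
    for theta in probes:
        state = "".join(
            str(sum(tau < theta for tau in thresholds))
            for thresholds in item_thresholds
        )
        if not states or states[-1] != state:
            states.append(state)
    return states

# Hypothetical thresholds for 5 items with 3 response levels each,
# ordered from the easiest to the most difficult item
hypothetical = [[-2.0, 0.0], [-1.5, 0.6], [-1.0, 1.2], [-0.5, 1.8], [0.3, 2.5]]
states = states_along_continuum(hypothetical)
print(len(states), states[0], states[-1])
```

With 5 items and 2 thresholds per item there are 10 thresholds, which partition the continuum into 11 intervals and hence 11 states, provided that all threshold locations are distinct.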

Each combination of item responses represents a plausible health state, likely to be observed in people with common mental health problems. As illustrated in Table 6, 11 distinct health states can be identified. These states covered 37% of complete responses in [N400a]. In contrast, the coverage of health states derived using an orthogonal block design on the full range of 3^5 = 243 potential health states of the emotional component of CORE-6D was only 7%. Moreover, some of the states generated using the latter approach were not credible, as, for example, they described a situation where a person ‘never felt alone and isolated’ and at the same time ‘made plans to end their life often, most or all the time’.
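Coverage, as used here, is the share of complete response patterns in the sample that coincide with one of the selected health states. A minimal sketch of this calculation is given below; the response patterns and the selected states are hypothetical and serve only to show the arithmetic.

```python
from itertools import product

def coverage(responses, states):
    """Share of complete response patterns that match one of the given states."""
    # A pattern with any missing item response (None) is not a complete response.
    complete = [r for r in responses if None not in r]
    covered = sum(1 for r in complete if r in states)
    return covered / len(complete)

# Full set of potential states for 5 items with 3 levels each
all_states = set(product(range(3), repeat=5))
print(len(all_states))  # 243

# Hypothetical respondents and a hypothetical set of selected states
responses = [(0, 0, 0, 0, 0), (2, 2, 2, 1, 0), (0, 2, 0, 2, 0), (1, None, 0, 0, 0)]
selected = {(0, 0, 0, 0, 0), (2, 2, 2, 1, 0)}
print(coverage(responses, selected))
```

The same function can be applied to any candidate set of states, which allows a like-for-like comparison between states read off the item threshold map and states produced by a statistical design.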

Table 6 Health states of the emotional component of CORE-6D as identified by the item threshold map

Validation of the emotional component of CORE-6D

The emotional component of CORE-6D was validated on the random sample [N400b]: the scale had satisfactory overall and item fit statistics and no DIF was observed. The post hoc unidimensionality test verified the scale’s unidimensionality in this sample, too, and the item threshold map indicated the same most likely item response combinations (reflecting plausible health states) as those demonstrated by the analyses on sample [N400a].

Discussion and conclusions

This paper proposes a methodology that uses mainly Rasch analysis to develop plausible health states from existing CSMs that have no clear multidimensional structure; in such cases, conventional approaches for generating states from health state classifications (e.g. orthogonal block designs) are not appropriate, as, by treating items as independent (uncorrelated) statements, they are likely to result in formation of implausible health states. In contrast, the proposed ‘Rasch vignette approach’ helps create credible health states comprising combinations of item responses observed in a real population. Indeed, the health states developed with this method represent not only plausible, but also the most likely combinations of responses over a continuum of symptom severity, thus allowing prediction of a person’s severity of symptoms based on his/her responses and vice versa. On the other hand, in their clustering-based approach, Sugar et al. [17] combined the most frequent individual item responses within each cluster in order to develop health state descriptions. However, the resulting item response combinations were not necessarily the most frequently observed in the study sample; moreover, they might not have been observed at all in the sample.

One limitation of our approach, similar to the methodology proposed by Sugar et al. [17], is that the number of generated health states is limited and does not capture the whole range of plausible combinations of responses. In the case of the emotional component of CORE-6D, the Rasch vignette approach generated 11 health states, which, nevertheless, covered 37% of the study sample’s complete responses; on the other hand, use of an orthogonal block design, which assumes that items are independent statements, achieved a much lower coverage of 7%, and, more importantly, generated a number of implausible health states.

Despite generating a limited number of health states, application of our approach allows the valuation of all potential health states described by CORE-6D: an advantage of Rasch analysis over the clustering-based approach is that it assigns all potential health states (i.e. all combinations of item responses including those not illustrated in item threshold maps) to different locations along the scale according to their level of severity. The relationship between the health states’ location across the latent variable and the respective utility values obtained in a valuation exercise can be estimated and used to generate utility values for all patients completing CORE-OM. This solution has been explored, using regression techniques, in a subsequent application of this approach on the Flushing questionnaire [35]. The findings of this latter study show that it is possible to assign appropriate utility values to all potential health states of a measure based on their location along the latent variable as estimated by Rasch analysis.
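The idea of relating utility values to Rasch locations can be illustrated with a simple least-squares line. This is a sketch under stated assumptions: the functional form used in the subsequent study is not described here, so a straight line is assumed, and the locations and utility values below are hypothetical.

```python
def fit_line(locations, utilities):
    """Ordinary least-squares fit of utility = intercept + slope * location."""
    n = len(locations)
    mean_x = sum(locations) / n
    mean_y = sum(utilities) / n
    sxx = sum((x - mean_x) ** 2 for x in locations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(locations, utilities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_utility(location, intercept, slope):
    """Predicted utility for a health state at the given Rasch location."""
    return intercept + slope * location

# Hypothetical valued states: Rasch location (logits) and elicited utility
locations = [-2.0, -1.0, 0.0, 1.0, 2.0]
utilities = [0.9, 0.8, 0.7, 0.6, 0.5]
intercept, slope = fit_line(locations, utilities)
print(round(predict_utility(0.5, intercept, slope), 3))
```

Once fitted on the valued states, such a model can return a utility value for any response combination whose location has been estimated by Rasch analysis, including combinations that were not valued directly.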

The emotional component of CORE-6D comprises a unidimensional 5-item scale, able to capture the full range of severity of emotional symptoms in people with common mental health problems. The person-separation index of this scale was approximately 0.66, which is somewhat lower than the 0.70 value that is generally considered acceptable for group comparison [36]. Nevertheless, the 0.66 figure was deemed adequate for our purpose, which was the development of a PBM, considering that the ability of the scale to discriminate amongst different respondent groups needed to be traded off with its conciseness and convenience in a valuation survey, where respondents need to process a combination of individual statements rather than a summated scale score.

The proposed Rasch vignette approach has led to identification of 11 plausible health states. These states, combined with 3 response levels (same as those of the 5 ‘emotional’ items) of item 8 of the original CORE-OM (I have been troubled by aches, pains, or physical problems), produce a 2-dimensional set of 11 × 3 = 33 plausible health states that can be used to value the overall emotional and physical HRQoL in people with common mental health problems. The next step of this study, recently completed, was to undertake a valuation survey in a representative sample of the UK population, in order to attach appropriate utility values to all health states of CORE-6D and thus convert it into a preference-based index. This new condition-specific PBM is suitable to use in the area of mental health, where the use of generic PBMs such as EQ-5D has been shown to be problematic [27, 37, 38]. Since this measure has been derived from CORE-OM, an instrument routinely used for outcome monitoring in people with common mental health problems in the United Kingdom, it is expected that this study will enable wider assessment of healthcare interventions for the management of common mental health problems in the form of cost-utility analysis.