Introduction

Health-related quality of life (HRQL) has grown in importance as an essential outcome for patient-centered research [1]. Advances in medical research have prolonged survival for people with chronic diseases, making the patient’s experience vital to the assessment of therapeutic effectiveness. Arguably, effective therapies not only alleviate the patient’s signs or symptoms but also make a significant difference to their HRQL.

According to the ISOQOL Dictionary of Quality of Life and Health Outcomes Measurement, “HRQL is a measure of the value assigned to duration of life as modified by impairments, functional states, perceptions and opportunities, as influenced by disease, injury, treatment and policy [2].”

Approaches to measuring HRQL can be broadly grouped into two categories: (a) traditional measures with predetermined domains and (b) individualized measures with domains selected by the patient in real time. Both approaches have advantages and disadvantages; therefore, the approach taken to assess HRQL may vary according to the aim of the measurement.

Traditional HRQL measures, with a standardized set of questions, are convenient tools for group comparisons. These measures are also useful for economic evaluation of new or equally effective health care interventions. However, traditional measures may not represent all health domains valued by each individual patient [3,4,5,6]. Some researchers have expressed concern about the lack of patient-centeredness of traditional HRQL measures [7]. The predetermined questions on traditional measures may not be relevant to particular patients at different stages of their disease [6,7,8]. Moreover, what is important to one patient may not have similar value for another [9]. Personalized approaches to treatment and variation in patient characteristics such as age, gender, disease severity, and other environmental and genetic factors may also amplify differences in the effects that a particular therapy produces [9, 10]. To avoid the complexity of heterogeneous treatment effects [11], individual patient data are thought to be better captured by generic individualized outcome instruments that allow each patient to determine and measure what is important to him or her during a clinical consultation.

Like any other measurement approach, individualized measures have limitations. They cannot be used for economic evaluation, and their scores cannot be compared between individual patients or groups of patients. This lack of comparability has long been a subject of debate, and it remains unclear whether psychometric criteria that examine the cross-sectional comparability of scores on patient-reported outcome measures (such as structural validity and internal consistency) are applicable to individualized tools. These psychometric criteria are therefore not discussed in this review.

In the ISOQOL Dictionary, individualized measures are defined as “measures that allow patients to identify domains (or areas of life) that are important to them, and then to assign a weight on the relative importance of each one [2].” The Measure Yourself Medical Outcome Profile (MYMOP) is an individualized measure that allows patients to nominate and score the two most important aspects of their lives (in order of importance) that contribute most to their overall quality of life, but it does not ask respondents to weight their nominated domains. Presumably, the wording of the questionnaire leads patients to name their two most important aspects in order of importance, so weighting is implicit rather than formalized.

Examples of commonly used individualized measures include the Schedule for the Evaluation of Individual Quality of Life-Direct Weighting (SEIQoL-DW) [12,13,14,15,16] and the Patient Generated Index (PGI) [12, 17,18,19]. Critical analyses of the properties of SEIQoL and PGI have been reported in the literature both as standalone measures [13, 17] and in the context of a number of health conditions [12, 18]. Paterson et al.’s MYMOP [20] furthered the concept of individualized measures. MYMOP has been invaluable because it is patient-centered and patient-completed; in this way, it differs greatly from clinical practice, in which the clinician’s treatment goals may drive the questions asked of patients. Although MYMOP has been in use since 1996 [20], no critical review of its measurement properties has been performed to date. The purpose of this paper is to critically appraise the measurement properties of MYMOP and its adaptations.

Methods

Search strategies

A Scopus search of article titles, abstracts, and keywords was conducted up to April 2017 using the names of MYMOP and its adaptations. The names were identified from the MYMOP website and through personal communication with the instruments’ developers. The search terms were MYMOP, MYCaW, PSYCHLOPS, ‘MYMOP-pictorial,’ and MYMOP-P; only English-language adaptations were included in the review. To identify additional publications, we also scanned the reference lists of the included articles and the publication lists on each instrument’s primary website.

Finally, abstracts were screened to identify studies that conducted a formal psychometric evaluation or collected qualitative evidence to validate the instruments of interest.

Quality assessment

For each identified measure, we evaluated the reported measurement properties using the COSMIN checklist for systematic reviews of patient-reported outcome measures [21,22,23] and the COSMIN taxonomy [24]. Measurement properties fall into three domains: reliability, validity, and responsiveness [24,25,26,27,28,29]. Reliability is further subdivided into internal consistency, reliability, and measurement error; validity is subdivided into content validity, construct validity, and criterion validity. Each measurement property received an overall rating of “positive” (+), “indeterminate” (?), “negative” (−), or “no information available” (0) (Table 4).

Results

The Scopus search yielded 111 unique studies, and an additional 28 studies were identified from the questionnaires’ websites. After screening titles, abstracts, and keywords, we retrieved 34 articles in full text. We finally included 16 studies evaluating four questionnaires (MYMOP and three adaptations: MYCaW, PSYCHLOPS, and MYMOP-P) [20, 30,31,32,33,34,35,36,37,38,39,40,41,42,43]. The adaptations were developed to evaluate therapies in cancer [42, 43], psychiatry [31,32,33,34, 37], and acupuncture [35, 36]. Table 1 presents the general characteristics of these studies. Notably, 10 of the 16 studies applied the measures to evaluate the effectiveness of complementary therapies.

Table 1 Study characteristics

Measure yourself medical outcome profile (MYMOP)

MYMOP is a problem-specific, individualized measure that was developed in a primary care setting (Table 2) [20]. Each patient is asked to report the two symptoms that have bothered them most over the previous week, one activity limited by the reported symptoms, and their general wellbeing. After an initial pilot study, a brief medication questionnaire was added [41]. However, the medication questions are not scored and thus do not contribute to the final MYMOP score [44]. The overall score is calculated as the average of the item scores and is interpreted alongside the individual item scores. For a meaningful comparison, the chosen items must remain unchanged between the first and subsequent completions of the questionnaire.
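To make the scoring rule concrete, the following is a minimal Python sketch rather than the official MYMOP scoring procedure. It assumes each scored item is rated from 0 (as good as it could be) to 6 (as bad as it could be), that items left blank are simply excluded from the average, and it uses hypothetical item keys (`symptom_1`, `symptom_2`, `activity`, `wellbeing`) and made-up scores. Medication questions are deliberately left out because they are not scored.

```python
from statistics import mean

# Hypothetical keys for the scored MYMOP items; medication questions are not
# scored, so they are not listed here.
SCORED_ITEMS = ("symptom_1", "symptom_2", "activity", "wellbeing")
NOMINATED_ITEMS = ("symptom_1", "symptom_2", "activity")


def profile_score(item_scores: dict) -> float:
    """Average of the completed scored items (assumed 0-6 scale, higher = worse)."""
    values = [item_scores[k] for k in SCORED_ITEMS if item_scores.get(k) is not None]
    if not values:
        raise ValueError("no scored items were completed")
    if any(not 0 <= v <= 6 for v in values):
        raise ValueError("item score outside the assumed 0-6 range")
    return float(mean(values))


def change_score(baseline: dict, follow_up: dict) -> float:
    """Follow-up minus baseline profile score; a negative value means improvement.

    Scores are only comparable if the patient-nominated items are unchanged.
    """
    for key in NOMINATED_ITEMS:
        if baseline["labels"].get(key) != follow_up["labels"].get(key):
            raise ValueError(f"nominated item {key!r} changed between completions")
    return profile_score(follow_up["scores"]) - profile_score(baseline["scores"])


# Illustrative (made-up) first and follow-up completions for one patient.
labels = {"symptom_1": "back pain", "symptom_2": "poor sleep", "activity": "gardening"}
baseline = {"labels": labels,
            "scores": {"symptom_1": 5, "symptom_2": 4, "activity": 5, "wellbeing": 4}}
follow_up = {"labels": dict(labels),
             "scores": {"symptom_1": 3, "symptom_2": 3, "activity": 2, "wellbeing": 4}}
print(change_score(baseline, follow_up))  # -1.5
```

Raising an error when a nominated item changes mirrors the requirement that the same items be re-scored at each completion.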

Table 2 Description of included measures

Quality assessment of MYMOP

We did not identify any studies evaluating measurement error, floor or ceiling effects, or interpretability of MYMOP. Three studies assessed content validity (Table 3). The first gives a clear description of the measurement aim and the target population [20]. The second study [39] gathered patients’ views about MYMOP’s ability to measure outcomes that are important to them by comparing the qualitative interview data of 20 interviewees with their corresponding quantitative MYMOP scores. Incorporating participants’ and practitioners’ views resulted in the development of the current version, “MYMOP2.” The third study exploring content validity [40] involved interviewing 23 new patients of eight acupuncturists, using two qualitative methods: focus groups and cognitive interviews. The issues identified with MYMOP2 were a floor effect, the inability to measure episodic symptoms, and inaccurate measurement of medication change. MYMOP2 was not revised based on the study results [40].

Table 3 Summary of the assessment of measurement properties (based on COSMIN Criteria [21, 22])

Construct validity was assessed in two studies by examining the correlation between “perceived change in condition” and MYMOP scores [20, 41]. Both studies confirmed that MYMOP scores correlated with perceived change in condition. Similar results were observed for the correlation between physician-assessed clinical outcomes and MYMOP scores [41]. MYMOP scores of individuals with acute conditions and those with chronic conditions were also compared; it was hypothesized that changes in MYMOP scores would correlate better with change in acute conditions (< 4 weeks) than in chronic conditions (> 4 weeks), and this hypothesis was confirmed [20]. In addition, the expected correlations between MYMOP and SF-36 scores were reported [20].

Responsiveness of MYMOP was determined from the gradient of score change across repeated administrations, relative to change perceived by clinicians [20] and by patients [20, 41]. The standardized response mean and an index of responsiveness were also reported [20, 41]. A t test was used to compare the scores of patients who described themselves as “a little better” with those who described themselves as “about the same” [41], and gradient changes in scores at two and four weeks were determined [20]. The authors administered the SF-36, MOS-6A, and EQ-5D concurrently to the study population but did not report correlation coefficients between change scores [20, 41].
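The two distribution-based indices mentioned here can be computed from paired baseline and follow-up scores as sketched below. This is an illustrative Python sketch under stated assumptions, not the calculation used in [20] or [41]: it defines improvement as baseline minus follow-up (since higher MYMOP scores indicate worse health), uses sample standard deviations, and divides by the SD of baseline scores for the effect size and by the SD of change scores for the standardized response mean. The input data are made up.

```python
from statistics import mean, stdev


def responsiveness_indices(baseline: list, follow_up: list) -> dict:
    """Effect size and standardized response mean for paired scores.

    Improvement is baseline minus follow-up, assuming higher scores mean worse health.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    improvement = [b - f for b, f in zip(baseline, follow_up)]
    mean_change = mean(improvement)
    return {
        # Mean change relative to the spread of scores at baseline.
        "effect_size": mean_change / stdev(baseline),
        # Mean change relative to the spread of the change scores themselves.
        "standardized_response_mean": mean_change / stdev(improvement),
    }


# Made-up MYMOP profile scores for four patients.
indices = responsiveness_indices([4.5, 4.0, 5.0, 3.5], [3.0, 3.5, 3.0, 3.0])
print(indices["standardized_response_mean"])  # 1.5
```

The effect size depends on how heterogeneous patients are at baseline, whereas the standardized response mean depends on how consistently they change, which is why the two indices can diverge for the same data.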

Measure yourself concerns and wellbeing (MYCaW)

MYCaW [42, 43, 45] was adapted from MYMOP to evaluate outcomes in cancer patients undergoing integrative treatments (Table 2). Like MYMOP, it allows patients to define and measure their two most important concerns and their general wellbeing on a seven-point ordinal scale, where a higher score signifies poorer health [46]. MYCaW also includes pictorial faces, and wording is added at each end of the seven-point scale: “not bothering me at all = 0” and “bothers me greatly = 6” [46]. There are two versions, a self-administered questionnaire and a face-to-face interview, and each version has initial and follow-up forms. The questionnaire consists of three scored domains, two of which are individualized. The follow-up form includes two open-ended questions: “other things affecting your health” and “reflecting on your time with (service name) what were the most important aspects for you? [42].” MYCaW thus provides both quantitative data (mean change in score and SD) and qualitative data.

Quality assessment of MYCaW

Adaptation and validation of MYCaW started in 2002 [45] (Table 3). For content validation, the initial draft was discussed with experts and patient representatives, resulting in revisions to the layout and wording of the instrument [42, 45]. A later study defined minimal important change thresholds for interpreting scores: changes of 0.5, 1, and 1.5 points were classed as minimal, moderate, and large, respectively [38].
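The interpretation thresholds can be applied to an individual change score as in the Python sketch below. How [38] handled boundary values and deterioration is not described here, so treating each threshold as an inclusive lower bound, applying the same thresholds to worsening, and computing change as baseline minus follow-up (higher MYCaW scores mean poorer health) are all assumptions of this sketch.

```python
# Thresholds for MYCaW change scores as described in the text [38]; checked from
# largest to smallest, each treated (by assumption) as an inclusive lower bound.
THRESHOLDS = ((1.5, "large"), (1.0, "moderate"), (0.5, "minimal"))


def classify_change(baseline: float, follow_up: float) -> str:
    """Label the size and direction of change in a MYCaW score."""
    # Positive = better, because higher scores mean poorer health. Rounding
    # removes floating-point noise so boundary values compare as intended.
    improvement = round(baseline - follow_up, 6)
    for cutoff, label in THRESHOLDS:
        if abs(improvement) >= cutoff:
            direction = "improvement" if improvement > 0 else "deterioration"
            return f"{label} {direction}"
    return "below the minimal threshold"


print(classify_change(4.0, 2.8))   # moderate improvement
print(classify_change(2.0, 3.5))   # large deterioration
print(classify_change(3.0, 3.25))  # below the minimal threshold
```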

Construct validity of MYCaW was evaluated by testing an a priori hypothesized negative correlation of magnitude greater than 0.3 (r < −0.3) with the Functional Assessment of Chronic Illness Therapy questionnaire, spiritual subscale (FACIT-SpEx) [47]. The FACIT-SpEx is an expanded version of the FACIT questionnaire: in addition to physical, social/family, emotional, and functional wellbeing, it includes questions on spiritual wellbeing related to cancer therapy. The reported correlation of r = −0.57 confirmed the hypothesis [47].
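Stating the hypothesis before looking at the data makes the check mechanical, as in the Python sketch below. It computes a Pearson correlation and tests the hypothesis described above (a negative correlation stronger than −0.3); the choice of Pearson's r, the function names, and the paired scores are illustrative assumptions, not details taken from [47].

```python
from math import sqrt
from statistics import mean


def pearson_r(x: list, y: list) -> float:
    """Pearson product-moment correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def hypothesis_supported(r: float, expected_sign: int = -1, min_magnitude: float = 0.3) -> bool:
    """True if r has the hypothesized sign and exceeds the hypothesized magnitude."""
    return r * expected_sign > 0 and abs(r) > min_magnitude


# Made-up paired scores: MYCaW (higher = worse) and FACIT-SpEx (assumed higher = better).
mycaw = [5, 4, 3, 2, 1]
facit = [60, 70, 65, 85, 90]
r = pearson_r(mycaw, facit)
print(round(r, 2), hypothesis_supported(r))  # -0.92 True
```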

The responsiveness indices reported were the standardized response mean and effect size for baseline and 6-week MYCaW and FACIT-SpEx scores [47]. Guyatt’s responsiveness index for MYCaW concern 1, concern 2, wellbeing, and the overall profile was reported for patients grouped into five predefined categories of change on the FACIT-SpEx scale: ‘substantial improvement,’ ‘clinically relevant improvement,’ ‘stable,’ ‘clinically relevant deterioration,’ and ‘substantial deterioration.’ MYCaW scores were consistent with these categories except for the ‘stable’ group. The ‘clinically relevant deterioration’ category had too few participants to analyze.
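Guyatt's responsiveness index divides the mean change among patients whom an external anchor classifies as changed by the standard deviation of change among patients classified as stable. The Python sketch below applies that definition to the five FACIT-SpEx anchor categories; the minimum group size used to flag categories with too few participants, the sign convention (change = baseline minus follow-up, so positive means improvement on MYCaW), and the data are hypothetical.

```python
from statistics import mean, stdev


def guyatt_indices(changes: dict, stable_key: str = "stable", min_group_size: int = 2) -> dict:
    """Guyatt's responsiveness index for each anchor category.

    changes maps an anchor category to MYCaW change scores (baseline minus
    follow-up, so positive values mean improvement). Categories smaller than
    min_group_size (a hypothetical cut-off) map to None.
    """
    stable = changes.get(stable_key, [])
    if len(stable) < 2:
        raise ValueError("need at least two stable patients to estimate the SD of change")
    sd_stable = stdev(stable)
    if sd_stable == 0:
        raise ValueError("SD of change in the stable group is zero")
    indices = {}
    for category, values in changes.items():
        if category == stable_key:
            continue
        indices[category] = mean(values) / sd_stable if len(values) >= min_group_size else None
    return indices


# Made-up change scores grouped by the five predefined FACIT-SpEx categories.
changes = {
    "substantial improvement": [2.0, 1.5, 2.5],
    "clinically relevant improvement": [1.0, 0.5, 1.5],
    "stable": [0.5, -0.5, 0.0, 0.0],
    "clinically relevant deterioration": [-1.0],
    "substantial deterioration": [-2.0, -1.5],
}
for category, index in guyatt_indices(changes).items():
    print(category, None if index is None else round(index, 2))
```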

One advantage of MYCaW is its ability to capture a range of qualitative information at the individual level [42]. Substantial efforts have been made to provide a framework for analyzing the rich qualitative information gathered by the questionnaire [43, 48]. Three MYCaW questions were analyzed qualitatively: (i) “concerns and problems” on the first form, and (ii) “other things affecting your health” and (iii) “what has been most important for you?” on the follow-up form. Samples of 782, 407, and 588 patients responded to these three questions, respectively. Their responses were organized into categories, and a qualitative analysis guideline for MYCaW was developed; a focus group of five women validated the categories for appropriateness and acceptability. Four of the women had cancer, and one was a caregiver of a cancer patient. Later, to assess its generalizability, the coding framework was reviewed by mapping data from Penny Brohn Cancer Care (UK) and the Ottawa Integrative Cancer Clinic (Canada). As a result, new categories were identified under ‘physical concern,’ ‘hospital cancer treatment concerns,’ ‘concerns about wellbeing,’ and ‘practical concerns.’

Psychological outcome profiles (PSYCHLOPS)

PSYCHLOPS is an individualized mental health outcome measure [30]. Like MYMOP, PSYCHLOPS scores issues that are unique to the individual (Table 2). It is a one-page questionnaire [49] with three domains: problems, function, and wellbeing. The questionnaire has three versions: pre-therapy, during-therapy, and post-therapy. Four questions are common to all versions. The first two ask patients to identify and score their most bothersome problems, the third identifies and scores one function limited by the identified problem(s), and the fourth asks about general wellbeing over the last week. A fifth question in the during-therapy version identifies any new problem that arises during therapy. A sixth question in the post-therapy version asks patients to rate how they feel compared with the start of therapy. Not every question is scored: the Problems, Functioning, and Wellbeing questions use six-point (0–5) scales, where higher scores signify worse outcomes. The individually identified items from the initial form are transferred to the subsequent versions for the patient to re-score, which yields change scores from pre- to post-therapy [49].
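The pre-/post-therapy comparison can be sketched in Python as follows. The text specifies only the 0–5 scales and the re-scoring of individually identified items; summing the two problem scores, the function score, and the wellbeing score into a total, and requiring all four to be present, are assumptions of this sketch, and the item keys and scores are hypothetical.

```python
# Hypothetical keys for the four scored PSYCHLOPS questions common to all versions.
SCORED_ITEMS = ("problem_1", "problem_2", "function", "wellbeing")


def psychlops_total(scores: dict) -> int:
    """Sum of the four scored items, each on a 0-5 scale (higher = worse)."""
    missing = [item for item in SCORED_ITEMS if item not in scores]
    if missing:
        raise ValueError(f"missing items: {missing}")
    if any(not 0 <= scores[item] <= 5 for item in SCORED_ITEMS):
        raise ValueError("item score outside the 0-5 range")
    return sum(scores[item] for item in SCORED_ITEMS)


def pre_post_change(pre: dict, post: dict) -> int:
    """Pre-therapy minus post-therapy total; positive values mean improvement."""
    return psychlops_total(pre) - psychlops_total(post)


# Made-up scores for one client; the same problems are re-scored after therapy.
pre = {"problem_1": 4, "problem_2": 3, "function": 4, "wellbeing": 5}
post = {"problem_1": 2, "problem_2": 2, "function": 1, "wellbeing": 3}
print(psychlops_total(pre), psychlops_total(post), pre_post_change(pre, post))  # 16 8 8
```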

Quality assessment of PSYCHLOPS

A group of clinical psychologists, counseling psychologists, psychotherapists, counselors, general practitioners, and academic mental health researchers began adapting PSYCHLOPS in 2004 (Table 3) [30].

Content validity was assessed by consulting patient representatives and three expert groups. The initial draft was piloted with 30 patients and revised as required [30]. In 2005 (Table 2), Ashworth et al. gathered information from experts about the feasibility, validity, and usefulness of PSYCHLOPS [33]. Internal consistency was determined using Cronbach’s alpha, and the values were within the acceptable range [31, 34, 37].
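For reference, Cronbach's alpha compares the sum of the item variances with the variance of the total score, using the standard formula alpha = k/(k − 1) × (1 − Σ item variances / total variance). A minimal Python version is sketched below; the data layout (one row per respondent, one column per item), the use of sample variances, and the example responses are assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(rows: list) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2 or any(len(row) != k for row in rows):
        raise ValueError("need a rectangular matrix with at least two items and two respondents")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up responses from four people to three items.
print(round(cronbach_alpha([[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 5, 5]]), 3))
```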

In terms of construct validity, PSYCHLOPS showed moderate to strong correlations with the Clinical Outcomes in Routine Evaluation-Outcome Measure (CORE-OM) [34] and the Hospital Anxiety and Depression Scale [22]. Responsiveness was defined as “sensitivity to change” and was measured by effect size [31, 34]. Interpretability was assessed using the mean and SD of pre- and post-therapy scores [31, 34]. Test–retest reliability, reported as intraclass correlation coefficients (ICCs) between baseline and retest, was 0.70, 0.68, 0.69, and 0.79 for the problems domain, the activity that was hard to do, wellbeing, and the overall score, respectively [37]. The participants in the reliability assessment were healthy individuals who remained stable during the interim period.
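The ICC form used in [37] is not stated here. As one common choice for test–retest designs, the Python sketch below computes the two-way random-effects, absolute-agreement, single-measurement ICC (ICC(2,1) in Shrout and Fleiss's notation) from a subjects-by-occasions table; the choice of this form and the example scores are assumptions.

```python
from statistics import mean


def icc_2_1(scores: list) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores has one row per subject and one column per occasion (e.g. test, retest).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with at least two subjects and two occasions")
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    # Partition the total sum of squares into subjects, occasions, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# Made-up PSYCHLOPS totals for five stable participants at test and retest.
print(round(icc_2_1([[16, 15], [10, 11], [12, 12], [6, 8], [18, 17]]), 2))
```

Absolute agreement treats a systematic shift between test and retest as error, which a consistency-type ICC would ignore.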

In 2007, Ashworth analyzed whether the preset items of CORE-OM captured the individualized PSYCHLOPS responses [32]. The 611 individual PSYCHLOPS responses were categorized into 8 themes and 61 sub-themes. Of the 61 sub-themes, 27 (44%) were not mapped to preset CORE-OM questions, and 128 of 215 clients (60%) reported at least one response that could not be mapped to CORE-OM.

MYMOP-pictorial (MYMOP-P)

MYMOP-P was developed to assess outcomes in elderly patients (Table 2) [35, 36]. During the study [36], the author found that patients described as “elderly,” “having low confidence in completing forms,” “low literacy,” or “mother tongue not English” were unable to complete MYMOP2 properly, and MYMOP-P was developed to address this issue. The measure uses a six-point scale (0–5) ranging from “as good as it could be” to “as bad as it could be.” Each response option has a face corresponding to the patient’s current state, and patients choose one face to score each reported issue. The author did not describe the adaptation method further, so it is unclear whether any patient representatives were involved in the development process. To our knowledge, no formal evaluation of the instrument’s measurement properties has been reported to date.

Discussion

In this article, we reviewed the format, content, and evidence on measurement properties for MYMOP [20, 37,38,39] and its three adaptations [30,31,32,33,34,35,36,37,38,39,40,41,42,43, 45,46,47,48,49]. Of these measures, PSYCHLOPS was the most thoroughly evaluated [30,31,32,33,34, 37, 49] and therefore had the most evidence on its measurement properties, including test–retest and internal consistency reliability. To our knowledge, MYMOP-P [35, 36] has had the least formal evaluation of its measurement properties; the only evidence identified in this review concerned content validity.

Content validity was the most widely reported measurement property [20, 30, 33, 35, 36, 38,39,40, 45]. Of the four measures, three received a positive rating [20, 30, 33, 35, 45] and one (MYMOP-P) received an indeterminate rating [35, 36] for content validity. MYMOP-P was rated indeterminate because of the lack of information on how, and to what extent, the target population was involved in establishing the relevance of the questionnaire content. We attempted to contact the author for unpublished validity data more than three times without success. Construct validity was the second most commonly tested measurement property [20, 31, 34, 47]; it was reported for all measures except MYMOP-P. Evidence on construct validity was limited by the lack of reported a priori hypotheses regarding expected correlations. Current reporting standards for the assessment of construct validity [23, 24, 50] recommend that a priori hypotheses specify the strength and direction of the expected correlations. Given our results, future validation studies should develop and report such hypotheses when evaluating construct validity.

Criterion validity was reported for three measures in five studies [20, 31, 34, 41, 47]; however, we found that all claims of criterion validity actually supported construct validity under current definitions [24]. It is difficult to regard any instrument as a “gold standard” unless a short version of a questionnaire is tested against its long version [23, 24, 50]. Similar challenges in evaluating criterion and construct validity have been highlighted in the review of the PGI’s measurement properties [17]. We therefore evaluated these claims as we would evaluate construct validity; this approach did not affect the grading of the evidence. We recommend that future researchers avoid reporting such evaluations as criterion validity unless they involve testing a short version of a questionnaire against its long version (the gold standard); when no “gold standard” exists, criterion validity cannot be assessed. Likewise, assessment against the SF-36 may be considered an assessment of construct validity rather than criterion validity, since some would argue that the SF-36 is not a universally accepted “gold standard.”

Evidence on internal consistency reliability is not relevant to the included measures. Internal consistency reliability applies to questionnaires whose items form predetermined domains and is therefore not calculated for individualized measures [17].

Of the five studies reporting on responsiveness [20, 31, 34, 41, 47], two [31, 34] assessed responsiveness using effect sizes. We were unable to evaluate this evidence because the reported statistic did not meet the COSMIN and modified Terwee criteria for evaluating responsiveness; both studies [31, 34] were published before these criteria were developed. Given these more recent criteria, we recommend further evaluation of the responsiveness of the included measures. The lack of an external anchor, the absence of a priori hypotheses, and changes in patients’ priorities or concerns are common challenges that have also been identified in evaluations of the responsiveness of SEIQoL-DW [13] and PGI [17].

Another limitation of the included studies is the imprecise use of terminology to define measurement properties. This finding is not unique to these studies; Mokkink et al. [50, 51] reported a similar finding in a quality assessment of systematic reviews of measurement properties. Of note, international consensus on a taxonomy of measurement properties is a recent development in psychometrics [24].

Strengths and weaknesses of our approach

Critical appraisal is essential to evaluate medical research; it helps identify methodological strengths and limitations. Critical appraisal can be done using checklists or score-based scales. For our review, we considered appraisal tools such as the criteria of the Scientific Advisory Committee of the Medical Outcomes Trust (MOT) [52], Evaluating the Measurement of Patient-Reported Outcomes (EMPRO) [53], and the Terwee [22] and COSMIN criteria [21, 23]. The MOT criteria list items that instrument developers should consider to ensure optimal properties of their tool. However, MOT does not provide guidance on how the reported evidence should be classified if any of the listed items are absent. EMPRO has an integral scoring system whose weighting is neither clearly described nor explicitly justified with empirical data [53]. We used the COSMIN criteria because the COSMIN checklist was developed through a consensus-based Delphi study and has empirical evidence supporting its measurement properties [50, 51]. We preferred a checklist to a summary score because a summary score does not provide specific details on methodological strengths or limitations. The Cochrane Collaboration also prefers a checklist approach, based on empirical evidence that the summary scores of quality assessment tools can be problematic [54,55,56]. Accordingly, Cochrane has moved from a popular score-based quality assessment tool [57] to a descriptive checklist, the Risk of Bias tool [54].

Unlike in a systematic review, study inclusion, data abstraction, and quality assessment were not independently duplicated in this paper. We acknowledge that the lack of independent duplication can be a source of error in a review; however, for many outcomes, single data extraction does not change the effect estimates [58]. Moreover, to strengthen our critical appraisal, we chose objective checklist criteria to evaluate the quality of measurement properties, enhancing the reproducibility of our results. Although we included only studies published in English, Chinese and German translations of the tools were identified in the database search, suggesting that our search method was sensitive enough to identify relevant studies. In addition, the MYMOP and PSYCHLOPS websites provided contact information for 12 and 10 language translations, respectively. However, translations into other languages were not included in this review, as non-English questionnaires would not be applicable to English-speaking populations, which were our primary interest. Future research should evaluate the cross-cultural validity of other language translations before these tools are applied to their target populations.

Assessing HRQL offers the opportunity to improve physician-patient communication and achieve better outcomes [59, 60]. Given the multiple demands on the health care system and the time constraints faced by health care providers, individualized measures that are short, straightforward, and quick to administer may help integrate routine HRQL assessment into clinical settings. MYMOP and its adaptations offer a set of brief, easy-to-complete questionnaires that can measure variation in patient concerns regardless of diagnosis. MYMOP has been criticized for being symptom specific [61, 62]; however, the Patient-Reported Outcomes Measurement Information System (PROMIS) has recently encouraged the use of domain-specific rather than disease-specific measures [63]. PROMIS researchers state that experiences such as fatigue, headache, nausea, and sleep problems are less likely to be influenced by the mere presence or absence of a disease. MYMOP was developed primarily to overcome diagnostic differences between health care disciplines in a primary care setting. Because MYMOP and its adaptations are generic measures specific to patient-selected domains, they can be used to address variability in outcome measurement in clinical trials.

As seen in this review, MYMOP and its adaptations have been widely used to evaluate complementary therapies because they fit well with an individualized, patient-centered approach. Given the global initiatives advocating patient-centered research and outcomes [64,65,66,67] and a better understanding of the limited applicability of group-level clinical trial evidence to individual patients [11], MYMOP and its adaptations can help provide rigorous data from the patient’s perspective. Although sophisticated methods exist to deal with heterogeneity of treatment effects [10], such heterogeneity is often unavoidable and need not be seen as ‘undesirable’; robust generic individualized outcome measures such as MYMOP and its adaptations are therefore needed. Individualized outcome assessment tools such as MYMOP thus offer a way forward for personalized medicine approaches that tailor conventional therapies from the patient’s perspective.

Conclusion

MYMOP and its adaptations can be a starting point for domain-specific measurement of symptoms such as pain, nausea, and anxiety. Given that validation is an iterative, ongoing process and that considerable effort has been put into developing these measures and achieving sound psychometric properties, we recommend that researchers further validate MYMOP and its adaptations before considering developing a new measure. We recommend that future studies on construct validity and responsiveness include well-defined a priori hypotheses specifying the direction and magnitude of expected correlations [23, 24, 50], and give thoughtful consideration to the external anchors against which MYMOP measures are validated. To improve consistency, the currently recommended taxonomy should also be used to define instrument measurement properties [24].