Keywords

An illustration titled P R O M reads patient-reported outcome measures. Another marking reads methodology. Properties read validity, reliability, and responsiveness. Systematic review tick marks evaluate, compare, recommend, and identify.

Patient Reported Outcome Measures can be assessed by evaluating their Measurement Properties.

A systematic review can be performed in order to compare and evaluate PROMs, to make recommendations regarding their use, and to identify any gaps or the need for the design of a new instrument.

The COSMIN initiative (Consensus-based Standards for the selection of health Measurement Instruments) has provided thorough methodological guides for performing such a systematic review.

This involves a step-wise approach that assesses content validity, internal structure and the remaining measurement properties separately.

Following the current advancements and increased scientific interest in research relating to quality of life, particularly with the use of patient reported outcome tools, clinicians are frequently involved in relevant studies.

A clinician may wish to investigate which tool is most appropriate for their practice, and this methodological overview is intended to support that.

Nevertheless, although a clinician can massively benefit from a more in-depth understanding of this methodology, it is strongly advised that such studies should be undertaken in close collaboration with Epidemiologists and Biostatisticians.

Introduction

Aim of the Chapter

This chapter aims to discuss and present the currently used methodology for performing studies and systematic reviews on the measurement properties of PROMs.

It first provides some insight into the most common terms used when designing PROMs and when interpreting published papers and results on PROMs.

The process of designing and generating a new PROM is beyond the scope of this chapter and is discussed only as part of the assessment and evaluation of studies for a systematic review.

What Are Patient Reported Outcomes (PROs) and Patient Reported Outcome Measures (PROMs)

Patient-reported Outcomes (PROs) have long been established in current medical research, as both primary and secondary outcomes of studies.

According to the FDA, a Patient-Reported Outcome (PRO) is any report of the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else [1].

Patient-Reported Outcome Measures (PROMs), alternatively called PRO instruments, are the instruments used to measure PROs or capture PRO data, such as questionnaires that are completed by patients [1].

In the relevant literature, when referring to a PROM or a PROM instrument, authors may be discussing a questionnaire as a whole or a single question.

What Are the Measurement Properties of PROMs

Measurement properties are essential criteria in the design and evaluation of a PROM.

Broadly, these are Validity, Reliability, Responsiveness and Interpretability. Detailed definitions will be discussed below.

Why Perform Systematic Reviews on Measurement Properties of PROMs

Provided that PROMs addressing an area of interest already exist (developed and/or validated), a systematic review may be performed in order to compare the measurement properties of these PROMs, evaluate the quality of each PROM, identify the advantages and disadvantages of each PROM and, ultimately, recommend which PROMs should be used in future studies.

In addition, if the results indicate a rather low quality of the available PROMs, or inadequate measurement of the area of interest, then the systematic review may inform and guide the design of a new PROM.

Current Methodology: The COSMIN Initiative

The vast majority of guidance and tools on the interpretation of PROMs has been provided by the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) initiative [2].

The COSMIN initiative, after initially identifying the lack of clear definitions and widely accepted methodology [3], has specified the definitions of the measurement properties of PROMs [4], and also provides comprehensive guidance for performing a systematic review of outcome measurements, as well as handbooks for the interpretation and assessment of each measurement property in PROMs.

Definitions and Taxonomy

In order to perform a systematic review on measurement properties of PROMs, the researcher must be familiar with the measurement properties, and their definitions.

As mentioned previously, the COSMIN initiative, following a Delphi study, has recommended definitions for the measurement properties [4].

Most importantly, the initiative agreed on a taxonomy, incorporating the measurement properties [4].

According to this taxonomy, COSMIN identifies three main domains of measurement properties for assessing the quality of a PROM: Validity, Reliability and Responsiveness. Interpretability is considered as a fourth domain [4] (Fig. 4.1 and Table 4.1).

Fig. 4.1
An illustration titled quality of an H R PRO. 3 color-coded circles read reliability, validity, and responsiveness. Another smaller circle is marked as interpretability.

Three plus one domains of assessment of the quality of a PROM. Mokkink, L. B. et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J. Clin. Epidemiol. 63, 737–745 (2010)

Table 4.1 Definitions of the three domains of the assessment of a PROM

Performing a Systematic Review

General

A systematic review on measurement properties of PROMs shares some common methodological features with any other systematic review. We will focus more on discussing the process of assessing the measurement properties.

The COSMIN initiative has provided summarising guidelines for performing a systematic review [5] as well as a more detailed user manual, describing the methodology in more depth [6].

In this section, we will present and discuss the processes recommended in these documents. All tables and figures are adopted from these sources.

The overall process and the steps that need to be followed are shown in the following flowchart [5].

As shown in the flowchart, a systematic review consists of three stages (Fig. 4.2).

Fig. 4.2
A 10-step flowchart divided into 3 sections. The sections perform the literature search, evaluate the measurement properties, and select a P R O M. The final step reads report the systematic review.

The first four stages of a literature search. Prinsen, C. A. C. et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual. Life Res. 27, 1147–1157 (2018)

Initially, as per routine practice, a literature search is performed, followed by a thorough assessment of the measurement properties. Finally, recommendations are formulated and the review is reported.

Literature Search

The initial stage consists of the standard steps (steps 1–4) for performing systematic reviews.

  • Step 1: Formulating the aim

    When deciding and developing the aim of the review, the four key elements that need to be included are the construct of interest, the population, the type of the instrument and the measurement properties of interest.

  • Step 2: Formulating the Eligibility Criteria

    Not all studies mentioning the PROMs of interest are to be included. Eligible studies should fulfil the aforementioned four key elements. Most importantly, given the large number of studies on different PROMs, the main focus should be on studies that assess and evaluate one (or more) of the measurement properties of the PROM, and certainly not on studies that only use the PROM as an outcome measurement.

  • Step 3: Performing the literature search.

    Standard Cochrane methodology should be followed for performing the literature search. The four key elements of the aim need to be included, as shown in the following flowchart, which depicts the search strategy and terms described by the COSMIN initiative [5] (Fig. 4.3).

  • Step 4: Selection of abstracts and full-text articles

    Selection and review of the abstracts and full texts is performed in a routine manner with the general recommendation for this to be performed by two reviewers independently.
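
The eligibility logic of Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class names, field names and the exact-match comparison are hypothetical and are not part of the COSMIN guideline, and in practice eligibility is judged by the reviewers rather than by string matching.

```python
from dataclasses import dataclass


@dataclass
class ReviewAim:
    """The four key elements of the review aim (Step 1)."""
    construct: str
    population: str
    instrument_type: str
    measurement_properties: frozenset


@dataclass
class StudyRecord:
    """Hypothetical summary of one screened study."""
    construct: str
    population: str
    instrument_type: str
    # Empty if the PROM was only used as an outcome measurement.
    properties_evaluated: frozenset


def is_eligible(study: StudyRecord, aim: ReviewAim) -> bool:
    """Return True if the study matches the aim and evaluates a property of interest."""
    matches_aim = (
        study.construct == aim.construct
        and study.population == aim.population
        and study.instrument_type == aim.instrument_type
    )
    evaluates_property = bool(study.properties_evaluated & aim.measurement_properties)
    return matches_aim and evaluates_property
```

A study that only uses the PROM as an outcome has an empty `properties_evaluated` set and is therefore excluded, even if it matches the construct, population and instrument type.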

Fig. 4.3
An illustration, 3 independent columns are titled as all PROMs with 3 steps, all validated PROMs with 4 steps, and one or more PROMs with 4 steps. The bottom sections have a highlighted section reading exclusion filter terwee.

Step 3, performing the literature search. Prinsen, C. A. C. et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual. Life Res. 27, 1147–1157 (2018)

Evaluation of Measurement Properties

As demonstrated in the flowchart in Fig. 4.2, this is done in three main stages. Given the significance of content validity and internal structure, these are assessed separately, followed by assessment of the remaining properties.

  1. Content Validity

  2. Internal Structure

  3. Remaining Properties (Reliability, Measurement error, Criterion validity, Hypotheses testing for construct validity, Responsiveness)

Evaluation of Content Validity

The COSMIN initiative, given the significance and complexity of the evaluation of content validity, provides a separate user manual, with the relevant methodology [7].

According to the COSMIN recommendations, there are three aspects of content validity in a PROM:

  • Relevance

  • Comprehensiveness

  • Comprehensibility

In order to assess these, COSMIN recommends ten criteria for good content validity, which have been formulated following a Delphi study [8], as shown in Table 4.2.

Table 4.2 Criteria for good content validity

To assess the above, a stepwise process is used:

  • Step 1—Evaluation of the quality of the PROM development

  • Step 2—Evaluation of the quality of content validity studies on the PROM

  • Step 3—Evaluation of the content validity of the PROM

A more detailed description of the steps is provided below, but not in its full length and detail. For each step, COSMIN has very comprehensively provided relevant boxes, summarising the process in a rather succinct manner. These will also be presented below.

Step 1: Evaluating the Quality of the PROM Development

This step is further subdivided into steps 1a and 1b.

In step 1a, the quality of the PROM design is assessed (evaluating relevance).

In step 1b, the quality of any cognitive interview studies or pilot studies assessing the PROM is examined (evaluating comprehensibility and comprehensiveness) (Table 4.3).

Table 4.3 COSMIN box 1

To perform the above steps, a number of items/questions need to be answered, as per the flowchart shown below (Fig. 4.4).

Fig. 4.4
A vertical decision flow diagram. It is divided into 2 parts. 2 areas have arrows pointing outwards reading P R O M development inadequate. Arrows at the bottom point to the final step titled as determine final rating.

Evaluating the quality of the PROM development. Caroline B Terwee et al. COSMIN methodology for assessing the content validity of PROMs

This describes 13 items/questions for Part 1a, and 22 items/questions for Part 1b. The detailed items are not presented here, and we would recommend reading the full manual, where the items are presented, along with further explanations and examples.

Step 2: Evaluating the Quality of Content Validity Studies on the PROM

In this step, we assess how patients and professionals were asked about the relevance, comprehensibility and comprehensiveness, either as part of the PROM design process, or as a separate content validity study (Table 4.4).

Table 4.4 COSMIN box 2: Standards for evaluating the quality of studies on the content validity of a PROM

This can also be broadly separated into steps 2a, 2b and 2c (asking patients about relevance, comprehensiveness and comprehensibility), and steps 2d and 2e (asking professionals about relevance and comprehensiveness), as shown in the respective flowchart. Overall, there are 31 items/questions to be assessed (Fig. 4.5).

Fig. 4.5
A 2-part illustration with 5 decision charts. Part A asks questions like whether were patients asked about relevance, comprehensiveness, and comprehensibility. Part B asks questions like were professionals asked about relevance and comprehensiveness.

Evaluating the quality of content validity studies on the PROM. Caroline B Terwee et al. COSMIN methodology for assessing the content validity of PROMs

For Steps 1–2

As mentioned previously, the exact items that are utilised in each step are not presented here.

What is important to note is how ratings are provided for each item. A 4-point rating scale is utilised, as shown here.

  • Very good

  • Adequate

  • Doubtful

  • Inadequate

For each item, the COSMIN manuals provide detailed examples of the criteria that should be fulfilled to achieve each rating. Below we provide an example, Item 5 from step 1a (Table 4.5).

Table 4.5 Example of the COSMIN manuals

To ensure high quality, COSMIN recommends using a ‘worst score counts’ method, where the lowest rating is utilised as an overall rating.

For Step 1, the lowest rating in the respective items will correspond to the overall rating for the PROM development.

For Step 2, the lowest rating in the respective items will correspond to the overall rating of the content validity studies on the PROM.
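
The 'worst score counts' principle can be illustrated with a short sketch. The Python below is not part of the COSMIN manuals: the function name and the lower-case rating strings are our own assumptions, and items rated as not applicable are assumed to have been removed beforehand.

```python
# Ratings of the 4-point scale, ordered from worst to best.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def worst_score_counts(item_ratings):
    """Return the lowest rating among the rated items of one COSMIN box."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # An unknown rating string raises ValueError via list.index.
    return min(item_ratings, key=RATING_ORDER.index)
```

For example, a PROM development study rated 'very good' on most items but 'doubtful' on a single item would receive an overall rating of 'doubtful'.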

Step 3: Evaluating the Content Validity of the PROM

For this step, content validity of the PROM is evaluated by examining the quality and results of already performed studies on the PROM. This, again, is further subdivided into three steps.

For step 3a, ratings need to be provided for relevance, comprehensiveness and comprehensibility, using the ten criteria for good content validity (presented previously), for three different aspects, as per the table shown below.

  • Methods and results of PROM development study

  • Content validity studies on the PROM

  • Reviewers’ own ratings of the PROM (Table 4.6)

Table 4.6 COSMIN criteria and rating system for evaluating the content validity of PROM

Essentially, the ratings for the methods and results of the PROM development studies, and the content validity studies, are the ones already assessed in steps 1 and 2, according to the respective COSMIN boxes, and are utilised in this table.

With regards to the potential ratings of each criterion, these can be:

  • Sufficient (+): ≥85% of the items of the PROM (or sub-scale) fulfil the criterion

  • Insufficient (−): <85% of the items of the PROM (or sub-scale) fulfil the criterion

  • Indeterminate (?): No(t enough) information available or quality of (part of a) the study inadequate
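
The three possible ratings can be expressed as a simple rule. The sketch below is our own illustration, not COSMIN code: the function and parameter names are hypothetical, an ASCII hyphen stands in for the minus sign, and the flag `evidence_adequate` is an assumed stand-in for the reviewers' judgement that enough information of adequate quality is available.

```python
def rate_criterion(items_fulfilling, total_items, evidence_adequate=True):
    """Rate one content validity criterion as '+', '-' or '?'.

    '+' : at least 85% of the items fulfil the criterion
    '-' : fewer than 85% of the items fulfil the criterion
    '?' : not enough information, or study quality inadequate
    """
    if not evidence_adequate or total_items <= 0:
        return "?"
    # Integer arithmetic avoids floating-point issues at exactly 85%.
    if items_fulfilling * 100 >= 85 * total_items:
        return "+"
    return "-"
```

With a 20-item PROM, 17 items fulfilling the criterion gives a sufficient rating, whereas 16 items gives an insufficient one.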

After ratings have been provided for each criterion, a final rating can be generated for relevance, comprehensiveness and comprehensibility. These three ratings are then combined to provide the Overall Content Validity Rating.

For these processes, COSMIN provides further tables and guidance in the manual, which are not presented here.

Importantly, given the individual importance of relevance, comprehensiveness and comprehensibility, it is recommended to report on them separately, if found relevant (different ratings/different importance), and not only as an Overall Content Validity Rating.

For step 3b, a qualitative summary of available studies is performed, providing a rating for relevance, comprehensiveness and comprehensibility, resulting in an overall rating for each domain, which will be added in the respective boxes of the aforementioned table.

Lastly, for step 3c, the ratings achieved from step 3b are assessed with regard to the quality of the evidence that generated them, to determine how reliable these ratings are.

To do this, the GRADE approach is used, as shown in the table below [9] (Table 4.7).

Table 4.7 GRADE criteria
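
As a rough illustration of how grading of this kind works, the sketch below assumes that the evidence starts at high quality and is downgraded by a number of levels for each concern identified by the reviewers. The factor names used in the example and the function itself are our own assumptions; the exact factors and downgrading rules should be taken from the COSMIN manual and Table 4.7, not from this code.

```python
# The four GRADE quality levels, ordered from lowest to highest.
GRADE_LEVELS = ["very low", "low", "moderate", "high"]


def grade_quality(downgrades):
    """Return the quality of evidence after applying downgrades.

    `downgrades` maps a (hypothetical) factor name to the number of
    levels by which the evidence is downgraded for that factor.
    """
    total = sum(downgrades.values())
    if total < 0:
        raise ValueError("downgrades must not be negative")
    # Start at 'high' and never go below 'very low'.
    index = max(0, len(GRADE_LEVELS) - 1 - total)
    return GRADE_LEVELS[index]
```

For instance, `grade_quality({"risk of bias": 1, "imprecision": 1})` moves the evidence two levels down from high, giving low quality evidence.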

Summary of Content Validity Assessment

In summary, as per the COSMIN guidelines and the methodology to assess content validity, a structured and step-by-step approach was presented.

A number of aspects are examined sequentially and systematically, and the relevant outcomes need to be reported in a systematic review:

  • Quality of PROM development process (step 1)

  • Quality of content validity studies on the PROM (step 2)

  • Overall ratings for relevance, comprehensiveness and comprehensibility, as well as a summative overall content validity rating (step 3)

Evaluation of Internal Structure

When evaluating internal structure, the properties that need to be assessed include structural validity, internal consistency and cross-cultural validity, as defined previously.

As per the definition of internal structure, at this stage reviewers need to evaluate whether the items in a scale or sub-scale are appropriately correlated manifestations of the same underlying construct. Consequently, this step is relevant only for studies of PROMs based on a reflective model (not a formative one).

COSMIN recommends three steps for assessing internal structure, which are summarised in the following flowchart (Fig. 4.6).

Fig. 4.6
A flowchart with 3 steps. The steps are labeled as evaluating the methodological quality, applying criteria for good measurement, and summarizing the evidence and grade quality. The section between the second and third step reads make overview tables.

Evaluation of internal structure. Lidwine B Mokkink, Cecilia AC Prinsen, Donald L Patrick, Jordi Alonso, Lex M Bouter, Henrica CW de Vet, Caroline B Terwee. COSMIN manual for systematic reviews of PROMs COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs) user manual. (2018)

In the first step, the COSMIN Risk of Bias Checklist is used, by completing the relevant boxes for structural validity, internal consistency and cross-cultural validity/measurement invariance, as demonstrated below [10].

3 tables titled as structural validity, internal consistency, and other. Table 1 illustrates 4 questions. Table 2 illustrates 5 questions. Table 3 illustrates 1 question. The answers are written in 5 columns titled as very good, adequate, doubtful, inadequate, and not applicable.

In the second step, data extraction is performed from studies on PROMs, focusing on patient characteristics, methods and timings of administration, interpretability, feasibility and results on measurement properties.

COSMIN provides the relevant tables that can facilitate and guide this data extraction (Fig. 4.7). The outcomes of these will be evaluated against the criteria for good measurement properties (Table 4.8).

Fig. 4.7
2 blank tables and a box with 4 questions in between. Table 1 has column headers reading P R O M, construct, target population, mode of administration, and scoring. Column headers in table 2 read population, disease characteristics, and instrument administration.

Structural validity. Mokkink, L. B. COSMIN Risk of Bias checklist [PDF File]. Amsterdam Public Heal. Res. Inst. 1–37 (2018)

Table 4.8 Lidwine B Mokkink, Cecilia AC Prinsen, Donald L Patrick, Jordi Alonso, Lex M Bouter, Henrica CW de Vet, Caroline B Terwee. COSMIN manual for systematic reviews of PROMs COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs) user manual. (2018)

In the third step, reviewers should perform a quantitative pooled analysis or a qualitative summary of the results, which is then evaluated against the criteria for good measurement properties. Lastly, as described previously, the evidence needs to be graded using the GRADE criteria (Table 4.9).

Table 4.9 Updated criteria for good measurement properties

These tables are presented as examples, with the intention of providing the researcher with an initial overview of the process. The thorough and extensive work done by the COSMIN initiative has given us a very precise methodology, which we would be duplicating if we were to describe these processes in more detail. Therefore, we strongly recommend that researchers refer to the relevant manuals and checklists, as cited throughout the chapter, which can also be found on the COSMIN website.

Evaluation of Reliability, Measurement Error, Criterion Validity, Hypotheses Testing for Construct Validity and Responsiveness

The remaining measurement properties are once again assessed in a similar process, with the use of the respective COSMIN Risk of Bias Checklist boxes, which are indicatively shown below (Tables 4.10, 4.11, 4.12, 4.13, and 4.14).

Table 4.10 Evaluation of reliability
Table 4.11 Assessing risk of bias in a study on measurement error
Table 4.12 Assessing risk of bias in a study on criterion validity
Table 4.13 Assessing risk of bias in a study on hypotheses testing for construct validity
Table 4.14 Assessing risk of bias in a study on responsiveness

Report and Selection of Most Suitable PROM

This final stage consists of evaluating interpretability and feasibility, formulating the recommendations and reporting the systematic review.

  • Evaluation of Interpretability and Feasibility (Fig. 4.8)

    These are assessed with the use of the relevant tables.

  • Formulation of Recommendations

    COSMIN suggests dividing PROMs into three categories, according to the quality of evidence. In that way, the reviewers can assess and define which of the PROMs they assessed would be recommended for further use in the field, which require further studies and improvements, and which should not be used.

    The categories are shown below.

    (A) Recommended

      PROMs with evidence for sufficient content validity (any level) AND at least low quality evidence for sufficient internal consistency

    (B) Further research required

      PROMs categorised not in A or C

    (C) Not recommended

      PROMs with high quality evidence for an insufficient measurement property

  • Reporting the Systematic Review

    Reporting should be performed following PRISMA guidelines [11], and it is suggested it follows the flowchart that was presented initially (Fig. 4.9).
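
The three recommendation categories described above can be sketched as a small decision function. This is a simplified illustration under our own assumptions: the function and parameter names are hypothetical, the quality of evidence is assumed to use the four GRADE levels, and checking category C first is our own choice for the case where a PROM would otherwise meet both A and C.

```python
# The four GRADE quality levels, ordered from lowest to highest (assumed).
GRADE_LEVELS = ["very low", "low", "moderate", "high"]


def recommend_category(
    content_validity_sufficient,
    internal_consistency_sufficient,
    internal_consistency_quality,
    high_quality_insufficient_property,
):
    """Return 'A', 'B' or 'C' for one PROM.

    `internal_consistency_quality` is a GRADE level, or None if no
    evidence on internal consistency is available.
    """
    # Category C: high quality evidence for an insufficient property.
    # Checking this first is an assumption of this sketch.
    if high_quality_insufficient_property:
        return "C"
    at_least_low = (
        internal_consistency_quality is not None
        and GRADE_LEVELS.index(internal_consistency_quality)
        >= GRADE_LEVELS.index("low")
    )
    # Category A: sufficient content validity AND at least low quality
    # evidence for sufficient internal consistency.
    if content_validity_sufficient and internal_consistency_sufficient and at_least_low:
        return "A"
    # Category B: everything not in A or C.
    return "B"
```

A PROM with sufficient content validity but only very low quality evidence for internal consistency would fall into category B, indicating that further research is required.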

Fig. 4.8
A blank table with 4 columns and 13 rows. The row headers illustrate feasibility aspects like patient's and clinician's comprehensibility, completion time, and availability in different settings. Column headers read PROM A, B, C, and D.

Lidwine B Mokkink, Cecilia AC Prinsen, Donald L Patrick, Jordi Alonso, Lex M Bouter, Henrica CW de Vet, Caroline B Terwee. COSMIN manual for systematic reviews of PROMs COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs) user manual. (2018)

Fig. 4.9
A flow diagram with 3 sections marked as identification, screening, and included. These illustrate records and reports. The identification is of 2 types. One through databases and registers and another through other methods.

PRISMA 2020 flow diagram for new systematic reviews which included searches of databases, registers and other sources. Page, M. J. et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372, (2021)

Limitations and Considerations

We have chosen to present the COSMIN methodology as a roadmap for performing systematic reviews on measurement properties of PROMs, mainly due to the structured approach and detailed recommended process.

Researchers who are interested in performing a systematic review on measurement properties of PROMs need to be aware of potential limitations before committing to this methodology.

In a recent article by McKenna and Heaney, several points have been raised and we consider it useful to briefly mention them here [12].

The authors claim that there is a lack of evidence to support the COSMIN recommendations. They argue that the guidelines have been produced on the basis of empirical evidence and the experience of the COSMIN steering committee.

In addition, although Delphi studies were performed to agree on and produce recommendations in a scientifically robust manner, there may be concerns about the inclusivity of the participating professionals.

A further point concerns the omission of several aspects of the assessment of a PROM that the authors consider significant, such as construct theories, fundamental measurement, unidimensionality, and item generation and reduction.

Moreover, it is identified that there has been no actual evaluation of the COSMIN guidelines themselves. As an overall concept, the critique concludes that the COSMIN guidelines and recommendations are not evidence-based.

Lastly, the most significant point relates to who utilises and attempts to follow the COSMIN methodology.

As the vast majority of the researchers performing these reviews are clinicians, and given the complexity of the COSMIN guidance, it may be inferred that they lack the necessary expertise and ability to interpret and evaluate the relevant information, hence producing inaccurate reviews and recommendations.

Overall, we feel that through this chapter a researcher may be introduced to the basics of performing systematic reviews on measurement properties of PROMs, and that the COSMIN methodology and guidelines can be used, as they introduce a step-wise and thorough approach.

Nevertheless, the limitations discussed have some merit, particularly with regard to the researcher’s expertise and background in the field. These should be carefully taken into account, and the research team should certainly consider the involvement of professionals with a strong background in measurement, psychometrics, statistics and health-related quality of life research.