Introduction

Health-related quality of life (HR-QOL) is defined by the World Health Organisation as ‘an individual's perception of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards and concerns’ [1]. This is a broad definition, which takes into account an array of important elements of which health and disability form an important part. However, even defining disability can be challenging. The International Classification of Functioning, Disability and Health [2] aims to formulate a structure for both the medical and social models of disability, defining how these, along with an individual’s perceptions, form disability. Scoliosis impacts heavily on these models as it changes an individual’s health perceptions, potentially leading to disability.

Adult scoliosis (AS) is a condition requiring different considerations compared with other forms of scoliosis. Individuals with AS experience an array of symptoms including back pain and focal neurology [3]. Furthermore, the AS population represents a wider age range and spectrum of co-morbidities than the adolescent idiopathic scoliosis (AIS) population. The aetiology of AS also differs from that of AIS, being either degenerative or idiopathic in nature. Together, these factors mean that individuals often report a significant impact on their HR-QOL, which has been classed as more significant than that of other health conditions such as heart failure, chronic lung disease or diabetes [4].

The reported incidence of AS varies in the literature from 2.9 to 32% [5,6,7,8]. The incidence of AS, and specifically adult degenerative scoliosis, is likely to increase with an ageing population [9]. Management of AS is aimed at improving HR-QOL through improving pain and function [10,11,12], with a lesser focus on the cosmetic appearance of the spine and torso, which is more of an issue in those with AIS [13].

Conservative measures for managing AS can be ineffective due to the degenerative nature of the condition [14]. Furthermore, the understanding and management of AS has undergone a paradigm shift over the last 10 years, with an increasing awareness and subsequent increased demand for active management [15, 16] leading to an increase in surgical management. Health economic evaluations have shown surgical interventions to be cost-effective [17]. However, to be able to accurately assess the impact of any intervention in this specific cohort of individuals, there is a requirement for appropriate patient-reported outcome measures (PROMs). PROMs have a well-established role within the scoliosis population as a whole [18]. The most widely used PROM in individuals with scoliosis is the Scoliosis Research Society questionnaire (SRS-22), which has been adapted and modified over time [19,20,21,22]. However, in the AS literature, a wide array of PROMs are also used, including the Oswestry Disability Index (ODI), the SRS-22 and Short Form-36 (SF-36).

It is key, however, that the measurement properties of any PROM used in any group of individuals are accurate and appropriate [23]. Performing research with PROMs without establishing their measurement properties is a waste of research resources [24] and provides little value. The measurement properties for PROMs used in AS have not been reported in the literature. It is, therefore, essential that this assessment is performed to identify and reduce the risk of bias and inaccuracy in any results reported using PROMs [25] in this group of individuals. This will allow a full understanding of how best to assess HR-QOL and subsequent management in this discrete patient population.

Objective

To systematically review and synthesise the evidence on the measurement properties of patient-reported outcome measures (PROMs) used to assess the quality of life in patients with adult scoliosis.

Methods

Design

This review and synthesis of evidence followed a previously published protocol [26] and is registered with PROSPERO (CRD42020219437). The review was designed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) methodology [25], which aims to improve the selection of the best outcome measures in both clinical practice and research. The results of the review are reported using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidance [27]. This guidance aims to ensure all systematic reviews and meta-analyses report a minimum set of items with the overall aim of improving the quality and the usefulness of systematic reviews. The PRISMA checklist is included as supplementary file 1.

Search strategy

The searches were performed in two stages. In stage one, any PROMs used to report on HR-QOL in individuals with AS were identified. This was completed to identify the frequency of use of the various PROMs and to help inform the search strategy for stage two. In stage two, studies assessing the measurement properties of the PROMs identified in stage one were identified. Stage two searches were grouped according to whether they were generic or disease-specific PROMs.

Eligibility criteria: stage one

Inclusion criteria

  1. Individuals diagnosed with AS (> 10° Cobb angle [28] with an age of 18 years or more)

  2. Any study which included a PROM of HR-QOL, as defined by the World Health Organisation: “Individuals' perceptions of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards and concerns” [1]. PROMs were defined using the Core Outcome Measures in Effectiveness Trials (COMET) taxonomy [29]

Eligibility criteria: stage two

Inclusion criteria

  1. Individuals diagnosed with AS (> 10° Cobb angle [28] with an age of 18 years or more).

  2. Any study which evaluated the measurement properties (reliability, internal consistency, measurement error, validity, content validity, structural validity, interpretability, responsiveness) of the PROMs identified in stage one.

The exclusion criteria for both stages of the search were studies not in the English language and previous systematic reviews.

Data sources

A comprehensive search strategy was performed using the National Institute for Health and Care Excellence (NICE) Healthcare Databases Advanced Search (HDAS) tool. The following databases were searched: AMED, CINAHL, EMBASE, Medline, PsycINFO and PubMed from inception to 31st December 2020.

The details of the searches are included as supplementary files 2 and 3.

Study selection

A standardised blinded selection process was performed by two authors (JA and CB) independently. In stages one and two, the titles and abstracts were assessed against the predetermined eligibility criteria. For any study where the title and abstract were not clear with regard to the inclusion and exclusion criteria, the full-text study was retrieved and reviewed. After unblinding, any disagreements were discussed and consensus was agreed between the two authors. In the event of continued disagreement, the third reviewer (AG) acted as an arbitrator.

A PRISMA flow diagram was constructed to demonstrate this process [27, 30] for each stage of the search (Figs. 1, 2 and 3).

Fig. 1 PRISMA diagram for the stage one search

Fig. 2 PRISMA diagram for the stage two search for generic PROMs

Fig. 3 PRISMA diagram for the stage two search for disease-specific PROMs

Data extraction

Included studies had data extracted on the study characteristics, participant characteristics, PROM and measurement properties (Table 1).

Table 1 PROMs separated into generic and disease-specific groups

Risk of bias assessment

The COSMIN checklist was used to assess the risk of bias (ROB) of each study as one of the assessments of the quality of the paper. Each item describing a measurement property was rated on a four-point scale (very good, adequate, doubtful or inadequate). Two reviewers (JA and CB) assessed study risk of bias independently, with disagreements resolved by discussion. In the event of continued disagreement, the third reviewer (AG) acted as an arbitrator. This step is important within the COSMIN process as it helps to dictate what overall rating a study receives and whether the rating needs to be ‘downgraded due to risk of bias’.
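The way item ratings feed into an overall ROB rating can be sketched in a few lines. The sketch below assumes the ‘worst score counts’ rule that is commonly applied with the COSMIN checklist, which the text above does not state explicitly; the function name and the example ratings are hypothetical and are not taken from any included study.

```python
# Four-point scale named in the text, ordered from best to worst.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def overall_rob(item_ratings):
    """Return the overall ROB rating for one measurement property in one study.

    Assumption: the 'worst score counts' rule applies, so the overall rating
    is the lowest (worst) rating given to any single checklist item.
    """
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # list.index raises ValueError for a rating outside the four-point scale.
    return max(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings, for illustration only.
example_items = ["very good", "very good", "inadequate"]
print(overall_rob(example_items))  # inadequate
```

Under this assumed rule, a single poorly rated item is enough to pull the overall rating down, which is why one methodological flaw can dominate an otherwise well-conducted study.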

Synthesis of data

Data were examined to assess whether a meta-analysis was appropriate. Where too few studies or insufficiently homogeneous data precluded meta-analysis, a narrative synthesis was performed. The different measurement properties for each PROM were rated as ‘sufficient’, ‘insufficient’ or ‘indeterminate’ as directed by the COSMIN process [25] based on the findings from the identified studies. The evidence was then assessed using the modified grading of recommendations assessment, development and evaluation (GRADE) approach [31, 32]. The GRADE approach combines the rating and the ROB outcome into a final recommendation for each measurement property.
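As a rough illustration of how downgrading works, the following sketch assumes that the quality of evidence starts at ‘high’ and is lowered by one level for each downgrade, stopping at ‘very low’. The starting level, the factor names and the number of levels per factor are simplifying assumptions made for this example; the function is hypothetical and does not reproduce the full COSMIN guidance.

```python
# Quality-of-evidence levels, ordered from best to worst.
GRADE_LEVELS = ["high", "moderate", "low", "very low"]


def grade_quality(downgrades):
    """Return the quality of evidence after applying downgrades.

    downgrades: mapping of factor name -> number of levels to downgrade.
    Assumption: evidence starts at 'high' and cannot fall below 'very low'.
    """
    if any(levels < 0 for levels in downgrades.values()):
        raise ValueError("downgrades must be non-negative")
    total = sum(downgrades.values())
    return GRADE_LEVELS[min(total, len(GRADE_LEVELS) - 1)]


# Hypothetical example: two levels for risk of bias, one for imprecision.
print(grade_quality({"risk of bias": 2, "imprecision": 1}))  # very low
```

The quality grade is independent of the ‘sufficient’, ‘insufficient’ or ‘indeterminate’ rating: the rating describes what the results show, while the grade describes how much confidence can be placed in them.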

Results

Stage one

Figure 1 shows the PRISMA diagram for the stage one searches.

Our stage one searches initially yielded a large number of studies; however, after duplicate removal this number almost halved. A number of studies (n = 38) were excluded as they did not utilise PROMs, with most focussing on radiological outcome assessment. This highlights the traditional focus in the literature on ‘improving X-rays’, which does not necessarily lead to an improvement in PROMs [33] or HR-QOL. A further 17 studies were excluded from the stage one searches as they focussed on AIS, and in particular those under 18 years of age, rather than AS.

The list of PROMs is included in the final box of Fig. 1 with the multiple uses of each PROM tallied across studies. The agreement between reviewers was 92.1% for the eligibility of studies at the stage one searches. After discussion, 100% agreement was achieved and the third reviewer was not consulted.
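For readers less familiar with agreement statistics, the sketch below shows how a percentage agreement figure of this kind can be computed from two reviewers' screening decisions. It also computes Cohen's kappa, a chance-corrected alternative that is not reported in this review and is included only for comparison. The decision lists are hypothetical and do not reproduce the review's screening data.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which the two raters made the same decision."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b) / 100.0
    labels = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal proportions per label.
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions for ten studies.
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 7

print(percent_agreement(reviewer_1, reviewer_2))       # 90.0
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.78
```

Percentage agreement is simple to report, but it can look high when most studies are clear exclusions; kappa gives a more conservative figure in that situation.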

The references for the studies found in the stage one search, grouped by PROM, are included as supplementary file 4.

Stage two

Due to the large number of PROMs identified in stage one, the stage two search results were grouped into generic PROMs and disease-specific PROMs. The PROMs that were included in each group are shown in Table 1.

The stage two searches excluded most of the papers as they did not assess the measurement properties of the PROMs reported.

The PRISMA diagrams (Figs. 2 and 3) show the exclusion criteria which led to the selection of four studies that assessed the measurement properties of generic PROMs and five studies that assessed the measurement properties of disease-specific PROMs. Four studies [34,35,36,37] were returned by both searches, with one study [38] found in the disease-specific PROM searches alone. There was 100% agreement between the two reviewers on screening and so a third reviewer was not consulted.

Study characteristics

The stage one search identified 99 studies which utilised 16 different PROMs (Fig. 1). There were ten generic PROMs and six disease-specific PROMs (Table 1). Of the 16 PROMs, only three measures achieved the selection criteria of the stage two search. These were the SRS-22, SRS-22r and the ODI. As shown in Fig. 1, the ODI and SRS-22 were the most commonly used PROMs identified. Five studies were identified from the stage two searches and Table 2 shows the characteristics of these studies.

Table 2 A table of the identified studies and the characteristics of the participants from the stage two searches

SRS-22

The SRS-22 is a PROM that has been developed for the AIS population and has been through multiple revisions [19,20,21,22]. As shown in Table 3, this PROM covers 5 sub-domains, which were important to patients with AIS.

Table 3 Characteristics of the identified PROMs found in the stage two searches

Responsiveness of the SRS-22 was studied by Bridwell et al., 2007 [34] who prospectively recruited 56 individuals to complete the SRS-22, ODI and SF-12 before and after surgery.

The validity and reliability of the SRS-22 was studied in individuals with AS by Berven et al. [35] who performed an observational study comparing healthy individuals to those with AS. The groups appear well matched for age and gender but there was no information on the curve types seen in the individuals with scoliosis.

The cross-cultural validity of the Russian version of SRS-22 was studied by Gubin et al. [36], the only study identified which was not performed in the USA. The study shows a discrepancy in the reported data concerning gender as it reports a total of 56 individuals comprising 25 males and 25 females. The authors were contacted for clarification, but no response was received.

ODI

The ODI is a PROM for assessing low back pain and function in patients [39, 40]. It has been utilised in AS because of the importance of back pain in HR-QOL for these patients [10,11,12]. As shown in Table 3, its sub-domains focus on specific aspects of function which are relevant to adult patients.

The responsiveness of the ODI was the only property assessed in individuals with AS by Blizzard et al. [37]. Unfortunately, there is limited information provided in the published work as illustrated in Table 2.

SRS-22r

The SRS-22r is a development of the SRS-22 and involved a change to one of the questions in the function sub-domain, aimed at improving internal consistency in patients under the age of 18 [20].

Cross-cultural validity of the SRS-22r was the only measurement property investigated by Arima et al. [38]. Their study compared individuals in both the USA and Japan. The study did include a large number of individuals with other forms of spinal deformity (n = 54); however, as these made up less than 50% of the sample, the study was included.

Measurement properties and narrative synthesis

Due to the small number of identified studies, a summary of the evidence was created, shown in Tables 4, 5 and 6. The tables are grouped by PROM to allow easy visualisation of which studies assessed which measurement properties.

Table 4 A description of the measurement properties of the SRS-22 (n = 3 studies)
Table 5 A description of the measurement properties of the ODI (n = 1 study)
Table 6 A description of the measurement properties of the SRS-22r (n = 1 study)

Internal consistency was not assessed in line with the COSMIN guidance.

SRS-22

The SRS-22 was the PROM that had the largest number of studies assessing the measurement properties, as can be seen in Table 4.

Reliability

One study evaluated the reliability of the SRS-22 [35]. Using the ROB scoring tool, the study was rated adequate for patient stability, which in this context relates to the individual's condition remaining stable between testing. It is assumed patients were stable in this study, but this was not directly stated. The time interval was appropriate, and it was assumed that the test conditions were similar. The overall ROB rating was downgraded to doubtful because a Pearson correlation coefficient, rather than an intraclass correlation coefficient, was used to analyse the test–retest results.
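The reason a Pearson correlation is considered a weaker choice for test–retest data can be shown with a short worked example. The sketch below compares Pearson's r with one common form of the intraclass correlation coefficient, the two-way, absolute-agreement, single-measurement ICC. Both functions and the scores are hypothetical illustrations; the choice of ICC form is an assumption and is not taken from the study under review.

```python
from statistics import mean


def pearson_r(test, retest):
    """Pearson correlation between test and retest scores."""
    m_test, m_retest = mean(test), mean(retest)
    sxy = sum((a - m_test) * (b - m_retest) for a, b in zip(test, retest))
    sxx = sum((a - m_test) ** 2 for a in test)
    syy = sum((b - m_retest) ** 2 for b in retest)
    return sxy / (sxx * syy) ** 0.5


def icc_agreement(test, retest):
    """Two-way, absolute-agreement, single-measurement ICC for two occasions."""
    n, k = len(test), 2
    scores = list(test) + list(retest)
    grand = mean(scores)
    subject_means = [(a + b) / 2 for a, b in zip(test, retest)]
    occasion_means = [mean(test), mean(retest)]
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in occasion_means)
    ss_total = sum((v - grand) ** 2 for v in scores)
    ss_error = ss_total - ss_subjects - ss_occasions
    ms_subjects = ss_subjects / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_occasions - ms_error) / n
    )


# Hypothetical scores: every retest score is exactly one point higher.
test_scores = [2.0, 2.5, 3.0, 3.5, 4.0]
retest_scores = [3.0, 3.5, 4.0, 4.5, 5.0]

print(round(pearson_r(test_scores, retest_scores), 2))      # 1.0
print(round(icc_agreement(test_scores, retest_scores), 2))  # 0.56
```

In this example the scores shift systematically between occasions, so Pearson's r reports a perfect association while the ICC is much lower. A correlation coefficient measures whether scores move together, whereas an agreement ICC also penalises systematic differences between test and retest.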

There was very low-quality evidence indicating indeterminate reliability of the SRS-22 and this was downgraded twice due to the doubtful ROB rating and the lack of further studies.

Content validity

No evidence was found for the content validity of the SRS-22.

Construct validity

One study evaluated the cross-cultural validity of the SRS-22 [36]. The samples had similar characteristics except for the group variable, and this item was rated very good. The approach to data analysis was appropriate and a Cronbach alpha was calculated, giving a rating of very good. However, the sample size was small, which led to a rating of inadequate. Furthermore, the errors identified in the reported data were an important flaw, and this item was also rated inadequate. This led to an overall ROB rating of inadequate.

One study evaluated hypothesis testing for the SRS-22 [35]. The study was rated very good as the comparator instrument (SF-36) was clear and well defined. The measurement properties of the comparator are well known, and the Pearson correlation coefficient and Cronbach alpha were used as the data analysis methods, leading to ratings of very good. No other flaws were identified and, therefore, the ROB rating was very good.

There was very low-quality evidence indicating sufficient construct validity of the SRS-22. This was downgraded twice for ROB and once for imprecision.

Criterion validity

No evidence was identified for criterion validity of the SRS-22.

Responsiveness

The responsiveness of the SRS-22 was evaluated by one study [34]. The study was rated very good for the comparator instrument (ODI). However, it was rated inadequate for measurement properties of the comparator instrument as these were not described or discussed. The statistical method was also rated as inadequate as it was not clearly stated how the scores were converted to ‘standard scores’ before being compared using paired t tests. This led to an overall ROB rating of inadequate.

There was very low-quality evidence indicating sufficient responsiveness of the SRS-22. This was downgraded twice due to the ROB.

ODI

One study evaluated the responsiveness of the ODI [37]. This study lacked an adequate description of the characteristics of the sub-groups leading to a rating of doubtful. The method of statistical analysis appears appropriate and was rated very good. However, the study is reported as an abstract and, therefore, very limited information is available. The ROB rating was doubtful.

There was very low-quality evidence indicating sufficient responsiveness of the ODI. The ROB rating led to 2 downgrades of the rating.

Table 5 demonstrates that no other studies were identified which assessed the other measurement properties of the ODI in AS.

SRS-22r

One study assessed the cross-cultural validity of the SRS-22r [38]. It was unclear whether the samples were similar between the two groups which were compared, leading to a rating of doubtful. The statistical method used was receiver operating characteristic (ROC) curve analysis, which was appropriate for the data and was rated very good. However, the sample size was small and led to a rating of doubtful. The overall ROB rating was, therefore, doubtful.

There was very low-quality evidence indicating indeterminate cross-cultural validity for the SRS-22r. The rating was downgraded twice due to the ROB.

Table 6 demonstrates that no other studies were identified which assessed the remaining measurement properties of the SRS-22r in AS.

Discussion

This is the first systematic review to synthesise HR-QOL PROMs for use in an AS population and evaluate their measurement properties. The objective was to identify the PROMs that are best suited to assessing HR-QOL in individuals with AS.

The stage one searches highlighted a wide array of PROMs being utilised, most notably generic PROMs. This may be explained by the absence of a set of core outcome measures defined for AS, as identified by the COMET initiative [41], which leads to ambiguity about the choice of PROMs to use. The most frequently utilised PROM is the ODI, which was designed as a tool for use in individuals with low back pain [39, 40], rather than individuals with AS. However, one of the most important reasons for the use of the ODI in this group is its role in assessing pain and function, which are key considerations in HR-QOL [1]. The SRS-22 was the next most used PROM, and has been extensively utilised in individuals with AIS. Due to its development within the AIS population, the SRS-22 is highly likely to have a high internal consistency as it is a PROM that has been developed specifically for individuals with scoliosis. While the SRS-22 is not specifically designed for only adult individuals, this internal consistency would likely lead to a high degree of inter-relatedness between the items. However, it is also reasonable to assume that it will not assess all aspects of HR-QOL that are important for individuals with AS. These individuals represent a very different cohort to those with AIS, and it cannot be assumed that the features assessed by the SRS-22 are as important to adults as they are to adolescents [42].

Few studies were identified investigating the ODI or SRS-22r, and the evidence that was identified was of very low quality. The ODI appears to have been widely adopted due to its relevance to adult populations and its focus on pain and function, which are regarded as more important features in adults. As back pain is one of the primary complaints of individuals with AS, the ODI could hold an important role as a PROM in this cohort. While the ODI may be an entirely reasonable PROM to use in this cohort, the lack of assessment of its measurement properties means it cannot currently be recommended. This should not be interpreted as a lack of evidence for the ODI as a PROM in general, but rather as insufficient evidence to support its use in individuals with AS.

The SRS-22 demonstrated very low-quality evidence to support its reliability, construct validity and responsiveness. This result suggests that while SRS-22 is well studied in AIS, it has not had the measurement properties sufficiently assessed in AS to currently support its use. The lack of assessment of content validity is particularly important as it is often considered the most important measurement property of a PROM [43]. Without studies evaluating the content validity in individuals with AS, we may be missing domains that are highly valued by this group [43]. The SRS-22 aims to assess mental health, self-image and satisfaction with management, features absent from the ODI. However, the ODI appears to offer more granularity on function than the SRS-22 and this may explain its widespread use in AS.

Findings from this review therefore suggest that until the measurement properties of the ODI and SRS-22 are adequately assessed in AS, neither of these PROMs can be recommended.

Strengths and limitations

This was a robust review assessing the measurement properties of PROMs for AS [25], following an a priori, published protocol to reduce bias. The review used a team who are experts in systematic review methodology and focussed on a very specific subject area. The two-stage search strategy allowed for the development of a comprehensive list of PROMs which have been used in this population and then allowed assessment of whether the measurement properties have been studied. The small number of studies identified by this review is a potential limitation.

Implications for research and clinical practice

Further low risk of bias studies are needed to strengthen the level of evidence for the use of PROMs in AS, especially if these are to be used to ensure that interventions provide actual benefit to individuals’ HR-QOL and ensure appropriate benefit relative to the risk and cost of treatments.

Conclusion

Individuals with AS are being assessed using a wide array of PROMs. A very small number of studies have attempted to assess the measurement properties of these PROMs within an AS population. There is very low-quality evidence indicating indeterminate reliability, very low-quality evidence indicating sufficient construct validity and very low-quality evidence indicating sufficient responsiveness of the SRS-22. There is very low-quality evidence indicating sufficient responsiveness of the ODI. There is very low-quality evidence indicating indeterminate cross-cultural validity for the SRS-22r. Due to this low-quality evidence, no PROM can currently be recommended in this cohort and further studies on the measurement properties of PROMs for individuals with AS are urgently required.