Introduction

Ankylosing spondylitis (AS) is a common inflammatory rheumatic disease that affects the axial skeleton and causes characteristic inflammatory back pain, which can lead to structural and functional impairment [1, 2]. The prevalence of AS ranges from 9 to 30 per 10,000 in the general population [3], with a male-to-female ratio of approximately 2:1 [4]. Severe disease symptoms such as low back pain, stiffness and limited spinal mobility affect the health-related quality of life (HRQoL) and work productivity of patients with AS [5,6,7]. AS is a chronic condition that requires lifelong treatment and manifests in a wide variety of symptoms, often requiring multidisciplinary treatment coordinated by rheumatologists. Optimal management of patients with AS requires a combination of non-pharmacological and pharmacological treatments [8]. Clinical guidelines state that the primary goal of treating patients with AS is to maximize HRQoL by controlling symptoms, preventing progressive structural damage, and normalizing function as well as social engagement [8].

Health state utility values (HSUVs), a quantitative measure of HRQoL, are expressed on a scale anchored at 0 (death) and 1 (perfect health), with values below 0 indicating health states considered worse than death [9]. HSUVs can be derived from direct valuation methods (e.g., standard gamble [SG], time trade-off [TTO] or rating scale [RS]), indirect valuation methods (e.g., EQ-5D, short form six dimensions [SF-6D] and assessment of quality of life [AQoL]), or mapping techniques (converting data from non-preference-based quality of life instruments to HSUVs). "Mapping" is offered as a second-best solution [10]. Because HSUVs reflect people's preferences for a specific health state, they can be quantified and compared across conditions and interventions. Quality-adjusted life-years (QALYs), the most commonly used outcome measure in economic evaluations, combine length of life with HRQoL: time spent in a given health state is weighted by the corresponding HSUV, which represents the "quality" in QALYs [11]. Therefore, high-quality estimates of HSUVs are an important basis for cost-utility models, decision-making, and determining the effects of new treatments on HRQoL [12].
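As a minimal illustration of this weighting, the following Python sketch computes QALYs as the sum of the time spent in each health state multiplied by the HSUV of that state. The function name, the durations and the utility values are hypothetical and are not taken from any included study; discounting of future years is deliberately ignored.

```python
from typing import Sequence, Tuple


def qalys(states: Sequence[Tuple[float, float]]) -> float:
    """Return QALYs for a sequence of (years_in_state, utility) pairs.

    Utilities follow the HSUV scale: 1 = perfect health, 0 = death,
    and values below 0 denote states considered worse than death.
    """
    total = 0.0
    for years, utility in states:
        if years < 0:
            raise ValueError("time spent in a state cannot be negative")
        if utility > 1:
            raise ValueError("a utility cannot exceed 1 (perfect health)")
        total += years * utility
    return total


# Hypothetical patient: 2 years at utility 0.6, then 3 years at utility 0.8.
example_states = [(2.0, 0.6), (3.0, 0.8)]
print(f"QALYs accrued over 5 years: {qalys(example_states):.2f}")
```

In this hypothetical case, five life-years yield fewer than five QALYs because each year is weighted by a utility below 1, which is why the choice of HSUV directly affects cost-utility results.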

Various generic measures have been applied to assess HSUVs in patients with AS, including the EQ-5D [13], SF-6D [14, 15], AQoL [16], and Health Utilities Index 3 (HUI3) [17], and the reported HSUVs for AS-related health states vary widely. Previous cost-effectiveness studies estimated HSUVs in patients with AS based on Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) and Bath Ankylosing Spondylitis Functional Index (BASFI) scores, as well as age and gender [18, 19]. However, the selection of HSUVs has not been properly considered, and the estimates have not been systematically pooled. Using different HSUVs can lead to different cost-utility analysis results [20]. Therefore, it is important to systematically identify suitable, high-quality HSUVs for clinical management and economic evaluation of the disease [21].

Previously, only one study conducted a meta-analysis and review of studies using the Medical Outcomes Short-Form-36 questionnaire (SF-36) to measure HRQoL in patients with AS [6]. Therefore, a synthesized study with a large sample size is urgently needed to estimate the magnitude of the impact of AS on HSUVs. The aims of this study are to (1) systematically assess HSUVs and study quality in patients with AS by reviewing the literature; (2) provide pooled mean HSUVs and compare them with those of the general population; and (3) explore underlying sources of heterogeneity between studies and potential correlates of HSUVs.

Methods

Data sources and searches

This systematic review was prospectively registered on PROSPERO (http://www.crd.york.ac.uk/PROSPERO) (Registration number: CRD42019129463) and was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Table S1) [22]. The databases searched were PubMed, EMBASE, Cochrane Library, Web of Science and Scopus, from inception to July 2023. The search used a combination of medical subject headings (MeSH) and text words relating to AS and health utility; details of the search strategy are provided in Table S2. Reference lists were also checked to identify any studies missed by the electronic search. All titles and abstracts were examined independently by two authors (KTZ and JCF) to identify potentially relevant studies. When the abstract did not provide enough information about the article, the full manuscript was obtained for further examination. Any disagreements were resolved after consideration of the full manuscript and consultation with a third reviewer (SPL).

Eligibility criteria

The search was performed without restriction on epidemiological design or treatment intervention. Studies were eligible if they met the following criteria: patients with a definite diagnosis of AS, full-text article available in English, and clear reporting of HSUVs. We excluded letters, secondary research, and duplicate publications. Additionally, studies were excluded if they reported only adjusted values rather than mean values of health utility, or if the standard deviation was not reported.

Data extraction and management

Using a pre-tested form, two authors (KTZ and JCF) independently extracted key information from the included studies, and differences were resolved through discussion. The following information was recorded: first author, year of publication, country, study design, interventions, mean age, sex distribution, number of respondents, clinical patient characteristics, instruments and mean HSUVs. For intervention studies, baseline HSUVs were used so that intervention effects did not influence the pooled utility values.

Risk of bias

Risk of bias across studies was assessed with the ROBINS-I checklist for missing data [23], and categorized as low, moderate, serious, critical or no information (Table S3). We selected bias due to missing data because this was a common problem in HRQoL studies [24, 25].

Data synthesis and statistical analysis

After critical evaluation, all appropriate studies were included in the subsequent analyses. The HSUVs were pooled by meta-analysis using random-effects models that account for both within-study variation and between-study heterogeneity [26]. We estimated residual heterogeneity using a restricted maximum likelihood estimator and quantified it with the I² statistic [27]. The mean and 95% confidence interval (CI) of the pooled values are presented. We used histograms to compare the health status scores of patients with AS with those of the general population.
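The pooling step can be sketched in Python as follows. This is an illustration under stated assumptions and not the Stata procedure used in the analysis: it substitutes the closed-form DerSimonian-Laird estimator of the between-study variance for restricted maximum likelihood, assumes each study's standard error is its standard deviation divided by the square root of its sample size, and uses hypothetical function names and study values.

```python
import math
from typing import Sequence, Tuple


def standard_error(sd: float, n: int) -> float:
    """Standard error of a study mean from its SD and sample size (assumed)."""
    return sd / math.sqrt(n)


def pool_random_effects(
    means: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float, float]:
    """Random-effects pooled mean with the DerSimonian-Laird tau^2.

    Returns (pooled mean, 95% CI lower, 95% CI upper, tau^2, I^2 in percent).
    """
    if len(means) != len(ses) or len(means) < 2:
        raise ValueError("need at least two studies with matching inputs")
    # Inverse-variance (fixed-effect) weights and pooled mean.
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum_w
    # Cochran's Q measures dispersion of study means around the fixed mean.
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))
    df = len(means) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * m for wi, m in zip(w_re, means)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, tau2, i2


# Hypothetical studies: (mean HSUV, SD, sample size).
studies = [(0.55, 0.30, 120), (0.66, 0.25, 300), (0.70, 0.28, 80)]
means = [m for m, _, _ in studies]
ses = [standard_error(sd, n) for _, sd, n in studies]
pooled, lower, upper, tau2, i2 = pool_random_effects(means, ses)
print(f"pooled = {pooled:.3f} (95% CI {lower:.3f} to {upper:.3f}), I2 = {i2:.1f}%")
```

Because the between-study variance is added to every study's variance, large studies dominate the pooled mean less than they would in a fixed-effect model, which matters when heterogeneity is as high as reported here.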

Subgroup analyses were performed to explore the underlying reasons for the heterogeneity between studies. We grouped the included studies into five subgroups (EQ-5D-3L, EQ-5D-5L, SF-6D, HUI3 and AQoL) based on the instruments used. Histograms were used to compare HSUVs measured by different instruments between patients with AS and the general population. For the most frequently used instrument, the EQ-5D-3L, we performed stratified meta-analyses based on the mean age of the patients, year of publication, disease activity, disease duration and national economic level. Firstly, three subgroups were formed according to the mean age of the patients: 30–40, 41–50, and > 50 years. Secondly, the included studies were divided into two subgroups by publication year (with 2012 as the cut-off year, as the publication years of the included studies were 2002–2022). Thirdly, patients with a BASDAI of more than 4 were categorized as having high disease activity, while the others were categorized as having low disease activity [28]. Fourthly, we used a cut-off of 10 years (the median duration of the disease) for the disease duration subgroups. Finally, subgroups of developed and developing countries were formed on the basis of the economic level of the countries in the World Bank's online database.

To further explore underlying sources of heterogeneity, we conducted a random-effects meta-regression of studies using the EQ-5D-3L, with proportion of females, mean patient age, BASDAI score, and publication date as separate covariates. All analyses were conducted using STATA 15.0 (StataCorp, College Station, Texas, USA) and Excel 2021.
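A single-covariate meta-regression of this kind can be sketched as a weighted least-squares fit in which each study is weighted by the inverse of its within-study variance plus the between-study variance. The Python sketch below is a simplification and not the Stata routine used in the analysis: it treats the between-study variance (tau2) as a known input rather than estimating it, uses a normal approximation for the P value, and relies on hypothetical function names and data.

```python
import math
from typing import Sequence, Tuple


def meta_regression(
    y: Sequence[float],
    se: Sequence[float],
    x: Sequence[float],
    tau2: float = 0.0,
) -> Tuple[float, float, float, float]:
    """Weighted least-squares meta-regression with one covariate.

    y: study mean HSUVs; se: their standard errors; x: study-level covariate
    (for example mean BASDAI); tau2: assumed between-study variance.
    Returns (slope, intercept, SE of slope, two-sided P value).
    """
    if not (len(y) == len(se) == len(x)) or len(y) < 3:
        raise ValueError("need at least three studies with matching inputs")
    w = [1.0 / (s ** 2 + tau2) for s in se]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("covariate must vary across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    z = slope / se_slope
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation
    return slope, intercept, se_slope, p_value


# Hypothetical studies: mean HSUV, standard error, mean BASDAI score.
hsuv = [0.75, 0.68, 0.60, 0.52]
ses = [0.02, 0.03, 0.02, 0.04]
basdai = [2.5, 3.8, 4.9, 6.2]
slope, intercept, se_slope, p = meta_regression(hsuv, ses, basdai, tau2=0.001)
print(f"slope per BASDAI point = {slope:.3f} (SE {se_slope:.3f}), P = {p:.4f}")
```

A negative slope in such a fit would correspond to lower HSUVs in studies with higher mean BASDAI; each covariate is fitted separately, mirroring the use of separate covariates described above.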

Results

Study selection

A total of 7112 records initially identified by the database search underwent title and abstract screening. Based on this screening, the full texts of 406 records were retrieved for further review, of which 42 studies were eligible and included. The PRISMA flow diagram showing the study screening process is presented in Fig. 1.

Fig. 1 PRISMA flowchart describing the identification, selection and inclusion of studies

Study characteristics

Table 1 summarizes the basic characteristics of the included studies. The first study was published in 2002 [29], and the average number of studies published each year is around 5. The forty-two studies included 11,354 participants, and one study assessed HSUVs using a direct valuation method (RS). The most frequently applied instruments were EQ-5D-3L (35 studies), SF-6D (9 studies), and EQ-5D-5L (3 studies). Studies were conducted mainly in developed countries, with UK and cross-national studies being the most numerous (Figure S1, B). The main study designs were cross-sectional studies (37 studies) and RCTs (7 studies). Sample sizes varied widely, ranging from 13 [30] to 1615 [31]. Across these studies, the average age of patients with AS was 46 years, with most mean ages clustered between 30 and 55 years. Twenty-five and twenty-two studies reported BASDAI and BASFI scores, respectively, both ranging from 1.4 [32] to 7.1 [33] (score range 0–10). The mean disease duration was 12.6 years, with a median of about 10 years.

Table 1 Study characteristics and reported utility values

Risk of bias assessments

In the risk of bias assessment, the results showed that 28 articles (65.1%) were rated as low risk, 8 articles (18.6%) were rated as moderate risk, and the other studies did not provide information about missing data (Table S3).

Overall pooled estimates

Figure 2 shows the pooled overall HSUVs with categorical estimates by instrument. The pooled HSUV for patients with AS across all available studies was 0.62 (95% CI 0.59 to 0.65). The pooled mean utility estimates from the random-effects meta-analysis for SF-6D, EQ-5D-3L, EQ-5D-5L, and HUI3 were 0.65 (95% CI 0.62 to 0.68), 0.63 (95% CI 0.59 to 0.66), 0.60 (95% CI 0.42 to 0.79), and 0.48 (95% CI 0.43 to 0.53), respectively. AQoL and RS were each used in a single study, with mean HSUVs of 0.45 and 0.62, respectively. Heterogeneity in the pooled HSUVs was high across the different instruments (I² > 98.0%).

Fig. 2 Forest plot (random effect) of HSUVs in patients with AS

Depending on the measurement instrument, the pooled HSUVs were 21% (SF-6D) to 47% (AQoL) lower than the norms for people aged 45 to 60 years (Fig. 3). When choosing the norms, we considered a variety of factors, such as the tariff most used in the included studies, the regions where the studies were clustered, and the countries in which the instruments were developed. Because the majority of the studies used UK population norms, the EQ-5D-3L norms were taken from the UK [34]. The EQ-5D-5L norms were taken from the United States, one of the most recently published sets of norms on the EuroQol website [35]. SF-6D norms were taken from a study in the UK, where the instrument was developed [36]. AQoL norms were taken from Australia [37], where all of the research was done, and HUI3 norms were taken from Canada [38].
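The relative difference from a population norm is simple arithmetic: the gap between the norm and the pooled HSUV, divided by the norm. The Python sketch below shows the calculation; the function name is hypothetical, and the norm value of 0.85 is a placeholder chosen for illustration only, not a value taken from the cited norm studies.

```python
def relative_shortfall_pct(pooled_hsuv: float, norm_hsuv: float) -> float:
    """Percentage by which a pooled HSUV falls below a population norm."""
    if norm_hsuv <= 0:
        raise ValueError("the population norm must be positive")
    return (norm_hsuv - pooled_hsuv) / norm_hsuv * 100.0


# Pooled HSUVs by instrument as reported above.
pooled_by_instrument = {"SF-6D": 0.65, "EQ-5D-3L": 0.63, "HUI3": 0.48, "AQoL": 0.45}
PLACEHOLDER_NORM = 0.85  # hypothetical norm, NOT taken from the cited sources

for instrument, pooled in pooled_by_instrument.items():
    shortfall = relative_shortfall_pct(pooled, PLACEHOLDER_NORM)
    print(f"{instrument}: {shortfall:.1f}% below the placeholder norm")
```

In the actual comparison each instrument has its own norm, so the shortfall depends on both the pooled value and the instrument-specific reference population.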

Fig. 3 Pooled HSUVs of AS patients for all included instruments compared with population norms. Note: mean (95% confidence interval)

Pooled stratified estimates

Sufficient studies reported HSUVs by subgroup strata for the EQ-5D-3L only. The stratified estimates showed that HSUVs differed by publication year, disease activity, disease duration, mean age of patients and national economic level (Fig. 4 and Table S4). Among the 35 studies that included stratified utility values, HSUV estimates were higher for studies published after 2012 than for those published before 2012 (0.66 vs. 0.58). The HSUVs of patients in the BASDAI ≥ 4 group were 0.16 lower than those in the BASDAI < 4 group. HSUVs were relatively low in the youngest age group (30–40 years), but the difference between age groups was very small. Patients from developing countries had higher HSUVs than those from developed countries (0.67 vs. 0.62). Lastly, twenty-nine studies reported the duration of disease, and HSUVs were higher for a disease duration of less than ten years (0.66) than for ten years or more (0.62).

Fig. 4 EQ-5D-3L pooled HSUVs by publication decades, disease severity, mean age and national economic level

Meta-regression

Meta-regression across studies of the EQ-5D-3L in patients with AS showed a trend toward increasing HSUVs over time (P = 0.042; Figure S2, A). Higher BASDAI scores were associated with lower HSUVs (P < 0.001; Figure S2, B). Neither the mean age of patients nor the proportion of females was significantly associated with HSUVs (P > 0.05; Figure S2, C and D). After meta-regression, substantial heterogeneity remained (I² > 96%), indicating that other unexplained factors contributed to the between-study differences.

Discussion

To our knowledge, this is the first systematic literature review on HSUVs in patients with AS. This systematic review pooled the best available evidence on quantitative measures of health state preferences in people with AS, with only one study using a direct measure (RS). The pooled mean utility estimates from the random effects meta-analysis for SF-6D, EQ-5D-3L, EQ-5D-5L and HUI3 are 0.65, 0.63, 0.60, and 0.48, respectively. Regardless of which instrument is used, our summary estimates are lower than those of the general population. However, it is important to be aware that, consistent with other studies [39], there was considerable heterogeneity among the studies we included.

Compared with other inflammatory rheumatic diseases, HSUVs in AS appear similar to those of patients with rheumatoid arthritis (0.66) [40] and of psoriatic arthritis patients with enthesitis (0.65) [41]. On the one hand, psoriatic arthritis and AS are both subtypes of spondyloarthritis [42], and the symptoms of psoriatic arthritis patients with enthesitis may closely resemble those of AS. On the other hand, a review has confirmed that the proportion of patients with moderate-to-severe pain or mood impairments is similar between the AS and rheumatoid arthritis groups, which may explain the agreement of HSUVs between the two disorders [43]. Compared with other health conditions, HSUVs of patients with AS appear similar to those of patients ≥ 3 months after stroke [44], patients with stage IV melanoma [25], and patients with moderate depression [45]. These comparisons suggest that the impact of AS on patients' HRQoL is substantial and highly burdensome.

We explored the factors associated with HSUVs in patients with AS. Over the past two decades, the diagnosis and treatment of AS have continued to improve [46], and patient management has become more standardized, which may explain why publication year was associated with the HSUVs of patients with AS. We selected BASDAI, which has been widely used in clinical practice for patients with AS, to define disease activity [47]. Patients with higher BASDAI scores had significantly lower HSUVs. This indicates that more severe disease states are associated with significantly worse HSUVs, which is consistent with a previous study [5]. These findings suggest that early diagnosis and intervention in patients with AS are crucial.

Intriguingly, patients from developing countries have higher HSUVs than those from developed countries. Previous review studies on SF-36 have also shown lower HRQoL in patients from developed countries on all dimensions [6]. Meanwhile, the cross-country analysis of the EQ-5D-3L population norms surveys for 20 countries found that the EQ Visual Analogue Scale (EQ VAS) correlated well with a country’s Gross Domestic Product (GDP), although China and Thailand are outliers, with lower GDPs (along with relatively high EQ VAS scores) [48]. The definitive reason for the high HSUVs in patients from developing countries remains unclear. This discrepancy may be related to the cross-cultural applicability of the instrument, as studies have confirmed that populations in different countries have different perceptions of health [49, 50]. Finally, many factors may influence the measurement results of HSUVs (e.g., age, gender, sample source, comorbidities, and value sets), and future research is needed to explore the differences between developed and developing countries [51].

The EQ-5D has been the most commonly used instrument for measuring HSUVs in patients with AS, which reflects the concise nature of the instrument and the recommendations of national health technology assessment bodies [52, 53]. However, we found a low degree of correspondence between the HSUVs obtained with the different instruments, with HUI3 and AQoL yielding the lowest values. This phenomenon has been confirmed in previous studies and may be attributed mainly to the items of the descriptive system [54, 55]. There was only one direct measurement study, which used RS, whereas TTO is a commonly used and highly recommended method for eliciting HSUVs directly. Given the clear relationship between TTO and QALYs, its relative simplicity, stronger respondent preference, and greater consistency with the theoretical axioms of economic evaluation [56], future research could apply TTO in patients with AS.

A disease-specific instrument, the Assessment of SpondyloArthritis international Society Health Index (ASAS HI), has also been developed and validated [57]. Another study provided a generalized algorithm and six country-specific algorithms (UK, France, Germany, the Netherlands, Spain and Italy) to calculate HSUVs based on the ASAS HI [58]. The ASAS HI yielded a mean HSUV of 0.37 (SD 0.31) for patients with AS, a value considerably lower than those from the generic instruments, suggesting that the ASAS HI has relatively high sensitivity and is able to capture patients' symptoms and feelings accurately [58]. However, since the number of patients with AS was not reported in the study, it was not included in the HSUV estimation. This suggests that, in a cost-utility analysis, the ASAS HI would show greater gains than generic instruments for an intervention that improves AS symptoms. Future research is still needed to explore the correlation between the ASAS HI and generic instruments, as well as the impact of different sources of HSUVs on cost-utility analysis.

A "mapping" method has been developed that allows prediction of HSUVs from non-preference-based scores in patients with AS [59]. The study linked BASDAI, BASFI, age, sex, and disease duration to utility values to derive appropriate estimates of cost-effectiveness. However, we did not include "mapping" studies in our pooled analysis for two main reasons: (1) "mapping" is a suboptimal method for eliciting HSUVs [60]; (2) the number of mapping studies is limited and they use different elicitation methods, which may result in greater heterogeneity. Nevertheless, some economic evaluations have estimated the incremental cost per QALY from clinical efficacy studies [18, 19, 61], which indicates that "mapping" has important applications.

Strengths and limitations

This systematic review summarized quantified HSUVs for a large number of patients with AS worldwide using rigorous and reproducible methods. However, our study has limitations. Firstly, our review did not include non-English-language publications. Although we tried to identify all studies on the HSUVs of patients with AS, we may have omitted a small amount of relevant literature. Secondly, there was strong heterogeneity, and because this study used pooled study-level data rather than individual patient data, it was difficult to conduct a more comprehensive meta-regression analysis. Finally, although we evaluated the quality of the included studies and performed subgroup analyses and meta-regressions, missing data may still introduce bias into the synthesized results.

Conclusion

In our systematic review and meta-analysis of 42 studies, the pooled HSUV for patients with AS across all available studies was 0.62, with a high degree of heterogeneity across studies. HSUVs of patients with AS were substantially lower than population norms. The pooled EQ-5D-3L estimates were lower for studies published before 2012, for patients with high disease activity or long disease duration, and for studies from developed countries. Our findings contribute to the understanding of factors influencing HSUVs and may serve as a reference for future patient-reported HRQoL surveys, clinical trials, and economic evaluations.