Introduction

The Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) [1] is the most widely used tool to assess disease activity in ankylosing spondylitis (AS), although it has limitations: it reflects only the patient's perspective, and its responsiveness is questionable [2, 3]. A BASDAI score over 4 is considered indicative of active disease. Less subjective variables, such as C-reactive protein (CRP), have low sensitivity for detecting disease activity [4], and conventional radiographs can only show progression over years [5]. The Assessment of SpondyloArthritis international Society (ASAS) proposed the Ankylosing Spondylitis Disease Activity Score (ASDAS) [6, 7] as a new method for evaluating disease activity. ASDAS is a composite index that combines an acute phase reactant with patient-reported outcomes: it includes BASDAI questions 2 (back pain), 3 (peripheral pain/swelling), and 6 (duration of morning stiffness), plus CRP and the patient global assessment (PGA). Because it adds an objective parameter to the patient's assessment, ASDAS is expected to correlate more strongly than BASDAI with the physician global assessment (PhGA), whereas BASDAI should correlate more strongly than ASDAS with the PGA. The ASDAS cutoff points proposed by ASAS (<1.3 inactive disease, 1.3–2.1 moderate activity, 2.1–3.5 high activity, and >3.5 very high activity) differ from those obtained when the patient's minimally acceptable clinical status is used as the reference [8]. Few studies have compared the performance of ASDAS and BASDAI, which makes it difficult to decide which should be preferred in clinical practice, given the feasibility of each index and its usefulness for supporting therapeutic decisions. The objective of this study was to evaluate the performance of ASDAS for assessing disease activity in AS in comparison with BASDAI, by analyzing the psychometric properties of both indices with the global disease assessment as the external reference criterion.
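As a minimal sketch, the ASAS cutoffs listed above can be applied to an ASDAS score as follows. The text gives the ranges without saying which boundary is inclusive; the function assumes that 1.3 and 2.1 open a new category and that 3.5 still counts as high activity. The function name `asdas_category` is hypothetical.

```python
def asdas_category(score):
    """Map an ASDAS score to the ASAS disease-activity category quoted above."""
    if score < 0:
        raise ValueError("ASDAS scores are non-negative")
    if score < 1.3:
        return "inactive disease"
    if score < 2.1:           # assumed: 1.3 itself is already "moderate"
        return "moderate activity"
    if score <= 3.5:          # assumed: 3.5 itself is still "high"
        return "high activity"
    return "very high activity"


for s in (0.9, 1.3, 2.1, 3.5, 4.2):
    print(s, asdas_category(s))
```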
Specifically, ASDAS was expected to correlate more strongly with PhGA than with PGA, and BASDAI more strongly with PGA than with PhGA.

Methods

A prospective, longitudinal, observational study was carried out in 23 Spanish centers. Patients with a diagnosis of AS according to the modified New York criteria [9] and BASDAI >4 despite treatment with NSAIDs (according to the treatment guidelines of the Spanish Society of Rheumatology consensus [10]) were included. Patients were followed up for 12 months over four clinic visits (baseline, 4, 8, and 12 months). The study was approved by the Clinical Research Ethics Committees of all participating centers and was conducted in accordance with the principles of the Declaration of Helsinki. All patients gave informed consent to participate.

In addition to sociodemographic data and disease history, both collected at baseline, the following information was recorded at study visits: tender joint count (TJC) and swollen joint count (SJC); PGA and PhGA on visual analog scales (0–100, where 0 means no symptoms and 100 means maximum activity); overall, lumbar, and nocturnal pain during the last week; and the Patient Acceptable Symptom State (PASS) in the opinion of the physician and of the patient (physician-PASS and patient-PASS). CRP and ESR were determined, and ASDAS, BASDAI, BASFI (functional capacity) [11, 12], and BASMI (spinal mobility) [13] were obtained. For BASDAI, the version validated for the Spanish population was used [14]. The ASDAS score was calculated with the following formula [7]: ASDAS = 0.12 × back pain (BASDAI2) + 0.058 × duration of morning stiffness (BASDAI6) + 0.110 × patient global assessment + 0.073 × peripheral pain/inflammation (BASDAI3) + 0.579 × ln(CRP + 1).
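A minimal Python sketch of the formula above follows. It assumes that the four patient-reported items are entered on 0–10 scales (so a PGA recorded on the 0–100 VAS is divided by 10) and that CRP is in mg/L; these units are not stated in this section. The coefficients are copied from the text (ASAS publications give the back-pain coefficient as 0.121, which the text rounds to 0.12). The function name `asdas_crp` is hypothetical.

```python
import math


def asdas_crp(back_pain, morning_stiffness, patient_global, peripheral_pain, crp_mg_l):
    """ASDAS with the coefficients given in the formula above.

    Assumed inputs: BASDAI questions 2 (back pain), 6 (morning stiffness
    duration) and 3 (peripheral pain/inflammation) and the patient global
    assessment on 0-10 scales, and CRP in mg/L.
    """
    for value in (back_pain, morning_stiffness, patient_global, peripheral_pain):
        if not 0 <= value <= 10:
            raise ValueError("patient-reported items must be on a 0-10 scale")
    if crp_mg_l < 0:
        raise ValueError("CRP cannot be negative")
    return (0.12 * back_pain
            + 0.058 * morning_stiffness
            + 0.110 * patient_global
            + 0.073 * peripheral_pain
            + 0.579 * math.log(crp_mg_l + 1))  # natural logarithm


# Hypothetical patient: a PGA of 60 on the 0-100 VAS is entered as 6.0.
print(round(asdas_crp(6, 5, 60 / 10, 3, 10), 2))  # 3.28
```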

Statistical analysis

The sample was described using summary statistics. Analysis of psychometric properties requires a heterogeneous sample that includes patients with different levels of activity. Since all patients were active (BASDAI >4) at baseline, validity analyses were performed at the 12-month visit, when heterogeneity in disease activity was evident. Likewise, because the aim of the sensitivity-to-change analysis was to determine whether changes in BASDAI and ASDAS correlate with changes in the patients' clinical status, data from the baseline and 12-month visits were used for this analysis.

Agreement between BASDAI and ASDAS in classifying patients was evaluated with Cohen's kappa statistic.
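Cohen's kappa corrects the observed agreement for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). The sketch below, using only the Python standard library, computes it for two lists of category labels. The dichotomization (BASDAI >4 and ASDAS ≥2.1 counted as "active"), the function name, and the six scores are hypothetical illustrations, not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    # Chance agreement: product of the two marginal proportions, summed over categories.
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical scores for six patients.
basdai = [5.2, 3.1, 6.8, 2.0, 4.5, 3.9]
asdas = [2.8, 1.9, 3.6, 1.2, 1.8, 2.3]
active_basdai = [s > 4 for s in basdai]
active_asdas = [s >= 2.1 for s in asdas]  # assumed: 2.1 counts as active
print(round(cohen_kappa(active_basdai, active_asdas), 2))  # 0.33
```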

Construct validity is defined as the degree to which the scores of a measurement instrument are consistent with hypotheses regarding internal relationships, relationships with scores of other instruments measuring similar or dissimilar constructs, or differences in scores between groups. On this basis, construct validity was analyzed as convergent validity (correlation with similar constructs), divergent validity (correlation with different constructs), and discriminant validity (ability to differentiate between subgroups of patients with different levels of activity). Pearson's correlation coefficient was used with PhGA and PGA for convergent validity, and with BASMI (spinal mobility) and BASFI (functional capacity) for divergent validity. To evaluate whether, for each index, the correlation with the physician's global assessment (PhGA) differed significantly from the correlation with the patient's global assessment (PGA), the one-tailed Z test for correlated correlation coefficients proposed by Meng, Rosenthal and Rubin was used [15]. For discriminant validity, patients were divided into levels of activity according to criteria external to ASDAS (physician-PASS and patient-PASS, CRP, and disease duration), and differences were analyzed with Student's t tests for independent samples and effect size (Cohen's δ).
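The two statistics named above can be sketched with the Python standard library. `meng_z` follows the Meng, Rosenthal and Rubin formula for two correlations that share one variable (here, an index correlated with PGA and with PhGA); it needs the correlation between PGA and PhGA, which is not reported in this section, so the value used below is an assumption. `cohens_d` uses the pooled standard deviation of two independent groups. Function names and example values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def meng_z(r1, r2, r12, n):
    """Meng-Rosenthal-Rubin Z and one-tailed p for H1: r1 > r2.

    r1:  correlation of the index with the first criterion (e.g. PGA)
    r2:  correlation of the same index with the second criterion (e.g. PhGA)
    r12: correlation between the two criteria (e.g. PGA vs PhGA)
    n:   number of patients with complete data
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher z-transforms
    r2_bar = (r1 ** 2 + r2 ** 2) / 2                # mean squared correlation
    f = min((1 - r12) / (2 * (1 - r2_bar)), 1.0)    # f is capped at 1
    h = (1 - f * r2_bar) / (1 - r2_bar)
    z = (z1 - z2) * math.sqrt((n - 3) / (2 * (1 - r12) * h))
    p_one_tailed = 1 - NormalDist().cdf(z)
    return z, p_one_tailed


def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd


# Hypothetical example: r = 0.70 vs 0.57 (the values reported for ASDAS),
# with an assumed PGA-PhGA correlation of 0.60 and n = 102.
z, p = meng_z(0.70, 0.57, 0.60, 102)
print(f"Z = {z:.2f}, one-tailed p = {p:.3f}")
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

Because the PGA-PhGA correlation is assumed, the Z and p printed here are illustrative and are not expected to reproduce the p values reported in the Results.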

Criterion validity was analyzed with receiver operating characteristic (ROC) curves, which were used to establish the ASDAS and BASDAI cutoffs with the best performance for detecting disease activity as defined by two external criteria: the global opinion of the physician and of the patient (physician-PASS and patient-PASS). Sensitivity and specificity were calculated for each cutoff, together with the area under the curve (AUC).
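The ROC quantities named above can be sketched without external libraries. The sketch treats "score ≥ cutoff" as a positive test for activity (patient not in an acceptable symptom state), computes the AUC as the probability that an active patient scores higher than a non-active one (the Mann-Whitney formulation, with ties counted as one half), and picks the cutoff that maximizes Youden's index (sensitivity + specificity − 1). Youden's index is an assumption: the Methods say only that the cutoffs with the best performance were chosen. Function names and data are hypothetical.

```python
def roc_points(scores, active):
    """(cutoff, sensitivity, specificity) for the rule 'score >= cutoff means active'."""
    n_pos = sum(active)
    n_neg = len(active) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both active and non-active patients")
    points = []
    for cut in sorted(set(scores)):
        tp = sum(1 for s, a in zip(scores, active) if a and s >= cut)
        tn = sum(1 for s, a in zip(scores, active) if not a and s < cut)
        points.append((cut, tp / n_pos, tn / n_neg))
    return points


def auc(scores, active):
    """Probability that an active patient outscores a non-active one (ties = 1/2)."""
    pos = [s for s, a in zip(scores, active) if a]
    neg = [s for s, a in zip(scores, active) if not a]
    if not pos or not neg:
        raise ValueError("need both active and non-active patients")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, active):
    """Cutoff maximizing sensitivity + specificity - 1 (assumed criterion)."""
    return max(roc_points(scores, active), key=lambda t: t[1] + t[2] - 1)


# Hypothetical ASDAS scores; True = not in an acceptable symptom state.
scores = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
active = [False, False, True, False, True, True]
print(round(auc(scores, active), 3))  # 0.889
print(roc_points(scores, active))
```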

In patients whose clinical status (according to physician-PASS and patient-PASS) changed between baseline and 12 months, the sensitivity to change of both indexes was analyzed by effect size (Cohen's δ) [16].
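Effect size for change can be computed in more than one way, and this section does not state which denominator was used. The sketch below assumes the common convention of mean change divided by the standard deviation of baseline scores, and shows the standardized response mean (mean change divided by the SD of the change scores) alongside it to illustrate how much the denominator matters. Function names and scores are hypothetical.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Improvement per patient (a fall in BASDAI/ASDAS gives a positive value)."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    return [b - f for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores (assumed convention)."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


baseline = [6.0, 7.0, 8.0]   # hypothetical BASDAI at baseline
month_12 = [3.0, 4.0, 4.0]   # hypothetical BASDAI at 12 months
print(round(effect_size(baseline, month_12), 2))                 # 3.33
print(round(standardized_response_mean(baseline, month_12), 2))  # 5.77
```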

Statistical significance was set at p < 0.05 and all analyses were performed in SPSS statistical software version 21.0.

Results

The sample consisted of 127 patients, mostly men (75.6%), with a mean age of 45.8 ± 12.6 years and a mean disease duration of 11.6 ± 10.9 years. The duration of treatment before study entry was 52.4 ± 72.6 months. Owing to loss to follow-up and non-compliance with visits, data were available for 127, 114, 108, and 102 patients at baseline and at the 4-, 8-, and 12-month visits, respectively. Attrition at 12 months was 19.7%.

Table 1 shows the main clinical findings at baseline and at 12 months. Baseline values of acute phase reactants (CRP and ESR), as well as BASDAI and ASDAS scores, indicate a sample of patients with inflammatory activity (selection criterion BASDAI >4) despite low joint counts. At baseline, BASMI and BASFI values ranged from 0.4 to 8.2 and from 0.8 to 9.7, respectively. These findings show changes in the classification of AS activity according to the previously established criteria and cutoff points. The ASDAS cutoff point with the best agreement with BASDAI was 2.1 (overall agreement 82.9%, kappa 0.65), and the correlation coefficient between the two indexes was r = 0.72 (Table 2).

Table 1 Clinical and analytical characteristics of the sample
Table 2 Agreement between BASDAI and ASDAS scores

Table 3 shows the convergent, divergent, and discriminant construct validity analyses performed at the 12-month visit. For convergent validity, BASDAI correlated more strongly with PGA than with PhGA (0.76 vs 0.67), which supports one of the study hypotheses. However, contrary to what was predicted, ASDAS also correlated more strongly with PGA than with PhGA (0.70 vs 0.57). According to the one-tailed Z test of Meng, Rosenthal and Rubin [15], these differences were significant for ASDAS (p = 0.016) and at the limit of significance for BASDAI (p = 0.050). For divergent validity, ASDAS showed moderate correlations with BASMI (r = 0.55) and BASFI (r = 0.65), consistent with these instruments measuring different constructs (spinal mobility and functional capacity, respectively). Finally, in the discriminant validity analysis, both indexes discriminated patients with an acceptable symptom state according to the physician and patient PASS criteria (means significantly different by Student's t test), with a larger effect size for BASDAI than for ASDAS (Cohen's δ 1.72 vs 0.88 for physician's PASS, and 1.57 vs 1.12 for patient's PASS). Likewise, BASDAI discriminated better than ASDAS between patients above and below the median PhGA (Cohen's δ 1.49 vs 1.28). In contrast, ASDAS showed a larger effect size in differentiating patients according to the median CRP and disease duration, although for disease duration the means of both indexes were not significantly different (Table 3).

Table 3 Construct validity: convergent, divergent and discriminant validity

The results of the sensitivity-to-change analysis, measured as the score difference between baseline and the 12-month visit with its 95% confidence interval and effect size (Cohen's δ), are presented in Table 4. Both ASDAS and BASDAI were sensitive to change in patients who moved from an unacceptable to an acceptable symptom state according to the PASS criteria (physician and patient) and the BASDAI50 response criterion, since in all cases Cohen's δ was higher than 0.8 (considered a large effect); that is, changes in both indexes between baseline and 12 months were related to changes in disease activity. However, the effect size was greater for BASDAI than for ASDAS, with δ of 2.01 for the physician's PASS and of 2.13 for the patient's PASS.

Table 4 Sensitivity to change

Finally, the criterion validity of BASDAI was higher than that of ASDAS, both when the patient's PASS (AUC 0.852 vs 0.794) and when the physician's PASS (AUC 0.900 vs 0.790) was used as the external criterion. The difference was statistically significant only for the physician's PASS, since the AUC of each index lay outside the confidence interval of the other (Table 5, Fig. 1).
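The decision rule used above (each AUC lying outside the other index's confidence interval) can be written as a short check. This is a sketch only: the function name is hypothetical, and the confidence intervals in the example are placeholders, not the values in Table 5.

```python
def aucs_outside_each_others_ci(auc_a, ci_a, auc_b, ci_b):
    """True when each AUC lies outside the other index's confidence interval."""
    lo_a, hi_a = ci_a
    lo_b, hi_b = ci_b
    a_outside_b = not (lo_b <= auc_a <= hi_b)
    b_outside_a = not (lo_a <= auc_b <= hi_a)
    return a_outside_b and b_outside_a


# AUCs from the text (BASDAI first); the confidence intervals are hypothetical.
print(aucs_outside_each_others_ci(0.900, (0.84, 0.96), 0.790, (0.70, 0.88)))  # True
print(aucs_outside_each_others_ci(0.852, (0.78, 0.92), 0.794, (0.71, 0.87)))  # False
```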

Table 5 Criterion validity of ASDAS and BASDAI
Fig. 1

Receiver operating characteristic (ROC) curves of ASDAS and BASDAI for predicting the Patient Acceptable Symptom State (PASS) classification from the patient's and the rheumatologist's perspectives

Discussion

In recent years, different outcome measures have been proposed for spondyloarthritis, and in particular for AS. BASDAI, published in 1994 [1], measures disease activity from the patient's perspective. The subjective nature of this scale created a need for more objective tools. Accordingly, in developing ASDAS, the ASAS group attempted to address the different goals of patients and physicians [6, 17] by constructing a composite instrument that includes not only the patient's opinion but also an objective measure (CRP), and that allows disease activity to be measured at any given time point. This instrument has shown adequate psychometric properties for measuring disease activity, and some studies have confirmed that ASDAS is sensitive to treatment-related changes [18,19,20]. In this study, the performance of BASDAI and ASDAS was compared by analyzing their main psychometric properties.

Patients and professionals differ in how useful they find objective and subjective measures and in the importance they attach to them. A study of 203 patients with AS according to the New York criteria found that patients perceive disease activity differently from physicians: patients ranked pain and movement limitation first and variables related to disease activity second. In contrast, physicians considered the most relevant data for assessing disease activity to be variables related to inflammation, severity, their own assessments, and acute phase reactants, but not the patient's perception [21].

The fact that both BASDAI and ASDAS correlate better with PGA than with PhGA could be explained by the inclusion of items related to the patient’s opinion (which they both share) and suggests that their performance is similar in patients with AS. This result has already been observed in individuals with early forms of spondyloarthritis [22].

The high correlation observed between ASDAS and BASDAI, together with their weaker association with other instruments such as BASMI and BASFI, supports adequate convergent and divergent construct validity. That is, the convergent validity of ASDAS and BASDAI indicates that both indexes measure the same construct (disease activity), whereas the divergent validity with BASMI and BASFI shows that these instruments evaluate different constructs, namely spinal mobility and functional capacity, respectively.

ASDAS, and especially BASDAI, showed the ability to differentiate between groups of patients defined by PASS status, the median of the physician's and patient's assessments, and the median (50th percentile) of CRP (for CRP, differences were observed only for ASDAS). These results should be interpreted with caution, since three BASDAI items are also components of ASDAS. The discriminant capacity of ASDAS has also been observed in studies of patients with early spondyloarthritis [22], non-radiographic axial spondyloarthritis [23], and psoriatic arthritis [24], although those studies used different classification criteria and their results were more favorable to BASDAI.

Finally, the analysis of sensitivity to change showed better results for BASDAI than for ASDAS, measured by the effect size.

The most important limitation of this study may be the inclusion of only patients with active disease (BASDAI >4), which makes it difficult to study psychometric properties at baseline. To minimize possible selection bias, the different dimensions of validity were analyzed at the last study visit (12 months), when, owing to the effect of treatment, patients were at different levels of activity and the sample was heterogeneous enough to evaluate validity. Because the sensitivity analysis required comparing changes in the patients' clinical status with changes in the indexes under study over the treatment period, data from both baseline and 12 months were used for that analysis. A second limitation is the absence of a formal test to compare ROC curves; this was addressed by examining whether the AUC confidence intervals overlapped.

In summary, ASDAS shows adequate performance for assessing disease activity in patients with AS; however, its psychometric properties offer no advantage over BASDAI (BASDAI showed greater criterion validity, sensitivity to change, and discriminant capacity), and it requires laboratory tests, so replacing BASDAI with ASDAS does not seem justified.