Mannion, Anne F.; Mariaux, Francine; Reitmeir, Raluca; Fekete, Tamas F.; Haschtmann, Daniel; Loibl, Markus; Jeszenszky, Dezsö; Kleinstück, Frank S.; Porchet, François; Elfering, Achim

doi:10.1007/s00586-020-06462-z

Development of the "Core Yellow Flags Index" (CYFI) as a brief instrument for the assessment of key psychological factors in patients undergoing spine surgery

Original Article
Published: 16 June 2020

Volume 29, pages 1935–1952, (2020)

European Spine Journal

Anne F. Mannion ORCID: orcid.org/0000-0002-1203-1096¹,
Francine Mariaux¹,
Raluca Reitmeir²,
Tamas F. Fekete²,
Daniel Haschtmann²,
Markus Loibl²,
Dezsö Jeszenszky²,
Frank S. Kleinstück²,
François Porchet² &
…
Achim Elfering³

Abstract

Background

Depression, anxiety, catastrophising, and fear-avoidance beliefs are key "yellow flags" (YFs) that predict a poor outcome in back patients. Most surgeons acknowledge the importance of YFs but have difficulty assessing them due to the complexity of the instruments used for their measurement and time constraints during consultations. We performed a secondary analysis of existing questionnaire data to develop a brief tool to enable the systematic evaluation of YFs and then tested it in clinical practice.

Methods

The following questionnaire datasets were available from a total of 932 secondary/tertiary care patients (61 ± 16 years; 51% female): pain catastrophising (N = 347); ZUNG depression (N = 453); Hospital Anxiety and Depression Scale (anxiety subscale) (N = 308); fear-avoidance beliefs (N = 761). The single item that best represented the full-scale score was identified, to form the 4-item "Core Yellow Flags Index" (CYFI). 2422 patients (64 ± 16 years; 54% female) completed CYFI and a Core Outcome Measures Index (COMI) before lumbar spine surgery, and a COMI 3 and 12 months later (FU).

Results

The item–total correlation for each item with its full-length questionnaire was: 0.77 (catastrophising), 0.67 (depression), 0.69 (anxiety), 0.68 (fear-avoidance beliefs). Cronbach's α for the CYFI was 0.79. Structural equation modelling showed CYFI uniquely explained variance (p < 0.001) in COMI at both the 3- and 12-month FUs (β = 0.11 (women), 0.24 (men); and β = 0.13 (women), β = 0.14 (men), respectively).

Conclusion

The 4-item CYFI proved to be a simple, practicable tool for routinely assessing key psychological attributes in spine surgery patients and made a relevant contribution in predicting postoperative outcome. CYFI's items were similar to those in the "STarT Back screening tool" used in primary care to triage patients into treatment pathways, further substantiating its validity. Wider use of CYFI may help improve the accuracy of predictive models derived using spine registry data.

Introduction

"Yellow flags" are psychological factors and maladaptive beliefs that act as risk factors for persistent pain and prolonged disability in relation to musculoskeletal symptoms [1, 2]. They concern the features that affect how a person manages their situation with regard to their thoughts, feelings and behaviours. The flags are not a diagnosis or a symptom, but an indication that someone may not recover as expected. Some studies have shown that the presence of yellow flags, such as psychological distress/depression [3,4,5], fear-avoidance beliefs [6, 7] and anxiety [8], also increases the likelihood of a poor outcome after spine surgery. For this reason, such risk factors may influence clinicians' perceptions of the suitability of a patient for a surgical intervention [9] or their opinion of the "appropriateness of surgery" in individual cases [10]. However, even if spine surgeons are cognisant of the flag concept and its importance, many have difficulty detecting yellow flags during the consultation [11] and they rarely formally screen for them [9]. This may be the result of the length and scoring complexity of the current instruments, time constraints in routine consultations, or the perception of not being specifically trained to manage psychosocial attributes identified by such tests [12]. While established self-report instruments exist to evaluate most of the yellow flag constructs of interest, lengthy questionnaires are not suitable for use in the routine clinical setting, where the compliance/involvement of all patients is desired and brevity is of the essence. Further, although brief yellow flag screening instruments have been developed for use in primary care [13, 14] or outpatient physiotherapy [15], these may not be appropriate for use in surgical patients, who appear to be a distinct group with respect to their psychological status pre-treatment [16].

The aim of this study was to create a new, brief tool to routinely assess the yellow flag status of patients being considered for spine surgery, and to evaluate its predictive validity in relation to the outcome of surgery.

Methods

The development of the “yellow-flag” tool followed two phases, as summarised below (details regarding the specific questionnaires and the statistical procedures used are given later, in the respective sections).

Phase 1: strategy to select the “yellow-flag” single items

The multidimensional Core Outcome Measures Index (COMI) [17, 18] comprises single items covering the key outcome domains in patients with spinal disorders and has become a useful tool in the routine evaluation of patient outcome. In accordance with the philosophy behind the COMI of keeping responder burden to a minimum, we sought to develop a complementary set of single-item measures with standardised 5-point response options to assess four of the "core" yellow flags (depression, anxiety, catastrophising and fear-avoidance) [3, 7, 8, 13, 19]. Our previous outcome studies in patients with spinal disorders have provided us with many large datasets containing patients' individual item scores for full-length, established questionnaires addressing these four yellow flags. Table 1 gives a description of the patient samples and the references to the original studies from which the data were taken. The data were derived from a total sample of 932 patients (61 ± 16 years; 51% female; 64% surgical) presenting with spinal problems in secondary or tertiary care. Not all patients had completed each questionnaire, depending on the study they were involved in (see Table 1).

Table 1 Data sources for the secondary analyses used to identify core yellow-flag items for the CYFI

We carried out a secondary analysis of these datasets to select the item that in each case best represented the corresponding full questionnaire while also making sense as a stand-alone question for inclusion in a short set of yellow flag questions, to be coined the "Core Yellow Flags Index" (CYFI). Item quality was assessed using the criteria developed by Stanton et al. [20]. Final judgements about the clinical importance of the best single items for the four instruments were made by an expert group comprising spine surgeons, a methodologist and researchers in the field of spine outcome measures.

Phase 2: test of factor structure and prognostic validity of the four yellow-flag items

In a second phase, we tested the factor structure and prognostic validity of the CYFI using new clinical data collected from May 2015 to Apr 2018. A total of N = 3344 patients undergoing surgery of the thoracolumbar spine were asked to complete the CYFI and the COMI, preoperatively, and the COMI at 3- and 12-month follow-up (FU). Questionnaires were completed preoperatively by 2971 (89%) patients, and at 3-month and 12-month FU by 2940 (88%) and 2738 (82%), respectively. A total of 2422 (73%) patients (64.4 ± 15.8 years; 54% female) completed all questionnaires at all three time-points (baseline and both follow-ups). The "Main Pathology" as documented on the Spine Tango surgery form (v.2011; https://www.eurospine.org/forms.htm) was degenerative disease in 1963 (81%) patients, repeat surgery in 194 (8%) and various other pathologies (such as non-degenerative deformity or spondylolisthesis, fracture or trauma, inflammation, infection, tumour, other) in the remaining 265 (11%) patients.

The test–retest reliability of CYFI was assessed in a subgroup of 56 patients (66.3 ± 13.4 years; 55% female) who completed the questionnaire on two occasions preoperatively, 5 ± 9 days apart.

Questionnaires

The questionnaires used to identify the single item yellow flags included:

the 6-item catastrophising sub-scale of the Coping Strategies Questionnaire (CSQ) [21], or the Pain Catastrophising Scale (PCS) [22, 23]
the ZUNG Self-rated Depression questionnaire [24]
the Hospital Anxiety and Depression Scale (HADS) Anxiety subscale [25, 26]
the physical activity sub-scale of the Fear-Avoidance Beliefs Questionnaire (FABQ) [6, 27], to assess beliefs about activity being a cause of the patient's back trouble and fears about the dangers of such activities when experiencing an episode of low back pain.

The questionnaires used to assess the concurrent validity of the single item yellow flags included:

Visual Analogue Scale (VAS) or graphic/numeric rating scale (GRS/NRS) to measure representative (back or leg) spine-problem-related pain in the last week [28]
Roland and Morris questionnaire (RMQ), a 24-item questionnaire that assesses disability due to low back pain in relation to various daily functions/activities [29, 30].

The longitudinal validity of the single item flags was evaluated in relation to the COMI.

The COMI is a 7-item instrument scored 0–10 and comprises questions covering the domains: pain intensity (axial and peripheral, measured separately); function; symptom specific well-being; general quality of life; and social and work disability [17, 31].

All the questionnaires were either originally developed in German or had been adapted and validated for the German language prior to their use in the studies listed in Table 1.

Statistical analyses

Phase 1

Items were favoured for CYFI that: (a) showed a high corrected item–total correlation, i.e. the value of the item corresponded closely to the total scale score without the respective item, indicating the representativeness of the item score for the total scale and its adequacy in representing the construct as a single item; (b) did not display large floor or ceiling effects (i.e. high proportions of scores representing the lowest or highest score possible), that might otherwise indicate a lack of discriminative function, and (c) in Spearman rank correlation analyses, had a meaningful relationship with pain intensity and disability, the clinical outcome measures that have previously been shown to correlate with yellow flag items.
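As an illustration of criteria (a) and (b), the following minimal Python sketch computes a corrected item–total correlation and floor/ceiling percentages for one item. It is not the SPSS procedure used in the study: the function names, the use of a Pearson coefficient for the item–rest correlation, and the toy responses are all assumptions made purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(responses, item):
    """Correlate one item with the sum of all OTHER items of the scale.

    responses: list of per-patient lists of item scores.
    item: column index of the item under evaluation.
    """
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)


def floor_ceiling(responses, item, lowest, highest):
    """Percentage of patients scoring the lowest and the highest possible value."""
    scores = [row[item] for row in responses]
    n = len(scores)
    floor_pct = 100.0 * sum(s == lowest for s in scores) / n
    ceiling_pct = 100.0 * sum(s == highest for s in scores) / n
    return floor_pct, ceiling_pct


# Made-up responses from six patients to a three-item scale scored 0-4
# (hypothetical data, not taken from the study).
toy = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 4],
    [4, 4, 3],
    [0, 0, 1],
]
r_item0 = corrected_item_total(toy, item=0)
floor_pct, ceiling_pct = floor_ceiling(toy, item=0, lowest=0, highest=4)
print(round(r_item0, 2), round(floor_pct, 1), round(ceiling_pct, 1))
```

Removing the item from the total before correlating avoids inflating the coefficient through the item's correlation with itself, which matters most for short scales.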

Phase 2

The new sample of data from 2422 surgical patients was analysed using structural equation modelling (SEM). Confirmatory factor analysis was carried out on the preoperative CYFI data, to examine whether the single items corresponded to a single yellow-flag factor, i.e. had a one-dimensional factorial structure with high item loadings on a common factor. Cronbach’s alpha was used to assess the internal consistency of the CYFI (≥ 0.70 considered good, [32]).
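For readers unfamiliar with the internal consistency statistic, Cronbach's alpha can be sketched in a few lines of Python using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the sum score). The function name and the toy data below are assumptions for illustration only; the study itself used SPSS.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-patient lists of item scores."""
    k = len(responses[0])  # number of items in the scale
    # Sample variance of each item across patients.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the unweighted sum score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses from six patients to four items scored 0-4
# (hypothetical data, not taken from the study).
toy = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 4],
    [0, 0, 1, 0],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))
```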

The hypothesis involving longitudinal data (i.e. that CYFI would add to the prediction of follow-up COMI scores, over and above baseline COMI scores) was tested using SEM by examining the longitudinal directional paths between CYFI at baseline and COMI scores at follow-up, controlling for age and spinal pathology; this was entitled the "prospective risk path". We estimated risk paths separately for men and women because the prevalence of yellow flags seems to differ between men and women and because an initial model that constrained the risk paths to be equal for both sexes fitted the empirical data worse than a model that allowed the risk paths to differ. Path coefficients were considered small (0.10), moderate (0.30) and large (0.50) in relation to the effect size classification of Cohen [33].

The reproducibility of single yellow-flag item scores was tested using quadratic weighted Kappas and that of the whole CYFI score was tested with intraclass correlation coefficients (ICC) (in each case, ≥ 0.60 is considered substantial [34]).
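The quadratic weighted kappa used for the single items can be illustrated with the sketch below. It assumes that responses are coded as integers 0 to k − 1 (for a 5-point item, 0 to 4) and uses the usual disagreement weights w = (i − j)² / (k − 1)²; the function name, the coding and the toy ratings are hypothetical and not taken from the study.

```python
def quadratic_weighted_kappa(first, second, n_categories):
    """Quadratic weighted kappa between two sets of ordinal ratings.

    first, second: ratings from occasion 1 and occasion 2, coded 0..n_categories-1.
    """
    k = n_categories
    n = len(first)
    # Observed cross-tabulation of occasion 1 (rows) against occasion 2 (columns).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two occasions.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Made-up test and retest responses of six patients to one 5-point item
# (hypothetical data, not taken from the study).
test_scores = [0, 1, 2, 3, 4, 2]
retest_scores = [0, 2, 2, 3, 3, 1]
kappa = quadratic_weighted_kappa(test_scores, retest_scores, n_categories=5)
print(round(kappa, 2))
```

Because the weights grow with the squared distance between categories, a one-point shift between occasions is penalised far less than a shift from one end of the scale to the other.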

The analyses were performed using IBM SPSS (IBM SPSS Statistics for Windows, Version 20.0. Armonk, NY: IBM Corp), and AMOS 18.0 software (for the confirmatory factor analysis (CFA) and prospective risk path analyses).

Results

Selection of best yellow-flag single items representing their scales (phase 1)

For PCS, two datasets were available (studies B and D) and for the CSQ catastrophising subscale, one (E) (Table 2). The item "It's terrible and I think it's never going to get any better" (present in both CSQ and PCS) proved to be the best for representing catastrophising. It showed the most consistently high corrected item–total correlations for all studies (0.75, 0.80, and 0.66 for B, D and E, respectively). Compared with most other items of the PCS, floor effects were in the mid-range (33.6%, 39.4% and 33.5%, respectively); there were a few items with lower floor effects, but these were poor in other item characteristics. The chosen item had consistent correlations with pain (0.31, 0.20 and 0.33, respectively) and with the RMQ score (0.52, 0.37 and 0.21, respectively). Finally, the item was verified in the expert group to be one of the best items to represent the pain catastrophising construct as a "stand-alone" item.

Table 2 Results of the statistical analyses to identify the best item representing the domain catastrophising. Item 3, "It's terrible and I think it's never going to get any better" (highlighted in bold), was chosen as the best

Table 3 Results of the statistical analyses to identify the best item representing the domain depression. Item 1, "I feel down-hearted, blue and sad" (highlighted in bold), was chosen as the best

Table 4 Results of the statistical analyses to identify the best item representing the domain anxiety. Item 5, "Worrying thoughts go through my mind" (highlighted in bold), was chosen as the best

Table 5 Results of the statistical analyses to identify the best item representing the domain fear-avoidance beliefs. Item 2, "Physical activity might harm my back." (highlighted in bold), was chosen as the best

The ZUNG Depression scale consists of 20 items. For this construct, data from 3 independent samples were analysed (studies A, B and D) (Table 3). The best stand-alone item for the depression scale was found to be “I feel down-hearted, blue and sad”. The item represents the construct very well (corrected item–total correlations in the three samples were 0.67, 0.69 and 0.66, respectively). Floor effects were large (30.6%, 53.0% and 46.7%), but compared with most other items of the ZUNG they were in the mid-range. Correlations with pain in the last week were relatively low but consistent (0.14, 0.19 and 0.17, respectively), whereas those with Roland–Morris disability scores were moderately high and also consistent (0.30, 0.41 and 0.37, respectively). In addition, the item was verified in the expert group to be the most useful stand-alone item for representing the depression construct. Item 20 also showed good item quality in sample A, though less good in B and D, but we considered it unclear whether “not enjoying the things I used to enjoy” might be reflecting the lack of pleasure due to physical pain rather than the depressed mood.

The anxiety subscale of the Hospital Anxiety and Depression Scale (HADS) consists of 7 items, and data from one study (C) were analysed to identify the best fitting single item (Table 4). The item that performed best was item 5 “Worrying thoughts go through my mind”. The item showed the highest corrected item–total correlation of all items in the scale (0.69), confirming that it represented the total anxiety score very well. Floor effects were large (52.3%), but about in the mid-range of values for all the seven items (32–76%). The correlation between this item and pain in the last week was the second highest of all the seven items (0.19), and its correlation with disability was third highest (0.22, with the highest correlation being 0.30). Item 1 “I feel tense or ‘wound up’” also showed good item quality, but it was felt the colloquialism "being wound up" may have made it unsuitable for use as a stand-alone item, and perhaps caused difficulties with later translations into other languages. Hence, with item 5 (“worrying thoughts…”) having the highest item–total correlation, and wording suitable for a stand-alone item, the experts rated this as the best to represent anxiety.

The physical activity subscale of the Fear-Avoidance Beliefs Questionnaire comprises four items, and data were available from four data-sets (studies A, B, C, and D) (Table 5). The item “Physical activity might harm my back” was chosen as the best. It was not “the best” in any of the criteria, but it was always good and more consistently good across the four samples than were other items (respectively, corrected item–total correlation: 0.75, 0.66, 0.62, 0.61; floor effects: 20.6%, 16.1%, 22.7%, 9.0%; correlation with pain: 0.17, 0.23, 0.29, 0.19; correlation with disability: 0.40, 0.45, 0.45, 0.37). Experts rated the item as the best and most credible as a stand-alone item in representing the FABQ-Activity subscale.

The final wording of the CYFI items in English and other languages (official national languages or native languages commonly spoken by patients attending the authors' Spine Center, for which published versions of the full-length questionnaires were available) is shown in Table 6.

Table 6 German, English, French, Italian, Spanish, Portuguese and Hungarian versions of the CYFI (see footnote for further details)

Test of factor structure and prognostic validity of the four yellow-flag items (phase 2)

Confirmatory factor analysis showed that the 4 yellow flag items represented a common latent construct (CYFI), with age and pathology being controlled for, and with the 4 CYFI-item loadings on the common CYFI factor being constrained to be the same for men and women (RMSEA = 0.05, CFI = 0.96, χ² (19) = 141.60, χ²/df = 7.45). Cronbach's alpha for the four yellow-flag items was 0.79, showing good internal consistency.

The test of prognostic validity for CYFI included a structural equation model with CYFI predicting COMI at 3-month follow-up and 12-month follow-up while controlling for preoperative COMI and pathology (Fig. 1). On a cross-sectional basis, preoperative CYFI and COMI scores were highly correlated (Fig. 1: β = 0.52 for men, β = 0.42 for women; each p < 0.001). CYFI explained a significant proportion of the variance in COMI at 3-month FU (β = 0.24, approximately 8% variance explained in men and β = 0.11, approximately 2% variance in women, p < 0.001; Fig. 1), i.e. CYFI contributed to a small but significant extent to explaining the treatment effect. The stability between COMI at baseline and COMI at 3-month FU was low—due to the treatment—with β = 0.15 in men, β = 0.20 in women (Fig. 1). The stability between COMI at 3-month FU and COMI at 12-month FU was high (β = 0.61 in men, β = 0.55 in women, p < 0.001; Fig. 1). Nonetheless, CYFI added significantly and independently to the prediction of COMI at 12-month FU (β = 0.14 in men, approx. 4% variance explained, p < 0.001; β = 0.13 in women, approx. 3% variance explained, p < 0.001; Fig. 1) and explained variation in the COMI at 12-month FU that was not explained by individual differences in COMI existing at either baseline or 3-month FU. The fit of the model was good (RMSEA = 0.04, CFI = 0.97, χ² (39) = 216.92, χ²/df = 5.56).

Test–retest reliability was 0.60–0.76 for the individual CYFI items (quadratic weighted kappa) and 0.72 (95% CI 0.58–0.86) for the CYFI whole score (ICC).

Discussion

Our study showed that the newly developed 4-item CYFI constitutes a simple, practicable, reliable and valid tool for routinely assessing key psychological attributes in patients undergoing treatment for spinal disorders in tertiary care. The brevity of the CYFI should make it a useful addition to the brief COMI in the self-assessment of baseline status before surgery. It may be used by clinicians to orientate themselves with regard to the yellow flag status of their patients, and its data may be able to strengthen the existing predictor models of surgical outcome.

A number of brief tools exist to assess yellow flags, but these have focused on chronic LBP patients in primary care, occupational health or physical therapy settings [13,14,15, 35]. Several factors provided the impetus for us to create a new tool designed to be used with surgical patients. Patients in tertiary care are intrinsically different from those in primary care, in terms of both their symptom severity and degree of psychological disturbance [16]. In creating our own tool, we wished to use, as a basis, questionnaires that had previously been used with patients in secondary and tertiary care study settings. We also wanted to select items from questionnaires that were available in our 3 national languages (German, French and Italian) as well as English and other languages spoken in our country for which a version of COMI exists (see Table 6). Further, rather than employing a binary response option (yes/no to whether the statement applies), as used for example in the STarT Back, we wanted to offer a 5-point graded scale that would be consistent with the items in the COMI. Nonetheless, in considering the final items for inclusion in our tool, we attempted to align with the STarT Back, where feasible and supported by the item-quality analyses. The STarT Back items did not all come from the same full-length questionnaires as used in the present study: they were the same for anxiety (i.e. HADS) and catastrophising (i.e. PCS), and the same two items were considered to be most representative of these domains in both studies. The depression item in the STarT Back (“in general, I have not enjoyed all the things I used to enjoy”) came from the Patient Health Questionnaire (PHQ-2) rather than the ZUNG. The ZUNG contains a similar item (item 20) and, although it showed good item quality in our sample A, it was not consistently good for samples B and D (Table 3).
Moreover, when presented as a stand-alone item, we considered that "not enjoying the things I used to enjoy" was too unspecific as a depression item, liable to inadvertently capture the impact of pain on the enjoyment of activities rather than the mental state of being depressed and losing interest (especially in surgical patients with their higher pain levels). The fear item in the STarT Back (“not safe for a person with a condition like mine to be physically active”) originates from the Tampa Kinesiophobia questionnaire and could perhaps be considered a more unwieldy way of saying "Physical activity might harm my back" (our chosen FABQ item), albeit with some ambiguity in the interpretation of the word “safe”. Rasch analyses have previously identified this Tampa Kinesiophobia item as being psychometrically poor [36] and showing differential item functioning with respect to gender [37]. Interestingly, recent qualitative analyses performed by the STarT Back group revealed that the STarT Back depression and fear items were considered “cumbersome” by both patients and general practitioners alike [38]. This substantiates our aforementioned misgivings about these two items. Despite the above differences, test–retest reliabilities were similar for the two tools: the quadratic weighted kappa for the psychosocial subscale of the STarT Back completed by all 53 patients studied was 0.69 (0.51–0.81) and, for 23 of their patients reporting stable symptoms, 0.76 (0.52–0.89) [13]; for the CYFI, the corresponding value was 0.72 (95% CI 0.58–0.86).

Identifying a need to include a yellow flag measure in the baseline assessment of back pain patients, Cedraschi et al. [35] added two yellow flag questions to the COMI, to assess depression and anxiety. The wording was created by the authors, rather than being extracted from established questionnaires, and simply enquired “how much did you feel anxious?” and “how much did you feel depressed?”, with a list of 5–6 thoughts and feelings being provided for each question as examples of what it might mean to feel anxious or depressed. Such "double/multiple-barrelled" (or compound) questions that enquire about many feelings/thoughts within one and the same question can pose difficulties, since respondents wishing to endorse only one of the options might be confused how to answer [32]. Moreover, the predictive validity of their flag items in relation to outcome was not evaluated. It was suggested that the items be incorporated into the existing COMI to provide a modified-COMI with a psychological dimension, by taking the higher of the two scores (anxiety or depression) and averaging it with the remaining COMI item scores. We see numerous problems with this. Firstly, it would cause confusion with respect to the scoring of the COMI as an outcome instrument and would render incomparable the scores from studies with and without the flag questions. Secondly, the psychological items do not constitute key outcomes for many spinal disorders; they may be important predictors or screening items, but they are not “core outcomes” [39], which means inclusion of their scores in the overall COMI score would likely reduce the responsiveness of the instrument (as was seen in [35]). For the CYFI, our recommendation is to view it as an independent tool, calculating an unweighted sum-score for its four items, since in factor analysis all made a reasonable contribution to the latent variable "yellow flags" (Fig. 1).

We showed that the CYFI made a significant independent contribution to the prediction of COMI scores at 3- and 12-month follow-up. Our findings were hence in keeping with the numerous studies that have shown that higher scores on yellow flag questionnaires generally predispose to poorer outcome [40,41,42]. In the present study, the proportion of variance in outcome accounted for by CYFI (2–8%, depending on gender and follow-up time-point) was greater than that reported for the psychological variables in some previous studies (0–2% [6, 43, 44]) and lower than that reported in others (15–20% [4]). In many studies, only the statistical significance of the effect or the variance accounted for by the whole model was reported, rather than the size of the effect for the psychological variables per se, making it difficult to draw comparisons [45, 46] (and see reviews in [40, 41, 47]). Also, some of the published studies were not truly prospective and most omitted from their models the cross-sectional relationship between psychosocial factors and baseline outcome scores. In the present study, COMI and CYFI were highly correlated at baseline, meaning that the unique contribution of CYFI in predicting COMI at follow-up—beyond that explained by COMI at baseline—was somewhat limited. In our prediction of 12-month COMI, there was, in addition to the direct effect of CYFI, also the indirect effect of CYFI on COMI at 12 months that was mediated by COMI at 3 months. The strong correlation between baseline COMI and CYFI probably indicates that the psychological status of patients at baseline is closely related to their ongoing pain problem and reflects to a lesser extent psychological problems beyond this. In other words, the yellow flags measured in the current sample have a more "situational" origin, driven by current pain and disability, and less of a "stable" trait-like origin reflecting psychological problems unrelated to current pain and disability. 
The situational component of CYFI is probably less powerful in predicting outcome compared with the more stable component. It is also highly likely that in some patients the psychological factors play a major role, whereas in other patients they have no significance. This has been reported in the literature before, where psychological factors appear to have a greater part to play in more "contentious" diagnoses for which the indication for surgery is less certain, compared with those for which the indication is more clear-cut [41]. Further investigations in this area are warranted such that we might direct our future attention to those patients whose outcome is especially influenced by psychological factors. It is difficult to do true experimental studies in this field to prove causality; however, the future collection of CYFI data also at follow-up, in addition to COMI data, and the use of cross-lagged panel correlations, might provide a method for identifying the source, direction and extent of the associations.

The observation that psychological variables significantly influence outcome often provokes the discussion as to whether, having identified that a patient demonstrates significant yellow flags, surgery should still be recommended. We do not believe that the effect size (in the present study, small to moderate; see above) is great enough to promote the CYFI as a tool to be used to deny operative procedures to patients who otherwise have a clear clinical indication for surgery. Indeed, to the authors' knowledge, no such psychological screening tool currently exists, and it is well known that many high-scoring patients still derive great benefit from surgery. Instead, we believe the current findings provide an impetus for administering the CYFI as part of a systematic collection of baseline data, along with numerous other risk factors, such that these can be included in predictive analytical models to improve the accuracy of individual outcome prediction. Many factors ultimately contribute to explaining the variance in individual outcomes; the more variables we are able to identify that make a significant contribution, the more accurate our overall predictions should be. Having a knowledge of the preoperative CYFI score for individual patients may also be useful in daily clinical practice to open a dialogue about these issues with the patient and to better manage their expectations of treatment. This may minimise the subsequent dissatisfaction with outcome that can follow from having overly optimistic expectations [48]. The findings might also be considered as support for more research on the clinical benefit of cognitive behavioural therapy (CBT) accompanying surgery. A number of studies [49,50,51] have shown positive effects, and this is a field of ongoing study, particularly in relation to the selection of appropriate cases.

Our study had a number of limitations. First, the data used in the development of the CYFI were from patients in secondary or tertiary care; the majority, but not all, were surgical patients. Second, the CYFI contains only "negative items" and there are no items enquiring about positive affect, coping strategies or resilience. Although these attributes are often believed to be the "opposite" of the yellow flag attributes, in some studies of spine surgery patients they have been shown to contribute to the prediction of outcome [43]. Third, in the longitudinal study, questionnaires were not completed by all patients at baseline (11% failed to complete one, mostly due to language problems, administrative errors, and emergency admissions) and other patients did not return a questionnaire at 3-month or 12-month follow-up (12–18%). This may have introduced attrition bias in the findings. Fourth, the reason that sex-specific models showed better fit currently eludes us. However, it is important to appreciate that yellow flags do not operate in isolation from other factors [2], and more elaborate models will ultimately be required. Further, such models should be externally validated (i.e. tested for their predictive ability in a separate population of patients), a step that was beyond the scope of the present study. The CYFI items were taken from published versions of the corresponding full-length questionnaires in each language. Nonetheless, confirmation of the adequacy of the different language versions as a group of items and of the corresponding introduction and response options, which have not been formally validated (Table 6), along with further evaluation of the performance of the CYFI (internal and test–retest reliability, construct and longitudinal validity, etc.) in each language, is encouraged. And finally, we cannot yet advise on the cut-offs required for indicating that a patient is "yellow flag positive", on a binary basis; we hope to address this in future studies.

In summary, the 4-item CYFI proved to be a simple, practicable, reliable and valid tool for routinely assessing key psychological attributes in spine surgery patients. The CYFI made a statistically significant contribution to the prediction of patient outcome after surgery. In this way, its widespread use may assist in developing better outcome prediction tools, based on the systematic collection of baseline data, e.g. in spine registries. The brevity of the instrument makes it suitable for implementation in everyday clinical practice, as part of the baseline assessment of patients undergoing spine surgery.

References

1. Kendall NA (1999) Psychological approaches to the prevention of chronic pain: the low back paradigm. Baillieres Best Pract Res Clin Rheumatol 13:545–554
2. Nicholas MK, Linton SJ, Watson PJ, Main CJ, "Decade of the Flags" Working Group (2011) Early identification and management of psychological risk factors ("yellow flags") in patients with low back pain: a reappraisal. Phys Ther 91:737–753. https://doi.org/10.2522/ptj.20100224
3. Block AR, Ohnmeiss DD, Guyer RD, Rashbaum RF, Hochschuler SH (2001) The use of presurgical psychological screening to predict the outcome of spine surgery. Spine J 1:274–282
4. Mannion AF, Elfering A, Staerkle R, Junge A, Grob D, Dvorak J, Jacobshagen N, Semmer NK, Boos N (2007) Predictors of multidimensional outcome after spinal surgery. Eur Spine J 16:777–786
5. Wilhelm M, Reiman M, Goode A, Richardson W, Brown C, Vaughn D, Cook C (2017) Psychological Predictors of Outcomes with Lumbar Spinal Fusion: A Systematic Literature Review. Physiother Res Int. https://doi.org/10.1002/pri.1648
6. Staerkle R, Mannion AF, Elfering A, Junge A, Semmer NK, Jacobshagen N, Grob D, Dvorak J, Boos N (2004) Longitudinal validation of the fear-avoidance beliefs questionnaire (FABQ) in a Swiss-German sample of low back pain patients. Eur Spine J 13:332–340
7. Havakeshian S, Mannion AF (2013) Negative beliefs and psychological disturbance in spine surgery patients: a cause or consequence of a poor treatment outcome? Eur Spine J 22:2827–2835. https://doi.org/10.1007/s00586-013-2822-5
8. de Groot KI, Boeke S, van den Berge HJ, Duivenvoorden HJ, Bonke B, Passchier J (1997) The influence of psychological variables on postoperative anxiety and physical complaints in patients undergoing lumbar surgery. Pain 69:19–25
9. Grevitt M, Pande K, O'Dowd J, Webb J (1998) Do first impressions count? A comparison of subjective and psychologic assessment of spinal patients. Eur Spine J 7:218–223
10. Mannion AF, Pittet V, Steiger F, Vader JP, Becker HJ, Porchet F (2014) Development of appropriateness criteria for the surgical treatment of symptomatic lumbar degenerative spondylolisthesis (LDS). Eur Spine J 23:1903–1917. https://doi.org/10.1007/s00586-014-3284-0
11. Lee KC, Patel S, Sell P (2014) Identification of obstacles to recovery in secondary care. Eur Spine J Suppl 1:S125
12. Kent P, Mirkhil S, Keating J, Buchbinder R, Manniche C, Albert HB (2014) The concurrent validity of brief screening questions for anxiety, depression, social isolation, catastrophization, and fear of movement in people with low back pain. Clin J Pain 30:479–489. https://doi.org/10.1097/AJP.0000000000000010
13. Hill JC, Dunn KM, Lewis M, Mullis R, Main CJ, Foster NE, Hay EM (2008) A primary care back pain screening tool: identifying patient subgroups for initial treatment. Arthritis Rheum 59:632–641. https://doi.org/10.1002/art.23563
14. Linton SJ, Nicholas M, MacDonald S (2011) Development of a short form of the Orebro Musculoskeletal Pain Screening Questionnaire. Spine (Phila Pa 1976) 36:1891–1895. https://doi.org/10.1097/BRS.0b013e3181f8f775
15. Lentz TA, Beneciuk JM, Bialosky JE, Zeppieri G Jr, Dai Y, Wu SS, George SZ (2016) Development of a Yellow Flag Assessment Tool for Orthopaedic Physical Therapists: Results From the Optimal Screening for Prediction of Referral and Outcome (OSPRO) Cohort. J Orthop Sports Phys Ther 46:327–343. https://doi.org/10.2519/jospt.2016.6487
16. Abbott AD, Tyni-Lenne R, Hedlund R (2010) The influence of psychological factors on pre-operative levels of pain intensity, disability and health-related quality of life in lumbar spinal fusion surgery patients. Physiotherapy 96:213–221. https://doi.org/10.1016/j.physio.2009.11.013
17. Mannion AF, Porchet F, Kleinstück F, Lattig F, Jeszenszky D, Bartanusz V, Dvorak J, Grob D (2009) The quality of spine surgery from the patient’s perspective: Part 1. The Core Outcome Measures Index (COMI) in clinical practice. Eur Spine J 18:367–373
18. Deyo RA, Battié M, Beurskens AJHM, Bombardier C, Croft P, Koes B, Malmivaara A, Roland M, Von Korff M, Waddell G (1998) Outcome measures for low back pain research. A proposal for standardized use. Spine 23:2003–2013
19. Abbott AD, Tyni-Lenne R, Hedlund R (2011) Leg pain and psychological variables predict outcome 2–3 years after lumbar fusion surgery. Eur Spine J 20:1626–1634. https://doi.org/10.1007/s00586-011-1709-6
20. Stanton JM, Sinar EF, Balzer WK, Smith PC (2002) Issues and strategies for reducing the length of self-report scales. Pers Psychol 55:167–194
21. Rosenstiel AK, Keefe FJ (1983) The use of coping strategies in chronic low back pain patients: relationship to patient characteristics and current adjustments. Pain 17:33–44
22. Sullivan MJL, Stanish W, Waite H, Sullivan M, Tripp D (1998) Catastrophizing, pain, and disability in patients with soft-tissue injuries. Pain 77:253–260
23. Meyer K, Sprott H, Mannion AF (2008) Cross-cultural adaptation, reliability, and validity of the German version of the Pain Catastrophizing Scale. J Psychosom Res 64:469–478
24. Zung WW (1965) A Self-Rating Depression Scale. Arch Gen Psychiatry 12:63–70
25. Zigmond AS, Snaith RP (1983) The Hospital Anxiety and Depression Scale. Acta Psychiatr Scand 67:361–370
26. Herrmann C, Buss U (1994) Vorstellung und Validierung einer deutschen Version der "Hospital Anxiety and Depression Scale" (HAD-Skala): Ein Fragebogen zur Erfassung des psychischen Befindens bei Patienten mit körperlichen Beschwerden. Diagnostica 40:143–154
27. Waddell G, Newton M, Henderson I, Somerville D, Main CJ (1993) A Fear-Avoidance Beliefs Questionnaire (FABQ) and the role of fear-avoidance beliefs in chronic low back pain and disability. Pain 52:157–168
28. Haefeli M, Elfering A (2006) Pain assessment. Eur Spine J 15(Suppl 1):S17–24
29. Roland M, Morris R (1983) A study of the natural history of back pain. Part 1: Development of a reliable and sensitive measure of disability in low-back pain. Spine 8:141–144
30. Exner V, Keel P (2000) Erfassung der Behinderung bei Patienten mit chronischen Rückenschmerzen. Schmerz 14:392–400
31. Mannion AF, Elfering A, Staerkle R, Junge A, Grob D, Semmer NK, Jacobshagen N, Dvorak J, Boos N (2005) Outcome assessment in low back pain: how low can you go? Eur Spine J 14:1014–1026
32. Streiner DL, Norman GR (1995) Health Measurement Scales: a practical guide to their development and use. Oxford University Press Inc., Oxford
33. Cohen J (1977) Statistical power analysis for the behavioral sciences, Revised edn. Academic Press, San Diego
34. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174
35. Cedraschi C, Marty M, Courvoisier DS, Foltz V, Mahieu G, Demoulin C, Gierasimowicz Fontana A, Norberg M, de Goumoens P, Rozenberg S, Genevay S, Section Rachis de la Societe Francaise de R (2016) Core Outcome Measure Index for low back patients: do we miss anxiety and depression? Eur Spine J 25:265–274. https://doi.org/10.1007/s00586-015-3935-9
36. Woby SR, Roach NK, Urmston M, Watson PJ (2005) Psychometric properties of the TSK-11: a shortened version of the Tampa Scale for Kinesiophobia. Pain 117:137–144. https://doi.org/10.1016/j.pain.2005.05.029
37. Damsgard E, Fors T, Anke A, Roe C (2007) The Tampa Scale of Kinesiophobia: A Rasch analysis of its properties in subjects with low back and more widespread pain. J Rehabil Med 39:672–678. https://doi.org/10.2340/16501977-0125
38. Hill JC, Tooth S, Coopers V, Chen Y, Lewis M, Walthall S, Saunders B, Bartlam B, Protheroe J, Chudyk A, Dunn KM, Foster NE (2019) Stratified care for patients with back, neck, knee, shoulder or multi-site pain: the STarT MSK Feasibility/Pilot Randomised Controlled Trial (ISRCTN15366334). In: Society for back pain research. Sheffield, UK
39. Chiarotto A, Deyo RA, Terwee CB, Boers M, Buchbinder R, Corbin TP, Costa LO, Foster NE, Grotle M, Koes BW, Kovacs FM, Lin CW, Maher CG, Pearson AM, Peul WC, Schoene ML, Turk DC, van Tulder MW, Ostelo RW (2015) Core outcome domains for clinical trials in non-specific low back pain. Eur Spine J 24:1127–1142. https://doi.org/10.1007/s00586-015-3892-3
40. Celestin J, Edwards RR, Jamison RN (2009) Pretreatment psychosocial variables as predictors of outcomes following lumbar surgery and spinal cord stimulation: a systematic review and literature synthesis. Pain Med 10:639–653. https://doi.org/10.1111/j.1526-4637.2009.00632.x
41. Mannion AF, Elfering A (2006) Predictors of surgical outcome and their assessment. Eur Spine J 15(Suppl 1):S93–108
42. den Boer JJ, Oostendorp RA, Beems T, Munneke M, Oerlemans M, Evers AW (2006) A systematic review of bio-psychosocial risk factors for an unfavourable outcome after lumbar disc surgery. Eur Spine J 15:527–536
43. Seebach CL, Kirkhart M, Lating JM, Wegener ST, Song Y, Riley LH 3rd, Archer KR (2012) Examining the role of positive and negative affect in recovery from spine surgery. Pain 153:518–525. https://doi.org/10.1016/j.pain.2011.10.012
44. Katz JN, Stucki G, Lipson SJ, Fossel AH, Grobler LJ, Weinstein JN (1999) Predictors of surgical outcome in degenerative lumbar spinal stenosis. Spine 24:2229–2233
45. Sinikallio S, Aalto T, Airaksinen O, Herno A, Kroger H, Savolainen S, Turunen V, Viinamaki H (2007) Depression is associated with poorer outcome of lumbar spinal stenosis surgery. Eur Spine J 16:905–912
46. Trief PM, Grant W, Fredrickson B (2000) A prospective study of psychological predictors of lumbar surgery outcome. Spine 25:2616–2621
47. Aalto TJ, Malmivaara A, Kovacs F, Herno A, Alen M, Salmi L, Kroger H, Andrade J, Jimenez R, Tapaninaho A, Turunen V, Savolainen S, Airaksinen O (2006) Preoperative predictors for postoperative clinical outcome in lumbar spinal stenosis: systematic review. Spine 31:E648–663
48. Mannion AF, Junge A, Elfering A, Dvorak J, Porchet F, Grob D (2009) Great expectations: really the novel predictor of outcome after spinal surgery? Spine 34:1590–1599
49. Monticone M, Ferrante S, Teli M, Rocca B, Foti C, Lovi A, Brayda Bruno M (2014) Management of catastrophising and kinesiophobia improves rehabilitation after fusion for lumbar spondylolisthesis and stenosis. A randomised controlled trial. Eur Spine J 23:87–95. https://doi.org/10.1007/s00586-013-2889-z
50. Abbott AD, Tyni-Lenne R, Hedlund R (2010) Early rehabilitation targeting cognition, behavior, and motor function after lumbar fusion: a randomized controlled trial. Spine (Phila Pa 1976) 35:848–857. https://doi.org/10.1097/BRS.0b013e3181d1049f
51. Archer KR, Devin CJ, Vanston SW, Koyama T, Phillips SE, Mathis SL, George SZ, McGirt MJ, Spengler DM, Aaronson OS, Cheng JS, Wegener ST (2016) Cognitive-behavioral-based physical therapy for patients with chronic pain undergoing lumbar spine surgery: a randomized controlled trial. J Pain 17:76–89. https://doi.org/10.1016/j.jpain.2015.09.013
52. Steurer J, Nydegger A, Held U, Brunner F, Hodler J, Porchet F, Min K, Mannion AF, Michel B (2010) LumbSten: the lumbar spinal stenosis outcome study. BMC Musculoskelet Disord 11:254. https://doi.org/10.1186/1471-2474-11-254
53. Becker HJ, Nauer S, Porchet F, Kleinstuck FS, Haschtmann D, Fekete TF, Steurer J, Mannion AF (2017) A novel use of the Spine Tango registry to evaluate selection bias in patient recruitment into clinical studies: an analysis of patients participating in the Lumbar Spinal Stenosis Outcome Study (LSOS). Eur Spine J 26:441–449. https://doi.org/10.1007/s00586-016-4850-4
54. Pulkovski N, Mannion AF, Caporaso F, Toma V, Gubler D, Helbling D, Sprott H (2012) Ultrasound assessment of transversus abdominis muscle contraction ratio during abdominal hollowing: a useful tool to distinguish between patients with chronic low back pain and healthy controls? Eur Spine J 21(Suppl 6):S750–759. https://doi.org/10.1007/s00586-011-1707-8
55. Mannion AF, Caporaso F, Pulkovski N, Sprott H (2012) Spine stabilisation exercises in the treatment of chronic low back pain: a good clinical outcome is not associated with improved abdominal muscle function. Eur Spine J 21:1301–1310. https://doi.org/10.1007/s00586-012-2155-9
56. Caporaso F, Pulkovski N, Sprott H, Mannion AF (2012) How well do observed functional limitations explain the variance in Roland Morris scores in patients with chronic non-specific low back pain undergoing physiotherapy? Eur Spine J 21(Suppl 2):S187–195. https://doi.org/10.1007/s00586-012-2255-6
57. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, Bouter LM, de Vet HC (2007) Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 60:34–42

Acknowledgements

We thank all the patients of the Schulthess Klinik who contributed their data. We thank Dave O’Riordan, Gordana Balaban, Stéphanie Dosch, and Selina Nauer for their administration of the Spine Tango surgery forms and patient-rated outcome measures. We thank Astrid Junge for her help and advice in finalising the wording of the introductory text and response options for the German version of the CYFI and Melissa Wilhelmi for her English translations of the same. We thank the authors of the original studies from which data were derived to develop the new CYFI.

Author information

Authors and Affiliations

Spine Center Division, Department of Teaching, Research and Development, Schulthess Klinik, Lengghalde 2, 8008, Zurich, Switzerland
Anne F. Mannion & Francine Mariaux
Spine Center, Schulthess Klinik, Lengghalde 2, 8008, Zurich, Switzerland
Raluca Reitmeir, Tamas F. Fekete, Daniel Haschtmann, Markus Loibl, Dezsö Jeszenszky, Frank S. Kleinstück & François Porchet
Institute for Psychology, University of Bern, Fabrikstrasse 8, 3012, Bern, Switzerland
Achim Elfering

Corresponding author

Correspondence to Anne F. Mannion.

Ethics declarations

Conflict of interest

The authors declare that they have no potential conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mannion, A.F., Mariaux, F., Reitmeir, R. et al. Development of the "Core Yellow Flags Index" (CYFI) as a brief instrument for the assessment of key psychological factors in patients undergoing spine surgery. Eur Spine J 29, 1935–1952 (2020). https://doi.org/10.1007/s00586-020-06462-z

Received: 05 March 2020
Accepted: 10 May 2020
Published: 16 June 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s00586-020-06462-z
