Psychometric properties of chronic low back pain diagnostic classification systems: a systematic review

Abdelnaeem, Ahmed Omar; Rehan Youssef, Aliaa; Mahmoud, Nesreen Fawzy; Fayaz, Nadia Abdalazeem; Vining, Robert

doi:10.1007/s00586-020-06712-0

Review Article
Published: 20 January 2021

European Spine Journal, Volume 30, pages 957–989 (2021)

Abstract

Objectives

To identify and critically appraise studies evaluating psychometric properties of functionally oriented diagnostic classification systems for Non-Specific Chronic Low Back Pain (NS-CLBP).

Methods

This review employed methodology consistent with PRISMA guidelines. Electronic databases and journals (PubMed, EMBASE, Cochrane, PEDro, CINAHL, Index to Chiropractic Literature, ProQuest, Physical Therapy, Journal of Physiotherapy, Canadian Physiotherapy and Physiotherapy Theory and Practice) were searched from inception until January 2020. Included studies evaluated the validity and reliability of NS-CLBP diagnostic classification systems in adults. Risk of bias was assessed using a Critical Appraisal Tool.

Results

Twenty-two studies were eligible: 5 investigated inter-rater reliability, and 17 analyzed the validity of O’Sullivan’s classification system (OCS, n = 15), the motor control impairment (MCI) test battery (n = 1), and the Pain Behavior Assessment (PBA, n = 1). Evidence from multiple low risk of bias studies demonstrates that the OCS has moderate to excellent inter-rater reliability (kappa > 0.4). Two low risk of bias studies also support the validity of the OCS-MCI subcategory. Three tests within the MCI test battery show acceptable inter- and intra-rater reliability for clinical use (the “sitting knee extension,” the “one leg stance,” and the “pelvic tilt” tests). Evidence for the reliability and validity of the PBA is limited to one high risk of bias study.

Conclusions

Multiple low risk of bias studies demonstrate strong inter-rater reliability for the OCS, specifically for the OCS-MCI subcategory. Future studies with low risk of bias are needed to evaluate the reliability and validity of the MCI test battery and the PBA.

Introduction

Low back pain (LBP) is a highly prevalent problem [1, 2] and a major cause of disability worldwide [1, 3], with a tendency to recur or persist, leading to chronicity [4, 5]. Often, no definitive pathology or underlying mechanism can be identified [6]. To account for uncertain and potentially multiple contributing factors, chronic LBP is commonly labeled as Non-Specific Chronic LBP (NS-CLBP) [1, 7, 8, 9]. NS-CLBP is a general diagnosis encompassing a wide range of conditions, symptoms, and clinical features. However, such a diagnosis does not distinguish the characteristics needed to guide specific clinical decision-making [10].

There is currently no optimum treatment strategy for NS-CLBP [11]. This is partially due to a lack of standardized, valid, and reliable methods which classify specific characteristics. An effective classification system is expected to be based on studies reporting selection criteria in clear and standardized terms [12]; requirements that have been proposed as a key research priority [13].

There are many approaches to classify NS-CLBP, such as identifying symptom sources [14,15,16], functional characteristics, and/or psychosocial risk [17]. Functional evaluation, which generally assesses how people move and perform movement tasks, is typically designed to both inform treatment and be used as a clinical outcome. Functional classification is promising, in part because it identifies characteristics of a condition and offers information about potential mechanisms contributing to chronicity. Active treatments designed to address functional deficits may also positively influence self-efficacy, reduce fear avoidance behaviors, and promote symptom self-management [18, 19].

Despite the potential importance of functional classification as a diagnostic process, to the authors’ knowledge, no previous comprehensive evaluation of the reliability and validity of existing classification systems has been performed. Therefore, the aims of this review were to describe and critically appraise the psychometric properties of functionally oriented NS-CLBP diagnostic classification systems.

Methods

This systematic review was registered on PROSPERO (CRD42015023958) [20] and conducted according to PRISMA guidelines [21].

Search strategy

A systematic electronic and manual search of the literature published in English, from inception until January 2020, was conducted in the following electronic databases and journals: PubMed, EMBASE, Cochrane, PEDro, CINAHL, Index to Chiropractic Literature, ProQuest, Physical Therapy, Journal of Physiotherapy, Canadian Physiotherapy and Physiotherapy Theory and Practice.

The search strategy consisted of keywords and Medical Subject Headings (MeSH) related to “Non specific,” “mechanical,” “low back pain,” “simple backache,” “lumbar strain,” “spinal degeneration,” “classification,” “clinical test,” “clinical examination,” “clinical sign,” “valid*,” and “reliabl*.” Search strategy details are available in Appendix 1. Reference lists of eligible articles were also searched for relevant publications.

Eligibility criteria

Inclusion and exclusion criteria are summarized in Table 1.

Table 1 Studies eligibility criteria

Data collection and analysis

Selection of studies

Studies from electronic databases and manual searches were imported into EndNote X5.0.1 and checked for eligibility by two reviewers (AA and NFM) independently, first by title, then by abstract, and finally by full text. Disagreements were resolved through discussion with co-authors (ARY and RV).

Data extraction

Two reviewers (AA and NFM) independently extracted all relevant information into an Excel spreadsheet. All discrepancies were resolved through discussion.

Risk of bias assessment

Risk of bias was assessed independently by two authors (AA and NFM) using the Critical Appraisal Tool (CAT) developed by Brink and Louw [22] (Appendix 2). This tool consists of 13 items designed to appraise the quality of validity and/or reliability studies; 4 items assess reliability studies, 4 items evaluate validation studies, and the remaining 5 items evaluate both validity and reliability studies. Each item is scored as “Yes,” “No,” or “Not Applicable (N/A)” [22]. Disagreement was resolved through consensus discussion. Studies were considered of low risk of bias if they scored ≥ 60% [23,24,25].
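The scoring rule can be illustrated with a minimal Python sketch. The review states only the item ratings and the ≥ 60% threshold; the assumption here (not confirmed by the source) is that the score is the percentage of applicable items rated “Yes,” with N/A items excluded from the denominator. The function names and the example ratings are hypothetical.

```python
from typing import Sequence

LOW_RISK_THRESHOLD = 60.0  # percent; threshold stated in the review


def cat_score(item_ratings: Sequence[str]) -> float:
    """Percentage of applicable CAT items rated 'Yes'.

    Assumption: items rated 'N/A' are dropped from the denominator.
    """
    applicable = [r for r in item_ratings if r != "N/A"]
    if not applicable:
        raise ValueError("no applicable items to score")
    yes_count = sum(1 for r in applicable if r == "Yes")
    return 100.0 * yes_count / len(applicable)


def risk_of_bias(item_ratings: Sequence[str]) -> str:
    """Label a study 'low' risk if its score meets the threshold, else 'high'."""
    return "low" if cat_score(item_ratings) >= LOW_RISK_THRESHOLD else "high"


# Hypothetical reliability study: 9 applicable items (4 reliability-only
# plus 5 shared) and 4 validity-only items rated N/A.
example_ratings = ["Yes"] * 6 + ["No"] * 3 + ["N/A"] * 4
example_label = risk_of_bias(example_ratings)
```

Under this assumed rule, the hypothetical study scores 6 of 9 applicable items and is labeled low risk; with only 5 of 9 it would fall below the threshold.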

Results

Selection of studies

Database and manual searching identified a total of 2899 articles. After full article screening, 22 studies published in 21 articles (one article reported 2 studies) were included. The article screening process is outlined in Fig. 1.

Characteristics of included studies

The methodological approach of each included study is depicted in Tables 2 and 3. Of the 22 included studies, 5 investigated inter-rater reliability [26,27,28,29], with one study reporting both intra- and inter-rater reliability [27] (Table 2).

Table 2 Description of included reliability studies

Table 3 Description of included validity studies

Validity was assessed in 17 studies; 15 studies evaluated O’Sullivan’s Classification System (OCS) [30,31,32,33,34,35,36,37,38,39,40,41,42,43,44], one assessed the 10-item Motor Control Impairment (MCI) test battery [45], and one investigated the Pain Behavior Assessment (PBA) classification system [46]. All validity studies used a cross-sectional design, except 2 studies [42, 45] that used a cross-sectional case–control design (Table 3).

Demographic data of participants in eligible studies

Sixteen studies enrolled asymptomatic participants as controls [27, 29,30,31,32,33,34,35,36,37,38, 40,41,42,43, 45]. Sample size ranged from 12 [39] to 200 [46]. Participant mean age ranged from 28.4 [41] to 55.1 years [27]. The mean body mass index (BMI) ranged from 20.8 [42, 43] to 26.9 kg/m² [45], although four studies did not report BMI [26, 27, 29, 46].

Classification systems

The eligible studies described three different classification systems (Tables 2, 3):

1. OCS (n = 18 studies) classifies NS-CLBP as predominantly centrally (e.g., central sensitization) or peripherally mediated (e.g., injury, inflammation of peripheral tissues). OCS also includes a psychosocial assessment step and separates pain presumed to be of lumbar origin from pain presumed to be of pelvic origin. Functional testing of lumbar and pelvic girdle pain evaluates presumed motor control impairment by identifying specific postural and movement characteristics [26] (Appendix 3).
2. MCI Test Battery (n = 3 studies) uses specific movements/positions to differentiate participants with MCI from normal individuals. This battery consists of 10 individual tests that identify possible flexion, extension and rotational dysfunction. Assessment is dichotomous (impairment or no impairment), with the severity described on 3 levels (none, mild/moderate, severe) [45].
3. PBA classification (n = 1 study) rates: (1) pain perception; (2) overt pain behavior (e.g., guarding movements); (3) effort during physical test performance; and (4) consistency of behavior across different situations of clinical testing. Categories include no pain, low pain or high pain behaviors [46].

Reliability of different classification systems

Inter-rater reliability of the OCS and the MCI test battery was assessed in 5 studies (Table 4).

Table 4 Overview of the results of reliability studies

Inter-rater reliability of the entire OCS (all steps) was moderate (kappa (K) > 0.4) [26]. For levels 1–4, the mean percentage agreement was excellent (96%; range 75–100%). For the fifth level, the mean K and mean agreement among 4 testers were strong: 0.82 (range 0.66–0.90) and 86% (range 73–92%), respectively. The final classification level had a moderate mean K of 0.65 (range 0.57–0.74) and excellent agreement of 87% (range 85–92%) [26]. Within the OCS-MCI subcategory, the most reliably identified subgroup was the passive extension pattern (PEP) (K = 0.90; strong), while the least reliable was the active extension pattern (AEP) (K = 0.66; moderate) [26].

The OCS-MCI subcategory was also tested in 2 studies reported within a single article [28]. In the first study, the inter-rater agreement was excellent (K = 0.96; percentage agreement = 97%). In the second study, 13 examiners assessed 25 cases using subjective information from participants and video-recorded functional tests. Inter-rater agreement was moderate on average (K = 0.61; range 0.47–0.80), and the mean percentage agreement was 70% (range 60–84%) [28].
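To make the two agreement statistics reported above concrete, the following is a minimal sketch of percentage agreement and Cohen's kappa for two raters. It assumes the simple two-rater, unweighted form of kappa; the included studies involved up to 13 examiners and may have used pairwise averaging or other variants, which this sketch does not reproduce. The function names and the example classifications are hypothetical and are not data from any included study.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Percentage of cases on which both raters chose the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b) / 100.0
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical OCS-MCI subgroup labels assigned to 10 cases by two raters.
labels_a = ["FP", "FP", "AEP", "PEP", "FP", "AEP", "PEP", "FP", "AEP", "FP"]
labels_b = ["FP", "FP", "AEP", "PEP", "AEP", "AEP", "PEP", "FP", "FP", "FP"]
agreement = percent_agreement(labels_a, labels_b)
kappa = cohens_kappa(labels_a, labels_b)
```

Because kappa discounts the agreement expected by chance, it is lower than the raw percentage agreement for the same data, which is why the review reports both.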

MCI test battery

In one study, four examiners independently rated video recordings of 27 participants with NS-CLBP and 13 controls performing 10 MCI tests. K values for inter-rater reliability ranged from minimal to moderate (0.24–0.71). Six out of 10 tests showed substantial inter-rater reliability (K > 0.6). The most reliable tests (for both rater pairs) were the “pelvic tilt” for extension dysfunction, “one leg stance” for rotational dysfunction and “sitting knee extension” and “waiter’s bow” tests for flexion dysfunction. The poorest reliability was reported for the “abduction in crook lying” test for rotational dysfunction, where both rater pairs had low K values (K = 0.44; 95% CI 0.18–0.70 and K = 0.32; 95% CI 0.10–0.54). Intra-tester reliability ranged from 0.51 to 0.96. All tests except abduction in crook lying showed substantial intra-tester reliability (K > 0.6) [27].

The second study reported the inter-rater reliability between two examiners who independently examined 25 participants with NS-CLBP and 15 asymptomatic controls using five MCI clinical tests. Intra-class correlation coefficients (ICCs) were excellent: 0.90 for repositioning (RPS), 0.96 for sitting forward lean (SFL), 0.96 for sitting knee extension (SKE), 0.94 for bent knee fall out (BKFO) and 0.98 for leg lowering (LL) [29].

Validity of different classification systems

All three diagnostic systems underwent some aspect of validation testing (Table 5):

1. OCS: Fifteen studies assessed OCS validity (12 for OCS-MCI subcategories and 3 for sacroiliac joint dysfunction (SIJD)).

OCS-MCI construct validity was reported by measuring lumbosacral kinematics and trunk muscle activation in 33 participants with NS-CLBP (20 Flexion Pattern (FP) and 13 AEP) and 34 asymptomatic controls. The biomechanical model used lower lumbar kinematics in sitting and forward bending and two trunk muscle activation variables (lack of flexion relaxation of the superficial lumbar multifidus in slump sitting and end range of forward bending). The model correctly classified 96.4% of cases and distinguished between individuals with No LBP, AEP and FP [38].

Discriminant validity of MCI subcategories through spinal kinematics testing:

Sitting postures were tested to distinguish participants with NS-CLBP with AEP (lordotic lumbar posture) and FP (kyphotic lumbar posture) from asymptomatic controls (P < 0.001). Participants with NS-CLBP had less ability to consciously alter posture when asked to slump from usual sitting (P < 0.001) [36]. Similar findings were reported during cycling, as participants with NS-CLBP (FP) exhibited greater lumbar region flexion compared to asymptomatic controls (p = 0.018) and reported a marked increase in pain over 2 hours of cycling (p < 0.001) [41]. Further, cyclists with NS-CLBP (FP) showed an increased, although non-significant, tendency toward lumbar flexion and rotation compared to controls (P > 0.05) [35]. In another study, participants with NS-CLBP (FP) sat with less hip flexion (P = 0.05), suggesting a relative posterior pelvic tilt. During “usual” sitting, the FP group positioned the lumbar spine significantly closer to end range lumbar flexion compared to asymptomatic controls [30].

Functional tasks: Spinal kinematics during functional tasks did not differ between the AEP group and asymptomatic controls in a single study [43]. However, the AEP group adopted more upper lumbar and lower thoracic (T6–L3) extension than the FP group, which adopted more flexion during these activities (p < 0.05). The FP group also exhibited greater thoraco-lumbar kyphosis than asymptomatic controls [43].

Spinal position sense (SPS): Lumbar repositioning accuracy was assessed in 3 studies [31, 33, 40]. Compared to asymptomatic controls, participants with NS-CLBP showed substantially greater Absolute Error (AE) [31, 40] and Variable Error (VE) [40]. The FP group underestimated lumbar target positions [31, 33, 40], while the AEP group overestimated lumbar and underestimated thoracic target positions compared to FP [40]. The Cardiff Dempster–Shafer Theory (DST) Classifier method, based on objective measures of repositioning sense during sitting and standing, discriminated the No LBP group from NS-CLBP (pooled and in subsets) with an accuracy ranging between 93.83 and 98.15%. Further, the DST classifier method distinguished different NS-CLBP subgroups with an accuracy of 96.8%, 87.7% and 70.27% for FP from PEP, FP from AEP and AEP from PEP subtypes, respectively. Finally, ranking analysis showed that lumbar AE in sitting could distinguish participants with NS-CLBP from No LBP and FP from No LBP, while lumbar constant error in standing consistently discriminated LBP extension subsets (AEP and PEP) from No LBP [44].
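The repositioning error measures named above can be sketched as follows. The review does not give formulas, so this sketch assumes the conventional motor control definitions: absolute error is the mean unsigned deviation from the target, constant error is the mean signed deviation, and variable error is the standard deviation of the signed deviations (population form). The function name, the sign convention (negative means undershooting the target) and the example angles are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def repositioning_errors(reproduced_deg: Sequence[float], target_deg: float) -> Dict[str, float]:
    """Summarize repositioning trials against one target angle (degrees).

    Assumed definitions, not taken from the review:
      AE: mean of |reproduced - target|
      CE: mean of (reproduced - target); negative means undershoot
      VE: population standard deviation of (reproduced - target)
    """
    if not reproduced_deg:
        raise ValueError("at least one trial is required")
    signed = [angle - target_deg for angle in reproduced_deg]
    return {
        "AE": mean([abs(e) for e in signed]),
        "CE": mean(signed),
        "VE": pstdev(signed),
    }


# Hypothetical participant asked to reproduce a 20 degree lumbar target four times.
trial_summary = repositioning_errors([18.0, 19.0, 17.0, 18.0], target_deg=20.0)
```

In this hypothetical case the constant error is negative, the pattern the review describes as underestimating the target position, while the variable error captures trial-to-trial inconsistency independently of direction.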

Discriminant validity of MCI subcategories through trunk muscle activity testing:

Surface electromyography (sEMG) recorded from five trunk muscles during unsupported “usual” and “slumped” sitting postures could not distinguish trunk muscle activity between asymptomatic controls and a pooled NS-CLBP group. However, compared to controls, participants classified with AEP presented with significantly higher co-contraction of lumbar multifidus, iliocostalis lumborum pars thoracis and transverse fibers of internal oblique muscles (p < 0.05) [37]. Burnett et al. [35] reported less co-contraction of the lower lumbar multifidus in the FP group compared to controls during a cycling task. Sheeran et al. [40] reported that the NS-CLBP (FP and AEP combined) group produced significantly higher abdominal activity (p < 0.01) compared to controls during usual sitting and standing postures. Hemming et al. [42] also reported significantly greater muscle activation in right-sided superficial lumbar multifidus muscles during the functional tasks of step up, reach up and box replace (p < 0.05). External oblique muscle contraction during box lift differed significantly between participants with AEP and asymptomatic controls (p = 0.016). Significant differences between participants with FP and asymptomatic controls were also reported for left-sided transversus abdominis/internal oblique and superficial lumbar multifidus activity during stand-to-sit tasks (p = 0.009) [42].

SIJD subcategory: Two studies reported decreased diaphragmatic excursion, altered respiratory patterns and depression of the pelvic floor (PF) in participants with NS-CLBP during the active straight leg raise (ASLR) test compared to controls [32, 39]. When the examiner added manual pelvic compression to the ASLR test, there were no differences between the two groups. Manual pelvic compression during the ASLR theoretically improves load transfer by enhancing passive stability of the sacroiliac joints (SIJs) and motor control (MC) patterns/force closure [32]. Another study reported delayed activation of obliquus internus abdominis (OI), multifidus and gluteus maximus muscles in patients with SIJD compared to controls. Delayed OI and multifidus activation occurred on both the symptomatic and the asymptomatic sides in the SIJD group. Biceps femoris activation occurred earlier in the SIJD group [34].
2. MCI Test Battery: One study assessed the clinical validity of the MCI test battery for classifying participants with NS-CLBP [45]. For both the two-class (impairment or not) and the three-class (none, mild/moderate and severe) categorization, the ideal number of MCI tests was 10. The overall discrimination potential for two-class categorization was good (Area Under the Curve (AUC) > 0.8, sensitivity = 0.75, specificity = 0.82, Youden = 0.57, LR+ = 3.40, LR− = 0.20, effect size = 1.45), with an optimal cutoff of three tests. To classify MCI, at least four failed items are needed. The overall discrimination potential for the three-class categorization was fair (volume under the surface > 0.5, sensitivity = 0.48, sensitivity = 0.50, specificity = 0.82, Youden = 0.40, effect size = 1.56), with an optimal cutoff of three and six tests. At least four failed MCI tests are needed to classify mild/moderate MCI and six or more failed tests classify severe cases [45].
3. PBA: The PBA showed good internal consistency (person separation index = 0.83). Construct validity evaluation by Rasch analysis resulted in 41 items. PBA convergent validity was supported by a significant correlation with other questionnaires [46].
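The Youden values reported for the MCI test battery in item 2 above can be checked with a short sketch. The two-class form is the standard Youden index (sensitivity + specificity − 1). For the three-class case, the sketch assumes the generalized three-class Youden index, (sum of the three class-wise correct classification rates − 1) / 2, and further assumes that the two reported “sensitivity” values and the specificity are those three rates. Neither assumption is stated in the source, although both are consistent with the reported figures. Function names are hypothetical.

```python
from typing import Sequence


def youden_two_class(sensitivity: float, specificity: float) -> float:
    """Standard Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def youden_three_class(correct_rates: Sequence[float]) -> float:
    """Generalized three-class Youden index (assumed form).

    correct_rates holds the proportion correctly classified in each of the
    three classes; the index is (sum of rates - 1) / 2.
    """
    if len(correct_rates) != 3:
        raise ValueError("exactly three class-wise rates are required")
    return (sum(correct_rates) - 1.0) / 2.0


# Values as reported in the review for the MCI test battery [45].
j_two_class = youden_two_class(sensitivity=0.75, specificity=0.82)
j_three_class = youden_three_class([0.48, 0.50, 0.82])
```

Both results agree with the reported indices (0.57 and 0.40) to two decimal places, which lends some support to reading the duplicated “sensitivity” entries as class-wise rates, though the original study should be consulted to confirm this.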

Table 5 Overview of results of validity studies

Risk of bias assessment

Risk of bias assessment showed excellent inter-assessor agreement (92.2% and K = 0.84) [47]. Nine studies [30,31,32, 34, 39,40,41, 45] did not clarify evaluators’ characteristics (Item 2). Reference standard tests (Item 3) were reported only in two studies [38, 44] and were performed independently (Item 9). All inter-rater reliability studies used raters blinded to each other’s findings [26,27,28,29] (Item 4). All studies reported clear descriptions of measurement procedures (Item 10). Appropriate reliability and validity statistical methods (Item 13) were employed in 21 studies, while one small-sample study (n < 50) used the Kolmogorov–Smirnov test, a less than ideal test for confirming a normal distribution [41]. Overall, all five reliability studies [26,27,28,29] and two OCS validity studies [38, 44] were rated as low risk of bias. The remaining validity studies were rated as high risk [30,31,32,33,34,35,36,37, 39,40,41,42,43, 45, 46] (Table 6).

Table 6 Quality assessment of the included studies with the Critical Appraisal Tool (CAT)

Discussion

This systematic review identified and critically appraised studies reporting reliability and validity of functionally oriented NS-CLBP diagnostic classification systems; specifically, the OCS, MCI test battery, and PBA systems. Of the 3 systems evaluated through studies included in this review, the OCS is the most reliable and valid. All included reliability studies were consistently rated as high quality. However, validity in this context is limited to the capacity to systematically identify different muscle activation patterns and spinal kinematic changes from each other and controls (construct and discriminant validity) as demonstrated in two high-quality studies. The remaining reviewed studies that assessed some aspects of OCS validation had high risk of bias. Limited evidence supports acceptable inter- and intra-rater reliability for clinical use of the following MCI test battery tests: "sitting knee extension" (to identify flexion dysfunction), “one leg stance” (for rotational dysfunction) and “pelvic tilt” (for extension dysfunction). Evidence supporting validity of the PBA is inconclusive.

The OCS system

The OCS is currently the most studied, functionally based classification system with inter-rater reliability among various stages ranging from moderate to excellent [26]. For the OCS-MCI subcategory, three low risk of bias studies reported strong reliability (FP, AEP, PEP, flexion/lateral shift pattern and multidirectional pattern) [26, 28], with PEP as the most reliable subgroup, and AEP as the least reliable [26]. This review identified two low risk of bias studies demonstrating construct [38] and discriminant validity [44] of OCS-MCI subcategories based on determining and explaining aberrant muscle activity and spinal kinematic changes in participants with NS-CLBP. These studies generally adhered to guidelines for developing and validating classification systems [12, 48,49,50,51,52,53,54,55,56].

MCI test battery

Based on 2 low risk of bias reliability studies [27, 29], 3 individual tests included in the MCI test battery show good to excellent reliability to classify people with NS-CLBP with or without MCI. Two previous systematic reviews concluded similarly [57, 58]. Current evidence suggests that the “sitting knee extension” test to identify flexion dysfunction, the “one leg stance” test for rotational dysfunction and the “pelvic tilt” test for extension dysfunction are suitable for clinical use, based on good to excellent intra- and inter-rater reliability [27, 29]. Because validity of the 10-test battery is based on a single study [45] with a high risk of bias, recommending routine clinical use is premature.

PBA

The PBA consistently recognizes and classifies pain behavior into three categories (none, low, and high). However, evidence is limited to a single study with a high risk of bias [46]. People with no or low levels of pain behavior are likely to benefit from a physically oriented rehabilitation program with little emphasis on psychological and behavioral approaches. Conversely, those with high pain behavior may benefit from programs that emphasize psychological and behavioral factors. This reasoning incorporates a biopsychosocial approach similar to stratified care informed by the 9-item STarT Back questionnaire [59]. Unlike the STarT Back questionnaire, the PBA also includes observing movement tasks and behaviors.

Risk of bias assessment of individual studies

All reliability studies for OCS and MCI test battery were rated as high quality. The main risk of bias in reliability studies was the lack of randomizing test order.

Only two OCS validation studies [38, 44] were rated as high quality, largely because they employed expert opinion as a reference standard. However, no currently available objective tests classify function [54]. When no such tests are available, expert opinion, though limited, represents the best available reference standard [60,61,62]. The main risk of bias for the MCI test battery and PBA was the absence of a reference standard. The choice of statistical methods was considered appropriate for all studies, although one study [41] used the Kolmogorov–Smirnov test, which is not recommended for testing normality [63, 64].

Implications for clinical practice

The findings of this review suggest that clinicians can use OCS to reliably classify functional characteristics of patients with NS-CLBP. Upper lumbar and lower thoracic spine kinematic studies offer mechanistic evidence supporting the rationale for assessing MCI. However, evidence supporting the validity of the 10-item MCI test battery is inconclusive because it is available from only 1 high risk of bias study. Because the effectiveness of therapies informed by functional classification is generally unknown, it is unclear whether such a diagnosis can be used to inform effective care and/or serve as an objective measure of condition severity or response to care.

Implications for future research

Standardized assessment protocols for determining MCI require well-defined procedures, operational definitions and quantifiable values. Standardizing these will facilitate more clinically useful findings and the ability to pool data from clinical trials [29]. Included classification systems used sagittal plane MCI assessment. Future studies should consider frontal and transverse planes to more comprehensively assess complex movement strategies. Further studies with lower risk of bias are needed to confirm the clinical usefulness of PBA classification. Finally, RCTs validating the clinical effectiveness of treatments based on functional assessment are needed.

Review strengths and limitations

This review employed methodology consistent with PRISMA guidelines. However, as with all systematic reviews, articles may have been missed in the database searches. A meta-analysis was not feasible due to heterogeneity in the methodological design and the statistical analyses employed. Only articles in the English language were included. Another limitation is the inclusion of low-quality studies (small sample sizes and no reference standards).

Conclusions

Evidence from multiple studies with low risk of bias demonstrates that the OCS is a reliable classification method. Strong inter-rater reliability also exists for 3 tests of the 10-item MCI test battery. Evidence for the reliability and validity of the PBA is limited to one study with high risk of bias. While clinicians are encouraged to categorize the functional capacity of patients with NS-CLBP using reliable methods, research evidence is not yet available to answer questions about the effectiveness of care informed by such classification.

References

Hartvigsen J, Hancock MJ, Kongsted A et al (2018) What low back pain is and why we need to pay attention. Lancet 391:2356–2367. https://doi.org/10.1016/S0140-6736(18)30480-X
Article PubMed Google Scholar
Woolf AD, Pfleger B (2003) Burden of major musculoskeletal conditions. Bull World Health Organ 81:646–656
PubMed PubMed Central Google Scholar
Buchbinder R, van Tulder M, Öberg B et al (2018) Low back pain: a call for action. Lancet 391:2384–2388. https://doi.org/10.1016/S0140-6736(18)30488-4
Article PubMed Google Scholar
da C Menezes Costa L, Maher CG, Hancock MJ, et al (2012) The prognosis of acute and persistent low-back pain: a meta-analysis. CMAJ 184:E613–E624. https://doi.org/10.1503/cmaj.111271
Article Google Scholar
Burton AK, McClune TD, Clarke RD, Main CJ (2004) Long-term follow-up of patients with low back pain attending for manipulative care: outcomes and predictors. Man Ther 9:30–35. https://doi.org/10.1016/s1356-689x(03)00052-3
Article PubMed Google Scholar
Wáng YXJ, Wu A-M, Ruiz Santiago F, Nogueira-Barbosa MH (2018) Informed appropriate imaging for low back pain management: a narrative review. J Orthop Transl 15:21–34. https://doi.org/10.1016/j.jot.2018.07.009
Article Google Scholar
Hancock MJ, Maher CG, Latimer J et al (2007) Systematic review of tests to identify the disc, SIJ or facet joint as the source of low back pain. Eur Spine J 16:1539–1550. https://doi.org/10.1007/s00586-007-0391-1
Article CAS PubMed PubMed Central Google Scholar
Maher C, Underwood M, Buchbinder R (2017) Non-specific low back pain. Lancet 389:736–747. https://doi.org/10.1016/S0140-6736(16)30970-9
Article PubMed Google Scholar
Balagué F, Mannion AF, Pellisé F, Cedraschi C (2012) Non-specific low back pain. Lancet 379:482–491. https://doi.org/10.1016/S0140-6736(11)60610-7
Article PubMed Google Scholar
Vining RD, Minkalis AL, Shannon ZK, Twist EJ (2019) Development of an evidence-based practical diagnostic checklist and corresponding clinical exam for low back pain. J Manipulative Physiol Ther 42:665–676. https://doi.org/10.1016/j.jmpt.2019.08.003
Article PubMed Google Scholar
Patel S, Psychol C, Friede T et al (2012) Systematic review of randomized controlled trials of clinical prediction rules for physical therapy in low back pain. Spine. https://doi.org/10.1097/BRS.0b013e31827b158f
Article PubMed PubMed Central Google Scholar
Amundsen PA, Evans DW, Rajendran D et al (2018) Inclusion and exclusion criteria used in non-specific low back pain trials: a review of randomised controlled trials published between 2006 and 2012. BMC Musculoskelet Disord 19:113. https://doi.org/10.1186/s12891-018-2034-6
Article PubMed PubMed Central Google Scholar
Foster NE, Hill JC, Hay EM (2011) Subgrouping patients with low back pain in primary care: are we getting any better at it? Man Ther 16:3–8. https://doi.org/10.1016/j.math.2010.05.013
Article PubMed Google Scholar
Petersen T, Laslett M, Thorsen H et al (2003) Diagnostic classification of non-specific low back pain. A new system integrating patho-anatomic and clinical categories. Physiother Theory Pract 19:213–237. https://doi.org/10.1080/09593980390246760
Article Google Scholar
Vining R, Potocki E, Seidman M, Morgenthal P (2013) An evidence-based diagnostic classification system for low back pain. J Can Chiropr Assoc 57:189–204
PubMed PubMed Central Google Scholar
Spitzer WO, LeBlanc FE, Dupuis M, Abenhaim L, Belanger AY, Bloch R, Bombardier C, Cruess RL, Drouin G, Duval-Hesler N, Laflamme J, Lamoureux G, Nachemson A, Page JJ, Rossignol M, Salmi LR, Salois-Arsenault S, Suissa SW-DS (1987) Scientific approach to the assessment and management of activity-related spinal disorders. A monograph for clinicians. Report of the Quebec Task Force on Spinal Disorders. Spine 12:S1-59
Article Google Scholar
Alrwaily M, Timko M, Schneider M et al (2016) Treatment-based classification system for low back pain: revision and update. Phys Ther 96:1057–1066. https://doi.org/10.2522/ptj.20150345
Article PubMed Google Scholar
Cosio D, Lin E (2018) Role of active versus passive complementary and integrative health approaches in pain management. Glob Adv Heal Med 7:216495611876849. https://doi.org/10.1177/2164956118768492
Alhowimel A, AlOtaibi M, Radford K, Coulson N (2018) Psychosocial factors associated with change in pain and disability outcomes in chronic low back pain patients treated by physiotherapist: a systematic review. SAGE Open Med. https://doi.org/10.1177/2050312118757387
Booth A, Clarke M, Dooley G et al (2012) The nuts and bolts of PROSPERO: an international prospective register of systematic reviews. Syst Rev 1:2. https://doi.org/10.1186/2046-4053-1-2
Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6:e1000097. https://doi.org/10.1371/journal.pmed.1000097
Brink Y, Louw QA (2012) Clinical instruments: reliability and validity critical appraisal. J Eval Clin Pract 18:1126–1132. https://doi.org/10.1111/j.1365-2753.2011.01707.x
May S, Littlewood C, Bishop A (2006) Reliability of procedures used in the physical examination of non-specific low back pain: a systematic review. Aust J Physiother 52:91–102. https://doi.org/10.1016/S0004-9514(06)70044-7
May S, Chance-Larsen K, Littlewood C et al (2010) Reliability of physical examination tests used in the assessment of patients with shoulder problems: a systematic review. Physiotherapy 96:179–190
Barrett E, McCreesh K, Lewis J (2014) Reliability and validity of non-radiographic methods of thoracic kyphosis measurement: a systematic review. Man Ther 19:10–17. https://doi.org/10.1016/j.math.2013.09.003
Vibe Fersum K, O’Sullivan PB, Kvale A, Skouen JS (2009) Inter-examiner reliability of a classification system for patients with non-specific low back pain. Man Ther 14:555–561. https://doi.org/10.1016/j.math.2008.08.003
Luomajoki H, Kool J (2007) Reliability of movement control tests in the lumbar spine. BMC Musculoskelet Disord 8:90. https://doi.org/10.1186/1471-2474-8-90
Dankaerts W, O’Sullivan PB, Straker LM et al (2006) The inter-examiner reliability of a classification method for non-specific chronic low back pain patients with motor control impairment. Man Ther 11:28–39. https://doi.org/10.1016/j.math.2005.02.001
Enoch F, Kjaer P, Elkjaer A et al (2011) Inter-examiner reproducibility of tests for lumbar motor control. BMC Musculoskelet Disord 12:114. https://doi.org/10.1186/1471-2474-12-114
O’Sullivan PB, Mitchell T, Bulich P et al (2006) The relationship between posture and back muscle endurance in industrial workers with flexion-related low back pain. Man Ther 11:264–271. https://doi.org/10.1016/j.math.2005.04.004
O’Sullivan K, Verschueren S, Van Hoof W et al (2013) Lumbar repositioning error in sitting: healthy controls versus people with sitting-related non-specific chronic low back pain (flexion pattern). Man Ther 18:526–532. https://doi.org/10.1016/j.math.2013.05.005
O’Sullivan PB, Beales DJ, Beetham JA et al (2002) Altered motor control strategies in subjects with sacroiliac joint pain during the active straight-leg-raise test. Spine 27:E1-8. https://doi.org/10.1097/00007632-200201010-00015
O’Sullivan PB, Burnett A, Floyd AN et al (2003) Lumbar repositioning deficit in a specific low back pain population. Spine 28:1074–1079. https://doi.org/10.1097/01.BRS.0000061990.56113.6F
Hungerford B, Gilleard W, Hodges P (2003) Evidence of altered lumbopelvic muscle recruitment in the presence of sacroiliac joint pain. Spine 28:1593–1600. https://doi.org/10.1097/00007632-200307150-00022
Burnett A, Cornelius M, Dankaerts W, O’Sullivan P (2004) Spinal kinematics and trunk muscle activity in cyclists: a comparison between healthy controls and non-specific chronic low back pain subjects—a pilot investigation. Man Ther 9:211–219. https://doi.org/10.1016/j.math.2004.06.002
Dankaerts W, O’Sullivan P, Burnett A, Straker L (2006) Differences in sitting postures are associated with nonspecific chronic low back pain disorders when patients are subclassified. Spine 31:698–704. https://doi.org/10.1097/01.brs.0000202532.76925.d2
Dankaerts W, O’Sullivan P, Burnett A, Straker L (2006) Altered patterns of superficial trunk muscle activation during sitting in nonspecific chronic low back pain patients: importance of subclassification. Spine 31:2017–2023. https://doi.org/10.1097/01.brs.0000228728.11076.82
Dankaerts W, O’Sullivan P, Burnett A et al (2009) Discriminating healthy controls and two clinical subgroups of nonspecific chronic low back pain patients using trunk muscle activation and lumbosacral kinematics of postures and movements: a statistical classification model. Spine 34:1610–1618. https://doi.org/10.1097/BRS.0b013e3181aa6175
Beales DJ, Ther MM, O’Sullivan PB, Briffa NK (2009) Motor control patterns during an active straight leg raise in chronic pelvic girdle pain subjects. Spine 34:861–870. https://doi.org/10.1097/BRS.0b013e318198d212
Sheeran L, Sparkes V, Caterson B et al (2012) Spinal position sense and trunk muscle activity during sitting and standing in nonspecific chronic low back pain: classification analysis. Spine 37:E486–E495. https://doi.org/10.1097/BRS.0b013e31823b00ce
Van Hoof W, Volkaerts K, O’Sullivan K et al (2012) Comparing lower lumbar kinematics in cyclists with low back pain (flexion pattern) versus asymptomatic controls—field study using a wireless posture monitoring system. Man Ther 17:312–317. https://doi.org/10.1016/j.math.2012.02.012
Hemming R, Sheeran L, van Deursen R, Sparkes V (2019) Investigating differences in trunk muscle activity in non-specific chronic low back pain subgroups and no-low back pain controls during functional tasks: a case-control study. BMC Musculoskelet Disord 20:459. https://doi.org/10.1186/s12891-019-2843-2
Hemming R, Sheeran L, van Deursen R, Sparkes V (2017) Non-specific chronic low back pain: differences in spinal kinematics in subgroups during functional tasks. Eur Spine J. https://doi.org/10.1007/s00586-017-5217-1
Sheeran L, Sparkes V, Whatling G et al (2019) Identifying non-specific low back pain clinical subgroups from sitting and standing repositioning posture tasks using a novel Cardiff Dempster–Shafer theory classifier. Clin Biomech. https://doi.org/10.1016/j.clinbiomech.2019.10.004
Biele C, Moller D, von Piekartz H et al (2019) Validity of increasing the number of motor control tests within a test battery for discrimination of low back pain conditions in people attending a physiotherapy clinic: a case–control study. BMJ Open 9:e032340. https://doi.org/10.1136/bmjopen-2019-032340
Meyer K, Klipstein A, Oesch P et al (2016) Development and validation of a pain behavior assessment in patients with chronic low back pain. J Occup Rehabil 26:103–113. https://doi.org/10.1007/s10926-015-9593-2
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174
Ford J (2003) A systematic review on methodology of classification system research for low back pain. In: Musculoskeletal physiotherapy Australia 13th biennial conference, Sydney, Australia, 2003
Anderson JA (1977) Problems of classification of low-back pain. Rheumatol Rehabil 16:34–36. https://doi.org/10.1093/rheumatology/16.1.34
Deyo RA, Haselkorn J, Hoffman R, Kent DL (1994) Designing studies of diagnostic tests for low back pain or radiculopathy. Spine 19:2057S-2065S. https://doi.org/10.1097/00007632-199409151-00007
Fairbank JCT, Pynsent PB (1992) Syndromes of back pain and their classification. In: The Lumbar spine and back pain. Edinburgh: Churchill Livingstone
Petersen T, Thorsen H, Manniche C, Ekdahl C (1999) Classification of non-specific low back pain: a review of the literature on classifications systems relevant to physiotherapy. Phys Ther Rev 4:265–281. https://doi.org/10.1179/108331999786821690
Ford J, Story I, O’Sullivan P, McMeeken J (2007) Classification systems for low back pain: a review of the methodology for development and validation. Phys Ther Rev 12:33–42
Woolf CJ, Bennett GJ, Doherty M et al (1998) Towards a mechanism-based classification of pain. Pain 77:227–229
McCarthy CJ, Arnall FA, Strimpakos N et al (2004) The biopsychosocial classification of non-specific low back pain: a systematic review. Phys Ther Rev 9:17–30. https://doi.org/10.1179/108331904225003955
Fairbank J, Gwilym S, France J, Daffner S (2011) The role of classification of chronic low back pain. Spine 1:36. https://doi.org/10.1097/BRS.0b013e31822ef72c
Salvioli S, Pozzi A, Testa M (2019) Movement control impairment and low back pain: state of the art of diagnostic framing. Medicina (Kaunas). https://doi.org/10.3390/medicina55090548
Carlsson H, Rasmussen-Barr E (2013) Clinical screening tests for assessing movement control in non-specific low-back pain. A systematic review of intra-and inter-observer reliability studies. Man Ther 18:103–110. https://doi.org/10.1016/j.math.2012.08.004
Murphy SE, Blake C, Power CK, Fullen BM (2016) Comparison of a stratified group intervention (STarT back) with usual group care in patients with low back pain: a nonrandomized controlled trial. Spine 41:645–652. https://doi.org/10.1097/BRS.0000000000001305
Mjøsund HL, Boyle E, Kjaer P et al (2017) Clinically acceptable agreement between the ViMove wireless motion sensor system and the Vicon motion capture system when measuring lumbar region inclination motion in the sagittal and coronal planes. BMC Musculoskelet Disord 18:124. https://doi.org/10.1186/s12891-017-1489-1
Gracovetsky S, Newman N, Pawlowsky M et al (1995) A database for estimating normal spinal motion derived from noninvasive measurements. Spine 20:1036–1046. https://doi.org/10.1097/00007632-199505000-00010
Mannion AF, Knecht K, Balaban G et al (2004) A new skin-surface device for measuring the curvature and global and segmental ranges of motion of the spine: reliability of measurements and comparison with data reviewed from the literature. Eur Spine J 13:122–136. https://doi.org/10.1007/s00586-003-0618-8
Öztuna D, Elhan AH, Tüccar E (2006) Investigation of four different normality tests in terms of type 1 error rate and power under different distributions. Turkish J Med Sci 36:171–176
Thode HC (2002) Testing for normality. Statistics: textbooks and monographs, vol 164. CRC Press, New York, NY

Author information

Authors and Affiliations

Faculty of Physical Therapy, Cairo University, Cairo, Egypt
Ahmed Omar Abdelnaeem, Aliaa Rehan Youssef, Nesreen Fawzy Mahmoud & Nadia Abdalazeem Fayaz
Faculty of Physical Therapy, Ahram Canadian University, Giza, Egypt
Aliaa Rehan Youssef
Palmer Center for Chiropractic Research, Palmer College of Chiropractic, Davenport, IA, USA
Robert Vining
Cairo, Egypt
Ahmed Omar Abdelnaeem

Corresponding author

Correspondence to Ahmed Omar Abdelnaeem.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1 Search strategies of the searched databases and journals

Database/journal	Last citation no	Keywords
PubMed	364	(((((Non specific OR non-specific OR nonspecific OR mechanical))) AND ((low back pain OR simple backache OR lumbar strain OR spinal degeneration))) AND ((clinical test OR clinical examination OR clinical sign))) AND ((valid* OR reliabl*)) simple search
EMBASE	738	(clinical) AND (test* OR exam* OR sign) AND (non-specific OR nonspecific OR 'non specific' OR mechanical OR simple) AND (low back pain OR back pain OR LBP) AND (reliab OR valid*); limited to English-language human studies and to EMBASE records only
Cochrane	92	(Non specific or non-specific or nonspecific or mechanical) and (low back pain or simple backache or lumbar strain or spinal degeneration) and (clinical test or clinical examination or clinical sign) and (valid* or reliabl*) in search manager choose in Trials, Methods Studies, Technology Assessments and Economic Evaluations (Word variations have been searched)
PEDro	226	Non specific low back pain (abstract and title) in advanced search (method clinical trials)
CINAHL	286	(Non specific OR non-specific OR nonspecific OR mechanical) AND (low back pain OR simple backache OR lumbar strain OR spinal degeneration) AND (clinical test OR clinical examination OR clinical sign) AND (valid* OR reliabl*) in advanced search
ProQuest	358	(("non specific" OR "non-specific" OR "nonspecific" OR "mechanical back pain") AND ("back pain" OR "lumbar strain" OR "simple backache") AND ("clinical test" OR "clinical examination" OR "clinical sign") AND ("valid" OR "reliab")) AND la.exact("ENG")
Physical therapy journal	460	"non specific" "non-specific" "nonspecific" "mechanical back pain" "back pain" "lumbar strain" "simple backache" "clinical test" "clinical examination" "clinical sign" "valid" "reliab"
Chiroindex	79	"non specific" "non-specific" "nonspecific" "mechanical back pain" "back pain" "lumbar strain" "simple backache" "clinical test" "clinical examination" "clinical sign" "valid" "reliab"
Australian journal of physiotherapy	54	Non specific in Title/Abs/Keywords OR nonspecific in Title/Abs/Keywords OR non-specific in Title/Abs/Keywords AND Low Back Pain in Title/Abs/Keywords OR Mechanical low back pain in Title/Abs/Keywords OR simple backache in Title/Abs/Keywords
Canadian physiotherapy	113	Non specific OR non-specific OR nonspecific OR mechanical AND low back pain AND clinical tests OR clinical examination OR clinical sign AND valid* OR reliabl* in advanced search
Physiotherapy theory and practice journal	113	Non specific OR non-specific OR nonspecific OR mechanical AND low back pain AND clinical test OR clinical examination OR clinical sign AND valid* OR reliabl*
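
Most of the strategies above share one pattern: synonyms for each concept are joined with OR, and the concept groups are then joined with AND. The following is a minimal sketch of that pattern, not the authors' actual tooling. The helper names `or_group` and `build_query` are hypothetical, and real database interfaces may need their own field tags or quoting rules.

```python
def or_group(terms):
    """Join the synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concept_groups):
    """Join the OR-groups with AND, one group per search concept."""
    return " AND ".join(or_group(group) for group in concept_groups)


# Concept groups copied from the CINAHL row of the table above.
concepts = [
    ["Non specific", "non-specific", "nonspecific", "mechanical"],
    ["low back pain", "simple backache", "lumbar strain", "spinal degeneration"],
    ["clinical test", "clinical examination", "clinical sign"],
    ["valid*", "reliabl*"],
]

query = build_query(concepts)
print(query)
```

Wrapping every OR-group in parentheses keeps the result independent of each database's operator precedence, which matters for the rows above that list OR and AND terms without brackets.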

Appendix 2 Systematic review critical appraisal tool (Reproduced from Brink and Louw (2011))

Item 1: If human subjects were used, did the authors give a detailed description of the sample of subjects used to perform the (index) test on?

Why the criterion should be evaluated: The validity and reliability of a test will be affected by the sample characteristics or composition, and therefore, the study has to report on the sample characteristics because the validity and reliability scores will then only be applicable to that particular population. A study does not contribute to validity and reliability testing if the subjects were not recruited appropriately
This item can be scored yes if:
1 the sample characteristics (e.g., height, weight, age, diagnosis and symptom status) were described or the manner of recruiting subjects was stated or if selection criteria were applied
If none of the above have been described or if insufficient information was provided, select “no.” If non-human or inanimate objects were used, select “N/A”

Item 2: Did the authors clarify the qualification, or competence of the rater(s) who performed the (index) test?

Why the criterion should be evaluated: The amount of experience of the rater(s), performing the (index) test, will influence the validity and reliability scores and needs to be explained
This item can be scored yes if:
1 the rater(s) characteristics (e.g., qualification, specialization and amount of experience using the instrument under investigation) have been described
If the above have not been described or insufficient information was provided, select “no”

Item 3: Was the reference standard explained?

Why the criterion should be evaluated: The index test scores need to be compared to the scores obtained from the reference standard in order to test validity, and therefore, the reference standard needs to be explained appropriately
This item can be scored yes if:
1 the reference standard is likely to produce correct measurements;
2 the reference standard is the best method available; and
3 details (name of the instrument, references to the accuracy of the instrument) of the reference standard are reported
If none of the above is applicable to the reference standard’s description, then select “no”

Item 4: If inter-rater reliability was tested, were raters blinded to the findings of other raters?

Why the criterion should be evaluated: When raters have access to the findings of other raters, it compromises the quality of the reliability testing procedure by inflating the agreement among the raters, and therefore, blinding needs to be performed
This item can be scored yes if:
1 it is stated that the raters were blinded to each other’s findings or if a description that implies that the raters were blinded was reported
If no information is provided, then select “no.” If intra-rater reliability was examined, then select “N/A”

Item 5: If intra-rater reliability was tested, were raters blinded to their own prior findings of the test under evaluation?

Why the criterion should be evaluated: If raters have knowledge of their own prior findings, it will influence the findings of their repeated measurements and could inflate the rater agreement, and therefore, appropriate measures, depending on the characteristics or the study design of the research study, need to be applied to ensure blinding
This item can be scored yes if:
1 when the rater(s) has/have examined the same subjects on more than one occasion, it is stated whether the rater(s) was/were blinded to the subjects they had examined previously
If insufficient information is provided, then select “no.” If inter-rater reliability was examined, then select “N/A”

Item 6: Was the order of examination varied?

Why the criterion should be evaluated: Varying the order in which the raters examine the subjects when inter-rater reliability is tested reduces the risk of systematic bias. Varying the order in which subjects are examined by one rater when intra-rater reliability is tested reduces the risk of the rater recalling the previous test scores and reduces bias
This item can be scored yes if:
1 the order in which subjects were tested varied between raters if inter-rater reliability was tested;
2 the order of subjects was varied when intra-rater reliability was tested
If insufficient information is provided, then select “no.” If varied order of examination is unnecessary or impractical (e.g., rater(s) digitizing or reading X-rays) then select “N/A”

Item 7: If human subjects were used, was the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between the two tests?

Why the criterion should be evaluated: The index test and the reference standard should be performed at the same time; however, this is not always possible. It becomes important to know whether it is possible that the test variable did not change between the two tests, otherwise it will affect the index test’s validity performance
This item can be scored yes if:
1 results from the index test and the reference standard were collected on the same subjects at the same time;
2 where a delay between measurements occurs, the target condition should not change between measurements
If the time period between performing the index test and the reference standard was sufficiently long that the target condition may have changed between the two tests or if insufficient information is provided, then select “no.” If non-human or inanimate objects were used, then select “N/A”

Item 8: Was the stability (or theoretical stability) of the variable being measured considered when determining the suitability of the time interval between repeated measures?

Why the criterion should be evaluated: For reliability, the test variable should not change between repeated measures, otherwise it will decrease the amount of agreement obtained between and within the rater(s)
This item can be scored yes if:
1 the stability of the variable is known or reported, and reviewers then decide on an appropriate time interval between repeated measures (stability of a test variable can only be determined if there is a reference standard);
2 if there is no reference standard, the reviewers should agree upon the theoretical stability of the variable and decide on an appropriate time interval between repeated measures
If insufficient information is provided, then select “no”

Item 9: Was the reference standard independent of the index test?

Why the criterion should be evaluated: If the reference standard and the index test are not independently performed, then the index test cannot replace the reference standard on its own
This item can be scored yes if:
1 it is clear from the study that the index test did not form part of the reference standard
If it appears that the index test formed part of the reference standard, then select “no”

Item 10: Was the execution of the (index) test described in enough detail to permit replication of the test?

Why the criterion should be evaluated: Variations in the execution of the reference standard and the (index) test might affect the agreement between the two tests and it is also important to be able to replicate the same study procedure in another setting when needed
This item can be scored yes if:
1 the study reported a clear description of the measurement procedure (e.g., the positioning of the instrument or rater and execution sequence of events);
2 citations of methodology were supplied
The extent to which detail is expected to be reported depends on the ability of different procedures to influence the results and on the type of instrument or test under evaluation
If insufficient information is provided, then select “no”

Item 11: Was the execution of the reference standard described in enough detail to permit its replication?

Why the criterion should be evaluated: For the same reason as item 10
This item can be scored yes if:
1 the study reported a clear description of the measurement procedure (e.g., the positioning of the instrument or rater and execution sequence of events);
2 citations were supplied
If insufficient information is provided, then select “no”

Item 12: Were withdrawals from the study explained?

Why the criterion should be evaluated: The sample composition will influence the validity and reliability performance of the (index) test; therefore, it is important to know whether any withdrawals from the sample might have changed the composition of the sample
This item can be scored yes if:
1 it is clear what happened to all subjects who entered the study;
2 subjects who entered but did not complete the study are considered
If it appears that subjects who entered but did not complete the study were not accounted for or if insufficient information is provided, then select “no.” If non-human or inanimate objects were used, then select “N/A”

Item 13: Were the statistical methods appropriate for the purpose of the study?

Why the criterion should be evaluated: The aim of validity and reliability studies is to report on an estimate of validity and reliability for the particular test and appropriate statistical methods need to be implemented in order to produce this estimate
This item can be scored yes if:
1 the analysis is appropriate in terms of the type of data (e.g., categorical, continuous and dichotomous);
2 statistical analysis for validity studies incorporates, for example, means, differences between measurements, 95% confidence interval and ANOVA; and
3 statistical analysis for reliability studies incorporates, for example, intraclass correlation coefficient and 95% confidence interval
If the analysis is not appropriate or if insufficient information was provided, then select “no”
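
The yes/no/N/A answers for the 13 items above can be tallied per study. The following is a minimal sketch under stated assumptions: the tool as reproduced here does not define an overall score, so the percentage of applicable items scored “yes” is an illustrative summary only, and the function name `summarise_appraisal` and the example answers are hypothetical.

```python
from collections import Counter

VALID_ANSWERS = {"yes", "no", "N/A"}


def summarise_appraisal(answers):
    """Tally one study's answers to the appraisal items.

    answers: dict mapping item number to "yes", "no" or "N/A".
    Returns the counts plus the percentage of applicable items
    (those not marked "N/A") that were scored "yes". The percentage
    is an assumed summary; it is not defined by the tool itself.
    """
    unknown = set(answers.values()) - VALID_ANSWERS
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")
    counts = Counter(answers.values())
    applicable = counts["yes"] + counts["no"]
    percent_yes = 100.0 * counts["yes"] / applicable if applicable else None
    return {
        "yes": counts["yes"],
        "no": counts["no"],
        "not_applicable": counts["N/A"],
        "percent_yes": percent_yes,
    }


# Hypothetical answers for an inter-rater reliability study, in which the
# reference-standard items (3, 7, 9, 11) and the intra-rater item (5) do not apply.
example = {
    1: "yes", 2: "yes", 3: "N/A", 4: "yes", 5: "N/A", 6: "no", 7: "N/A",
    8: "yes", 9: "N/A", 10: "yes", 11: "N/A", 12: "no", 13: "yes",
}
summary = summarise_appraisal(example)
print(summary)
```

Excluding “N/A” items from the denominator means a reliability-only study is not penalised for items that apply only to validity designs. Any cut-off for labelling a study as low or high risk of bias would need to be taken from the review's methods rather than from this sketch.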

Appendix 3 Classification processes of OCS

About this article

Cite this article

Abdelnaeem, A.O., Rehan Youssef, A., Mahmoud, N.F. et al. Psychometric properties of chronic low back pain diagnostic classification systems: a systematic review. Eur Spine J 30, 957–989 (2021). https://doi.org/10.1007/s00586-020-06712-0

Received: 23 November 2020
Revised: 23 November 2020
Accepted: 27 December 2020
Published: 20 January 2021
Issue Date: April 2021
DOI: https://doi.org/10.1007/s00586-020-06712-0
