Abstract
Objectives
To identify and critically appraise studies evaluating psychometric properties of functionally oriented diagnostic classification systems for Non-Specific Chronic Low Back Pain (NS-CLBP).
Methods
This review employed methodology consistent with PRISMA guidelines. Electronic databases and journals (PubMed, EMBASE, Cochrane, PEDro, CINAHL, Index to Chiropractic Literature, ProQuest, Physical Therapy, Journal of Physiotherapy, Canadian Physiotherapy and Physiotherapy Theory and Practice) were searched from inception until January 2020. Included studies evaluated the validity and reliability of NS-CLBP diagnostic classification systems in adults. Risk of bias was assessed using a Critical Appraisal Tool.
Results
Twenty-two studies were eligible: five investigated inter-rater reliability, and 17 analyzed the validity of O’Sullivan’s classification system (OCS, n = 15), the motor control impairment (MCI) test battery (n = 1), and the Pain Behavior Assessment (PBA, n = 1). Evidence from multiple low risk of bias studies demonstrates that OCS has moderate to excellent inter-rater reliability (kappa > 0.4). In addition, two low risk of bias studies support the validity of the OCS-MCI subcategory. Three tests within the MCI test battery show acceptable inter- and intra-rater reliability for clinical use (the “sitting knee extension,” the “one leg stance,” and the “pelvic tilt” tests). Evidence for the reliability and validity of the PBA is limited to one high risk of bias study.
Conclusions
Multiple low risk of bias studies demonstrate strong inter-rater reliability for OCS classification, specifically for the OCS-MCI subcategory. Future studies with low risk of bias are needed to evaluate reliability and validity of the MCI test battery and the PBA.
Introduction
Low back pain (LBP) is a highly prevalent problem [1, 2] and a major cause of disability worldwide [1, 3], with a tendency to recur or persist, leading to chronicity [4, 5]. Often, no definitive pathology or underlying mechanism can be identified [6]. To account for uncertain and potentially multiple contributing factors, chronic LBP is commonly labeled as Non-Specific Chronic LBP (NS-CLBP) [1, 7,8,9]. NS-CLBP is a general diagnosis encompassing a wide range of conditions, symptoms, and clinical features. However, such diagnosis does not distinguish characteristics to guide specific clinical decision-making [10].
There is currently no optimum treatment strategy for NS-CLBP [11]. This is partially due to a lack of standardized, valid, and reliable methods which classify specific characteristics. An effective classification system is expected to be based on studies reporting selection criteria in clear and standardized terms [12]; requirements that have been proposed as a key research priority [13].
There are many approaches to classify NS-CLBP, such as identifying symptom sources [14,15,16], functional characteristics, and/or psychosocial risk [17]. Functional evaluation, which generally assesses how people move and perform movement tasks, is typically designed to both inform treatment and be used as a clinical outcome. Functional classification is promising, in part because it identifies characteristics of a condition and offers information about potential mechanisms contributing to chronicity. Active treatments designed to address functional deficits may also positively influence self-efficacy, reduce fear avoidance behaviors, and promote symptom self-management [18, 19].
Despite the potential importance of functional classification as a diagnostic process, to the authors’ knowledge, no previous comprehensive evaluation of the reliability and validity of existing classification systems has been performed. Therefore, the aims of this review were to describe and critically appraise the psychometric properties of functionally oriented NS-CLBP diagnostic classification systems.
Methods
This systematic review was registered on PROSPERO (CRD42015023958) [20] and conducted according to PRISMA guidelines [21].
Search strategy
A systematic electronic and manual search of the literature published in English, from inception until January 2020, was conducted in the following electronic databases and journals: PubMed, EMBASE, Cochrane, PEDro, CINAHL, Index to Chiropractic Literature, ProQuest, Physical Therapy, Journal of Physiotherapy, Canadian Physiotherapy and Physiotherapy Theory and Practice.
The search strategy consisted of keywords and Medical Subject Headings (MeSH) related to “Non specific,” “mechanical,” “low back pain,” “simple backache,” “lumbar strain,” “spinal degeneration,” “classification,” “clinical test,” “clinical examination,” “clinical sign,” “valid*,” and “reliabl*.” Search strategy details are available in Appendix 1. Reference lists of eligible articles were also searched for relevant publications.
Eligibility criteria
Inclusion and exclusion criteria are summarized in Table 1.
Data collection and analysis
Selection of studies
Studies from electronic databases and manual searches were imported into EndNote X5.0.1 and checked for eligibility by two reviewers (AA and NFM) independently; first by title, abstract, and finally by full text. Discordance was resolved through discussion with co-authors (ARY and RV).
Data extraction
Two reviewers (AA and NFM) independently extracted all relevant information into an Excel spreadsheet. All discrepancies were resolved through discussion.
Risk of bias assessment
Risk of bias was assessed independently by two authors (AA and NFM) using the Critical Appraisal Tool (CAT) developed by Brink and Louw [22] (Appendix 2). This tool consists of 13 items designed to appraise the quality of the validity and/or reliability studies; 4 items assess reliability studies, 4 items evaluate validation studies, and the remaining 5 items evaluate both validity and reliability studies. Each item is scored as “Yes,” “No,” or “Not Applicable (N/A)” [22]. Disagreement was resolved through consensus discussion. Studies were considered of low risk of bias if they scored ≥ 60% [23,24,25].
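The scoring rule described above can be illustrated with a minimal Python sketch. The review states only the item responses (“Yes,” “No,” “N/A”) and the ≥ 60% threshold; the assumption that the percentage is the share of “Yes” answers among applicable (non-N/A) items, the function names, and the example responses are all hypothetical and are not taken from the review or from Brink and Louw [22].

```python
from typing import Sequence

LOW_RISK_THRESHOLD = 60.0  # percent; the cut-off stated in the review


def cat_score(items: Sequence[str]) -> float:
    """Percentage of applicable items answered "Yes".

    Assumption: items marked "N/A" are excluded from the denominator.
    """
    applicable = [answer for answer in items if answer != "N/A"]
    if not applicable:
        raise ValueError("no applicable items to score")
    yes_count = sum(1 for answer in applicable if answer == "Yes")
    return 100.0 * yes_count / len(applicable)


def risk_of_bias(items: Sequence[str]) -> str:
    """Label a study "low" or "high" risk using the 60% threshold."""
    return "low" if cat_score(items) >= LOW_RISK_THRESHOLD else "high"


# Hypothetical responses to the 13 CAT items for one reliability study.
example = ["Yes", "Yes", "N/A", "Yes", "N/A", "Yes", "No",
           "N/A", "N/A", "Yes", "No", "No", "Yes"]

print(round(cat_score(example), 1), risk_of_bias(example))
```

In this hypothetical case, 6 of 9 applicable items are answered “Yes,” so the study would fall above the 60% threshold under the assumed rule.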
Results
Selection of studies
Database and manual searching identified a total of 2899 articles. After full article screening, 22 studies published in 21 articles (one article reported 2 studies) were included. The article screening process is outlined in Fig. 1.
Characteristics of included studies
The methodological approach of each included study is depicted in Tables 2 and 3. Of the 22 included studies, 5 investigated inter-rater reliability [26,27,28,29], with one study reporting both intra- and inter-rater reliability [27] (Table 2).
Validity was assessed in 17 studies; 15 studies evaluated O’Sullivan’s Classification System (OCS) [30,31,32,33,34,35,36,37,38,39,40,41,42,43,44], one assessed the 10-item Motor Control Impairment (MCI) test battery [45], and one investigated the Pain Behavior Assessment (PBA) classification system [46]. All validity studies used a cross-sectional design, except 2 studies [42, 45] that used a cross-sectional case–control design (Table 3).
Demographic data of participants in eligible studies
Sixteen studies enrolled asymptomatic participants as controls [27, 29,30,31,32,33,34,35,36,37,38, 40,41,42,43, 45]. Sample size ranged from 12 [39] to 200 [46]. Participant mean age ranged from 28.4 [41] to 55.1 years [27]. The mean body mass index (BMI) ranged from 20.8 [42, 43] to 26.9 kg/m² [45], although four studies did not report BMI [26, 27, 29, 46].
Classification systems
All eligible studies described three different classification systems (Tables 2, 3):
1. OCS (n = 18 studies) classifies NS-CLBP as predominantly centrally (e.g., central sensitization) or peripherally mediated (e.g., injury, inflammation of peripheral tissues). OCS also includes a psychosocial assessment step and separates pain presumed from lumbar and pelvic origin. Functional testing of lumbar and pelvic girdle pain evaluates presumed motor control impairment by identifying specific postural and movement characteristics [26] (Appendix 3).
2. MCI Test Battery (n = 3 studies) used specific movements/positions to differentiate participants with MCI from normal individuals. This battery consists of 10 individual tests that identify possible flexion, extension and rotational dysfunction. Assessment is dichotomous (impairment or no impairment) with the severity described on 3 levels (none, mild, moderate/severe) [45].
3. PBA classification (n = 1 study) rates: (1) pain perception; (2) overt pain behavior (e.g., guarding movements); (3) effort during physical test performance; and (4) consistency of behavior across different situations of clinical testing. Categories include no pain, low pain or high pain behaviors [46].
Reliability of different classification systems
OCS and MCI test Battery inter-rater reliability testing was assessed in 5 studies (Table 4).
Inter-rater reliability of the entire OCS classification system (all steps) was moderate (kappa (K) > 0.4) [26]. For levels 1–4, the mean percentage agreement was excellent (96%; range 75–100%). For the fifth level, mean K and mean agreement among 4 testers were strong, at 0.82 (range 0.66–0.90) and 86% (range 73–92%), respectively. The final classification level had a moderate mean K of 0.65 (range 0.57–0.74) and excellent agreement of 87% (range 85–92%) [26]. Within the OCS-MCI subcategory, the most reliably identified subgroup was the passive extension pattern (PEP) (K = 0.90; strong), while the least reliable was the active extension pattern (AEP) (K = 0.66; moderate) [26].
The OCS-MCI subcategory was also tested in 2 studies reported within a single article [28]. In the first study, the inter-rater agreement was excellent (K = 0.96 and %-of-agreement = 97%). In the second study, the inter-rater agreement was moderate (K = 0.61) on average and ranged from 0.47 to 0.80, while the mean agreement was 70%, ranging from 60 to 84%, among 13 examiners who assessed 25 cases including subjective information from participants and video recorded functional tests [28].
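Because the results above report both K and percentage agreement, a short sketch may help show how the two differ. The Python example below computes Cohen's kappa for two raters, K = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The ratings are invented for illustration, and the included studies with more than two examiners may have used averaged pairwise or other multi-rater statistics, so this is not a reconstruction of their analyses.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Raw proportion of cases on which both raters chose the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of cases")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters: chance-corrected agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: product of each rater's marginal proportions per label.
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical subgroup labels assigned to 10 cases by two examiners.
rater_a = ["FP", "FP", "FP", "FP", "AEP", "AEP", "AEP", "PEP", "PEP", "PEP"]
rater_b = ["FP", "FP", "FP", "AEP", "AEP", "AEP", "FP", "PEP", "PEP", "PEP"]

print(f"agreement = {percent_agreement(rater_a, rater_b):.0%}")
print(f"kappa     = {cohen_kappa(rater_a, rater_b):.2f}")
```

In this invented example the raters agree on 80% of cases, yet K is about 0.70, because part of the raw agreement is expected by chance. This is why a high percentage agreement can accompany a lower K, as in several of the figures above.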
MCI test battery
In one study, four examiners independently rated video recordings of 27 participants with NS-CLBP and 13 controls performing 10 MCI tests. K values for inter-rater reliability ranged from minimal to moderate (0.24–0.71). Six of the 10 tests showed substantial inter-rater reliability (K > 0.6). The most reliable tests (for both rater pairs) were the “pelvic tilt” for extension dysfunction, “one leg stance” for rotational dysfunction, and “sitting knee extension” and “waiter’s bow” tests for flexion dysfunction. The poorest reliability was reported for the “abduction in crook lying” test for rotational dysfunction, where both rater pairs had low K values (K = 0.44; 95% CI 0.18–0.70 and K = 0.32; 95% CI 0.10–0.54). Intra-tester reliability ranged from 0.51 to 0.96. All tests, except abduction in crook lying, showed substantial reliability (K > 0.6) [27].
The second study reported inter-rater reliability between two examiners who independently examined 25 participants with NS-CLBP and 15 asymptomatic controls using five MCI clinical tests. Intra-class correlation coefficients (ICCs) were excellent: 0.90 for repositioning (RPS), 0.96 for sitting forward lean (SFL), 0.96 for sitting knee extension (SKE), 0.94 for bent knee fall out (BKFO) and 0.98 for leg lowering (LL) [29].
Validity of different classification systems
All three diagnostic systems underwent some aspect of validation testing (Table 5):
1. OCS: Fifteen studies assessed OCS validity (12 for OCS-MCI subcategories and 3 for sacroiliac joint dysfunction (SIJD)).
OCS-MCI construct validity was reported by measuring lumbosacral kinematics and trunk muscle activation in 33 participants with NS-CLBP (20 Flexion Pattern (FP) and 13 AEP) and 34 asymptomatic controls. The biomechanical model used lower lumbar kinematics in sitting and forward bending and two trunk muscle activation variables (lack of flexion relaxation of the superficial lumbar multifidus in slump sitting and end range of forward bending). The model correctly classified 96.4% of cases and distinguished between individuals with No LBP, AEP and FP [38].
Discriminant validity of MCI subcategories through spinal kinematics testing:
Sitting postures were tested to distinguish participants with NS-CLBP with AEP (lordotic lumbar posture) and FP (kyphotic lumbar posture) from asymptomatic controls (P < 0.001). Participants with NS-CLBP had less ability to consciously alter posture when asked to slump from usual sitting (P < 0.001) [36]. Similar findings were reported during cycling, as participants with NS-CLBP (FP) exhibited greater lumbar region flexion compared to asymptomatic controls (p = 0.018) and reported a marked increase in pain over 2 hours of cycling (p < 0.001) [41]. Further, cyclists with NS-CLBP (FP) showed an increased, although non-significant, tendency toward lumbar flexion and rotation compared to controls (P > 0.05) [35]. In another study, participants with NS-CLBP (FP) sat with less hip flexion (P = 0.05), suggesting a relative posterior pelvic tilt. During “usual” sitting, the FP group positioned the lumbar spine significantly closer to end range lumbar flexion compared to asymptomatic controls [30].
Functional tasks: Spinal kinematics during functional tasks were not different between the AEP group and asymptomatic controls in a single study [43]. However, the AEP group distinctively adopted more upper lumbar and lower thoracic (T6–L3) extension compared to the FP group, which adopted more flexion during these activities (p < 0.05). The FP group also exhibited greater thoraco-lumbar kyphosis than asymptomatic controls [43].
Spinal position sense (SPS): Lumbar repositioning accuracy was assessed in 3 studies [31, 33, 40]. Participants with NS-CLBP, compared to asymptomatic controls, showed a substantially greater magnitude of Absolute Error (AE) [31, 40] and Variable Error (VE) [40]. The FP group underestimated lumbar target positions [31, 33, 40], while the AEP group overestimated lumbar and underestimated thoracic target positions compared to FP [40]. The Cardiff Dempster–Shafer Theory (DST) Classifier method, based on objective measures of repositioning sense during sitting and standing, discriminated the No LBP from NS-CLBP (pooled and in subsets) with an accuracy ranging between 93.83 and 98.15%. Further, the DST classifier method distinguished different NS-CLBP subgroups with an accuracy of 96.8%, 87.7% and 70.27% for FP from PEP, FP from AEP and AEP from PEP subtypes, respectively. Finally, ranking analysis showed that lumbar AE in sitting could distinguish participants with NS-CLBP from No LBP and FP from No LBP, while lumbar constant error in standing consistently discriminated LBP extension subsets (AEP and PEP) from No LBP [44].
Discriminant validity of MCI subcategories through trunk muscle activity testing:
Surface electromyography (sEMG) recorded from five trunk muscles during unsupported “usual” and “slumped” sitting postures could not distinguish trunk muscle activity between asymptomatic controls and a pooled NS-CLBP group. However, compared to controls, participants classified with AEP presented with significantly higher co-contraction of lumbar multifidus, ilio-costalis lumborum pars thoracis and transverse fibers of internal oblique muscles (p < 0.05) [37]. Burnett et al. [35] reported less co-contraction of the lower lumbar multifidus in the FP group compared to controls during a cycling task [35]. Sheeran et al. [40] reported the NS-CLBP (FP and AEP combined) group produced significantly higher abdominal activity (p < 0.01) compared to controls during usual sitting and standing postures. Hemming et al. [42] also reported significantly greater muscle activation in right-sided superficial lumbar multifidus muscles during the functional tasks of step up, reach up and box replace (p < 0.05). External oblique muscle contraction during box lift differed significantly between participants with AEP and asymptomatic controls (p = 0.016). Significant differences between participants with FP and asymptomatic controls were also reported for left-sided transversus abdominis/internal oblique and superficial lumbar multifidus activity during stand-to-sit tasks (p = 0.009) [42].
SIJD subcategory: Two studies reported decreased diaphragmatic excursion, altered respiratory patterns and depression of the pelvic floor (PF) in participants with NS-CLBP during the active straight leg raise (ASLR) test compared to controls [32, 39]. When the examiner added manual pelvic compression to the ASLR test, there were no differences between the two groups. Manual pelvic compression during the ASLR theoretically improves load transfer by enhancing passive stability of the SIJs and motor control (MC) patterns/force closure [32]. Another study reported delayed activation of obliquus internus abdominis (OI), multifidus and gluteus maximus muscles in patients with SIJD compared to controls. Delayed OI and multifidus activation occurred on both the symptomatic and the asymptomatic sides in the SIJD group. Biceps femoris activation occurred earlier in the SIJD group [34].
2. MCI Test Battery: One study assessed the clinical validity of the MCI test battery for classifying participants with NS-CLBP [45]. For both the two-class (impairment or not) and the three-class (none, mild/moderate and severe) categorization, the ideal number of MCI tests was 10. The overall discrimination potential for two-class categorization was good (Area Under the Curve (AUC) > 0.8, sensitivity = 0.75, specificity = 0.82, Youden = 0.57, LR + = 3.40, LR- = 0.20, effect size = 1.45), with an optimal cutoff of three tests. To classify MCI, at least four failed items are needed. The overall discrimination potential for the three-class categorization was fair (volume under the surface > 0.5, sensitivity = 0.48, sensitivity = 0.50, specificity = 0.82, Youden = 0.40, effect size = 1.56), with an optimal cutoff of three and six tests. At least four failed MCI tests are needed to classify mild/moderate MCI and six or more failed tests classify severe cases [45].
3. PBA: The internal consistency (reliability) of the PBA was good, with a person separation index of 0.83. Construct validity evaluated by Rasch analysis resulted in 41 items. PBA convergent validity was supported by a significant correlation with other questionnaires [46].
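The two-class accuracy figures reported above for the MCI test battery are linked by standard definitions: the Youden index is sensitivity + specificity − 1, LR+ is sensitivity / (1 − specificity), and LR− is (1 − sensitivity) / specificity. The Python sketch below applies these definitions to the rounded sensitivity and specificity quoted in the text; the function names are illustrative. Likelihood ratios recomputed from rounded inputs need not match the published LR values, which were presumably derived from the study's unrounded data (an assumption, not something stated in the review).

```python
from typing import Tuple


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J: 0 means no discrimination, 1 means perfect discrimination."""
    return sensitivity + specificity - 1.0


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    if specificity in (0.0, 1.0):
        raise ValueError("likelihood ratios are undefined at specificity 0 or 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded two-class values quoted in the text for the MCI test battery.
sens, spec = 0.75, 0.82

j = youden_index(sens, spec)
lr_pos, lr_neg = likelihood_ratios(sens, spec)
print(f"Youden = {j:.2f}, LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

The recomputed Youden index (0.57) agrees with the reported value, which is a quick consistency check on the quoted sensitivity and specificity.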
Risk of bias assessment
Risk of bias assessment showed excellent inter-assessor agreement (92.2% and K = 0.84) [47]. Nine studies [30,31,32, 34, 39,40,41, 45] did not clarify evaluators’ characteristics (Item 2). Reference standard tests (Item 3) were reported only in two studies [38, 44] and were performed independently (Item 9). All inter-rater reliability studies used raters blinded to each other’s findings [26,27,28,29] (Item 4). All studies reported clear descriptions of measurement procedures (Item 10). Appropriate reliability and validity statistical methods (Item 13) were employed in 21 studies, while one study employed the Kolmogorov–Smirnov test, a less than ideal choice for confirming normality of distribution in a small sample (n < 50) [41]. Overall, all five reliability studies [26,27,28,29] and two OCS validity studies [38, 44] were rated as low risk of bias. The remaining validity studies were rated as high risk [30,31,32,33,34,35,36,37, 39,40,41,42,43, 45, 46] (Table 6).
Discussion
This systematic review identified and critically appraised studies reporting reliability and validity of functionally oriented NS-CLBP diagnostic classification systems; specifically, the OCS, MCI test battery, and PBA systems. Of the 3 systems evaluated through studies included in this review, the OCS is the most reliable and valid. All included reliability studies were consistently rated as high quality. However, validity in this context is limited to the capacity to systematically identify different muscle activation patterns and spinal kinematic changes from each other and controls (construct and discriminant validity) as demonstrated in two high-quality studies. The remaining reviewed studies that assessed some aspects of OCS validation had high risk of bias. Limited evidence supports acceptable inter- and intra-rater reliability for clinical use of the following MCI test battery tests: "sitting knee extension" (to identify flexion dysfunction), “one leg stance” (for rotational dysfunction) and “pelvic tilt” (for extension dysfunction). Evidence supporting validity of the PBA is inconclusive.
The OCS system
The OCS is currently the most studied, functionally based classification system with inter-rater reliability among various stages ranging from moderate to excellent [26]. For the OCS-MCI subcategory, three low risk of bias studies reported strong reliability (FP, AEP, PEP, flexion/lateral shift pattern and multidirectional pattern) [26, 28], with PEP as the most reliable subgroup, and AEP as the least reliable [26]. This review identified two low risk of bias studies demonstrating construct [38] and discriminant validity [44] of OCS-MCI subcategories based on determining and explaining aberrant muscle activity and spinal kinematic changes in participants with NS-CLBP. These studies generally adhered to guidelines for developing and validating classification systems [12, 48,49,50,51,52,53,54,55,56].
MCI test battery
Based on 2 low risk of bias reliability studies [27, 29], 3 individual tests included in the MCI test battery show good to excellent reliability to classify people with NS-CLBP with or without MCI. Two previous systematic reviews concluded similarly [57, 58]. Current evidence suggests that the “sitting knee extension” test to identify flexion dysfunction, the “one leg stance” test for rotational dysfunction and the “pelvic tilt” test for extension dysfunction are suitable for clinical use, based on good to excellent values for both intra- and inter-rater reliability [27, 29]. Because validity of the 10-test battery is based on a single study [45] with a high risk of bias, recommending routine clinical use is premature.
PBA
The PBA consistently recognizes and classifies pain behavior into three categories (none, low, and high). However, evidence is limited to a single study with a high risk of bias [46]. People with no or low levels of pain behavior are likely to benefit from a physically oriented rehabilitation program with little emphasis on psychological and behavioral approaches. Conversely, those with high pain behavior may benefit from programs that emphasize psychological and behavioral factors. This reasoning incorporates a biopsychosocial approach similar to stratified care informed by the 9-item STarT Back questionnaire [59]. Unlike the STarT back questionnaire, the PBA also includes observing movement tasks and behaviors.
Risk of bias assessment of individual studies
All reliability studies for OCS and MCI test battery were rated as high quality. The main risk of bias in reliability studies was the lack of randomizing test order.
Only two OCS validation studies [38, 44] were rated as high quality, largely because they employed expert opinion as a reference standard. However, no currently available objective tests classify function [54]. When no such tests are available, expert opinion, though limited, represents the best available reference standard [60,61,62]. The main risk of bias for the MCI test battery and PBA was the absence of a reference standard. The choice of statistical methods was considered appropriate for all studies, although one study [41] used the Kolmogorov–Smirnov test, which is not recommended for testing normality [63, 64].
Implications for clinical practice
The findings of this review suggest that clinicians can use OCS to reliably classify functional characteristics of patients with NS-CLBP. Upper lumbar and lower thoracic spine kinematic studies offer mechanistic evidence supporting the rationale for assessing MCI. However, evidence supporting the validity of the 10-item MCI test battery is inconclusive because it is available from only 1 high risk of bias study. Because the effectiveness of therapies informed by functional classification is generally unknown, it is unclear if such diagnosis can be used to inform effective care and/or serve as an objective measure of condition severity or response to care.
Implication for future research
Standardized assessment protocols for determining MCI require well-defined procedures, operational definitions and quantifiable values. Standardizing these will facilitate more clinically useful findings and the ability to pool data from clinical trials [29]. Included classification systems used sagittal plane MCI assessment. Future studies should consider frontal and transverse planes to more comprehensively assess complex movement strategies. Further studies with lower risk of bias are needed to confirm the clinical usefulness of PBA classification. Finally, RCTs validating the clinical effectiveness of treatments based on functional assessment are needed.
Review strengths and limitations
This review employed methodology consistent with PRISMA guidelines. However, as with all systematic reviews, articles may have been missed in the database searches. A meta-analysis was not feasible due to heterogeneity in the methodological design and the statistical analyses employed. Only articles in the English language were included. Another limitation is the inclusion of low-quality studies (small sample sizes and no reference standards).
Conclusions
Evidence from multiple studies with low risk of bias demonstrates OCS as a reliable classification method. Strong inter-rater reliability also exists for using 3 tests of the 10-item MCI test battery. Evidence for the reliability and validity of the PBA is limited to one study with high risk of bias. While clinicians are encouraged to categorize the functional capacity of patients with NS-CLBP using reliable methods, research evidence is not yet available to answer questions about the effectiveness of care informed by such classification.
References
Hartvigsen J, Hancock MJ, Kongsted A et al (2018) What low back pain is and why we need to pay attention. Lancet 391:2356–2367. https://doi.org/10.1016/S0140-6736(18)30480-X
Woolf AD, Pfleger B (2003) Burden of major musculoskeletal conditions. Bull World Health Organ 81:646–656
Buchbinder R, van Tulder M, Öberg B et al (2018) Low back pain: a call for action. Lancet 391:2384–2388. https://doi.org/10.1016/S0140-6736(18)30488-4
da C Menezes Costa L, Maher CG, Hancock MJ, et al (2012) The prognosis of acute and persistent low-back pain: a meta-analysis. CMAJ 184:E613–E624. https://doi.org/10.1503/cmaj.111271
Burton AK, McClune TD, Clarke RD, Main CJ (2004) Long-term follow-up of patients with low back pain attending for manipulative care: outcomes and predictors. Man Ther 9:30–35. https://doi.org/10.1016/s1356-689x(03)00052-3
Wáng YXJ, Wu A-M, Ruiz Santiago F, Nogueira-Barbosa MH (2018) Informed appropriate imaging for low back pain management: a narrative review. J Orthop Transl 15:21–34. https://doi.org/10.1016/j.jot.2018.07.009
Hancock MJ, Maher CG, Latimer J et al (2007) Systematic review of tests to identify the disc, SIJ or facet joint as the source of low back pain. Eur Spine J 16:1539–1550. https://doi.org/10.1007/s00586-007-0391-1
Maher C, Underwood M, Buchbinder R (2017) Non-specific low back pain. Lancet 389:736–747. https://doi.org/10.1016/S0140-6736(16)30970-9
Balagué F, Mannion AF, Pellisé F, Cedraschi C (2012) Non-specific low back pain. Lancet 379:482–491. https://doi.org/10.1016/S0140-6736(11)60610-7
Vining RD, Minkalis AL, Shannon ZK, Twist EJ (2019) Development of an evidence-based practical diagnostic checklist and corresponding clinical exam for low back pain. J Manipulative Physiol Ther 42:665–676. https://doi.org/10.1016/j.jmpt.2019.08.003
Patel S, Psychol C, Friede T et al (2012) Systematic review of randomized controlled trials of clinical prediction rules for physical therapy in low back pain. Spine. https://doi.org/10.1097/BRS.0b013e31827b158f
Amundsen PA, Evans DW, Rajendran D et al (2018) Inclusion and exclusion criteria used in non-specific low back pain trials: a review of randomised controlled trials published between 2006 and 2012. BMC Musculoskelet Disord 19:113. https://doi.org/10.1186/s12891-018-2034-6
Foster NE, Hill JC, Hay EM (2011) Subgrouping patients with low back pain in primary care: are we getting any better at it? Man Ther 16:3–8. https://doi.org/10.1016/j.math.2010.05.013
Petersen T, Laslett M, Thorsen H et al (2003) Diagnostic classification of non-specific low back pain. A new system integrating patho-anatomic and clinical categories. Physiother Theory Pract 19:213–237. https://doi.org/10.1080/09593980390246760
Vining R, Potocki E, Seidman M, Morgenthal P (2013) An evidence-based diagnostic classification system for low back pain. J Can Chiropr Assoc 57:189–204
Spitzer WO, LeBlanc FE, Dupuis M, Abenhaim L, Belanger AY, Bloch R, Bombardier C, Cruess RL, Drouin G, Duval-Hesler N, Laflamme J, Lamoureux G, Nachemson A, Page JJ, Rossignol M, Salmi LR, Salois-Arsenault S, Suissa SW-DS (1987) Scientific approach to the assessment and management of activity-related spinal disorders. A monograph for clinicians. Report of the Quebec Task Force on Spinal Disorders. Spine 12:S1-59
Alrwaily M, Timko M, Schneider M et al (2016) Treatment-based classification system for low back pain: revision and update. Phys Ther 96:1057–1066. https://doi.org/10.2522/ptj.20150345
Cosio D, Lin E (2018) Role of active versus passive complementary and integrative health approaches in pain management. Glob Adv Heal Med 7:216495611876849. https://doi.org/10.1177/2164956118768492
Alhowimel A, AlOtaibi M, Radford K, Coulson N (2018) Psychosocial factors associated with change in pain and disability outcomes in chronic low back pain patients treated by physiotherapist: a systematic review. SAGE Open Med. https://doi.org/10.1177/2050312118757387
Booth A, Clarke M, Dooley G et al (2012) The nuts and bolts of PROSPERO: an international prospective register of systematic reviews. Syst Rev 1:2. https://doi.org/10.1186/2046-4053-1-2
Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6:e1000097. https://doi.org/10.1371/journal.pmed.1000097
Brink Y, Louw QA (2012) Clinical instruments: reliability and validity critical appraisal. J Eval Clin Pract 18:1126–1132. https://doi.org/10.1111/j.1365-2753.2011.01707.x
May S, Littlewood C, Bishop A (2006) Reliability of procedures used in the physical examination of non-specific low back pain: a systematic review. Aust J Physiother 52:91–102. https://doi.org/10.1016/S0004-9514(06)70044-7
May S, Chance-Larsen K, Littlewood C et al (2010) Reliability of physical examination tests used in the assessment of patients with shoulder problems: a systematic review. Physiotherapy 96:179–190
Barrett E, McCreesh K, Lewis J (2014) Reliability and validity of non-radiographic methods of thoracic kyphosis measurement: a systematic review. Man Ther 19:10–17. https://doi.org/10.1016/j.math.2013.09.003
Vibe Fersum K, O’Sullivan PB, Kvale A, Skouen JS (2009) Inter-examiner reliability of a classification system for patients with non-specific low back pain. Man Ther 14:555–561. https://doi.org/10.1016/j.math.2008.08.003
Luomajoki H, Kool J (2007) Reliability of movement control tests in the lumbar spine. BMC Musculoskelet Disord 8:90. https://doi.org/10.1186/1471-2474-8-90
Dankaerts W, O’Sullivan PB, Straker LM et al (2006) The inter-examiner reliability of a classification method for non-specific chronic low back pain patients with motor control impairment. Man Ther 11:28–39. https://doi.org/10.1016/j.math.2005.02.001
Enoch F, Kjaer P, Elkjaer A et al (2011) Inter-examiner reproducibility of tests for lumbar motor control. BMC Musculoskelet Disord 12:114. https://doi.org/10.1186/1471-2474-12-114
O’Sullivan PB, Mitchell T, Bulich P et al (2006) The relationship beween posture and back muscle endurance in industrial workers with flexion-related low back pain. Man Ther 11:264–271. https://doi.org/10.1016/j.math.2005.04.004
O’Sullivan K, Verschueren S, Van Hoof W et al (2013) Lumbar repositioning error in sitting: healthy controls versus people with sitting-related non-specific chronic low back pain (flexion pattern). Man Ther 18:526–532. https://doi.org/10.1016/j.math.2013.05.005
O’Sullivan PB, Beales DJ, Beetham JA et al (2002) Altered motor control strategies in subjects with sacroiliac joint pain during the active straight-leg-raise test. Spine 27:E1-8. https://doi.org/10.1097/00007632-200201010-00015
O’Sullivan PB, Burnett A, Floyd AN et al (2003) Lumbar repositioning deficit in a specific low back pain population. Spine 28:1074–1079. https://doi.org/10.1097/01.BRS.0000061990.56113.6F
Hungerford B, Gilleard W, Hodges P (2003) Evidence of altered lumbopelvic muscle recruitment in the presence of sacroiliac joint pain. Spine 28:1593–1600. https://doi.org/10.1097/00007632-200307150-00022
Burnett A, Cornelius M, Dankaerts W, O’Sullivan P (2004) Spinal kinematics and trunk muscle activity in cyclists: a comparison between healthy controls and non-specific chronic low back pain subjects—a pilot investigation. Man Ther 9:211–219. https://doi.org/10.1016/j.math.2004.06.002
Dankaerts W, O’Sullivan P, Burnett A, Straker L (2006) Differences in sitting postures are associated with nonspecific chronic low back pain disorders when patients are subclassified. Spine 31:698–704. https://doi.org/10.1097/01.brs.0000202532.76925.d2
Dankaerts W, O’Sullivan P, Burnett A, Straker L (2006) Altered patterns of superficial trunk muscle activation during sitting in nonspecific chronic low back pain patients: importance of subclassification. Spine 31:2017–2023. https://doi.org/10.1097/01.brs.0000228728.11076.82
Dankaerts W, O’Sullivan P, Burnett A et al (2009) Discriminating healthy controls and two clinical subgroups of nonspecific chronic low back pain patients using trunk muscle activation and lumbosacral kinematics of postures and movements: a statistical classification model. Spine 34:1610–1618. https://doi.org/10.1097/BRS.0b013e3181aa6175
Beales DJ, Ther MM, O’Sullivan PB, Briffa NK (2009) Motor control patterns during an active straight leg raise in chronic pelvic girdle pain subjects. Spine 34:861–870. https://doi.org/10.1097/BRS.0b013e318198d212
Sheeran L, Sparkes V, Caterson B et al (2012) Spinal position sense and trunk muscle activity during sitting and standing in nonspecific chronic low back pain: classification analysis. Spine 37:E486–E495. https://doi.org/10.1097/BRS.0b013e31823b00ce
Van Hoof W, Volkaerts K, O’Sullivan K et al (2012) Comparing lower lumbar kinematics in cyclists with low back pain (flexion pattern) versus asymptomatic controls—field study using a wireless posture monitoring system. Man Ther 17:312–317. https://doi.org/10.1016/j.math.2012.02.012
Hemming R, Sheeran L, van Deursen R, Sparkes V (2019) Investigating differences in trunk muscle activity in non-specific chronic low back pain subgroups and no-low back pain controls during functional tasks: a case-control study. BMC Musculoskelet Disord 20:459. https://doi.org/10.1186/s12891-019-2843-2
Hemming R, Sheeran L, van Deursen R, Sparkes V (2017) Non-specific chronic low back pain: differences in spinal kinematics in subgroups during functional tasks. Eur Spine J. https://doi.org/10.1007/s00586-017-5217-1
Sheeran L, Sparkes V, Whatling G et al (2019) Identifying non-specific low back pain clinical subgroups from sitting and standing repositioning posture tasks using a novel Cardiff Dempster–Shafer theory classifier. Clin Biomech. https://doi.org/10.1016/j.clinbiomech.2019.10.004
Biele C, Moller D, von Piekartz H et al (2019) Validity of increasing the number of motor control tests within a test battery for discrimination of low back pain conditions in people attending a physiotherapy clinic: a case–control study. BMJ Open 9:e032340. https://doi.org/10.1136/bmjopen-2019-032340
Meyer K, Klipstein A, Oesch P et al (2016) Development and validation of a pain behavior assessment in patients with chronic low back pain. J Occup Rehabil 26:103–113. https://doi.org/10.1007/s10926-015-9593-2
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174
Ford J (2003) A systematic review on methodology of classification system research for low back pain. In: Musculoskeletal physiotherapy Australia 13th biennial conference, Sydney, Australia, 2003
Anderson JA (1977) Problems of classification of low-back pain. Rheumatol Rehabil 16:34–36. https://doi.org/10.1093/rheumatology/16.1.34
Deyo RA, Haselkorn J, Hoffman R, Kent DL (1994) Designing studies of diagnostic tests for low back pain or radiculopathy. Spine 19:2057S-2065S. https://doi.org/10.1097/00007632-199409151-00007
Fairbank JCT, Pynsent PB (1992) Syndromes of back pain and their classification. In: The Lumbar spine and back pain. Edinburgh: Churchill Livingstone
Petersen T, Thorsen H, Manniche C, Ekdahl C (1999) Classification of non-specific low back pain: a review of the literature on classifications systems relevant to physiotherapy. Phys Ther Rev 4:265–281. https://doi.org/10.1179/108331999786821690
Ford J, Story I, O’Sullivan P, McMeeken J (2007) Classification systems for low back pain: a review of the methodology for development and validation. Phys Ther Rev 12:33–42
Woolf CJ, Bennett GJ, Doherty M et al (1998) Towards a mechanism-based classification of pain. Pain 77:227–229
McCarthy CJ, Arnall FA, Strimpakos N et al (2004) The biopsychosocial classification of non-specific low back pain: a systematic review. Phys Ther Rev 9:17–30. https://doi.org/10.1179/108331904225003955
Fairbank J, Gwilym S, France J, Daffner S (2011) The role of classification of chronic low back pain. Spine 1:36. https://doi.org/10.1097/BRS.0b013e31822ef72c
Salvioli S, Pozzi A, Testa M (2019) Movement control impairment and low back pain: state of the art of diagnostic framing. Medicina (Kaunas). https://doi.org/10.3390/medicina55090548
Carlsson H, Rasmussen-Barr E (2013) Clinical screening tests for assessing movement control in non-specific low-back pain. A systematic review of intra-and inter-observer reliability studies. Man Ther 18:103–110. https://doi.org/10.1016/j.math.2012.08.004
Murphy SE, Blake C, Power CK, Fullen BM (2016) Comparison of a stratified group intervention (STarT back) with usual group care in patients with low back pain: a nonrandomized controlled trial. Spine 41:645–652. https://doi.org/10.1097/BRS.0000000000001305
Mjøsund HL, Boyle E, Kjaer P et al (2017) Clinically acceptable agreement between the ViMove wireless motion sensor system and the Vicon motion capture system when measuring lumbar region inclination motion in the sagittal and coronal planes. BMC Musculoskelet Disord 18:124. https://doi.org/10.1186/s12891-017-1489-1
Gracovetsky S, Newman N, Pawlowsky M et al (1995) A database for estimating normal spinal motion derived from noninvasive measurements. Spine 20:1036–1046. https://doi.org/10.1097/00007632-199505000-00010
Mannion AF, Knecht K, Balaban G et al (2004) A new skin-surface device for measuring the curvature and global and segmental ranges of motion of the spine: reliability of measurements and comparison with data reviewed from the literature. Eur Spine J 13:122–136. https://doi.org/10.1007/s00586-003-0618-8
Öztuna D, Elhan AH, Tüccar E (2006) Investigation of four different normality tests in terms of type 1 error rate and power under different distributions. Turkish J Med Sci 36:171–176
Thode HC (2002) Testing for normality. Statistics: textbooks and monographs, vol 164. CRC Press, New York, NY
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Appendices
Appendix 1
Search strategies of the searched databases and journals
Database/journal | Last citation no | Keywords |
---|---|---|
PubMed | 364 | (((((Non specific OR non-specific OR nonspecific OR mechanical))) AND ((low back pain OR simple backache OR lumbar strain OR spinal degeneration))) AND ((clinical test OR clinical examination OR clinical sign))) AND ((valid* OR reliabl*)) simple search |
EMBASE | 738 | (clinical) AND (test* OR exam* OR sign*) AND (non-specific OR nonspecific OR 'non specific' OR mechanical OR simple) AND (low back pain OR back pain OR LBP) AND (reliab* OR valid*); limited to English, human studies, and EMBASE-only records |
Cochrane | 92 | (Non specific or non-specific or nonspecific or mechanical) and (low back pain or simple backache or lumbar strain or spinal degeneration) and (clinical test or clinical examination or clinical sign) and (valid* or reliabl*) in search manager choose in Trials, Methods Studies, Technology Assessments and Economic Evaluations (Word variations have been searched) |
PEDro | 226 | Non specific low back pain (abstract and title) in advanced search (method clinical trials) |
CINAHL | 286 | (Non specific OR non-specific OR nonspecific OR mechanical) AND (low back pain OR simple backache OR lumbar strain OR spinal degeneration) AND (clinical test OR clinical examination OR clinical sign) AND (valid* OR reliabl*) in advanced search |
ProQuest | 358 | (("non specific" OR "non-specific" OR "nonspecific" OR "mechanical back pain") AND ("back pain" OR "lumbar strain" OR "simple backache") AND ("clinical test" OR "clinical examination" OR "clinical sign") AND ("valid*" OR "reliab*")) AND la.exact("ENG") |
Physical therapy journal | 460 | "non specific" "non-specific" "nonspecific" "mechanical back pain" "back pain" "lumbar strain" "simple backache" "clinical test" "clinical examination" "clinical sign" "valid*" "reliab*" |
Chiroindex | 79 | "non specific" "non-specific" "nonspecific" "mechanical back pain" "back pain" "lumbar strain" "simple backache" "clinical test" "clinical examination" "clinical sign" "valid*" "reliab*" |
Australian journal of physiotherapy | 54 | Non specific in Title/Abs/Keywords OR nonspecific in Title/Abs/Keywords OR non-specific in Title/Abs/Keywords AND Low Back Pain in Title/Abs/Keywords OR Mechanical low back pain in Title/Abs/Keywords OR simple backache in Title/Abs/Keywords |
Canadian physiotherapy | 113 | Non specific OR non-specific OR nonspecific OR mechanical AND low back pain AND clinical tests OR clinical examination OR clinical sign AND valid* OR reliabl* in advanced search |
Physiotherapy theory and practice journal | 113 | Non specific OR non-specific OR nonspecific OR mechanical AND low back pain AND clinical test OR clinical examination OR clinical sign AND valid* OR reliabl* |
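
Several of the strategies above (for example the Cochrane and CINAHL rows) share the same structure: four concept blocks, each a set of synonyms joined with OR, combined with AND. The following is a minimal sketch of how such a string can be assembled, not the procedure the authors used. The helper names `or_group` and `build_query` are hypothetical, and the code only builds text; it does not query any database.

```python
def or_group(terms):
    """Join the synonyms of one concept with OR inside a single pair of parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concept_groups):
    """Combine one OR-group per concept with AND."""
    return " AND ".join(or_group(group) for group in concept_groups)


# Concept blocks copied from the CINAHL row of the table above.
concepts = [
    ["Non specific", "non-specific", "nonspecific", "mechanical"],
    ["low back pain", "simple backache", "lumbar strain", "spinal degeneration"],
    ["clinical test", "clinical examination", "clinical sign"],
    ["valid*", "reliabl*"],
]

query = build_query(concepts)
print(query)
```

Wrapping each concept in its own parentheses keeps the grouping explicit; where a strategy lists AND and OR without parentheses, a search engine may group the operators differently than intended.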
Appendix 2
Systematic review critical appraisal tool (Reproduced from Brink and Louw (2012))
Item 1: If human subjects were used, did the authors give a detailed description of the sample of subjects used to perform the (index) test on?
Why the criterion should be evaluated: The validity and reliability of a test will be affected by the sample characteristics or composition, and therefore, the study has to report on the sample characteristics because the validity and reliability scores will then only be applicable to that particular population. A study does not contribute to validity and reliability testing if the subjects were not recruited appropriately
This item can be scored yes if:
1. the sample characteristics (e.g., height, weight, age, diagnosis and symptom status) were described or the manner of recruiting subjects was stated or if selection criteria were applied
If none of the above have been described or if insufficient information was provided, select “no.” If non-human or inanimate objects were used, select N/A
Item 2: Did the authors clarify the qualification, or competence of the rater(s) who performed the (index) test?
Why the criterion should be evaluated: The amount of experience of the rater(s) performing the (index) test will influence the validity and reliability scores and needs to be explained
This item can be scored yes if:
1. the rater(s) characteristics (e.g., qualification, specialization and amount of experience using the instrument under investigation) have been described
If the above have not been described or insufficient information was provided, select “no”
Item 3: Was the reference standard explained?
Why the criterion should be evaluated: The index test scores need to be compared to the scores obtained from the reference standard in order to test validity, and therefore, the reference standard needs to be explained appropriately
This item can be scored yes if:
1. the reference standard is likely to produce correct measurements;
2. the reference standard is the best method available; and
3. details (name of the instrument, references to the accuracy of the instrument) of the reference standard are reported
If none of the above is applicable to the reference standard’s description, then select “no”
Item 4: If inter-rater reliability was tested, were raters blinded to the findings of other raters?
Why the criterion should be evaluated: When raters have access to the findings of other raters, it compromises the quality of the reliability testing procedure by inflating the agreement among the raters, and therefore, blinding needs to be performed
This item can be scored yes if:
1. it is stated that the raters were blinded to each other’s findings or if a description that implies that the raters were blinded was reported
If no information is provided, then select “no.” If intra-rater reliability was examined, then select “N/A”
Item 5: If intra-rater reliability was tested, were raters blinded to their own prior findings of the test under evaluation?
Why the criterion should be evaluated: If raters have knowledge of their own prior findings, it will influence the findings of their repeated measurements and could inflate the rater agreement, and therefore, appropriate measures, depending on the characteristics or the study design of the research study, need to be applied to ensure blinding
This item can be scored yes if:
1. the rater(s) examined the same subjects on more than one occasion and it was stated that the rater(s) was/were blinded to the subjects they had examined previously
If insufficient information is provided, then select “no.” If inter-rater reliability was examined, then select “N/A”
Item 6: Was the order of examination varied?
Why the criterion should be evaluated: If the order in which the raters examine the subjects is varied when inter-rater reliability is tested, the risk of systematic bias is reduced. If the order in which subjects are examined by one rater is varied when intra-rater reliability is tested, it reduces the risk of the rater recalling the previous test scores and reduces bias
This item can be scored yes if:
1. the order in which subjects were tested varied between raters if inter-rater reliability was tested;
2. the order of subjects was varied when intra-rater reliability was tested
If insufficient information is provided, then select “no.” If varied order of examination is unnecessary or impractical (e.g., rater(s) digitizing or reading X-rays) then select “N/A”
Item 7: If human subjects were used, was the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between the two tests?
Why the criterion should be evaluated: The index test and the reference standard should be performed at the same time; however, this is not always possible. It becomes important to know whether it is possible that the test variable did not change between the two tests, otherwise it will affect the index test’s validity performance
This item can be scored yes if:
1. results from the index test and the reference standard were collected on the same subjects at the same time;
2. where a delay between measurements occurred, the target condition was unlikely to have changed between measurements
If the time period between performing the index test and the reference standard was sufficiently long that the target condition may have changed between the two tests or if insufficient information is provided, then select “no.” If non-human or inanimate objects were used, then select N/A
Item 8: Was the stability (or theoretical stability) of the variable being measured considered when determining the suitability of the time interval between repeated measures?
Why the criterion should be evaluated: For reliability, the test variable should not change between repeated measures, otherwise it will decrease the amount of agreement obtained between and within the rater(s)
This item can be scored yes if:
1. the stability of the variable is known or reported, and reviewers then decide on an appropriate time interval between repeated measures (stability of a test variable can only be determined if there is a reference standard);
2. where there is no reference standard, the reviewers agree upon the theoretical stability of the variable and decide on an appropriate time interval between repeated measures
If insufficient information is provided, then select “no”
Item 9: Was the reference standard independent of the index test?
Why the criterion should be evaluated: If the reference standard and the index test are not independently performed, then the index test cannot replace the reference standard on its own
This item can be scored yes if:
1. it is clear from the study that the index test did not form part of the reference standard
If it appears that the index test formed part of the reference standard, then select “no”
Item 10: Was the execution of the (index) test described in enough detail to permit replication of the test?
Why the criterion should be evaluated: Variations in the execution of the reference standard and the (index) test might affect the agreement between the two tests, and it is also important to be able to replicate the same study procedure in another setting when needed
This item can be scored yes if:
1. the study reported a clear description of the measurement procedure (e.g., the positioning of the instrument or rater and execution sequence of events);
2. citations of methodology were supplied
The extent to which detail is expected to be reported depends on the ability of different procedures to influence the results and on the type of instrument or test under evaluation
If insufficient information is provided, then select “no”
Item 11: Was the execution of the reference standard described in enough detail to permit its replication?
Why the criterion should be evaluated: For the same reason as item 10
This item can be scored yes if:
1. the study reported a clear description of the measurement procedure (e.g., the positioning of the instrument or rater and execution sequence of events);
2. citations were supplied
If insufficient information is provided, then select “no”
Item 12: Were withdrawals from the study explained?
Why the criterion should be evaluated: The sample composition will influence the validity and reliability performance of the (index) test; therefore, it is important to know whether any withdrawals from the sample might have changed the composition of the sample
This item can be scored yes if:
1. it is clear what happened to all subjects who entered the study;
2. subjects who entered but did not complete the study are considered
If it appears that subjects who entered but did not complete the study were not accounted for or if insufficient information is provided, then select “no.” If non-human or inanimate objects were used, then select N/A
Item 13: Were the statistical methods appropriate for the purpose of the study?
Why the criterion should be evaluated: The aim of validity and reliability studies is to report on an estimate of validity and reliability for the particular test, and appropriate statistical methods need to be implemented in order to produce this estimate
This item can be scored yes if:
1. the analysis is appropriate in terms of the type of data (e.g., categorical, continuous and dichotomous);
2. statistical analysis for validity studies incorporates, for example, means, differences between measurements, 95% confidence interval and ANOVA; and
3. statistical analysis for reliability studies incorporates, for example, intraclass correlation coefficient and 95% confidence interval
If the analysis is not appropriate or if insufficient information was provided, then select “no”
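
Item 13 asks whether the analysis suits the type of data. For categorical ratings, such as the subgroup label each rater assigns to a patient, one commonly used agreement statistic is Cohen's kappa, which the item does not name but which is interpreted with the Landis and Koch (1977) bands cited in the reference list. The sketch below is illustrative only: the function names `cohens_kappa` and `landis_koch_label`, the three category labels, and the ten ratings are hypothetical and are not data from any included study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per subject."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    # Observed agreement: proportion of subjects given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Descriptive band for a kappa value, following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings of ten subjects by two raters (illustrative only).
rater_a = ["flexion"] * 4 + ["extension"] * 3 + ["other"] * 3
rater_b = ["flexion", "flexion", "flexion", "extension", "extension",
           "extension", "extension", "other", "other", "flexion"]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2), landis_koch_label(kappa))
```

In this made-up example the raters agree on 8 of 10 subjects, while chance alone would give an expected agreement of 0.34, so kappa is (0.80 − 0.34) / (1 − 0.34), roughly 0.70.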
Appendix 3
Classification processes of OCS
Cite this article
Abdelnaeem, A.O., Rehan Youssef, A., Mahmoud, N.F. et al. Psychometric properties of chronic low back pain diagnostic classification systems: a systematic review. Eur Spine J 30, 957–989 (2021). https://doi.org/10.1007/s00586-020-06712-0
DOI: https://doi.org/10.1007/s00586-020-06712-0