Introduction

Adult idiopathic inflammatory myopathies (IIM) are a group of rare, systemic autoimmune diseases that include dermatomyositis (DM), polymyositis (PM), immune-mediated necrotizing myopathies, anti-synthetase syndrome and inclusion body myositis (IBM). While these conditions can affect other organ systems, including the skin, lungs and joints, they are typically characterized by muscle weakness [1]. The muscle weakness has a negative impact on physical activity and function, and subsequently on activities of daily living in myositis patients [2]. Therefore, it is critical to have patient-centric, reliable, valid and responsive outcome measures to accurately assess these domains when evaluating patients with myositis in both routine clinic visits and research trials.

There are six myositis core set measures (CSMs) proposed by the International Myositis Assessment and Clinical Studies (IMACS) Group for use in clinical trials and practice [3]. These core set measures include serum creatine kinase (CK) level, patient- and physician-reported global disease activity, extra-muscular disease activity, manual muscle testing (MMT), and the HAQ Disability Index (HAQ-DI). They are also used to calculate the total improvement score using the 2016 ACR/EULAR criteria for clinical response in myositis, which have been increasingly used in myositis clinical trials [4]. The majority of the core set measures capture disease activity, while HAQ-DI and MMT are used to evaluate physical function and muscle strength, respectively. To complement these core set measures, the Outcome Measures in Rheumatology (OMERACT) Myositis Working Group identified five core domains that best reflect the life impact of myositis on adults living with the disease [5•]. These core domains include fatigue, pain, level of physical activity, physical function and muscle symptoms.

In this review, we describe the currently available measures of muscle strength, endurance, physical function and physical activity in adult inflammatory myopathies with regard to their clinical applications and psychometric properties (Table 1). IBM and juvenile myositis are beyond the scope of this article [6].

Table 1 Summary of psychometric properties of outcome measures used to assess muscle strength, endurance, physical activity, and physical function in adult idiopathic inflammatory myositis

Muscle Strength and Endurance Assessment

Muscle strength and endurance are domains of health-related physical fitness [7]. Muscle strength is defined as the voluntary ability of muscle to exert maximal force, while endurance is the ability of muscle to continue to exert submaximal force without fatigue [8]. Therefore, muscle strength is usually assessed by measuring the maximal force against resistance, whereas muscle endurance is measured by the number of repetitions that can be performed by a specific muscle group. Several outcome measures discussed below under physical function and physical activity assessment also test muscle strength and endurance.

Patients with IIM experience muscle weakness and reduced muscle endurance due to degeneration of muscle fibres as a result of inflammation, potential effects of cytokines on contractility and regeneration of muscle fibres, muscle atrophy and scarring [9]. A lower proportion of slow-twitch type I muscle fibres and capillary loss are also thought to contribute to decreased muscle endurance, which requires oxygen supply and functional type I fibres [9, 10•]. The most commonly affected muscle groups in IIM are hip flexors, extensors and abductors, followed by neck flexors, shoulder abductors and knee extensors [11]. MMT and hand-held dynamometry are used to assess muscle strength in myositis. An adaptation of the Functional Index (FI), the FI-3, and the one-kilogram arm-lift test predominantly assess muscle endurance.

MMT

Manual muscle testing (MMT) has been used to grade muscle strength since 1916 [12]. The assessment is undertaken by palpation if there is no movement, by observing the range of motion if there is movement only in the horizontal plane, and by asking the patient to exert maximal force against the break-force applied by the examiner if there is movement against gravity. The Medical Research Council (MRC) grade and the Kendall scale are the most commonly used approaches for scoring [6]. The MRC grading ranges from 0 to 5 (modified to a 10-point scale by using + and −), whereas the Kendall score is on a 10-point scale. Various MMT scores have been used depending on the number and type of muscle groups included. MMT is feasible, widely recognized, inexpensive and easily performed, and requires no equipment.

MMT is one of the IIM CSMs and has been the method of choice for assessing muscle strength in the majority of IIM clinical trials. However, the MMT used in these studies differs with regard to the number and type of muscle groups included and the use of the 5-point MRC vs the Kendall scale. In a landmark study by Rider et al., the psychometric properties of several MMT scores using the 10-point Kendall scale were tested in a large group of patients with adult and juvenile myositis [13]. These included total MMT with 13 bilateral muscle groups (neck flexors and extensors, trapezius, deltoid, biceps, iliopsoas, gluteus maximus and medius, quadriceps, wrist flexors and extensors, ankle dorsi- and plantar flexors; maximum score of 240), proximal MMT with 7 bilateral muscle groups, 144 different subsets of MMT6 (maximum score of 60; unilateral), and 96 subsets of MMT8 (maximum score of 80; unilateral), with each subset including different muscle groups. This study showed adequate internal reliability and construct validity of total MMT, proximal MMT and several subsets of MMT6 and MMT8 for use in adult and juvenile myositis. The MMT6 and MMT8 subsets not only performed as well as the total and proximal MMT scores, but could also be more responsive to change over time and were more time efficient. At the end of the nominal group ranking exercise, the MMT8 subset that included neck flexors, deltoid, biceps, gluteus maximus, gluteus medius, quadriceps, wrist extensors and ankle dorsiflexors was ranked as the best, considering the affected muscle groups in myositis, contribution to functional limitations of patients, ease of testing and psychometric properties.
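
The arithmetic behind the MMT8 total can be illustrated with a minimal sketch. It assumes one 0–10 Kendall grade per muscle group, tested unilaterally, and simply sums the eight grades (maximum 80). The function and variable names are hypothetical and are not taken from the cited study.

```python
# Illustrative sketch only: names are hypothetical, not from Rider et al.
# Each muscle group is graded 0-10 on the Kendall scale (unilateral testing).
MMT8_GROUPS = (
    "neck flexors", "deltoid", "biceps", "gluteus maximus",
    "gluteus medius", "quadriceps", "wrist extensors", "ankle dorsiflexors",
)


def mmt8_total(grades: dict[str, int]) -> int:
    """Sum the eight unilateral Kendall grades (range 0-80)."""
    missing = [g for g in MMT8_GROUPS if g not in grades]
    if missing:
        raise ValueError(f"missing muscle groups: {missing}")
    for group in MMT8_GROUPS:
        if not 0 <= grades[group] <= 10:
            raise ValueError(f"grade for {group!r} must be between 0 and 10")
    return sum(grades[g] for g in MMT8_GROUPS)


# Example: full strength everywhere except deltoid (7) and quadriceps (8).
example = {g: 10 for g in MMT8_GROUPS}
example["deltoid"] = 7
example["quadriceps"] = 8
print(mmt8_total(example))  # 75
```

A patient with normal strength in all eight groups scores 80, which is where the ceiling effect discussed below becomes relevant.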

Despite the advantages of MMT, several concerns remain regarding the subjectivity of the assessment, which depends on the tester’s resistance and experience [14, 15]. It has been suggested that rheumatologists, for example, typically obtain higher scores than experienced physical therapists [6]. MMT has been shown to have a ceiling effect, with difficulty capturing mild muscle weakness [16•, 17•]. Myositis patients with normal MMT scores were shown to have significant variability in hand-held dynamometry scores, HAQ-DI, functional measures and patient-reported disease activity scores [16•]. The limited sensitivity of MMT to detect mild muscle weakness poses important challenges to clinicians. Additionally, scoring errors can arise from patient positioning and the commands used. For example, adequate stabilization and applying resistance at the correct point are key to capturing the maximal force generated by the tested muscle group [18].

HHD

As an objective and quantitative tool, hand-held dynamometry (HHD) has been used in the assessment of muscle strength in myositis since the 1980s [19,20,21]. The hand-held dynamometer is a portable, compact, often battery-operated device (Fig. 1) that has become more affordable over time for use in clinics. There are two ways to use HHD for muscle strength measurement, known as the “make” and “break” tests. During the “make” test, the examiner holds the device stable as the patient exerts maximum force against the device, whereas during the “break” test, the examiner exerts force to overcome the patient’s maximum force [18]. The “break” test was shown to yield scores 1.3–1.5 times higher than the “make” test in patients with IIM [19].

Fig. 1 Hand-held dynamometer used in clinical practice for the assessment of muscle function in IIM: (a) grip strength component attached and (b) concave component attached for the measurement of other muscle groups. Model pictured: Baseline® Hydraulic Hand Dynamometer (Fabrication Enterprises, Inc., Elmsford, NY)

There is currently no standardization on how to obtain the final HHD score in IIM. For example, the number of attempts used to test a muscle group varies from a single attempt to multiple attempts. In a study with 50 myositis patients, each muscle group was tested three times, and there was no significant difference in HHD score among the three attempts [16•]. In another study, both inter-rater and intra-rater reliability were higher when the average of two attempts was used instead of the maximum of the two attempts [17•]. These results suggest that a single attempt could be an acceptable and more efficient alternative to three attempts and that, if more than one attempt is obtained, averaging could provide more reliable results than taking the maximum. The total HHD score was usually obtained as the sum of all the scores divided by the number of muscle groups tested [16•, 17•].
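
The scoring approach described above can be sketched in a few lines. This is a minimal illustration, assuming that repeated attempts for a muscle group are averaged and that the total score is the mean across the muscle groups tested. The function names, the muscle groups and the force values in the example are hypothetical, and no unit is implied by the source.

```python
from statistics import mean


def muscle_group_score(attempts: list[float]) -> float:
    """Average the attempts recorded for one muscle group."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    return mean(attempts)


def total_hhd_score(attempts_by_group: dict[str, list[float]]) -> float:
    """Sum of the per-group scores divided by the number of groups tested."""
    if not attempts_by_group:
        raise ValueError("at least one muscle group is required")
    per_group = [muscle_group_score(a) for a in attempts_by_group.values()]
    return sum(per_group) / len(per_group)


# Hypothetical readings (arbitrary force units), two attempts per group.
readings = {
    "shoulder abduction": [10.0, 12.0],  # averaged to 11.0
    "hip flexion": [20.0, 22.0],         # averaged to 21.0
}
print(total_hhd_score(readings))  # 16.0
```

A single attempt per muscle group is handled by the same code, since the mean of one value is the value itself.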

Several studies showed good intra-rater reliability of HHD in myositis patients [16•, 17•, 20]. In regard to inter-rater reliability, one study showed strong inter-rater reliability in all the 13 muscle groups tested in stable myositis patients [20]. Another study demonstrated variation in inter-rater reliability among different muscle groups with excellent reliability for shoulder abduction, elbow flexion and knee extension, and fair-to-good for ankle extension, hip abduction and extension, wrist extension and neck flexion [17•].

In a study by Laing et al., myositis patients were divided into groups based on MMT scores, and each patient underwent strength testing with both fixed dynamometry and HHD [19]. The results of fixed dynamometry and HHD (with both make and break tests) were similar in weak patients, whereas there was a significant difference between fixed dynamometry and HHD in strong patients (MRC 4+ and 5−). This result suggests that holding the device steady or overcoming the patient’s force could be influenced by the examiner’s strength, which may explain the difference in strong patients. Therefore, more studies are required to further assess the inter-rater reliability of HHD among testers of different strengths.

Regarding construct validity, HHD showed moderate correlations with MMT, physician global disease activity, HAQ-DI, muscle disease activity, sit-to-stand and 6-min-walk tests (0.40–0.59), and poor correlations with patient global disease activity, SF-36 physical function and timed up-and-go tests (0.30–0.37) [16•]. HHD was also found to be responsive to change, as shown by its correlations with the total improvement score using the 2016 ACR/EULAR myositis response criteria and with changes in MMT, physician global disease activity and HAQ-DI [16•].

Currently, there is no consensus on which muscle groups to test with HHD in IIM. One study included bilateral shoulder abduction and hip flexion muscle groups; another included shoulder abduction, knee extension and grip strength; and others included the same muscle groups as in MMT8 [16•, 17•, 22•]. The addition of hand grip was justified by previous research showing that the grip strength of those living with IIM was significantly lower than in age-matched controls and correlated significantly with the ability to perform domestic activities [2].

In summary, HHD is a promising, quantitative tool with adequate validity, reliability, responsiveness to change and no ceiling effect in myositis patients. Standardized protocols that include the muscle groups that should be tested, testing position, type of device, the number of testing attempts and data on normal range in healthy controls are required for optimal use of HHD in clinical studies.

FI-3

The Functional Index (FI) was the first functional impairment outcome measure developed specifically for patients with PM and DM [23]. It is based on repetitive movements involving selected muscle groups to capture decreased muscle endurance, allowing more sensitive detection of weakness that takes fatigue into account. The original version involved 14 tasks and took around 1 h to complete. An updated version, the FI-2, comprising 7 tasks and suggested to take 21–33 min to complete, was subsequently developed and found to have good inter- and intra-rater reliability and construct validity [24]. Recently, the FI-2 was revised to the FI-3 to shorten the administration time [10•]. The FI-3 consists of 3 tasks: shoulder flexion, head lift (neck flexion) and hip flexion [10•]. Patients are given a maximum of 3 min for each task and asked to perform as many repetitions as possible in this time frame. Scoring of the FI-3 ranges from 0 to 60, with 60 indicating normal muscle endurance. Completion of the FI-3 takes 9–15 min in all assessment situations.

The FI-3 was shown to have good to excellent inter-rater and intra-rater reliability for all tasks [10•]. There were also moderate to strong correlations between FI-3 scores and the Myositis Activities Profile (MAP) and HAQ scores, indicating good construct validity [10•]. Further studies are required to assess the responsiveness to change and the minimal clinically important difference of the FI-3 to inform its use in clinical practice and trials.

The One-Kilogram Arm-Lift Test

The 1-kg arm-lift test was devised as a measure for use in IIM in 2006, justified by the finding that the shoulder girdle, neck and proximal upper limb were the most affected muscle groups in IIM [25]. The test involves repeated lifts of a 1-kg weight held in the subject’s hands while sitting; a comfortable pace or rhythm is recommended, rather than trying to achieve the maximum possible number of repetitions in the 30 s allotted for the test. The aims of this test are well covered by the FI-2 (released in the same year, 2006) and the FI-3, and hence the 1-kg arm-lift test is likely to be superseded by these measures. A Delphi review of 15 experts concluded that there were better options for assessing muscle strength and endurance than the 1-kg arm-lift test [26].

Physical Function Assessment

Physical function is defined as the ability to perform basic and instrumental activities of daily living. Therefore, the measurement of physical function gives important information not only on the impact of the disease on daily activities, but also on the physical activity (defined as any bodily movement that requires energy expenditure), muscle strength and endurance of the individual. Physical function tests include self-reported questionnaires and functional tests with specific tasks.

Self-reported physical function questionnaires used in IIM include HAQ-DI, SF-36 physical functioning scale, PROMIS physical function forms and MAP. Functional tests used in IIM include 6-min walk, sit-to-stand and timed up-and-go tests. Given that the goal of measurement of physical function is to assess the impact of IIM on daily activities, the most commonly impaired daily activities in IIM should be assessed with these available tools. In a study with 183 patients with DM and PM, 35–48% of the patients reported requiring aid from caregivers in errands and chores, followed by gripping and opening, reaching, dressing and walking [27].

HAQ-DI

The Health Assessment Questionnaire-Disability Index (HAQ-DI) is a patient-reported outcome measure of physical function and one of the six myositis CSMs proposed by the IMACS. HAQ-DI was developed by Bruce and Fries in the 1980s and has been widely used across several rheumatic diseases [28•]. HAQ-DI consists of 20 items assessing the difficulty in performing daily activities. Each item is scored from 0 (without any difficulty) to 3 (unable to do) with scores below 0.5 being considered as normal.
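
As a rough illustration of how a single index is derived from the 20 items, the sketch below follows the conventional HAQ-DI scoring, which is not spelled out in this section and is therefore stated here as an assumption: items are grouped into eight categories, the highest item score in each category is taken as the category score, and the category scores are averaged. The adjustment for aids and devices is deliberately omitted, and the function names and category labels are illustrative only.

```python
from statistics import mean


def haq_di(category_items: dict[str, list[int]]) -> float:
    """Mean of the highest item score (0-3) within each category.

    Simplified sketch: ignores the aids/devices adjustment and assumes
    every category has at least one answered item.
    """
    if not category_items:
        raise ValueError("no categories supplied")
    category_scores = []
    for category, items in category_items.items():
        if not items or any(not 0 <= score <= 3 for score in items):
            raise ValueError(f"invalid item scores for {category!r}")
        category_scores.append(max(items))
    return mean(category_scores)


def below_normal_cutoff(score: float) -> bool:
    """Scores below 0.5 are considered normal in the text above."""
    return score < 0.5


# Hypothetical responses: 20 items grouped into eight illustrative categories.
responses = {
    "dressing and grooming": [0, 1],
    "arising": [0, 0],
    "eating": [0, 0, 0],
    "walking": [1, 2],
    "hygiene": [0, 0, 0],
    "reach": [0, 1],
    "grip": [0, 0, 0],
    "activities": [0, 0, 0],
}
score = haq_di(responses)
print(score, below_normal_cutoff(score))  # 0.5 False
```

Because only the worst item in each category contributes, mild difficulty spread across many items can produce the same index as marked difficulty in a few, which is one reason item relevance for IIM matters.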

Although commonly used in myositis, HAQ-DI has not been thoroughly studied in patients with IIM. In a study by Saygin et al., HAQ-DI correlated strongly with muscle disease activity, MMT, PROMIS PF-20 and SF-36 PF10; moderately with physician- and patient-reported disease activity, HHD, fatigue, pain, sit-to-stand and 6-min-walk tests; and weakly with the timed up-and-go test, showing good construct validity of HAQ-DI in IIM [29]. Internal consistency was found to be excellent, with potential concern for redundancy. HAQ-DI was also found to have a ceiling effect, indicating less sensitivity for patients with high levels of functioning. Changes in HAQ-DI correlated significantly with total improvement score and changes in other core set measures, along with a large effect size in patients with moderate-major improvement per the 2016 ACR/EULAR myositis response criteria [29]. These results support the responsiveness to change of the HAQ-DI in IIM. Further studies on the content and face validity of the HAQ-DI in myositis should be conducted to assess the relevance of the questions for IIM.

SF-36 PF10

The 36-Item Short-Form Survey (SF-36) was developed in the 1980s as a patient-reported outcome measure of health-related quality of life [30]. The survey consists of 36 questions across 8 domains, with two summary measures known as physical health and mental health. The physical functioning scale (SF-36 PF10) is one of the eight domains of the SF-36 and falls under the physical health summary measure. SF-36 PF10 contains 10 questions, each asking about the degree of limitation in a daily activity. There are three response options for each question, as follows: “yes, limited a lot”, “yes, limited a little” and “no, not limited at all”.

SF-36 PF10 correlated with HAQ-DI, MMT, CT-based midthigh muscle density and accelerometer-measured physical activity levels in patients with IIM [31, 32, 33•, 34•]. SF-36 PF10 also correlated strongly with patient-reported global disease activity, fatigue VAS and PROMIS PF-20; moderately with muscle disease activity, physician-reported disease activity, MMT, pain VAS, sit-to-stand and 6-min-walk tests; and weakly with HHD and the timed up-and-go test, supporting good construct validity [29]. SF-36 PF10 did not show any ceiling or floor effect and had excellent internal consistency [29].

Changes in SF-36 PF10 correlated significantly with total improvement score and changes in other core set measures, along with a large effect size in patients with moderate-major improvement per the 2016 ACR/EULAR myositis response criteria, suggesting good responsiveness [29]. Further studies are required to assess the content validity and test–retest reliability of SF-36 PF10 in myositis.

PROMIS Physical Function

The Patient-Reported Outcomes Measurement Information System (PROMIS) offers domain-specific patient-reported outcome measures covering areas such as pain, fatigue, physical functioning, emotional distress and social role participation. The PROMIS physical function item bank includes 154 items, which are rigorously tested for clarity, translatability, specificity and bias. PROMIS physical function instruments were derived from the item bank using item response theory. Fixed short-form (SF) instruments include PROMIS PF-4, PF-6, PF-8, PF-10 and PF-20, named according to the number of questions. Each question asks about the present time and has 5 response options.

Content validity of the PROMIS physical function short form 8b (PF-8b) was investigated by the OMERACT myositis study group [5•]. PROMIS PF-8b was completed by 10 patients, who reported that the questions were clear and easy to read, understand and complete. Preliminary results of the OMERACT-led study suggest strong test–retest reliability of PROMIS PF-8b and moderate correlations with PROMIS depression 4a, MAP and the International Physical Activity Questionnaire (IPAQ), supporting good construct validity [35].

Another PROMIS physical function short form with 20 questions (PROMIS PF-20) was also shown to have strong test–retest reliability, and moderate to strong correlations with muscle disease activity, physician-reported and patient-reported disease activity, MMT, HHD, fatigue, pain, HAQ-DI, SF-36 PF10, and the 6-min-walk, timed up-and-go and sit-to-stand tests in patients with IIM, supporting good construct validity [29]. PROMIS PF-20 had excellent internal consistency, with potential concern for redundancy, and no ceiling or floor effect [29]. Responsiveness to change of PROMIS PF-20 was good, with moderate correlations with total improvement score and change in other core set measures, along with a large effect size in patients who had moderate-major improvement per the 2016 ACR/EULAR myositis response criteria.

Based on the available data, PROMIS PF forms are robust patient-reported outcome measures that can be used in routine clinic and myositis clinical trials. Computerized forms of PROMIS physical function which may offer higher precision and efficiency are yet to be studied in patients with adult IIM.

MAP

The Myositis Activities Profile (MAP) was developed in 2002 specifically for patients with polymyositis and dermatomyositis [36]. The initial version of the MAP (Swedish) included 31 items with 4 subscales and 4 single questions [36]. One additional question was added and one question was changed based on patient interviews for content validity in 2012 (US version) [37]. Each item is a daily activity that is scored on a 10-point VAS based on the difficulty and importance of the activity for the patient. MAP showed good test–retest reliability and internal consistency in both Swedish and US cohorts with IIM [36, 37]. MAP had good construct validity, with moderate correlations with HAQ-DI, physician disease activity score and the Myositis Intention to Treat Index, and weak correlations with MMT, FI-2, extra-muscular disease activity and CK levels [37]. MAP (32 questions, US version) was reported to take approximately 5 min to complete.

Given its established test–retest reliability and construct validity, specificity for myositis, and patient involvement in its development, MAP is a promising functional measure. Further studies are needed to investigate the responsiveness to change of MAP before it is used in clinical practice and trials. Additionally, with 32 questions, MAP is longer than the other physical function measures described above and may therefore be less efficient in the clinic setting.

Six-Minute-Walk Distance, Timed Up-and-Go, and Sit-to-Stand Tests

The six-minute-walk test is a standardized, self-paced test that measures the distance walked in 6 min (six-minute-walk distance, 6MWD) [6, 38]. The timed up-and-go (TUG) test measures the time required to stand up from a standard armchair, walk 3 m at a comfortable pace, walk back, and sit down in the chair again [6]. The sit-to-stand (STS) test measures the number of times a patient can stand up from and sit down in a chair in 30 s [6]. These three tests exclusively assess the physical function of the lower extremities, which are commonly affected in IIM, but may not capture the overall effect of disease on physical function. Application of the 6MWD could be challenging in the clinical setting given the time and space requirements.

There is preliminary evidence to suggest good construct validity of the 6MWD, TUG and STS; however, further studies are required to establish the psychometric properties of these tools in adult myositis [39].

Physical Activity Assessment

Physical activity is defined as any bodily movement generated by skeletal muscles that results in energy expenditure [40]. In addition to the direct effects of active muscle disease, other features of IIM, including pain, fatigue, reduced cardiorespiratory function and self-reported depression, have the potential to reduce physical activity and may therefore be reflected in patients’ physical activity levels [41, 42]. This suggests that the measurement of physical activity levels could be valuable for a complete understanding of the impact of disease on patients’ daily life. The existing IIM core set measures focus heavily on disease activity and do not currently include an assessment of physical activity and performance.

Assessment of physical activity can be done via self-report (i.e., physical activity questionnaires) and objective measurements (i.e., wearable motion sensors). Numerous self-reported physical activity questionnaires exist, with variable numbers of questions and different recall periods [43]. The International Physical Activity Questionnaire (IPAQ) is one of the most widely used physical activity questionnaires. IPAQ has short (7 items) and long (27 items) versions with questions on the time spent in 4 physical activity levels, including sitting, walking, moderate activity and vigorous activity, in the last 7 days (days/week and min/h/day) [44]. On the other hand, motion sensors such as accelerometers and pedometers are costly, but provide an objective, concurrent measurement of physical activity. Pedometers measure step count and can estimate the distance travelled, whereas accelerometers measure acceleration of the body in one or more planes as change in velocity over time and provide information about the duration, frequency and intensity of physical activity [45•]. Quantification of physical activity using accelerometers is commonly done with the following measures: average daily step counts, steps per min (average number of steps in a minute for a given day), peak 1-min cadence (highest step count in a minute for a given day), vector magnitude (sum of movement over lateral, longitudinal and vertical axes per minute) and time spent in different physical activity intensity categories, including sedentary, light, moderate and vigorous intensity (min per day) [34•, 45•, 46].
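
The step-based and vector magnitude measures listed above can be sketched from minute-level data. This is a minimal illustration under stated assumptions: each day is represented as a list of per-minute step counts, steps per min is averaged over all recorded minutes (wear-time filtering is ignored), and vector magnitude is computed as the Euclidean norm of the three axis counts for a minute, which is a common convention but is not specified in the text. All function names are hypothetical.

```python
import math
from statistics import mean


def steps_per_min(minute_steps: list[int]) -> float:
    """Average number of steps per recorded minute for one day."""
    if not minute_steps:
        raise ValueError("no minutes recorded")
    return sum(minute_steps) / len(minute_steps)


def peak_1min_cadence(minute_steps: list[int]) -> int:
    """Highest step count in any single minute of the day."""
    if not minute_steps:
        raise ValueError("no minutes recorded")
    return max(minute_steps)


def average_daily_steps(days: list[list[int]]) -> float:
    """Mean of the total step count across recorded days."""
    if not days:
        raise ValueError("no days recorded")
    return mean([sum(day) for day in days])


def vector_magnitude(lateral: float, longitudinal: float, vertical: float) -> float:
    """Assumed convention: Euclidean norm of the three axis counts."""
    return math.sqrt(lateral ** 2 + longitudinal ** 2 + vertical ** 2)


# Hypothetical four-minute "days", kept short for readability.
day_one = [0, 20, 100, 80]
day_two = [100, 100, 100, 100]
print(steps_per_min(day_one))                   # 50.0
print(peak_1min_cadence(day_one))               # 100
print(average_daily_steps([day_one, day_two]))  # 300
print(vector_magnitude(3, 4, 12))               # 13.0
```

In practice, device software applies wear-time and epoch rules before these summaries are computed, so values from different devices or processing pipelines are not necessarily interchangeable.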

Currently, only a few studies have examined the relationship between physical activity levels and clinical outcomes related to disease activity in adult IIM patients. A study comparing the physical activity levels recorded by an ActiGraph® accelerometer and IPAQ demonstrated that IPAQ tends to underestimate sedentary and light physical activity, with highly variable biases for moderate-vigorous physical activity levels, in patients with juvenile dermatomyositis (JDM) [47•]. These results raise concern about the validity of IPAQ, particularly for low levels of activity; however, similar criterion validity studies are lacking in adult IIM. Another study demonstrated significantly lower physical activity levels (total physical activity score, and time spent walking and in moderate physical activity) in patients with IIM compared to controls, suggesting good discriminant validity of IPAQ in adult IIM [33•]. Further studies are required to demonstrate the concurrent and construct validity and responsiveness of IPAQ as a self-reported physical activity measure in myositis.

Of the two studies examining the psychometric properties of physical activity monitors in adult IIM patients, the first used a wrist-worn, triaxial GENEActiv® accelerometer [34•]. The primary physical activity variable of this study was vector magnitude corrected for gravity [Euclidean norm minus 1 g (ENMO)], which was then used to calculate an ENMO z-score based on age- and gender-matched values from controls. The second study used a waist-worn, triaxial ActiGraph® accelerometer [46]. The primary physical activity variables of this study were steps per min, peak 1-min cadence, and vector magnitude. Test–retest reliability of these ActiGraph® variables over one month was strong. Physical activity variables from both studies had good construct validity, correlating with physical function (HAQ, SF-36 physical functioning, PROMIS PF-20 and task-oriented tests), muscle strength (MMT, hand-held dynamometry), disease activity, fatigue and pain. In both studies, the physical activity measures demonstrated adequate responsiveness, correlating with total improvement score and changes in physical function and muscle strength. In the second study, the patients also wore a Fitbit One® alongside the ActiGraph® for the same duration, and the psychometric properties of the Fitbit activity variables were examined and compared with the ActiGraph® variables [48]. The results suggested a high level of agreement between the Fitbit- and ActiGraph®-measured daily step counts; therefore, it was not surprising that the Fitbit had similarly encouraging validity and responsiveness results. The Fitbit tended to overestimate the peak 1-min cadence compared to the ActiGraph®, particularly at slower peak cadence [34, 46, 48].
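
The ENMO variable and its z-score can be illustrated with a minimal sketch. It assumes raw triaxial acceleration expressed in units of g, truncates negative values to zero (a common convention that is not stated in the text), and standardizes a patient's summary value against the mean and sample standard deviation of matched controls. The function names and all numerical values are hypothetical.

```python
import math
from statistics import mean, stdev


def enmo(x: float, y: float, z: float) -> float:
    """Euclidean norm of the three axes minus 1 g, truncated at zero.

    Inputs are accelerations in g; truncation at zero is an assumed convention.
    """
    return max(math.sqrt(x ** 2 + y ** 2 + z ** 2) - 1.0, 0.0)


def enmo_z_score(patient_value: float, control_values: list[float]) -> float:
    """Standardize a patient's ENMO summary against matched control values."""
    if len(control_values) < 2:
        raise ValueError("at least two control values are required")
    spread = stdev(control_values)
    if spread == 0:
        raise ValueError("control values have zero variance")
    return (patient_value - mean(control_values)) / spread


print(enmo(0.0, 0.0, 1.0))  # 0.0 (device at rest, gravity only)
print(enmo(3.0, 4.0, 0.0))  # 4.0

# Hypothetical daily average ENMO values in milli-g for matched controls.
controls = [30, 40, 50]
print(enmo_z_score(20, controls))  # -2.0
```

A negative z-score indicates less movement than the matched controls, which is the direction expected in patients with active disease.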

These results support the use of physical activity monitors in the longitudinal assessment of myositis patients in clinical practice and therapeutic trials.

Conclusions

The available outcome measures for use in clinical practice in IIM with regard to muscle strength and endurance, physical function and activity have expanded over the past 15 years. There are valid and reliable options for a number of domains and methods for assessing these factors. In a busy clinical setting, efficiency is important, but consideration should also be given to choosing tools that work together to give the fullest picture of the patient’s muscle function status. The advent of wearable technology and its ability to track activity has moved the conversation even closer to the patient. Serial data add depth when considering physical activity performance. Physical activity monitors could be a valuable tool not only to assess physical activity, but also to encourage those using them to achieve their maximum fitness levels through daily, weekly and monthly targets.