Introduction

Osteoporosis is a disease characterized by reduced bone mass and deterioration of bone micro-architecture. Patients are usually diagnosed using bone mineral density (BMD) scan measurements at the hip and spine [1]. These bone abnormalities result in enhanced bone fragility, and as a consequence, patients with osteoporosis have a much higher fracture risk. The likelihood of developing osteoporosis is highest in North America and Europe. However, incidence rates in developing countries are predicted to rise as population longevity continues to increase in these regions [2]. The global burden of osteoporosis is substantial with an estimated 9.0 million osteoporotic fractures occurring per year [3].

Current mainstays of treatment are bisphosphonates, calcium, and vitamin D. However, there is evidence that these therapies cannot address non-BMD-related determinants of fracture risk [4, 5]. Previous work has suggested that targeting the non-BMD determinants of fracture risk may be a more effective means of treatment in some patients, or at the least somewhat helpful to most [6]. Exercise programs designed to improve balance, strength, and coordination are simple yet cost-effective interventions that may lower a patient’s risk of falling and experiencing negative health outcomes [7]. In addition, certain types of exercise have been shown to improve BMD through the stimulation of bone remodeling [8]. This suggests that physical activity can partially address both the BMD and non-BMD determinants of fracture risk for osteoporotic patients. However, a randomized controlled trial (RCT) published in the CMAJ found that a major barrier for patients with osteoporosis engaging in physical activity was the fear of potential injury [9].

To address this, education regarding safe movement can be combined with exercise programs. This has been shown to be effective at reducing fall risk and therefore the risk of associated fractures [10]. As the benefits of these interventions greatly outweigh the costs, physicians should consider employing both physical activity and safe movement education as methods of risk reduction for osteoporotic patients. In accordance with this, many nationally and internationally published osteoporosis clinical practice guidelines (CPGs) now contain recommendations regarding physical activity and/or safe movement.

Healthcare professionals rely heavily on such CPGs [11], which aim to improve quality, consistency, and effectiveness of care by applying evidence-based medicine and providing healthcare practitioners with expert summaries of the most recent evidence [12]. However, evidence exists to suggest that in general, CPG quality may be low [13] and the rigor with which CPGs follow standard development methods is unsatisfactory [14–16]. Therefore, a common, widely accepted, and standardized method to evaluate CPGs is required. The benefit is twofold: readers will know in which CPGs to place the most trust, and CPG developers will be able to improve the quality of future publications.

To address this need, several CPG quality appraisal instruments have been developed. However, to date, there has been no evaluation of the methodological quality of osteoporosis CPGs. Our study used the Appraisal of Guidelines for Research and Evaluation version II (AGREE II) instrument [17] to evaluate the methodological quality of 19 osteoporosis CPGs as they pertain to physical activity and safe movement. This instrument, along with its previous versions, has been applied to many CPGs for different diseases [18–22], has been validated, and is considered a reliable and useful tool [23, 24]. In addition, differences in region, publication date, and quality of CPGs may result in some variability in recommendations made. Therefore, the purpose of this review is to provide an AGREE II quality assessment of currently available osteoporosis CPGs and to assess physical activity and safe movement recommendations, including a discussion of how they differ between CPGs.

Methods

Clinical practice guideline identification

CPGs were identified between June 15, 2015, and July 1, 2015, using PubMed, the Wiley Online Library, Scholars Portal, the National CPG Clearinghouse, and the International Osteoporosis Foundation’s National Guideline Database. A search strategy using the keywords “osteoporosis,” “national,” “guidelines,” “safe movement,” “physical activity,” and “exercise” was employed. The results of this search were further filtered to include only papers produced or commissioned by national/international professional associations or health ministries. This strategy identified 45 papers, of which 19 met the pre-defined inclusion criteria: (1) the complete CPG text was available in English or as an English translation (CPGs with only the abstract translated were excluded); (2) the CPG contained recommendations regarding safe movement and/or exercise for patients with primary osteoporosis; (3) the target audience was primary or secondary healthcare providers; and (4) the most recent version of the CPG was published no earlier than 2004. Reasons for CPG exclusions were recorded.

Quality assessment

This study employed the latest version of the AGREE II instrument [17] to evaluate each CPG meeting our inclusion criteria. According to AGREE II protocol, each CPG was scored on 23 items within 6 domains. Domain 1 (scope and purpose) is divided into 3 items: guideline objectives, health questions, and population application. Domain 2 (stakeholder involvement) is based on 3 items: guideline development group, preferences of target population, and target users. Domain 3 (rigor of development) includes 8 items: systematic methods used to search evidence, criteria for selection, strengths and limitations of the evidence, methods for formulating the recommendations, health benefits and side effects of recommendations, explicit links between recommendation and supporting evidence, expert reviewers, and updating guideline for future use. Domain 4 (clarity of presentation) comprises 3 items: recommendations are specific and unambiguous, different options for management, and key recommendations. Domain 5 (applicability) includes 4 items: facilitators and barriers, advice/tools to implement recommendations into practice, resource implications, and auditing criteria. Domain 6 (editorial independence) is based on 2 items: editorial independence from the funding body and conflicts of interest of the guideline development members.

CPGs were scored by three independent reviewers, each of whom was trained to use the AGREE II instrument and was provided with the AGREE II user manual. The user manual defines each item and assists the user in determining a CPG’s score for that item. Items were scored based on a scale ranging from 1 (absence of item) to 7 (item is reported with exceptional quality). Domain scores were calculated by summing item scores within each domain from each reviewer, then standardizing them as a percentage of the maximum possible score. Agreement between each reviewer’s scores was tested using a two-way ANOVA with single-rater two-way intra-class correlation coefficients (ICCs) for each domain across all guidelines as in a previous study [25]. This method was chosen based on recommendations from Shrout and Fleiss [26]. The degree of reviewer score agreement was defined using a previously used scale: ICC <0.20, poor; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, good; 0.81–1.00, very good [25]. To ensure uniformity, reviewers were instructed not to refer to supporting documents published separately unless they were explicitly referred to within the CPG.
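
The scoring and agreement calculations described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes the scaled domain score defined in the AGREE II user manual, (obtained − minimum possible) / (maximum possible − minimum possible), and the Shrout and Fleiss ICC(2,1) form (two-way random effects, single rater); the text does not state which two-way ICC variant was used. The function names and the handling of boundary values in the agreement scale are hypothetical.

```python
def scaled_domain_score(ratings):
    """Scaled AGREE II domain score on a 0-100 scale.

    ratings: one list of item scores (each 1-7) per reviewer, for one domain.
    """
    n_reviewers = len(ratings)
    n_items = len(ratings[0])
    obtained = sum(sum(reviewer) for reviewer in ratings)
    minimum = 1 * n_items * n_reviewers  # every item scored 1
    maximum = 7 * n_items * n_reviewers  # every item scored 7
    return 100.0 * (obtained - minimum) / (maximum - minimum)


def icc_2_1(scores):
    """ICC(2,1) from a two-way ANOVA (assumed variant).

    scores[i][j] is the score given to guideline i by reviewer j.
    """
    n = len(scores)      # number of guidelines (targets)
    k = len(scores[0])   # number of reviewers (raters)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                # between-guideline mean square
    jms = ss_cols / (k - 1)                # between-reviewer mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def agreement_label(icc):
    """Map an ICC to the agreement scale; boundary handling is an assumption."""
    if icc <= 0.20:
        return "poor"
    if icc <= 0.40:
        return "fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "very good"


# Hypothetical ratings: three reviewers, three items, every item scored 4.
hypothetical_domain = [[4, 4, 4], [4, 4, 4], [4, 4, 4]]
print(scaled_domain_score(hypothetical_domain))  # 50.0
```

Note that under the manual's formula a domain in which every item is scored 1 yields 0 %, not 14 %, which is why the minimum possible score is subtracted.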

AGREE II protocol states that no overall score is calculated to determine if a CPG is recommended or not recommended. Instead, CPGs in this study were ranked, as in a previous study, according to the number of domains scoring >60 % [26]. High-quality CPGs have 5 or 6 domains scoring >60 %, average-quality CPGs 3 or 4, and poor-quality CPGs have 2 or fewer of their domains scoring >60 % [27].
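
The ranking rule can be expressed compactly. The sketch below is illustrative only; the function name and the example scores are hypothetical, and the threshold is applied as strictly greater than 60 %, as stated above.

```python
def quality_tier(domain_scores, threshold=60.0):
    """Classify a CPG by how many of its six domain scores exceed the threshold."""
    n_above = sum(1 for score in domain_scores if score > threshold)
    if n_above >= 5:
        return "high"
    if n_above >= 3:
        return "average"
    return "poor"


# Hypothetical CPG with domain scores (in %) for domains 1 through 6.
example_scores = [90, 64, 63, 89, 56, 54]
print(quality_tier(example_scores))  # average (four domains exceed 60 %)
```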

Extraction and analysis of relevant recommendations

Two reviewers used a pre-existing standardized table to independently extract each CPG’s recommendations on safe movement and physical activity. The standardized table was used to record the occurrence of specific physical activity and/or safe movement recommendations. Types of physical activity and/or safe movement were recorded as either recommended (1), recommended against (−1), or not addressed (0) by each CPG. This way, the positive as well as the negative frequency of each recommendation could be tracked.
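
As an illustration of this coding scheme, the following sketch tallies, for each recommendation type, the proportion of CPGs recommending it and the proportion recommending against it. The CPG labels, recommendation names, and codes are hypothetical examples, not data from this study.

```python
def recommendation_frequencies(coded):
    """Proportion of CPGs recommending for and against each recommendation type.

    coded: {cpg_name: {recommendation: code}} with code 1 (recommended),
    -1 (recommended against), or 0 (not addressed). Missing entries count as 0.
    """
    n_cpgs = len(coded)
    recommendations = sorted({rec for codes in coded.values() for rec in codes})
    frequencies = {}
    for rec in recommendations:
        values = [codes.get(rec, 0) for codes in coded.values()]
        frequencies[rec] = {
            "for": values.count(1) / n_cpgs,
            "against": values.count(-1) / n_cpgs,
        }
    return frequencies


# Hypothetical extraction table for four CPGs.
hypothetical_table = {
    "CPG A": {"balance training": 1, "high-impact weight-bearing": -1},
    "CPG B": {"balance training": 1, "high-impact weight-bearing": 1},
    "CPG C": {"balance training": 0, "high-impact weight-bearing": -1},
    "CPG D": {"balance training": 1},
}
for rec, freq in recommendation_frequencies(hypothetical_table).items():
    print(rec, freq)
```

Keeping the positive and negative counts separate, instead of summing the codes, avoids a recommendation for and a recommendation against cancelling each other out.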

Once extracted, the frequency of the different recommendations made and their consistency between CPGs were analyzed. Recommendation frequency was compared with CPG publication date, and CPG AGREE quality scores.

Results

Guideline selection

Our literature search identified 45 CPGs and 19 met inclusion criteria. CPG exclusion was mostly on the basis of language; 19 CPGs were not available in English and 3 had English language abstracts or summaries only. Three CPGs were excluded based on an absence of physical activity and safe movement recommendations. Finally, one CPG was excluded due to a publication date prior to 2004. The remaining 19 CPGs met the inclusion criteria and are summarized in Table 1.

Table 1 Intraclass Correlation Coefficients for each CPG AGREE Score

Quality assessment

Inter-reviewer item score agreement ranged from fair to good (Table 2). Reviewers never disagreed on whether an item’s criteria had been fulfilled or unfulfilled; when disagreement did occur, it concerned the degree to which the criteria had been fulfilled.

Table 2 Intraclass correlation coefficients for each CPG AGREE score

Domain scores resulting from the AGREE II quality assessment are illustrated in Table 3. Collectively, the included CPGs’ scores varied significantly both within and across the six domains. Overall, CPGs were strongest in the “scope and purpose” and “clarity of presentation” domains, scoring 87 % with a SD of 15 % and 84 % with a SD of 11 % respectively. The CPGs scored poorly on the remaining four domains with the worst quality domain “editorial independence” scoring 46 % with a SD of 28 %.

Table 3 AGREE II domain scores

CPGs published by the National Osteoporosis Foundation of America, Australia, Scotland, and Malaysia met the criteria for being high quality, having at least 5 domains scoring greater than 60 %. The majority of CPGs included in this study were of average quality, meaning they had either three or four domains with scores over 60 %. The CPGs published by Greece, Asia, Ibero-america, the Middle East and North Africa, and Taiwan all had two or fewer domains that scored over 60 % and were considered low quality.

For domain 1, scope and purpose, CPGs included in this study had a mean AGREE II quality score of 87 %. Quality scores varied little between the individual CPGs for this domain with a standard deviation of 15 %. CPGs published by Malaysia, Australia, and the American Association of Endocrinologists best described their scope and purpose, while CPGs published by Greece, Asia, and the Middle East and North Africa failed to do so adequately, if at all.

The mean CPG quality score for domain 2, stakeholder involvement, was 59 %. Scores were marginally more variable with a standard deviation of 17 %. The Australian, Singaporean, and Scottish CPGs had the highest quality methods ensuring appropriate stakeholder involvement. The Greek and Asian CPGs and the CPG from the Middle East and North Africa received the lowest quality scores in this domain and failed to adequately address this methodological category.

In domain 3, rigor of development, the mean CPG quality score was 58 %. Variation was slightly larger with a standard deviation of 19 %. Receiving scores of 96, 87, and 81 % respectively, CPGs from Australia, Canada, and Scotland were the most rigorously developed. Scoring the lowest in this category were CPGs from Greece, Asia, and the Middle East and North Africa.

Domain 4, clarity of presentation, had a mean score of 84 %. Small variation was observed with a standard deviation of 11 %. CPGs from Scotland, Australia, and South Africa scored the highest in this category, while CPGs from Lebanon, the UK, and Asia received the lowest scores.

In terms of general applicability, domain 5, CPGs had a mean AGREE quality score of 52 % and a standard deviation of 11 %, the smallest variation of any domain. CPGs from Malaysia, Scotland, and the National Osteoporosis Foundation of America scored the highest, while CPGs from the Middle East and North Africa, Greece, and Ibero-america scored the lowest in terms of applicability.

Domain 6, editorial independence, had a mean of 46 %. This domain was the most variable with a standard deviation of 28 %. CPGs with methods facilitating the greatest editorial independence were from Australia, Malaysia, and the UK while CPGs from Asia, Greece, and Ibero-america scored the lowest in this category, and their methods may have had the greatest potential for allowing a conflict of interest to affect their recommendations.

The quality of included CPGs demonstrated improvement over time. Included CPGs were published between 2004 and 2015, and those published in the first half (2004–2009) scored an average of 67, 42, 41, 74, 41, and 25 % for domains 1 through 6, respectively. CPGs with publication dates from 2010 to 2015 had average domain scores of 90, 64, 63, 89, 56, and 54 %, respectively.

Analysis of physical activity and safe movement recommendations

Physical activity and safe movement recommendations compiled from all CPGs are listed in Table 4 along with the proportion of CPGs making each specific recommendation. Safe movement recommendations were defined as instructions detailing specific movements and body positions that reduce a patient’s vulnerability to fractures and falls. As per inclusion criteria, 100 % of CPGs recommended physical activity. Commonly recommended physical activities included weight-bearing exercises (low-impact ones being the most commonly recommended), muscle-strengthening exercises, resistance exercises, and balance training exercises. Safe movement recommendations were much less common than physical activity recommendations, with only 58 % of included CPGs making any type of safe movement recommendation. The most frequent safe movement recommendations advised against immobility, forward spinal flexion, and spinal torsion. Recommendations usually used broad categories of exercise types or goals instead of indicating specific exercises.

Table 4 Relative frequency of recommendations

The most frequently made recommendations regarding physical activity were not substantially different between high- and average-quality CPGs. However, some differences were found in low-quality CPG recommendations. The most commonly made recommendations were low-impact weight-bearing exercise (such as walking), muscle-strengthening exercises, and balance training exercises in high-quality CPGs; muscle-strengthening exercises, low-impact weight-bearing exercises, and resistance training in average-quality CPGs; and balance training, high- and low-impact weight-bearing, and muscle-strengthening exercises in low-quality CPGs.

Safe movement recommendations were mostly consistent between high-, average-, and low-quality CPGs. In general, high- and average-quality CPGs tended to make safe movement recommendations more often than low-quality CPGs. High- and average-quality CPGs made an average of 3.4 (SD = 2.7) and 3.5 (SD = 3.9) recommendations per CPG, respectively, whereas low-quality CPGs made an average of 0.6 (SD = 1.5) recommendations per CPG. The most frequently made recommendations were to avoid immobility as well as excessive flexion or torsion of the spine.

The number of recommendations made by CPGs has increased significantly over time. CPGs published from 2004 to 2009 recommended an average of 4.4 (SD = 2.22) different types of physical activity whereas CPGs published from 2010 to 2015 recommended an average of 7.5 (SD = 3.28; P < 0.05). The same held true for safe movement recommendations; CPGs published from 2004 to 2009 made an average of 0.86 recommendations (SD = 1.46) while CPGs published from 2010 to 2015 made an average of 3.3 recommendations (SD = 3.53; P < 0.05).

Discussion

Our findings suggest that the overall quality of osteoporosis CPGs varies substantially and that the information on exercise and safe movement is variable and poorly defined. As CPGs play such an integral role in the provision of care [11], identifying a CPG’s quality before relying on it is crucial. As of now, with the current variable quality of osteoporosis CPGs, patients worldwide may not all be receiving care based on the best, most current peer-reviewed evidence. Presently, the onus is on the healthcare provider to ensure recommendations are sourced from CPGs with the highest methodological quality.

In our evaluation, the highest quality domains were scope and purpose and clarity of presentation. These results are in accordance with previous studies of CPG quality from a wide variety of healthcare disciplines [18–20, 28]. In this study, CPGs from Australia, Malaysia, the UK, and the American Association of Endocrinologists received perfect scores in this domain. Why certain domains consistently score well on average across many different fields of healthcare is not currently known. Perhaps it is easier to fulfill the requirements of certain domains, or perhaps current authors place a higher priority on certain domains.

The poorest scoring domains were “editorial independence” and “applicability.” In a review of physician adherence to CPGs, it was suggested that as many as 38 % of physicians described CPGs as inconvenient or too difficult to use [29]. Making CPGs easy to implement is a crucial step toward increasing their rate of use. Domain 5, applicability, was the second lowest scoring domain among included CPGs. Hence, the lack of specific information about exercise and safe movement can be considered a barrier for implementation. Our results regarding this domain are in agreement with a multitude of previous studies of CPGs in other healthcare disciplines in which this domain scored the second lowest or lowest [30–33]. The Malaysian CPG we reviewed scored the highest in this domain. It included a section defining specific factors required to be in place in order to ensure that the CPG was used effectively. Most other CPGs we reviewed completely neglected to include any such information. For exercise and safe movement, it might be anticipated that clinicians or patient guides would provide dosage information, and visual aids might facilitate implementation. Perhaps the low average score in this domain is due to the CPG development group composition. Groups usually include a variety of medical experts, patient representatives, and epidemiologists; however, they often lack anyone with a high degree of administrative, financial, economic, knowledge translation, or logistical experience. If future CPG development groups included such individuals, the groups might feel more comfortable devising a strategy to ease the economic and logistical burden of applying CPGs and perhaps score higher in this domain as a result.

It is unclear why editorial independence is the lowest scoring and most variable domain in this study as well as in many other studies of CPG quality [22, 33–35]. The requirements for this domain are rather straightforward: any funding received for CPG development must be disclosed as well as any potential conflicts of interest from its authors or members of the development group. It is possible that development groups assume that their independence is implied to the reader, or they simply do not appreciate the importance of editorial independence or its disclosure. It has been shown that conflicts of interest are almost endemic among CPG developers across many fields of healthcare [36–39]. Further evidence suggests that these conflicts may actually have an effect on the recommendations found within CPGs [40]. It seems imperative that emphasis be placed on this domain. Perhaps if new conflict of interest reporting standards or even standards for CPG development panel member composition were created in order to limit potential conflicts from the beginning, CPG quality in this domain would improve. Since we focused on exercise and safe movement, it is unlikely that conflict of interest would affect which exercises were recommended. However, it might be that a lack of attention to these noncommercial interventions could be related to such influences.

It is important to acknowledge which AGREE II domains are most often lacking as this may aid in the development of future CPGs. We hope that with data from our study, future osteoporosis CPG development groups can concentrate on improving the applicability and editorial independence of their CPGs. In fact, the AGREE II instrument is also intended to serve as a framework for CPG development [17]. Indeed, when reviewing the included CPGs, it was noted that those which disclosed knowledge of the AGREE instrument tended to score higher than those which did not.

Overall, the included CPGs showed an improvement in quality over time. Domain scores improved substantially when CPGs published before 2010 were compared with those published after. This contrasts with a 2012 review of multidisciplinary CPGs which found that little improvement, if any, had occurred over the previous two decades [16]. The CPGs in our study that showed this improvement were largely published after that review was conducted, so the quality improvement is perhaps a more recent trend, or is more obvious in osteoporosis CPGs.

Physical activity recommendations found within the included CPGs were highly varied with the exception of a few generally agreed upon recommendations. The generally agreed upon recommendations of weight-bearing, muscle-strengthening, and resistance training exercises were often vaguely defined with deference often being given to physical therapists’ clinical judgment. It appears that osteoporosis CPG development groups agree that some sort of physical activity should be recommended to patients. However, there is no definitive consensus as to what specific type of exercise should be done, how often, and at what intensity. Furthermore, as the number of physical activity and safe movement recommendations increases over the years, recommendation variability seems to be increasing. The trend is moving toward increasing variability and decreased specificity. CPG readers may inevitably become more confused and find it more difficult to apply these recommendations to clinical practice. This trend is likely due to a multitude of newly published studies proclaiming specific types of exercise as beneficial to osteoporotic patients [41–44]. While considerable research exists stating the value of several singular types of physical activity, little if any research comparing the benefits of different physical activities relative to one another has been conducted. Research in this direction is required in order to decrease the variability and increase the specificity of physical activity recommendations, making them easier to apply for healthcare providers.

Special attention should be given to the category of weight-bearing exercise. It was usually defined by CPGs as high (e.g., jogging), moderate (e.g., stair climbing), or low (e.g., walking) impact. CPGs with high AGREE quality scores almost unanimously recommended low-impact exercise and advised against high-impact exercise. Lower quality CPGs, on the other hand, while also recommending low-impact weight-bearing exercise, frequently recommended high-impact weight-bearing activities and never cautioned against these. Most often, CPGs recommending against high-impact weight-bearing exercise cited the ambiguous, weak, or low-quality evidence regarding its benefit, as well as safety concerns stemming from development panel members’ clinical judgment. Further, it may be important to consider safe movement principles when recommending exercise, as certain activities or exercises might expose people to more at-risk postures.

The methodological quality of this study is in line with previous work. However, it is not without its limitations. While we searched several online databases, only English language CPGs were examined. If CPGs in other languages were included in this study, results may have been different. Reviewer bias also could have affected our results. For example, one reviewer mentioned that if a particular CPG exceeded pre-existing expectations, it may have received a slightly higher mark than it otherwise would have. The same concern was also voiced for CPGs that underperformed compared to pre-existing expectations. Finally, there are limitations inherent to the AGREE II instrument. It is a tool for the assessment of methodological quality. It does not assess the clinical content. This limitation is common to all existing appraisal tools [45]. They can never fully replace a reader’s critical judgment or clinical decision-making that considers the nature of the health condition and person to whom recommendations would be applied.

In conclusion, CPGs aim to provide healthcare providers with summaries of the most recent and highest quality evidence in a particular field. However, given the currently variable quality of osteoporosis CPGs, the burden of critically appraising evidence still partially lies on the healthcare provider. To our knowledge, this study is the first to evaluate the quality of osteoporosis CPGs and provide guidance as to which are developed with the highest methodological standards. It would be prudent for CPG development groups worldwide to address and report adherence to the AGREE II instrument framework. Special attention is needed to ensure CPGs are easily applicable to current clinical practice paradigms. Improving editorial independence standards or reporting standards is also necessary. Lastly, regarding the content of the examined CPGs, greater detail and specificity are required for physical activity recommendations. More research is required in this area before CPG recommendations can improve.