Introduction

Knee osteoarthritis (OA) is a common chronic disease with rising prevalence [1,2,3]. OA is regarded as the fourth leading cause of disability worldwide [4]. Knee OA affects the whole person, ultimately impairing quality of life on many levels [5]. It also increases the socioeconomic burden of medical care costs for both governments and individuals. Currently, the goal of knee OA treatment is to relieve symptoms and to improve joint function and quality of life. Although total knee replacement is an effective option for patients with severe knee OA, the risk of surgical complications cannot be eliminated completely. Numerous conservative treatment options are available, including analgesic medication, physical therapy, unloader bracing and intra-articular injections, which aim to relieve knee joint pain and delay surgical intervention [6].

Cell therapy is an innovative treatment that researchers regard as the next logical step in the progression of surgical intervention [7]. Among the candidate cell types, mesenchymal stem cells (MSCs) are especially popular because of their ease of harvesting, safety and potential to differentiate into cartilage tissue [8,9,10]. Furthermore, the paracrine and immunoregulatory effects of the growth factors and cytokines released by MSCs are beneficial in treating knee OA [11,12,13]. The number of clinical studies in this field is growing fast, while more clinics are offering MSCs treatments under lax medical regulation [14]. Although several case series and controlled clinical trials have shown favorable results of MSCs injections in knee OA [15,16,17,18,19], the varying level of evidence of these primary studies does not give us confidence to adopt MSCs treatment in routine care. Moreover, Osborne et al [20] reported that MSCs therapy might bear the hallmarks of ‘quack’ medicine: desperate patients, pseudoscience and large amounts of money being charged for unproven therapies. Thus, the current evidence and debate do not allow a confident recommendation.

Recently, several systematic reviews or meta-analyses on this topic have been published [21,22,23,24]. Yubo et al [21] reported that MSCs intervention has great potential as an efficacious and relatively safe cell therapy for patients with knee OA. Xia et al [23] reported beneficial effects of MSCs therapy in knee OA, although the available evidence remained insufficient to recommend its use. However, Pas et al [24] did not recommend MSCs therapy for knee OA because of the absence of high-level evidence. Filardo et al [25] concluded that the effectiveness of MSCs in treating articular defects and OA was inconclusive, because the observed effects of MSCs themselves could not be distinguished from placebo effects and related factors. The effectiveness of MSCs therefore cannot be determined from these published primary studies and systematic reviews. Such conflicting results cannot inform decisions on health and social interventions, and definite conclusions about using MSCs to treat knee OA cannot yet be made. It is therefore necessary to assess how much confidence can be placed in the findings of systematic reviews evaluating the effectiveness of MSCs therapy in knee OA.

The purposes of this study were (1) to perform an overview of overlapping systematic reviews that assessed the efficacy and safety of MSCs injections for knee OA; (2) to evaluate the methodological quality and risk of bias of the relevant systematic reviews; and (3) to synthesize the current evidence qualitatively to determine how much confidence can be placed in the use of MSCs for knee OA.

Materials and methods

The present study was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (Supplemental Table 1) [26]. Following the EPC guidance, existing systematic reviews were integrated into a new review [27].

Search strategy

Systematic reviews meeting the inclusion and exclusion criteria below were searched for in MEDLINE, EMBASE, Scopus and the Cochrane Library from database inception to 1 August 2017. The following MeSH terms and free-text words were used: knee, osteoarthritis, arthritis, stem cell, mesenchymal stem cell, MSC, meta-analysis and systematic review (Supplemental Table 2). In addition to the electronic literature search, the reference lists of the retrieved studies were screened to identify further systematic reviews.

A secondary search for unpublished literature and abstracts was performed in the proceedings of the following conferences: ACR, OARSI and APLAR.

Inclusion and exclusion criteria

Systematic reviews or meta-analyses evaluating MSCs for knee OA patients were eligible if they met the following inclusion criteria:

  1. Type of studies: meta-analysis or systematic review;

  2. Participants: knee OA patients;

  3. Interventions: the included systematic reviews had to compare injection of any type of MSCs, from any origin, with any other intervention for knee OA. MSCs injection combined with another intervention was included if the combined intervention was compared with an intervention without MSCs injection;

  4. Comparators: placebo, hyaluronic acid (HA) and other interventions;

  5. Outcomes: the included systematic reviews had to evaluate the effects of MSCs injection on pain, function, quality of life, radiological outcomes, histological analysis or adverse events.

Exclusion criteria included the following items:

  1. MSCs used for OA of the hand, hip, ankle or other joints;

  2. MSCs seeded into scaffold implantation for cartilage defects;

  3. Basic science reviews and systematic reviews based on in vitro or in vivo preclinical studies;

  4. Abstracts without precise outcomes, commentaries, methodology studies, overviews, narrative reviews and clinical practice guidelines.

Study selection and data extraction

Two reviewers independently screened the titles and abstracts of the retrieved systematic reviews against the eligibility criteria. They were blinded to the journals, authors and affiliations. Systematic reviews regarded as potentially relevant by either reviewer were then obtained in full text and assessed independently by the same reviewers. Any disagreement was resolved by discussion until consensus was reached.

The data from each included systematic review were extracted by two authors independently using a predefined data extraction form. The following data were extracted: title, year, journal, authors, study design, total number of primary studies, the pooled outcomes, methodology, type of MSCs and MSCs identification methods. Any disagreement concerning the extracted data was resolved by discussion.

Methodological quality assessment for systematic reviews

The methodological quality of the included systematic reviews was evaluated using the Assessment of Multiple Systematic Reviews (AMSTAR) tool [28], an eleven-item measurement tool for evaluating the methodological quality of systematic reviews [29]. The assessment was conducted by two authors independently, and any disagreement was resolved by discussion.

Heterogeneity among systematic reviews

For each included systematic review that pooled results, the heterogeneity of each outcome was summarized. We also assessed (1) whether sensitivity analysis was performed in the systematic reviews and (2) whether the included reviews assessed potential sources of heterogeneity among primary studies. The I² value was used to quantify the degree of heterogeneity among primary studies.
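As a reminder of what the I² values summarize, the statistic is conventionally derived from Cochran's Q and its degrees of freedom. The sketch below is illustrative only: the function name `i_squared` and the example inputs are assumptions and are not data from the included reviews.

```python
def i_squared(q: float, df: int) -> float:
    """Return I² in percent from Cochran's Q and its degrees of freedom.

    Conventional definition: I² = max(0, (Q - df) / Q) * 100,
    where df is the number of pooled studies minus one.
    """
    if df < 0:
        raise ValueError("df must be non-negative")
    if q <= 0 or q <= df:
        # No more variation than expected by chance: I² is truncated at 0.
        return 0.0
    return 100.0 * (q - df) / q


# Hypothetical example: five pooled studies (df = 4) with Q = 20.
print(i_squared(20.0, 4))
```

Larger values indicate that more of the observed variation between primary studies reflects real differences rather than chance.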

Choice of best evidence

The best evidence was chosen according to the Jadad decision algorithm [30], which helps clinical decision-makers select the most reliable review among discordant systematic reviews. The assessment criteria of the Jadad decision algorithm include clinical question development, study selection and inclusion process, data extraction, study quality assessment, feasibility of combining studies, and statistics for data synthesis [30]. This procedure was conducted by two independent authors, who reached consensus on which of the included systematic reviews provided the best evidence according to the current information.

Risk of bias assessment for systematic reviews

The risk of bias of the included systematic reviews was assessed by two authors independently using the ROBIS tool [31]. Disagreements were resolved by discussion. According to the ROBIS tool, risk of bias was evaluated by assessing the following four domains: study eligibility criteria, identification and selection of studies, data collection and study appraisal, and synthesis and findings. These four domains cover the main review processes.

For each domain, we recorded the information used to support the judgment, the answers to the signaling questions, and the judgment of concern about risk of bias. The possible answers to the signaling questions were “Yes”, “Probably Yes”, “No”, “Probably No” and “No Information”, with “Yes” indicating low concern. Each domain was then classified as “Low”, “High” or “Unclear” concern about risk of bias. A domain was rated as low concern only if all of its signaling questions were answered “Yes” or “Probably Yes”; concern about bias was raised if any signaling question was answered “No” or “Probably No” [31].
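The domain-level rule described above can be sketched as follows. This is a deliberately simplified illustration: the function name `rate_domain` is hypothetical, and mapping any “No” or “Probably No” answer directly to “High” is an assumption made for the sketch, since in practice ROBIS leaves room for reviewer judgment.

```python
LOW_ANSWERS = {"Yes", "Probably Yes"}
HIGH_ANSWERS = {"No", "Probably No"}
VALID_ANSWERS = LOW_ANSWERS | HIGH_ANSWERS | {"No Information"}


def rate_domain(answers):
    """Return the level of concern for one ROBIS domain.

    Simplified rule (an assumption, not the full ROBIS guidance):
    any "No"/"Probably No" gives "High"; all "Yes"/"Probably Yes"
    gives "Low"; anything else (some "No Information") gives "Unclear".
    """
    if not answers:
        raise ValueError("at least one signaling question is required")
    invalid = [a for a in answers if a not in VALID_ANSWERS]
    if invalid:
        raise ValueError(f"unexpected answers: {invalid}")
    if any(a in HIGH_ANSWERS for a in answers):
        return "High"
    if all(a in LOW_ANSWERS for a in answers):
        return "Low"
    return "Unclear"


# Hypothetical answers for one domain of one review.
print(rate_domain(["Yes", "Probably Yes", "No Information"]))
```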

Assessment of credibility in the review findings (CERQual)

The Confidence in the Evidence from Reviews of Qualitative research (CERQual) tool [32] was used to assess our confidence in the findings of the systematic review. The CERQual tool was developed with the support of the Cochrane Methods Group and draws on the principles used by the GRADE approach for systematic reviews of quantitative literature [33]. CERQual assesses confidence in the evidence based on four key components contributing to a review finding: (1) methodological limitations; (2) relevance; (3) coherence; and (4) adequacy of the data. Assessing each of the four components allows a judgment about the overall confidence in each systematic review finding. Confidence ratings start at ‘high confidence’ and are rated down by one or more levels if there are concerns regarding individual CERQual components [32,33,34]. Confidence judgments were reached through discussion between two reviewers.
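The rating-down logic can be illustrated with a minimal sketch. It assumes the four CERQual levels (high, moderate, low, very low) and, purely for illustration, treats the overall rating as the sum of the downgrades across components; in practice the rating is a reviewer judgment rather than a mechanical sum. The function name `overall_confidence` and the example inputs are hypothetical.

```python
from typing import Mapping

LEVELS = ["high", "moderate", "low", "very low"]
COMPONENTS = {
    "methodological limitations",
    "relevance",
    "coherence",
    "adequacy",
}


def overall_confidence(downgrades: Mapping[str, int]) -> str:
    """Start at 'high' and rate down by the total number of levels.

    `downgrades` maps a CERQual component to the number of levels
    deducted for concerns about it (additive rule assumed here).
    """
    for component, steps in downgrades.items():
        if component not in COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        if steps < 0:
            raise ValueError("downgrades must be non-negative")
    total = sum(downgrades.values())
    # The rating cannot fall below the lowest level.
    return LEVELS[min(total, len(LEVELS) - 1)]


# Hypothetical finding with concerns about two components.
print(overall_confidence({"methodological limitations": 1, "adequacy": 1}))
```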

Results

Literature search

A total of 47 titles and abstracts were preliminarily reviewed, and four published systematic reviews [21,22,23,24] ultimately met the eligibility criteria (Fig. 1). After title and abstract screening, one review was excluded because it reviewed animal studies of MSCs in treating OA. Seven studies were excluded because MSCs were used for rheumatoid arthritis. Two were excluded because they investigated MSCs in patients with juvenile idiopathic arthritis. Twenty-three narrative reviews and mini-reviews without methodological evaluation were excluded after full-text reading. One systematic review was excluded because it addressed the mechanism of MSCs in treating OA. One was excluded because its primary studies involved other joints.

Fig. 1 The systematic reviews selection and inclusion process

Characteristics of systematic reviews

The characteristics of the systematic reviews are presented in Table 1. These reviews were published from 2015 [23] to 2017 [21, 24]. The number of original studies included in the systematic reviews varied from six in a review published in 2017 [24] to 18 in a review published in 2016 [22] (Table 2). One systematic review conducted a qualitative synthesis without pooled data [24]. Only one systematic review reported whether phenotypic characterization was performed in the primary trials [24].

Table 1 Characteristics of included systematic reviews
Table 2 Primary studies included in systematic reviews

Search methodology

The search sources used by each systematic review are presented in Table 3. MEDLINE, EMBASE and the Cochrane Library were the databases most frequently searched by the included systematic reviews.

Table 3 Databases mentioned by included systematic reviews during literature searches

Methodological quality of systematic reviews

Table 4 presents the methodological features of the individual systematic reviews. Only one systematic review [21] restricted inclusion to randomized controlled trials (RCTs), while the others [22,23,24] included both RCTs and non-RCTs. The level of evidence of each systematic review was Level II. REVMAN or STATA software was used in the systematic reviews that pooled data [21,22,23]. Both sensitivity and subgroup analyses were conducted in two of the included reviews [22, 23]. None of the systematic reviews assessed the quality of the body of evidence.

Table 4 Methodological characteristics of included systematic reviews

The total AMSTAR score and the score for each item of the individual systematic reviews are presented in Table 5. The average AMSTAR score was 8.25, ranging from 7 [21, 22] to 11 [24]. Two of the included systematic reviews [21, 24] declared no conflict of interest. The systematic review conducted by Pas et al [24] was regarded as the highest-quality study.

Table 5 AMSTAR criteria for included systematic reviews
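The reported summary figures can be cross-checked with simple arithmetic. In the sketch below, the scores for reviews [21], [22] and [24] and the mean of 8.25 are taken from the text, whereas the score for review [23] is not stated in the text and is only inferred from the mean; Table 5 remains the authoritative source. Variable names are illustrative.

```python
from statistics import mean

# Scores stated in the text (lowest and highest of the range).
reported_scores = {"[21]": 7, "[22]": 7, "[24]": 11}
n_reviews = 4
reported_mean = 8.25

# Inferred, not reported: the value that makes the mean equal 8.25.
inferred_score_23 = n_reviews * reported_mean - sum(reported_scores.values())

all_scores = {**reported_scores, "[23]": inferred_score_23}
print(inferred_score_23, mean(all_scores.values()))
```

Under these assumptions the inferred score lies within the reported range of 7 to 11, so the summary figures are internally consistent.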

Heterogeneity among primary studies

The heterogeneity of each quantitatively pooled outcome in each systematic review is presented in Table 6. The I² statistic is shown to indicate the heterogeneity among primary clinical trials. Almost all pooled outcomes had moderate or high heterogeneity.

Table 6 Heterogeneity of each pooled outcome in included systematic reviews

Jadad decision algorithm

All the pooled quantitative outcomes reported in the systematic reviews are shown in Fig. 2. Following the Jadad decision algorithm, the eligible systematic review was selected on the basis of methodological quality (Fig. 3). Therefore, only the study conducted by Pas et al [24], which had the highest AMSTAR score, was ultimately selected.

Fig. 2 Results of each included systematic review with quantitative synthesis. Red indicates favoring MSCs; green indicates no difference; yellow indicates not reported; and blue indicates favoring the control group. Arabic numerals indicate the number of included primary studies

Fig. 3 Flow diagram of Jadad decision algorithm

Risk of bias of systematic reviews

The risk of bias of the included reviews assessed with the ROBIS tool is presented in Table 7, together with the assessment results for each item in phase 2 of the ROBIS tool. Phase 3 indicated whether each systematic review as a whole was at risk of bias. Only one systematic review [24] had a low risk of bias, while the other three had a high risk of bias. Judgments for each ROBIS item are presented as percentages across all included systematic reviews in Fig. 4. Based on the AMSTAR instrument and the ROBIS tool, the systematic review by Pas et al [24], with higher methodological quality and lower risk of bias, was regarded as providing the best evidence.

Table 7 Risk of bias assessment of systematic reviews using ROBIS tool
Fig. 4 Risk of bias of the included systematic reviews with ROBIS tool. The ROBIS tool incorporates the assessment of study eligibility criteria, identification and selection of studies, data collection and study appraisal, and synthesis and findings. The overall risk of bias is determined based on the above four domains. Each risk of bias item is presented as the percentage across all the systematic reviews, which indicates the proportion of different levels of risk of bias for each item

Assessment of credibility in the review findings (CERQual)

Confidence ratings for the three main outcomes (self-reported measure, MRI/histological examination and adverse events) in the selected systematic review [24], assessed using the CERQual tool, are shown in Table 8. For all outcomes, confidence was either low (outcome: self-reported measure and MRI/histological examination) or moderate (outcome: adverse events). The general reasons for downgrading of ratings were the problems with internal validity of primary studies, the limited generalisability and transferability of some data, and the small number of articles and small sample sizes within available studies.

Table 8 Summary of the confidence rating of outcomes (CERQual Qualitative Evidence Profile Table)

Discussion

Meta-analyses and systematic reviews are generally considered the best way to obtain evidence for healthcare decision-making and can therefore be used to resolve a wide range of clinical problems. Decision-makers in medical institutions expect consistent, stable and unbiased recommendations based on systematic reviews [35]. However, it is not uncommon for several systematic reviews evaluating the same interventions to be published on the same topic without consistent conclusions. This also occurred for MSCs injection in treating knee OA. Although several systematic reviews and primary studies supported MSCs injections, the current evidence does not allow a recommendation for or against MSCs for knee OA.

The methodological quality and risk of bias of individual systematic reviews may account for the discrepancy in their outcomes. Bias can be introduced at every step of the review process, including study eligibility criteria, study selection, data collection and evidence synthesis. Thus, decision-makers should take the methodological quality and risk of bias of systematic reviews into consideration when using pooled conclusions [28, 31]. Therefore, the AMSTAR tool [28] was used in the present study to evaluate the methodological quality of systematic reviews of MSCs in treating knee OA, and the Jadad decision algorithm [30] was used to select the best evidence. Furthermore, a newly developed tool, ROBIS (http://www.robis-tool.info), was used to assess the risk of bias of the systematic reviews [31]. Ultimately, one systematic review [24], which had a lower risk of bias and offered the best evidence, was selected. Although a systematic review with high methodological quality and lower risk of bias was identified, the quality of the body of evidence for each outcome remained unknown, and we could not place full confidence in the outcomes of a systematic review of the effectiveness of MSCs therapy in knee OA. Thus, we used the CERQual tool [32] to assess our confidence in the findings of the selected systematic review [24] across primary studies.

The risk of bias of primary studies, namely their internal validity, is the first domain of the CERQual tool and is known to influence results in important ways. The high risk of bias found in the reviewed studies must be taken into account when interpreting the synthesized findings. Although all trials claimed to have used MSCs, phenotypic characterization was performed in only four of the identified trials. Two primary studies did not perform any specific immunophenotypic characterization, making their claim of having used MSCs questionable. Although it has been suggested that MSCs should meet the criteria put forward by the International Society for Cellular Therapy, not all MSCs fall under these definitions; for example, subpopulations of BMSCs and adipose-derived MSCs are nonadherent to plastic but still exhibit all the other properties of MSCs [36]. Furthermore, only one primary study [18] reported patient blinding. Subjective outcomes, including the visual analogue scale (VAS), Lysholm score, Tegner activity scale, International Knee Documentation Committee (IKDC) clinical scores and Western Ontario and McMaster Universities Osteoarthritis (WOMAC) index, can be clearly influenced by the preferences of patients and researchers. The pooled results for these self-reported scores are therefore likely to overestimate the real-world effect. Unfortunately, Delayed Gadolinium-Enhanced MRI of Cartilage (dGEMRIC) [37], which can assess the glycosaminoglycan content of hyaline cartilage, was not used to evaluate structural outcomes in the primary studies. We therefore do not have sufficient confidence that treating knee OA with MSCs yields good clinical results.

Although three primary studies [17,18,19] included in the selected systematic review reported assessor blinding in the study design, no information was provided on how the assessors were blinded. Only one trial [18] reported that investigators were blinded to the patients’ data. Furthermore, a high risk of selection bias was introduced into the synthesized results by the use of quasi-randomization procedures [15, 16, 19] or the lack of allocation concealment [15, 16]. Detection bias may therefore affect the evaluation of cartilage by MRI or histological assay. Even though assessor blinding was reported in three trials, a high risk of selection bias was already present before cartilage repair was evaluated. These methodological limitations of the primary studies lower our confidence when interpreting the results on cartilage healing.

Regarding generalisability to other clinical contexts, all but one trial [18] used a surgical cointervention. Surgical cointerventions could introduce performance bias, because the personnel performing the surgery, whose technique varied, could not be blinded. Four studies [15,16,17, 19] used platelet-rich plasma (PRP) or HA as a cointervention when evaluating the effects of MSCs on symptom change and cartilage healing, so the positive effects of PRP or HA in treating OA cannot be excluded. Furthermore, the heterogeneity of inclusion criteria among the trials could limit the generalisability of the current evidence to other clinical contexts. The trials conducted by Koh et al [16] and Wong et al [19] recruited patients with isolated medial knee compartment OA. One study [17] enrolled patients aged 18–50 years, who were younger than those in the other studies. Other heterogeneity may arise from patients' preexisting conditions, the severity of OA, the type of MSCs, the dose of MSCs injected, the follow-up duration and the rehabilitation procedures after administration. These sources of heterogeneity also lower our confidence in transferring the current synthesized evidence to other clinical contexts.

Concerns about the safety of MSCs remain among clinicians. Two systematic reviews [21, 22], which included RCTs and/or non-RCTs, reported no significant difference in adverse events between MSCs-treated and untreated groups. The four studies [15, 17,18,19] included in the selected systematic review also reported no serious adverse events, although the follow-up was relatively short. Although the rate of adverse events is unlikely to be strongly influenced by methodological limitations or generalisability, the small number of studies, small numbers of participants and short follow-up may lower our confidence in drawing conclusions. Furthermore, the safety of MSCs could be affected by the dose, graft type (allogeneic, xenogeneic or autologous MSCs) and source of the MSCs applied. The various procedures used in detaching, processing, storing and delivering the MSCs could also influence their characteristics and reliability. Thus, we have moderate confidence that MSCs are relatively safe based on short-term adverse events. This conclusion is partly consistent with that of Peeters et al, who found that the application of cultured stem cells in joints appears to be safe [10]. Nonetheless, long-term adverse events remain poorly researched and require further study.

The strength of the present study is the combined use of the ROBIS tool, AMSTAR instrument, Jadad decision algorithm and CERQual tool to evaluate simultaneously the risk of bias and methodological quality of the systematic reviews and the confidence in the systematic review findings across primary studies. The ultimate purpose was to help decision-makers select the best evidence with low risk of bias on MSCs injection for knee OA from discordant systematic reviews, and to indicate how much confidence to place in specific review findings so that they can judge how much weight to give these findings in their decisions. Based on the best existing evidence, we cannot recommend intra-articular MSCs injection as an efficacious treatment for knee OA.

The present study has the following limitations: (1) Only English-language systematic reviews were included; non-English literature could have been omitted, leading to language bias. (2) The methodological quality of the primary studies may strongly influence the results of the included systematic reviews. Although we assessed the risk of bias and quality of the included systematic reviews, the limitations of the primary studies, especially conflicts of interest, should be considered when the results are interpreted. (3) The primary studies included in the other systematic reviews, which were not regarded as the best evidence with low risk of bias, could also have been considered when rating confidence in the outcomes.

In the present overview of systematic reviews investigating the efficacy and safety of MSCs for knee OA, the best available evidence with low risk of bias suggested that there was insufficient high-level evidence to recommend MSCs therapy for knee OA. Furthermore, we have moderate confidence in the safety of MSCs therapy for knee OA but low confidence in its efficacy. High-quality clinical studies with rigorous standardized methodology are still required.