Introduction

Osteoarthritis (OA) is a degenerative joint disease which can affect the whole knee (patellofemoral and tibiofemoral joint) [2]. It is a progressive process that can be divided into stages or degrees [2, 26, 34].

With increasing age, OA is the most frequent cause of knee pain [2, 43]. In the fourth and fifth decade of life, light to moderate stages of OA have the highest prevalence [2]. However, the severity of OA increases with age.

There is no consensus about the criteria for knee OA in the literature. However, in most studies, the radiological classification of Kellgren and Lawrence (KL) is used to stage the progress of OA [26]. According to the American College of Rheumatology, the following clinical criteria should be present: knee pain, osteophytes and one further criterion such as tenderness, age over 50 or crepitus of the joint [2].

The main symptoms of OA may have different causes. Pain can be caused mechanically by meniscus lesions, inflammation or subchondral edema. Loss of range of motion can be caused by capsular fibrosis or osteophytes. It has been shown that degenerative meniscal tears are associated with early OA. A non-traumatic meniscus lesion may be the first symptom of knee OA even in the absence of radiological OA signs [10, 11, 25]. Complete loss of the meniscus, however, is an important factor for the progression of OA [12, 44].

For many years, arthroscopic techniques were considered to be the treatment of choice for symptoms of OA because some of the underlying causes can be addressed by arthroscopic treatment (AT), such as partial resection of the meniscus, synovectomy, arthrolysis or removal of loose bodies [8, 25, 37, 38]. Nevertheless, several randomized controlled trials (RCTs) have shown that the clinical scores after arthroscopic treatment were not superior to those of a control group [29, 37, 38, 39]. After publication of these studies, several health care insurers stopped reimbursing AT of knee OA [37]. However, despite these clinical trials, the use of arthroscopy for knee OA has not decreased in every country [8, 50]. One reason for this discrepancy may be that several orthopedic surgeons doubt the results of those trials because of methodological flaws [31].

The aim of this systematic review was to analyze randomized controlled trials of patients with various stages of knee OA and with non-traumatic meniscus lesions to determine whether AT has any clinically relevant effect in knee OA.

A further objective of this systematic review was to determine whether arthroscopy is associated with any side effects in patients with knee OA.

Regarding the outcome, we hypothesize that some subgroups of OA patients (e.g., patients with non-traumatic meniscus lesions) might benefit from arthroscopic surgery.

In contrast to previous systematic reviews, recent studies such as the one by Gauffin et al. [15] were included. Furthermore, the outcome was measured not only with the intention-to-treat analysis of the original studies but also with a crossover analysis to identify subgroups of patients who benefit from AT.

Methods

Search details

A comprehensive literature search of the PubMed database was conducted according to the PRISMA statement to identify peer-reviewed articles about AT of knee OA. The PRISMA statement consists of a 27-item checklist and a 4-phase flow diagram [19, 36].

Prior to that, the study was registered at PROSPERO, which is an international database of prospectively registered systematic reviews [52]. The corresponding registry number is CRD42016047964.

For this systematic review, two combinations of keywords were used: “osteoarthritis and arthroscopy” and “medial meniscus and arthroscopy”. When a study of interest was found, related articles were searched. After identifying those articles, all references were screened for additional relevant publications.

Inclusion and exclusion criteria

The following inclusion criteria were applied:

  • prospective randomized trial (level one study),

  • trials reporting clinical outcome after AT of patients with any stage of radiological knee OA or of patients with non-traumatic meniscus lesions,

  • English language reports,

  • publication in a peer-reviewed journal.

All criteria had to be satisfied for inclusion in this systematic review.

All papers qualified for inclusion were read by the reviewers and checked for one of the following exclusion criteria:

  • number of patients less than 20,

  • Jadad score ≤ 1.

A study was excluded if it met at least one of these exclusion criteria.
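The screening rules above can be sketched in a few lines of Python. This is only an illustration and not the reviewers' actual procedure: the `Study` record, its field names and the example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical record; all field names are assumptions for illustration.
    name: str
    prospective_randomized: bool   # level one study
    reports_outcome_after_at: bool  # knee OA or non-traumatic meniscus lesion
    english: bool
    peer_reviewed: bool
    n_patients: int
    jadad: int


def is_eligible(study: Study) -> bool:
    """All inclusion criteria must hold and no exclusion criterion may apply."""
    included = all([
        study.prospective_randomized,
        study.reports_outcome_after_at,
        study.english,
        study.peer_reviewed,
    ])
    excluded = study.n_patients < 20 or study.jadad <= 1
    return included and not excluded


# Invented candidates, not taken from the review.
candidates = [
    Study("example A", True, True, True, True, 120, 3),
    Study("example B", True, True, True, True, 18, 4),   # fewer than 20 patients
    Study("example C", True, True, True, True, 60, 1),   # Jadad score <= 1
    Study("example D", False, True, True, True, 200, 2),  # not randomized
]
eligible = [s.name for s in candidates if is_eligible(s)]
print(eligible)
```

Only the first invented candidate passes, because each of the others fails one inclusion criterion or meets one exclusion criterion.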

Two reviewers (WP, KK) performed the initial study identification, secondary study screening, and final determination of eligibility and study inclusion. Both reviewers were also involved in the analysis of the articles.

Analysis

If two separate publications with the same authors, the same intervention and the same patient collective reported different follow-up periods, both publications were counted as one trial. For the analysis, the appendices of the included studies and publications of the study design were also evaluated.

After extraction of all studies’ data, a brief tabular narrative of each investigation was presented. Data in these tables included (1) first author and year of publication, (2) number of study centers, (3) country, (4) study groups and number of patients, (5) last follow-up, (6) mean age, (7) OA grade, (8) gender ratio and (9) scores (Table 1). Additional tables were added to illustrate the procedures performed in the studies, results of clinical outcome, side effects and study limitations (Tables 2, 3, 4).

Table 1 Information about the study design of the randomized trials that were included in this review
Table 2 Procedures examined in the RCTs
Table 3 Summary of the clinical outcome of randomized trials which were included in this review
Table 4 Quality assessment with the Jadad score and Coleman methodology score

Primary and secondary endpoints

Primary endpoint was the group difference in the clinical outcome scores used in the studies.

Secondary endpoints were: (1) subgroup analysis for factors which might have an effect on the outcome after AT of OA, (2) the crossover rate (patients who changed from one treatment group to the other), (3) the rate of side effects and (4) a methodological analysis of the included studies.

Study quality and limitations

Each article was analyzed for limitations and bias by all reviewers. For the quality assessment, information was extracted from the original article, from published appendices or from published study protocols. Study quality was analyzed with the Jadad score [17] and with the Coleman methodology score [9].

Jadad score

The Jadad score is a three-point questionnaire that forms the basis of a score [17, 42]. This questionnaire focuses on randomization, blinding and description of dropouts. The questions are as follows: (1) Was the study described as randomized? (2) Was the study described as double blind? (3) Was there any description of withdrawals and dropouts?

For each “yes” answer, one point is given [17, 42]. Additional points are given if the method of randomization is described in the paper and appropriate, and if the method of blinding is described and appropriate. Points are deducted if the method of randomization or blinding was inappropriate. The highest score a study can receive is, therefore, five points [17].
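The scoring rules described above can be written as a small function. This is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical, and the two `*_method` arguments use `True` for "described and appropriate", `False` for "described but inappropriate" and `None` for "not described".

```python
from typing import Optional


def jadad_score(
    randomized: bool,
    double_blind: bool,
    withdrawals_described: bool,
    randomization_method: Optional[bool] = None,
    blinding_method: Optional[bool] = None,
) -> int:
    """Return a Jadad score between 0 and 5 (hypothetical helper)."""
    score = 0
    if randomized:
        score += 1
        if randomization_method is True:
            score += 1   # method described and appropriate
        elif randomization_method is False:
            score -= 1   # method inappropriate
    if double_blind:
        score += 1
        if blinding_method is True:
            score += 1   # method described and appropriate
        elif blinding_method is False:
            score -= 1   # method inappropriate
    if withdrawals_described:
        score += 1
    return score


# Fully reported, appropriately randomized and blinded trial.
print(jadad_score(True, True, True, True, True))
# Randomized, unblinded trial that describes dropouts but not the method.
print(jadad_score(True, False, True))
```

Under these rules an open-label trial can reach at most three points, so a study needs at least some description of dropouts or of the randomization method to pass the exclusion threshold of Jadad score ≤ 1.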

Coleman methodology scoring system

The Coleman methodology scoring system was developed to analyze the quality of studies reporting surgical treatments of patellar tendinopathy [9]. Its criteria take into account number of patients, follow-up, number of different treatment procedures, type of study (randomized), diagnostic certainty, description of the surgical procedure, description of postoperative rehabilitation, outcome criteria, procedure for assessing outcome and patient selection [9, 42].

Limitations

These limitations were systematically analyzed: (1) description of the surgical procedure, (2) control of surgical process quality, (3) description of the rate of meniscus extrusion, (4) the rate of varus or valgus malalignment, (5) the outcome score and (6) control of use of pain killers and NSAIDs.

Results

Search results and study design

The search results are shown in Fig. 1 and details of the study design are shown in Table 1. In ten studies, partial meniscectomy was part of the AT. In six of those studies, arthroscopic partial meniscectomy (APM) was the only surgical procedure which was performed. In five studies, multiple procedures were allowed (Table 2). Additional procedures included partial synovectomy, debridement of chondral flaps and resection of osteophytes which blocked joint extension [7, 29, 35, 39]. In three studies, the AT was lavage only [3, 14, 23].

Fig. 1 Flowchart showing the literature review

The control groups were also variable (Table 2). In five studies, the control treatment was sham surgery or arthroscopic lavage [7, 22, 23, 39, 48]. In six studies, control treatment was supervised or unsupervised exercise [15, 21, 24, 29, 30, 51].

Clinical outcome scores

Several different outcome scores were used and the results of the different studies were heterogeneous (Tables 1, 3).

WOMAC score

There was no significant difference in the WOMAC total score in either of the two studies that used this score as primary endpoint [23, 29]. In one of these studies, however, some secondary endpoints (WOMAC pain and VAS pain) were significantly better in the arthroscopy group (lavage with 3000 ml) in comparison to “placebo” surgery (lavage with 250 ml). In this study, patients with crystals in the synovial fluid had greater improvements in pain [23].

In one study with the WOMAC pain subscale as primary endpoint, the improvement was significantly greater in the arthroscopy group (lavage) compared to intra-articular corticosteroid injections. In this study, patients with a knee effusion or with less severe radiographic OA responded better to both treatments [3].

In one study, the intention-to-treat analysis showed no significant difference in the WOMAC function subscale of knee OA patients after APM or exercise. In this study, however, the WOMAC function subscale did not improve in 34.9% of the patients who were assigned to the exercise group. After crossover to APM, the WOMAC function scores at 12 months were similar to those of patients who were primarily assigned to APM [24].

KOOS

The KOOS or a KOOS subscale was used in three studies as primary outcome measure and in two studies as secondary outcome measure [15, 21, 24, 30]. All studies examined the effect of APM or exercise in patients with OA. The results were contradictory. In one study, patients in the surgery group had significantly less pain as measured with the KOOS pain subscale at 3 and 12 months postoperatively [15]. Three studies found no difference in the KOOS pain score [15, 21, 30]. In all three studies, crossover rates from the exercise group to the arthroscopy group were described (19% [30], 21% [15] and 27.7% [21]). In the Herrlin et al. study, 8 of the 13 crossover patients had flap tears [21].

Lysholm score

Two studies found no statistical difference in the Lysholm score between the APM group and a control treatment (sham surgery or exercise) [48, 51]. In one study, arthroscopy with removal of chondral flaps and trimming of the bed of the flap led to a significantly better Lysholm score than control treatment [22]. In this study, a modified Lysholm score without the instability subscore was used.

Other scores

Four studies used other scores as outcome tools (Table 3). Three of those studies did not differentiate the outcome measures into primary and secondary endpoints [7, 14, 35].

Adverse events

Side effects were analyzed in seven studies. In all studies, the rate of adverse events in both the treatment and control group was low [3, 15, 24, 30, 35, 39, 48]. In four of these studies, AT was compared with a non-operatively treated control group [15, 24, 30, 35]. In three of these studies, all with physiotherapy as control treatment, there was no significant difference in the rate of side effects between the two study groups [15, 24, 30]. In one study, AT was compared to oral NSAIDs [35]. In this study, two deep venous thromboses, one superficial infection and one hemarthrosis were observed in the arthroscopy group, whereas no adverse effect was observed in the NSAID group [35].

Study quality and limitations

Quality assessment of the studies with the Jadad and the Coleman methodology score is shown in Table 4. The Jadad score ranges from 2 to 5 points. The Coleman methodology score ranges between 59 and 96.

Only three studies addressed varus or valgus malalignment of the participants [3, 29, 35]. No study mentioned the rate of meniscus root tears, but the percentage of participants with meniscus extrusion was described in one study. In this study, the rate of meniscus extrusion was 65% in the arthroscopy group and 50% in the control group [30].

The use of pain killers or NSAIDs was addressed in three studies [3, 29, 39]. In two studies, there was no difference in the consumption of pain killers or NSAIDs during the course of the studies [3, 29]. In one study, the use of pain killers or NSAIDs was described in the baseline characteristics only [39].

Seven of the included studies used a specific OA score as primary outcome measure (KOOS or WOMAC) [3, 15, 21, 23, 24, 29, 30].

Discussion

The most important finding of the present study was that certain subgroups of patients with knee osteoarthritis can benefit from AT.

This systematic review has shown that AT has no major advantage over non-operative treatment for the majority of patients with OA. However, there is evidence in the literature that AT can be a useful option for a subset of OA patients with non-traumatic meniscus lesions or crystal arthropathy.

This statement is in contrast with previous systematic reviews. In a Cochrane review from 2008, Laupattarakasem et al. showed that there is ‘gold’ level evidence that AT has no benefit for the treatment of OA [32]. Two systematic reviews from 2014 also found no difference in the outcome of OA patients with AT and without AT [6, 37]. An explanation for the contradictory findings is that the study by Gauffin et al. could not be included in these systematic reviews because it was only published in 2014 [15]. Gauffin et al. showed that patients with mild OA (stage 0–II according to KL [26]) with previous unsuccessful physiotherapy benefit from APM. They found that the change in KOOS pain was larger in the surgery group compared to the non-surgery group. The difference in improvement between the groups was clinically relevant [15].

A qualitative flaw of these previous systematic reviews was that the intention-to-treat analysis of the original studies was used to measure outcome. Katz et al. and Herrlin et al. found in the intention-to-treat analysis that there was no difference in outcome between patients with APM or physiotherapy [21, 24]. In both studies, however, a substantial proportion of patients in the physiotherapy group (34.9% and 27.7%) crossed over to the arthroscopy group because they did not improve in clinical scores. After AT, the clinical scores improved in both studies to the same level as in patients with initial APM [21, 24]. The studies by Katz et al. and Herrlin et al. have shown that a crossover analysis can be helpful in identifying subgroups of patients who benefit from the procedure [21, 24]. In this context, the intention-to-treat analysis popular in clinical research can also be viewed critically. This can be illustrated by the following example. Diet A (treatment) is compared to diet B (placebo) in a clinical trial with 40 participants in each group. In group A, 38 participants lost weight, whereas in group B only five participants lost weight. If the weight loss were analyzed in an “as-treated analysis”, the effect of diet A would be underestimated. Therefore, for this trial, an intention-to-treat analysis makes sense. If the same analysis is performed in an RCT about the effect of APM with a crossover rate of approximately one-third of patients with no improvement after physiotherapy, an intention-to-treat analysis is misleading [25].
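The difference between an intention-to-treat comparison and a separate look at the crossover patients can be made concrete with a toy calculation. The cohort below is entirely invented (200 hypothetical patients, a crossover rate of 35%) and is not data from any of the included trials; the `Patient` record and helper names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    assigned: str   # arm at randomization: "APM" or "exercise"
    received: str   # treatment finally received
    improved: bool  # improvement at final follow-up


def group(n, assigned, received, improved):
    """Build n identical hypothetical patients."""
    return [Patient(assigned, received, improved)] * n


# Invented cohort: 100 patients per arm, 35 exercise patients cross over to APM.
cohort = (
    group(80, "APM", "APM", True)
    + group(20, "APM", "APM", False)
    + group(55, "exercise", "exercise", True)
    + group(10, "exercise", "exercise", False)
    + group(25, "exercise", "APM", True)    # crossed over, then improved
    + group(10, "exercise", "APM", False)   # crossed over, no improvement
)


def improvement_rate(patients):
    patients = list(patients)
    return sum(p.improved for p in patients) / len(patients)


# Intention-to-treat: compare by assigned arm, ignoring what was received.
itt_apm = improvement_rate(p for p in cohort if p.assigned == "APM")
itt_exercise = improvement_rate(p for p in cohort if p.assigned == "exercise")

# Crossover analysis: patients assigned to exercise who finally received APM.
exercise_arm = [p for p in cohort if p.assigned == "exercise"]
crossovers = [p for p in exercise_arm if p.received == "APM"]
crossover_rate = len(crossovers) / len(exercise_arm)
crossover_improved = improvement_rate(crossovers)

print(f"ITT APM: {itt_apm:.2f}, ITT exercise: {itt_exercise:.2f}")
print(f"crossover rate: {crossover_rate:.2f}, improved after crossover: {crossover_improved:.2f}")
```

With these invented numbers both arms show the same improvement rate in the intention-to-treat comparison, although a quarter of the patients in the exercise arm improved only after crossing over to APM. The sketch shows how the two analyses are computed and does not quantify any real effect.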

All studies found that the various clinical scores at follow-up improved significantly after AT of patients with knee OA in comparison to the baseline. Regarding the superiority of AT, the results of the included studies were heterogeneous. Some studies have shown that the outcome after AT is better than control treatment [3, 15, 22, 23, 35]. Other studies have shown that there is no difference in outcome between patients with AT and control treatment [7, 14, 21, 24, 29, 30, 39, 48, 51]. The heterogeneity and discrepancy of the study results can be explained by differences in the stage of OA, type of AT, patient characteristics, study design and study quality.

With regard to the degree of osteoarthritis, very wide inclusion criteria were chosen in the present systematic review to include not only patients with advanced knee osteoarthritis but also patients with early osteoarthritis. Even at stage 0 according to KL [26], a non-traumatic meniscal lesion or a chondral lesion can be seen as an initial process in the development of osteoarthritis [34]. An effect of AT was found especially in studies with patients in early OA stages (stage KL 0–II) [15, 22, 23, 35]. Two studies showed a benefit of APM even in patients with stage III OA [7, 24]. Two studies including patients with grade IV OA according to KL failed to demonstrate superiority of AT [29, 39].

The clinical conclusion of these findings is that APM is a useful procedure in knees with stage 0–III OA with initial unsuccessful non-operative treatment. The studies reviewed indicate that the shape of the non-traumatic meniscal lesion may be a prognostic factor for the success of a partial meniscectomy. Yim et al. included only patients with a horizontal tear and found no difference in the outcome of APM in comparison to non-operative treatment [51]. In the study by Herrlin et al., the majority of patients in the crossover group, who did not benefit from non-operative treatment, had flap tears [20, 21]. This statement is in accordance with the 2016 ESSKA meniscus consensus [4]. However, the recommendations of the present paper are broader than the ESSKA meniscus consensus, because the literature reviewed did not focus on meniscus studies only. In one study, the removal of chondral flaps had a positive effect on outcome [22] and in one other RCT AT was beneficial for patients with crystal arthropathy [23].

This is a systematic review, and flaws of the studied RCTs are also flaws of this paper. The quality assessment with the Jadad score and the Coleman methodology score also shows heterogeneous results for the 14 trials which were included in this review (Table 4).

The Jadad score was developed for quality assessment of RCTs and focuses on aspects such as randomization and blinding [17]. Three studies received a maximum score of five points [23, 39, 48]. In all three studies, the control group was sham surgery (placebo). The lack of difference between arthroscopy and placebo suggests that the improvement is not only due to any intrinsic efficacy of the procedures [39]. However, the use of a placebo group also has disadvantages because blinding prevents a change from the control group to the treatment group (crossover). The studies by Katz et al. and Herrlin et al. have shown that a crossover analysis can be helpful in identifying subgroups of patients who benefit from the procedure [20, 21, 24].

The Coleman methodology score was developed for the assessment of orthopedic studies. This score covers additional aspects such as number of patients, follow-up, diagnostic certainty, description of the surgical procedure, description of rehabilitation, outcome criteria, and patient selection. With this score, the studies of Katz et al. [24], Kirkley et al. [29], Moseley et al. [39] and Sihvonen et al. [46, 47, 48, 49] received the best results.

Other limitations include that most authors give no information about the rate of subchondral edema or varus malalignment. Both factors are predictors for a poorer outcome after arthroscopic surgery.

It is also remarkable that only a few studies reported the consumption of pain killers or NSAIDs during the treatment and follow-up period. Good results in the control groups could be the result of a higher NSAID use. The well-known adverse effects of extensive NSAID use are gastrointestinal bleeding and ulcers [45].

Other flaws that were identified by the reviewers are selection bias and the use of non-specific scores. Selection bias is a typical limitation of a randomized controlled trial. In the METEOR study, for example, only 26% of eligible patients could be included. That means that no follow-up of those patients preferring not to enter the study was done [24]. Selection bias is assumed when the recruitment rate is below 80% [9]. Therefore, the findings of the studies with a recruitment rate below 80% should only be generalized cautiously [42]. In contrast to the METEOR trial, the Gauffin et al. study had a participation rate of 84% [15].

It is also of concern that four studies used the Lysholm score as outcome measurement. The Lysholm score was originally developed for the assessment of patients with ligamentous instability. To our knowledge, this score is not validated for Finnish and Korean. Briggs et al. have shown that there were unacceptable ceiling effects (> 30%) for the Lysholm domains of limp, instability, support and locking [5]. Hence, this score might not be the first choice for the evaluation of outcome after APM. Even the KOOS has floor effects when used for meniscus issues [16]. In this study, the IKDC subjective score showed the best performance on all measurement properties. Unfortunately, the IKDC subjective score was not used in any of the RCTs about APM.
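A ceiling effect of this kind is commonly quantified as the share of patients who reach the best possible score on an item or scale. The sketch below applies the 30% threshold mentioned above; the function names, the item maximum of 5 points and the score lists are hypothetical and serve only as an illustration.

```python
def ceiling_rate(scores, max_score):
    """Share of patients who reach the best possible score."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s == max_score) / len(scores)


def has_unacceptable_ceiling(scores, max_score, threshold=0.30):
    """True if more than `threshold` of the patients score the maximum."""
    return ceiling_rate(scores, max_score) > threshold


# Invented responses of ten patients to a hypothetical item scored 0 to 5.
item_with_ceiling = [5, 5, 3, 5, 0, 5, 3, 5, 5, 3]
item_without_ceiling = [5, 3, 3, 0, 3, 3, 0, 3, 3, 3]

print(ceiling_rate(item_with_ceiling, 5), has_unacceptable_ceiling(item_with_ceiling, 5))
print(ceiling_rate(item_without_ceiling, 5), has_unacceptable_ceiling(item_without_ceiling, 5))
```

An item on which most patients already score the maximum cannot register further improvement, which is why such a scale is less able to separate two treatment groups. A floor effect can be checked in the same way by counting the patients at the lowest possible score.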

All studies were initially designed to determine the difference between arthroscopy and control treatment for knee OA, but later claimed that the two or three interventions were equivalent. Nevertheless, in many orthopedic studies, improvements have progressed without simultaneously addressing the significant ceiling effect common to many patient-related clinical outcome measures. Alternative statistical strategies such as equivalence or non-inferiority clinical trial designs are needed to circumvent this ceiling-effect problem [33].

Future randomized trials examining surgical procedures should make more effort to describe and standardize the surgical technique. Important surgical details such as the use of a tourniquet, the experience of the surgeon, the portals and the use of photos or videos for documentation were described in only a few studies. A surgical treatment as a variable in a clinical trial is more complex than a pharmacological treatment where all patients of one group receive the same pill [18, 27, 28, 42]. In this respect, it is also of concern that the surgical process quality was controlled in none of the studies [42]. If the documentation had been given more attention, meniscal “root tears” would probably have been discovered and described in at least some of the studies. This is of concern because several studies have shown that the biomechanical effect of a root tear is comparable to a total meniscectomy [1, 13, 40, 41]. The root injury leads to meniscus extrusion and loss of circular hoop tension [40, 41]. Meniscus extrusion was stated in only one study. In this study, the rate of meniscus extrusion was 50% in the control group and 65% in the arthroscopy group [30]. Kijowski et al. [27] reported poorer clinical outcome when APM was associated with root tears and greater severity of meniscal extrusion.

All these limitations suggest that the results of the RCTs which were included in this systematic review should be interpreted with care and that larger randomized trials without the described methodological flaws are needed to draw a definite conclusion regarding the value of AT for knee OA in its various stages. Evidence-based medicine (EBM) originated primarily in internal medicine. The orthopedic community should keep in mind that EBM needs to be better adapted to the specifics of clinical orthopedic research.

Conclusion

Despite all limitations, this systematic review shows that the majority of patients with knee OA might not benefit from arthroscopic surgery. Therefore, the indication for this procedure should be made with care. However, this review has also shown that there are subgroups of patients with knee OA who might benefit from AT. Patients who belong to one of these subgroups are people with non-traumatic flap tears of the medial meniscus. Furthermore, there is very low-quality evidence that the removal of chondral flaps has a positive effect.