Introduction

Osteoarthritis (OA) is the most frequent musculoskeletal disorder, and its prevalence will likely increase over the next 20 years due to rising rates of obesity and increasing longevity [1]. Osteoarthritis of the knee, the most common site of OA, is the major cause of disability worldwide [2, 3]. Chronic pain and functional impairment, with difficulty performing activities of daily living, can markedly reduce Quality of Life (QoL) [4]. Patients with knee OA can lose about 1.9 quality-adjusted life years (QALYs), a loss comparable to that of patients with metastatic breast cancer or cardiovascular diseases [5].

Current treatment of OA includes non-pharmacological (education, exercise, diet if necessary, and lifestyle changes) and pharmacological treatments, such as acetaminophen, nonsteroidal anti-inflammatory drugs (NSAIDs), selective COX-2 inhibitors, opioids, duloxetine and topical drugs [6, 7].

Balneotherapy (BT) is one of the most common non-pharmacological treatments for OA patients, with beneficial effects on pain, stiffness and function, and a favorable economic profile [8,9,10,11]. Balneotherapy refers to the use of thermal mineral waters, mud/peloid packs, natural gases (CO2, iodine, sulfur, radon, etc.) or hay baths for preventive, therapeutic, and rehabilitative purposes. It is often confused with hydrotherapy (HT) or spa therapy. However, HT is the use of ordinary tap water for therapeutic purposes, while spa therapy is a complex intervention at a spa resort that employs several treatment modalities, including HT and BT, often combined with massage, exercise, physical therapy or rehabilitation [12, 13].

Previous systematic reviews and meta-analyses mostly studied the efficacy of BT or spa therapy on pain and function in patients with knee OA [8,9,10,11]. In a systematic review by Harzy et al., data from nine (9) trials involving 493 patients were assessed qualitatively; results supported the efficacy of BT with thermal mineral water in improving pain and functional capacity of patients with knee OA for 24 weeks, without any serious adverse events associated with balneological interventions [8]. In two other reviews on the topic, the health benefits of BT with thermal mineral water and mud packs were reported to last even longer, up to 9 months after treatment, highlighting its usefulness in a chronic condition such as knee OA [9, 14]. A meta-analysis of eight (8) trials on the effects of BT on the symptoms and function of patients with knee OA, performed by Matsumoto et al., showed a significant improvement in WOMAC pain, stiffness and physical function scores among patients treated with BT compared with controls [10]. Although these results highlight the importance of balneological interventions in improving the health status of patients with knee OA, the actual impact of BT and spa therapy on QoL has been less investigated.

This systematic review and meta-analysis aims to explore the possible effect of different BT treatments (thermal mineral baths and/or mud/peloid packs, or hay baths) and spa therapy on QoL in patients with knee OA. To better interpret results of meta-analyses about the impact of BT and spa therapy on QoL, variations of drug consumption within included trials are also evaluated. To our knowledge, this is the first systematic review and meta-analysis investigating the impact of BT and spa therapy on QoL of patients with knee OA.

Methods

The PRISMA statement was followed for this systematic review and meta-analysis [15]. Since there is no definite consensus about specific differences between QoL and Health-Related Quality of Life (HQoL), only the broader term QoL is used in the present work.

Trial selection criteria

Only randomized controlled clinical trials (RCTs) involving patients with knee OA diagnosed according to the American College of Rheumatology (ACR) criteria were included [16]. There were no restrictions in terms of the type of clinical setting.

Trials with unspecified randomization and allocation concealment of study participants or with a non-random component in the sequence generation process were considered eligible to minimize publication bias and maximize retrievable evidence about the topic. These aspects were further considered for selection bias assessment.

Only studies in which intervention comprised thermal mineral water immersion, hay baths or mud/peloid pack applications were included. Trials were excluded when normal tap water was used to treat participants or when no clear information about water and mud/peloid composition or their source was provided.

All eligible trials were included regardless of the type of intervention administered to the comparison group.

Studies were included only if QoL was assessed with at least one of the following validated scales: the Arthritis Impact Measurement Scales (AIMS/AIMS2), the Disease Repercussion Profile (DRP), the EuroQoL (EQ-5D), the Nottingham Health Profile (NHP), the Patient Generated Index (PGI), the Quality of Well-Being Scale (QWB), the RAQoL, the Short Form-36 (SF-36), the Sickness Impact Profile (SIP), the SIP-RA, the World Health Organization’s Quality of Life Instruments (WHOQoL, WHOQoL-100, WHOQoL-Bref) [17], or the Health Assessment Questionnaire (HAQ) [18].

WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index) and LAI (Lequesne’s Algofunctional Index for Knee) indexes were purposely excluded from the systematic review and meta-analysis to avoid pooling together data from algofunctional indexes and data from QoL scales, which specifically describe the impact of the disease on the patient’s QoL. However, data about WOMAC and LAI indexes from included trials were extracted and qualitatively analyzed (Supplementary Material A).

Studies were excluded from quantitative synthesis when essential data were missing or unclear, when no data were provided for the selected follow-up period (2–3 months after treatment), when extracted data could not be pooled, or when the comparison group received an intervention other than standard therapy or sham therapy. Observational studies were excluded. Manuscripts written in languages other than English were excluded.

Trial sources

Medline via PubMed, Scopus, Web of Science, Cochrane Library, and PEDro (Physiotherapy Evidence Database) were searched by two investigators independently (M.A., D.D.) for articles about trials involving patients with knee OA and studying the effects of thermal mineral baths, mud/peloid therapy, hay baths and spa therapy on participants’ QoL. All mentioned databases were screened up to December 2017.

Search

The following search strategies were used:

  • PubMed/Medline: “((knee[Title/Abstract] AND (osteoarthr* [Title/Abstract] OR arthritis[Title/Abstract] OR arthrosis[Title/Abstract] OR arthropathy[Title/Abstract])) OR gonarthr*[Title/Abstract]) AND (balneotherapy[Title/Abstract] OR hydrotherapy[Title/Abstract] OR fangotherapy[Title/Abstract] OR spa[Title/Abstract] OR mud[Title/abstract] OR peloid[Title/Abstract] OR thermal water[Title/Abstract] OR hay bath*[Title/Abstract])”.

  • Scopus: “TITLE-ABS-KEY ((((knee AND (osteoarthr* OR arthritis OR arthrosis OR arthropathy)) OR gonarthr*) AND (spa OR hydrotherapy OR balneotherapy OR mud OR fangotherapy OR peloid OR “thermal water” OR “hay bath*”))) AND (LIMIT-TO (LANGUAGE, “English”)) AND (LIMIT-TO (DOCTYPE, “ar”))”.

  • Web of Science: “TOPIC: (((knee AND (osteoarthr* OR arthritis OR arthrosis OR arthropathy)) OR gonarthr*) AND (spa OR hydrotherapy OR balneotherapy OR mud OR fagotherapy OR peloid OR “thermal water” OR “hay bath*”)). Refined by: DOCUMENT TYPES: (ARTICLE) AND LANGUAGES: (ENGLISH)”.

  • Cochrane Library: “knee osteoarthritis” AND “balneotherapy” in Title, Abstract, Keywords. In addition, the following terms were searched to retrieve all possibly relevant articles: “balneotherapy”, “spa hydrotherapy”, “mud therapy”, “peloid therapy”, “fangotherapy”, “hay bath*”. Results of this additional search were pre-screened to assess their relevance and to ensure that they included a measurement of quality of life in patients with knee OA.

  • PEDro (Physiotherapy Evidence Database): “Advance search with Therapy: hydrotherapy, balneotherapy; Body part: lower leg or knee; Method: clinical trial”.

Study selection and data collection process

Details about the selection process of studies eligible for this review were summarized in a flowchart (Fig. 1). Results were screened and selected by two investigators independently (M.A., D.D.). In case of disagreement, items were discussed until consensus was reached. Extracted data are summarized in tables, grouped according to the type of control, as shown in Tables 1 and 2.

Fig. 1

PRISMA flow diagram of the systematic review and meta-analysis (Liberati et al. [15]). ² Gasparyan et al. [19]

Table 1 Characteristics of included studies
Table 2 Significant variations in drug consumption and characteristics of standard treatment (ST) administered to patients of studies included in the systematic review

When data were only reported graphically, they were extracted from graphs with a plot digitizer. Data about QoL of one study [26] were collected from another article reporting a detailed analysis of data from the same trial [11].

When sample means and standard deviations were not available, they were estimated from sample medians and interquartile ranges with validated statistical tools [37, 38].
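
The estimation step above can be illustrated with one published approach, the quartile-based formulas of Wan et al. (2014), which assume roughly normal data. This is a minimal sketch under that assumption; it is not necessarily the exact tool cited in [37, 38], and the function name `mean_sd_from_median_iqr` is hypothetical.

```python
from __future__ import annotations

from statistics import NormalDist


def mean_sd_from_median_iqr(q1: float, median: float, q3: float, n: int) -> tuple[float, float]:
    """Approximate a sample mean and SD from the median and interquartile range.

    Illustrative sketch of the Wan et al. (2014) quartile formulas, which assume
    approximately normal data; not necessarily the tool used by the authors.
    """
    if not q1 <= median <= q3:
        raise ValueError("expected q1 <= median <= q3")
    if n < 2:
        raise ValueError("need at least two observations")
    # Mean: simple average of the three quartiles.
    mean = (q1 + median + q3) / 3
    # SD: IQR divided by an approximation of the expected IQR of n standard-normal
    # draws; this factor tends to 2 * 0.6745 (about 1.35) as n grows.
    eta = 2 * NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / eta
    return mean, sd


mean, sd = mean_sd_from_median_iqr(q1=2.0, median=4.0, q3=6.0, n=1000)
print(f"mean = {mean:.2f}, SD = {sd:.2f}")
```

For a large sample with quartiles 2, 4 and 6 this gives a mean of 4.0 and an SD of roughly 2.97; smaller samples yield a somewhat larger SD for the same IQR, because the expected IQR of a small normal sample is narrower.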

Data items

The following data were collected from the included articles: number of study participants; instruments used to measure QoL; number of therapy sessions; characteristics of the intervention; significant (p < 0.05) differences in QoL items after treatment, namely significant pre–post changes within the intervention and comparison groups and significant differences between the intervention and comparison groups after treatment; characteristics of the standard treatment administered to all patients; use of a diary recording the type and quantity of drugs consumed; other drugs taken by patients; relevant comorbidities at baseline; and significant variations in drug consumption during the study and follow-up periods.

Risk of bias in individual studies

The risk of bias of each included study was independently assessed by two investigators (M.A., D.D.) following the criteria of the Cochrane risk-of-bias tool for randomized trials. Disagreements were discussed with a third investigator (A.F.) until consensus was reached. Performance bias was not considered a key domain, because thermal mineral baths, mud/peloid packs, hay baths and spa therapy require the patients' direct participation and involvement, which limits the feasibility of blinding participants. Detection bias was rated low when questionnaires were administered by a blinded researcher, and unclear when they were self-completed by patients or when the method of administration was not reported. A study was rated at high risk of bias when at least one key domain was at high risk or at least two key domains were at unclear risk; at unclear risk when only one key domain was at unclear risk; and at low risk only when all key domains were at low risk.
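
The study-level rating rule described above is a simple decision procedure. The sketch below encodes it; the key-domain labels are hypothetical names chosen for illustration (the text does not list them), and performance bias is deliberately left out of the key domains, as in the text.

```python
from __future__ import annotations

from enum import Enum


class Risk(Enum):
    LOW = "low"
    UNCLEAR = "unclear"
    HIGH = "high"


# Hypothetical key-domain labels (assumption); performance bias is not a key domain.
KEY_DOMAINS = (
    "sequence_generation",
    "allocation_concealment",
    "detection",
    "attrition",
    "reporting",
)


def overall_risk(judgements: dict[str, Risk]) -> Risk:
    """Aggregate per-domain judgements into a study-level rating."""
    key = [judgements[d] for d in KEY_DOMAINS]  # non-key domains are ignored
    n_high = sum(r is Risk.HIGH for r in key)
    n_unclear = sum(r is Risk.UNCLEAR for r in key)
    if n_high >= 1 or n_unclear >= 2:
        return Risk.HIGH
    if n_unclear == 1:
        return Risk.UNCLEAR
    return Risk.LOW


study = {d: Risk.LOW for d in KEY_DOMAINS}
study["detection"] = Risk.UNCLEAR  # e.g. self-completed questionnaires
print(overall_risk(study).value)
```

Because non-key domains are never read, a high rating for performance bias leaves the overall judgement unchanged, which mirrors the decision not to treat performance bias as a key domain.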

Summary measures

Standardized mean difference (Hedges’ g) was used as a measure of effect size in the quantitative synthesis.
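
As a reminder of how Hedges' g is obtained from group summaries, the sketch below uses the pooled-SD standardized difference with the usual small-sample correction J = 1 − 3/(4N − 9) and the common large-sample variance approximation. The function name is hypothetical, and packages such as RevMan may use a slightly different variance formula. Here the intervention group is passed first, so on scales where higher scores mean poorer QoL a negative g favors the intervention.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g (group 1 minus group 2) and its approximate variance."""
    df = n1 + n2 - 2
    # Pooled within-group standard deviation.
    s_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / s_pooled
    # Small-sample correction factor: 1 - 3 / (4 * df - 1) equals 1 - 3 / (4N - 9).
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample variance of d, rescaled by J squared.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return g, j ** 2 * var_d


# Hypothetical summaries on a 0-3 scale where higher means poorer QoL.
g, var_g = hedges_g(m1=1.0, sd1=1.0, n1=20, m2=2.0, sd2=1.0, n2=20)
print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```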

Synthesis of results

Results are summarized in Tables 1 and 2 and discussed to obtain a qualitative synthesis.

The software used to perform the meta-analyses was “Review Manager” (RevMan, version 5.3).

Included studies were heterogeneous in terms of comparison group types. Pre–post effect sizes were not analyzed because they can yield biased estimates [39]. On the basis of available data, and considering methodological issues and the aim of this work, three meta-analyses were performed. The first meta-analysis (Fig. 2) summarized the effects of thermal mineral baths, mud/peloid therapy and spa therapy in addition to standard treatment, compared with standard treatment alone, on long-term overall QoL (measured at a 2–3-month follow-up visit). The second and third meta-analyses (Figs. 3, 4) summarized the effects of thermal mineral baths, mud/peloid therapy, and spa therapy compared with sham therapy (tap water immersion or hot packs) on long-term pain and social function, respectively, both considered as QoL items and measured at a 2–3-month follow-up visit. These two items were selected because they represent a physical and a psychosocial component of individual QoL. A further reason was that the studies using sham therapy in the comparison group adopted different QoL scales; therefore, to achieve homogeneity among extracted data, only comparable items of the various QoL scales were considered (applying linear transformations when required).

Fig. 2

Forest plot of the first meta-analysis pooling data from studies investigating the effects of balneological interventions in addition to standard treatment, compared with standard treatment alone, on long-term overall QoL (measured at a 2–3-month follow-up visit). Three subgroups are shown, each characterized by a different type of balneological intervention (balneotherapy, phytobalneotherapy, spa therapy with mud/peloid packs). Within each subgroup, studies are listed by first author's surname. Means and standard deviations are reported in columns, and a random-effects model was adopted to better estimate subgroup and overall effect sizes.

Fig. 3

Forest plot of the second meta-analysis pooling data from studies investigating the effects of balneological interventions with thermal mineral water or mud/peloid packs, compared with sham therapy (tap water immersion or hot packs), on long-term pain as a QoL item (measured at a 2–3-month follow-up visit). Two subgroups are shown, each characterized by a different type of balneological intervention. Within each subgroup, studies are listed by first author's surname. Means and standard deviations are reported in columns, and a random-effects model was adopted to better estimate subgroup and overall effect sizes.

Fig. 4

Forest plot of the third meta-analysis pooling data from studies investigating the effects of balneological interventions with thermal mineral water or mud/peloid packs, compared with sham therapy (tap water immersion or hot packs), on long-term social function as a QoL item (measured at a 2–3-month follow-up visit). Two subgroups are shown, each characterized by a different type of balneological intervention. Within each subgroup, studies are listed by first author's surname. Means and standard deviations are reported in columns, and a random-effects model was adopted to better estimate subgroup and overall effect sizes.

Studies included in the first meta-analysis adopted scales such as HAQ, AIMS, EQ-5D, and SF-36, while studies included in the other meta-analyses employed scales with item-specific scores, such as NHP and SF-36. Evidence shows that the above-mentioned QoL scales are strongly correlated [40,41,42,43,44], so it was assumed that a change on one scale corresponds to a proportional, linear change on the others. However, the scales differ in direction: some associate an improvement of QoL with an increase in score (high values indicating good QoL), while others associate it with a decrease (high values indicating poor QoL). Moreover, the items that assess social QoL may move in opposite directions in response to the same underlying change. For example, in the SF-36 an improvement of social QoL is reported as an increase in social function, while in the NHP it is described as a reduction of social isolation. To overcome this heterogeneity and allow comparison, in the first meta-analysis (Fig. 2) sample data were normalized to a 0–3 range with high numbers indicating poor QoL, as for the HAQ. In the other two meta-analyses (Figs. 3, 4), sample data were normalized to a 0–100 range with high numbers indicating poor QoL, as for the NHP. Social function was adopted as the item measuring social QoL, scored so that better social function (higher social QoL) corresponds to a lower normalized score. Data from different scales were therefore multiplied by appropriate scale factors to obtain comparable means, and standard deviations were converted accordingly with validated methods [45]. Means were then linearly shifted to align them, with no further change to standard deviations, since adding a constant does not alter measures of dispersion.
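
The normalization described above amounts to a linear map y = a + b·x applied to each group's summary statistics: the mean follows the map, while the SD is multiplied by |b| and is unaffected by the shift a or by a reversal of direction. The sketch below assumes each scale has a known fixed range (for example SF-36 social functioning 0–100 with higher values better, HAQ 0–3 and NHP 0–100 with higher values worse); the function name `rescale` is hypothetical and this is not the authors' exact procedure.

```python
from __future__ import annotations


def rescale(mean: float, sd: float,
            src_range: tuple[float, float], dst_range: tuple[float, float],
            src_higher_is_worse: bool, dst_higher_is_worse: bool = True) -> tuple[float, float]:
    """Map a group mean and SD from one bounded scale onto another."""
    src_lo, src_hi = src_range
    dst_lo, dst_hi = dst_range
    # Position of the mean within the source range, as a fraction in [0, 1].
    frac = (mean - src_lo) / (src_hi - src_lo)
    if src_higher_is_worse != dst_higher_is_worse:
        frac = 1 - frac  # reverse direction so that high always means poor QoL
    new_mean = dst_lo + frac * (dst_hi - dst_lo)
    # Only the scale factor changes dispersion; shifts and reflections do not.
    new_sd = sd * abs((dst_hi - dst_lo) / (src_hi - src_lo))
    return new_mean, new_sd


# Hypothetical SF-36 social functioning summary (0-100, higher is better)
# mapped onto a HAQ-like 0-3 range where higher means poorer QoL.
print(rescale(70.0, 20.0, (0, 100), (0, 3), src_higher_is_worse=False))
```

Note that a standardized mean difference is unchanged by a positive linear rescaling applied to both arms, so for the pooled Hedges' g the decisive part of this step is aligning the direction of the scales.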

Considering the high heterogeneity of included studies, a random-effects model was adopted to better estimate subgroup and overall effect sizes. I² was used as a measure of heterogeneity (inconsistency) across studies.
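
To make the random-effects pooling and I² concrete, the sketch below implements the DerSimonian–Laird method-of-moments estimator, assumed here as the random-effects method (it is the standard inverse-variance random-effects approach, but the exact settings used in RevMan for this work are not stated). Inputs are per-study effect sizes (e.g. Hedges' g) and their variances; the function name is hypothetical.

```python
from __future__ import annotations

import math


def random_effects(effects: list[float], variances: list[float], z: float = 1.96):
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I².

    Returns the pooled effect, its confidence interval, tau² and I² (percent).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need at least one study and one variance per effect")
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    # I²: share of total variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - z * se, pooled + z * se), tau2, i2


# Hypothetical effect sizes and variances, for illustration only.
pooled, ci, tau2, i2 = random_effects([-2.0, 0.0], [0.05, 0.05])
print(f"pooled = {pooled:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, I² = {i2:.1f}%")
```

When studies disagree strongly, tau² grows, the random-effects weights become more equal and the confidence interval widens; when the effects are homogeneous, tau² and I² fall to zero and the result coincides with fixed-effect pooling.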

Risk of bias across studies

When possible, funnel plots were used to visually assess potential publication bias (Supplementary Material B).

Additional analyses

In each meta-analysis, data were pooled in subgroups according to intervention type and an overall effect size was calculated.

Results

After full-text assessment, seventeen (17) articles were considered eligible for qualitative synthesis. Main data of included studies are summarized in Table 1. The effects of thermal mineral baths, mud/peloid packs and spa therapy on the patients' QoL were compared with those of standard treatment [20,21,22,23,24,25,26,27,28], HT (tap water immersion) [29,30,31], and sham mud/peloid therapy (hot packs) [32,33,34,35]. Standard treatment was defined as a combination of pharmacological and physical interventions with a mainly drug-based approach, including nonsteroidal anti-inflammatory drugs (NSAIDs), other analgesics and/or chondroprotective agents. One study compared spa therapy (including mud/peloid packs) administered consecutively with the same regimen administered intermittently [36]. Fourteen (14) studies showed significant changes in at least one QoL item after treatment in the intervention group [20, 22, 25,26,27,28,29,30,31,32,33,34,35,36]. In two studies the changes were not statistically significant [21, 23], and one study did not report this information [24].

For some of the included studies that reported significant improvements in QoL [20, 22, 25,26,27,28,29,30,31,32,33,34,35,36], data on single QoL items could be extracted; these suggest that scores describing both physical aspects (such as pain and mobility) and psychosocial aspects (such as vitality, social function or emotional role) of QoL improved in the intervention group (further details are reported in Table 1). In two studies, even sleep, evaluated as a QoL item, was reported to improve after treatment in the balneological intervention groups [31, 32].

Although WOMAC and LAI indexes were not included in the systematic review and meta-analysis, the significant changes in these indexes reported by the included trials are summarized in Supplementary Material A.

Table 2 summarizes significant variations in drug consumption during the study and follow-up periods, and the characteristics of the standard treatment administered to patients in the included trials. Ten (10) studies [20,21,22, 24,25,26,27,28, 32, 35] reported data on variations in drug consumption, and eight (8) of them [20,21,22, 24,25,26,27,28] reported significant reductions (p < 0.05, p < 0.01 or p < 0.001) in drug consumption in the intervention group compared with the control group. Eight (8) studies [20,21,22, 24,25,26,27, 35] reported significant reductions (p < 0.05, p < 0.01 or p < 0.001) in drug consumption within the intervention group compared with baseline during the course of the study. Table 2 also describes the characteristics of the standard treatment, reports whether a diary recording the type and quantity of drugs used was kept during the study, and indicates the number of patients taking other medicines or with relevant comorbidities at baseline.

The overall risk of bias of each included trial is reported in Table 1, and further details are provided in Table 3. Overall risk of bias was rated low for two studies [20, 21], unclear for six studies [22, 25, 26, 29, 35, 36], and high for nine studies [20, 23, 24, 27, 30,31,32,33,34].

Table 3 Risk of bias of included studies

Ten (10) studies were included in quantitative syntheses [20, 21, 23, 25,26,27,28, 31, 33, 34].

Seven (7) studies were included in the first meta-analysis (Fig. 2) [20, 21, 23, 25,26,27,28]. Its results (overall effect size −1.03, 95% CI −1.66 to −0.40) showed that balneological interventions in addition to standard treatment were significantly more effective than standard treatment alone in improving QoL in patients with knee OA.

Three (3) studies were included in the second and third meta-analyses (Figs. 3, 4), in which the effects on pain and social function of real balneological interventions (thermal mineral baths or mud/peloid packs) were compared with those of sham interventions (tap water immersion or hot packs) [31, 33, 34]. Results favored real balneological interventions for pain as a QoL item (overall effect size −0.38, 95% CI −0.74 to −0.02), whereas for social function neither intervention was favored (overall effect size −0.16, 95% CI −0.52 to 0.19).
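
As a consistency check, the reported pooled effects and 95% CIs can be converted back into approximate standard errors and two-sided p values. The sketch assumes symmetric, normal-theory (Wald) intervals, so rounding of the reported limits makes the results approximate; they are a check on the stated conclusions, not additional results. The function name is hypothetical.

```python
from statistics import NormalDist


def se_and_p_from_ci(estimate: float, lower: float, upper: float, level: float = 0.95):
    """Approximate SE, z statistic and two-sided p value from an estimate and its CI.

    Assumes a symmetric normal-theory (Wald) interval.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    return se, z, p


# Overall effect sizes and 95% CIs as reported in the Results.
reported = {
    "overall QoL vs standard treatment": (-1.03, -1.66, -0.40),
    "pain vs sham": (-0.38, -0.74, -0.02),
    "social function vs sham": (-0.16, -0.52, 0.19),
}

for label, (g, lo, hi) in reported.items():
    se, z, p = se_and_p_from_ci(g, lo, hi)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

The first two intervals exclude zero and give approximate p values below 0.05, while the interval for social function spans zero, in line with the conclusions stated above.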

High heterogeneity was found in the first meta-analysis (Fig. 2; I² = 93%), while I² was 0% in the second and third meta-analyses (Figs. 3, 4). Asymmetry in the funnel plot (Supplementary Material B), which could be constructed only for the first meta-analysis, suggested a potential risk of publication bias, with a possible over-representation of low-precision trials with results favoring BT and spa therapy.

Discussion

The impact of knee OA on individual QoL is comparable to that experienced by patients with other chronic conditions such as cardiovascular or pulmonary diseases [3, 4]. BT is a common complementary therapy in the management of OA, with beneficial effects on pain, stiffness, and function, as reported by recent systematic reviews [10, 14]. Evidence from observational studies indicates that BT and spa therapy may also be effective in improving QoL of patients with knee OA [46,47,48].

Table 1 shows that, overall, evidence from included trials favors BT and spa therapy when considering significant pre–post changes in QoL. Three studies reported either no information [24] or no significant changes in the intervention group after treatment [21, 23]. However, two of them described significant differences between the intervention and comparison groups after treatment [21, 24]. In one study [23], significant changes in QoL were found neither within the intervention group after treatment nor between the intervention and control groups after treatment. Although this study had the highest number of participants among the included trials and is an important reference in the field, it had a considerable attrition rate and combined hot mineral baths, mud/peloid packs, hydro-massage, manual therapy, and exercise, so it did not allow a precise estimate of the contribution of each single intervention to QoL outcomes.

When investigating the effects of balneological interventions on knee OA, it is important to consider the pharmacological treatments taken by patients and whether the studied interventions lead to significant variations in drug consumption. Data from Table 2 indicate, albeit qualitatively, that balneological interventions can reduce drug consumption. This is important, especially considering the toxicity of NSAIDs as well as their cost [20], and needs to be taken into account when interpreting the results of meta-analyses on QoL.

Another aspect to consider is that, among included trials in which WOMAC and/or LAI indexes were assessed, significant improvements in these algofunctional scales were reported after intervention in patients treated with BT or spa therapy, underscoring that these balneological interventions also directly influence the patient's health and functional status.

Results of the quantitative analysis favored balneological interventions in addition to standard treatment over standard treatment alone for long-term overall QoL of patients with knee OA. Findings also show that real balneological interventions (hot mineral baths or mud/peloid packs) were significantly better than sham balneological interventions (tap water immersion or hot packs) in improving pain as a QoL item. This difference is probably due to the specific (hydromineral and crenotherapeutic) mechanisms of action of thermal mineral waters and therapeutic muds/peloids, whose chemo-physical properties can modulate endocrinological changes responsible for reducing inflammation and pain [25]. On the other hand, social function did not appear to be influenced by the treatment per se, possibly because, regardless of the type of intervention (real or sham), patients regularly attended a spa center for a short period, where they could relax, socialize with other people and be carefully assessed by dedicated physicians. Both real and sham balneological interventions may enhance self-perceived well-being and social function, possibly through placebo-related factors common to both, mainly the ritual of the intervention itself, the treatment context and the patient–clinician relationship [49, 50]. Moreover, detailed medical interviews are reported to benefit patients in terms of psychosocial QoL, compliance, and satisfaction [51], and, in terms of ritual, constitute an intervention in their own right.

Limitations

Included studies were heterogeneous in terms of demographic and clinical characteristics of patients (age, sex, disease severity and duration, obesity and other comorbidities), comparison group type, association of balneological interventions with non-balneological treatments (exercises, massage, physical therapy, etc.), and scales used to measure the participants' QoL, as shown in Table 1. To pool data from different trials, it was therefore necessary to use approximations, which were considered acceptable given the scarcity of data on the topic and the nature of the analyzed outcomes. Trials were conducted in different countries and, even though questionnaires were translated into the participants' native languages, social and cultural differences may have influenced the patients' perceived and reported QoL. Moreover, the overall risk of bias of included studies tended to be high. Although potential publication bias has to be considered, it should be acknowledged that evidence on the topic is scarce and most studies involved a limited number of participants. A funnel plot could be assessed only for the first meta-analysis, which included a small number of studies. Furthermore, funnel plots have to be interpreted with caution because, even though they are widely used, it is not completely clear whether they accurately detect publication bias [52].

Conclusions

In conclusion, even though the limitations of included studies must be taken into account, evidence shows that BT and spa therapy can significantly improve QoL of patients with knee OA. Moreover, BT and spa therapy may have a role in reducing drug consumption and improving algofunctional indexes among patients with knee OA. More trials with larger numbers of patients and different types of balneological interventions are needed to better determine their effects on QoL. Furthermore, it will be important to evaluate to what extent the different chemical and physical properties of waters and muds/peloids influence results. Further investigation is also needed to understand which QoL items are most likely to improve in response to balneological treatments.