Introduction

Spa therapy is one of the most commonly used nonpharmacological approaches for rheumatological diseases in many European and Middle Eastern countries, as well as in Japan and Israel, and has long been used in classical medicine as a cure for various illnesses. However, despite the long history and popularity of spa therapy, its role in modern medicine is still not clear (Tenti et al. 2015). Double-blind placebo controls are challenging in the field of medical hydrology: a credible placebo is difficult to establish because of the features of the therapies to be tested (e.g., mineral water baths or mineral mud-packs).

In the last 5 years, many RCTs have addressed the efficacy of balneotherapy, mainly in rheumatic diseases and general pain management (Bazzichi et al. 2013; Branco et al. 2016; Ciani et al. 2017; Ciprian et al. 2013; Costantino et al. 2012; Cozzi et al. 2015; Espejo-Antunez et al. 2013b; Fazaa et al. 2014; Fioravanti et al. 2012, 2014, 2015a, b; Franke and Franke 2013; Gremeaux et al. 2013; Gungen et al. 2016; Horvath et al. 2012; Karagülle et al. 2017; Kovács et al. 2016; Kulisch et al. 2014; Metin Ökmen et al. 2017; Özkuk et al. 2016; Pascarelli et al. 2016; Sarsan et al. 2012; Tefner et al. 2012). These studies have sought to elucidate the mechanisms of action of mineral waters. Promising data so far include increased plasma endorphin or cortisol levels, adrenal axis activation, and reduced plasma levels of some inflammatory mediators (interleukins, prostaglandins, and tumor necrosis factor). Nevertheless, the mechanisms explaining the beneficial effects of mineral or thermal waters or thermo-mineral mud observed in patients with rheumatic disease remain largely unresolved. It has been proposed that numerous factors, such as mechanical, thermal, and chemical effects, play a combined role (Bender et al. 2005; Fioravanti et al. 2011, 2017).

Spa therapy normally combines several different interventions, which complicates research protocols and makes comparisons even more difficult. The basic components of health resort interventions are balneotherapy and climatotherapy. In many countries, treatments involving natural mineral waters, gases, and/or peloids are referred to as balneotherapy (also crenobalneotherapy or spa therapy). Routes of application include bathing (including steam), drinking, and inhalation, among others. Hydrotherapy is the use of tap water for therapy, and climatotherapy is the use of climate factors for therapy. As part of a treatment regimen, non-thermal water techniques such as pool water-jets, exercises, mobilization, or massages may be added. Physical effects have been attributed to heat and massage, along with the beneficial effects of a relaxing, less stressful lifestyle away from home (daily care, health education, meeting new people, etc.). Together, these numerous factors with presumed additive impacts make up a complex therapeutic intervention (Gutenbrunner et al. 2010; Gomes et al. 2013; Gomes 2017).

Although such complex factors have delayed the move of medical hydrology to evidence-based medicine (EBM), there have been some studies of acceptable quality for the present review. These are double-blind RCTs in which neither patients nor physicians were aware of the type of treatment a patient received.

The objective of this review is to summarize whether the mineral elements and other chemical compounds of mineral waters or muds/peloids of spa therapy have clinical effects; we reviewed double-blind randomized, controlled trials that assessed the efficacy of these chemical components for rheumatologic diseases compared with tap water and/or “non-mineral” mud/peloid treatments in adults undergoing spa therapy.

Methods

Search strategy

We conducted a literature search for clinical studies of spa therapy in September 2016, covering all publication years up to September 2016. Medline was searched using the terms “spa (therapy)” OR “balneotherapy” OR “mud” OR “peloid” OR “mud pack (therapy)” in combination (AND) with “randomized double blind.” For each article retrieved using our search terms, we looked for additional articles by using the related article link on Medline, reviewing Medline articles by the same authors as the retrieved article, and reviewing the reference list of the retrieved article. We also searched previous systematic reviews (Espejo-Antunez et al. 2013a; Falagas et al. 2009; Forestier and Françon 2008; Forestier et al. 2016; Fortunati et al. 2016; Fraioli et al. 2013; Kardeş et al. 2017; Naumann and Sadaghiani 2014; Pittler et al. 2006; Santos et al. 2016; Tenti et al. 2015; Verhagen et al. 2015; Xiang et al. 2016).
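The Boolean structure of the search can be sketched as follows. This is only an illustration of how the quoted terms combine; the exact Medline syntax, field tags, and any filters used by the authors are not stated in the text, and the helper name `build_query` is hypothetical.

```python
# Search terms as quoted in the text. The quoting and parenthesization
# below are an assumption, not the authors' literal Medline query.
INTERVENTION_TERMS = [
    "spa (therapy)",
    "balneotherapy",
    "mud",
    "peloid",
    "mud pack (therapy)",
]
DESIGN_TERM = "randomized double blind"


def build_query(intervention_terms, design_term):
    """Join the intervention terms with OR, then AND the design filter."""
    ored = " OR ".join(f'"{term}"' for term in intervention_terms)
    return f'({ored}) AND "{design_term}"'


query = build_query(INTERVENTION_TERMS, DESIGN_TERM)
print(query)
```

Grouping the intervention terms in parentheses before applying AND ensures that the design filter restricts every intervention term rather than only the last one.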

Selection of articles

We included RCTs published in English-language journals that compared mineral baths and/or mud with hydrotherapy or with similar non-mineral treatments, whether or not associated with other treatments (massage, shower, exercises, etc.). We excluded RCTs comparing mineral baths and/or mud with no treatment or with another type of treatment (such as pharmacological treatments). We developed an independent data extraction sheet. All the studies were reviewed by the same reader. We did not contact the authors for information beyond that reported in the trials.

Assessment of the methodological quality of retrieved articles

For the quality assessment of the studies included in the review after the preliminary selection, the following information was extracted from each trial: the number of patients and the characteristics of trial participants (including age, stage and severity of disease, and method of diagnosis); the trial’s inclusion and exclusion criteria; the type of intervention (mineral baths and/or mud versus hydrotherapy or similar non-mineral treatments, whether or not associated with other treatments); the type of outcome measure; and the most frequent study biases, taking into account those that cannot be excluded in nonpharmacological trials. At present, there is no universally accepted checklist for evaluating the methodological quality of nonpharmacological trials (Forestier and Françon 2008).

We used CLEAR NPT, a 10-item checklist specifically designed to evaluate the internal validity of reports of nonpharmacological trials. Its items were selected using the Delphi method to develop a consensus among 55 experts (Boutron et al. 2005) (Table 1).

Table 1 Final checklist of items to assess quality of randomized controlled trials of non-pharmacological treatment (Boutron et al. 2005)

Results

The final selection comprised 27 trials, 20 related to rheumatology (Abu-Shakra et al. 2014; Bálint et al. 2007; Bender et al. 2007; Codish et al. 2005; Elkayam et al. 1991; Flusser et al. 2002; Franke et al. 2000, 2007; Güngen et al. 2012; Kovács and Bender 2002; Kovács et al. 2012; Kulisch et al. 2009; Mahboob et al. 2009; Odabasi et al. 2008; Sukenik et al. 1992; Szucs et al. 1989; Tefner et al. 2013; Wigler et al. 1995; Winklmayr et al. 2015; Yurtkuran et al. 2006) and 7 to other medical fields: 3 on the respiratory tract (Staffieri et al. 2008; Ottaviano et al. 2011, 2012), 3 on dermatology (Borroni et al. 2013; Wong et al. 2013; Hon et al. 2016), and 1 on gynecology (Zambó et al. 2008).

In keeping with our topic, we analyzed only the rheumatology trials. Rheumatology is the leading indication for medical spas all over the world (Fig. 1).

Fig. 1 Flow chart of articles

The studies were conducted in 12 different health spas (7 in Hungary, 1 in Israel, 1 in Turkey, 1 in Iran, 1 in Germany, and 1 in Austria), and 8 in rehabilitation centers and other settings. Table 2 shows the authors, diagnosis, participants, treatment and control characteristics, locations, and mineral contents.

Table 2 Studies on spa therapy included in the review: diagnosis, participants, treatment and control characteristics, locations, and mineral contents

Of the 20 trials, 10 included patients with knee osteoarthritis (KOA): 4 tested mineral water baths (Szucs et al. 1989; Kovács and Bender 2002; Yurtkuran et al. 2006; Bálint et al. 2007), 5 tested mud compresses, packs, or gel (Flusser et al. 2002; Odabasi et al. 2008; Mahboob et al. 2009; Güngen et al. 2012; Tefner et al. 2013), and 1 tested mineral water baths plus mud packs (Wigler et al. 1995). One trial included patients with hand osteoarthritis (HOA) and tested mineral water baths (Kovács et al. 2012). Three trials included patients with low back pain (LBP): 2 tested mineral water baths (Bender et al. 2007; Kulisch et al. 2009) and 1 tested mud compresses (Abu-Shakra et al. 2014). Five trials included patients with rheumatoid arthritis (RA): 3 tested mineral water baths (Elkayam et al. 1991; Franke et al. 2000, 2007) and 2 tested mud packs or compresses (Sukenik et al. 1992; Codish et al. 2005). One trial included patients with osteoporosis (OP) and tested mineral water baths (Winklmayr et al. 2015). For the study by Elkayam et al. (1991), we considered only the randomized population (patients with RA).

We reviewed each report to identify the criteria used for assignment to the experimental and control groups. A total of 1118 subjects were initially enrolled in the 20 studies: 552 with KOA, 47 with HOA, 147 with LBP, 308 with RA, and 64 with OP. Of these, 293 participants with KOA, 24 with HOA, 82 with LBP, 152 with RA, and 32 with OP were assigned to the experimental groups. They were treated with thermo-mineral water baths and/or mud (with or without other forms of treatment, such as physical therapy or exercise). The rest were allocated to the control groups, which received mainly tap water and/or non-mineral mud treatments.
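The enrolment figures above can be tallied with a short sketch. The enrolled and experimental counts are taken from the text; the control counts are derived here by subtraction, on the assumption that every enrolled participant not assigned to an experimental group was a control, as the paragraph states.

```python
# Enrolled and experimental-group counts per condition, as given in the text.
enrolled = {"KOA": 552, "HOA": 47, "LBP": 147, "RA": 308, "OP": 64}
experimental = {"KOA": 293, "HOA": 24, "LBP": 82, "RA": 152, "OP": 32}

# Derived (not reported in the text): controls = enrolled - experimental.
control = {cond: enrolled[cond] - experimental[cond] for cond in enrolled}

total_enrolled = sum(enrolled.values())
total_experimental = sum(experimental.values())
total_control = sum(control.values())

for cond in enrolled:
    print(f"{cond}: {experimental[cond]} experimental, {control[cond]} control")
print(f"Total: {total_enrolled} enrolled, "
      f"{total_experimental} experimental, {total_control} control")
```

The per-condition figures add up to the 1118 enrolled subjects reported, with 583 in experimental groups and, by subtraction, 535 in control groups.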

Other data reviewed were as follows: enrolment, characteristics of the treatments received, assessment methods and scales, statistical tests used for the analysis of the results, and the conclusions of the investigators. The RCTs were highly heterogeneous in their methods, which varied widely from study to study (inclusion and exclusion criteria, number of patients, endpoints, statistical tests used, etc.). Evaluation criteria differed not only between diseases, as would be expected, but also within the same pathology (KOA), where outcomes included pain, the Lequesne index, and the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), and follow-up duration ranged from 1 day to 1 year. This heterogeneity makes formal conclusions difficult and a meta-analysis inappropriate. Instead, each study was critically analyzed and compared with the others.

Evaluation of study methodologies

Internal validity

Table 3 shows the internal validity of the selected studies according to the CLEAR NPT criteria.

Table 3 Evaluation of the internal validity of the selected articles based on Boutron et al. (2005)

Of the 20 trials, 4 had high internal validity: 2 scored 10/10 (Franke et al. 2000; Winklmayr et al. 2015), 1 scored 9/10 (Franke et al. 2007), and 1 scored 8/10 (Kovács et al. 2012). Five had medium internal validity: 3 scored 7/10 (Bálint et al. 2007; Mahboob et al. 2009; Tefner et al. 2013) and 2 scored 6/10 (Kovács and Bender 2002; Kulisch et al. 2009). Eleven had low internal validity: 4 scored 5/10 (Wigler et al. 1995; Flusser et al. 2002; Yurtkuran et al. 2006; Abu-Shakra et al. 2014) and 7 scored 4/10 (Szucs et al. 1989; Odabasi et al. 2008; Güngen et al. 2012; Bender et al. 2007; Elkayam et al. 1991; Sukenik et al. 1992; Codish et al. 2005). Randomization procedures were adequate in 9 trials, but only 6 achieved concealment of allocation. The interventions were described in detail in all of them, partly because only a few included interventions apart from the tested ones (in contrast with the majority of published spa therapy trials, which normally include other treatments such as physiotherapy or exercise, while the control is no intervention or another physical medicine intervention). The level of experience of the therapists is not mentioned in the studies but, as most were conducted in spas or rehabilitation centers, the therapists were probably qualified. Note that in some studies the treatment was applied at home, self-administered, or given by an unspecified assistant. Adherence of the patients to the prescribed treatments is reported in four studies. Blinding of the patients and investigators was one of our inclusion criteria, although in some studies it is not directly specified. Blinding of the therapist is mentioned in only five studies. Only 6 of the 20 trials used the intention-to-treat approach for the statistical analysis.
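The grouping of CLEAR NPT scores into validity bands can be sketched as follows. The scores are those reported above; the cut-offs (8 to 10 high, 6 to 7 medium, 5 or below low) are inferred from how the trials are grouped in the text and are an assumption, not thresholds stated by the authors or by Boutron et al. (2005). The dictionary keys and the function name `band` are illustrative labels.

```python
from collections import Counter

# CLEAR NPT scores (out of 10) as reported in the text.
SCORES = {
    "Franke 2000": 10, "Winklmayr 2015": 10, "Franke 2007": 9,
    "Kovács 2012": 8,
    "Bálint 2007": 7, "Mahboob 2009": 7, "Tefner 2013": 7,
    "Kovács and Bender 2002": 6, "Kulisch 2009": 6,
    "Wigler 1995": 5, "Flusser 2002": 5, "Yurtkuran 2006": 5,
    "Abu-Shakra 2014": 5,
    "Szucs 1989": 4, "Odabasi 2008": 4, "Güngen 2012": 4,
    "Bender 2007": 4, "Elkayam 1991": 4, "Sukenik 1992": 4,
    "Codish 2005": 4,
}


def band(score):
    """Map a score to a validity band (cut-offs are inferred, see above)."""
    if score >= 8:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


counts = Counter(band(score) for score in SCORES.values())
print(dict(counts))
```

Under these assumed cut-offs, the tally reproduces the 4 high, 5 medium, and 11 low internal validity trials described in the text.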

Criteria used for study enrolment and treatment characteristics

In 12 of the 20 studies we reviewed, enrolment was based on the ACR criteria for the diagnosis of KOA, HOA, or RA. Of the others, 1 used EULAR criteria (Szucs et al. 1989), although ACR criteria for KOA had been published by Altman et al. (1986); 1 used the criteria of Derek et al. (1999) (Kovács and Bender 2002); and 2 did not mention the criteria used (Flusser et al. 2002; Wigler et al. 1995). In the three LBP studies, the enrolment criteria were “chronic complaints of lumbar pain for at least 6 month” (Bender et al. 2007), “for more than a year” (Abu-Shakra et al. 2014), and “more than 3 months” (Kulisch et al. 2009). Patients in the experimental groups of 10 studies were treated with hot mineral water baths (Bálint et al. 2007; Bender et al. 2007; Franke et al. 2000, 2007; Kovács and Bender 2002; Kovács et al. 2012; Kulisch et al. 2009; Szucs et al. 1989; Winklmayr et al. 2015; Yurtkuran et al. 2006), while 2 received hot mineral baths plus mud pack therapy (Elkayam et al. 1991; Wigler et al. 1995); 7 received mud pack therapy alone (Flusser et al. 2002; Güngen et al. 2012; Odabasi et al. 2008; Sukenik et al. 1992; Codish et al. 2005; Tefner et al. 2013; Abu-Shakra et al. 2014); and 1 received mud gel alone (Mahboob et al. 2009).

The experimental group of Szucs et al. (1989) (KOA) received 15 baths over 18 days. Those studied by Kovács and Bender (2002) (KOA) and Yurtkuran et al. (2006) received the same number of baths but in 15 days. Kovács et al. (2012) (HOA) and Franke et al. (2000, 2007) (RA) gave 15 baths in 3 weeks. Bálint et al. (2007) (KOA) also gave 15 baths, but over 4 weeks. For LBP, Bender et al. (2007) gave 10 baths in 2 weeks and Kulisch et al. (2009) gave 21 baths in 3 weeks.

Wigler et al. (1995) for KOA and Elkayam et al. (1991) for RA used thermal mineral water every day plus mud packs every second day in the experimental group for 2 weeks.

Flusser et al. (2002) and Odabasi et al. (2008) for KOA used 15 mud packs over 3 weeks, whereas Güngen et al. (2012) and Sukenik et al. (1992) used 12 mud packs in 2 weeks and Tefner et al. (2013) used 10 mud packs in 2 weeks. Abu-Shakra et al. (2014) for LBP and Codish et al. (2005) for RA used 15 applications of mud compresses over 3 weeks in the treatment group, and Mahboob et al. (2009) used 30 days of daily applications of mud gel for KOA. The experimental group of Winklmayr et al. (2015) (OP) received 5 baths over 7 days, followed by a 6-week off-site non-treatment interval and then a second on-site “brush-up” treatment period of two more baths in 3 days.

Only six studies mentioned other treatments (apart from pharmacotherapy) in the experimental groups: a home-based exercise program for KOA (Yurtkuran et al. 2006); regular exercise, also in KOA (Bálint et al. 2007); 3- to 4-h mountain hiking tours in OP (Winklmayr et al. 2015); specifically described electrotherapy for LBP (Kulisch et al. 2009); and a complete rehabilitation program (exercise, physiotherapy, and other activities related to a spa experience) in the Franke et al. (2000, 2007) studies for RA.

The studies also differed in the ways they handled treatments already being used by patients at the time of enrolment. In one study, no such treatment was allowed (Kovács and Bender 2002, KOA). The rest permitted the use of painkillers and NSAIDs (doses were not mentioned, only described as the “usual ones” or “standard ones”) if the patient had been taking them for some time before starting the study (1 month to 6 months). Some specify that supplementing the drug regimen of subjects with new agents or introducing new treatments was avoided during the study period. In the five RA studies, new DMARDs or steroids were not allowed in the preceding 3 to 6 months (others, such as steroid injections, were among the exclusion criteria). In OP, participants were required not to have used hormone replacement therapy or any other therapy affecting bone metabolism during the 12 months before enrollment.

Timing of assessments of treatment efficacy

In all of the studies in the review, the patients in both the experimental and control groups were clinically evaluated (and sometimes laboratory tested) before and after treatment; at least two assessments were made, one at the beginning and a second after 10 to 30 days (the minimum and maximum duration of the interventions). The studies differed in their reassessments, which were made after the 1st, 6th, or 8th day of treatment (mid-treatment), 1 and/or 2 months after, 3 months after, 15 or 20 weeks after, 6 or 7 months after, and even 1 year after the treatment.

Evaluation criteria: clinical variables and indexes of treatment efficacy

We reviewed 20 rheumatology trials, focusing on 5 different conditions: KOA, HOA, LBP, RA, and OP. As expected, pain was the main outcome and was measured with visual scales, but three different scales were used, and pain was evaluated in different circumstances (at rest, after movement, at night, etc.), with each trial applying its own considerations. Tenderness was evaluated in six trials, and other clinical parameters such as swelling, effusions, crepitus, or deformity in only 4. Joint range of movement was evaluated in six studies (with different methods) and muscle strength in only 3 (always with a dynamometer).

Laboratory tests were included in the assessments of 9 of the 20 studies. The most frequently used were the erythrocyte sedimentation rate (ESR) and high-sensitivity C-reactive protein (hs-CRP), especially in the studies with longer reassessment periods. The OP study measured individual bone marker levels to visualize the balance between bone formation and bone resorption together with the rate of bone turnover.

Some validated scales were also used (most prominently WOMAC, used in 6 of the 11 studies, but also the Lequesne Index, the Oswestry Index, Qualeffo-41, and the Roland-Morris Questionnaire). Some studies also included patient assessments on various short scales (visual or oral, rated from a minimum to a maximum) of impairment and disease severity or status, as well as the physician’s opinion on the same scales. Table 4 shows all the variables and indexes used in the 20 trials studied.

Table 4 Clinical variables assessed as indexes of treatment efficacy

Statistical analysis

The methods used to analyze the results of treatment also varied widely among the 20 studies, and a global analysis of all the data was not possible. Treatment efficacy was assessed in two ways: (1) intragroup comparison of data collected before and after the treatments (and at later follow-up) and (2) comparison of experimental group vs control group data collected at the same time points. Fourteen of the 20 RCTs reported this between-group statistical comparison, which is the only one that makes it possible to conclude that a treatment is superior to its control. Most of the treatment groups (16 of the total 20) included only balneotherapy (baths and/or mud or derivatives) and no other treatments such as physiotherapy or exercise; pre- vs post-treatment comparisons made only within each of the two groups preclude confirmation or rejection of the study hypothesis.

In the 20 studies reviewed, quantitative results were expressed as means ± standard deviations, facilitating comparisons of the results of the different trials. Four studies reported some descriptive data as medians or ranges.

The statistical tests were usually appropriate for the distribution of the data, but they also differed widely. Five studies used the Mann-Whitney U test, in some cases for both intra- and inter-group comparisons and in 2 only for the latter. Eight studies used Student’s t test for comparisons between the two groups at baseline, and 3 used the Friedman test for comparisons between the different time points within a single group. Other studies used only Student’s t test for inter- and intra-group comparisons. Other types of analysis were also used across the studies for specific data: analysis of variance (ANOVA), multiple linear regression, Chi-squared test, Fisher’s exact test, Friedman test, Wilcoxon test, Spearman’s correlation test, or McNemar’s test.

Only a few studies addressed external validity: seven described the source population and seven specified the number of evaluated patients who were finally included in the study, a measure of how representative the study population is of the entire population with the same condition.

See Table 5 for the statistical analysis and external validity.

Table 5 Statistical analysis and external validity in the 20 selected trials

Clinical results

In KOA, with 10 trials, internal validity was low in 6 studies and medium in 4. The studies using baths found greater improvement in pain at the end of treatment and a stronger leukocyte reaction at mid-treatment in the treatment group. Another study found greater improvement in pain, range of movement, tenderness on palpation, and medical assessment at the end of treatment in the treatment group, although after 3 months the improvement remained significant only for pain, range of movement, and tenderness; a further study also found improvement in the WOMAC index after 3 months (Bálint et al. 2007). The studies using mineral mud compresses at home found greater improvement in pain at the end of treatment and 1 month later, and greater improvement in the severity index (Lequesne) at 1- and 3-month follow-up. Odabasi et al. (2008) found a greater reduction in pain, WOMAC parameters, and disease severity during the 6 months after mud packs vs control, but internal validity was low; lower analgesic consumption in most of the follow-up weeks was also recorded in two studies with mud packs. Güngen et al. (2012), also using mud packs versus hot packs, found improvements in physical activity status lasting 3 months and maintenance of YKL-40 levels (related to halting of disease progression) in the treatment group. Mud topical gel appeared to be better than control in improving pain, stiffness, and joint mobility and in reducing TNF levels at the end of treatment in the study of Mahboob et al. (2009), which had medium internal validity (7/10) but low statistical power. The trial combining mineral baths and mud packs also found greater improvements in night pain and the severity index, and longer-lasting effects (up to 5 months vs 4 in the control group).

In HOA, the only trial had high internal validity and found greater improvement at the end of treatment in all parameters studied, which persisted at 3 months in most of them (not for morning stiffness or grip strength) but at 6 months in only some (not for quality of life parameters).

In LBP, with three trials, internal validity was low in 2 studies and medium in 1. In one study (Kulisch et al. 2009), improvement in the group treated with thermal water occurred earlier, lasted longer (15 weeks), and was statistically significant. In another study (Abu-Shakra et al. 2014), the data suggest better pain control in patients treated with mineral-rich mud compresses than in those treated with mineral-depleted mud packs. Another study found better activation of the antioxidant system, with reduced activity of the 4 enzymes studied from the first session and at the end of treatment, although in another LBP study, Tefner et al. (2013) did not show a significant difference (perhaps owing to a lack of statistical power or of a primary outcome criterion).

In RA, two trials had high internal validity (Franke et al. 2000, 2007) and the others low. Carbon dioxide plus radon baths were superior to carbon dioxide baths alone for pain, function, and quality of life in patients included in a rehabilitation program, along with lower drug consumption. Other studies also found greater and longer-lasting (1 or 3 months) improvement in grip strength, pain, and severity according to physician observation (1 month), in activities of daily living and patient self-assessment of disease severity (3 months), and in the number of swollen joints and the Ritchie index (3 months). On the other hand, Sukenik et al. (1992) found improvement in morning stiffness at the end of the treatment period only in the control group.

In OP, the only trial had high internal validity and found significant changes over time in the concentrations of almost all analyzed bone markers and humoral factors as well as in quality of life parameters (6 months). Although low-dose radon hyperthermia balneo treatment did not significantly outperform conventional thermal water treatment in this study, borderline significant differences in some bone markers indicate a possible additive effect of radon balneo treatment on the achieved biological effects.

Discussion

A sufficiently large number of trials were identified for this review. All trials reported improvements after treatment, though sometimes only in the longer term, compared with control treatments lacking mineral components and other chemical compounds. We thus propose a specific effect of the “mineral component” of spa therapy in rheumatology. However, several methodological limitations of the studies preclude any solid conclusions concerning the efficacy of balneotherapy, even for KOA or RA. These two clinical entities have been the focus of most of the scientific work conducted in the field over the course of many years. In effect, double-blind trials are needed to support evidence-based medicine, but many single-blind trials (Balogh et al. 2005; Evcik et al. 2007; Sherman et al. 2009; Tefner et al. 2012) with a good methodological approach have helped confirm the effectiveness of balneotherapy. In such single-blind trials, the control groups receive other kinds of physical therapy or no intervention at all besides medication and/or exercise and/or education protocols. This hinders our understanding of the role of the mineral elements and other components of mineral waters. A discussion of the specific mechanism of action of the “mineral elements” of natural mineral waters and derived peloids is not the objective of the present review, but on this point, Burguera et al. (2014, 2016) have published promising studies on the effect of hydrogen sulfide on inflammation and catabolic markers in human articular chondrocytes. The evidence supporting an effect of aquatic exercises in tap water on rheumatological diseases (as an active treatment) is strong (Kamioka et al. 2010; Verhagen et al. 2012; Waller et al. 2013; Yázigi et al. 2013), but the protocols differ markedly from those used in medical spas, where mineral water baths or peloids are tested as a passive treatment.

We are also aware that some relevant studies were not included in our review. Although most good quality studies are usually published in English, other valuable trials written in other languages, especially German, Japanese, Turkish, Hungarian, Russian, and French, could exist. Indeed, these countries have an active tradition in balneotherapy and in research in this field.

The systematic review reported here combines data across studies in order to estimate treatment effects with more precision than is possible in a single study. The main limitation of this systematic review, as with any overview, is that the patient population, the balneotherapy protocol/characteristics of the control group, and the outcome definitions are not the same across studies even in each specific diagnosis.

We consider that one of the major challenges in medical hydrology research is recruiting a sufficient number of patients for studies. Sample size calculations are essential for comparative studies, and a lack of differences between the study group and the control group could sometimes be due to a small number of patients rather than a lack of treatment efficacy. We should not forget that investigations addressing drug-based treatments have data available for thousands of patients, as many more people use them every year. Medical hydrology research has to rely on data from hundreds of patients at most.
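To illustrate why sample size matters, the following minimal sketch applies the standard normal-approximation formula for comparing two group means with equal group sizes, n = 2((z_{1-α/2} + z_{1-β}) / d)^2 per group, where d is the standardized effect size. The effect sizes, significance level, and power used here are hypothetical and are not taken from any of the reviewed trials; the function name `n_per_group` is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate patients needed per group for a two-sided comparison
    of two means, using the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical standardized effect sizes (small, moderate, large).
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {n_per_group(d)} patients per group")
```

Because the required number grows with the inverse square of the effect size, a trial with a few dozen patients per group can only be expected to detect fairly large differences between mineral and non-mineral treatments. A t-distribution-based calculation would give slightly larger numbers.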

Another major challenge is the evaluation criteria. The presence of widely varying tests and variables for a given disorder makes it difficult to define appropriate outcome measures. Although there are published assessment protocols for KOA, RA, and other rheumatologic conditions, these are usually not used by researchers in spa therapy. Protocols for clinical research in medical hydrology need to be standardized to those of the more traditional therapeutic approaches.

The third challenge is placebo. Spa cures have always been subject to skepticism because of the intrinsic characteristics of the treatment and the fact that many variables are subjective. Hence, even the best evaluation criteria (e.g., WOMAC) rely on subjective tests. To account for placebo effects, double-blind trials are best, but how do we mask the difference between the organoleptic characteristics of mineral water and those of control treatments? The heterogeneity of RCTs (inclusion and exclusion criteria, number of patients, endpoints, statistical tests, etc.) hinders comparisons of results, preventing reliable conclusions. Meta-analyses of data are therefore difficult, as reflected by the scarcity of published meta-analyses of the effects of spa therapy (Liu et al. 2013; Matsumoto et al. 2017; Pittler et al. 2006).

Our study has several limitations. The quality of the studies varied. Randomization was adequate in all trials, and we assessed the quality of all of them; however, 11 of the articles did not explicitly state that the analysis of data adhered to the intention-to-treat principle and 3 remain unclear, which could lead to overestimation of the treatment effect in these trials. Publication bias might account for some of the effects we observed; in particular, small trials may overestimate effect sizes.

Lastly, the statistical treatment of intergroup comparisons is essential to conclude that a given treatment is superior to its control. We are close to confirming the beneficial effects of some mineral waters and peloids used to treat pain, among other parameters, in KOA, and this might also be the case for HOA and RA. The key to the differences observed could lie in better and longer-term effects attributable to the presence of mineral elements, but the longest follow-up reported was 1 year and the maximum sample size was 134 patients. Deciphering the impacts of the patient’s environment or the somatic effect of changes and macroclimate remains a clear challenge for research.

The lack of double-blinded studies is a consequence of the blinding difficulty that exists in medical hydrology.

The greatest levels of evidence have been obtained for combined interventions in KOA, though the mechanisms responsible for the effects observed remain unknown. Theoretically, many components of water or mud, especially trace elements, could be absorbed systemically through the skin or airways (Bacle et al. 1999; Beer et al. 2003; Chen et al. 2007; Shani et al. 1985). Some trace elements are known to affect the immune and inflammatory systems. However, the studies selected failed to measure trace elements in serum, soft tissues, or synovial fluid (Svenson et al. 1985). The real mechanisms of action of balneotherapy are unclear, but it has been established that minerals absorbed from the water can play a therapeutic role (Halevy et al. 2001) besides that played by the physical properties of water. It is difficult to analyze the effects of each component of spa waters separately; we are probably looking at a complex synergistic effect. The literature lacks data on the absorption of trace elements from mineral waters. With regard to mechanisms of action, most papers describe the hormonal effects of balneotherapy, e.g., its impacts on beta-endorphin and cortisol levels. We still have insufficient data on the interaction between balneotherapy and the body’s antioxidant system. Nevertheless, most of these publications present arguments for the beneficial effects of balneotherapy on anti-oxidative processes (Benedetti et al. 2010; Leibetseder et al. 2004). There is a clear need for more research, preferably conducted in multiple spas on larger numbers of patients and following rigorous methodological criteria.

Conclusion

Mineral bathing or mineral mud application appears to be more effective, particularly for relieving pain, than non-mineral baths or mud, but the insufficient quality of the available studies limits these conclusions. Double-blind investigation has proved possible and offers good prospects for further research.