Introduction

Traumatic brain injuries (TBIs) are a major public health problem, causing physical and neuropsychological sequelae and representing one of the main causes of death in young adults [1]. These injuries result from a mechanical insult to the brain, as occurs, for example, in traffic accidents, gunshot wounds, falls, or sports-related impacts [2, 3]. From a pathophysiological point of view, TBI is considered a heterogeneous condition that can be localized or diffuse and that variably affects different cell types in two phases: (i) primary injury, which is irreversible and results from the initial mechanical insult; and (ii) secondary injury, which may be reversible and results from the cascade of processes triggered by the primary injury (neuroinflammation, cerebral edema, excitotoxicity, oxidative stress, and cell death) [4].

In the 1990s, it was estimated that in the USA a person suffered a TBI every 15 s and a person was permanently disabled by these injuries every 5 min [5], leading TBI to be characterized as a “silent epidemic” [6]. From 1990 to 2016, the incidence and prevalence of TBI increased by 3.6% and 8.4%, respectively, reaching an incidence rate of 369 cases per 100,000 inhabitants and a prevalence of 55.05 million cases worldwide in 2016 [7]. In addition to the social impact of these injuries, direct medical costs and costs related to patients’ loss of productivity were reported to be approximately 33 billion euros in Europe in 2010 [8].

Sex differences in outcomes after TBI have been reported, with women having a more favorable prognosis than men. Studies indicate that after a high-speed traffic accident, men may have more severe injuries and greater posttraumatic amnesia than women [9]. Another retrospective study with more than 70,000 patients indicated that women have a lower risk of death and of developing any type of complications after moderate or severe TBI compared to males [10]. These data are corroborated by other studies showing that women have a greater degree of recovery than men in executive functions and visual memory after TBI [11, 12].

Due to the high impact of these injuries on society, there are currently more than 1300 clinical trials registered with TBI patients (http://clinicaltrials.gov). However, there is still no protective treatment that promotes the rescue or regeneration of neural cells, enabling the improvement of the prognosis and quality of life of people who suffer the devastating consequences of these injuries. Based on the supposed protection against TBI-related consequences among women, there is a growing literature suggesting that female sex steroids, such as progesterone, could have neuroprotective properties. In addition to the protective effects described in cases of TBI [13, 14], progesterone has also elicited beneficial effects in preclinical models of ischemia (stroke and neonatal), spinal cord injury, peripheral nerve injury, motor neuron disease, demyelinating diseases, epilepsy, and Alzheimer’s disease [15, 16].

Studies evaluating the neuroprotective role of progesterone in experimental models of TBI have shown that several mechanisms are involved in the recovery of animals, such as modulation of astrocytic function (including a protective effect on the blood–brain barrier (BBB)) [17, 18] and a reduction in brain edema after injury (through modulation of the expression of aquaporin-4 (AQP-4)) [19, 20]. In a systematic review, Gibson et al. (2008) [21] observed that progesterone decreased lesion volume in a dose-dependent manner after cerebral ischemia or TBI in animal models. However, despite the multiple benefits of progesterone observed in experimental models of TBI and the promising results of two phase II clinical trials [22, 23], two phase III clinical trials [24, 25] failed to show any benefit of progesterone administration in TBI patients. Possible causes of this failure include the high heterogeneity of the enrolled patients in relation to sex, age, and severity of TBI; the dose and regimen of progesterone used; the lack of patient stratification; and the lack of longer follow-up [26,27,28].

Therefore, the aim of this study was to perform a systematic review and meta-analysis to revisit the preclinical evidence involving the assessment of the neuroprotective effect of progesterone in preclinical models of TBI, regarding brain edema, lesion volume, and survival rate. The present work could have a great impact on the planning of new clinical trials related to the development of new treatments for TBI.

Methods

The protocol of this systematic review and meta-analysis was published on the PROSPERO platform (https://www.crd.york.ac.uk/prospero/) under the code CRD42020218398.

Study Identification

Information Sources. The primary databases used for the systematic review were PubMed, Scopus, Web of Science, and LILACS. OpenGrey and Google Scholar (first 200 results) were screened for gray literature. A secondary search was performed by listing the ten authors with the greatest publication output in the field according to Scopus and asking them for any additional studies not captured by the database search (for more details, see Supplementary Material I). Additionally, the reference lists of the selected studies were reviewed to identify other records that met the inclusion criteria.

Search Strategy. The following search strategy was designed for PubMed and adapted to the syntax and search engine of each of the other databases: (progesterone [MeSH] OR progesterone OR progest* OR pregnenedione) AND (Traumatic Brain Injury [MeSH] OR “Traumatic Brain Injury” OR “TBI” OR (Brain AND (Injury OR trauma OR concussion OR contusion)) OR “Traumatic Encephalopath*”) AND [SYRCLE FILTER] [29] (Supplementary Material I). All searches were carried out on November 11, 2020, in all databases and websites, without restriction of year, language, or type of publication. To identify new studies published during data extraction and analysis, the searches were updated on September 8, 2021.

Selection Process. The records retrieved from each database and website were exported to the Rayyan platform (https://www.rayyan.ai/) for semi-automatic removal of duplicates and eligibility analysis. Non-duplicated records were analyzed independently by two reviewers (RGNN and MMRS) in two phases: (1) analysis by title and abstract and (2) analysis by full text. In both phases, discrepancies were resolved by a third reviewer (BDA). For reports whose full text could not be retrieved through the databases or websites, we contacted the corresponding authors via email to obtain the raw data or the full report.

Eligibility Criteria. Studies were included if they (1) were preclinical animal studies; (2) induced a TBI model; (3) used progesterone as the intervention; (4) had a parallel control group receiving neither progesterone nor any other concurrent intervention; (5) measured one or more of the following outcomes: brain edema, lesion size, and survival rate; and (6) were published in English, Portuguese, or Spanish. To be considered eligible, a manuscript had to fulfill all six criteria. The exclusion criteria were studies in which only progesterone analogues or derivatives were used (without testing the progesterone molecule), animal models of ischemia, ex vivo studies, and theoretical or narrative reports related to the subject (for example, reviews and letters to the editor).

Data Extraction

Data were extracted directly from tables or textual descriptions by a single reviewer (RGNN) and verified by a second reviewer (BDA) using a standardized data extraction form (Microsoft Excel). Whenever data were presented only in graphs or figures, a digital ruler was used to obtain the numerical values [30]. When details of the outcomes of interest were missing, we requested the raw data or specific details from the authors via email. For numerical outcome variables, data were extracted as mean ± standard deviation (SD). When the standard error of the mean (SEM) was reported, the SD was calculated by multiplying the SEM by the square root of the sample size.
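The SEM-to-SD conversion described above can be sketched in a few lines of Python. This is only an illustration of the stated formula (SD = SEM × √n); the function name and the example values are hypothetical and are not taken from any included study.

```python
import math


def sem_to_sd(sem: float, n: int) -> float:
    """Convert a standard error of the mean to a standard deviation.

    Implements SD = SEM * sqrt(n), where n is the group sample size.
    """
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sem * math.sqrt(n)


# Hypothetical group: SEM of 0.5 reported for 9 animals
print(sem_to_sd(0.5, 9))  # 1.5
```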

The data extracted included the following: characteristics of the animal model used (species, strain, age, sex, hormonal status, and TBI inducing method), details about the intervention of interest (doses, route of administration, timing, frequency, and duration of progesterone administration), quantitative data of all variables analyzed in tests of the outcomes of interest (brain edema, lesion size, and survival rate), and data of secondary outcomes (such as neurological deficit, motor activity, and spatial memory and learning).

Data Synthesis and Risk of Bias Assessment

All extracted data were summarized in tables with information about the characteristics of the animal model, intervention, and outcomes of interest (Microsoft Excel). Furthermore, the SYRCLE risk of bias tool for animal studies [31] was applied independently by two reviewers (RGNN and MMRS), with disagreements resolved by a third reviewer (BDA).

Meta-analysis and Publication Bias

Quantitative data were analyzed using the Cochrane Review Manager software (RevMan version 5.4.1). The effect size of progesterone administration for each outcome in each article was measured using the standardized mean difference (SMD, calculated as Hedges’ g [32]), and all meta-analyses were performed using a DerSimonian and Laird random-effects model [33]. Results are presented as effect size with 95% confidence interval (CI), and analyses with p < 0.05 were considered statistically significant. When the same control group was used as a comparator for more than one experimental group within the same study, the sample size of the control group was divided by the number of comparisons included in the meta-analysis. To assess heterogeneity, the I² statistic and Cochran’s Q test were applied [34]. For subgroup analysis, the following variables were considered: sex, dose of progesterone, damage induction method, and outcome assessment method. To evaluate publication bias, STATA (version 15.1) was used to generate funnel plots and perform Egger’s test [35].
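For readers who wish to follow the calculations, the pipeline described above (Hedges’ g per comparison, splitting of shared control groups, DerSimonian and Laird pooling, Cochran’s Q, and I²) can be sketched in plain Python. This is a minimal illustration and not the RevMan implementation: the function names and the example data are hypothetical, and the variance of g uses a common textbook approximation that may differ slightly from the one RevMan applies.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance of g) for a treated-versus-control comparison."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    # Textbook approximation of the variance of d (assumption, see lead-in)
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d


def split_control_n(n_control, n_comparisons):
    """Share one control group across several comparisons within a study."""
    return n_control / n_comparisons


def dersimonian_laird(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical comparisons: (mean, SD, n) for treated, then for control.
# The last two share one control group of 8 animals, split into 4 + 4.
shared_n = split_control_n(8, 2)
comparisons = [
    (78.9, 0.6, 8, 80.1, 0.7, 8),
    (79.2, 0.5, 6, 80.0, 0.6, shared_n),
    (79.5, 0.8, 6, 80.0, 0.6, shared_n),
]
results = [hedges_g(*row) for row in comparisons]
summary = dersimonian_laird([g for g, _ in results], [v for _, v in results])
print(summary)
```

A negative pooled value in this sketch corresponds to a lower outcome (for example, less edema) in the treated group, which matches the sign convention of the SMDs reported in the Results.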

Results

Study Identification

After implementing the different search strategies in each of the databases and Web sites, 1581 records were exported to Rayyan platform. A total of 969 non-duplicated records were analyzed, resulting in 99 reports included for full-text analysis. Among these, 47 studies were considered eligible after full-text analysis [5, 17,18,19,20, 36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77], and one more article was added after the search update [78], resulting in a final sample of 48 articles. The flowchart disclosing the selection process is presented in Fig. 1.

Fig. 1

PRISMA flow diagram summarizing the study identification and selection. A* MEDLINE results were excluded with an automatic filter. B* The first 200 results of this website were analyzed. C* Full text unavailable and no response from the authors

Characteristics of the Animal Model, Intervention, and Outcomes of Interest

Table S1 (Supplementary Material II) summarizes the year of publication of the included articles, the animal model (species, strain, age, and sex), the TBI induction methods, and relevant data about the intervention (doses, vehicles, routes of administration, and timing of the intervention relative to the injury). Most of the articles included in this review were published in the decade 2010–2019 (54.2%), followed by the decade 2000–2009 (33.3%). The rat was the most frequently used species (approximately 87.5% of all included articles). Although the description of age was often imprecise and sometimes absent, most studies used young adult animals, with males as the predominant sex.

Regarding the method used to induce TBI, 36 studies (75%) used controlled cortical impact or cortical contusion injury (CCI), followed by 9 studies (18.75%) that used the weight-drop model according to Marmarou’s method (diffuse brain injury (DBI)). The most frequently used progesterone doses were 16 mg/kg (39.6%), 8 mg/kg (31.3%), and 4 mg/kg of body weight (22.9%). The most frequently used vehicle was 2-hydroxypropyl-β-cyclodextrin (HBC, 45.8% of all included studies), followed by peanut oil (25%) and sesame oil (20.8%). The most commonly used routes of administration (Tables 1 and 2) were intraperitoneal (83.3%, for the first dose) and subcutaneous (74.3%, when subsequent doses were administered).

Table 1 Main findings described for cerebral edema by study
Table 2 Main findings described for lesion volume by study

Variables Reported for Each Outcome of Interest

Regarding the primary outcomes, 29 studies (60.4%) evaluated brain edema, 21 studies (43.8%) evaluated lesion size, and no study reported the survival rate after the intervention with progesterone. For brain edema, the most frequently reported variables were the % difference in edema between the lesion area and the uninjured ipsilateral region (13 studies) and the total % of brain water content (11 studies). For lesion size, the most frequently reported variable was the % of lesion volume (10 studies) (Table S2). Concerning the secondary outcomes, 14 studies used the Morris Water Maze (MWM), mainly reporting the latency to find the submerged platform in the initial position (12 studies) and the total path to reach the platform (6 studies). Neurological deficit was analyzed in 6 studies using 3 different scales (2 studies used the Neurological Severity Score (NSS), 2 studies used the modified Neurological Severity Score (mNSS), and 2 studies used the Veterinary Coma Scale (VCS)). Regarding motor activity, locomotor activity was evaluated with digiscan boxes (5 studies) and the open field test (4 studies), while vestibulomotor activity was evaluated mainly with the rotarod test (3 studies).

Effect of Progesterone on Brain Edema and Lesion Size

Qualitative Synthesis

Table 1 summarizes the main findings for cerebral edema by study. Of the 29 studies that reported brain edema as one of their outcomes, 82.8% (24 studies) showed a significant reduction in the water content of the whole brain or injured area in at least one of the progesterone-treated groups compared with the vehicle-treated group. In contrast, 10.3% (3 studies) did not show statistically significant differences in this outcome after progesterone administration in any experimental group. In 3 studies, the statistical differences were unclearly reported in one of the experimental groups.

Table 2 summarizes the main findings for lesion volume by study. Of the 21 studies that reported lesion volume as one of their outcomes, 47.6% (10 studies) showed a significant reduction in lesion volume in at least one of the progesterone groups compared to the vehicle group. On the other hand, 52.4% (11 studies) did not show statistically significant differences in this outcome after progesterone treatment.

It is worth mentioning that, in some studies, the presence or absence of a beneficial effect on brain edema or lesion volume did not always agree with the other results. For instance, some studies without a statistical difference between the progesterone-treated and vehicle-treated groups for edema or lesion volume found statistically significant changes in other outcomes, such as protein expression or behavioral analysis.

Quantitative Synthesis

In the meta-analysis, a beneficial effect of progesterone was found for both brain edema (SMD −1.73 [95% CI: −2.02, −1.44], p < 0.0001, Figs. 2 and 3) and lesion volume (SMD −0.40 [95% CI: −0.65, −0.14], p = 0.002, Figs. 4 and 5). In the subgroup analysis, the effect of progesterone on brain edema was maintained regardless of sex (Figures S1-S2), dose (Figures S3-S6), method of injury induction (Figures S7-S8), and method of outcome assessment (Figures S9-S11). However, when we evaluated the subgroups for lesion volume in relation to sex (Figures S12-S13), dose (Figures S14-S17), and treatment withdrawal (abrupt in Figure S18 and tapered in Figure S19), the beneficial effect was maintained only in studies that used males, doses of 10 mg/kg and 16 mg/kg, and tapered withdrawal regimens. It should be noted that some of these analyses are limited by the small number of studies. Therefore, more studies on lesion volume are needed in females, at doses of 4 mg/kg and 8 mg/kg, and comparing abrupt and tapered withdrawal schemes, to reach more reliable conclusions.
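As a quick consistency check, the p value of a pooled SMD can be approximately recovered from its 95% CI under a normal approximation. The sketch below is illustrative only: the function name is hypothetical, and because the published interval limits are rounded, the result is approximate.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover (SE, z, two-sided p) from an estimate and its 95% CI."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return se, z, p


# Pooled lesion volume estimate reported above
se, z, p = p_from_ci(-0.40, -0.65, -0.14)
print(round(p, 3))  # 0.002

# Pooled brain edema estimate reported above
se, z, p = p_from_ci(-1.73, -2.02, -1.44)
print(p < 0.0001)  # True
```

Both results agree with the p values reported for the overall analyses.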

Fig. 2

Forest plot of progesterone’s overall effect in brain edema. CI, confidence interval; df, degrees of freedom; IV, inverse variance; SD, standard deviation; Std., standardized

Fig. 3

Analysis by subgroups of progesterone’s effect in brain edema. The dark gray vertical band discloses the 95% CI of the overall brain edema meta-analysis (as shown in Fig. 2)

Fig. 4

Forest plot of progesterone’s overall effect in lesion volume. CI, confidence interval; df, degrees of freedom; IV, inverse variance; SD, standard deviation; Std., standardized

Fig. 5

Analysis by subgroups of progesterone’s effect in lesion volume. The dark gray vertical band discloses the 95% CI of the overall lesion volume meta-analysis (as shown in Fig. 4)

Subgroup analyses by method of injury induction and by method of lesion volume evaluation were not performed, since most of the studies used CCI to induce the injury and the methods used to assess lesion volume varied greatly, as shown in Table S2.

Assessment of Risk of Bias in Included Articles and Publication Bias

Figure 6 summarizes the responses corresponding to low, high, or unclear risk of bias for each question of the tool used [31]. First, it should be highlighted that, overall, more than 75% of the responses were unclear due to a lack of detail in the reports of the studies. The questions with the highest proportion of “yes” answers (denoting low risk of bias; green in Fig. 6) were those related to reporting bias, selection bias (baseline characteristics of the animals used), and attrition bias. On the other hand, the questions with the highest proportion of “no” answers (denoting high risk of bias; red in Fig. 6) were those related to other sources of bias (potential conflicts of interest), reporting bias, and attrition bias. More details on the individual rating of each study in each category of the tool can be found in Figure S20.

Fig. 6

Consolidated assessment of risk of bias by question. The graph shows the data consolidated by questions from the SYRCLE risk of bias assessment tool [31]. Red: high risk of bias (“no” answers); yellow: unclear (when the article lacked the details needed to make a decision); green: low risk of bias (“yes” answers)

Regarding publication bias, asymmetry was observed in the funnel plots for both brain edema (intercept −3.11 (95% CI: −4.21 to −2.01), t = −5.65, p < 0.001; Figure S21) and lesion volume (intercept −2.42 (95% CI: −4.62 to −0.22), t = −2.24, p = 0.032; Figure S22).
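The intercept and t statistic reported above come from Egger’s regression, in which each standardized effect (effect divided by its standard error) is regressed on its precision (the inverse of the standard error); an intercept far from zero indicates funnel plot asymmetry. The sketch below illustrates that regression in plain Python and is not the STATA routine used in this study: the function name and the example data are hypothetical, and the p value is left out because it requires a t distribution with k − 2 degrees of freedom, which the standard library does not provide.

```python
import math


def egger_test(effects, std_errors):
    """Return (intercept, SE of intercept, t, df) for Egger's regression."""
    k = len(effects)
    if k < 3:
        raise ValueError("at least three studies are required")
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1 / s for s in std_errors]                   # precisions
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance of the ordinary least squares fit
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (k - 2)
    se_intercept = math.sqrt(s2 * (1 / k + mean_x ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t, k - 2


# Hypothetical SMDs and standard errors for five comparisons
smd = [-2.4, -1.9, -1.2, -0.9, -0.6]
se = [0.9, 0.7, 0.5, 0.4, 0.3]
intercept, se_int, t, df = egger_test(smd, se)
print(intercept, se_int, t, df)
```

For the brain edema analysis, the reported intercept and t statistic imply a standard error of about 0.55 (−3.11 / −5.65), which is consistent with the half-width of the reported 95% CI (1.10).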

Discussion

After the publication of the results of two phase III clinical trials, ProTECT (Progesterone for the treatment of Traumatic Brain injury) III [24] and SyNAPSe (Study of the Neuroprotective Activity of Progesterone in Severe Traumatic Brain Injuries) [25], it did not take long for controversy to arise, with comments such as the following: “…it is clear that progesterone does not represent a viable treatment option for patients with severe TBI” [79], “Another failed attempt of neuroprotection: progesterone for moderate and severe traumatic brain injury” [80], or “Progesterone for Traumatic Brain Injury – Resisting the Sirens’ Song” [81]. Some of these comments also suggested limitations in the analysis of preclinical evidence [80] or even that some preclinical studies could be reporting false positives due to various unrecognized biases [81]. In view of this controversy, we systematically reviewed the literature and qualitatively and quantitatively analyzed the results for brain edema and lesion size in preclinical studies of progesterone for TBI, finding the following: (1) evidence of a beneficial effect of progesterone in reducing brain edema and in decreasing lesion volume or increasing the remaining tissue in rodents (rats and mice) exposed to TBI; (2) an absence of studies reporting the effect of progesterone on the mortality of animals after experimental TBI; and (3) asymmetry in the funnel plots, suggesting publication bias.

Previously, Gibson and colleagues [21] carried out a systematic review and meta-analysis of lesion volume in animal studies involving the administration of progesterone before or after a brain injury (ischemia or TBI). They identified 18 studies (11 of ischemia and 7 of TBI) measuring this outcome between 1980 and 2006. From this analysis, they concluded that progesterone could have a neuroprotective effect in these types of brain injury. However, the beneficial effects observed in reducing lesion volume in the TBI studies were significant only in the studies with the highest quality scores (based on the Stroke Therapy Academic Industry Roundtable (STAIR) recommendations). It is worth mentioning that the authors pointed to the presence of publication bias in the TBI studies reporting lesion volume, but not in the ischemia studies. In our study, 29 and 21 studies were identified reporting brain edema and lesion volume, respectively. Compared with the vehicle, progesterone reduced cerebral edema and lesion volume after TBI. However, these findings may have been affected by the publication bias or related biases suggested by the funnel plots [35, 82], with a possible overestimation of the effect size [83]. In another meta-analysis, Wong et al. [84] analyzed individual animal data (published and unpublished) from studies that assessed the effect of progesterone in stroke, showing a decrease in ischemic lesion volume, consistent with the findings of Gibson et al. [21]. However, an increase in stroke-related mortality was also observed, particularly in young ovariectomized females [84].

In preclinical studies, the rate of non-publication of results varies between 14 and 33% [83, 85]. These data are worrisome considering the waste of resources involved in the unnecessary repetition of experiments. Even more troublesome is the incomplete analysis of potentially harmful preclinical evidence, considering that decisions about launching trials in humans are based on these studies [85]. This leads us to question whether the lack of publications reporting the effects of progesterone on mortality or survival rate in TBI models is due to neutral or negative results. Of course, other factors could also be involved, such as the feasibility of evaluating mortality in animals with moderate or severe injuries in the medium or long term, or methodological and ethical issues (animal welfare and regulatory reasons) that may make models involving more severe injuries (associated with a higher incidence of this outcome) unfeasible.

In clinical trials evaluating the effects of progesterone in individuals with TBI, the main outcomes analyzed are mortality, functional neurological outcomes (through the Glasgow Coma Scale (GCS), Glasgow Outcome Scale (GOS), Functional Independence Measure (FIM), and/or Karnofsky Performance Scale (KPS)), intracranial pressure (ICP), and safety [26, 86]. There is a well-documented relationship between cerebral edema, ICP, and neurological outcomes in patients with TBI, with brain edema considered a major contributor to detrimental consequences [87]. Although many studies have evaluated the effects of progesterone on brain edema, only 6 studies [18, 53, 58, 66, 69, 76] included in our systematic review analyzed neurological deficit (through NSS, mNSS, and VCS) among their outcomes, and none reported mortality after the intervention with progesterone. It is worth mentioning that our findings reflect the reporting of these outcomes as “secondary” results within the studies that we identified through one of the primary outcomes (cerebral edema and lesion volume). Of the 6 preclinical studies that analyzed neurological deficit, 4 [18, 53, 69, 76] reported an improvement in the score (according to the scale used) at one or more of the time points at which the outcome was evaluated. In general, the improvement in scores was observed between 4 and 24 h post-injury, except in the study by Yu et al. [76], in which the improvement was reported at 7 and 14 days post-injury.

In this context, several meta-analyses of clinical trials [86, 88,89,90,91,92,93,94,95,96,97] have been performed, including between five and eleven studies each, with conflicting findings. Seven of these meta-analyses did not observe benefits of progesterone in patients with TBI for mortality or functional outcomes [88,89,90,91,92,93,94], while the other four showed some benefit in the neurological outcomes of patients with TBI at some point of follow-up (3, 6, or 12 months), particularly when progesterone was administered intramuscularly and/or in young patients [86, 95,96,97]. Only one meta-analysis showed a significant reduction in mortality up to 6 months post-TBI [97]. As mentioned above, we were unable to identify any study assessing survival or mortality rate after the intervention with progesterone in a non-human animal model of TBI, which reveals a discouraging disconnect between preclinical and clinical studies.

On the other hand, although we did not initially consider ICP as a primary or secondary outcome in this systematic review, we identified three studies [55, 66, 78] reporting a decrease in ICP at 4, 24, and 48 h after the induction of the injury, depending on the dose of progesterone administered. Of the 12 clinical trials [22,23,24,25, 98,99,100,101,102,103,104,105] of progesterone for TBI, only 5 [22, 23, 25, 101, 104] analyzed differences in ICP values or in tomographic findings (mainly through the Marshall classification) with signs of cerebral edema (Table S3). Of the latter, only Dahroug et al. [104] showed an improvement in the Marshall classification scores at 1 and 7 days after the injury in the group treated with progesterone. In addition, many patients classified as grade III, IV, or V on day 1 post-injury were reclassified as grade I or II on day 7 post-injury after being treated with progesterone. Two other studies [22, 23] reported a trend towards lower ICP, or values with less variability, in the groups treated with progesterone compared with the control group; however, these differences were not statistically significant.

At this point, it is important to highlight the need to identify new biomarkers and neuroimaging resources that can be used in both preclinical and clinical studies, for more efficient translational research in the field of TBI [26, 27, 106]. In this regard, it was recently reported that progesterone did not change the serum levels of biomarkers of neuronal (ubiquitin carboxy-terminal hydrolase-L1 (UCH-L1) and alpha II spectrin breakdown product 150 (SBDP150)) and glial (glial fibrillary acidic protein (GFAP) and S100 calcium-binding protein B (S100B)) death at 24 h or 48 h post-injury in samples from patients enrolled in the ProTECT III clinical trial (BIO-ProTECT) [107]. However, Sayeed and Stein [108] suggest that these findings should be interpreted with caution considering the heterogeneity of the subjects included in the study (in relation to age, gender, severity, and locus of injury), the timing of the peak levels of these biomarkers post-injury (UCH-L1 and SBDP150 at 6–8 h, and GFAP at about 24–48 h), the selection of biomarkers with theranostic significance or included in preclinical studies testing progesterone efficacy, and the power to detect an interaction between treatment group and endpoint. In addition, these researchers highlight the possibility that phase III clinical trials may have used suboptimal doses of progesterone [28, 108]. This assertion is supported by the findings of Mofid et al. [100], in which lower doses of progesterone significantly decreased the levels of the biomarker S100B in patients with diffuse axonal injury at 1 and 6 days post-injury.

Progesterone has been studied in experimental TBI at different doses (from 1.7 to 32 mg/kg of body weight), showing a bell-shaped or U-shaped dose–response curve as described by most authors [28, 45], with the most effective doses in the middle of the curve. In our review, we found that the most frequently used doses were between 4 and 16 mg/kg, with only 3 studies testing doses of 20 mg/kg or more and 6 studies assessing the dose of 1.7 mg/kg. This could be related to the fact that the dose of 32 mg/kg has been found to be less effective than doses of 16 mg/kg or less in animal models of TBI [45, 59] and stroke [109, 110]. Indeed, progesterone at doses of 30 or 60 mg/kg was even found to increase the infarct volume in the subcortical regions of the brain of ovariectomized (OVX) female rats when administered as a single dose or for 7 to 10 days before a reversible middle cerebral artery occlusion (MCAO) [111].

The translation of these doses from preclinical to clinical studies has caused some debate, given the hypothesis postulated by Howard et al. [28] that beneficial effects would be obtained with progesterone at serum levels between 50 and 100 ng/ml, which would correspond to the administration of 2–4 mg/kg. To better understand this premise, the complete scenario should be analyzed: most of the preclinical studies with evidence of beneficial effects of progesterone in TBI used doses ranging from 4 to 16 mg/kg, which should produce an increase in serum or plasma progesterone levels ranging from approximately 20 to 100 ng/ml, depending on the age and animal model used as well as on the time after administration (see studies by Wright et al. [5], Kasturi and Stein [50], Peterson et al. [59], and Wong et al. [112] in Table 3). Similarly, studies with OVX animals showed that subcutaneous progesterone implants producing serum progesterone levels between 10 and 50 ng/ml were also beneficial for TBI [51, 55, 69]. In addition, single or multiple doses of progesterone greater than or equal to 30 mg/kg were associated with serum progesterone levels greater than 100 ng/ml (see the study by Murphy et al. [111] in Table 3), with decreased efficacy in TBI [45] or even an increased area of ischemia in stroke [111]. In particular, in a TBI model, multiple doses of 20 mg/kg increased serum progesterone levels to 146 ± 11.3 ng/ml (mean ± SEM) without showing a beneficial effect compared with the vehicle group [59].

Table 3 Progesterone levels in serum or plasma after the exogenous progesterone administration in both human and rodent models

On the other hand, the ProTECT II [22], ProTECT III [24], and SyNAPSe [25] trials were based primarily on the study by Wright et al. [113], in which progesterone was used at intravenous doses of approximately 12 mg/kg/day, producing serum progesterone levels of 337 ng/ml. However, other clinical studies were based on the dose used in the Hangzhou trial (China) [23] (2 mg/kg/day intramuscular), which is associated with serum progesterone levels of approximately 14 to 21 ng/ml [100, 102] (Table 3). Among the clinical trials of progesterone for TBI that used the dose of 2 mg/kg/day, the following beneficial effects were reported: decrease in mortality rate at 28 days (with 7 days of intervention) [104] and 6 months (with 5 days of intervention) [23, 102]; improvement in functional neurological outcome at 28 days [104], 3 months [23, 98, 99], and 6 months [23, 100, 102]; decrease in days of hospitalization in intensive care units [103, 104]; decrease in days requiring mechanical ventilation [104]; fewer decompressive craniotomies [101]; decrease in circulating levels of intercellular adhesion molecule 1 (ICAM-1) [105]; and modulation of the serum levels of cytokines, injury, and oxidative stress biomarkers [100]. In contrast with these findings, the intravenous dose of 12 mg/kg/day showed benefits only in the functional neurological outcome at 30 days post-TBI in ProTECT II [22]. Why this trial, which used higher intravenous doses of progesterone for a shorter period, yielded promising results remains unclear.

In the field of biomedicine, multidisciplinary translational research is an essential pillar in the understanding of the health-disease process, as well as in the development of new prevention and treatment strategies. Both preclinical and clinical studies have characteristics, advantages, and disadvantages that, taken together, make their findings complementary [114]. However, the high rates of failure in the translation of preclinical to clinical studies raise many questions about how research is being performed in both human and non-human animals [81]. In this context, it seems essential to standardize the procedures needed to refine the scientific and clinical practice of professionals both in the research area and in the health system [115, 116]. Similarly, systematic reviews of animal studies have been postulated to play a crucial role in improving the translation of findings “from the bench to the patient’s bedside,” particularly before clinical trials are conducted, thus allowing a refinement of the experimental design and a greater understanding of the preclinical findings [117, 118].

Thus, there are multiple lessons derived from systematic reviews of preclinical evidence, namely the following: (1) they can identify the effect of an intervention in a certain model, avoiding the duplication of “unnecessary” studies and/or stimulating the methodological refinement of new studies (crucial in the contemplation of the 3R principles proposed by Russell and Burch: replacement, reduction, and refinement [119]) [117, 118]; (2) they provide better data regarding the efficacy and safety of interventions [117]; (3) they might encourage systematic and critical analysis of preclinical evidence with the aim of avoiding misinterpretations or partial reviews of the literature, which could lead to bias in the synthesis of the evidence [117]; and (4) they could improve the translation of preclinical to clinical results through the identification of animal models that can better reproduce the human problem/disease [117, 118]. However, this type of review also has limitations, such as being considerably time consuming, the poor reporting of methodology/results in preclinical studies, and the use of methods/outcome measures that do not allow the appropriate translation of results to humans [117].

As suggested by Stein [27] and Schumacher et al. [26], there are many reasons why progesterone may have failed to elicit beneficial effects in patients enrolled in some phase III clinical trials (mentioned above). Although the differences between species could explain the discrepancy in the findings between humans and other animals [120], there are several aspects involving the experimental design and the transparency in the interpretation and publication of the results that could help to combat the “reproducibility crisis” [121], with irreproducibility rates ranging between 75 and 90% [122]. Some of these factors are avoidable, such as poor methodological quality, differences in design between experimental animal studies and clinical trials, and publication bias [118]. In this regard, SYRCLE’s risk of bias tool [31] assesses the main potential biases in animal studies (selection, performance, detection, attrition, and reporting bias). In the present review, we show that most of the studies lack the details needed to answer the questions of the SYRCLE tool adequately, particularly those concerning the method of randomization and allocation of animals. These aspects could affect the reliability of the results, as shown by De Vries et al. [118] and Hirst et al. [123]. It is also noteworthy that more than 30% of the responses to the questions on selective outcome reporting and other sources of bias (potential conflicts of interest) indicated a high risk of bias (Fig. 4). In this sense, it has also been observed that potential conflicts of interest, especially those related to the industry, could affect the results [124, 125].

Our study had some limitations that affected the completeness and clarity of the analysis, which were as follows: (1) imprecise description or lack of details of the methods or results in the articles included in our review might have reduced the precision and increased the heterogeneity of our own results; (2) few authors provided the information requested to ensure a complete and reliable data extraction process; (3) there was high variability in the evaluation of outcomes and presentation of results, particularly in relation to the volume of the lesion; (4) for lesion volume, 16 of the 21 articles with this outcome used a calculation related to the cavity/absent area, which could present certain artifacts or variations inherent to the histological preparation [44]; (5) although the behavioral outcomes are of interest and deserve to be analyzed, the considerable methodological heterogeneity found in the tests prevented us from performing a meta-analysis of these outcomes. Since the behavioral outcomes (Table 2S) were included as secondary outcomes in our protocol, a proper appraisal of the behavioral effects of progesterone would require a new and independent systematic review, with specifically designed search strategies, aiming to properly perform a meta-analysis of these findings; (6) the publication bias we detected could affect the estimated effect size of progesterone for TBI in both brain edema and lesion volume; (7) published clinical data report few or no values that could be compared with the brain edema results of the present review.

In conclusion, the present systematic review demonstrated that there is evidence that progesterone has an anti-edema effect in rodent models of TBI, and that it decreases lesion volume or increases remaining tissue. However, more studies are needed using assessment methods with a lower risk of histological artifacts. Moreover, we were unable to identify any study that evaluated the survival or mortality rate after the intervention with progesterone in a non-human animal model of TBI. Therefore, when planning a clinical trial, it is important to consider the main findings and outcome measures of preclinical studies by performing a systematic/comprehensive review and, if possible, a meta-analysis. Finally, considering the difficulties encountered when evaluating the risk of bias of the articles included in this review, it is necessary to improve the reporting of both methodology and results, increasing the accuracy and transparency of the information presented by the studies.