Introduction

The use of vague and mostly arbitrary definitions for poor ovarian response (POR) precludes appropriate pooling of data and prevents the development of evidence-based management strategies. In an effort to standardize the definition of POR, the ESHRE Working Group on POR defined a set of variables known as the Bologna criteria [1]. The proposed definition is based on the number of oocytes collected in prior ovarian stimulation (OS) cycles, with female age and ovarian reserve tests (ORT) as the two major contributing factors. Four different conditions that might be associated with POR have been defined: (I) collection of ≤ 3 oocytes in two prior OS cycles, (II) collection of ≤ 3 oocytes in a single OS cycle from a woman who is > 40 years of age, (III) collection of ≤ 3 oocytes in a single OS cycle together with an abnormal ORT, or (IV) the presence of an abnormal ORT (antral follicle count [AFC] < 5–7 follicles or anti-Müllerian hormone [AMH] < 0.5–1.1 ng/mL) in a woman over 40 years of age. Since the introduction of the criteria in 2011, several concerns have been raised about their reliability, prognostic value and clinical relevance [2,3,4,5,6,7,8,9]. We have recently shown that the criteria have not been embraced by infertility specialists and that many clinical trials still use other arbitrary definitions of poor response and poor responders [10]. Prompted by the ongoing debate, we aimed to assess the clinical relevance of the criteria and designed a retrospective cohort analysis to test the hypothesis that women who fulfilled the Bologna criteria would differ from women who did not fulfill the criteria in their ovarian responses and live birth rates in their subsequent OS cycles.
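
The four conditions listed above can be read as a simple decision rule. The following Python sketch is only an illustration of that reading and not the working group's or the authors' implementation. The `Patient` class, the function names and the default cutoffs are hypothetical: the text gives ranges for the ORT thresholds (AFC < 5–7, AMH < 0.5–1.1 ng/mL), so the sketch assumes the upper ends of those ranges, treats "two prior OS cycles" as "at least two prior cycles with ≤ 3 oocytes", and takes the age boundary as strictly > 40 years.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumed cutoffs (hypothetical defaults; the text only gives ranges).
MAX_OOCYTES_POOR_RESPONSE = 3   # <= 3 oocytes counts as poor response
AGE_CUTOFF_YEARS = 40           # "> 40 years of age" as stated in the text
AFC_CUTOFF = 7                  # assumption: upper end of the 5-7 range
AMH_CUTOFF_NG_ML = 1.1          # assumption: upper end of the 0.5-1.1 range


@dataclass
class Patient:
    """Hypothetical record holding only the inputs used by the four conditions."""
    age: float
    prior_oocyte_counts: Sequence[int]   # oocytes retrieved in each prior OS cycle
    afc: Optional[int] = None            # antral follicle count, if measured
    amh_ng_ml: Optional[float] = None    # serum AMH, if measured


def has_abnormal_ort(p: Patient) -> bool:
    """An ORT is abnormal if either available marker falls below its cutoff."""
    low_afc = p.afc is not None and p.afc < AFC_CUTOFF
    low_amh = p.amh_ng_ml is not None and p.amh_ng_ml < AMH_CUTOFF_NG_ML
    return low_afc or low_amh


def fulfills_bologna(p: Patient) -> bool:
    """Return True if any of conditions I-IV, as described in the text, holds."""
    poor_cycles = sum(1 for n in p.prior_oocyte_counts
                      if n <= MAX_OOCYTES_POOR_RESPONSE)
    older = p.age > AGE_CUTOFF_YEARS
    abnormal_ort = has_abnormal_ort(p)

    return (
        poor_cycles >= 2                        # (I) two poor-response cycles
        or (poor_cycles >= 1 and older)         # (II) one poor cycle, age > 40
        or (poor_cycles >= 1 and abnormal_ort)  # (III) one poor cycle, abnormal ORT
        or (older and abnormal_ort)             # (IV) age > 40, abnormal ORT
    )
```

Under these assumptions, a woman under 40 with an abnormal ORT but no prior poor-response cycle is not classified as a poor responder, which is the exclusion criticized in the Discussion.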

Materials and methods

Study population

This study was approved by the Institutional Review Board (2015.013.IRB2.006). Women selected for inclusion were identified after review of all ART cycles performed between May 2010 and January 2017 at the Assisted Reproduction Unit of the American Hospital, Istanbul, Turkey. All women who had undergone consecutive OS cycles leading to oocyte retrieval within 2 years after the index OS cycle were included. Patients who underwent ART in an unstimulated natural cycle and those who had embryo cryopreservation were excluded. A total of 9904 fresh IVF cycles were analyzed for inclusion.

Cycle characteristics, number of oocytes retrieved, number of embryos transferred and live birth rates in OS cycles subsequent to the index cycle were analyzed. Poor response and normal response were defined as retrieval of ≤ 3 and > 3 oocytes, respectively.

ART procedures

Women underwent ovarian stimulation using a long gonadotropin-releasing hormone (GnRH) agonist (Lucrin, Abbott, Istanbul, Turkey) or a multidose flexible GnRH antagonist (Cetrotide, Merck, Istanbul, Turkey) protocol, combined with recombinant human follicle-stimulating hormone (Gonal-F, Merck, Istanbul, Turkey), based on the physicians’ preferences. Doses were adjusted according to follicular development and serum estradiol levels. Final oocyte maturation was induced with 10,000 IU human chorionic gonadotropin (Pregnyl, Organon, Istanbul, Turkey) when the leading follicle reached a mean diameter of 20 mm, accompanied by two follicles of > 16 mm. Follicle aspiration was performed 36 h after the ovulation trigger. Intracytoplasmic sperm injection (ICSI) was used to fertilize the oocytes. Embryos were transferred on the third day after ICSI under ultrasound guidance using Wallace (Sims Portex Ltd., Hythe, Kent, UK) or Labotect (Labotect, Bovenden-Göttingen, Germany) catheters.

The luteal phase was supported with 90 mg vaginal progesterone gel (Crinone 8%, Merck Serono, Istanbul, Turkey) starting from the day of oocyte collection. A pregnancy test was performed 10–12 days after embryo transfer. Women with a positive pregnancy test continued the vaginal progesterone gel until the 10th week of gestation. Pregnancy was confirmed by measuring serum β-hCG levels 12 days after embryo transfer. Live birth was defined as delivery of one or more live infants.

Statistical analysis

Primary outcome parameters were the rate of OS cycles with poor response and the live birth rate. Secondary outcome measures were the number of oocytes retrieved and the number of OS cycles reaching embryo transfer. Continuous variables of baseline demographic characteristics and IVF outcomes were expressed as mean ± SD and compared using the independent-samples Student’s t test or the Mann–Whitney U test, according to the distribution of their values. Categorical variables were compared using the chi-square test or Fisher’s exact test, where appropriate. The significance level was set at 5% (p < 0.05). GraphPad Prism (version 7) was used to analyze the data and create the figures.
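
As an illustration of how the comparison of categorical variables might be carried out, the sketch below implements both tests for a 2 × 2 table using only the Python standard library. It is not the GraphPad Prism procedure used in the study. The function names are hypothetical, the chi-square test is the uncorrected Pearson version, and the rule for choosing between the tests (Fisher's exact test when any expected count is below 5) is a common convention assumed here, since the text says only "where appropriate".

```python
import math


def expected_counts(a, b, c, d):
    """Expected cell counts of the 2x2 table [[a, b], [c, d]] under independence."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]


def chi_square_p(a, b, c, d):
    """Two-sided p value of the uncorrected Pearson chi-square test (1 df)."""
    observed = [a, b, c, d]
    expected = expected_counts(a, b, c, d)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For one degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p value: sum of all tables as or less likely."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-7))  # tolerance for float ties
    return min(1.0, total)


def compare_proportions(a, b, c, d):
    """Pick a test by the assumed expected-count rule and return (name, p)."""
    if min(expected_counts(a, b, c, d)) < 5:
        return "fisher", fisher_exact_p(a, b, c, d)
    return "chi-square", chi_square_p(a, b, c, d)
```

Here `a` and `b` would be the numbers of women with and without the outcome in one group, and `c` and `d` the same counts in the other group. Statistical packages may apply a continuity correction or a different rule for selecting the test, so p values from this sketch need not match those of the software used in the study.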

Results

The study group comprised 1153 and 288 women who had two and three consecutive OS cycles, respectively. Their descriptive characteristics at the index OS cycle are presented in Table 1. The Bologna criteria-defined poor responder women were older than women who did not fulfill the criteria (38.3 vs 33.7 years and 39.0 vs 34.1 years, for women with two and three OS cycles, respectively; both p < 0.001). Criteria-positive women who underwent three OS cycles had a longer duration of infertility compared to those who did not fulfill the criteria (5.1 vs 4.0, p = 0.027). Treatment indications showed a similar distribution among the groups (Table 1).

Table 1 Descriptive characteristics of women who had two and three ovarian stimulation cycles according to the fulfillment of the Bologna criteria

Cycle characteristics of women who underwent consecutive OS cycles are presented in Table 2. Despite an increase in the daily gonadotropin dose in each subsequent cycle, no significant improvement was detected in peak serum E2 levels or the mean number of oocytes retrieved in women who had two or three OS cycles. In both groups, fewer women showed poor response as they proceeded through repeated OS cycles. Of the 58 women younger than 40 years with AFC > 7 or AMH > 1.1 ng/mL who experienced an unexpected poor response in the first cycle, 25 (43.1%) showed a normal response in the subsequent cycle (not shown in the table).

Table 2 Cycle characteristics of women who underwent two and three ovarian stimulation cycles

Ovarian response and live birth rates according to fulfillment of the Bologna criteria

Figure 1 summarizes the ovarian response and live birth rates in consecutive OS cycles depending on the fulfillment of the Bologna criteria. Among women who had two OS cycles, out of 240 fulfilling the criteria, 183 (76.2%) showed poor response and 35 (14.6%) achieved live births, whereas out of 913 women not fulfilling the criteria, 131 (14.3%) showed poor response and 304 (33.3%) achieved live births. For women who had three OS cycles, out of 116 fulfilling the criteria, 70 (60.3%) showed poor response and 15 (12.9%) achieved live births, whereas out of 172 women not fulfilling the criteria, 23 (13.4%) showed poor response and 59 (34.3%) achieved live births. Overall, compared to those who did not fulfill the criteria, women who fulfilled the criteria had higher rates of poor response and lower live birth rates (LBRs) in both their second and third OS cycles (p < 0.001 for all comparisons).

Fig. 1 Poor ovarian response and live birth rates in consecutive ovarian stimulation cycles depending on the fulfillment of the Bologna criteria. OS, ovarian stimulation; a, b, c, d: p < 0.001
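
The percentages in this comparison follow directly from the counts given in the text, which can be checked with a few lines of Python. The sketch below recomputes them and, as an assumption about the method, applies an uncorrected Pearson chi-square test to each pair of groups. The names `REPORTED`, `percent` and `chi_square_p` are hypothetical, and the exact p values may differ from those produced by the statistical software used in the study.

```python
import math

# Counts as reported in the text: (women, poor responders, live births).
REPORTED = {
    "two cycles, criteria fulfilled": (240, 183, 35),
    "two cycles, criteria not fulfilled": (913, 131, 304),
    "three cycles, criteria fulfilled": (116, 70, 15),
    "three cycles, criteria not fulfilled": (172, 23, 59),
}


def percent(events, total):
    """Share of women with the event, in percent."""
    return 100.0 * events / total


def chi_square_p(events_a, total_a, events_b, total_b):
    """Two-sided p value of the uncorrected Pearson chi-square test (1 df)."""
    n = total_a + total_b
    events = events_a + events_b
    observed = [events_a, total_a - events_a, events_b, total_b - events_b]
    expected = [
        total_a * events / n, total_a * (n - events) / n,
        total_b * events / n, total_b * (n - events) / n,
    ]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For one degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


for label, (women, poor, births) in REPORTED.items():
    print(f"{label}: poor response {percent(poor, women):.1f}%, "
          f"live birth {percent(births, women):.1f}%")

for cycles in ("two cycles", "three cycles"):
    n_pos, poor_pos, lb_pos = REPORTED[f"{cycles}, criteria fulfilled"]
    n_neg, poor_neg, lb_neg = REPORTED[f"{cycles}, criteria not fulfilled"]
    print(f"{cycles}: poor response p = "
          f"{chi_square_p(poor_pos, n_pos, poor_neg, n_neg):.2g}, "
          f"live birth p = {chi_square_p(lb_pos, n_pos, lb_neg, n_neg):.2g}")
```

Under this assumed test, all four comparisons fall below the 0.001 threshold stated in the text.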

We then stratified the women in the study group by age. Table 3 shows the clinical outcome parameters of women who had two OS cycles according to age and fulfillment of the criteria. Young, Bologna criteria-defined poor responder women had a lower number of oocytes (2.7 vs 8.3), a higher rate of poor response (70.5% vs 12.9%) and a lower likelihood of having embryo transfer (67.4% vs 92.5%) than young women who did not fulfill the criteria (p < 0.001). Similarly, older women who fulfilled the criteria had a lower number of oocytes (2.2 vs 6.2), a higher rate of poor response (82.9% vs 22.3%) and a lower likelihood of having embryo transfer (64% vs 85.6%) than older women who did not fulfill the criteria (p < 0.001).

Table 3 Clinical outcome in the subsequent cycle of women who had two consecutive ovarian stimulation cycles depending on female age and fulfillment of the Bologna criteria

Despite the transfer of a similar number of embryos, LBRs were significantly lower for young women who fulfilled the criteria than for young women who did not (20.2% vs 36.6%, p = 0.0002). There was no significant difference in LBRs between older women who did and did not fulfill the criteria (8.1% vs 15.1%; p = 0.12).

Among those who underwent three OS cycles, young women who fulfilled the criteria had a lower number of oocytes (2.7 vs 8.3; p < 0.001), a higher rate of poor response (66.7% vs 11%; p < 0.001), a lower chance of embryo transfer (59.3% vs 90.4%, p < 0.001) and a lower LBR (18.5% vs 37.7%, p = 0.011) compared to young women who did not fulfill the criteria (Table 4). In the older age group, Bologna criteria-defined poor responders had a significantly lower number of oocytes (2.3 vs 6.3, p < 0.0001) and a higher rate of poor response (54.8% vs 26.9%, p = 0.02). However, the two groups were comparable in terms of OS cycles ending in embryo transfer (75.8% vs 80.8%) and LBRs (8.1% vs 15.4%).

Table 4 Clinical outcome in the third OS cycle of women who underwent three consecutive OS cycles, stratified according to female age and fulfillment of the Bologna criteria

Discussion

In this study, we found that women who fulfilled the Bologna criteria had a lower number of oocytes, a higher risk of poor response, a lower likelihood of having embryo transfer and lower live birth rates in their subsequent OS cycles, compared to women who did not fulfill the criteria. For women younger than 40 years, the Bologna criteria were able to predict both ovarian response and clinical outcome in the consecutive cycles. However, the criteria were predictive only of the ovarian response and not of the clinical outcome in women over 40 years of age, who exhibited very low LBRs regardless of the fulfillment of the criteria. The chance of live birth was below 10% for women aged ≥ 40 years who fulfilled the criteria. These findings are consistent with previous reports showing that the Bologna criteria were able to identify women with poor prognosis, with live birth rates ranging from 5.5 to 7.4% [5]. A retrospective cohort analysis which incorporated several risk factors into the Bologna criteria, such as a previous history of chemotherapy exposure, adnexal surgery, menstrual irregularities, karyotype anomalies and the presence of an endometrioma, suggested that the criteria were able to define patients with a poor chance of live birth at the expense of a high cost [11]. Given the persistence of poor response and low live birth rates, treatment of these patients has been deemed questionable [12], particularly in settings where oocyte donation could be a viable option.

This study also showed that a substantial number of young women who had ≤ 3 oocytes in their first OS cycle achieved a better response in the subsequent cycle. These results are in line with the previously discussed view that women under age 40 should not be considered poor responders based solely on a low oocyte yield in their first OS cycle [13, 14], providing further support for the Bologna criteria. In contrast, a comprehensive analysis of the literature on the management of poor responders showed that the vast majority of the published trials analyzed women who had low oocyte counts in their very first treatment cycle without taking age into consideration [7]. Therefore, caution should be exercised when interpreting their results.

Since the introduction of the Bologna criteria in 2011, there has been an ongoing debate on their reliability and applicability. The criteria have been criticized for several reasons, such as the selected cutoff values for female age and number of oocytes harvested, as well as the disregard of many medical and genetic factors that might potentially impact ovarian reserve [2,3,4,5,6,7,8,9, 15,16,17,18]. Additional concerns were raised by different investigators, who pointed out that using descriptive criteria in a study population with heterogeneous clinical characteristics and prognostic features would inevitably introduce significant methodological bias [6, 19]. Exclusion of young women with an abnormal ORT at their first cycle was noted to be another major downside of the criteria, since those are the ones who could actually benefit from research on POR [2].

While older Bologna criteria-positive women showed poor LBRs (8.1%), consistent with previous reports, acceptable LBRs (18.5–20.2%) were achieved in younger women who fulfilled the criteria. A detailed analysis of these young Bologna criteria-positive women revealed that among patients with two consecutive OS cycles, those who had three or more oocytes in the final treatment cycle achieved a much higher LBR (28.1%; 18/64), compared to women who had ≤ 2 oocytes (12.3%, 8/65). Likewise, for women who had three consecutive OS cycles, those who managed to have ≥ 3 oocytes in the third OS cycle had a higher LBR (33.3%, 7/21), compared to women who had ≤ 2 oocytes (9.1%, 3/33). Thus, the wide range in LBRs for the Bologna criteria-defined poor responders observed in this study reiterates the above-mentioned shortcoming of the criteria, namely that they subsume a heterogeneous group of patients with diverse clinical characteristics and prognostic features under a single category.

Recently, a modified definition of impaired ovarian response was proposed by the Poseidon Group (Patient-Oriented Strategies Encompassing Individualized Oocyte Number) to serve as a guide to personalize treatment to obtain the minimum number of oocytes sufficient to transfer at least one euploid embryo [20]. The rationale behind the proposal was that the Bologna criteria selectively define a very poor prognosis group, overlooking suboptimal or low responder women, who might benefit from modifications of stimulation strategies or therapeutic options [2, 21]. The Poseidon criteria categorize women according to age, ovarian reserve parameters (antral follicle count and/or AMH) and oocyte yield in previous OS cycles, similar to the Bologna criteria. However, the Poseidon criteria set 35 years as the cutoff level for female age and include patients with 4–9 oocytes despite optimal pre-stimulation parameters (AFC ≥ 5, AMH ≥ 1.2 ng/mL). Currently, there are no data comparing the clinical relevance of the Bologna and the Poseidon criteria.

A critical appraisal of the current literature shows that clinicians and researchers are reluctant to incorporate the Bologna criteria into clinical practice and research [10]. Among more than one hundred trials published since the introduction of the criteria, only half applied the Bologna criteria, whereas the others used 12 different arbitrary definitions. Similar preferences were recorded for the ongoing and unpublished trials. These numbers indicate the unfortunate loss of a substantial amount of data from thousands of women that could otherwise have been combined and analyzed to derive more reliable conclusions regarding the efficacy of tested interventions and treatment strategies.

Our analysis was strengthened by the large number of patients treated in a single IVF center, but its retrospective nature is its major weakness. We did not evaluate the effect of starting doses of gonadotropins and stimulation protocols on ovarian response and clinical outcome among the groups. Another limitation of the study is the absence of data regarding the significance of the Bologna criteria in the prediction of clinical outcome for women who are entering their very first treatment cycle. Based on the Bologna criteria statement that young women should not be considered poor responders based on a single OS response, we compared women who had at least two consecutive OS cycles.

No definition system is without its shortcomings and blind spots, and the Bologna criteria are no exception. Our study shows that they are clinically relevant and predictive of ovarian response and live birth in subsequent treatment cycles, whereas the Poseidon criteria have yet to be tested clinically. There is no time like the present to reignite the debate on how to define poor responders: whether to continue using the Bologna criteria, to modify them according to the issues raised, or to switch to the Poseidon criteria. Considering the heterogeneity of the patients who respond poorly to ovarian stimulation, researchers should adhere to either the Bologna or the Poseidon criteria when designing clinical trials on poor responders. Their findings would be far more applicable and useful to other researchers if they also provided a detailed description of patient characteristics. The consistent use of a single definition would facilitate gathering of valuable clinical data from various studies to draw reliable conclusions and translate them into clinical practice.