Introduction

As women continue to delay childbearing, ovarian reserve assessment has become an important tool for predicting the response to stimulation and the likelihood of pregnancy and live birth in women undergoing treatment for infertility. It is almost universally used in the basic work-up of those presenting with infertility. Currently, the most commonly used ovarian reserve markers include anti-Müllerian hormone (AMH), follicle-stimulating hormone (FSH), and antral follicle count (AFC). However, utilization of AFC is limited by the need for a skilled ultrasonographer and the invasive nature of the transvaginal ultrasound required for accurate evaluation. Given these limitations, serum AMH is often considered the simplest and most accurate parameter of age-related ovarian follicle depletion and thus of oocyte yield in women undergoing controlled ovarian stimulation during assisted reproductive technology (ART) [1]. Additionally, AMH is less affected by hormone fluctuations throughout the menstrual cycle than other ovarian reserve markers [2]. While FSH is released by the pituitary gland, AMH is a glycoprotein hormone from the transforming growth factor β superfamily that is released from the granulosa cells of large preantral and small antral follicles and therefore reflects the number of ovarian primordial follicles [3].

While many studies have shown the predictive benefit of ovarian reserve testing (ORT) and the dose-dependent ovarian responsiveness to gonadotropin stimulation predicted by ORT, it is substantially less clear how well ORTs can predict oocyte quality, clinical pregnancy, or live birth (L/B) [4,5,6,7]. A recent meta-analysis and systematic review by Iliodromiti et al. observed that while AMH has a weak association with L/B outcomes in women undergoing in vitro fertilization (IVF), its overall predictive accuracy is poor with a large negative likelihood ratio, and it should not be used alone to predict the probability of L/B. This notion was further validated by Tal et al., who concluded that AMH is a poor independent predictor of L/B outcomes [8]. However, other studies have had inconsistent results with evidence that AMH may be a superior predictor of L/B in women undergoing IVF, particularly when AMH and FSH are discordant [9]. Finally, in the Time to Conceive Study, a prospective study of women aged 30 to 44 years without a history of infertility who were attempting to conceive naturally, ovarian reserve markers did not predict fecundability [10]. However, low ovarian reserve was subsequently shown to be related to a higher rate of pregnancy loss, particularly in older reproductive-aged women [11].

Because of these mixed results, the goal of this study was to explore the value of AMH and FSH in predicting L/B at a single institution in women undergoing their first IVF cycle. All biochemical ovarian reserve testing was performed in the same laboratories using the same assay platform, thus controlling for the subtle differences between assays used in individual laboratories, which have been cited as a limitation of previously published studies. Furthermore, all patients and embryos were handled in an identical, protocolized fashion at the same laboratory by the same physicians and embryologists.

Materials and Methods

Study Design

This was a retrospective study using verified data from the Society for Assisted Reproductive Technology Clinical Reporting System (SART CORS) database of women undergoing ART at CU ARM between 2017 and 2019. The data in this national repository are reported in compliance with the Fertility Clinic Success and Certification Act of 1992 (Public Law 102–493), which mandates that all clinics performing ART provide standardized and accurate annual data regarding pregnancy success rates and certification of embryo laboratories. CU ARM is a participating clinic of the SART CORS database. The study was deemed exempt by the University of Colorado Institutional Review Board.

For the present study, the database was queried for all ART cycles between the above-mentioned years and filtered to include only cycles meeting the following criteria: (1) first transfer cycle only of a single autologous fresh or frozen embryo transfer (FET), (2) treatment outcome listed, (3) no use of a gestational carrier, and (4) no use of donor oocytes or donor embryos. Donor gametes and frozen oocytes were excluded. The primary outcome was L/B per transfer cycle, and the primary aim was to compare pregnancy outcome by pre-IVF levels of FSH and AMH. Secondary outcomes of interest included the lowest AMH values and highest FSH values resulting in a L/B within the various SART age groups, as well as the association of pregnancy outcome with age, gravidity, parity, and body mass index (BMI), as these covariates are all thought to contribute to pregnancy outcome. The ovarian reserve markers collected at CU ARM are all measured in the same fashion: AMH levels are analyzed by the Associated Regional and University Pathologists, Inc. (ARUP) Laboratories (Salt Lake City, Utah), and FSH levels are analyzed by the University of Colorado Health Laboratory at the Anschutz Medical Campus. During the studied time period, previously established age-specific ovarian reserve thresholds determined whether ART was offered at CU ARM. The minimum required AMH values ranged from 0.5 ng/mL for patients 35 years old or younger to 1.0 ng/mL for patients 41 years old or older, and the maximum permitted FSH values ranged from 14 IU/L for patients 35 years old or younger to 12 IU/L for patients 41 years old or older.

Statistical Methods

Fisher’s exact tests were used to identify differences in pregnancy outcome by age group. The pregnancy outcomes of spontaneous abortion (SAB), implantation failure, or biochemical pregnancy were combined into the “no live birth” group for analysis. The outcome was classified as a SAB when there was a loss after an intrauterine pregnancy had initially been visualized on ultrasound, and as a biochemical pregnancy when a patient had an initial positive pregnancy test but no signs of pregnancy on ultrasound. Kruskal–Wallis tests were used to assess whether there were statistically significant differences in AMH or FSH across the age categories. Poisson regression was used to assess whether there were significant differences in the total number of two pronuclei (2PN), oocytes retrieved, or embryos cryopreserved across the age categories. Linear regression was used to assess whether BMI or history of a live birth was significantly associated with FSH or AMH. In these models, FSH and AMH were natural log transformed. Finally, significant predictors of L/B were identified using logistic regression. We used the area under the receiver operating characteristic (ROC) curve (AUC) to quantify how accurately AMH and FSH could classify the pregnancy outcome. The ROC curve plots sensitivity against 1 − specificity at various thresholds of the predictor. In other words, the ROC curve evaluates how well various cut points of AMH or FSH discriminate between a live birth and no live birth. We compared the AUC for AMH and for FSH independently to random chance (i.e., tested whether the AUC was significantly greater than 0.5). Additionally, we tested whether a model with AMH or FSH only, age with AMH or FSH, or age with AMH or FSH and their interaction significantly improved discrimination, as measured by the AUC, compared to a model with age only.
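
The ROC construction described above can be illustrated with a minimal sketch. It assumes that a higher marker value predicts the positive outcome (for FSH the direction would be reversed), and the marker values and outcomes below are hypothetical and are not study data. The function names roc_points and auc are illustrative only; the software actually used for the analysis is not stated in the text.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per candidate cut point.

    A case is classified as positive when its score is >= the cut point.
    labels: 1 = live birth, 0 = no live birth.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # strictest threshold: nothing classified positive
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under ROC points ordered by increasing false-positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical marker values (e.g., ng/mL) and outcomes, for illustration only.
marker = [0.8, 1.5, 2.2, 3.1, 4.0, 5.6]
live_birth = [0, 1, 0, 1, 0, 1]
print(round(auc(roc_points(marker, live_birth)), 3))
```

An AUC of 0.5 corresponds to a marker that discriminates no better than chance, and 1.0 to perfect separation of the two outcome groups. Testing whether an observed AUC differs from 0.5 requires an additional step (for example, a rank-based test) that this sketch does not implement.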

Results

Demographics

A total of 1083 entries were identified in the SART database from 557 distinct patients. Of these, 95 were excluded due to donated oocytes, donated embryos, and/or thawed oocytes; 3 were excluded due to missing pregnancy outcomes; 40 transfers were cancelled; 1 oocyte retrieval was cancelled; and 187 cycles were beyond the first transfer. The reasons for a transfer being cancelled included an inadequate endometrial response (33/40, 82.5%), concurrent illness (1/40, 2.5%), high response (2/40, 5%), and patient withdrawal (4/40, 10%). The final number of participants who underwent their first transfer cycle and had both AMH and day 3 FSH values collected as well as pregnancy outcome recorded was 270 (Fig. 1, consort diagram). Their mean age was 33.93 years (SD: 4.12), with only two patients 42 years or older. Therefore, all patients aged 41 years or older were analyzed as a single group. The mean BMI was in the overweight category (26.74, SD: 5.79), and most women were nulliparous (n = 214, 79.3%). The most common reasons for infertility in this population were ovulatory dysfunction (n = 101, 37.4%), male factor (n = 101, 37.4%), and unexplained infertility (n = 61, 22.6%). Additional reasons included endometriosis (n = 11, 5.3%), tubal factor (n = 52, 19.3%), uterine factor (n = 4, 1.5%), hypothalamic amenorrhea (n = 3, 1.1%), diminished ovarian reserve (n = 35, 13.0%), and/or premature ovarian failure (n = 1, 0.4%).

Fig. 1 Consort diagram. Flow diagram of patients included in the study

Transfer Characteristics

Among the patients undergoing their first transfer cycle at CU ARM, serum AMH levels ranged from 0.31 to 25.0 ng/mL (median 3.46) and day 3 FSH levels ranged from 1.0 to 15.0 IU/L (median 6.50). AMH levels decreased significantly as age increased (p < 0.001). However, while FSH tended to increase with age, this association was not statistically significant (p = 0.079).

Of the 270 transfers, 248 (91.85%) were frozen embryo transfers (FET) and 22 (8.15%) were fresh transfers. The mean number of retrieved oocytes was 17.1 (SD: 8.6), and the number of retrieved oocytes decreased significantly with age (p < 0.001). Oocyte yield was also significantly associated with AMH, with higher AMH levels corresponding to more oocytes retrieved (p < 0.001). The mean numbers of total 2PN and cryopreserved embryos were 10.6 (SD: 5.6) and 6.8 (SD: 4.2), respectively, both of which decreased significantly with age (p < 0.001) (Table 1). Most cycles underwent preimplantation genetic testing for aneuploidy (PGT-A) on some (53.3%) or all (1.5%) of the embryos. BMI and history of L/B were not significantly associated with log(AMH) or log(FSH) (Table 2).

Table 1 Age-related transfer cycle characteristics
Table 2 Predictors of ovarian reserve markers

Pregnancy Outcomes

Overall Pregnancy Outcomes

The overall L/B rate was 58.15% (157/270) and declined with increasing age group (p = 0.011) (Table 3). There was no difference in overall L/B rate between the FET and fresh transfer cycles (58.9% vs. 50%, p = 0.46). Although cycles utilizing PGT-A on some or all of the embryos had slightly higher L/B rates than cycles not using PGT-A (60.8% vs. 54.9%), the overall pregnancy outcomes were not significantly different (p = 0.33). Among all transfers, 13.0% (35/270) ended in a SAB. A further 28.1% of transfers (76/270) resulted in either a biochemical pregnancy or an implantation failure. These outcomes were combined into the “no live birth” group for analysis, as stated above. The 40 cycles in which no transfer was attempted, for the reasons described above, were excluded.
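
As an illustration of how a comparison such as FET versus fresh transfer can be tested, the following is a minimal sketch of a two-sided Fisher's exact test on a 2 × 2 table. The counts used (146/248 live births for FET and 11/22 for fresh transfers) are back-calculated from the reported percentages and are therefore an assumption. The text does not state which test produced the reported p-value for this comparison, so the result of this sketch need not match it exactly. The function name fisher_exact_two_sided is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Rows: FET, fresh. Columns: live birth, no live birth.
# Counts are back-calculated from the reported rates (an assumption).
p_value = fisher_exact_two_sided(146, 102, 11, 11)
print(f"two-sided p = {p_value:.3f}")
```

An exact test is a reasonable choice here because the fresh-transfer group is small (22 cycles), which makes large-sample approximations less reliable.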

Table 3 Predictors of live birth

AMH and FSH Associations with Live Birth

The minimum AMH value observed that resulted in a live birth increased across increasing age groups, although average AMH values did not differ by pregnancy outcome across age groups (p = 0.52). Additionally, the maximum and mean FSH values observed that resulted in a live birth did not differ among the age groups (p = 0.61) (Table 4; Fig. 2).

Table 4 AMH and FSH as predictors of live birth
Fig. 2 Live birth rates and AMH and FSH. Association between AMH and FSH (average and extremes) and pregnancy outcome. AMH, anti-Müllerian hormone; FSH, follicle-stimulating hormone; LB, live birth; CI, confidence interval

Logistic Regression

Age was a significant predictor of live birth in the logistic regression models (p = 0.011), with older women having reduced odds of a L/B compared to younger women (Table 3). BMI and history of a L/B were not significant predictors of L/B. Additionally, AMH and FSH were not significant predictors of L/B in the univariable logistic regression. The relationship between AMH or FSH and L/B did not differ by age group.

ROC Analysis

We next evaluated how well our markers could correctly classify pregnancy outcomes. The areas under the receiver operating characteristic curves (AUCs) for AMH alone and FSH alone did not differ from random chance (p = 0.72 and p = 0.26, respectively) (Table 5). However, the AUC for age categories was significantly different from chance (AUC = 0.60 [95% CI: 0.54, 0.66]; p = 0.002). There were no statistically significant differences between the AUC from an age-only model and the AUCs of models that included AMH or FSH (Supplemental Table 1). Given these results, we did not pursue validation.

Table 5 Area under curve (AUC) for AMH, FSH, and age

Discussion

In this study, we report that age, but not AMH and/or FSH, was associated with a live birth outcome among a cohort of 270 women from a single IVF center. These findings extend the growing body of literature that highlights the limitations of these ovarian reserve markers in prognosticating pregnancy outcomes, and they contrast with the recent findings by Tal, which showed that AMH was associated with cumulative live birth rate independent of age [12]. A major strength of this study is that all patients were treated at the same institution, which helps to control for subtle differences in laboratory evaluation and specimen handling. Such differences have been noted as a limitation of previously published meta-analyses, which cannot accurately compare AMH levels between reported years and between various laboratories [8, 9].

Additionally, our findings corroborate the well-accepted observation that AMH is directly correlated with oocyte yield in patients undergoing ART, with higher AMH levels predicting a higher oocyte yield (p < 0.001) [6, 13]. Although one would infer that the higher the oocyte yield the greater the chance of a live birth, this has not been consistently observed in the published literature [14, 15]. Similarly, in the present study, while higher AMH was associated with a larger oocyte yield, it did not predict live birth even after adjusting for age (p = 0.52). Furthermore, the AUC for AMH as a marker for live birth did not differ from random chance (p = 0.72), and there was no significant difference between the AUC from an age-only model and the AUC of a model with age and AMH (p = 0.20). However, our data were limited to a single embryo transfer per patient over the chosen observation period, and it is possible that cumulative live birth rates might differ. Assessing this would require further follow-up of patients for these long-term outcomes.

FSH is the other commonly used serum ovarian reserve marker and is endorsed by the American Society for Reproductive Medicine (ASRM) as well as the American College of Obstetricians and Gynecologists (ACOG) as an acceptable and appropriate diagnostic tool for assessing diminished ovarian reserve in women seeking care for infertility [16, 17]. FSH has been shown to be a specific marker of declining ovarian function and has been suggested by some to predict the likelihood of pregnancy in women undergoing infertility treatment [18, 19] as well as the response to gonadotropin stimulation [20]. However, a meta-analysis by Broekmans et al. observed that the ability of basal FSH levels to predict pregnancy outcomes in patients undergoing IVF varied widely between the 37 studies analyzed, and that FSH seemed to be predictive of poor pregnancy outcomes only when it was extremely high (> 25 mIU/ml) [21]. Our finding that FSH is a poor independent predictor of live birth across all age groups is in agreement with some, but not all, prior work [22].

Our findings are limited by certain considerations. First, during the time period of the present report, CU ARM had implemented previously established age-specific ovarian reserve thresholds, with minimum AMH and maximum FSH cut-offs beyond which patients were not offered ART. These restrictions, while very rarely applied, precluded us from examining extremes of AMH and FSH that may carry a poorer prognosis, which limits the generalizability of the study. Furthermore, most patients studied were in a favorable age range (mean = 33.93 years old (SD: 4.12)) with good ovarian reserve (median AMH = 3.46 ng/mL, median day 3 FSH = 6.50 IU/L), again limiting the generalizability of the study to older age groups or those with diminished ovarian reserve. Additionally, the retrospective nature of this study inherently results in incomplete data despite the standardization in reporting required by SART CORS. In the present study, approximately 12.4% of all patient charts were excluded because of missing data, most commonly pregnancy outcome, and this could lead to reporting bias. We attempted to retrieve this missing information from the medical records, but it was inaccessible. There were also 40 transfers that were cancelled, most commonly due to an inadequate endometrial response (33/40, 82.5%), and these were therefore excluded from analysis. Finally, many other factors that were not studied in this analysis may play a role in a patient’s success with ART, including but not limited to race/ethnicity, socioeconomic factors, genetic factors, reason(s) for infertility, semen analysis, and/or medical comorbidities.

Despite these limitations, the findings from this study add to the growing body of literature regarding the limitations of ovarian reserve testing and the poor ability of AMH and FSH to predict live birth. These limitations, therefore, should be kept in mind when counseling infertility patients regarding their ART outcomes. Results from this study call into question the ovarian reserve thresholds currently in place at many reproductive endocrinology and infertility clinics, and more research is needed to clarify the ovarian reserve extremes, if any, at which offering ART may be futile.

Conclusion

In conclusion, both AMH and FSH are weak independent predictors of live birth outcomes. Adding AMH/FSH to predictive models that incorporate age does not improve the prediction of live birth. However, more research is needed to determine the appropriateness of such models in patients of older age groups and/or in those with poor ovarian reserve.