What does this study add to the clinical work

Our findings will guide further studies on the predictive ability of AMH and remind clinicians that higher AMH levels are linked to live birth more strongly in advanced-age women than in younger women.

Introduction

Age is one of the most important predictors of outcome in assisted conception. However, age alone cannot predict assisted reproduction technology (ART) outcome precisely. As a competitive candidate for ART outcome prediction, anti-Müllerian hormone (AMH) has been widely used as a marker of ovarian reserve and ovarian response. It is also strongly correlated with the number of oocytes retrieved from women undergoing ovarian stimulation [1,2,3,4,5]. Expressed by granulosa cells of pre-antral and antral follicles [6], AMH reduces both primordial follicle initiation and follicle sensitivity to follicle-stimulating hormone (FSH) by inhibiting aromatase [7]. Previous meta-analyses by Tal et al. and Iliodromiti et al., with 5373 and 6356 women, respectively, reported that AMH had weak ability to predict implantation and clinical pregnancy and poor accuracy in predicting live birth [8, 9]. Since then, numerous studies of the association between AMH and live birth after assisted conception have been conducted, and the evidence should be updated. Moreover, Wang and colleagues found that AMH had limited predictive value for IVF outcomes at the extremes of female reproductive age [10]. However, previous meta-analyses did not investigate the impact of age on the ability of AMH to predict live birth among women undergoing in vitro fertilization (IVF) or intracytoplasmic sperm injection (ICSI). In addition, results from women with low ovarian reserve were inconclusive because of the small sample size (n = 542 women) and needed to be substantiated in larger studies [8].

To further assess the predictive capacity of AMH for live birth in women undergoing IVF/ICSI and to provide insights on the modulating effect of age, we performed an updated systematic review and meta-analysis of all eligible studies. To explore the predictive ability of AMH for live birth among different subpopulations of infertile patients, we separately analyzed studies including only women with diminished ovarian reserve and those including women with unspecified ovarian reserve. In addition, we divided original data of enrolled studies according to age and evaluated the effect of age on AMH predictive capacity for live birth.

Methods

This systematic review and meta-analysis was conducted according to the PRISMA guidelines [11]. A well-established structured procedure was followed from the start of this study.

Eligibility criteria

Studies were included if they met the following criteria: (i) women of reproductive age undergoing IVF/ICSI cycles with any stimulation protocols; (ii) serum AMH was measured before ovarian stimulation; (iii) live birth outcome was recorded for all participants; (iv) any study design other than case reports. Additionally, studies referring to oocyte donation programs were excluded.

Literature search and selection strategy

The following databases were searched: PubMed, Embase, Medline, and Web of Science. The systematic search was performed using combinations of the following keywords: “live birth” (MeSH: live birth, pregnancy, ongoing pregnancy) and the keywords “anti–müllerian hormone”, “AMH”, “müllerian–inhibiting substance”, or “müllerian–inhibiting factor”. Studies published up to June 2021 were included, and there was no language restriction. Two researchers (N.J.L. and Q.Y.Y.) independently screened the abstracts of all identified studies. Any disagreement between the two researchers was resolved by discussion. If a study met the eligibility criteria, it was included in the systematic review. If a study provided data to construct a 2 × 2 table, in which a specific cut-off value of AMH level was related to live birth outcome, it was selected for final inclusion in the meta-analysis. If a study was chosen for the systematic review but had no extractable data, an email requesting the data was sent to the author. If the author did not reply, the study was not included in the meta-analysis. When a study did not report the required data directly but the data could be extracted using a plot digitizer, the study and the extracted data were included in the meta-analysis.

For each study, the first author, year of publication, number of cycles, number of patients, stimulation protocol, mean/median age of the patients, suggested cut-off point of AMH (converted to ng/ml using the conversion formula 1 ng/ml = 7.14 pmol/l), AMH assay used, number of live births below or above the cut-off point, study design and patient selection were extracted.
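The unit conversion described above can be sketched in a few lines. This is a minimal illustration that uses only the conversion factor stated in the text (1 ng/ml = 7.14 pmol/l); the function name and the example value are hypothetical and do not come from any included study.

```python
PMOL_PER_NG = 7.14  # factor stated in the text: 1 ng/ml = 7.14 pmol/l


def amh_to_ng_per_ml(value, unit):
    """Convert an AMH cut-off to ng/ml; `unit` is 'ng/ml' or 'pmol/l'."""
    unit = unit.lower()
    if unit == "ng/ml":
        return value
    if unit == "pmol/l":
        return value / PMOL_PER_NG
    raise ValueError(f"unsupported unit: {unit}")


# Hypothetical cut-off reported in pmol/l, harmonized to ng/ml
print(amh_to_ng_per_ml(10.0, "pmol/l"))
```

Harmonizing every cut-off to one unit before extraction keeps the cut-off ranges comparable across studies, although it does not remove differences between assays.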

Quality assessment

Each study was assessed with the QUADAS-2 checklist to judge the risk of bias and applicability of primary diagnostic accuracy studies [12]. The QUADAS-2 checklist consists of four main domains: patient selection, index test, reference standard, and flow and timing of each study. A funnel plot, which plots estimates of diagnostic accuracy against sample size, was constructed to visually assess the risk of publication bias. A linear regression of log diagnostic odds ratios on the inverse root of effective sample sizes was performed to quantitatively assess publication bias, where a non-zero slope coefficient (p < 0.10) suggests significant asymmetry and small-study bias. Because this meta-analysis used only published data obtained from online resources, no approval from the institutional review board was required.
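The regression-based asymmetry test described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it assumes the regression is weighted by the effective sample size, ESS = 4·n1·n2/(n1 + n2), and that 0.5 is added to every cell when a table contains a zero. The function name is hypothetical. The p-value would be obtained from a t distribution with the returned degrees of freedom; that step is omitted to keep the sketch free of external libraries.

```python
import math


def deeks_asymmetry(tables):
    """tables: list of (tp, fp, fn, tn) counts, one tuple per study (>= 3 studies).
    Returns (slope, standard_error, t_statistic, degrees_of_freedom)."""
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):  # assumed continuity correction for empty cells
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        n_birth, n_no_birth = tp + fn, fp + tn
        ess = 4 * n_birth * n_no_birth / (n_birth + n_no_birth)
        xs.append(1 / math.sqrt(ess))             # predictor: inverse root of ESS
        ys.append(math.log(tp * tn / (fp * fn)))  # outcome: ln DOR
        ws.append(ess)                            # assumed weight: ESS
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = len(tables) - 2
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / df / sxx)
    t_stat = slope / se if se > 0 else float("nan")
    return slope, se, t_stat, df
```

A slope close to zero means that small and large studies report similar diagnostic odds ratios, which is the pattern expected in the absence of small-study effects.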

Data analysis

The statistical analysis was performed using Stata/SE software (version 12.0, Stata Corp, USA). We built a 2 × 2 contingency table for each study consisting of true positives, false positives, false negatives, and true negatives, based on the agreement between live birth and the AMH level relative to the cut-off. Using the random effects model or fixed effects model with the metan command, the pooled estimate for live birth among participants with AMH below and above a cut-off point was calculated. A summary estimate of the diagnostic odds ratio (DOR) and its 95% confidence interval (CI) were generated. The DOR summarizes the diagnostic accuracy of the AMH test in a single number: it is the odds of AMH above a particular cut-off value among women with live births relative to the odds of AMH above the cut-off value among women without live births. Heterogeneity between studies was measured by I-squared. Heterogeneity was considered high when I-squared was greater than 50%, in which case its source was sought. Given that ethnicity may result in variation between studies [13], we further excluded studies of Chinese populations to minimize heterogeneity. Given that PCOS is related to AMH levels [14], we performed analyses according to whether studies had excluded women with PCOS.
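The pooling step can be illustrated with a short sketch. It is not a reproduction of the metan command: it assumes the DerSimonian-Laird estimator for the random effects model (the text does not name the estimator) and a 0.5 continuity correction for tables with an empty cell. All function names and the example tables are hypothetical.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Return (ln DOR, variance) for one 2x2 table."""
    if 0 in (tp, fp, fn, tn):  # assumed continuity correction
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log(tp * tn / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def pool_random_effects(tables):
    """DerSimonian-Laird pooling of ln DOR.
    Returns (pooled DOR, CI lower, CI upper, I-squared in percent)."""
    effects = [log_dor(*t) for t in tables]
    y = [e for e, _ in effects]
    v = [var for _, var in effects]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(tables) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return (math.exp(pooled), math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se), i2)


# Hypothetical 2x2 tables (tp, fp, fn, tn); not data from the included studies
print(pool_random_effects([(60, 80, 20, 60), (30, 50, 15, 45), (120, 200, 30, 150)]))
```

In this layout, tp counts women with AMH above the cut-off who had a live birth, fp those above the cut-off without a live birth, fn those below the cut-off with a live birth, and tn those below the cut-off without a live birth.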

To further explore the predictive capability of AMH for live birth in different populations, studies were categorized into those with unspecified ovarian reserve and those with diminished ovarian reserve. “Unspecified ovarian reserve” means that the ovarian reserve of the participants was not clearly stated. In addition, using the midas command, a summary receiver operating characteristic curve, sensitivity, specificity, and positive and negative likelihood ratios were generated by fitting a two-level mixed logistic regression model conditional on the sensitivity and specificity of each study, with a bivariate normal model for the logit transforms of sensitivity and specificity between studies. A hierarchical model was used to estimate the characteristics of the receiver operating characteristic (ROC) curve and the DOR. The hierarchical summary receiver operating characteristic (HSROC) curve and study-specific estimates were plotted together with a line of no discrimination. If the 95% prediction region reached the line of no discrimination, AMH was considered to have no predictive accuracy for live birth. For the stratified analysis, we built a new 2 × 2 contingency table in which live birth results were categorized by AMH level and age. Thus, studies that strictly categorized participants by age or were limited to women of advanced age were included in the stratified analysis. Similarly, the random effects model, hierarchical model and HSROC were used in the stratified analysis. No specific cut-off value of age was used. Using the macro “METADAS” in SAS 9.4, we further used the hierarchical model to assess the statistical independence of AMH by observing the relative diagnostic odds ratio (RDOR) after adjusting for age and AMH assay. The median or mean age of each study was used and treated as a continuous variable. The DSL and GEN II assays were coded as dummy variables. If the 95% CI of the RDOR included 1, the DOR of AMH was considered statistically independent of the adjusted covariate.
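The idea behind the RDOR can be illustrated with a deliberately simplified sketch. It is not the hierarchical model fitted by METADAS: it assumes a fixed-effect, inverse-variance weighted regression of ln DOR on a single study-level covariate (for example, mean age), so that the exponentiated slope is the RDOR per one-unit increase in the covariate. Function names and the example inputs are hypothetical.

```python
import math


def rdor_per_unit(log_dors, variances, covariate):
    """Fixed-effect meta-regression of ln DOR on one study-level covariate.
    Returns (RDOR, CI lower, CI upper) per one-unit increase in the covariate."""
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, log_dors)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, log_dors))
    slope = sxy / sxx
    se = math.sqrt(1 / sxx)  # within-study variances are treated as known
    return (math.exp(slope), math.exp(slope - 1.96 * se),
            math.exp(slope + 1.96 * se))


def independent_of_covariate(ci_low, ci_high):
    """Decision rule from the text: a 95% CI that includes 1 indicates independence."""
    return ci_low <= 1 <= ci_high


# Hypothetical study-level inputs: ln DOR, its variance, and mean age in years
print(rdor_per_unit([0.9, 0.7, 0.8, 0.6], [0.10, 0.05, 0.08, 0.12], [30, 33, 36, 39]))
```

A hierarchical model additionally allows for between-study variation, so its confidence intervals would generally be wider than those from this fixed-effect sketch.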

Results

Search results

The systematic search retrieved a total of 880 articles through PubMed, Embase, and Web of Science. After title and abstract screening, 104 articles were selected and 39 were included in the systematic review (Fig. 1). Five of these were excluded from the meta-analysis because extraction of relevant data was not possible even after contacting the authors [10, 15,16,17,18]. One study [19] was excluded because its data were included in another study [20], which contributed to the meta-analysis. Two articles were excluded because the participants had not undergone IVF/ICSI cycles [21, 22]. Two studies were excluded from the meta-analysis because their original data had been obtained from the SART CORS database and may have overlapped with other included studies [23, 24]. Two studies were excluded because they lacked a cut-off value of AMH level [25, 26]. Among the remaining studies, one was published in French [27], two enrolled only women of advanced age [28, 29], and four excluded women with polycystic ovary syndrome (PCOS) [30,31,32,33]. Five studies categorized participants not only by serum AMH level but also by age group [33,34,35,36,37], which allowed us to extract original data stratified by age for further analysis. Finally, 27 studies were included in the quantitative meta-analysis. The characteristics of the studies included in the meta-analysis are listed in Table 1.

Fig. 1 Flow diagram of the systematic review and meta-analysis: search and study selection

Table 1 Characteristics of the studies included in the meta-analysis

Accuracy of AMH in prediction of live birth

We present data on 27,911 IVF or ICSI cycles from 27,029 women. First, the univariate pooled DOR of all 27 studies for AMH predicting a live birth was 2.14 (95% CI 1.85–2.48) (Fig. 2). The estimated I-squared was 73.0%, suggesting high heterogeneity between the studies. To reduce the heterogeneity, we first categorized the studies by ovarian reserve, into those of women with diminished ovarian reserve (n = 2981) and those of women with unspecified ovarian reserve (n = 24,048). The pooled DOR among women with unspecified ovarian reserve was 2.15 (95% CI 1.88–2.45) (Fig. 3a). The estimated I-squared was 62.3%. The pooled DOR for women with expected low ovarian reserve was 2.45 (95% CI 1.19–5.02). The estimated I-squared was 84.2% (Fig. 3b). The subgroup analysis by ovarian reserve did not lower the heterogeneity between studies. The pooled DOR of studies that had not excluded women with PCOS was 2.29 (95% CI 2.05–2.58) (I-squared 37.5%) (Supplementary data, Fig. S1), while the pooled DOR of studies that had excluded women with PCOS was 1.52 (95% CI 0.91–2.52) (I-squared 90.0%) (Supplementary data, Fig. S2).

Fig. 2 Forest plot of diagnostic odds ratio (DOR) of all 27 studies

Fig. 3 a Forest plot of diagnostic odds ratio (DOR) of all 21 studies including women with unknown ovarian reserve before being included in the pooled studies. b Forest plot of diagnostic odds ratio (DOR) of all six studies including women with low ovarian reserve before being included in the pooled studies. c Forest plot of diagnostic odds ratio (DOR) of six studies including women with younger age before being included in the pooled studies. d Forest plot of diagnostic odds ratio (DOR) of ten studies including women with advanced age before being included in the pooled studies

Conducted through a hierarchical logistic regression model, the overall DOR was 2.19 (95% CI 1.85–2.58). For women with unspecified ovarian reserve, the DOR was 2.21 (95% CI 1.90–2.56). For women with low ovarian reserve, the DOR was 2.49 (95% CI 1.26–4.90). The DOR of studies which had not excluded women with PCOS was 2.28 (95% CI 1.97, 2.65), while pooled DOR of studies which had excluded women with PCOS was 1.49 (95% CI 0.93, 2.39).

The hierarchical summary receiver operating characteristics (HSROC) were plotted to predict live birth with respect to ovarian reserve (Fig. 4a, b). For all studies, the summary receiver operating characteristics did not cross the no-discrimination line while the 95% CIs were on the margin. The summary estimates of overall 27 studies for AMH and live birth were sensitivity of 78.1% (95% CI 70.4–84.3%) and specificity of 38.0% (95% CI 30.3–46.3%). The AUC was 0.61 (95% CI 0.56–0.65). The summary estimates of 21 studies of those with unspecified ovarian reserve were sensitivity of 82.6% (95% CI 76.5–87.4%) and specificity of 31.7% (95% CI 25.7–38.4%). The AUC was 0.59 (95% CI 0.54–0.63). The summary estimates of 6 studies of those with low ovarian reserve were sensitivity of 60.5% (95% CI 35.0–81.3%) and specificity of 61.9% (95% CI 43.9–77.1%). The AUC was 0.65 (95% CI 0.61–0.69). The summary estimates of studies which had not excluded women with PCOS were sensitivity of 80.1% (95% CI 73.0–85.8%) and specificity of 36.1% (95% CI 28.0–45.1%). The AUC was 0.63 (95% CI 0.59–0.67) and the confidence region did not cross the no-discrimination line. The summary estimates of studies which had excluded women with PCOS were sensitivity of 60.5% (95% CI 37.1–79.9%) and specificity of 49.4% (95% CI 32.8–66.1%). The AUC was 0.55 (95% CI 0.51–0.60), but the confidence region crossed the no-discrimination line (Supplementary data, Fig. S3). After adjusting for age, DSL assay and GEN II assay, the RDOR was 0.97 (95% CI 0.67, 1.40), 0.76 (95% CI 0.52, 1.10) and 1.26 (95% CI: 0.91, 1.75), respectively, indicating statistical independence of AMH from age and AMH assay used.
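As a consistency check, a DOR can be derived from any pair of summary sensitivity and specificity, because DOR = [sens / (1 − sens)] / [(1 − spec) / spec]. The sketch below applies this identity to the summary estimates reported above; the function name is hypothetical, and small differences from the model-based DORs are expected because the published estimates are rounded.

```python
def dor_from_sens_spec(sens, spec):
    """DOR = odds of a positive test with live birth / odds without live birth."""
    return (sens / (1 - sens)) / ((1 - spec) / spec)


# Summary sensitivity and specificity as reported in the text
summary = {
    "all 27 studies": (0.781, 0.380),
    "unspecified ovarian reserve": (0.826, 0.317),
    "low ovarian reserve": (0.605, 0.619),
}
for label, (sens, spec) in summary.items():
    print(f"{label}: DOR = {dor_from_sens_spec(sens, spec):.2f}")
```

The derived values are approximately 2.19, 2.20 and 2.49, which agree closely with the hierarchical-model DORs of 2.19, 2.21 and 2.49.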

Fig. 4 Hierarchical summary receiver operating characteristic curve (HSROC) of AMH in the prediction of live birth after IVF/ICSI with 95% confidence region, 95% prediction region and diagonal line of no discrimination. a Women with unknown ovarian reserve. b Women with low ovarian reserve. c Women with younger age. d Women with advanced age

Modulating effect of age

Eleven studies (Table 2) were selected to perform stratified analysis and were categorized into those with advanced age (n = 4479 women) and those with younger age (n = 11,087 women). The univariate pooled DOR of studies with advanced ages for AMH predicting a live birth was 2.15 (95% CI 1.47–3.15) (Fig. 3c). The estimated I-squared was 58.1%, suggesting a moderate heterogeneity. The pooled DOR of studies with younger ages was 1.97 (95% CI 1.51–2.58) (Fig. 3d). The estimated I-squared was 47.3%, suggesting a moderate heterogeneity.

Table 2 Characteristics of the studies included in the analysis

Conducted through a hierarchical logistic regression model, the DOR of those with advanced ages was 2.25 (95% CI 1.62–3.12). For those with younger ages, the DOR was 1.41 (95% CI 0.99–2.02).

The hierarchical summary receiver operating characteristics (HSROC) were also plotted to predict live birth with respect to age groups (Fig. 4c, d). The summary estimates of studies with advanced ages for AMH and live birth were sensitivity of 77.1% (95% CI 62.0–87.4%) and specificity of 40.0% (95% CI 25.4–56.6%). The AUC was 0.63 (95% CI 0.59–0.67). The summary estimates of studies with younger ages for AMH and live birth were sensitivity of 89.7% (95% CI: 83.5–93.6%) and specificity of 14.0% (95% CI 6.7–26.6%). The AUC was 0.63 (95% CI 0.58–0.67).
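To show what these summary estimates imply for an individual patient, the sketch below converts sensitivity and specificity into likelihood ratios and then into a post-test probability. The sensitivity and specificity values are those reported above; the pre-test probability of 0.30 is purely hypothetical and chosen only for illustration, and the function names are not from the source.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    return sens / (1 - spec), (1 - sens) / spec


def post_test_probability(pre_test, lr):
    """Update a pre-test probability with a likelihood ratio via odds."""
    post_odds = pre_test / (1 - pre_test) * lr
    return post_odds / (1 + post_odds)


strata = {"advanced age": (0.771, 0.400), "younger age": (0.897, 0.140)}
PRE_TEST = 0.30  # hypothetical live birth probability, for illustration only
for label, (sens, spec) in strata.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(label,
          f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f},",
          f"post-test (AMH above cut-off) = {post_test_probability(PRE_TEST, lr_pos):.2f}")
```

A positive likelihood ratio of about 1.3 in the advanced-age stratum and about 1.04 in the younger stratum shifts the probability of live birth only slightly, which is consistent with the limited discrimination indicated by the AUC values.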

Heterogeneity resulting from human races

Serum AMH levels may differ between Chinese and Caucasian women [13]. Therefore, to further lower the heterogeneity, we removed race as a possible source. Five cohorts of Chinese participants, all of which belonged to the studies of women with unspecified ovarian reserve, were excluded. After this exclusion, the pooled DOR among women with unspecified ovarian reserve was 2.15 (95% CI 1.94–2.38) (I-squared 27.8%). However, the pooled DOR for the five Chinese cohorts was 1.90 (95% CI 1.35–2.68) (I-squared 87.1%). The pooled DOR among women with advanced ages was 2.08 (95% CI 1.22–3.53) (I-squared 42.0%). The pooled DOR among women with younger ages was 1.69 (95% CI 1.02–2.81) (I-squared 51.0%) (Supplementary data, Fig. S4).

Study quality assessment and publication bias

The quality assessment of the 27 selected studies is presented as the percentage of high, low or unclear risk of bias in each domain assessed by the QUADAS-2 tool (Supplementary data, Fig. S5). Most studies reported live birth per transfer cycle start or per patient [27, 29,30,31,32,33,34,35,36, 38,39,40,41,42,43,44,45,46,47,48,49,50], one study reported live birth per ovum retrieval [28] and three reported the cumulative live birth rate [20, 37, 51]. Most studies measured AMH using the Beckman Coulter Generation II (GenII) assay [20, 29,30,31,32,33, 36, 45, 50, 51], nine studies used the Diagnostic System Laboratories (DSL) assay [28, 34, 38,39,40,41, 43, 44, 52], three studies used the Immunotech-Beckman Coulter (IBC) assay [27, 42, 46], two studies used the Roche kit [47, 48] and one study used the Ansh Lab ELISA kit [37]. Selection bias was present in the majority of the studies. Six studies included only women with low ovarian reserve, identified by advanced age, high FSH or low AMH [28,29,30, 33, 40, 44]. Seven studies excluded women with polycystic ovary syndrome [27, 30,31,32,33, 39, 42], and one study excluded couples with severe male factor infertility [42]. Publication bias was assessed using the funnel plot (Supplementary data, Fig. S6). The funnel plot for live birth suggests asymmetry, indicating that studies with smaller sample sizes or results lacking statistical significance may be missing. However, the statistical test for publication bias did not reach statistical significance (p = 0.118).

Discussion

This systematic review and meta-analysis of 27 studies (27,029 women) summarized the current evidence regarding the predictive ability of AMH for live birth among women undergoing IVF or ICSI. It suggested that AMH has some association with live birth but that its predictive ability is weak. High heterogeneity undermined the reliability of the results from the random effects model. In the hierarchical model, the pooled DOR among 24,048 women with unspecified ovarian reserve was 2.21, whereas the AUC was 0.59. Among 2981 women with diminished ovarian reserve, AMH had better but still small predictive ability, with a DOR of 2.49 and an AUC of 0.65. The HSROC curve and 95% CIs of the pooled data for women with unspecified ovarian reserve did not cross the line of no discrimination, indicating that AMH has some value in predicting live birth among these women. In addition, the 95% prediction region, which represents the region expected to contain the true sensitivity and specificity of a future study, did not cross the line of no discrimination either. This indicates that the predictive value of AMH in a future study is expected to fall within the prediction region. Nonetheless, among women with diminished ovarian reserve, the HSROC curve and 95% CIs of the pooled data, along with the prediction region, crossed the no-discrimination line, suggesting that AMH was not a suitable predictor of live birth in women with diminished ovarian reserve. In the analysis stratified by age, the pooled DOR among 5082 women with advanced age was 2.24, whereas the AUC was 0.62. The pooled DOR among 11,087 women with younger age was 1.40, whereas the AUC was 0.53. The HSROC curve and 95% CIs of the pooled data from studies of women with advanced age did not cross the no-discrimination line, suggesting that AMH has some value in predicting live birth among advanced-age women.
For women with younger age, the HSROC curve and 95% CIs of the pooled data intersected the no-discrimination line, indicating that AMH has no role in predicting live birth among younger women. To lower the heterogeneity between studies, five Chinese cohort studies were removed from the pooled analysis, which did not change the DOR dramatically but did change the I-squared of the random effects model. This suggests that race could be a source of heterogeneity between studies. Women with PCOS were excluded in four studies with different study designs. The pooled estimate from the studies that had not excluded PCOS patients was not materially different from the DOR obtained from all 27 studies. The heterogeneity fell from 73.0% to 37.5%, indicating that the studies that ruled out PCOS may be a source of heterogeneity. In our results, AMH predicts live birth better in women with advanced age than in those with younger age, while the prevalence of PCOS is higher in younger women [14], suggesting that the inclusion of women with PCOS is unlikely to confound our results for women with advanced age. The DOR from the four studies that had excluded women with PCOS was not statistically significant and showed high heterogeneity, suggesting that the estimate was not reliable and should not be interpreted. More studies of women without PCOS are warranted to elucidate the predictive ability of AMH for live birth.

The predictive value of AMH for ART outcomes has been studied in recent years. AMH tends to have a weak predictive ability for implantation and clinical pregnancy [9]. Consistent with our results restricted to women with unspecified ovarian reserve, a meta-analysis demonstrated that AMH had a small predictive effect on live birth [8]. In contrast, the existing studies did not support a predictive effect of AMH in women with low ovarian reserve; hence the results of the 2014 meta-analysis could not be substantiated. The discordance between the ovarian reserve categories may result from oocyte quality. While some studies showed an independent value of AMH for live birth in patients with low ovarian reserve [15, 40, 42, 45, 52, 53], Pereira et al. showed that AMH was not associated with live birth rates in patients aged under 35 years with diminished ovarian reserve [30]. These findings, along with our results, suggest that the predictive effect of AMH for live birth depends not only on ovarian reserve but also on oocyte quality, both of which decline with age [54]. However, several studies did not find an association between serum AMH and oocyte or embryo quality [49, 51, 55,56,57,58,59,60,61,62,63,64,65,66,67], while others found a positive association [16, 41, 43, 68,69,70,71,72,73,74,75]. An animal study demonstrated that AMHR II is expressed in both oocytes and cumulus cells and that supplementation of IVM medium with 100 ng/ml of rh-AMH together with FSH and EGF improves oocyte quality [76]. Two studies reported a positive association between follicular fluid AMH and oocyte or embryo quality [58, 77]. Taken together, serum AMH may not be strongly associated with oocyte quality, but oocyte quality was positively associated with AMH in the environment surrounding the oocytes.

The predictive ability of AMH for live birth was found to be modified by age in the current analysis. AMH had better predictive ability for live birth in women with advanced age. However, evidence on the effect of age on the association between AMH and live birth remains contradictory. Several studies demonstrated that age and AMH are independently associated with live birth [4, 38, 40,41,42, 78]. Goswami et al. found that AMH level better predicts live birth following IVF in older women and has limited predictive value in women aged below 35 years, which is consistent with our results [79]. Wang et al. found a positive relationship between serum AMH levels and IVF pregnancy outcomes, and the association was modulated by age [10]. Animal studies showed that AMH remained constant in young mice despite increasing age and a declining number of primordial follicles, while AMH reflected the reserve of primordial follicles in older mice. Meanwhile, AMH is associated with growing follicles at all ages [80]. Across all ages in humans, AMH rose to a maximum by 15.8 years of age and then remained stable until 25 years of age, when it started to decline [81]. Taken together, AMH and its association with live birth are not stable across ages. The modulating effect of age on the association between AMH and live birth may suggest that the decline in oocyte quality can be partially compensated by drawing on an excess ovarian reserve.

In our analysis, race was found to be a source of heterogeneity in the pooled univariate analysis, which had been underestimated in previous meta-analyses [8, 9]. Heterogeneity decreased substantially after the removal of Chinese cohorts in the univariate analyses, especially in women with unspecified ovarian reserve and in women with advanced ages, but not in women with younger ages. This indicates that the association between AMH and live birth in Chinese women may differ from that in women in western countries, that is, Caucasian women. Healthy Chinese women initially showed higher AMH levels than European women but tended to have significantly lower AMH concentrations than European women after age 25 [82]. Caucasian women had consistently higher AMH levels than all other ethnic groups until age 35 [83]. These findings may suggest that AMH levels decline faster in Chinese women than in Caucasian women, which is in line with our results. They also suggest that racial differences can contribute to heterogeneity among studies on AMH. The predictive power of AMH for live birth may also vary by race, which should be taken into account in future studies.

Implications for clinical practice

The clinical implication of the present findings is that AMH, interpreted within the relevant group of women, provides additional information for advanced-age couples considering assisted reproduction. However, its diagnostic accuracy for live birth remains poor, and AMH cannot be treated as a diagnostic test for live birth. In addition, clinicians should bear in mind that higher AMH levels are linked to live birth more strongly in advanced-age women than in younger women. Although age is a strong determinant in assisted reproduction, the effect of race should not be neglected when judging a patient’s possible outcome from her AMH level. The sensitivity, specificity, AUC, DOR and cut-off value ranges in Table 3 could serve as references for clinical practice according to the subgroup population. However, an optimal cut-off value of AMH level could not be calculated in our meta-analysis because of the lack of individual patient data.

Table 3 Subgroup analysis and stratified analysis

Implications for future research

A previous meta-analysis found that the ability of AMH to predict live birth is independent of age [8], while we found that its predictive ability appeared better in women with advanced ages. This suggests that future prediction models should include AMH as a covariate with care, taking the effect of age into account. Among advanced-age women, we observed a higher predictive ability of AMH for live birth. We speculate that AMH may also be related to the higher miscarriage rate in older patients. To determine the predictive ability of AMH for miscarriage, the literature on AMH and miscarriage needs to be reviewed. In addition, race was associated with significant heterogeneity between studies in the current analysis and should be treated with caution as a confounder in future studies on AMH. Inclusion of AMH may improve the predictive performance of existing models. However, careful calibration and adjustment for age and race will be needed to increase the reliability and validity of live birth prediction.

Strengths and limitations

This is an updated meta-analysis presenting pooled data from numerous cycles to assess the predictive ability of serum AMH for live birth after IVF/ICSI, with a focus on the possible modifying effects of age categorization and race. The strengths of this review lie in a comprehensive literature search without language restriction, compliance with recent guidelines [84], and robust statistical analysis. Although systematic literature review and meta-analysis is a robust way of generating a more powerful estimate of the true effect size with less random error than individual studies, it comes with limitations, bias, and heterogeneity inherited from the original studies. First, the heterogeneity between studies needs to be addressed, as it may affect the justification for pooling the data into one analysis. In the present meta-analysis, heterogeneity may have arisen from stimulation protocols, AMH thresholds, AMH assays, and other baseline characteristics. For the stratified analysis by age, heterogeneity may have arisen from the age threshold. The statistical estimate of heterogeneity was high in the random effects model, so those results could not be interpreted directly. Removal of Chinese cohorts and of studies that ruled out PCOS patients lowered the heterogeneity to an acceptable level. In addition, HSROC analysis accounts for the full range of variation in the data, distinguishing within-study from between-study variability and systematic from random variation [85]. Second, bias may have been introduced by studies that were not included. We excluded studies because data could not be extracted even after contacting the authors [10, 15,16,17,18], and because their data were subgroups of other studies [20, 39].
The three largest of these studies showed a positive association between AMH and live birth (n = 1558 [10]; n = 1152 [20]; n = 609 [18]), while the others, with small sample sizes, showed a null association (n = 213 [39]; n = 128 [15]; n = 83 [16]; n = 192 [17]). In addition, the asymmetry in the funnel plot for this meta-analysis suggested that small studies with negative or statistically non-significant results may be missing. Studies with smaller sample sizes are generally conducted with less methodological rigor. Statistical error from sample size is estimated by re-parameterization of the asymptotic estimator. Egger’s test was used to assess the statistical significance of the sample size effect, and the test was not statistically significant. Another limitation was the use of different cut-off values for AMH among studies. However, a single threshold should not be used because of differing characteristics such as AMH assay and race. The studies included in the meta-analysis reported AMH according to the DSL, GEN II, IBC, Roche, or Ansh Lab kit assays. As the DSL and IBC assays do not give comparable values because they use different pairs of monoclonal antibodies, the conversion formula IBC = 2.02 × DSL has been used to convert DSL assay data into IBC values in data aggregation studies [86]. The AMH value measured with the GEN II assay has been shown to be significantly lower than that measured with the DSL assay [87]. A good correlation was found between the Roche AMH and Gen II ELISA methods over the entire measuring range [88]. The Ansh Lab kit assay was found to have performance characteristics similar to those of the GEN II assay [89, 90]. However, we adjusted for the AMH assays in the hierarchical model and the result was not materially changed. Consistent with our result, a previous meta-analysis showed only a slight change in DOR for both women of unspecified ovarian reserve and all women after adjustment for AMH assay [8].
This may suggest that the predictive value of AMH for live birth does not depend on the assay used. The included studies measured serum AMH at different time points, which may confound the results. However, AMH has been shown to be stable throughout the menstrual cycle [86, 91, 92]. We could not obtain an optimal AMH cut-off value because of the lack of individual patient data. A pooled analysis of individual patient data from the studies is needed to obtain an optimal cut-off value.

Conclusion

Based on the current evidence, we found that AMH had limited value in predicting live birth, and that this value was modified by age and influenced by race. Although the 95% confidence region and the 95% prediction region did not cross the no-discrimination line, the predictive ability was still limited and should not be overestimated. This study did not aim to identify an applicable AMH threshold for determining the likely live birth outcome among women undergoing assisted conception, but to provide evidence for future research. Thus, AMH may have some clinical value in counseling women undergoing fertility treatment regarding their live birth outcome, particularly those with advanced age.