
Introduction

Progress

For pre- and perimenopausal women, currently available clinical ovarian reserve tests (ORTs) provide important new benefits (Table 5.1). Over the last decade these benefits have been driven primarily by advances in clinical research, much of which incorporates antral follicle count (AFC) and serum biomarkers such as anti-Müllerian hormone (AMH). In the field of assisted reproductive technologies (ART), progress in ORT clinical research has improved clinical practice with respect to prediction of ovulatory response [1–6] and optimization of oocyte retrieval, using ORTs that can help dose medications more efficiently [7] and minimize side effects such as ovarian hyperstimulation syndrome (OHSS) [8, 9]. Although ORTs are not ready for routine, general use, women can now obtain, under appropriate guidance by a fertility specialist, noninvasive and widely accessible ORTs that provide clinically useful general information regarding their egg supply and the likelihood of menopause occurring earlier than the population average [10–13]. The recent advances with ORTs have far-reaching implications for improvements in medical care through earlier detection of primary ovarian insufficiency (POI) and polycystic ovary syndrome (PCOS), counseling regarding use of fertility preservation, assessment of ovarian injury from surgery or medications such as chemotherapy, and monitoring of ovarian-related cancers [14]. These advances can, therefore, directly improve the quality of life for many women and their partners through better medical management as well as more informed decision making across a wide spectrum of medical topics.

Table 5.1 Overview of clinical applications of ORTs

Challenges

Although correctly interpreted ORT results currently have great potential to benefit patients, there remains a significant risk that the results may be misinterpreted by either the clinician or the patient. Furthermore, the medical community remains far from reaching consensus regarding the use of ORTs [15]. The causes of this concerning lack of consensus can be grouped into two major areas: (1) lack of standardization and (2) lack of tests that can assess egg quality. In the generation and application of ORT results, major challenges exist in standardizing testing materials and methods and in the widespread definitional differences used in research and clinical care with respect to the phenotypes of patients tested, medical indications, clinical outcomes managed, and selected diagnostic cut points. Overcoming these challenges is further hampered by the testing technology itself, which currently demonstrates strong prediction only of oocyte quantity, not oocyte quality, although both are needed for full assessment of ovarian reserve and chances of pregnancy. Underlying the difficulty in connecting ORT results to oocyte quality is that oocyte quality is ultimately proven by the success of an oocyte in developing into a healthy baby, a process that requires many other factors in addition to oocyte quality.

Definitions

Ovarian Reserve

A woman’s reproductive potential, as determined by her oocyte quantity and quality, is often defined as her ovarian reserve. Although multiple factors contribute to a woman’s ability to have a baby, an assessment of ovarian reserve allows approximation of a woman’s fertility potential as it relates to the contribution from her oocytes. However, currently, no test can definitively determine how many oocytes a woman has and/or which oocytes are capable of producing an embryo that can become a healthy baby. Therefore, ovarian reserve is functionally defined in the literature by those clinical outcomes that can be measured. The advent of the ART field has provided an artificial circumstance that allows measurement of a wide variety of clinical outcome parameters that cannot be measured in natural reproduction. In fact, until recently [16, 17], available tests of ovarian reserve were not validated against any natural fertility parameter but were mainly calibrated to surrogates of ovarian quantity alone obtained from ART treatment outcomes, such as oocytes retrieved through controlled ovarian stimulation (COS). Although future studies may prove otherwise, when compared to age alone, ORTs have not consistently demonstrated a substantially superior ability to predict the chance of spontaneous conception or live birth rate with fertility treatment.

Oocyte Quantity

Although a question has recently been raised as to whether human oocytes may be regenerated later in life [18], most data support the concept that oocyte supply is set at birth and is depleted over time [19]. Ironically, as the number of oocytes is not actually measurable directly without removing and dissecting the ovaries, clinical measurements of oocyte quantity are defined qualitatively. Sonographic assessment of the number of growing follicles appears to correlate with the total number of oocytes as quantified histologically [20]. Another quantifiable, clinically available measure of oocyte quantity is the number of oocytes retrieved through COS. In order for response to COS to provide a reasonable assessment of ovarian reserve, gonadotropins must be administered at doses chosen to achieve an oocyte number that maximizes live birth rate without undue risk of OHSS. The number of oocytes retrieved during an IVF attempt functions as a surrogate to approximate the number of remaining oocytes in a woman [4, 6, 21–24].

Oocyte Quality

Oocyte quality generally refers to the ability of an oocyte to perform its primary function: to produce a healthy baby in conjunction with the genetic material supplied by a sperm. However, the creation of a healthy baby involves a multitude of factors such that the oocyte plays the classic scientific “necessary but not sufficient” role. It is currently difficult to independently measure and accurately quantify the non-oocyte contributions that are required to have a healthy baby, such as sperm or endometrial quality. Furthermore, when fewer live births occur than embryos transferred to the recipient woman, it has historically not been possible to definitively link an individual oocyte to the individual baby born. Recently, however, the increased use of elective single embryo transfer and “genetic fingerprinting” of each embryo prior to transfer allows individual assessments of oocytes or embryos to be linked to their outcome [25–27]. Currently, oocyte quality for a woman is not assessed at the individual oocyte level but generally is inferred from calculated rates or averages from clinical endpoints such as fertilization rate, blastocyst formation rate, morphologic assessment of embryo quality, implantation rate, and live birth rate.

Cumulative Live Birth Rate/Total Reproductive Potential

As a concept, the number of oocytes available that are capable of producing a healthy baby can be extended to larger concepts such as cumulative live birth rate or total reproductive potential. There is a growing sentiment that the live birth rate per cycle has perhaps been overemphasized as a measure of fertility treatment success, and that more focus should instead be placed on the cumulative chance of live birth over a course of treatment, which may include multiple cycles of intrauterine insemination or multiple fresh and frozen cycles of IVF [28]. The term “total reproductive potential” has been introduced and is defined as the chance of live birth from one ovarian stimulation and oocyte retrieval, including the pregnancies from all fresh and all frozen embryo transfers associated with this ovarian stimulation [29]. Nearly all publications to date that have examined the prognostic value of ORTs for IVF cycles have focused on the live birth rate from one fresh transfer, not the cumulative live birth rate over multiple fresh and frozen cycles, nor the total reproductive potential from a single cycle. It would be valuable for future studies examining the prognostic value of ORTs to also assess cumulative live birth rate and/or total reproductive potential as outcomes of interest.

Modalities of Ovarian Reserve Testing

Broadly speaking, in conjunction with a proper history and physical exam, there are at least three common modalities of ORTs: imaging, biomarker testing, and ovarian response itself (Table 5.2).

Table 5.2 Qualitative overview of ORT correlation strength to various clinical outcomes

Imaging

Ultrasonography is the imaging modality of choice for testing ovarian reserve, generally via a transvaginal ultrasound probe, which can provide ovarian volume measurements or antral follicle counts (AFC). AFC is the more commonly used metric in the literature and identifies follicles generally from 2 to 10 mm in diameter [30]. Although AFC is most frequently obtained by manually counting follicles, there are efforts to automate the processing of the images to provide count and volume measurements, with the thought that there would be less user-dependent variability [31, 32]. Ovarian volume has also been considered as a potential ORT, but studies demonstrate that it is not as predictive of ovarian response as AFC [5, 30, 33].

Biomarkers

Biomarker testing primarily involves biochemical evaluation of the hypothalamic-pituitary-ovarian (HPO) axis. A frequently used biomarker, historically and currently, is follicle stimulating hormone (FSH), which is secreted by the pituitary and is well known to begin to rise early in the menstrual cycle to stimulate follicles to mature and become candidates for ovulation [34]. Excess FSH secretion and follicle stimulation are prevented through subsequent FSH suppression by rising levels of estradiol from the developing follicles, as well as by the glycoprotein hormone inhibin B, which is produced by granulosa cells of pre-antral and antral follicles [35]. FSH secretion may vary widely from cycle to cycle (perhaps warranting the nickname “Fluctuating Severely Hormone”), with the prognostic value of the test being greatest at the highest values [36, 37]. This fluctuation creates the problem that FSH may often be falsely reassuring regarding the status of ovarian reserve [38]. Anti-Müllerian hormone (AMH) is, like inhibin B, a glycoprotein secreted by granulosa cells, but it comes from early stage follicles and acts to inhibit FSH effects on the follicle [39]. AMH differs from FSH and inhibin B in that levels during the menstrual cycle remain fairly constant when averaged across a population [40–44]. However, it should be emphasized that within individuals, there can be significant changes in AMH levels within a cycle [45]. While AMH variability is clinically significant (perhaps also deserving a nickname, “Also Meandering Hormone”), AMH shows less variability than most other ORTs when remeasured. Lastly, AMH has been shown at a population level to decline gradually in an almost linear fashion [46–49], while FSH is known to remain relatively constant or rise slowly until a rapid rise is observed in the perimenopausal stage [19].
An important area of research is to determine within individuals what patterns of AMH decline exist which underlie the gradual age-dependent decline in average AMH values observed at a population level. It also is possible that at some point in the future, genetic markers such as FMR1 will also be tested more routinely to help predict whether a woman is at risk for development of premature depletion of oocyte supply [50, 51].

Ovarian Response

Patient response can be incorporated into the diagnostic process by combining medication with multiple biomarker measurements, an approach referred to as dynamic or provocative testing. In addition, the actual outcome of an ART cycle itself has been reported to predict future response in certain patient populations. Commonly cited dynamic tests include the clomiphene citrate challenge test (CCCT), which measures serum FSH just before and after 5 days of clomiphene treatment beginning on cycle day 5, and the exogenous FSH ovarian reserve test (EFORT), which measures serum FSH and/or inhibin B just prior to administration of FSH on cycle day 3 and again 24 h later [1, 5, 52–54]. Attempting to incorporate patient response into the diagnostic assessment is expensive and logistically difficult, which likely has decreased the prevalence of the use of this modality. However, ultimately, the number of high quality oocytes retrieved in COS may be considered one of the major clinical outcomes of interest and the closest surrogate for quantitative aspects of ovarian reserve. Thus, the patient’s response to COS itself serves as a helpful modality to assess ovarian reserve [55, 56].

Multivariate Approaches

As more ORTs become available and more patient subphenotypes are defined, the clinician is faced with an increasing number of variables. This raises several questions: which tests are most predictive of the outcome of interest, whether several tests are better than one, and how the tests should be weighted. The reality is that each clinician uses a multivariate approach when making daily decisions, often referred to as the “art of medicine.” The clinician must intuitively weight dozens of variables contained in the past medical history, age, and physical exam together with the ORT results, but without clear data about how many of these inputs change accuracy. Attempts are now being made to improve the performance of ORTs by combining them mathematically in algorithms that allow optimized weighting and produce clinically usable information [31, 57–61]. The same issues that prevent consensus with single ORT use are magnified with the use of index scores and multivariate approaches, which makes it even more difficult to compare studies. Currently, the gains shown by published studies that use ORTs and age to predict COS response and success of ART treatments are modest at best, and the studies reach conflicting conclusions. Meta-analyses that seek to combine data from multiple centers and laboratories can be problematic given the heterogeneity of the testing methods, patients, and treatment protocols, and it is not surprising that they obtain results that show poor associations [62–64]. Yet, if multivariate models are used to synthesize consistent ORT methodology, patient populations, and treatments, it is quite possible that the information obtained from combining biomarkers, imaging techniques, and genetic variants will be more informative and easier to apply clinically.
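The idea of combining several ORTs with explicit weights can be sketched with a logistic model, one common way of turning a weighted sum into a probability. This is a minimal illustration only: the function name `combined_probability`, the predictor names, the weights, and the intercept are all hypothetical placeholders and are not fitted coefficients from any of the cited studies.

```python
import math


def combined_probability(values, weights, intercept):
    """Combine several predictors into one probability with a logistic model.

    values and weights are dicts keyed by predictor name. The weights and
    intercept used below are made-up placeholders, not fitted coefficients.
    """
    linear = intercept + sum(weights[name] * values[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical weights for the probability of a low response (illustration only).
weights = {"age_years": 0.08, "amh_ng_ml": -0.9, "afc": -0.15}
intercept = -1.0

# Two hypothetical patients.
patient_a = {"age_years": 38, "amh_ng_ml": 0.5, "afc": 5}
patient_b = {"age_years": 30, "amh_ng_ml": 3.0, "afc": 15}

print(round(combined_probability(patient_a, weights, intercept), 2))
print(round(combined_probability(patient_b, weights, intercept), 2))
```

In practice the weights would have to be estimated from, and validated on, a population tested with the same assay platform and treated with the same protocol, which is exactly where the standardization problems described above arise.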

Current Clinical Applications of Ovarian Reserve Tests

Descriptions regarding the current clinical applications of ORTs (Table 5.2) are provided below but there are certain caveats that apply almost uniformly to these applications:

  • First, the wide variety of definitions used for patient populations, exposures, ORT selection, and methodology prevents any actual cut points from being generally recommended without first defining the aforementioned variables precisely.

  • Secondly, ORT values exist on a continuum and can fluctuate within individuals due to inherent biological variability, such that single measurements can be misleading with frequencies that depend upon the ORT and patient population. Thus, cut points for ORTs, which are useful to compare assays or establish clinical algorithms, should be used cautiously and the reliance on one ORT modality should be avoided for definitive management decisions.

The consequence of these caveats is that practical approaches may require more effort by the clinician when initially establishing a clinical strategy for navigating the use of ORTs, including (a) gaining an understanding of where the cut points and value ranges for a chosen ORT source were derived and (b) determining whether they relate appropriately to the clinical outcomes and patient population being managed.

ORTs for Predicting Response To Controlled Ovarian Stimulation

Although additional applications of ORTs are developing and in clinical practice, identifying low and high responders to COS may be the most well-established use. The term “low” rather than “poor” and “high” rather than “good” is selected here to emphasize and focus on the quantitative aspect of response to COS separately from oocyte quality and ART cycle success.

Low Responders

The literature can be confusing as most of the ORTs have studies demonstrating cut points which can yield sensitivities and/or specificities above 80–90 %. There now have been a number of studies that have compared the performance characteristics of most ORTs together, including basal FSH, inhibin B, estradiol, AMH, and AFC. AMH and AFC perform fairly consistently with greater overall correlation to low response than age or other single ORTs, which, given the heterogeneity of study designs, attests to their strong correlation to response to COS [1, 3, 6, 21–23, 31, 54, 65]. Although some studies have tried to determine whether AFC or AMH performs better, results have shown fairly similar performance when both ORTs are performed well, although some may believe AFC to be slightly better than AMH in the hands of experienced clinicians [66, 67]. It should be noted, however, that none of the ORTs has demonstrated, through multiple publications from several groups, sufficient sensitivity or specificity to predict with certainty the outcome in ART, even for oocyte quantity.

Some studies have shown basal serum FSH to have clinically helpful specificity for poor response [36, 68], with one study showing up to 100 % specificity, but only when a high cut point for normal is utilized and with sensitivity too low for FSH to be used alone as an ORT [68]. Basal inhibin B initially showed promise as an ORT in studies using the inhibin B system from Serotec, LTD [69, 70]. However, subsequent studies [1, 6] failed to reproduce similar accuracy for inhibin B, coinciding, interestingly, with the lack of availability of the Serotec platform. Dynamic or provocative tests such as CCCT and EFORT (using both FSH and inhibin B) [2, 71–73] have consistently shown clinically useful sensitivity and/or specificity, often superior to other single ORTs. However, the requirement for two measurements and medication has likely led to minimal use, especially when evidence exists that a single measurement of a single ORT may have sufficiently similar accuracy [54].

High Responders

Certain ORTs consistently demonstrate a significant ability to predict, independently of age, which women will likely be high responders to COS. This has important benefits in reducing complications of excessive response (e.g., OHSS and cycle cancelation) and in reducing consumption of gonadotropins. There are now numerous studies demonstrating clinical utility of ORTs with respect to a wide variety of definitions of excessive response, including high estradiol levels, withdrawal of stimulation (“coasting”), cycle cancelation, high number of oocytes retrieved, and more severe conditions associated with OHSS such as accumulation of ascites and hospitalization [8, 54, 62, 74–76]. For example, in a study of 110 patients with excessive response defined as greater than 20 oocytes retrieved, investigators could demonstrate that an AFC cut point could select 11 % of patients and identify hyper-response with 50 % sensitivity and 96 % specificity [54]. Using moderate and severe OHSS as a clinical outcome, in a study of 262 patients, an AMH cut point that identified 25 % of the patients performed with 91 % sensitivity and 81 % specificity for OHSS [8]. Despite the variation in the definition of excessive response and in cut point selection, AFC and AMH showed clinically helpful performance characteristics across multiple studies and frequently performed better than most other ORTs for both sensitivity and specificity. As both AMH and AFC measurements exist along a continuum, for practical implementation, one must choose the definition of excessive response and identify internal thresholds for management changes.
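The relationship between a cut point and the performance figures quoted above can be made concrete with a small calculation. The following Python sketch computes sensitivity, specificity, and the fraction of patients flagged from a 2 × 2 table. The function name `cut_point_performance` and the counts are hypothetical; the counts were chosen only so that the percentages land near those quoted for the AFC example and are not taken from reference [54].

```python
def cut_point_performance(tp, fp, fn, tn):
    """Summarize how a single ORT cut point performs against a binary outcome.

    tp: outcome present, test beyond the cut point (true positives)
    fp: outcome absent, test beyond the cut point (false positives)
    fn: outcome present, test within the cut point (false negatives)
    tn: outcome absent, test within the cut point (true negatives)

    Assumes every row and column of the table is non-empty.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fraction_flagged": (tp + fp) / total,
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
    }


# Hypothetical counts for 110 patients (illustration only, not study data).
example = cut_point_performance(tp=8, fp=4, fn=8, tn=90)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts, half of the high responders are missed even though few normal responders are flagged, which is the trade-off a clinic accepts when it selects a high-specificity threshold.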

Oocyte Quality, Live Birth Rate in ART

With respect to ART treatments, the studies performed to date have not demonstrated, with sufficient consistency or robust predictive power, a clinically helpful relationship between ORT results and oocyte quality or pregnancy success that is widely applicable with specific cut points [19]. That said, there have been studies which demonstrate remarkable results in specific circumstances that could dramatically help guide care. For example, in a study of serum basal FSH measurements in over 8,000 cycles from one center with a single FSH measurement source, FSH thresholds could make clinically helpful, age group specific, robust predictions of chances of live delivery per ART cycle start along a continuum of values [68]. Values above certain thresholds demonstrated 100 % specificity for failed cycles, although those thresholds only identified about 1 in 30 women tested above 40 years of age and 1 in 324 women tested under age 35. However, other differently structured studies arrive at different conclusions, such as FSH being valuable for predicting live birth only in certain age groups [77], or FSH having no ability to predict live birth better than age alone, as concluded by a recent meta-analysis which used 28 databases to aggregate data from 5,705 IVF patients and multiple FSH diagnostic platforms [64]. A number of studies indicate that AMH or AFC levels do not predict treatment success [21, 78]. This conflicts with other published findings [60, 76, 79], including a recent study that externally validated an AMH-based live birth prediction model which, independent of age, predicted live birth in 822 patients with statistical significance, although the confidence intervals were wider than some may view as clinically helpful [59].

The lack of consensus and conflicting medical literature is not surprising given the multifactorial nature of embryo development into a healthy baby. However, the heterogeneity of study designs, an inability to control for confounding variables, and the insufficiently robust biological association of ORTs to live birth rate present serious hurdles to overcome in the quest for consensus. Thus, applications of ORTs in predicting live birth currently must remain a user-defined, site-specific approach. Future studies that examine the prognostic value of AMH, AFC, or other tests on cumulative live birth rate or total reproductive potential as described above are needed. It is quite plausible that any measure that predicts the number of oocytes retrieved may be a better predictor of the success of fresh and frozen embryo transfer combined than of fresh cycles only, because more embryos are likely to be frozen if a greater number of oocytes are retrieved.

Overall Fertility and Recurrent Pregnancy Loss

Clinical justification for ORT use in the general population to assess fertility is beginning to appear. Evidence is mounting that infertility is associated with lower ORT values as demonstrated by lower AFC in 881 infertile women without PCOS compared to 771 women without the diagnosis of infertility [16]. In another prospective study of 100 general population women attempting to conceive, early follicular phase AMH was shown to predict fertility rates [17]. Thus, it appears promising that ORT results will play a future role in fertility assessment of the general population.

Data on miscarriage and ORTs are scant and primarily derive from patients receiving ART treatment. One retrospective study showed no association between highest serum basal FSH and fetal aneuploidy [80] in 177 spontaneous miscarriages associated with 70 euploid and 107 aneuploid offspring. No association with AFC, FSH, and CCCT was demonstrated prospectively when comparing values in 77 women with pregnancy loss versus 233 with ongoing pregnancy [52]. However, AFC was shown to be predictive of only first trimester loss in 67 patients with miscarriage compared to 247 controls with ongoing pregnancy, although the overall association was weak with an ROC curve AUC of 0.588 [81]. Recently, in a study of 279 women undergoing IVF with aneuploidy screening of embryos, those with reassuring FSH and AMH values generated a lower rate of all aneuploid blastocysts than 93 women with concerning FSH and/or AMH (14 % vs. 35 %, P < 0.001) [82]. It was further noted that when both FSH and AMH were concerning, the highest percentage of aneuploid blastocysts was observed (77 %) compared to only one being concerning (58.5 %, 58.8 %) and both reassuring (51.7 %). Thus, it appears that ORTs may be useful in predicting increased risk of miscarriage.

PCOS, POI, and Menopause

As research has advanced and ORTs such as AFC and AMH have become more widely used, helpful clinical information for patients can be applied to help identify, diagnose, and manage other diseases and processes not strictly related to attempts to have a child.

AMH is now also being proposed by some as an alternative criterion to diagnose women with PCOS or to identify women at high risk for PCOS [83]. One recent study, which included 66 women without PCOS or polycystic ovaries and 62 women with PCOS confirmed by hyperandrogenism and oligomenorrhea, identified an optimized AMH cut point demonstrating 92 % sensitivity and 97 % specificity for PCOS [84]. However, the use of AMH in this context remains controversial and has not been adopted in official criteria for PCOS diagnosis.

Perhaps the most exciting developments relate to early detection of POI and long-term prediction of the menopausal transition and menopause. As AMH and AFC levels, at a population level, demonstrate a gradual almost linear decline [46–49], these ORTs have applications in both early detection of POI prior to symptoms and long-term prediction of menopause onset. Earlier identification of women at risk of POI may help them avoid the most severe consequences of this disease such as missing the opportunity to have children with their own eggs as well as other complications associated with early menopause such as bone loss and increased cardiovascular events [14].

Although AMH, AFC, FSH, and inhibin B have all been reported to add significantly more predictive power to prediction of menopause than age alone, AMH and AFC appear to show the better performance characteristics [11–13, 85]. Furthermore, it may be that rates of change are more predictive than single measurements [85, 86]. As increasing amounts of individualized longitudinal data become available, the confidence intervals around the age of the predicted last menstrual period are becoming narrower [87]. Subphenotypes may be further defined that can increase predictive information, such as genetic interactions with ovarian reserve. For example, one study of 240 women indicated that FMR1 repeat length was associated with a 54 % difference in AMH level [50]. Another recent study identified several genetic markers in 450 women that were associated with ovarian follicle number and menopause [88]. At this juncture, the published literature on menopause prediction appears sufficiently consistent such that, if a woman has an AMH or AFC value very low for her age using a well-calibrated testing source, it would be questionable not to alert her at least about the increased possibility of earlier than average menopause. This knowledge can allow a woman to proactively address her desired plan for future childbearing. In addition, a woman with ORT results substantially low for her age can proactively address the risk of long-term medical issues such as osteoporosis, cardiovascular disease, and certain forms of cancer which are more prevalent in women with early menopause [14].

With the availability of clinically validated egg preservation technologies, there is now the ability to dramatically increase the length of time a woman has to have a child with her own eggs [89]. This significant advancement has clear immediate application to preserve eggs, for example, prior to receiving ovarian toxic treatments such as chemotherapy [90]. However, the combination of egg preservation and the developing predictive power of ORTs presents society with a double-edged sword: it provides a safety net for possible future ovarian reserve-related infertility, but it also carries the risk of encouraging women to delay natural attempts at conception.

Exogenous Hormone Use

Influence on AMH levels by exogenous hormones has been clearly demonstrated [91]. While some publications suggest that oral contraceptive pills (OCPs) do not affect AMH or AFC levels [92], it now is becoming clear that OCPs such as monophasic estrogens can lower AMH and AFC levels [93, 94]. In one study of 25 women on OCPs for more than 3 months, significant improvement in AMH and AFC parameters was observed after the second menstrual cycle without OCPs [95]. This was confirmed in a complementary study with 44 women off OCPs for at least 3 months who showed an average reduction of approximately 50 % in AMH by week 9 of OCP use [96]. This indicates that if a woman has a concerning AMH or AFC while on an estrogen OCP, it may be helpful to retest after stopping the OCP for two cycles if the retesting would change management. However, if the AMH level is reassuring while on estrogen OCPs, the above recently published studies indicate it will likely remain reassuring off OCPs. While there may be logical ways to extract clinically helpful information in certain scenarios with patients taking OCPs, careful attention should be paid to the use of exogenous hormones when interpreting ORTs.

Current Challenges

Biology

One of the biggest barriers for current ORTs in achieving the desired narrowness of confidence intervals for predicting clinical outcomes is the inherent biological flux associated with biomarkers of the HPO axis. If ORT results can fluctuate by clinically significant amounts with some frequency, there is an intrinsic limit to the accuracy of the test regardless of study design, sample size, and uniformity of patient population. It has long been recognized that FSH levels fluctuate dramatically from cycle to cycle [36, 37]. The recently more popular ORTs, AFC and AMH, receive much focus in part because the average value in the population does not show the same dramatic dependence on the stage of the menstrual cycle as FSH, inhibin B, LH, or estradiol [44, 97]. While this has important logistical benefits by not requiring measurement at a particular time of the menstrual cycle, especially in those women who do not regularly menstruate or have had a hysterectomy, this does not address the larger issue of values being clinically significantly different in the same individual when retested even within the same menstrual cycle. For example, Sowers et al. measured AMH every day of the menstrual cycle, demonstrating a consistent AMH average throughout the menstrual cycle in five groups of five women with similar AMH values [44]. However, closer examination of the data points showed two of five women with similar average AMH values having daily values of approximately 0.6 and 0.75 ng/ml for half the cycle and nearly 2 ng/ml for the other half of the menstrual cycle. This finding was recently observed again in a population of 44 women retested within a menstrual cycle [45].

The other major biologic barrier for ORTs to assess accurately the ability of a woman’s oocytes to produce a healthy baby, is that from fertilization onward, numerous other confounding variables are required in the process. A successful pregnancy depends upon many factors such as a sufficiently healthy sperm and a receptive endometrium. This presents a significant challenge both in the current ability to diagnostically assess these variables accurately and separately, and, statistically, in the number of patients needed to appropriately power studies that would seek to perform the extensive subset analysis required.

Standardization

While the biology of the human reproductive system is difficult to control, the fertility field is also challenged by a lack of consistency in almost every aspect of ORT study design, to the point that the latest American Society for Reproductive Medicine practice guideline concluded that there is no consensus as to the definition of ovarian reserve and that the evidence for the tests which measure it is at best “fair” [19]. Substantial variation can be seen in study population phenotypes, treatment regimens, clinical outcomes assessed, choice of ORT(s), method of analysis, and use of cut points.

When it comes to performance of the ORTs, dramatic differences can exist in the reported value and clinical performance for the same sample depending upon the diagnostic platform chosen (Fig. 5.1). Perhaps the best example of this is the history of inhibin B, which showed clinically useful performance with the Serotec kit [69, 70] but not with the DSL kit that replaced it, leading to a likely irreparable clinical distrust of this biomarker [1, 6]. One misconception is that automation and FDA clearance resolve issues with consensus. While FDA clearance and automation improved the assay performance and ease of measurement for serum FSH, this has not led to consensus regarding FSH testing despite over 20 years of publications regarding its use [19]. Differences in diagnostic platforms are frequently not clear on the reports provided to clinicians. These differences can be substantial: the College of American Pathologists Surveys 2011 Y-B Ligand publication demonstrated that 434 laboratories produced an acceptable mean FSH value of 34 IU/L while, with the same reference sample, 151 other laboratories produced an acceptable mean value of 19 IU/L, the difference being the FSH analyzer platform.

Fig. 5.1

Effect on reported ORT value by three different sources of variability. When retesting the same patient with an ORT, in a minority but significant fraction of cases one value is clinically different from the patient’s “true” or most representative value. At least three factors can affect this. (1) Biological flux of ORTs can be substantial. (2) Exposures to medications such as oral contraceptives are now known to affect results of ORTs such as AMH and AFC. (3) Although testing methods may have minimal variability within a chosen source, the between-source assay differences may be substantial. The effect of any single source of variability can be clinically significant (example 1) and even more so if multiple sources of variability are present and combine in the same direction (example 2)

Unfortunately, for the two ORTs currently receiving the most attention, AFC and AMH, reference standards do not even exist. AFC is a highly user-dependent modality and, despite attempts at international standardization, there remains some inconsistency in the size of follicle to include in the AFC, with obesity further complicating interpretation or rendering it impossible [30]. The measurement of AMH has undergone three kit changes (Immunotech, DSL “GenI,” Beckman/DSL “GenII”) in the past 3 years, with a new one arriving on the market shortly and automated platforms and blood spot tests on the way [98–100]. Although the clinical correlations observed with different AMH kits are consistent, the conversion equations published between kits are often inconsistent. This makes it risky to extrapolate results from one kit to another when interpreting a patient’s clinical report without conducting careful validation experiments. It is also very important for clinicians to be aware that values in the literature may have been obtained using a different assay and thus may not be readily applicable to the results of their patients. Additionally, the same AMH kit can produce dramatically different values depending on a variety of factors, including the treatment of the sample and the laboratory methodology. Furthermore, as previously discussed, exogenous medications such as OCPs, once considered of no consequence, are now recognized as significantly affecting ORT results. Most importantly, however, clinical value ranges, which determine the treatment, are frequently set by the laboratory based upon CLIA requirements to establish a general mean and distribution in a general population and not upon the clinical outcomes being managed by the test. For example, a “normal range” for AMH can be 0 ng/ml to 6.9 ng/ml, which spans the gamut from ovarian failure (depleted ovarian reserve) to high risk for OHSS or PCOS (high ovarian reserve).

The above challenges can unfortunately be additive and pose a significant risk of clinically miscategorizing a patient if careful steps are not taken to avoid this (Fig. 5.1). Fortunately, there are practical ways to minimize the chance of misguiding care with ORT use.

Practically Optimizing the Use of ORTs

It’s the Approach, Not Just the Test

The pattern that consistently emerges from literature assessing ORTs is that performance and utility depend upon the user’s decisions regarding patient populations, treatments, ORT selection, and methodology. Furthermore, the value of a particular ORT’s PPV and NPV depends upon prevalence of the clinical outcome in the intended use population, which can vary dramatically, for example, with diminished ovarian reserve in an oocyte donor screening program as compared to counseling a woman about IVF using her own oocytes. Thus, minimizing the risk of misinterpretation of ORT values requires a methodical approach, which may involve some initial effort to establish (Table 5.3). One approach is described below:

Table 5.3 Practical steps to optimize the use of ORTs
  • The first recommended practical step is to recognize that consensus does not currently exist regarding ORT interpretation and utility, and to expend the effort necessary to establish one’s desired approach.

  • Second, the fluctuation of ORT results and the possible sources of error make it important to use at least two different ORTs when evaluating a patient. Frequently, this is possible because other ORTs, such as FSH and estradiol, have other utilities in the initial assessment of a patient, so combining them with AFC and/or AMH is logistically reasonable. The use of different modalities such as imaging and serum testing has the added benefit that a single error, such as improper specimen handling, is less likely to affect both modalities.

  • Third, it would be ideal, but is frequently not possible, to establish a consistent source of ORTs and obtain an understanding of how the clinical value ranges are determined. The ideal scenario is that each practitioner ultimately clinically calibrates his/her ORTs against his/her own outcome data, but this is often not feasible. Practically speaking, as it is not possible to track down the source of every outside laboratory result, judicious use of retesting at a familiar source should be considered if retesting could significantly change clinical management.

  • Fourth, although perhaps an unpleasant truth, the materials and methods for ORTs change not infrequently, and vigilance with respect to the effect such a change would have on interpretation is important. If one regularly uses one or two sources for ORT results, it would not be unreasonable to ask the laboratory director once or twice per year whether there have been any changes at a chosen source of ORT that could affect value ranges.

  • Fifth, as many women use OCPs and it is difficult at times to stop taking them, a practical method for tests such as AMH or AFC is to obtain the values while the patient is on OCPs and, if they are reassuring, to consider them sufficient, as recent data indicate that the ORT result is likely to remain reassuring, if not more so, off estrogen-based OCPs. If AMH and AFC are concerning while on an estrogen-based OCP, one can then consider retesting off OCPs if management decisions would change.

  • Sixth, overall, one should be very cautious and avoid, if possible, counseling a patient solely based upon ORT values since the certainty of outcome for these tests is not definitive. Ultimately, it is advisable to use ORTs to influence rather than direct clinical management.
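
The earlier point that an ORT’s PPV and NPV depend upon the prevalence of the clinical outcome in the intended use population can be illustrated with a short calculation. The following is a minimal sketch, not a validated clinical tool: the function name predictive_values and the sensitivity, specificity, and prevalence figures are hypothetical values chosen only for illustration and are not taken from any study cited in this chapter.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test via Bayes' rule.

    All three inputs are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics (assumed, for illustration only).
SENSITIVITY = 0.80
SPECIFICITY = 0.90

# Two hypothetical populations with different outcome prevalence.
settings = [
    ("low-prevalence population", 0.05),
    ("high-prevalence population", 0.40),
]

for label, prevalence in settings:
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{label}: prevalence={prevalence:.2f}, PPV={ppv:.2f}, NPV={npv:.2f}")
```

With these illustrative inputs, the same test yields a PPV of roughly 0.30 at 5% prevalence and roughly 0.84 at 40% prevalence, while the NPV moves in the opposite direction. This is why a cut point that performs well in one practice setting may perform quite differently in another.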

Clinical Example

Given the especially ambiguous nature of ORT results and lack of consensus, a short case scenario is presented with possible responses to better illustrate use of the recommendations. This scenario is not intended to represent consensus views or incontrovertible information.

Case 1

A healthy 28-year-old female with no prior attempts at conception is considering attending medical school and presents to a fertility specialist, referred by her general practitioner, with an AMH value of 2.0 ng/ml from an outside laboratory that reports a normal range of 0–6.9 ng/ml.

Patient: “Will I still be able to have children in 8 years after I finish medical school and residency?”

Clinician: “With no family or medical history concerning for early loss of fertility, it would be wise to recheck this lab value before drawing any conclusions. In the meantime, let’s obtain an antral follicle count today by ultrasound.”

AFC shows a total of ten follicles between 2 and 10 mm. Rechecking the AMH at a different laboratory regularly used by the clinician, with well-established value ranges, returns a value of 0.6 ng/ml, which falls into a range consistent with the patient already being at high risk for poor egg supply. Discussion at next visit:

Clinician: “Rechecking your AMH shows you have a value that is low for your age and that you already are at risk for low egg supply. There are now several studies from several sources that show women with low AMHs are more likely to go into menopause sooner than women with high AMHs of the same age. While we can’t give you any specific prediction about your fertility window, you are likely at higher risk than average to have menopause earlier and thereby have a shorter fertility window. If having children right now is not what you want or are able to do, you may want to consider egg cryopreservation. While long-term follow-up data isn’t yet available, we are cautiously optimistic about there not being a significant difference between babies born through natural conception versus babies born through IVF using preserved oocytes.”

Patient: “Why was my AMH lab value so different when you retested it?”

Clinician: “It could be natural fluctuation or differences in the methods of measurement. Let’s reevaluate in the next several months and also try to figure out what you would like to do about planning your future family building goals.”

Conclusion

Research over the past 10 years has demonstrated a wide variety of clinical utilities for ORTs, such as improving COS management, risk stratification for ART treatment success, identification of women at risk for infertility, more sensitive detection of diminished ovarian reserve, prediction of time until menopause, and adjunctive use to identify and/or diagnose PCOS. AMH and AFC have emerged as the two most predictive individual ORTs for responsiveness to COS for retrieval of oocytes, as well as sensitive identifiers of diminished oocyte supply, proximity to menopause, and likelihood of PCOS. Many of these research findings are currently applied with clinical benefit.

While the potential advantages of ORT use in clinical medicine are clear, given the biological fluctuations in ORT results, the complexity of fertility assessment, and the lack of standardization, consensus is not possible regarding most of the above utilities, and the risk of misguiding clinical care using an ORT result is high if appropriate steps are not taken by clinicians. This risk can be minimized by (1) recognizing that performance of an ORT is specific to the source of ORT and the clinical environment in which it is applied, (2) identifying at least two different ORTs for use, (3) using a consistent source of ORT results where possible, with an understanding of how the values relate to the clinical outcomes being managed, (4) inquiring periodically about assay changes at a chosen ORT source which could change interpretation, (5) avoiding the use of cut points from publications without understanding how they apply to your source of ORT, (6) paying attention to exogenous hormone use, and (7) avoiding the use of ORTs alone to make clinical decisions. This approach likely will reduce the risk of misinterpretation of results while simultaneously harnessing the information ORTs can provide to improve clinical care.