Introduction

Prenatal diagnosis enables clinicians to determine whether the foetus has an anomaly, so that parents can be offered genetic counselling. The most prevalent autosomal aneuploidy is Down’s syndrome (three copies of chromosome 21); the next two most prevalent trisomies are Edwards’ syndrome (chromosome 18 trisomy) and Patau’s syndrome (chromosome 13 trisomy) [1, 2]. Their prevalence was 14.2, 3 and 1 per 10,000 live births, respectively, from 2006 to 2010 [2]. Prenatal diagnosis services vary between countries, but in most developed countries screening is routine practice before a definitive diagnosis [3]. One of the most common screening programmes is first-trimester screening (FTS), which includes ultrasound measurement of nuchal translucency (NT), maternal serum markers such as PAPP-A or β-hCG, and assessment of other data such as the mother’s age [3]. When the assessment of these criteria yields a risk index above a certain cut-off (which varies across clinical guidelines), the foetus is considered to be at high risk of chromosomal anomalies. Other strategies are second-trimester screening and integrated screening, in which risk is estimated after all first- and second-trimester screening tests have been completed.

In 2011, a new technology appeared on the market: so-called non-invasive prenatal testing (NIPT), which consists of the analysis of cell-free DNA from foetal–placental cells circulating in the mother’s blood. This test can be performed from week 10 of pregnancy, when foetal DNA fragments are detectable. It is an extremely safe non-invasive test, as sufficient DNA can be extracted from a small sample of maternal blood [4]. This test can form part of screening strategies to detect potential trisomies. NIPT has been proposed as an addition to current FTS to increase the detection rate of the common trisomies, to decrease the number of invasive diagnostic tests, and to reduce the number of procedure-related foetal losses (PRFL). In a universal NIPT strategy, the test is offered to the general obstetric population, that is, to all pregnant women regardless of the results of other routine tests. In a contingent NIPT strategy, NIPT is offered as a second-line test when a positive result is obtained from the usual screening tests.

Systematic reviews have reported favourable results [5,6,7,8]. The most recent meta-analyses have found that NIPT has very high sensitivity and specificity for Down’s syndrome and somewhat lower sensitivity for Edwards’ and Patau’s syndromes [6, 7]. According to the meta-analysis by Taylor-Phillips et al. [6], for instance, the pooled sensitivity was 99.3% (95% CI 98.9–99.6%) for Down’s syndrome, 97.4% (95.8–98.4%) for Edwards’ syndrome, and 97.4% (86.1–99.6%) for Patau’s syndrome. The pooled specificity was 99.9% (99.9–100%) for all three trisomies [6]. NIPT is not a diagnostic test, precisely because false positive results occur. Therefore, a positive result must be confirmed by means of invasive tests such as amniocentesis or chorionic villus sampling (CVS) [9]. These two invasive diagnostic tests carry a risk of PRFL. Thus, the advantage of NIPT is that fewer women would be candidates for an invasive test, and hence the number of PRFL or complications would decrease. For these reasons, the use of NIPT as part of prenatal screening has spread in recent years [4, 10]. Reimbursement of NIPT by public healthcare systems or insurance companies requires economic evaluations comparing the outcomes and costs of strategies that include NIPT. We performed a systematic review of cost-effectiveness studies of NIPT for screening for trisomies of chromosomes 21, 18 and 13, with the aim of informing decision-making.
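
The reason a positive NIPT result still needs invasive confirmation can be illustrated with a positive predictive value (PPV) calculation. The following is a minimal sketch, not part of the reviewed studies: it applies Bayes' rule to the pooled sensitivity and specificity for Down’s syndrome quoted above [6], and it assumes, purely for illustration, that the live-birth prevalence of 14.2 per 10,000 [2] can stand in for the prevalence among screened pregnancies. The function and variable names are hypothetical.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive screening result is a true positive."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Pooled accuracy for Down's syndrome as quoted in the text [6].
SENSITIVITY_T21 = 0.993
SPECIFICITY_T21 = 0.999

# ASSUMPTION: live-birth prevalence [2] used as a proxy for prevalence
# in the screened (general obstetric) population.
PREVALENCE_T21 = 14.2 / 10_000

ppv_t21 = positive_predictive_value(SENSITIVITY_T21, SPECIFICITY_T21,
                                    PREVALENCE_T21)
print(f"Illustrative PPV for T21 in a general population: {ppv_t21:.1%}")
```

Under these assumptions a substantial share of positive results in a low-prevalence population would be false positives, even with 99.9% specificity. The PPV rises when NIPT is applied to a higher-risk group, which is the situation in a contingent strategy.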

Methods

Information sources and search

The search was performed initially in February 2016 and updated in April 2017 in the electronic databases MEDLINE and MEDLINE in process (OvidSP), EMBASE (Elsevier), and Cochrane Library (DARE, HTA, NHS EED) (Wiley Online Library). We used a search strategy previously used in a high-quality systematic review of NIPT [6]. This strategy combined medical subject headings (MeSH) and text terms such as: non-invasive prenatal test, NIPT, cell free DNA, cfDNA, maternal blood, Trisomy, Aneuploidy, Down Syndrome, Edward Syndrome, Patau Syndrome. The search strategy was applied without language limits and with a start date limit of January 2006, given that NIPT is a very recent technology. We did not use a filter for economic evaluations, as this review was part of a broader project that included a review of the diagnostic yield of the technology. Regular alerts were set up on the MEDLINE database to capture new studies. The reference lists of the included articles and of other relevant studies identified for the systematic review of effectiveness were also checked.

Selection, data extraction and quality assessment

Paper selection and study quality assessment were performed by two independent reviewers (economists) (J.F.R., L.G.P.). Data extraction was performed by one reviewer (economist) (L.G.P.) and then verified by a second reviewer (economist or clinician) (J.F.R., R.L., M.A.R.R.). Disagreements between reviewers were resolved by consensus or by consulting a third reviewer.

Study quality was assessed by means of the Drummond and Jefferson [11] criteria for economic evaluations. Data were collated in spreadsheets designed ad hoc. The extracted data were: identification of the study (authors, country, date, etc.), aim, design, time horizon, perspective, population, level of risk, characteristics of alternatives in comparison (including key parameters such as sensitivity, specificity and false positive rate of NIPT), measures, costs, data sources, analysis, results including costs, outcomes (detected cases, PRFL), incremental cost-effectiveness ratios (ICER), and sensitivity analysis results.

Data were summarised narratively, and the main characteristics and outcomes of each study were displayed in structured tables. Original costs were converted to a common currency and price year, 2016 international dollars (USA), according to recommended guidelines and formulae [12] that include purchasing power parity and the gross domestic product deflator, by means of a converter tool [13, 14].
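
The two-step logic of such a conversion can be sketched as follows. This is an illustrative sketch of the general approach (inflate to the target price year with the GDP deflator of the original country, then convert with a purchasing power parity rate), not the converter tool itself [13, 14]. The function name and every numeric input below are hypothetical placeholders, not real index values.

```python
def to_target_international_dollars(cost_local: float,
                                    deflator_original_year: float,
                                    deflator_target_year: float,
                                    ppp_local_per_intl_dollar: float) -> float:
    """Convert a cost in local currency to target-year international dollars.

    Step 1: adjust the cost to the target price year using the GDP deflator
            index of the country where the cost was originally reported.
    Step 2: convert the inflated cost using the purchasing power parity
            (PPP) rate for the target year (local units per intl. dollar).
    """
    cost_target_year = cost_local * (deflator_target_year / deflator_original_year)
    return cost_target_year / ppp_local_per_intl_dollar


# HYPOTHETICAL inputs, chosen only to show the arithmetic.
converted = to_target_international_dollars(
    cost_local=250.0,               # cost reported in local currency
    deflator_original_year=95.0,    # GDP deflator index, original price year
    deflator_target_year=100.0,     # GDP deflator index, 2016
    ppp_local_per_intl_dollar=0.70  # PPP rate, 2016
)
print(f"Converted cost: {converted:.2f} international dollars (2016)")
```

Applying the same function to both arms of a comparison keeps incremental costs and ratios on a common scale across studies from different countries and years.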

Eligibility criteria

We selected papers published in peer-reviewed journals that fulfilled the following selection criteria (structured according to the PICOS question):

  • Types of participant: women with single or twin pregnancies in their first or second trimester who take part in a prenatal screening programme for any reason, including a potential risk of foetal anomalies.

  • Types of interventions: screening programme with NIPT to identify chromosome 21 trisomy (T21), chromosome 18 trisomy (T18) or chromosome 13 trisomy (T13) in the foetus. Both universal and contingent NIPT strategies were included.

  • Type of comparators: screening programmes that do not include NIPT. Usual screening strategies involve serological and ultrasound markers and finally, diagnostic invasive tests. These could be FTS, second-trimester screening and integrated screening. No screening was also a potential comparator.

  • Types of outcomes: to be included, a study had to report detected cases and costs for every comparator, or ICERs.

  • Types of studies: full economic evaluations, that is, cost-benefit analysis, cost-utility analysis, cost-effectiveness analysis, cost-consequences analysis, and cost-minimisation analysis. We excluded partial economic evaluations.

  • Type of report and languages: we excluded protocols of studies without results, conference abstracts, letters, editorials, and discussion papers. We included studies published in English or Spanish.

Results

The study selection procedure (Fig. 1) identified 3540 references after discarding duplicates. Their titles and abstracts were screened. Of these, 70 articles were retrieved for full review, and 56 of them were excluded for different reasons (detailed reasons for exclusion are available upon request). Review of the reference lists and the alert system yielded no additional references. In total, 14 papers related to 12 studies were included in the systematic review [15,16,17,18,19,20,21,22,23,24,25,26,27,28] (one study was reported in three papers [20, 21, 23]).

Fig. 1 Flow diagram of study selection

Characteristics and methodological quality

Table 1 describes the characteristics of the studies. The assessment of methodological quality can be found in the Electronic Supplementary Material. Studies were published between 2012 and 2016. Five studies were performed in the USA, two in Australia, one in Canada and four in three European countries (the Netherlands, Belgium and the United Kingdom). Most studies were decision-analytic models. For some, the type of model was not specified.

Table 1 Characteristics of economic evaluations

The population studied was pregnant women; in some studies the modelled cohort was similar in size to the number of pregnancies per year in the country. No studies specifically included twin pregnancies. Comparators were the usual prenatal screening strategy and some form of screening with NIPT. The most common usual screening strategy was FTS, but other studies evaluated integrated screening or a combination of first- and second-trimester screening. Studies included contingent NIPT [17, 19, 25], universal NIPT [18, 26, 27], or both [15, 16, 22,23,24, 28]. The values used in the models for the main parameters (sensitivity, specificity, false positive rate) are consistent with the published evidence. The perspective was not always stated, but the healthcare provider perspective was the most frequent. Consequently, direct medical costs were included in every study. Two studies presented more than one perspective, including educational costs and lost productivity costs. The time horizon was the duration of pregnancy in all studies, although three studies included other time horizons as well. In these cases, costs were discounted at 3%. The main outcome in all studies was cases detected or diagnosed. Three studies included detection of the three trisomies (T13, T18 and T21) [18, 19, 28]; the rest included only T21. Ten out of 12 studies also included PRFL as an outcome. In fact, some authors explicitly presented their study as a cost-consequence study. Two studies reported confidence intervals [17, 28].

The methodological quality is acceptable in most studies. However, the lack of transparency and details on sources prevented a more accurate assessment of the bias. For example, some studies did not appropriately report the methods and/or results of the sensitivity analyses (see Electronic Supplementary Material).

The results of the studies are shown in Table 2, including ratios in the original currency and ratios expressed in 2016 international dollars. One remarkable result is that in all the studies that included PRFL as an outcome, the number of PRFL is much lower for NIPT strategies than for strategies that do not include NIPT. Results are more varied when the outcome is the number of cases detected.

Table 2 Results of economic evaluations

Contingent NIPT vs usual prenatal screening

Nine studies compared contingent NIPT with the usual screening strategy in their countries [15,16,17, 19, 22,23,24,25, 28]. The two studies performed in the USA found contradictory results. The first study that evaluated NIPT, funded by the industry, was published in 2012 and found the contingent NIPT a dominant strategy [19]. The other USA study found NIPT strictly dominated from the societal perspective and costlier but also more effective from the payer perspective [28]. Similarly, two Australian studies found contradictory results. One study found NIPT a less costly and less effective strategy than FTS in terms of cases detected [15] while a previous study had found NIPT dominated (when the same test uptake is assumed) or more expensive and more effective if the test uptake is increased [25]. A Canadian study found this latter result in every scenario considered [24].

Four studies were performed in Europe [16, 17, 22, 23]. Beulen et al. estimated an ICER of €94,000 per case when comparing contingent NIPT with FTS in the Netherlands [16]. Neyt et al. found NIPT slightly less effective in terms of cases detected and less costly for a risk cut-off of 1/300 in Belgium [20, 23]. In the various scenarios, depending on different values of NIPT sensitivity or risk cut-offs, results ranged from NIPT as a dominant strategy to NIPT as more effective and costlier [23]. Finally, two studies by the same team of researchers evaluated contingent NIPT in the UK [17, 22]. Morris et al. found NIPT less costly and less effective than FTS for a risk cut-off of 1/150, and dominated by FTS when lower cut-offs were considered [22]. In a more recent study, the results were similar in terms of costs but differed in terms of cases detected: Chitty et al. found that the strategy with contingent NIPT identifies more cases than the current screening programme [17]. The authors attribute this difference to the input data. This study used data from a prospective cohort with real data on the uptake of NIPT, screening and invasive tests in the National Health Service (NHS) [17], while Morris et al. used data from the literature [22].

Universal NIPT vs usual prenatal screening

Nine studies assessed NIPT as a universal screening strategy [15, 16, 18, 22,23,24, 26,27,28]. In all of them the strategy with NIPT is more effective than the usual screening in terms of cases detected. In seven studies universal NIPT was more effective but also costlier than usual screening, with ICERs above 200,000 (€ or $) per case detected [15, 16, 22,23,24, 27, 28].

Two of these studies found some remarkable exceptions. Ayres found that universal NIPT restricted to women older than 40 could be dominant when the estimation uses the highest costs (including Medicare costs and the highest estimates of private healthcare prices); when the estimation uses lower costs (only Medicare costs), the resulting ICER was $81,199 per case in Australia [15]. Walker et al. concluded that universal NIPT in the USA was dominant over integrated screening from the societal perspective, although with a very wide confidence interval for the ICER that ranges from negative to positive estimates [28]. The latter study found that the ICER was sensitive to the unit cost of screening and diagnostic testing and to screening uptake, among other parameters.

Finally, the two studies funded by the industry drew different conclusions. Song et al. found that in the USA FTS was dominated by a strategy consisting of (a) universal NIPT for women older than 35 years or with risk due to history and (b) contingent NIPT for women with positive FTS [26]. In a later study, modifying the model by Song et al., Fairbrother et al. concluded that universal NIPT is a cost-saving strategy over FTS when the NIPT unit cost is $453 or less, but no incremental costs are reported [18].

Discussion

In our systematic review of economic evaluations of NIPT we found 12 studies with heterogeneous results, especially for contingent NIPT. Some studies found contingent NIPT dominant, other studies found it dominated, others found it costlier and more effective, and some studies found that this strategy detected fewer cases at a lower cost than the usual screening. This last case, the south–west quadrant of the cost-effectiveness plane, is the most unusual among published cost-effectiveness studies [29], which makes decision-making even more difficult. Among European countries results were also inconsistent [16, 17, 22, 23]. In summary, it is difficult to draw a conclusion on contingent NIPT, as this strategy can occupy every quadrant of the cost-effectiveness plane depending on the study. Those studies that conducted sensitivity or scenario analyses found that the drivers of the cost-effectiveness results were the cut-off [17, 20, 22], the age of the women [15], the perspective [28], or even the test uptake rate [25]. Meanwhile, studies that evaluated universal NIPT found consistent results, showing that this strategy is more effective but also more expensive than the usual screening and usually leads to very high ICERs. Consequently, NIPT for the general population regardless of the risk level seems too costly at present. The unit cost of NIPT appears to be the key parameter that could make universal screening with NIPT a cost-effective strategy [16, 18, 28]. Universal screening with NIPT is more effective but also costlier than contingent NIPT from the payers’ perspective in all studies [15, 16, 22,23,24, 28].
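
The four situations described above (dominant, dominated, costlier and more effective, less costly and less effective) correspond to the quadrants of the cost-effectiveness plane. The sketch below shows how an incremental comparison could be classified and how an ICER is computed when neither strategy dominates. The function name and the example figures are hypothetical and are not taken from any included study.

```python
from typing import Optional, Tuple


def compare_strategies(cost_new: float, effect_new: float,
                       cost_usual: float, effect_usual: float
                       ) -> Tuple[str, Optional[float]]:
    """Classify a new strategy against usual screening.

    Effects are expressed in natural units (e.g. cases detected).
    Returns a label and the ICER (incremental cost per extra unit of
    effect), or None when one strategy dominates or both are equal.
    """
    d_cost = cost_new - cost_usual
    d_effect = effect_new - effect_usual

    if d_cost == 0 and d_effect == 0:
        return "equivalent", None
    if d_cost <= 0 and d_effect >= 0:
        return "dominant (south-east quadrant)", None
    if d_cost >= 0 and d_effect <= 0:
        return "dominated (north-west quadrant)", None

    icer = d_cost / d_effect
    if d_effect > 0:
        return "more effective and costlier (north-east quadrant)", icer
    # Both differences are negative: the ICER reads as the saving
    # obtained per unit of effect forgone.
    return "less effective and less costly (south-west quadrant)", icer


# HYPOTHETICAL totals for a cohort: (cost, cases detected).
label_ne, icer_ne = compare_strategies(1_200_000, 110, 1_000_000, 108)
label_sw, icer_sw = compare_strategies(900_000, 105, 1_000_000, 108)
print(label_ne, icer_ne)
print(label_sw, round(icer_sw, 2))
```

In the south-west quadrant the ratio is positive, as in the north-east quadrant, but its interpretation is reversed: it is the amount saved for each case that goes undetected, so a decision-maker would need to judge whether the saving is large enough to accept the loss.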

Despite these inconsistent results, there are other outcomes that make NIPT a very attractive option at present for women, healthcare providers, and healthcare authorities. A common result in all studies is the reduced number of PRFL with prenatal screening programmes that include NIPT in comparison with usual screening programmes. The reduction in PRFL with contingent NIPT in comparison with usual screening ranges from 43% [16] to 95% [15]. This reduction arises because invasive procedures such as amniocentesis and CVS are not performed when the NIPT result is negative, and consequently there are fewer unwanted foetal losses. The importance of this variable is shown by the fact that most authors decided to conduct a cost-consequence analysis and included this outcome in their studies, which reflects the difficulty of the decision-making. Walker et al. did not estimate PRFL but estimated Down’s syndrome live births as an outcome [28]. We did not extract this measure because the effect of NIPT on the number of births is country-specific, depending on cultural and legal aspects. They also estimated costs in the long term, as they included the societal cost of raising a child with Down’s syndrome. This analysis would be biased were it not complemented by the reported short-term results from the payer perspective [28].
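
The mechanism behind the PRFL reduction can be made concrete with a simple expected-value sketch. It is a deliberately idealised illustration, not a reproduction of any included model: it assumes full uptake at every step, that every screen-positive woman proceeds to the next test, and no test failures. The FTS detection rate, the FTS false positive rate and the procedure-related loss rate are hypothetical placeholders; only the NIPT accuracy [6] and the prevalence [2] come from the text, and the latter is a live-birth figure used here as a proxy.

```python
def invasive_tests_usual(n: float, prevalence: float,
                         fts_detection_rate: float,
                         fts_false_positive_rate: float) -> float:
    """Expected invasive tests when every FTS positive is offered one."""
    affected = n * prevalence
    unaffected = n - affected
    return affected * fts_detection_rate + unaffected * fts_false_positive_rate


def invasive_tests_contingent(n: float, prevalence: float,
                              fts_detection_rate: float,
                              fts_false_positive_rate: float,
                              nipt_sensitivity: float,
                              nipt_specificity: float) -> float:
    """Expected invasive tests when FTS positives first undergo NIPT."""
    affected = n * prevalence
    unaffected = n - affected
    fts_true_pos = affected * fts_detection_rate
    fts_false_pos = unaffected * fts_false_positive_rate
    # Only NIPT positives (true or false) go on to an invasive test.
    return (fts_true_pos * nipt_sensitivity
            + fts_false_pos * (1.0 - nipt_specificity))


N = 100_000                   # HYPOTHETICAL cohort size
PREVALENCE = 14.2 / 10_000    # from the text [2], used as a proxy
FTS_DETECTION_RATE = 0.85     # HYPOTHETICAL
FTS_FALSE_POSITIVE = 0.05     # HYPOTHETICAL
PROCEDURE_LOSS_RATE = 0.005   # HYPOTHETICAL loss risk per invasive test
NIPT_SENS, NIPT_SPEC = 0.993, 0.999  # pooled T21 values from the text [6]

prfl_usual = PROCEDURE_LOSS_RATE * invasive_tests_usual(
    N, PREVALENCE, FTS_DETECTION_RATE, FTS_FALSE_POSITIVE)
prfl_contingent = PROCEDURE_LOSS_RATE * invasive_tests_contingent(
    N, PREVALENCE, FTS_DETECTION_RATE, FTS_FALSE_POSITIVE,
    NIPT_SENS, NIPT_SPEC)
relative_reduction = 1.0 - prfl_contingent / prfl_usual
print(f"Expected PRFL, usual screening: {prfl_usual:.2f}")
print(f"Expected PRFL, contingent NIPT: {prfl_contingent:.2f}")
print(f"Relative reduction: {relative_reduction:.1%}")
```

Because the loss rate per procedure cancels out, the relative reduction in PRFL equals the relative reduction in invasive tests. Idealised assumptions such as full uptake tend to overstate the reduction, which is one reason why the published estimates vary with uptake rates and risk cut-offs.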

Three out of 12 studies were sponsored or funded by the industry [18, 19, 26]. Two of them have serious drawbacks due to lack of transparency [18, 19]. Unfortunately, the two studies that reported results for each trisomy separately are also the least transparent. This prevents drawing robust conclusions on the cost-effectiveness of NIPT for the detection of T18 and T13. Some studies included data from women or strategies in which NIPT was used during the second trimester. This could result in biased outcomes, as the accuracy of NIPT in the second trimester is higher than in the first trimester. Nonetheless, the methodology is generally appropriate in most of the 12 studies. The lack of direct transferability of cost-effectiveness analyses may stem from differences in cut-off values, the uptake rate of screening, access to genetic counselling, and population characteristics, apart from perspectives and unit costs, among others.

This systematic review presents some shortcomings such as the possible exclusion of unpublished studies or studies published in languages other than English or Spanish (publication and language bias), and the lack of direct transferability. Nonetheless, we have strived to find all the relevant literature, to assess quality and to interpret results. Some studies were excluded from this review because they did not fulfil our inclusion criteria. For example, one study was excluded because the outcome was not detected cases but number of Down’s syndrome births avoided [30]. Ohno and Caughey evaluated NIPT as a diagnostic tool (without requiring amniocentesis for confirmation) and NIPT as a screening test, and concluded that the latter is cost-effective in comparison with the first option [31]. Since usual screening without NIPT was not a comparator, this study was excluded. These authors estimated ICER in terms of cost per QALY as they used utilities measured by the standard gamble method in a study on women’s preferences [31]. Although QALY is a standard and desirable outcome in economic evaluations of health technologies, in this case it involves judgements of parental preferences related to another person’s life. As this is controversial and country-specific depending on cultural and legal aspects, it is not common to find QALYs in economic evaluations on prenatal diagnosis.

The current evidence shows that the sensitivity of NIPT is higher for T21 than for T18 and T13 [6, 7]. It is expected that fewer women would be candidates for an invasive test after NIPT. Consequently, the number of PRFL or complications would decrease, as T21 is the most prevalent trisomy and the one for which NIPT yields better results than for T18 and T13, trisomies that would be identified by means of other signs such as ultrasound findings. Besides, screening with NIPT in twin pregnancies is feasible but not reliable. The foetal fraction is lower, the failure rate is higher, and the detection rate may be lower than in single pregnancies [32,33,34]. Overall, in settings such as prenatal screening and diagnosis, where several combinations of technologies are available whose consequences are not always clear and are potentially disastrous, and where preferences, beliefs, rights, and maternal and foetal health, among other considerations, must be taken into account, shared decision-making appears to be an appropriate option [35]. In fact, NIPT is being implemented in several European countries and elsewhere [4]. NIPT is recommended by some scientific societies, although with some reservations [9, 36]. Dondorp et al. advise health authorities in countries where prenatal screening is offered as a public health programme to “adopt an active role to ensure the responsible innovation of prenatal screening on the basis of ethical principles” [9]. The introduction of NIPT in the public healthcare system can have a substantial impact on the budget if it is not restricted to single pregnancies at high risk, although reimbursement conditions are set by each country and depend on many factors. New companies have entered the market offering NIPT, and prices are decreasing. The unit cost in the first and most recent studies identified was $1200 [19] and £250 [17], respectively. Moreover, according to some authors, NIPT is expected to improve to the point of becoming a diagnostic test [37]. Meanwhile, health authorities and expectant mothers must balance the costs and outcomes (correct diagnoses, foetal losses prevented) of old and new technologies to make well-informed decisions.