Introduction

Newborn screening (NBS) is a public health measure for the early detection of inborn errors of metabolism (IEM), endocrinopathies, and a variety of other disorders in which early presymptomatic detection and treatment can prevent mental retardation, disability, or death, or can at least improve the quality of life and extend the life span of affected patients. This chapter focuses on the laboratory tests, which use whole blood taken by heel prick and dried on a special blood collection device, the so-called dried blood spots (DBS). During the last 20 years, other conditions have been added to the NBS panel, such as hemoglobinopathies, cystic fibrosis (CF), infectious diseases like HIV and CMV, immunodeficiencies like SCID, muscular dystrophies like Duchenne muscular dystrophy (DMD), and spinal muscular atrophy (SMA). In addition, some conditions are screened by point-of-care testing rather than by laboratory tests, such as newborn hearing screening using otoacoustic testing, or screening for critical congenital heart defects (CCHD) using pulse oximetry. This chapter also provides an overview of the history of NBS, its principles and goals, and some pitfalls.

History of Newborn Screening

Newborn screening as a laboratory test started with the invention of the bacterial inhibition assay for the detection of phenylketonuria (PKU) in 1963 by Robert Guthrie (Guthrie and Susi 1963; Guthrie 1996). However, what is sometimes forgotten is the role of at least three mothers of mentally retarded children, who pushed scientists on because they would not simply accept the disability of their children as fate, but wanted a diagnosis or a treatment. The first is Pearl S. Buck. Although she was not successful, she wrote down the story of her child in a touching book: The Child Who Never Grew. The second case is that of Harry and Borgny Egeland from Oslo, who got in touch with Dr. Ivar Asbjørn Følling; he was finally able to isolate phenylpyruvic acid from the urine of their two disabled children, which also gave the disorder its name, phenylketonuria (Fölling 1934). Then, in 1951, there was again a mother, Mrs. Jones, who had a diagnosis (PKU) for her daughter Sheila and now insisted that the pediatrician, Dr. Horst Bickel, should look for a possible treatment. It may have been the persistence of Mrs. Jones that led Horst Bickel to introduce a phenylalanine-free diet (Bickel et al. 1953), which had been proposed a few years before by Woolf et al. (Woolf and Vulliamy 1951; Woolf et al. 1955). The initiation of treatment and the proof of its effectiveness have been very well documented, also on Super 8 films, and can be found at https://www.youtube.com/watch?v=OqZ7QHO5_hs. Sheila Jones’ diagnosis was made at the age of 2 years with the ferric chloride test, or the Følling test as it is called in some countries. But although treatment with the phenylalanine-free diet could improve the clinical situation of the patient, it could not reverse mental retardation. However, with the introduction of a treatment option for PKU and a simple urine test, all newborn siblings of PKU patients could be tested and treated early, from birth on. The next step was the introduction of the so-called diaper test by Dr. Centerwall (Centerwall et al. 1960). They adapted the ferric chloride test for newborns by simply pouring ferric chloride solution onto the wet diapers of newborns to detect excreted phenylpyruvic acid. The test worked in principle, but its sensitivity was poor. With the diaper test, only the very severe cases of PKU, who already had a very high concentration of phenylpyruvate in urine, could be detected. And again, it was a parent, the father of a child with PKU, who approached Robert Guthrie at a meeting of families with disabled children and asked whether he could try to develop a more sensitive test, so that all children with PKU could be treated early enough to prevent mental retardation. This was the start of NBS for PKU in the USA in 1963, and many countries followed in the subsequent years. Even today, the NBS test, using whole blood taken by heel prick and dried on a special blood collection device, is often called the “Guthrie Test.”

Then step by step new tests for other disorders were developed and included into NBS in several countries, like galactosemia (Paigen et al. 1982), biotinidase deficiency (Heard et al. 1984), maple syrup urine disease, MSUD (Naylor and Guthrie 1978), homocystinuria (Whiteman et al. 1979), congenital hypothyroidism (Larsen and Broskin 1975; Dussault et al. 1976), and congenital adrenal hyperplasia (Cacciari et al. 1982).

In 1968, the World Health Organization (WHO) initiated a study to define criteria for the introduction of population screening, which was carried out by Wilson and Jungner (Wilson and Jungner 1968; Jungner et al. 2017).

The introduction of tandem mass spectrometry (TMS) has revolutionized NBS. It changed the paradigm from “one disorder, one test” to “one technology, multiple disorders.” This completely changed the interpretation of criterion no. 9 of the Wilson and Jungner criteria. Once TMS was introduced, the cost of adding another disorder that could be detected in the profile of amino acids or acylcarnitines was more or less zero. Therefore, it was necessary to adapt the Wilson and Jungner criteria to the new situation (Andermann et al. 2008) (Table 1.1).

Table 1.1 Wilson and Jungner classic screening criteria

Three of these criteria deserve special discussion. First, criterion no. 7: “The natural history of the condition, including development from latent to declared disease, should be adequately understood.” This is easily met when an adult population is screened for a certain disease. The medical history of the patients (or probands) is normally available, repeat testing can easily be done, and there are normally well-defined criteria for who should be declared a patient (criterion no. 8). For conditions that are included in NBS, the natural history is not always well understood, for several reasons. First of all, it has to be borne in mind that scientific and medical knowledge expands over time. For example, before NBS for PKU was started, variant hyperphenylalaninemias were more or less unknown, and disorders of the metabolism of tetrahydrobiopterin, the cofactor of phenylalanine hydroxylase, were also unknown or not well understood (see Chap. xx). The introduction of NBS for PKU also created a new “condition,” maternal PKU, which could not have been anticipated beforehand. Another example is NBS for galactosemia. It started with the measurement of total galactose in DBS (Paigen et al. 1982), which was accompanied by the so-called Beutler test (Beutler et al. 1964), a qualitative or semiquantitative test to measure the activity of galactose-1-phosphate uridyltransferase, the enzyme deficient in classical galactosemia. The introduction of the Beutler test led to the detection of a variant form of galactosemia, the so-called Duarte-2 galactosemia. At the beginning, patients with the Duarte-2 variant were treated in the same way as patients with classical galactosemia, and it was only in the 1990s that increasing knowledge about the natural history of galactosemia showed that these patients normally do not need any treatment at all. Just recently, a fourth disorder of galactose metabolism has been described, galactose mutarotase (GALM) deficiency (Iwasawa et al. 2019). Another example is histidinemia, for which screening was introduced in several countries, probably after a single case report (Garvey and Gordon 1969) and a method paper; it was later shown that elevated histidine in blood is a condition without clinical significance (Brosco et al. 2010). There are also other conditions for which the clinical relevance or the clinical penetrance is unclear or very low, like SCADD and 3-MCC deficiency. A last example is the pilot urine newborn screening programs for neuroblastoma in Quebec, Austria, Germany, the UK, and Japan. Although early treatment with a combination of surgery and chemotherapy seemed to work well, the death rate from neuroblastoma did not change. Therefore, it was suspected that NBS for neuroblastoma had detected previously unrecognized mild tumors that would have regressed spontaneously even without any therapy (Riley et al. 2003; Maris and Woods 2008).

Secondly, criterion no. 3, “Facilities for diagnosis and treatment should be available,” again in connection with criterion no. 8. A “facility for the diagnosis,” together with an “agreed policy on whom to treat,” should be interpreted as follows: after a positive NBS test, a definitive diagnostic test should be available that allows an immediate decision on whether a child has the condition and needs immediate treatment, or whether the child is not affected and can be released as healthy. Some disorders do not fulfill these criteria, for example, VLCADD, where acylcarnitine profiles can be completely normal when the patients are in an anabolic state (Spiekerkoetter et al. 2010), or several of the lysosomal storage disorders, where the residual enzyme activity alone cannot predict with certainty whether the disease will progress, genetic analysis is not always conclusive either, and often there is no other metabolic marker available to monitor progression or normalization.

The third point emerges directly from this problem. It is criterion no. 6: “The test should be acceptable to the population.” Different stakeholders of NBS programs can have totally different opinions about this. Pediatricians and patient organizations for a certain disorder can be strongly in favor of NBS, even if there is a long period of uncertainty about whether treatment is necessary or not. At the other end, there may be a large number of parents who would rather not have this particular disorder included because of this uncertainty. Obtaining informed consent, although nowadays required in most countries, is not an easy task, and the burden of false-positive NBS results has been described by several groups (Morrison and Clayton 2011; Schmidt et al. 2012; Johnson et al. 2019).

Decision-making for NBS programs is not an easy task. In many countries, for example the USA (DHHS 2013), Germany, Switzerland, and the UK, it is formalized. However, although many countries have celebrated the 50th anniversary of their NBS programs in recent years, there are still many countries around the world that have not started any newborn screening or have only had pilot programs (Pandey et al. 2019), and sometimes NBS is available only to the small part of the population that can afford to pay for it themselves.

Newborn Screening: A Public Health Program

Newborn screening is not just a laboratory test; it should be recognized as a whole program. It involves midwives, nurses, gynecologists, neonatologists, the laboratory, special diagnostic centers, and specialized treatment centers. NBS programs should include information material about the extent of the program for parents and for midwives, nurses, gynecologists, and neonatologists; for the latter group, this should include information on how a positive NBS result will be communicated. Ideally, there should also be designated specialized centers for the final diagnostic test and for treatment. And there must be feedback about the outcome of the diagnostic test to the newborn screening laboratory in order to generate reliable statistical data: the number of newborns screened and, for each disorder, the recall rate, positive predictive value (ppv), negative predictive value (npv), and incidence.

The structure of NBS programs is quite diverse worldwide, as is the way in which new disorders are integrated into existing NBS programs. For countries that have no formal guideline of their own, the Recommended Uniform Screening Panel (RUSP) of the US Advisory Committee on Heritable Disorders in Newborns and Children could be a helpful guide for decision-making. The latest update can always be found at https://www.hrsa.gov/advisory-committees/heritable-disorders/rusp/index.html.

In addition, the Clinical and Laboratory Standards Institute (CLSI) has published several guidelines (https://clsi.org/standards/products/newborn-screening/) for the implementation of NBS.

One problem that NBS programs often face is the lack of financial support for those parts of the program that are not directly related to the laboratory test, and often there is no connection between DBS-based NBS, newborn hearing screening, and screening for CCHD. One example of how this can be addressed is the state of Bavaria in Germany, where a public health screening center coordinates the tracking of all NBS tests statewide. This includes checking completeness, following up positive NBS results and diagnostic tests, and checking whether patients with a definite diagnosis have been admitted to a specialized center (https://www.lgl.bayern.de/gesundheit/praevention/kindergesundheit/neugeborenenscreening/index.htm). Another issue is often the enormous cost of new therapies for previously untreatable disorders, like enzyme replacement therapy for lysosomal storage disorders (LSDs) or the treatment for SMA.

Principles and Practice in the NBS Laboratory

It should be kept in mind that every NBS test, whether immunoassay, enzymatic assay, metabolite determination by tandem mass spectrometry, determination of profiles by HPLC or IEF, or determination of copy numbers by rtPCR, is ONLY a SCREENING TEST, and not a diagnostic test. A definition of screening (not only NBS) has been published by Wald (1994): “Screening is the systematic application of a test or enquiry to identify individuals at sufficient risk of a specific disorder to benefit from further investigations or treatment, among persons who have not sought medical attention on account of symptoms of that disorder.” This definition implies three things: (a) newborn screening is not a diagnostic test; (b) further investigations are needed to confirm a positive screening test; (c) among the screened population there can be individuals who, according to the screening result, have a low risk of a certain condition but can still have or develop the disease.

Improvements in instrumentation and methodology have continuously improved the detection limits of analytes, and the sensitivity and specificity of laboratory tests. Still, every newborn screening laboratory has to define cut-offs for its primary screening tests, which will affect sensitivity, specificity, ppv, and npv.

Sensitivity and Specificity

Ideally, metabolite concentrations or enzyme activities are normally distributed, and the affected and unaffected individuals are completely separated from each other (Fig. 1.1a). In practice, however, there is usually an overlap between these two groups (Fig. 1.1b). The cut-off is normally chosen in such a way that there are no false-negative (fn) results.

Fig. 1.1 Frequency distributions between normal and affected population

However, for some disorders (like CF) this would result in an enormous number of false-positive (fp) results. In these cases, a second-tier test can improve the situation (Fig. 1.2). Sometimes it has to be accepted that a screening test is not able to pick up all cases. In other cases, the combination of marker metabolites can result in 100% sensitivity and 100% specificity, as in CPT-I deficiency (Fingerhut et al. 2001) (Fig. 1.3).

Fig. 1.2 Distribution of IRT values from normal newborns and newborns with CF

Fig. 1.3 Distribution of the ratio of C0/(C16 + C18) from normal newborns (red bars), newborns on carnitine supplementation (blue circle) and newborns with CPT-I deficiency (orange crosses)
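The effect of the cut-off on two overlapping distributions can be illustrated with a minimal sketch. It assumes, purely for illustration, that the marker is normally distributed in both groups, that values above the cut-off are reported as abnormal, and that the means and standard deviations are the made-up numbers given in the code; none of these values come from real screening data.

```python
from statistics import NormalDist

# Hypothetical marker distributions (arbitrary units); illustrative values only.
unaffected = NormalDist(mu=50.0, sigma=10.0)
affected = NormalDist(mu=100.0, sigma=15.0)


def rates_at_cutoff(cutoff: float) -> tuple[float, float]:
    """Return (fn_fraction, fp_fraction) when values above `cutoff` are abnormal."""
    fn_fraction = affected.cdf(cutoff)          # affected newborns below the cut-off
    fp_fraction = 1.0 - unaffected.cdf(cutoff)  # unaffected newborns above the cut-off
    return fn_fraction, fp_fraction


for cutoff in (60.0, 70.0, 80.0, 90.0):
    fn, fp = rates_at_cutoff(cutoff)
    print(f"cut-off {cutoff:5.1f}: fn fraction {fn:.4f}, fp fraction {fp:.4f}")
```

Lowering the cut-off in this sketch reduces the fraction of missed affected newborns, but only at the price of more abnormal results among unaffected newborns, which is the situation a second-tier test is meant to improve.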

Sensitivity is the percentage of affected individuals that are detected with the respective test.

Specificity is the percentage of unaffected individuals that are correctly detected as unaffected.

$$ \mathrm{Sensitivity}=\frac{rp}{rp+fn},\qquad \mathrm{Specificity}=\frac{rn}{fp+rn} $$

(rp = right positive; rn = right negative; fp = false positive; fn = false negative)
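As a small worked example, the two definitions can be written as functions of the four counts. The counts used below are hypothetical and were chosen only to show the arithmetic; the function names are likewise just illustrative.

```python
def sensitivity(rp: int, fn: int) -> float:
    """Fraction of affected newborns with an abnormal screening result."""
    return rp / (rp + fn)


def specificity(rn: int, fp: int) -> float:
    """Fraction of unaffected newborns with a normal screening result."""
    return rn / (fp + rn)


# Hypothetical counts for one disorder in one screening year (not real data).
rp, fn, fp, rn = 19, 1, 80, 99_900

print(f"sensitivity = {sensitivity(rp, fn):.3f}")
print(f"specificity = {specificity(rn, fp):.5f}")
```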

What Is a “False-Positive” Result?

Different NBS programs often use different terminology. In this chapter, we will use “abnormal” result and “normal” result, which are ultimately defined by the chosen cut-off for each laboratory test. If the first measurement from a specific NBS card is “abnormal,” the test should always be repeated in duplicate from the same NBS card. This helps to rule out a laboratory error. If the results are not plausible when taken together, the laboratory should search for an explanation. Since every test also has a certain uncertainty of measurement, this needs to be taken into account when the cut-off is defined. If the repeat testing is again “abnormal,” this will result in a “Positive Screening Result” for a specific disorder. If either a second DBS is then taken and gives a “normal” test result, or a specific diagnostic test excludes the condition for which the initial screening test was “positive,” the initial screening result will be called a “False-Positive” result.
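The terminology above can be summarized as a simple decision sequence. The following sketch is one possible reading of it, not a prescribed laboratory procedure: it assumes that values above the cut-off are abnormal, it ignores the uncertainty of measurement, and the handling of discordant repeat results (the hypothetical "implausible" label) is an assumption of this example.

```python
from typing import Optional, Sequence


def is_abnormal(value: float, cutoff: float) -> bool:
    """Assumption for this sketch: values above the cut-off are abnormal."""
    return value > cutoff


def screening_result(first: float, repeats: Sequence[float], cutoff: float) -> str:
    """Classify one NBS card from the first measurement and its duplicate repeat."""
    if not is_abnormal(first, cutoff):
        return "normal"
    if len(repeats) != 2:
        raise ValueError("an abnormal first result must be repeated in duplicate")
    flags = [is_abnormal(v, cutoff) for v in repeats]
    if all(flags):
        return "positive screening result"
    if not any(flags):
        return "normal (first result not reproduced; look for an explanation)"
    return "implausible (repeats disagree; look for an explanation)"


def final_classification(screen: str, confirmed: Optional[bool]) -> str:
    """Combine the screening result with the outcome of the follow-up test."""
    if screen != "positive screening result":
        return screen
    if confirmed is None:
        return "positive screening result (confirmation pending)"
    return "true positive" if confirmed else "false positive"


screen = screening_result(first=25.0, repeats=[24.0, 26.5], cutoff=20.0)
print(final_classification(screen, confirmed=False))
```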

False-positive results (fp) are expected in NBS because the major goal is not to miss a patient who has the respective condition. There are several reasons for a false-positive screening result: (a) screening tests with a high uncertainty of measurement tend to have a higher fp rate; (b) higher biological variation of the disease marker also leads to a higher fp rate; (c) the marker metabolite is not specific for a certain disorder; (d) the metabolite level is influenced by nutrition and diet; (e) the metabolite level is influenced by the mother, for example, free carnitine levels in CUD, or vitamin B12 levels in disorders of cobalamin metabolism.

The number of fp results can be reduced effectively when it is possible to use not just one primary disease marker, but several markers or additional ratios. Even more effective are second-tier tests, which are more specific than the primary test but too expensive or labor-intensive to be applied directly to all DBS. Examples are second-tier genetic testing in CF screening, or the determination of allo-isoleucine by HPLC or UPLC in MSUD screening (Fig. 1.4). Major causes of fp results are summarized in Table 1.2 (Table 46.2 from the previous edition of this book).
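The benefit of a second-tier test can be made concrete with a short calculation. The sketch below assumes that the second-tier test is run only on DBS that were abnormal in the primary test and that its errors are independent of the primary marker; the population size, fp rate, and second-tier specificity are hypothetical numbers.

```python
def fp_after_second_tier(unaffected: int, first_tier_fp_rate: float,
                         second_tier_specificity: float) -> tuple[float, float]:
    """Expected false positives before and after a second-tier test.

    Assumes the second-tier test is applied only to primary-test abnormals
    and that its errors are independent of the primary marker.
    """
    fp_first = unaffected * first_tier_fp_rate
    fp_final = fp_first * (1.0 - second_tier_specificity)
    return fp_first, fp_final


# Hypothetical numbers, chosen only to illustrate the arithmetic.
before, after = fp_after_second_tier(100_000, 0.005, 0.95)
print(f"fp after primary test: {before:.0f}, after second tier: {after:.0f}")
```

Because the more expensive test is applied only to the small abnormal fraction, the workload stays manageable while the number of families receiving a positive screening result drops considerably.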

Fig. 1.4 Separation of leucine, isoleucine, and allo-isoleucine by UPLC-MS/MS

Table 1.2 Commonly used methods in bloodspot NBS (historic or currently used)

When fp rates are compared between different NBS programs and published data, it is important that a clear definition of fp results has been given. For example, a DBS of a newborn with complete glucose-6-phosphate dehydrogenase deficiency will give an abnormal screening result for classical galactosemia if only the Beutler test is used. This could be counted as an fp result for galactosemia screening. However, given the design of the Beutler test, which relies on four different enzymes that are present in the DBS, galactose-1-phosphate uridyltransferase (GALT), phosphoglucomutase (PGM), glucose-6-phosphate dehydrogenase (G6PD), and 6-phosphogluconate dehydrogenase, it is an expected finding, and it could therefore just as well be defined as a true positive result.

What Is a “False-Negative” Result?

False-negative results (fn) in screening are unwanted, but it is important to keep in mind that a screening test can never be 100% sensitive. There are several examples where biological variability will result in fn results. One example is homocystinuria. The primary marker is methionine because the determination of total homocysteine is not feasible as a primary test. However, with methionine as a marker, only patients with classical homocystinuria (cystathionine β-synthase deficiency) can be detected. In addition, earlier sampling, as a consequence of improved sensitivity, earlier discharge from hospital, and the inclusion of more severe disorders with earlier onset, like MSUD, will lead to more fn results because methionine rises rather slowly, even in classical homocystinuria. A second example is tyrosinemia type I. Here, using tyrosine as the primary marker will lead to fn results because in tyrosinemia type I it is not the enzyme block that leads to the elevation of tyrosine, but the liver damage, which produces an elevation of tyrosine together with elevated phenylalanine, methionine, and branched-chain amino acids. The third example is glutaric aciduria type I (GA-I). In GA-I, it is well known that the so-called non-excretors, patients with clinically and genetically proven GA-I who do not excrete 3-hydroxyglutaric or glutaric acid in the urine, are missed by NBS (Gallagher et al. 2005). For CF, too, it is well known that the sensitivity is only around 95–96%, meaning that 4–5% of cases are missed by NBS (Heidendael et al. 2014).

Positive and Negative Predictive Values

The positive predictive value (ppv) and the negative predictive value (npv) are essential measures when communicating an NBS result.

$$ npv=\frac{rn}{rn+fn},\qquad ppv=\frac{rp}{rp+fp} $$

The npv and ppv describe how reliable a test result is in relation to the disease state of the respective newborn. If the npv is 100%, the risk that a newborn with a normal test result has the respective disorder is zero. Likewise, a ppv of 100% means that the chance that a newborn with a positive test result does not have the respective disorder is zero. In reality, neither npv nor ppv reaches 100%. The npv is normally >99.9%, but this still means that a newborn with a normal test result can have the respective disease. The ppv is quite variable and, as already discussed above, dependent on the chosen cut-off. Very often, the ppv also depends on the test value itself. For example, a TSH value of >100 mU/L probably has a ppv of 100%, while a TSH value of 21 mU/L probably has a ppv of only 1–5%. The same applies to CF screening with second-tier genetic testing: if second-tier testing finds two disease-causing mutations, the ppv is 100%, irrespective of the initial IRT value. However, if no mutation is found, the ppv most likely depends again on the IRT level.
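Why the npv is typically very high while the ppv can be low for rare disorders follows directly from the definitions once the prevalence is taken into account. The sketch below expresses rp, fn, fp, and rn as expected fractions of the screened population; the sensitivity, specificity, and prevalence values are hypothetical and serve only as an illustration.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (ppv, npv) as expected fractions in a screened population."""
    rp = sensitivity * prevalence                  # affected, abnormal result
    fn = (1.0 - sensitivity) * prevalence          # affected, normal result
    fp = (1.0 - specificity) * (1.0 - prevalence)  # unaffected, abnormal result
    rn = specificity * (1.0 - prevalence)          # unaffected, normal result
    return rp / (rp + fp), rn / (rn + fn)


# Hypothetical test characteristics; only the prevalence is varied.
for one_in in (1_000, 10_000, 100_000):
    ppv, npv = predictive_values(0.99, 0.999, 1.0 / one_in)
    print(f"1 in {one_in:>7,}: ppv = {ppv:.3f}, npv = {npv:.6f}")
```

With these assumed numbers, the ppv falls sharply as the disorder becomes rarer even though sensitivity and specificity stay the same, whereas the npv remains close to, but never exactly, 100%.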

Methodology

Since this book deals with inborn errors of metabolism, the description of methodology focuses on the detection of amino acids and acylcarnitines by flow injection tandem mass spectrometry (FI-MS/MS). For this purpose, two different methods are in use: extraction into an organic phase either (a) after derivatization to the respective butyl esters, or (b) without derivatization. The method with butylation results in higher signal intensities than the method without derivatization; however, modern tandem MS instruments tend to be so sensitive that this has no effect on the sensitivity of the test results. It only has to be kept in mind that some isobaric compounds are no longer isobaric after butylation, e.g., C4DC and C5OH: dicarboxylic acids add two butyl ester groups, while hydroxy acids carry only one butyl group.
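The mass shift caused by butylation can be illustrated with nominal masses. The sketch below assumes a nominal [M+H]+ of m/z 262 for both underivatized C4DC and C5OH and a nominal shift of 56 per butyl ester group; these numbers are not taken from this chapter and should be checked against the laboratory's own method documentation.

```python
BUTYL_SHIFT = 56  # nominal mass added per esterified carboxyl group (C4H8)

# Nominal [M+H]+ of the underivatized species and number of carboxyl groups.
# These values are assumptions of this sketch, not taken from the chapter.
SPECIES = {
    "C4DC": {"mz_underivatized": 262, "carboxyl_groups": 2},
    "C5OH": {"mz_underivatized": 262, "carboxyl_groups": 1},
}


def mz_butylated(name: str) -> int:
    """Nominal m/z after butylation of every carboxyl group."""
    s = SPECIES[name]
    return s["mz_underivatized"] + BUTYL_SHIFT * s["carboxyl_groups"]


for name, s in SPECIES.items():
    print(f"{name}: underivatized m/z {s['mz_underivatized']}, "
          f"butylated m/z {mz_butylated(name)}")
```

Under these assumptions, the two species share one nominal mass without derivatization but appear at different masses as butyl esters, because the dicarboxylic species takes up one more butyl group.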

Table 1.3 provides an update from Fingerhut (2009) of target diseases for NBS, which can be compared with the RUSP.

Table 1.3 Target diseases for newborn screening

Table 1.4 provides a list of the primary marker metabolites that can be detected by FI-MS/MS, and possible secondary markers.

Table 1.4 Primary markers, and secondary markers and/or ratios for FI-MS/MS

The Newborn Screening Process

The primary responsibility for the whole NBS process is very often in the hands of the newborn screening laboratory, unless it is embedded in a clearly defined NBS program. The integration of non-laboratory screenings, like newborn hearing screening and screening for CCHD, is even more complex and will not be discussed in detail here.

Blood Sampling

The standard specimen for NBS is capillary whole blood dried on a special blood collection device, the so-called dried blood spots (DBS). The test cards should be distributed by the screening laboratory to its customers (midwives, hospitals, pediatricians, and general physicians), and they should include all the information that is needed for the correct interpretation of test results. The blood collection device must be of a defined quality and should (ideally) be approved by the FDA or a comparable national institution (Hall 2017).

Since the number of people involved in blood sampling is normally quite high, it is necessary to provide regular information and education to the customers (Evans et al. 2019).

Laboratory Test

The number of tests and the methodology used for NBS vary between countries (Loeber et al. 2021). A summary is given in Table 1.3.

Confirmatory Testing

Confirmatory testing is often not performed in the screening laboratory, but it is a crucial part of the NBS program. It is already mentioned by Wilson and Jungner (criterion no. 8): “There should be an agreed policy on whom to treat as patients.” This means that there must be well-defined testing to confirm what is, up to that point, only a “suspicion” raised by an abnormal NBS result. Without a definite positive confirmatory test, no screening program should count an abnormal NBS result as a detected case. Unfortunately, this is often neglected, as can be seen from many publications on screening for congenital hypothyroidism (CH) in recent years that can be summarized under the title: “Increasing incidence for CH by lowering the cut-off for TSH.”

Treatment and Follow-up

The last part of the NBS process is the referral of newborns with a positive screening test to a specialized center, initiation of treatment, and follow-up. While the quality of the NBS tests can be measured by the number of correctly detected cases (e.g., ppv, fp rate, fn), the quality and success of the NBS program is measured by the outcome of detected cases. Therefore, long-term outcome studies are extremely important for the evaluation of NBS programs (Badawi et al. 2019). Unfortunately, the costs of this quality assessment are mostly covered neither by health insurance, within the reimbursement for NBS, nor by the health authorities. This is absolutely incomprehensible in these times of quality control, where nearly everything is certified or accredited according to some “ISO-XXXX.”

Perspective

Newborn screening will steadily improve, and the number of disorders will increase. This will be driven either by improved methods and technology, which make screening possible once marker metabolites become measurable, or by new treatment options, when so far untreatable disorders become treatable through the development of new therapeutics, as in the case of SMA.

And last but not least, the decrease in the cost of next-generation sequencing (NGS), whole exome sequencing, or whole genome sequencing has started the debate on whether this will be the future of NBS (Yang et al. 2019; Phornphutkul and Padbury 2019).

Conclusion

Newborn screening is surely one of the most effective preventive health care programs in the world. It has a history of more than 50 years (in some regions), although there are also countries that are only just starting to think about introducing NBS. During the last 50 years, NBS has evolved from a laboratory test into a public health care program, but there is still work to do. In addition, new technologies will continuously challenge the newborn screening laboratories.