Introduction

Intrauterine growth restriction (IUGR) is etiologically associated with various maternal, foetal and placental factors [6, 19, 30, 31]. Infants who have experienced IUGR are at higher risk of neonatal mortality and morbidity [10, 16, 32, 34, 41]. Moreover, they may be at increased risk of long-term adverse outcome, such as neurocognitive impairment, poor school performance and short stature [23, 27, 28]. Associations between IUGR and chronic diseases in (early) adulthood, such as coronary heart disease and type 2 diabetes mellitus, have also been described extensively [1]. There exists no universally agreed means of diagnosing IUGR. Small-for-gestational-age (SGA), often defined as birthweight below the gender-specific 10th percentile for gestational age (GA), is commonly used as a surrogate [2].

To distinguish between normal birthweight and birthweight deviation, reference values are required. In the absence of exclusion criteria regarding risk factors for IUGR, a reference is descriptive. A descriptive reference allows application to other neonates to establish whether or not their measurements are typical of the reference group. In the presence of exclusion criteria regarding risk factors for IUGR, a reference is prescriptive. A prescriptive reference constitutes a model to which a neonate should conform and indicates how growth should progress [4, 43].

It is well known that descriptive references possess low sensitivity in detecting IUGR [5]. It is less evident that a prescriptive standard is superior. The one study we identified had limited statistical power [13]. Studies that compared descriptive birthweight references to prescriptive foetal growth references consistently showed the benefits of the latter, i.e. superiority in detecting morbidity and mortality [5, 25, 29, 42]. However, foetal growth references differ from neonatal birthweight references in many methodological aspects and it is not evident that their prescriptive nature alone accounts for their superiority.

Considering the far-reaching consequences of IUGR, it is important to interpret birthweight for a given gestational age correctly. The present study was initiated because of the observation of major discrepancies between the diagnosis of IUGR made in utero and the neonatal classification based upon descriptive birthweight standards. Inspired by previous studies, we hypothesised that prescriptive birthweight standards are superior to descriptive birthweight standards [5, 13, 25, 29, 42]. The aim of this study was to investigate to what extent a prescriptive birthweight standard based on a subpopulation free of risk factors for IUGR could improve the identification of infants born SGA at risk of adverse neonatal outcomes.

Materials and methods

Study design

We performed a two-phase retrospective cohort study. Firstly, we developed descriptive and prescriptive birthweight standards and classified neonates as either SGA or non-SGA accordingly. Secondly, we assessed the associations between the two SGA definitions and adverse neonatal outcomes and evaluated the birthweight standards in terms of diagnostic performance.

Study population

Data were extracted from the perinatal database of The Netherlands Perinatal Registry (PRN), which is a linked database of medical registries from four professional organisations that provide perinatal care. Among the items reported are maternal demographics and medical conditions, pregnancy complications and details concerning labour, birth and neonatal outcome, most of which are registered as diagnostic codes. Although participation is not 100 %, the database can be considered as an unbiased representation of the total population in The Netherlands [38, 40].

We included infants with GA between 24+0 and 42+0 weeks, who were alive at the onset of labour and were born in The Netherlands between January 1, 2000, and December 31, 2007. In over 95 % of the pregnancies, GA was certain, either confirmed by or based on early ultrasound. To construct descriptive birthweight standards, infants with missing GA, birthweight or gender data were excluded. In accordance with the current Dutch birthweight standards, we also excluded multiple pregnancies [40]. To construct prescriptive birthweight standards, we further restricted the inclusion criteria to infants without congenital malformations, born to healthy mothers after pregnancies without suggested risk factors for impaired foetal growth, and, in case of prematurity, after spontaneous onset of labour (Fig. 1).

Fig. 1 Flow diagram of study participants. (a) More than one exclusion criterion or missing data variable may apply to the same pregnancy. (b) Health status was considered unknown when infants were born in hospitals that did not participate in the paediatric registration. (c) Pre-existent hypertension; pre-gestational diabetes mellitus; ‘other’ systemic diseases; cardiovascular disorders with haemodynamic consequences; thromboembolic disorders; respiratory disorders; epilepsy treated with anticonvulsants; malignant diseases; anaemia (haemoglobin level <9.6 g/dL (<6.0 mmol/L)); recurrent urinary tract infections; substance abuse; medication use. (d) Gestational hypertension; proteinuria; pre-eclampsia; eclampsia; placenta praevia; placental abruption; TORCHES; disorders of pregnancy ‘not otherwise specified’. (e) Primary caesarean section or induction of labour <37 weeks GA

For the second phase of our study, we selected a sample group, consisting of neonates who were born in hospitals that take part in both the obstetric and paediatric registration. Sampling was necessary to ensure that infants without recorded adverse neonatal outcomes could indeed be considered healthy, instead of being falsely labelled as such due to incomplete registration. We extended the sample by adding (low risk) births attended by midwives if the mothers’ postal codes were equal to those of mothers who gave birth in one of the selected hospitals. More details are provided in Fig. 1. Selected neonates were classified as SGA or non-SGA (i.e. birthweight < p10 or birthweight ≥ p10), separately according to both birthweight standards.
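The classification step can be illustrated with a minimal sketch. It is not the authors' code (their analyses were run in SAS); the function names and the 10th-percentile values in the lookup tables are hypothetical placeholders chosen for illustration, not the percentiles derived in this study.

```python
from typing import Dict, Tuple

# Hypothetical p10 cut-offs in grams, keyed by (sex, completed GA week).
# Placeholder values for illustration only; NOT the study's percentiles.
P10_DESCRIPTIVE: Dict[Tuple[str, int], float] = {("M", 31): 1200.0, ("F", 31): 1150.0}
P10_PRESCRIPTIVE: Dict[Tuple[str, int], float] = {("M", 31): 1500.0, ("F", 31): 1450.0}


def is_sga(birthweight_g: float, sex: str, ga_week: int,
           p10_table: Dict[Tuple[str, int], float]) -> bool:
    """SGA means birthweight strictly below the sex- and GA-specific p10."""
    return birthweight_g < p10_table[(sex, ga_week)]


def classify(birthweight_g: float, sex: str, ga_week: int) -> Dict[str, bool]:
    """Classify one neonate separately according to both standards."""
    return {
        "descriptive": is_sga(birthweight_g, sex, ga_week, P10_DESCRIPTIVE),
        "prescriptive": is_sga(birthweight_g, sex, ga_week, P10_PRESCRIPTIVE),
    }


if __name__ == "__main__":
    # A 1400 g boy at 31 weeks lies between the two placeholder cut-offs.
    print(classify(1400.0, "M", 31))
```

Because each neonate receives two independent labels, the same record can be SGA under one standard and non-SGA under the other.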

Data analysis

Baseline characteristics were analysed using descriptive statistics and presented in an appropriate manner.

We used Tukey’s method to exclude outliers; observations were excluded if birthweight was more than twice the interquartile range below the first quartile or above the third quartile [22]. The LMS method was adopted to calculate birthweight percentiles [8]. The curves were smoothed using penalised B-splines. For more details, we refer to Appendix 1 (online supplementary material). Model fit was evaluated by visual assessment of the smoothed versus empirical percentiles (online supplementary Fig. S1).
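As a hedged illustration of the two steps described above, the sketch below applies Tukey fences with a multiplier of 2 to a list of birthweights and evaluates the standard LMS (Box-Cox) formulas for a z-score and a centile. The function names are our own, the quartile convention is that of Python's `statistics.quantiles` (an assumption; the study does not state which convention was used), and the smoothing of the L, M and S curves with penalised B-splines is not reproduced here.

```python
import statistics
from math import exp, log
from statistics import NormalDist


def tukey_filter(weights, k=2.0):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR]; k=2 follows the text."""
    # Quartile convention (default 'exclusive' method) is an assumption.
    q1, _median, q3 = statistics.quantiles(weights, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [w for w in weights if lower <= w <= upper]


def lms_z(y, L, M, S):
    """z-score of measurement y given skewness L, median M and variation S."""
    if abs(L) < 1e-12:  # limiting case L -> 0
        return log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def lms_centile(p, L, M, S):
    """Measurement at cumulative probability p (e.g. p=0.10 for p10)."""
    z = NormalDist().inv_cdf(p)
    if abs(L) < 1e-12:
        return M * exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


if __name__ == "__main__":
    # Hypothetical birthweights (g) and LMS parameters, for illustration only.
    print(tukey_filter([3000, 3100, 3200, 3300, 3400, 3500, 3600, 9000]))
    print(round(lms_centile(0.10, 0.5, 3000.0, 0.12)))
```

In practice the L, M and S values would be read from the smoothed curves at the relevant GA and sex before calling `lms_centile`.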

In the absence of a gold standard, the standards were evaluated by their ability to ‘predict’ adverse neonatal outcomes associated with IUGR. Early perinatal mortality was defined as death within 7 days postpartum, including intrapartum death. Late perinatal mortality was defined as death between 8 and 28 days postpartum. Low 5-min Apgar score was defined as an Apgar score of ≤3 at 5 min postpartum [9]. The diagnosis of sepsis was established by a positive blood culture and/or based on clinical symptoms. Infant respiratory distress syndrome (IRDS) was diagnosed based on clinical and radiographic findings according to Giedion [15]. Bronchopulmonary dysplasia (BPD) was diagnosed in infants requiring oxygen supplementation beyond 36 weeks postconceptional age. Intraventricular haemorrhage (IVH) and cystic periventricular leukomalacia (cPVL) were diagnosed based on classic sonographic findings and classified according to Papile and De Vries, respectively [11, 35]. Necrotizing enterocolitis (NEC) was diagnosed based on a combination of clinical, laboratory and radiographic findings and staged according to Bell’s classification [3]. Retinopathy of prematurity (ROP) was diagnosed based on fundoscopic examination and classified according to international classification [21]. Hypoxic ischemic encephalopathy (HIE) was diagnosed based on clinical presentation, presence of seizures and duration of symptoms and graded according to Sarnat’s classification [37]. Hypoglycaemia was defined as a blood glucose level <2.5 mmol/L (45 mg/dL). Hyperbilirubinaemia was diagnosed when phototherapy was required. Hypothermia was defined as a body temperature <35.5 °C (95.9 °F). Polycythaemia was defined as a venous haematocrit >0.65.

Because the risk of adverse outcomes decreases substantially with increasing GA, we conducted separate analyses for different GA strata.

We performed multiple logistic regression analyses to relate each outcome variable to growth status (SGA versus reference category ‘non-SGA’). Various potential confounders were identified and included in the model if statistically significant at the 0.05 level (two-tailed Wald test) [24]. We estimated adjusted odds ratios (aORs) and 95 % confidence intervals for each SGA definition. The diagnostic performance of both standards was assessed in terms of sensitivity and specificity and compared using McNemar’s test for matched binary data [17, 18].
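The diagnostic comparison can be sketched as follows. This is an illustrative sketch rather than the SAS procedure used in the study: the function names are hypothetical, and the McNemar statistic is computed in its asymptotic chi-square form without continuity correction, which is an assumption because the variant used is not specified. When comparing sensitivities, the discordant counts `b` and `c` are taken among infants with the outcome (SGA by one standard only versus SGA by the other standard only).

```python
from math import erfc, sqrt


def sensitivity_specificity(test_positive, outcome):
    """Return (sensitivity, specificity) from paired boolean sequences.

    Assumes at least one infant with and one without the outcome.
    """
    pairs = list(zip(test_positive, outcome))
    tp = sum(1 for t, o in pairs if t and o)
    fn = sum(1 for t, o in pairs if not t and o)
    tn = sum(1 for t, o in pairs if not t and not o)
    fp = sum(1 for t, o in pairs if t and not o)
    return tp / (tp + fn), tn / (tn + fp)


def mcnemar_p(b, c):
    """Two-sided p-value of McNemar's test from the discordant counts.

    Uses chi2 = (b - c)^2 / (b + c) with 1 degree of freedom and no
    continuity correction (an assumption of this sketch).
    """
    if b + c == 0:
        return 1.0
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 df.
    return erfc(sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical toy data: SGA flags and an adverse outcome per infant.
    sga = [True, True, False, False]
    adverse = [True, False, True, False]
    print(sensitivity_specificity(sga, adverse))
    print(mcnemar_p(10, 0))
```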

Statistical analyses were performed with SAS 9.2 software (SAS Institute, Inc., Cary, NC, USA).

Results

During the study period, 1,474,454 infants were born in The Netherlands. We excluded 113,970 (7.7 %) records because of inappropriate GA (i.e. <24+0 or >42+0 weeks), antepartum stillbirth, multiple pregnancy or missing data. Subsequently, 3,607 (0.3 %) outliers were excluded. The remaining study population comprised 1,339,360 live births (Fig. 1).

To obtain a ‘healthy’ subpopulation, another 268,407 (20.0 %) infants were excluded. In general, the exclusion rate decreased with increasing GA (online supplementary Tables S1 and S2). The principal cause of exclusion was maternal hypertension, defined as a diastolic blood pressure of >90 mmHg. The final healthy subpopulation comprised 1,070,953 records (Fig. 1).

Baseline characteristics are shown in Table 1. Compared to the healthy subpopulation, the excluded records comprised more nulliparous women (53.9 versus 43.8 %), preterm births (13.9 versus 3.9 %), low birthweights (<2000 g, 5.8 versus 0.6 %) and hospital deliveries (93.0 versus 55.3 %).

Table 1 Baseline characteristics of the two populations and the excluded records

Figure 2a, b shows gender-specific 10th, 50th and 90th percentiles for both birthweight standards. The difference between the standards decreases both with increasing GA and with increasing birthweight. The maximum difference between the two 10th percentiles is 312 and 362 g for boys and girls, respectively (GA 31 weeks). Consequently, the heaviest SGA infants according to the prescriptive birthweight standard may be at most 362 g heavier than the heaviest SGA infants classified according to the descriptive birthweight standard. At 39 weeks GA, the two standards agree almost perfectly, as do the 97.7th percentiles (irrespective of GA; online supplementary Tables S1 and S2). The overall percentage of infants classified as SGA according to the descriptive standard was ≈10 %. The prescriptive birthweight standard classified significantly more (preterm) infants as SGA, up to 38.0 % at 29 weeks GA (p < 0.0001; Fig. 2c).

Fig. 2 Descriptive and prescriptive 10th, 50th and 90th percentiles for boys (a) and girls (b) and SGA rates in the sample group according to both standards (c). * Forty-two weeks includes only infants born at GA 42+0 weeks, whereas the other GAs include the entire week (e.g. 24+0 to 24+6 weeks). BW birthweight

Both SGA definitions were significantly associated with multiple adverse neonatal outcomes (Table 2). All preterm infants, including those aged 24+0–25+6 weeks (online supplementary Table S3), were at significantly increased risk of low 5-min Apgar scores and early neonatal death, irrespective of which standard was used. Infants aged 26+0–31+6 weeks were at increased risk of late neonatal death, BPD and ROP. The risk of developing NEC or sepsis was increased starting from GA 28+0 weeks. The risk of cPVL was decreased in infants aged 26+0–27+6 weeks, but increased in infants aged 28+0 weeks and older. Infants aged 26+0–31+6 weeks were at decreased risk of IVH, whereas infants aged 32+0–36+6 weeks were at increased risk. The latter were also at increased risk of the minor adverse outcomes hypoglycaemia, hyperbilirubinaemia, hypothermia and polycythaemia.

Table 2 Adjusted ORs and confidence intervals for separate GA strata

Term infants (GA 37+0–39+0 weeks) were at increased risk of both early and late neonatal death, low 5-min Apgar score, sepsis, HIE and hypoglycaemia, hypothermia, hyperbilirubinaemia and polycythaemia (online supplementary Table S3). Infants aged >39+0 weeks were eliminated from further analyses because there was no substantial difference between the 10th percentiles.

To assess the additional benefit of using the prescriptive birthweight standard, separate analyses were conducted for those infants classified as ‘extra’ SGA (i.e. according to the prescriptive birthweight standard only). The results are also shown in Table 2. Overall, the additional benefit increased with increasing GA. Interestingly, while the risk of IRDS was significantly decreased in SGA infants aged 32+0–36+6 weeks when classified according to the descriptive birthweight standard (aOR 0.57 [0.49–0.67]), the risk was significantly increased in extra SGA infants classified according to the prescriptive birthweight standard only (aOR 1.30 [1.16–1.46]).
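To make the ‘extra’ SGA group concrete, the sketch below assigns each infant to one of three groups and computes a crude odds ratio with a Woolf (log-based) 95 % confidence interval from a 2×2 table. This is a simplification for illustration: the study reports adjusted ORs from multiple logistic regression, which this sketch does not reproduce, and the function names and example counts are hypothetical.

```python
from math import exp, log, sqrt


def sga_group(sga_descriptive: bool, sga_prescriptive: bool) -> str:
    """Three-way grouping used to isolate the 'extra' SGA infants."""
    if sga_descriptive:
        return "SGA (descriptive)"
    if sga_prescriptive:
        return "extra SGA (prescriptive only)"
    return "non-SGA"


def crude_odds_ratio(a: int, b: int, c: int, d: int):
    """Crude OR and Woolf 95 % CI; unadjusted, unlike the aORs in Table 2.

    a: exposed with outcome,   b: exposed without outcome,
    c: reference with outcome, d: reference without outcome.
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - 1.96 * se_log)
    upper = exp(log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    print(sga_group(False, True))
    # Hypothetical counts: extra SGA versus non-SGA infants.
    print(crude_odds_ratio(20, 80, 10, 90))
```

Comparing the extra SGA group against the non-SGA reference, rather than all prescriptive SGA infants against the rest, isolates the infants that only the prescriptive standard flags.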

The diagnostic performance of the two standards was assessed in terms of sensitivity and specificity and compared using McNemar’s test (Fig. 3; online supplementary Fig. S2 and Tables S4 and S5). The prescriptive birthweight standard showed significantly improved sensitivity when compared to the descriptive birthweight standard (generally p < 0.0001). The increase in sensitivity was proportional to the relative increase in the percentage of SGA infants, with the highest sensitivity in the GA stratum with the highest SGA percentage according to the prescriptive birthweight standard (GA 28+0–31+6 weeks). As a result of the trade-off between sensitivity and specificity, the improved sensitivity was accompanied by a significant decrease in specificity (p < 0.0001) that was also most apparent in infants aged 28+0–31+6 weeks (Fig. 3).

Fig. 3 Test characteristics for separate GA strata. BW birthweight

Discussion

Our study shows that prescriptive birthweight standards could improve the identification of SGA infants at risk of adverse neonatal outcomes, when compared to descriptive birthweight standards.

The two birthweight distributions we assessed were substantially different. Because risk factors for IUGR were more prevalent among preterm infants, comparatively many preterm infants did not fulfil the inclusion criteria of the prescriptive birthweight standard. The difference between the prescriptive and descriptive birthweight standard decreased with increasing GA, reflecting the higher incidence of IUGR and hence exclusion rate among preterm infants. Consequently, we found that SGA rates were much higher in preterm neonates when classified by the prescriptive birthweight standard. These results were consistent with a similar study performed by Ferdynus et al. [13]. Previous studies that compared foetal growth standards to (descriptive) neonatal birthweight standards also reported similar results, supporting the hypothesis that prescriptive birthweight standards might approximate foetal growth standards [12, 29, 42]. So-called customised standards adjust birthweight further by considering the influence of supposedly physiological determinants of growth, mostly maternal characteristics [14]. Although customised standards have consistently shown improved prediction of neonatal mortality and morbidity, the actual contribution of maternal characteristics has been questioned [7, 20]. Moreover, several studies have demonstrated that maternal influences on foetal growth may not be purely physiological [20, 44]. Because our database lacked information on most of the commonly used variables, we were not able to investigate whether customisation has an additional advantage over the application of restrictive inclusion criteria.

In our study, SGA was significantly associated with many adverse neonatal outcomes, including several minor adverse outcomes with potentially harmful implications [34, 41]. In general, associations were strongest when SGA was classified according to the descriptive standard. Since these infants were the smallest infants classified by both standards, they may be expected to be at greatest risk. However, extra SGA infants classified according to the prescriptive birthweight standard only were still at (significantly) increased risk of adverse neonatal outcomes. The additional benefit of the prescriptive birthweight standard increased with increasing GA, reflecting the shape of the ‘area between the curves’ and the (increasing) number of subjects within succeeding GA strata. Similar to Ferdynus et al., our results support the hypothesis that prescriptive birthweight standards can improve the identification of SGA infants at risk. Interestingly, we found that the risk of having IVH was significantly decreased in infants aged 28+0–31+6 weeks, whereas Ferdynus et al. found an increased risk [13]. Results from other studies concerning the association between SGA and adverse outcomes are similarly inconsistent. It is likely that these inconsistencies arise at least partially from differences in study methodologies, for example differences in SGA definition, sample size and GA range [39]. Despite the large sample size (n = 127,584), Ferdynus et al. included relatively few premature infants. As a result, the statistical power of their study was limited and other results did not reach statistical significance [13].

Birthweight standards are inexpensive and convenient tools that are used in clinical practice to identify infants who exhibit signs of IUGR. As such, they can be viewed as screening tools, the performance of which is routinely expressed in terms of sensitivity and specificity [18]. The results presented in this study do not indisputably indicate that either one of the two standards performs best. While the prescriptive birthweight standard was significantly more sensitive, the descriptive birthweight standard was significantly more specific. Ideally, a screening instrument has both high sensitivity and specificity. In this particular setting, however, a false-positive result (i.e. low specificity) does not result in unnecessary or potentially harmful treatment, whereas a false-negative result (i.e. low sensitivity) means an increased risk of adverse neonatal outcomes is not acknowledged. All considered, the decrease in specificity may not be negligible but was deemed less clinically relevant.

As stated in the Introduction, the (negative) influence of IUGR is not only visible in early life but extends far into adulthood. The so-called developmental origins hypothesis postulates that undernutrition during foetal life and infancy permanently changes gene expression and thereby establishes body composition, functional capacity, setting of hormones and metabolism and responses to adverse environmental influences later in life. The result is an increased risk of cardiovascular disease, hypertension, stroke and type 2 diabetes mellitus [1]. Birthweight, as the ultimate outcome and quantifiable result of foetal growth, might be able to function as a starting point for risk assessment both early and later in life.

Strengths and limitations

One of the principal strengths of our study was the use of a large population. This allowed us to investigate the association between SGA and many adverse neonatal outcomes, and across separate GA strata. Because we used the same statistical method to calculate birthweight percentiles for both populations, we can attribute differences between the two standards to actual differences in birthweight distribution, thus demonstrating the effect of applying restrictive inclusion criteria.

Previous studies showed that despite the increased number of SGA infants, the association with adverse neonatal outcomes was still significant [13, 42]. By investigating extra SGA neonates as a separate group, we demonstrated the additional benefit of the prescriptive birthweight standard. Furthermore, we acknowledged the trade-off between identifying more SGA infants truly at risk and simultaneously increasing the number of false-positive results [18].

The biggest challenge was to define the characteristics of a healthy subpopulation. Our choice of exclusion criteria was based on an extensive literature search. Its application was to some (unknown) extent constrained by gaps in the database; information on some of the risk factors we identified was either limited (e.g. smoking) or not available at all. If more stringent exclusion criteria could have been applied, this would have led to more removals, but not necessarily greater precision in the calculation of standards [26, 36]. Conditions that might have caused the opposite effect on growth (i.e. macrosomia) were not excluded. Although this may have caused some bias towards higher birthweights, since the upper percentiles were not influenced by the exclusion of IUGR infants, we assumed that excluding macrosomic infants would not affect the lower percentiles. Because the exclusion criteria concerned potential risk factors, it is possible that some healthy infants were excluded as well. However, birthweight distributions are more likely influenced by the inappropriate inclusion of an unhealthy infant than by the inadvertent exclusion of a healthy infant [33, 36].

Our study was limited by the need to select a sample group, to ensure that infants without recorded adverse outcomes could indeed be considered healthy, instead of falsely being labelled as such due to incomplete (paediatric) registration. Aside from a distortion of the true ratios of hospital deliveries versus home deliveries and midwife-assisted in-hospital deliveries, baseline characteristics did not differ between women who were included in the sample and those who were not (results not shown). The use of retrospective data has the advantage of enabling the inclusion of a large number of infants. An obvious disadvantage is that the data were not collected with this particular study in mind, which has to be taken into account when interpreting our findings. For example, the database did not contain data about antenatal corticosteroid treatment, which may be a confounder of the association between SGA and adverse neonatal outcomes. Fortunately, these limitations may not be of major consequence, since our analyses were executed on a matched sample and we were interested in the magnitude of the associations relative to each other, aiming to find the ‘relatively’ best standard. Because the current analyses were executed on a sample group of which the majority of infants also served as the reference population for the (two) birthweight standards, our findings should be confirmed in an independent population. It is our intention to repeat the analyses when we have a sufficiently large independent sample.

Conclusion

To our knowledge, this study is the largest study comparing descriptive versus prescriptive birthweight standards. We demonstrated that birthweight standards, defined by a subpopulation free of risk factors for IUGR, could improve identification of infants born SGA and at risk of several adverse neonatal outcomes. Clinical management should not be based on birthweight standards alone, and the results of this study endorse this statement by showing that neither of the two standards performs optimally. However, for certain epidemiological purposes, it is also important that a standard correctly identifies infants born SGA, who may be at increased risk of adverse outcomes. Descriptive references mistakenly classify some infants as non-SGA, which might, as a result, obscure or enhance the true association between potential risk factors and SGA, or between SGA and (long-term) adverse outcomes. Improved understanding of these associations will aid in clinical decision-making and care and might ultimately improve both short-term and long-term outcome.

Our results provide the rationale for the development of prescriptive birthweight standards for the Dutch population. We believe the results also justify promotion of this concept in other countries. Above all, we feel that clinicians and researchers should be aware of the difference between prescriptive and descriptive birthweight standards, and the implications their use can have for both clinical practice and research outcomes.