Introduction

Neonatal jaundice affects 60–85% of term infants [1,2,3,4]. Since visual assessment of jaundice is not accurate [1, 3, 5,6,7,8,9,10], both the American Academy of Pediatrics and the Spanish Association of Pediatrics recommend that all newborns from 35 weeks of gestation onward undergo screening for hyperbilirubinemia, by measuring either total serum bilirubin (SB) or transcutaneous bilirubin (TcB) [4, 5, 7, 8, 11,12,13,14,15,16,17]. Hyperbilirubinemia is the most common cause of readmission to the neonatal unit [3, 11, 18] and is potentially dangerous because it can cause acute bilirubin encephalopathy and kernicterus [19, 20]. Although measurement in a blood sample is the gold standard for determining bilirubin, that technique is painful, stressful for the newborn and the parents, time-consuming, and more expensive. For these reasons, non-invasive methods such as the determination of TcB are now widely used [3, 4, 13, 21, 22]. Several authors have found a good correlation between TcB and SB [10, 13, 23,24,25,26,27,28,29,30,31,32]. Most published studies validated TcB meters in Caucasian infants, although some authors focused on homogeneous populations of Asian, African, and Hispanic newborns [33,34,35,36,37,38,39,40,41].

Newborns in our reference area belong to different ethnic populations and have different skin tones, which led us to wonder whether the correlation between SB and TcB was reliable regardless of skin color, as some studies suggest [4, 8, 10, 24, 30, 42]. We had the impression that the agreement between TcB and SB varied with skin tone, with a greater difference for darker skin. Classifying neonates into ethnic groups is complicated, and ethnicity does not always correspond to a particular skin color [43]. Since there were no validated neonatal phototype scales, we designed and previously validated one: Neomar’s neonatal skin color scale (Fig. 1) [44]. Our hypothesis was that neonatal TcB correlated well with SB, although the correlation depended on skin tone. To test this hypothesis, we determined TcB and SB at 48–72 h of life and compared the results across skin color groups defined by Neomar’s neonatal skin color scale [44]. Knowing whether TcB is more or less reliable depending on skin color would enable us to avoid drawing blood and help us decide when to follow up jaundiced neonates after hospital discharge.

Fig. 1 Neomar’s neonatal skin color scale

Patients and method

This was a prospective observational study. All neonates born at our hospital during the study period (October 2016 to October 2017) were invited to participate and were enrolled if their parents agreed and signed an informed consent form. We extended the study period until March 2019 to recruit additional color 4 newborns. Exclusion criteria were parents’ refusal to participate and the presence of sternal skin lesions that might alter the reliability of the TcB measurement. None of the participants had received phototherapy when the measurements were taken. We did not exclude patients who required phototherapy after bilirubin testing or who had ABO or Rh incompatibility.

We collected data on gestational age (GA), gender, prematurity, mother’s country of origin and ethnicity (which we used as a surrogate for the participant’s ethnicity), birth weight (BW) and weight at discharge, feeding choice, mother’s and newborn’s blood types, and paired TcB/SB measurements. Participants were assigned to a color group at 24 h of life according to Neomar’s skin color scale, which has four categories: light (color 1), medium-clear (2), medium-dark (3), and dark (4) (Fig. 1). With the venous or heel puncture routinely performed at 48–72 h of life for neonatal metabolic screening, we collected an extra 0.5 mL of blood to determine SB by a colorimetric diazo method (COBAS INTEGRA® 400 plus analyzer, Roche Diagnostics). At the same time, we determined TcB with the Dräger Jaundice Meter JM-105™ (Minolta, Dräger Medical GmbH, Lübeck, Germany) in the mid-sternal area, which is better protected from light than the forehead. The device automatically reported the mean of three measurements. We also included newborns with a clinical indication to determine SB at any time if their parents agreed and signed a written informed consent form.

Sample size calculation was based on a desired minimum correlation coefficient of 0.85 and the width of its 95% confidence interval (95% CI). We obtained the required sample size for each margin of error from the confidence interval. For margins of error between 0.05 and 0.025, the required sample sizes ranged from 125 to 230, respectively. The sample sizes available for each color covered these minimum sample sizes, with the sole exception of the color 4 group.
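The text does not state which formula was used to link sample size and interval width. As an illustration only, the sketch below uses the Fisher z-transform, one common way to obtain a confidence interval for a Pearson correlation. The function name `pearson_ci` and the 1.96 critical value are assumptions of this example, not details taken from the study.

```python
import math


def pearson_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate CI for a Pearson r using the Fisher z-transform.

    Illustrative helper (hypothetical name); z_crit = 1.96 gives a 95% CI.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# The interval around r = 0.85 narrows as the sample grows.
for n in (125, 230):
    lower, upper = pearson_ci(0.85, n)
    print(f"n={n}: 95% CI ({lower:.3f}; {upper:.3f}), width {upper - lower:.3f}")
```

With r = 0.935 and n = 379 (the color 1 figures reported in the Results), this approach gives an interval close to the published 0.921; 0.947, which suggests, but does not prove, that a similar method was used for the reported intervals.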

Statistical analyses

We described quantitative variables (GA, BW, TcB, and SB) using the mean and standard deviation (SD), and qualitative variables (gender, prematurity, breastfeeding, mother’s country of origin/ethnicity) using frequencies (n) and percentages. We compared the four color groups to verify that they were homogeneous in terms of GA, BW, sex, and prematurity rate. Differences among color groups were assessed with one-way ANOVA for continuous variables and chi-squared tests for categorical variables. Between-color differences in bias (TcB-SB) were also assessed by one-way ANOVA followed by pairwise comparisons (Šidák method). Agreement between methods was evaluated with Bland-Altman plots and their elements: the mean of the differences (bias) and the limits of agreement. These analyses were performed separately for each color group. Loess local regression smoothing was also applied to study possible trends in the bias. The Pearson correlation coefficient (r) between SB and TcB and its confidence interval were computed for each color. In addition, for each color, a simple linear regression modeling SB as a linear function of TcB was fitted. Following these analyses, multiple linear regression models were built using a forward stepwise approach. Besides TcB, the variables included were skin color, GA, and a GA by color interaction term (GA × skin color). The variance inflation factor (VIF) and adjusted R-squared were added to the table to check for multicollinearity and to assess goodness of fit. P values lower than 0.05 were considered statistically significant. SPSS version 23.0 for Windows (IBM, Armonk, NY, USA) and STATA version 15 (StataCorp., College Station, TX, USA) were used for the statistical analyses.
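For readers unfamiliar with Bland-Altman analysis, the following minimal sketch shows how the bias and 95% limits of agreement are conventionally derived from paired measurements. It is not the study's SPSS/STATA code: the function name `bland_altman` and the sample values are hypothetical, and the limits are assumed to be bias ± 1.96 × SD of the paired differences.

```python
from statistics import mean, stdev


def bland_altman(tcb, sb, z_crit=1.96):
    """Return (bias, lower limit, upper limit) for paired TcB and SB values.

    Bias is the mean of the paired differences TcB - SB; the limits of
    agreement are assumed to be bias +/- z_crit * SD of those differences.
    """
    if len(tcb) != len(sb):
        raise ValueError("tcb and sb must contain the same number of values")
    diffs = [t - s for t, s in zip(tcb, sb)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 in the denominator)
    return bias, bias - z_crit * sd, bias + z_crit * sd


# Hypothetical paired readings in mg/dL, for illustration only.
tcb_values = [10.0, 12.0, 14.0]
sb_values = [9.0, 10.0, 13.0]
bias, lower, upper = bland_altman(tcb_values, sb_values)
print(f"bias {bias:.2f} mg/dL, limits of agreement ({lower:.2f}; {upper:.2f})")
```

Running the same function separately on the pairs of each color group would mirror the per-color analysis described above.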

Our hospital Ethics Committee accepted and approved this study (reference number 2015/6519/I).

Results

We recruited 1405 neonates during the study period. However, 46 patients had no color assignment. Thus, we included 1359 patients: 337 from color 1, 750 from color 2, 249 from color 3, and 23 from color 4, which was representative of the ethnic diversity of our population. Table 1 describes the characteristics of our population and color groups. There were no differences in gender, GA, or BW among the four color groups, but there were differences in the prematurity rate (more preterm infants in color 1 and fewer in color 3), feeding choice (more breastfed infants in color 3 and fewer in color 4), and mother’s ethnicity and country of origin.

Table 1 Characteristics of our study population

We analyzed 1549 paired SB/TcB measurements (379 for color 1, 828 for color 2, 308 for color 3, and 34 for color 4). Some patients had more than one SB/TcB determination, and we analyzed all the paired measurements. The mean (SD) SB was 9.68 (3.94) mg/dL (range 0.6–27.8), and the mean (SD) TcB was 10.85 (4.21) mg/dL (range 0.0–20.0) for all skin colors. The mean (SD) SB was 8.65 (4.19) mg/dL for color 1, 9.70 (3.69) mg/dL for color 2, 10.80 (3.79) mg/dL for color 3, and 10.64 (5.48) mg/dL for color 4. The mean (SD) TcB was 9.35 (4.49) mg/dL for color 1, 10.78 (3.86) mg/dL for color 2, 12.69 (3.83) mg/dL for color 3, and 12.50 (5.66) mg/dL for color 4. The global (all skin colors) mean (SD) difference TcB-SB was 1.18 (1.53) mg/dL (95% limits of agreement −1.82; 4.19). The Pearson correlation coefficient was 0.935 (95% CI 0.921; 0.947) for color 1, 0.924 (95% CI 0.913; 0.933) for color 2, 0.908 (95% CI 0.887; 0.926) for color 3, and 0.956 (95% CI 0.914; 0.978) for color 4. Figure 2 shows the linear relationship between TcB and SB by skin color. The β coefficients of TcB are all above 0.8. Moreover, the R-squared coefficients are all above 0.8, showing a good linear relationship. The Bland-Altman plots showed a different mean bias depending on skin color (Fig. 3). Expressed as SB-TcB, the bias grew in magnitude along the color scale, from −0.70 (95% limits of agreement −3.82; 2.42) for color 1 to −1.08 (−3.98; 1.82) for color 2, and to −1.89 (−5.09; 1.30) and −1.86 (−5.11; 1.38) for colors 3 and 4, respectively. The difference increased gradually from color 1 to color 4 and was statistically significant between colors 1 and 2, 1 and 3, 1 and 4, 2 and 3, and 2 and 4 (p value < 0.001), but not between colors 3 and 4 (Fig. 4). These differences are clinically relevant only between skin colors 1 and 2, between skin color 1 and skin colors 3–4, and between skin colors 2 and 3.
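As an illustrative check that is not part of the original analysis, the intervals reported with each per-color bias can be read as 95% limits of agreement, that is, bias ± 1.96 × SD of the paired differences. Under that assumption the bias should sit at the midpoint of its interval, and the SD of the differences can be recovered from the interval width. The dictionary `REPORTED` simply restates the published figures; the helper name `implied_sd` is hypothetical.

```python
# Reported SB - TcB bias and interval per skin color (mg/dL), as given in the text.
REPORTED = {
    1: (-0.70, -3.82, 2.42),
    2: (-1.08, -3.98, 1.82),
    3: (-1.89, -5.09, 1.30),
    4: (-1.86, -5.11, 1.38),
}


def implied_sd(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """SD of the differences implied by limits of bias +/- z_crit * SD."""
    return (upper - lower) / (2 * z_crit)


for color, (bias, lower, upper) in REPORTED.items():
    midpoint = (lower + upper) / 2
    sd = implied_sd(lower, upper)
    print(f"color {color}: bias {bias:.2f}, midpoint {midpoint:.3f}, implied SD {sd:.2f}")
```

In all four groups the midpoint agrees with the reported bias to within rounding, which is what limits of agreement (rather than a confidence interval of the mean) would produce.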

Fig. 2 Linear relationship between TcB and SB by skin color. The regression equation for each color: (color 1): SB = 0.87 × TcB + 0.50 (R² = 0.87), (color 2): SB = 0.88 × TcB + 0.18 (R² = 0.85), (color 3): SB = 0.90 × TcB − 0.63 (R² = 0.82), and (color 4): SB = 0.93 × TcB − 0.94 (R² = 0.91)
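To make the per-color regression lines concrete, the sketch below evaluates the four published equations for a given TcB reading. The coefficients are copied from the Fig. 2 caption; the names `REGRESSION` and `predict_sb` are hypothetical, and the output is a point estimate for illustration only, not a clinical tool or a substitute for a serum measurement.

```python
# (slope, intercept) of SB = slope * TcB + intercept, per skin color (Fig. 2).
REGRESSION = {
    1: (0.87, 0.50),
    2: (0.88, 0.18),
    3: (0.90, -0.63),
    4: (0.93, -0.94),
}


def predict_sb(tcb: float, color: int) -> float:
    """Point estimate of SB (mg/dL) from a TcB reading (mg/dL) and color group."""
    if color not in REGRESSION:
        raise ValueError("color must be 1, 2, 3, or 4")
    slope, intercept = REGRESSION[color]
    return slope * tcb + intercept


# Same TcB reading, different color groups.
for color in sorted(REGRESSION):
    print(f"color {color}: TcB 12.0 -> estimated SB {predict_sb(12.0, color):.2f} mg/dL")
```

At the same TcB reading, the estimated SB is lower for colors 3 and 4 than for color 1, consistent with the larger overestimation reported for darker skin.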

Fig. 3 Bland-Altman plots for skin colors 1–4. SB and TcB are expressed in mg/dL

Fig. 4 Error bar graph showing the mean difference TcB-SB by skin color, with brackets marking the pairs of comparisons that were statistically significant on multiple comparisons

Figure 5 shows the Bland-Altman plots for all skin colors according to TcB (≤ 15 vs > 15 mg/dL). We found that, globally, TcB underestimated SB for TcB > 15 mg/dL (for TcB > 15 mg/dL: bias with 95% limits of agreement, −1.73 (−6.02; 2.55)) but overestimated it for TcB ≤ 15 mg/dL (for TcB ≤ 15 mg/dL: bias with 95% limits of agreement, 1.06 (−3.87; 1.75)).

Fig. 5 Bland-Altman plots for TcB measures (≤ 15 vs > 15 mg/dL)

Multiple linear regression of SB (Table 2) showed an association with TcB (β = 0.88, 95% CI 0.87; 0.90, p < 0.001). For the color variable, with color 1 as the reference, the β coefficients were −0.22 for color 2 (95% CI −0.40; −0.04, p = 0.019), −0.81 for color 3 (95% CI −1.04; −0.58, p < 0.001), and −0.80 for color 4 (95% CI −1.32; −0.28, p = 0.003). Table 3 shows several models considered during model selection with the forward stepwise approach. The accuracy of TcB in predicting SB differed significantly depending on the value of TcB and skin color, but not on GA.

Table 2 Results of multiple linear regression with SB as the dependent variable
Table 3 Multivariate linear regression models including the variables TcB, skin color, gestational age, an interaction term of gestational age by skin color (skin color × GA), and prematurity

We also calculated, for each color group and depending on GA, the proportion of blood samples that could be spared if a new guideline based on these results were implemented (Table 4): 18.4% of samples in neonates ≥ 38 weeks and 8.3% in neonates < 38 weeks.

Table 4 Calculation of spared samples to measure SB depending on skin color and TcB

Discussion

Jaundice meters are widely used and have improved over the past few years [4, 8, 9, 12, 22, 45]. Transcutaneous bilirubinometry measures bilirubin in the subcutaneous tissue, and therefore TcB is not the same value as SB [12], although current jaundice meters have been designed to agree as closely as possible with SB [8]. Most clinical studies in term and late-preterm infants have shown an agreement within ± 2 mg/dL between TcB and SB [8]. The most widely used jaundice meters at present are BiliChek™ (Philips Electronics, Amsterdam, The Netherlands) and JM-103/105™ (Minolta, Dräger Medical GmbH, Lübeck, Germany). Even though jaundice meters are a powerful tool for screening for neonatal hyperbilirubinemia, some controversies remain as to their reliability depending on GA, ethnic group or skin color, level of TcB/SB, and exposure to phototherapy [12, 30, 46,47,48].

The use of TcB in preterm infants under 35 weeks has not become routine, although numerous studies support it in infants of this GA [4, 21, 34, 36, 49, 54,55,56]. While a 2013 systematic review and Afanetti reported similar reliability among preterm infants [4, 49], subsequent data showed that the correlation worsened with decreasing GA, while the threshold for phototherapy also becomes lower [50,51,52]. Despite this controversy, the correlation between TcB and SB in preterm infants seems strong [4, 21, 34, 36, 49, 53,54,55,56]. Jnah recently conducted a prospective study in a multiracial population of babies born at 30–34 6/7 weeks and found a strong correlation between TcB and SB before phototherapy, with TcB readings consistently lower than SB (generally by 2 to 3 mg/dL), and concluded that BiliChek™ was accurate, reliable, and useful for screening preterm neonates before, during, and after phototherapy, and could potentially avoid 40 to 79% of blood samples [9]. The results of our study support the reliability of jaundice meters in newborns with a GA of 30 to 38 weeks.

Modern jaundice meters use specific algorithms to isolate bilirubin from other chromophores such as melanin [8, 27]. Nevertheless, some authors found that their readings can still be affected by skin pigmentation [10, 27,28,29, 43] and noted that SB tended to be overestimated in dark-skinned infants [4, 10, 13, 27, 30, 38, 43] and underestimated in light-skinned infants [30]. Despite this, Chimhini found a good correlation in black Zimbabwean neonates [33]. Samiee-Zafarghandy assessed the accuracy of JM-103™ according to skin color but used cosmetic references, and neonatal skin color differs from that of cosmetic products [30]. Afanetti compared Caucasian vs non-Caucasian neonates but did not take actual skin color into account [4]. Classifying newborns by ethnic group is complex, imprecise, and unreliable because of the variability of skin color within a given ethnic group. We observed this in our study, where there were Asian, African American, and Hispanic neonates in all four color groups, and Caucasian neonates in three of them (Table 1). Several studies have analyzed the reliability of bilirubinometers depending on skin color, with differing results [30, 42, 43, 57, 58]. The problem is that, unlike for adults, no reliable classification of skin phototype, color, or tone existed for newborns, and each study used its own classification based on cosmetic color palettes [30, 43]. However, these colors differ greatly from actual neonatal skin tones, which is why we chose to use Neomar’s skin color scale to classify neonates. We observed that TcB overestimated SB for all skin colors, although to a greater degree for dark-skinned neonates (colors 3–4), which is in accordance with previous reports. We used JM-105™ for our study, the same device (or a newer version of it) that was used in previous studies with similar results [13, 25, 26, 29, 31].
Jones included infants of different ethnicities and found that the accuracy differed among races, although infants were not analyzed according to skin color: TcB overestimated SB in African Americans and underestimated it in Caucasians, and there were no differences in Hispanics [29]. Samiee-Zafarghandy found a high association between TcB and SB for all skin colors, and an absolute mean (SD) difference TcB-SB of 0.77 (1.54) mg/dL with broad limits of agreement (−2.30 to 3.85 mg/dL) [30]. Yang found mean (SD) differences between TcB and SB of 1.4 (1.8) mg/dL in term and 2 (2) mg/dL in late preterm infants [34]. Our study supports the use of TcB as a reliable estimate of SB, which is consistent with previous studies that also found a strong correlation between TcB and SB [13, 25, 35, 37, 39, 59]. The correlation between SB and TcB among our patients was excellent regardless of skin color (Fig. 2), although we did observe differences depending on skin color: TcB tended to overestimate SB in all groups, to a lesser degree for lighter skin and to a greater degree for darker skin. This supported our subjective impression that TcB often overestimated SB in darker-skinned neonates. We obtained a global mean difference SB-TcB of −1.18 ± 1.53 mg/dL, larger in magnitude than that reported by Taylor (−0.84 ± 1.78 mg/dL), and a higher correlation (0.908–0.956 depending on skin color, as Fig. 2 shows) than Taylor’s 0.78 [10]. According to our hospital guidelines, we draw a blood sample to measure SB if TcB is 2 mg/dL under the indication for phototherapy. The indication for phototherapy at 48 h of life is 15.8 mg/dL for neonates of 38 weeks of GA or more, 13.2 mg/dL for neonates of 35–37 weeks of GA, and 11.8 mg/dL for neonates of 31–34 weeks of GA.
Considering skin color and the different TcB/SB bias observed in this study, we would change our guidelines as follows: we would measure SB if TcB was 1.8 mg/dL under the indication for phototherapy for color 1, 1.5 mg/dL for color 2, and 1 mg/dL for colors 3–4. Overall, with the new guidelines, we would have spared 18.4% of blood samples in term neonates ≥ 38 weeks and 8.3% of blood samples in neonates under 38 weeks (Table 4). We consider these proportions clinically relevant and are currently conducting a prospective study to confirm this hypothesis.
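The current and proposed sampling rules can be written as a small decision function. The sketch below is one possible reading, not the hospital's actual protocol: it assumes that "X mg/dL under the indication" means a blood sample is drawn whenever TcB is at or above the phototherapy threshold minus X, and it covers only the 48 h thresholds quoted in the text. All names (`phototherapy_threshold_48h`, `needs_serum_sample`, `PROPOSED_MARGIN`) are hypothetical.

```python
CURRENT_MARGIN = 2.0  # mg/dL, current guideline for every neonate
PROPOSED_MARGIN = {1: 1.8, 2: 1.5, 3: 1.0, 4: 1.0}  # mg/dL, by skin color


def phototherapy_threshold_48h(ga_weeks: float) -> float:
    """Phototherapy indication (mg/dL) at 48 h of life, by gestational age."""
    if ga_weeks >= 38:
        return 15.8
    if ga_weeks >= 35:
        return 13.2
    if ga_weeks >= 31:
        return 11.8
    raise ValueError("no 48 h threshold is quoted below 31 weeks of GA")


def needs_serum_sample(tcb: float, ga_weeks: float, color=None) -> bool:
    """True if SB should be measured under the assumed reading of the rule.

    color=None applies the current 2 mg/dL margin; a color of 1 to 4
    applies the proposed color-specific margin.
    """
    margin = CURRENT_MARGIN if color is None else PROPOSED_MARGIN[color]
    return tcb >= phototherapy_threshold_48h(ga_weeks) - margin


# A term neonate (39 weeks) with TcB 14.2 mg/dL at 48 h of life.
print(needs_serum_sample(14.2, 39))            # current rule
print(needs_serum_sample(14.2, 39, color=3))   # proposed rule, color 3
```

Under this reading, a narrower margin for darker skin means that some readings that trigger a blood draw today would no longer do so, which is the source of the spared samples summarized in Table 4.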

In most cases, TcB can replace SB if SB is < 15 mg/dL [7, 25, 37, 60]. Jones found a correlation of r = 0.93 between TcB and SB, although the correlation worsened (TcB both overestimated and underestimated SB) at higher values of SB (0.82 for SB > 10 mg/dL, 0.69 for SB > 12 mg/dL, and 0.52 for SB > 15 mg/dL). It has been reported that for high levels of SB (> 14.6–15 mg/dL), TcB underestimates SB and needs confirmation [11, 23, 25]. Our results agree with this (see Fig. 5): the correlation between SB and TcB was worse for higher levels of TcB (> 15 mg/dL) [10].

Jaundice meters may also be less reliable in neonates receiving phototherapy or less than 24 h after discontinuing it [7, 60], although not all authors agree with this statement [9, 31, 34, 48, 61]. We are currently collecting data on the correlation between SB and TcB in patients under phototherapy.

Our study has limitations. First, we could not recruit many color 4 patients given the characteristics of our population. The small numbers could limit the comparability of our results for this color group, even though we extended our study period to include more color 4 neonates. Second, we did not assess inter-observer reliability in the assignment to color groups 1–4. However, we validated Neomar’s neonatal skin color scale in a previous study, in which inter-observer agreement on color assignment was good (83.2%) [44]. Third, most of our SB and TcB levels were low because this study enrolled healthy newborns. We have continued with a prospective study enrolling patients with an indication for phototherapy to check whether the correlation between SB and TcB holds at higher bilirubin levels. Fourth, we used JM-105™, so our results may not be generalizable to other jaundice meters. Last, we collected data at a single center, so our results may not be generalizable to other settings.

Conclusions

Our study supports the reliability of TcB to assess SB regardless of skin color. TcB tends to overestimate SB, to a greater degree in dark-skinned neonates. The use of Neomar’s skin color scale may help reduce the number of blood samples and the amount of painful stimuli for the newborn. A larger color 4 sample would increase the reliability of the findings for this color group. TcB is less reliable for TcB values > 15 mg/dL.