1 Introduction

Pulse oximetry is a non-invasive technique for measuring the oxygen saturation of arterial blood (SpO\(_{\text{2}}\))Footnote 1 [2]. This vital sign is widely monitored in intensive care units, in general wards, during surgery, and in ambulatory patients [10, 13].

Most commercial SpO\(_{\text{2}}\) devices require a probe to be attached to the finger. SpO\(_{\text{2}}\) devices operate by shining light into the skin and by collecting the photons that travelled through the tissue. A photodetecting element converts the probed light intensity into an electric signal which includes a photoplethysmographic (PPG) component. Irrespective of data acquisition mode and probing wavelength, PPG signals are modulated by blood volume variations at the microvascular bed of tissue and are recognizable as cardiac-related, temporally-varying and relatively low-amplitude signals [1, 8].

Typically, monitoring SpO\(_{\text{2}}\) involves PPG measurements at red and near-infrared (NIR) wavelengths. This is because PPG-amplitudes acquired using red light (e.g., 660–700 nm) are sensitive to the saturation level of the arterial blood, whereas PPG-amplitudes acquired in the NIR (e.g., 800–950 nm) enable accounting for individual and temperature variations [3]. The ratio of the normalized PPG-amplitudes in red over NIR (hereafter denoted as RoIR) is commonly known as the “ratio-of-ratios” (RRs) method, with the understanding that the PPG amplitude for each wavelength is derived from a ratio between a pulsatile component, AC, and a non-pulsatile component, DC [22].
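A minimal Python sketch of the RRs computation, assuming the AC and DC components of each wavelength have already been extracted (the function name and the numbers are illustrative, not taken from the cited works). It also shows why the DC normalization matters: a common gain applied to one channel, such as a brighter lamp or a more sensitive detector, cancels out.

```python
def ratio_of_ratios(ac_red, dc_red, ac_nir, dc_nir):
    """Ratio of normalized PPG amplitudes: (AC_red / DC_red) / (AC_nir / DC_nir)."""
    return (ac_red / dc_red) / (ac_nir / dc_nir)


# Hypothetical amplitudes (arbitrary units) for one measurement.
r = ratio_of_ratios(ac_red=0.01, dc_red=1.0, ac_nir=0.02, dc_nir=1.0)

# Same tissue, but the red channel is 3x brighter: AC and DC scale together,
# so the normalized amplitude, and hence the ratio, is unchanged.
r_scaled = ratio_of_ratios(ac_red=0.03, dc_red=3.0, ac_nir=0.02, dc_nir=1.0)
print(r, r_scaled)
```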

While relatively unobtrusive, contact probes are undesirable in patient groups with sensitive skin, such as premature infants and the wounded. This motivated the concept of measuring PPG remotely and the development of camera-based SpO\(_{\text{2}}\) (SpO\(_{\text{2,cam}}\)) systems [21]. One fundamental difference between contact and camera-based (imaging) pulse oximetry is the shallower light penetration depth involved in the latter. Verkruysse et al. [20] showed that, by using 675 nm (red) and 840 nm (NIR), a single universal calibration curve with acceptable spread between individuals could be achieved. In addition, van Gastel et al. [18] elaborated on the calibratability of camera-based SpO\(_{\text{2}}\) estimates by showing that motion robustness improves when more than two wavelengths in the red-NIR range are combined to cancel out motion interference in PPG signals.

Despite the recognized value of using red and NIR wavelengths, there are practical reasons for preferring visible-light (Vis) wavelengths. For example, green PPG amplitudes are the strongest across the range of 300–900 nm [19]. Also, modern illumination setups are becoming less reliant on incandescent lamps, so the availability of NIR light is no longer a given. Lastly, operating in Vis could enable cost-effective implementations, since RGB cameras, including those on smartphones, could be used. As demonstrated by Guazzi et al. [7], the red and blue channels of an RGB camera could provide inputs for an RRs-based methodology adapted to estimating SpO\(_{\text{2}}\). Trends associated with SpO\(_{\text{2}}\) changes in five subjects were shown in the logarithmic ratio of the normalized red over the normalized green. However, the calibrations were subject-dependent and temporal consistency was not assessed.

Another concern about the validity of camera-based SpO\(_{\text{2}}\) estimation in Vis is that blue-green and red wavelengths differ in penetration depth. This challenges the usual pulse oximetry assumption that all wavelengths interrogate the same vasculature. This aspect is emphasized in Fig. 1 (adapted from [20]), which shows qualitatively that camera-based PPG signals acquired using blue and green wavelengths are mostly modulated by arterioles in the upper dermal layers, whereas signals acquired using red wavelengths reach even subdermal tissue. The depth gap between probing wavelengths may increase the inaccuracy of camera-based SpO\(_{\text{2}}\) estimates, because skin properties, e.g., the blood concentration, vary with depth and interfere with the RRs [16].

Fig. 1

The penetration depths are wavelength-dependent: PPG signals probed using red-NIR light reach the deep arteriovenous (A-V) plexus and subdermal tissue, whereas signals probed using blue and/or green light originate from arteriolar blood volume variations in the upper dermis

The focus of this paper is on the calibration feasibility of SpO\(_{\text{2,cam}}\) using green as a replacement for NIR for normalization, while keeping red as the saturation-sensitive wavelength. We limit our scope to a version of the RRs adapted to Vis, i.e., the ratio of normalized red over normalized green (RoG) as a predictor of SpO\(_{\text{2}}\) levels. While this study is ancillary to Verkruysse et al. [20], new data are reported for camera-based PPG signals probed with a camera dedicated to green light. For consistency with Verkruysse’s work, similar statistical, data processing and visualization options were applied to ease comparisons between the studies.

Note that the nomenclature used in this paper is consistent with the naming conventions recommended by Leonhardt et al. [14]. The term PPG imaging refers to frameworks or algorithms aimed at vital signs extracted remotely from 1-dimensional PPG signals. PPG imaging and camera-based PPG are used interchangeably. Given the redundancy of nomenclature in the literature, we clarify that the insights of this paper are applicable to “remote PPG”, “DistancePPG”, “non-contact PPG”, “video-PPG” and “imaging PPG”.

2 Methods

Section 2.1 refers to compliance with ethical standards. Section 2.2 describes the experimental setup and protocols leading to the paired data (camera-based RoG, reference SpO\(_{\text{2}}\)). Section 2.3 is dedicated to signal processing and details the determination of the calibration curve (i.e., linear regression) parameters for the SpO\(_{\text{2}}\) level (%) as a function of RoG.

2.1 Ethical statement

The study was not a clinical trial and was conducted on healthy volunteers [20]. The experimental protocols applied in this investigation (studies N, H, and C) were approved by the Philips IRB (Internal Committee on Biomedical Experiments)Footnote 2, and written informed consent was obtained from each subject. All experiments were carried out in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.

2.2 Setup and dataset construction

In the basic protocol (study N), 25 healthy subjects were involved. The subjects’ foreheads were recorded synchronously at 15 frames per second by three monochrome cameras (Stingray F046-B FireWire; Allied Vision Technologies, Stadtroda, Germany). The cameras were equipped with bandpass filters with center/bandwidth wavelengths of 560/32 (green), 675/67 (red) and 842/56 (NIR) nm, respectively (Semrock FF01-579/34-25, FF02-675/67-24-D and FF01-842/56-24D; IDEX Corp., Lake Forest, IL). Subjects were recorded in the sitting position, and motion was minimized by a head support (see Fig. 2).

Fig. 2

Setup schematics and protocols/studies. Sitting subjects were measured by monochrome cameras coupled with bandpass filters for R, G and NIR, as well as by finger pulse oximetry (SpO\(_{\text{2,ref}}\)). Two light armatures with incandescent bulbs were used. Gradual temperature cooling or hypoxia were induced in some of the recordings

Stable illumination was provided by two armatures (Falcon Eyes, Hong Kong, China), each equipped with 9 incandescent lamps (Philips, 40 W; DC-powered). Both were placed symmetrically at a distance of about 1 m from the subject’s face. Assuming an isotropic light distribution and a luminous efficacy of 15 lumens per watt, the total luminous flux and luminous intensity are estimated as 10.8 kilo-lumens and 859.4 candela, respectively.
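The luminous estimates follow from simple arithmetic under the stated assumptions (isotropic emission over \(4\pi\) sr and 15 lm/W); a short Python check:

```python
import math

N_LAMPS = 2 * 9            # two armatures with 9 lamps each
POWER_W = 40.0             # electrical power per lamp
EFFICACY_LM_PER_W = 15.0   # assumed luminous efficacy of incandescent lamps

flux_lm = N_LAMPS * POWER_W * EFFICACY_LM_PER_W   # total luminous flux
intensity_cd = flux_lm / (4 * math.pi)            # isotropic: flux spread over 4*pi sr
print(f"{flux_lm:.0f} lm, {intensity_cd:.1f} cd")  # prints "10800 lm, 859.4 cd"
```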

To minimize contamination of the facial images by specular reflections, the light sources were placed at an angle of \({45}^{\circ }\) with respect to the camera. However, polarizing filters could have benefited the normalized PPG-amplitude measurements at green wavelengths, in particular in subjects with oily skin and/or darker skin tones. Further considerations on specular reflections are included in the discussion section of this paper.

Figure 3 shows representative frames of forehead recordings performed using the green, red and NIR camera channels. To ease the registration of the camera video recordings, markers were placed on the subjects’ foreheads. The markers (“L”-shaped pieces of paper) were gently attached to the skin using double-sided adhesive tape. After registration, rectangular skin regions of interest (sROIs) were manually selected within a subset of uniformly illuminated skin pixels on the forehead (see yellow rectangles in Fig. 3).

Fig. 3

Representative forehead frames and skin selections (in yellow) for probing camera-based PPG signals in the green, red and NIR camera channels

To obtain an SpO\(_{\text{2}}\) reference signal, the sample-wise (1 Hz) median of the measurements from four commercially available contact probes was taken. The probes were combined to improve the precision of the reference signal.
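A sketch of the sample-wise median across probes, assuming the four traces are already time-aligned lists of 1-Hz samples (the function name and values are hypothetical). With four probes, the median is the mean of the two middle readings, so a single deviating probe cannot drag the reference on its own.

```python
import statistics


def reference_spo2(probe_traces):
    """Sample-wise median across probes; each trace holds 1-Hz SpO2 samples (%)."""
    return [statistics.median(samples) for samples in zip(*probe_traces)]


# Hypothetical 3-s excerpt from four probes (finger, ear, finger, finger).
traces = [
    [97, 98, 97],
    [96, 96, 95],
    [98, 98, 99],
    [97, 97, 97],
]
print(reference_spo2(traces))
```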

Two adaptations of the study N protocol were implemented: one to broaden the range of SpO\(_{\text{2}}\) variations (study H) and another to explore the possible effect of patient centralization, as mimicked by temperature cooling (study C). Study H involved 21 healthy subjects and consisted of measurements performed in a [normobaric] hypoxic tent. Study C was performed in a climate-controlled room with an initial temperature of \(21.4\pm 2.0\,^{\circ }{\text{C}}\) (average±standard deviation; N = 21) that was gradually reduced in steps of about \(5\,^{\circ }{\text{C}}\) until a final temperature of \(8.4\pm 1.2\,^{\circ }{\text{C}}\) was reached. Video recordings of the subjects’ foreheads and cheeks were made before the cooling intervention, in the middle of it, and when about 5–8 \(^{\circ }{\text{C}}\) was reached. Study C involved ten healthy subjects and three recordings per subject.

Conventional (contact-based) pulse oximetry was performed in all subjects to serve as the SpO\(_{\text{2}}\) reference. Four commercially available sensors, coupled to a Philips MP2 patient monitor, were used: a Philips finger sensor and a Philips ear sensor (M1191B and M1194A, Philips Medizin-Systeme, Böblingen, Germany), a Masimo finger sensor (LNCS DC-I, Masimo Corporation, Irvine, CA) and a Nellcor finger sensor (DS-100A, Medtronic, Dublin, Ireland). The pulse oximetry sensors were synchronized with each other and acquired 1 SpO\(_{\text{2,ref}}\) (%) sample per second. Note that the cameras were synchronized with each other by using LabVIEW software (National Instruments, Austin, TX), whereas the video recordings were synchronized with the SpO\(_{\text{2,ref}}\) readings manually. Further details on the dataset preparation can be found elsewhere [20].

For the current investigation, 36 green, red and NIR recordings from studies N and H were used to assess the calibratability of SpO\(_{\text{2,cam}}\) from RoG and from RoIR. Forty additional recordings from five healthy subjects were taken from study C: 10 forehead and 10 cheek measurements at room temperature, 10 forehead measurements at medium temperature and 10 forehead measurements at low temperature. Each subject was measured twice for each condition and skin site. The duration of the recordings ranged from 3 to 20 min (median, 5 min). The camera video recordings and the conventional pulse oximetry (reference) signals were acquired with LabVIEW and stored in an uncompressed format for offline processing in Matlab (version R2018a; The MathWorks, Natick, MA).

2.3 Data analysis and statistics

2.3.1 Signal processing

Reference \({{SpO}_{2}}\) After manually synchronizing the SpO\(_{\text{2}}\) traces with the video recordings, a reference SpO\(_{\text{2}}\) signal was determined for each recording as the median of the 4 contact pulse oximetry traces. One reference SpO\(_{\text{2}}\) value, \(SpO_{2,ref}^{(i)}\), was stored per recording session i.

PPG imaging: signal extraction After registration of the red, green and NIR images, rectangular ROIs were manually defined such that they only contained skin. For all subjects, the average ROI area was \(17\pm 4\%\) of the total image area (19k ROI pixels out of 45k total image pixels). For subjects 10 and 11 (exemplified in Fig. 3), the ROI areas were 86,868 and 72,927 pixels, respectively.

Raw camera-based PPG signals, G(t), R(t) and IR(t), were obtained as the spatially averaged pixel intensity in a manually selected skin region of interest (sROI) on the subject’s forehead. The sROIs were demarcated to maximize the skin area while avoiding specular locations and body hair. Because the subject’s face was stabilized by a supporting structure, static sROIs were sufficient for each video recording.
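Spatial averaging over a static rectangular sROI reduces each frame to one sample. A minimal sketch, assuming frames are available as 2-D lists of intensities and the ROI is given as hypothetical (row_start, row_stop, col_start, col_stop) bounds with exclusive stops:

```python
def spatial_mean(frame, roi):
    """Mean pixel intensity inside a rectangular ROI.

    frame: 2-D list of pixel intensities (rows x columns)
    roi:   (row_start, row_stop, col_start, col_stop), stop indices exclusive
    """
    r0, r1, c0, c1 = roi
    pixels = [v for row in frame[r0:r1] for v in row[c0:c1]]
    return sum(pixels) / len(pixels)


def raw_ppg(frames, roi):
    """One sample per frame: the raw camera-based PPG signal of a static sROI."""
    return [spatial_mean(f, roi) for f in frames]


# Two tiny 3x3 "frames"; the ROI covers rows 0-1 and columns 1-2.
frames = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
]
signal = raw_ppg(frames, roi=(0, 2, 1, 3))
```

Averaging many skin pixels suppresses uncorrelated sensor noise, which is why the sROIs were drawn as large as possible while avoiding specular spots and hair.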

AC/DC-normalization The signals G(t), R(t) and IR(t) were each normalized by their low-pass filtered (LPF) component: \(x_{AC/DC} = (x- x_{LPF})/x_{LPF}\). This procedure is known as AC/DC-normalization and is employed to achieve invariance to external lighting fluctuations or slow body motions [5, 19].
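The normalization can be sketched in a few lines. For self-containment, this Python sketch swaps the zero-phase Butterworth filter used in the study for a centered 15-sample (1-s) moving average; this is a simplification, not the study's filter. At 15 Hz, such an average has spectral nulls at 1 Hz and its multiples, so it tracks the slowly varying DC level while suppressing the pulsatile part.

```python
import math


def moving_average(x, width):
    """Centered moving average; the window shrinks at the signal edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): min(len(x), i + half + 1)]
        out.append(sum(seg) / len(seg))
    return out


def ac_dc_normalize(x, width=15):
    """x_AC/DC = (x - x_LPF) / x_LPF, with a moving average standing in for the LPF."""
    x_lpf = moving_average(x, width)
    return [(xi - li) / li for xi, li in zip(x, x_lpf)]


# Synthetic raw signal: DC level 100 with a 1 % pulsatile component at 1 Hz (fs = 15 Hz).
fs = 15.0
raw = [100.0 * (1 + 0.01 * math.sin(2 * math.pi * 1.0 * k / fs)) for k in range(300)]
norm = ac_dc_normalize(raw)
norm_bright = ac_dc_normalize([3.0 * v for v in raw])  # same scene, 3x brighter light
```

Away from the edges, `norm` recovers the 1 % pulsatile component, and `norm_bright` equals `norm`: a multiplicative change in illumination cancels in the ratio.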

To generate low-pass filter coefficients, the Matlab function butter was used to generate an 11th order Butterworth filter (normalized cutoff frequency, 0.667/ (\(f_s /2\)), where the parameter \(f_s = 15\) Hz is the sampling frequency). Filtering was applied to G(t), R(t) and IR(t) with the function filtfilt. The normalized signals were denoted as G\(_{\text{n}}\)(t), R\(_{\text{n}}\)(t) and IR\(_{\text{n}}\)(t).
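For reference, the normalized cutoff passed to butter and its physiological meaning work out as follows. Since 0.667 Hz corresponds to about 40 beats per minute, below usual resting adult heart rates, the low-pass output follows the DC level while the cardiac pulsations remain in \(x - x_{LPF}\). Because filtfilt runs the filter forward and backward, the result is zero-phase and the effective magnitude response is the square of the single-pass response, i.e., about 6 dB rather than 3 dB of attenuation at the nominal cutoff.

$$\begin{aligned} W_n = \frac{0.667\ \text{Hz}}{f_s/2} = \frac{0.667}{7.5} \approx 0.089, \qquad 0.667\ \text{Hz} \times 60\ \text{s/min} \approx 40\ \text{min}^{-1} \end{aligned}$$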

Normalized signals can be modeled as the sum of a source signal, s(t) or \(s'(t)\), a common motion signal, m(t), and additive white Gaussian noise, n(t), which accounts for sensor noise:

$$\begin{aligned} G_n(t)= & {} \beta _{G} s'(t) + m(t) + n_G(t) \end{aligned}$$
(1)
$$\begin{aligned} R_n(t)= & {} \beta _{R} s(t) + m(t) + n_R(t) \end{aligned}$$
(2)
$$\begin{aligned} IR_n(t)= & {} \beta _{IR} s(t) + m(t) + n_{IR}(t) \end{aligned}$$
(3)

The \(\beta\) terms capture the wavelength dependency of the PPG imaging amplitudes, and the following holds by definition: \(RoG \triangleq \beta _{R}/\beta _{G}\) and \(RoIR \triangleq \beta _{R}/\beta _{IR}\). Note that the PPG source of G\(_{\text{n}}\)(t) in Eq. 1, \(s'(t)\), is denoted differently from that of R\(_{\text{n}}\)(t) and IR\(_{\text{n}}\)(t) to reflect morphological differences between green and red-NIR signals. The motion m(t) is common to all signals. While a line of camera-based PPG research is aimed at motion robustness, e.g. [5, 18], for a dataset comprising recordings with only sporadic and non-severe skin motions, the approaches described in the next paragraphs are sufficient for obtaining reliable measurements of RoG and RoIR (see Appendix A).
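To make the model concrete, here is a small Python generator for Eqs. 1–3. The choice of \(s'(t)\) as a delayed copy of s(t), the 0.2 Hz motion term and all numeric values are hypothetical illustrations. The example shows that, without motion, peak-to-peak amplitudes recover \(RoG \triangleq \beta _{R}/\beta _{G}\), whereas a common m(t) biases the raw ratio, which motivates the denoising step.

```python
import math
import random


def synth_normalized(beta_g, beta_r, beta_ir, prf_hz=1.2, fs=15.0,
                     duration_s=60.0, motion_amp=0.0, noise_std=0.0, seed=0):
    """Synthetic G_n, R_n, IR_n following the additive model of Eqs. 1-3."""
    rng = random.Random(seed)
    t = [k / fs for k in range(int(duration_s * fs))]
    s = [math.sin(2 * math.pi * prf_hz * tk) for tk in t]
    # Hypothetical morphological difference: s'(t) is s(t) delayed by 0.1 s.
    s_prime = [math.sin(2 * math.pi * prf_hz * (tk - 0.1)) for tk in t]
    # Hypothetical common motion m(t): a slow 0.2 Hz sway shared by all channels.
    m = [motion_amp * math.sin(2 * math.pi * 0.2 * tk) for tk in t]

    def channel(beta, src):
        # beta * source + common motion + channel-specific white Gaussian noise
        return [beta * a + b + rng.gauss(0.0, noise_std) for a, b in zip(src, m)]

    return channel(beta_g, s_prime), channel(beta_r, s), channel(beta_ir, s)


def peak_to_peak(x):
    return max(x) - min(x)


# Hypothetical betas: RoG = 0.002 / 0.02 = 0.1 by definition.
g, r, _ = synth_normalized(0.02, 0.002, 0.005)
rog_clean = peak_to_peak(r) / peak_to_peak(g)

# With a common motion term, raw peak-to-peak amplitudes no longer isolate beta.
g_m, r_m, _ = synth_normalized(0.02, 0.002, 0.005, motion_amp=0.01)
rog_motion = peak_to_peak(r_m) / peak_to_peak(g_m)
```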

Denoising In a second step, G\(_{\text{n}}\)(t), R\(_{\text{n}}\)(t) and IR\(_{\text{n}}\)(t) were filtered in the frequency domain such that only the fundamental of the pulse-rate frequency (p.r.f.) was retained. This strategy relies on harmonic truncation and is aimed at achieving higher signal sensitivity [12]. Figure 4 depicts the denoising approach used in this study, featuring window-based processing with harmonic truncation guided by the p.r.f. estimated from the signal G\(_{\text{n}}\)(t).

Fig. 4

Approach to denoise camera-based PPG signals

The systolic peaks of G\(_{\text{n}}\)(t) were first detected by peak detectionFootnote 3. Then, the camera-based PPG signals were processed in an overlap-and-add manner with temporal windows of 300 digital samples (i.e., a duration of 20 s each) and 50% overlap (window type: Hanning). In each window, signals were Fourier-transformed (FFT) and truncated to the frequency bin that corresponds to the p.r.f. (tolerance: ± 2 bins). The denoised signals were denoted as G\(_{\text{n,1}}\)(t), R\(_{\text{n,1}}\)(t) and IR\(_{\text{n,1}}\)(t).
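The overlap-and-add truncation can be sketched as follows. Assumptions: the p.r.f. is passed in as a single known value (in the study it was estimated from the systolic peaks of G\(_{\text{n}}\)(t)), a periodic Hann window is used so that 50%-overlapping windows sum to one, and a plain DFT replaces the FFT to keep the sketch dependency-free. With 300-sample windows at 15 Hz, each bin spans 0.05 Hz, so the ±2-bin tolerance corresponds to ±0.1 Hz.

```python
import cmath
import math


def dft(x, sign=-1):
    """Plain O(N^2) DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(x)
    tw = [cmath.exp(sign * 2j * math.pi * m / n) for m in range(n)]
    return [sum(x[k] * tw[(f * k) % n] for k in range(n)) for f in range(n)]


def truncate_to_prf(x, prf_hz, fs=15.0, win=300, tol_bins=2):
    """Overlap-add harmonic truncation: keep only the p.r.f. bin +/- tol_bins."""
    hop = win // 2                                    # 50 % overlap
    # Periodic Hann window: copies shifted by win/2 sum to exactly 1.
    w = [0.5 - 0.5 * math.cos(2 * math.pi * k / win) for k in range(win)]
    k0 = round(prf_hz * win / fs)                     # bin index of the p.r.f.
    keep = set(range(k0 - tol_bins, k0 + tol_bins + 1))
    out = [0.0] * len(x)
    for start in range(0, len(x) - win + 1, hop):
        X = dft([x[start + k] * w[k] for k in range(win)])
        # Keep the positive-frequency band and its negative-frequency mirror.
        Y = [X[f] if (f in keep or (win - f) in keep) else 0j for f in range(win)]
        y = dft(Y, sign=+1)
        for k in range(win):
            out[start + k] += y[k].real / win
    return out


# Synthetic test: 1.2 Hz fundamental plus two harmonics, 60 s at 15 Hz.
fs = 15.0
t = [k / fs for k in range(900)]
x = [math.sin(2 * math.pi * 1.2 * tk)
     + 0.5 * math.sin(2 * math.pi * 2.4 * tk)
     + 0.3 * math.sin(2 * math.pi * 3.6 * tk) for tk in t]
y = truncate_to_prf(x, prf_hz=1.2)
fundamental = [math.sin(2 * math.pi * 1.2 * tk) for tk in t]
# Samples 150..749 are covered by two windows, whose Hann weights sum to 1.
err = max(abs(a - b) for a, b in zip(y[150:750], fundamental[150:750]))
```

Because the harmonics fall outside the kept bins, the interior of the output matches the pure fundamental; the first and last half-windows are attenuated because only one Hann window covers them.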

Calculating RRs PPG-amplitudes were determined as the difference between peaks and valleys in the denoised PPG imaging signals. Defining the operator \(E_{peaks}[.]\) as the median of the cardiac-related systolic peaks of a signal, so that \(E_{peaks}[-x]\) equals the negated median of the diastolic valleys of x, the amplitudes of G\(_{\text{n,1}}\)(t), R\(_{\text{n,1}}\)(t) and IR\(_{\text{n,1}}\)(t) were determined as follows:

$$\begin{aligned} |Gn|= & {} E_{peaks}[Gn_1(t)] + E_{peaks}[-Gn_1(t)], \end{aligned}$$
(4)
$$\begin{aligned} |Rn|= & {} E_{peaks}[Rn_1(t)] + E_{peaks}[-Rn_1(t)], \end{aligned}$$
(5)
$$\begin{aligned} |IRn|= & {} E_{peaks}[IRn_1(t)] + E_{peaks}[-IRn_1(t)]. \end{aligned}$$
(6)

The RRs between the normalized red over green amplitudes, RoG, and red over NIR amplitudes, RoIR, were finally determined for each recording as \(RoG = |Rn|/|Gn|\) and \(RoIR = |Rn|/|IRn|\), respectively. Each recording session provided two data pairs: (\(RoG^{(i)}, SpO_{2,ref}^{(i)}\)) and (\(RoIR^{(i)}, SpO_{2,ref}^{(i)}\)). The group descriptives of SpO\(_{\text{2,ref}}\) and of SpO\(_{\text{2,cam}}\) obtained using |Rn|, |Gn|, RoIR and RoG were determined as means ± standard deviations (std) or as 95% confidence intervals (CI).
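Eqs. 4–6 and the ratio computation can be sketched as below. The local-maximum detector is a hypothetical stand-in for the peak detector used in the study, and the signals are idealized denoised sinusoids.

```python
import math
import statistics


def local_maxima(x):
    """Hypothetical stand-in for the peak detector: simple local maxima."""
    return [x[k] for k in range(1, len(x) - 1) if x[k - 1] < x[k] >= x[k + 1]]


def e_peaks(x):
    """E_peaks[x]: median of the detected peaks of x."""
    return statistics.median(local_maxima(x))


def ppg_amplitude(x):
    """|x| = E_peaks[x] + E_peaks[-x], i.e., median peak minus median valley (Eqs. 4-6)."""
    return e_peaks(x) + e_peaks([-v for v in x])


# Denoised synthetic signals at the p.r.f.: green amplitude 10x larger than red.
fs, prf = 15.0, 1.2
t = [k / fs for k in range(900)]
g1 = [0.020 * math.sin(2 * math.pi * prf * tk) for tk in t]
r1 = [0.002 * math.sin(2 * math.pi * prf * tk) for tk in t]
rog = ppg_amplitude(r1) / ppg_amplitude(g1)
```

Using medians rather than means keeps an occasional spurious peak, e.g. from a brief head movement, from shifting the amplitude estimate.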

2.3.2 Calibratability assessment

A metric of signal quality Noise and sporadic motions occur even under controlled measurement conditions. As a metric of signal quality, the signal-to-noise ratio (SNR) of R\(_{\text{n}}\)(t) was proposed [20]. This choice was motivated by the fact that R\(_{\text{n}}\)(t) typically has the weakest PPG-amplitudes and therefore lower-bounds the SNR in the Vis-NIR range. For each video recording i, we obtained \(SNR_{Rn}^{(i)}\) as the mean of the estimates computed in 20-s, non-overlapping windows over R\(_{\text{n}}\)(t).

In each window, the transient SNR was computed as the logarithmic ratio of the power of R\(_{\text{n}}\)(t), \(P_{Rn}\), over the noise, \(N_{AWGN}\). \(P_{Rn}\) was estimated in frequency bands containing the fundamental and two harmonics of the pulse-rate frequency. Figure 5 illustrates possible in- and out-bands for estimating \(P_{Rn}\) and \(N_{AWGN}\) in the frequency domain (FFT length, 300 samples).

Fig. 5

In and out frequency bands for estimating the SNR of R\(_{\text{n}}\)(t). Plotted in black is the power spectral density (PSD) of a sinusoidal signal, s, with a frequency of 1 Hz and 2 harmonics plus sensor noise, n (additive white Gaussian noise). Acronyms: a.u., arbitrary units; p.r.f., pulse-rate frequency

The power of R\(_{\text{n}}\)(t) was divided by the noise floor estimated in non-pulse bins surrounding the harmonics of the p.r.f. (see out-bands in Fig. 5). Unlike in Verkruysse et al. [20], the signal quality metric was not used to exclude subjects from the dataset.
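A sketch of the per-window SNR, assuming in-bands of ±1 bin around the fundamental and its two harmonics and out-bands of the next two bins on either side; these widths, the \(10\log_{10}\) (dB) form and all names are illustrative choices, since the exact band edges are not specified in the text. Single DFT bins are computed directly to avoid external dependencies.

```python
import cmath
import math
import random


def bin_power(seg, k):
    """|X[k]|^2 for a single DFT bin of a real segment."""
    n = len(seg)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(seg))) ** 2


def window_snr_db(seg, prf_hz, fs=15.0, n_harm=3, in_half=1, out_half=3):
    """In-band power over the noise floor of surrounding bins (hypothetical band widths)."""
    n = len(seg)
    in_bins, out_bins = set(), set()
    for h in range(1, n_harm + 1):                 # fundamental + 2 harmonics
        k = round(h * prf_hz * n / fs)
        in_bins.update(range(k - in_half, k + in_half + 1))
        out_bins.update(range(k - out_half, k + out_half + 1))
    out_bins -= in_bins                            # non-pulse bins around the harmonics
    p_signal = sum(bin_power(seg, k) for k in in_bins)
    noise_floor = sum(bin_power(seg, k) for k in out_bins) / len(out_bins)
    return 10 * math.log10(p_signal / (noise_floor * len(in_bins)))


def snr_recording_db(x, prf_hz, fs=15.0, win=300):
    """Mean of per-window SNRs over non-overlapping 20-s (300-sample) windows."""
    vals = [window_snr_db(x[s:s + win], prf_hz, fs)
            for s in range(0, len(x) - win + 1, win)]
    return sum(vals) / len(vals)


def synthetic(noise_std, seed=1):
    """1 Hz pulse with two harmonics plus white Gaussian noise, 60 s at 15 Hz."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * 1.0 * k / 15)
            + 0.5 * math.sin(2 * math.pi * 2.0 * k / 15)
            + 0.3 * math.sin(2 * math.pi * 3.0 * k / 15)
            + rng.gauss(0.0, noise_std) for k in range(900)]


snr_clean = snr_recording_db(synthetic(0.1), prf_hz=1.0)
snr_noisy = snr_recording_db(synthetic(1.0), prf_hz=1.0)
```

A tenfold increase in noise amplitude (a hundredfold in power) should lower the estimate by roughly 20 dB, give or take the variability of the noise-floor estimate.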

Regression analysis Linear coefficients (\(C_1, C_2\)) were fitted to the data pairs \((RoG^{(i)}, SpO_{2,ref}^{(i)})\) as

$$\begin{aligned} SpO_{2, ref}^{(i)} = C_1 RoG^{(i)} + C_2. \end{aligned}$$
(7)

A robust algorithm for determining (\(C_1, C_2\)) was used instead of the ordinary least-squares method (Matlab function fitlm with robust options enabled). Robustness was achieved by using a bisquare weight function to down-weight outliers, thus mitigating the influence of possible outliers or noisy points on the regression coefficients [6, 9]. We computed the significance of the regression, its coefficient of determination (\(R^2\)) and the root-mean-squared error metric \(A^{*}_{rms}\) defined as

$$\begin{aligned} A^{*}_{rms} = \sqrt{\frac{1}{n} \sum _{i=1}^{n} \left| SpO_{2,ref}^{(i)} - SpO_{2,cam}^{(i)}\right| ^2 }. \end{aligned}$$
(8)

The asterisk in \(A^{*}_{rms}\) is used to emphasize that this metric excludes short-time errors (e.g., due to sporadic head movements). For Eq. 7 with coefficients (\(C_1, C_2\)), plots were provided with 99% confidence intervals (CIs; based on the \(\chi ^2\) distribution for the root-mean squared error of the regression) and the ISO 80601-2-61:2011 requirement of \(A^{*}_{rms}<\)4%. The same procedure was followed for calibrating SpO\(_{\text{2,cam}}\) based on RoIR.
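The bisquare-weighted fit can be approximated with iteratively reweighted least squares (IRLS), followed by Eq. 8. This sketch uses the conventional bisquare tuning constant 4.685 and a MAD-based scale estimate; unlike Matlab's implementation, it omits the leverage adjustment of the residuals, so coefficients may differ slightly from fitlm on real data. The calibration pairs are synthetic.

```python
import math
import statistics


def weighted_linfit(x, y, w):
    """Weighted least squares for y = c1 * x + c2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    c1 = sxy / sxx
    return c1, my - c1 * mx


def bisquare_fit(x, y, tune=4.685, max_iter=50, tol=1e-10):
    """IRLS with Tukey bisquare weights; no leverage adjustment (simplification)."""
    c1, c2 = weighted_linfit(x, y, [1.0] * len(x))  # ordinary least-squares start
    for _ in range(max_iter):
        r = [yi - (c1 * xi + c2) for xi, yi in zip(x, y)]
        scale = statistics.median(abs(ri) for ri in r) / 0.6745  # MAD-based sigma
        if scale == 0:
            break  # most points already lie exactly on the line
        u = [ri / (tune * scale) for ri in r]
        # Bisquare weights: smooth down-weighting, zero weight beyond tune * scale.
        w = [(1 - ui * ui) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
        n1, n2 = weighted_linfit(x, y, w)
        done = abs(n1 - c1) < tol and abs(n2 - c2) < tol
        c1, c2 = n1, n2
        if done:
            break
    return c1, c2


def a_rms(spo2_ref, spo2_cam):
    """Root-mean-squared error between reference and camera SpO2 (Eq. 8)."""
    sq = [(r - c) ** 2 for r, c in zip(spo2_ref, spo2_cam)]
    return math.sqrt(sum(sq) / len(sq))


# Synthetic pairs on SpO2 = -98 RoG + 105, plus one outlier (+8 %) at RoG = 0.10.
rog = [0.04 + 0.01 * k for k in range(10)] + [0.10]
spo2_ref = [-98.0 * v + 105.0 for v in rog[:10]] + [-98.0 * 0.10 + 105.0 + 8.0]

c1_ols, c2_ols = weighted_linfit(rog, spo2_ref, [1.0] * len(rog))
c1, c2 = bisquare_fit(rog, spo2_ref)
spo2_cam = [c1 * v + c2 for v in rog]
err = a_rms(spo2_ref, spo2_cam)
print(f"OLS slope {c1_ols:.1f}, robust slope {c1:.1f}, A*rms {err:.2f} %")
```

The single outlier noticeably tilts the ordinary least-squares slope, whereas the bisquare fit recovers the underlying line. The outlier still dominates \(A^{*}_{rms}\), which is computed on all points.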

2.3.3 Challenges to calibratability

Challenge 1 The data obtained during temperature cooling were explored by testing the significance of the mean differences between RoG measured at room, medium and low temperature conditions. Because the RoG data were normally distributed (as assessed by the Shapiro-Wilk test), one-way ANOVA tests were used to perform the comparisons. We also assessed whether the temperature-perturbed data points were within the error margins of the calibration curve obtained for SpO\(_{\text{2,cam}}\).

Challenge 2 A switch in the measured skin site can also increase the estimation errors of Eq. 7. Skin-site variations were explored by testing the significance of the mean difference between paired RoG samples obtained at room temperature at the forehead versus the cheeks. The RoG values for the two skin sites were also normally distributed, and paired-sample t tests were used to test the significance of the mean differences between SpO\(_{\text{2,cam}}\) estimates from the forehead vs. the cheeks. The significance level was set at 0.05.
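The test statistics behind both challenges can be sketched with the Python standard library. Only the F and t statistics are computed here; p-values require the F and t distributions, which are available, for example, in scipy.stats.f_oneway and scipy.stats.ttest_rel (and scipy.stats.shapiro for the normality check). The input values are toy numbers, not study data.

```python
import math
import statistics


def one_way_anova_f(*groups):
    """F statistic and (between, within) degrees of freedom for a one-way ANOVA."""
    all_vals = [v for g in groups for v in g]
    grand = statistics.mean(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)


def paired_t(a, b):
    """Mean paired difference, t statistic and degrees of freedom."""
    d = [ai - bi for ai, bi in zip(a, b)]
    mean_d = statistics.mean(d)
    t = mean_d / (statistics.stdev(d) / math.sqrt(len(d)))
    return mean_d, t, len(d) - 1


# Toy inputs: three temperature groups; paired forehead vs. cheek values.
f, dfs = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
mean_d, t, dof = paired_t([1, 2, 3, 4], [0, 0, 0, 0])
```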

3 Results

3.1 Calibration Constants

Figure 6a shows the scattered data from studies N and H for the pairs (\(RoG^{(i)}, SpO_{2,ref}^{(i)}\)). A fit to these points yielded 99% margins of agreement within 4% and an \(A^{*}_{rms}\) of 2.9% (see Fig. 6b; \(N=46\)). In comparison, the pairs (\(RoIR^{(i)}, SpO_{2,ref}^{(i)}\)) also yielded 99% margins of agreement within 4% and an \(A^{*}_{rms}\) of 1.7%.

Fig. 6

a Scatter plot and regression line along with its corresponding 99% confidence interval (\(N=46\)); b Calibration data from studies N and H, shown along with data from study C (cooling intervention). The labels “Low”, “Medium”, and “Room” refer to temperatures of approximately \(8\,^{\circ }{\text{C}}\), \(14\,^{\circ }{\text{C}}\) and \(22\,^{\circ }{\text{C}}\), respectively

Similar to conventional pulse oximetry, RoG and RoIR were negatively associated with SpO\(_{\text{2}}\). The obtained calibration curves are as follows:

$$\begin{aligned} SpO_{2,cam}(RoG)= & {} -98.0 RoG + 105.0, \end{aligned}$$
(9)
$$\begin{aligned} SpO_{2,cam}(RoIR)= & {} -45.2 RoIR + 116.8. \end{aligned}$$
(10)

The fact that the magnitude of the coefficient \(C_1\) is about twice as large for SpO\(_{\text{2,cam}}\)(RoG) as for SpO\(_{\text{2,cam}}\)(RoIR) suggests a disadvantage of green as an alternative to NIR wavelengths. Still, Eq. 9 was significant and showed a moderate strength of association between SpO\(_{\text{2,ref}}\) and RoG (\(R^2=0.58, p<0.001\); see Fig. 6a). We remark that this outcome is below what is attainable with red and NIR (Eq. 10: \(R^2 = 0.86, p<0.001\); \(A^{*}_{rms} = 1.7\%\)).
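Eqs. 9 and 10, and their inverses, can be written as small Python functions (the names are illustrative). Inverting the curves shows the ratio values they imply at a given saturation, e.g., RoG ≈ 0.082 and RoIR ≈ 0.438 at 97% SpO\(_{\text{2}}\).

```python
def spo2_from_rog(rog):
    """Eq. 9: camera SpO2 (%) from the red-over-green ratio of ratios."""
    return -98.0 * rog + 105.0


def spo2_from_roir(roir):
    """Eq. 10: camera SpO2 (%) from the red-over-NIR ratio of ratios."""
    return -45.2 * roir + 116.8


def rog_for(spo2):
    """Inverse of Eq. 9: the RoG expected at a given SpO2."""
    return (105.0 - spo2) / 98.0


def roir_for(spo2):
    """Inverse of Eq. 10: the RoIR expected at a given SpO2."""
    return (116.8 - spo2) / 45.2


for spo2 in (90.0, 97.0):
    print(spo2, round(rog_for(spo2), 3), round(roir_for(spo2), 3))
```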

To explore possible causes of increased individual errors, the PPG data from an outlier in Fig. 6 (Subject 11) were compared with those of an inlier (Subject 10). The raw and AC/DC-normalized camera-based PPG signals for subjects 10 and 11 are provided in Fig. 7 and Fig. 8, respectively. This comparison reveals higher noise levels (relative to PPG-amplitudes) in the outlier case.

Fig. 7

Raw and normalized camera-based PPG signals for an inlier of the calibration study (SpO\(_{\text{2,cam}}\)(RoG), 97.6%; SpO\(_{\text{2,ref}}\), 98.9%; Subject 10, study N). The outcomes of low-pass-filtering (LPF) raw camera-based PPG signals are included

Fig. 8

Raw and AC/DC-normalized camera-based PPG signals for an outlier of the calibration study (SpO\(_{\text{2,cam}}\)(RoG), 93.7%; SpO\(_{\text{2,ref}}\), 101.5%; Subject 11, study N). Acronym: LPF, low-pass-filtering

Note that, for subject 11, heavy modulations were observed in the raw signals in addition to the cardiac-related component. The frequency of these modulations is about 0.14 Hz, which is lower than typical resting respiratory frequencies (\(\sim\) 0.3 Hz). It is unclear from the data whether these modulations are due to Mayer waves (\(\sim\) 0.1 Hz) [11] and/or motion. This pattern was not observed in the other subjects.

Because the magnitudes of camera-based PPG signals can differ considerably among populations, in particular among skin tones, we use \(SNR_{Rn}\) as a descriptive metric of our dataset. The estimated SNR for the PPG signals in the data from studies N and H is shown in Fig. 9. The median SNR was 2.8 dB and the minimum −4.0 dB. Thus, while the calibration constants developed in this study for SpO\(_{\text{2,cam}}\)(RoG) are independent of the SNR levels, an \(A^{*}_{rms}<\) 3% only holds for populations and recording conditions for which the median \(SNR_{Rn}\) exceeds 3 dB.

Fig. 9

SNR estimations for the recordings in the calibration dataset (\(N=46\))

3.2 Challenge 1: temperature cooling

Cooling reduced PPG-amplitudes at all wavelengths. Figure 10 depicts separate regressions of the logarithm of the camera-based PPG amplitudes (|Gn|, |IRn|, and |Rn|) as a function of temperature.

Fig. 10

The amplitudes of the camera-based PPG signals are positively associated with room temperature. The temperature-induced changes in red, NIR and green are proportional

The associations between temperature and PPG-amplitudes were significant (\(\hbox {p}<0.001\)), with \(\hbox {R}^2\) values of about 0.6. Based on the slopes of the regressions, a temperature decrease of \(18\,^{\circ }{\text{C}}\) reduces PPG-amplitudes by a factor of seven.
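As a worked example, assuming the regressions are linear in the natural logarithm of the amplitude A (the base of the logarithm is not stated), a seven-fold decrease over 18 °C corresponds to the per-degree slope below, i.e., roughly an 11% change in amplitude per °C (about 0.047 per °C in log10 units):

$$\begin{aligned} \frac{A(T)}{A(T-18\,^{\circ }\text{C})} = 7 \;\Rightarrow \; \frac{\Delta \ln A}{\Delta T} = \frac{\ln 7}{18\,^{\circ }\text{C}} \approx 0.108\,^{\circ }\text{C}^{-1}, \qquad 7^{1/18} \approx 1.114 \end{aligned}$$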

To assess whether cooling also influences SpO\(_{\text{2,cam}}\), consider Fig. 6b, which includes the data from the cooling intervention (study C; \(A^{*}_{rms} =\) 2.9% is the error margin for estimates based on \(SpO_{2,cam}(RoG) = -\,98.0 RoG + 105.0\), Eq. 9). The scattering of the data for medium and low temperatures indicates that SpO\(_{\text{2}}\) was underestimated. The cooling-related bias was confirmed by t test comparisons. The average underestimation error for the subset of points measured at temperatures \(>15\,^{\circ }{\text{C}}\) was −1% (CI [−1.7, −0.3]%, \(\hbox {p} < 0.05, N=10\); SpO\(_{\text{2,cam}}\), \(95.8\pm 1.0\%\); SpO\(_{\text{2,ref}}\), \(96.8\pm 0.7\%\)). For the subset of points measured below \(15\,^{\circ }{\text{C}}\), the mean underestimation error was −2.4% (CI [−3.1, −1.6]%, \(\hbox {p} < 0.05, N=20\); SpO\(_{\text{2,cam}}\), \(96.3\pm 1.1\%\); SpO\(_{\text{2,ref}}\), \(98.7\pm 1.1\%\)).

Therefore, while most bias errors fit within the margin of the calibration curve, errors do become larger with temperature cooling.

3.3 Challenge 2: a change in measurement site

Five subjects had their foreheads and cheeks measured successively under normal room temperature (\(22.1\pm 2.0\,^{\circ }\hbox {C}\)) and normoxic conditions. Table 1 shows the resulting SpO\(_{\text{2,cam}}\) estimates (means±std) from the RoG and RoIR measurements performed at the forehead and cheek (study C). Table 1 indicates that RoG-based SpO\(_{\text{2}}\) estimates are more affected by a skin-site change between cheek and forehead than RoIR-based estimates.

Table 1 Group statistics (means±std) for SpO\(_{\text{2,cam}}\) estimated by RoG or by RoIR for forehead versus cheek, SpO\(_{\text{2,ref}}\) and bias errors

SpO\(_{\text{2,cam}}\) estimates obtained at the forehead and cheek were compared by means of paired-sample t tests. As shown in Table 1, the bias errors in SpO\(_{\text{2,cam}}\) obtained by RoG or RoIR at the forehead are either not significant or within the error margin of the calibration curves. Conversely, the bias errors for estimations at the cheek are significant and exceed the \(A^{*}_{rms}\) of their respective calibration curves. Table 1 shows that a change in skin-site measurement is only significant for RoG-based SpO\(_{\text{2,cam}}\). Changing the measurement site from forehead to cheeks resulted in an average underestimation error for SpO\(_{\text{2,cam}}\) by RoG of −3.6% (CI [−5.7, −1.5]%), i.e., above the \(A^{*}_{rms}\) of the calibration curve.

4 Discussion

This study assessed the calibratability of SpO\(_{\text{2}}\) with center wavelengths at 560 nm (green) and 675 nm (red). The study was performed on the basis of data from 46 healthy adult individuals recruited for [20]. Stationary sitting subjects had their foreheads and/or cheeks measured at ambient room temperature under normoxic conditions and in a hypoxic tent. A calibration curve for SpO\(_{\text{2}}\) as a function of RoG was obtained at the forehead. The goodness of this fit was assessed in terms of the strength of correlation and by the \(A^{*}_{rms}\) metric (discarding short-term errors).

The presented data show that a single calibration curve can be used to estimate SpO\(_{\text{2}}\) with cameras operating in Vis with an \(A^{*}_{rms}\) error \(<3.0\%\). Drawing conclusions about the feasibility of RoG-based SpO\(_{\text{2,cam}}\) is not trivial because of the considerable penetration depth gap compared with camera-based pulse oximetry using red and NIR. It is possible that the penetration depth gap between red and green wavelengths makes the calibrations susceptible to physiological variations of the skin properties [15]. We minimized possible errors by constraining the measurement conditions, namely by defining a setup and protocol with static, upright sitting subjects, a defined measurement site (forehead) and a controlled room temperature.

The \(A^{*}_{rms}\) error obtained for SpO\(_{\text{2,cam}}\) using red and green was higher than that obtained using red and NIR, but acceptable in view of typical \(A^{*}_{rms}\) values of 2% or 3% for commercial pulse oximeters. Still, coping with sensor noise under the requirement to update SpO\(_{\text{2}}\) estimates frequently (e.g., every 10 s) is more difficult for SpO\(_{\text{2,cam}}\) than for contact-based counterparts, which typically achieve higher SNRs and sampling rates. These technical issues can add 2% extra error, which means that technical developments are important to leverage the potential advantages of camera-based pulse oximetry.

The assumption of universal calibratability was challenged by a cooling intervention that mimics patient centralization. Under cold-induced vasoconstriction, we found that the RoG-based SpO\(_{\text{2,cam}}\) method tends to underestimate SpO\(_{\text{2}}\). The \(A^{*}_{rms}\) for estimations performed at medium and low ambient temperatures was \(< 2\%\). The seemingly parallel lines suggest that calibrations could hold in the investigated temperature range. Although this result is encouraging, we remark that the temperatures achieved during the cooling intervention spanned only 8–15 \(^{\circ }{\text{C}}\). Given the limited extent of this range, future work is necessary to challenge SpO\(_{\text{2,cam}}\) under more extreme cooling conditions and to assess actual centralized patients.

In the second challenge of this study, a change in the measured skin site was seen to lead to considerable measurement errors. This is reasonable given the depth gap of the PPG signals acquired using green versus red-NIR wavelengths [16]. As an implication, RoG-based SpO\(_{\text{2,cam}}\) estimates obtained at the cheeks require dedicated calibration coefficients to avoid bias errors of 2–6% (see Table 1). The effect of a change in skin site was also observed for RoIR-based SpO\(_{\text{2,cam}}\), although to a lesser extent. Thus, one may assume that the calibration curves for SpO\(_{\text{2,cam}}\) in Vis are specific to the skin site at which they were developed.

When it comes to continuous monitoring, we see an opportunity to achieve SpO\(_{\text{2,cam}}\) by using green, or possibly blue, wavelengths as an alternative to NIR. However, the relatively lower performance of Vis measurements with respect to red-NIR is not yet fully understood. We believe that contributions enabling a knowledge transfer from contact PPG are an efficient way to increase our understanding of PPG imaging. Multi-channel photoplethysmographic research systems are enablers of this promising research direction [4, 17]. In this regard, replacing the commercial SpO\(_{\text{2}}\) probes in our setup with separate raw PPG probes for green and red light, synchronized with the camera recordings, could have provided valuable insights into the PPG signals under investigation.

We hope that future technical improvements in camera sensors, data acquisition practices and methods will lead to solutions for coping with the penetration depth gap(s) between probing wavelengths, motion interference, low-SNR conditions and specular reflections. Short-term errors may be mitigated in future work by adapting motion-robust algorithms, e.g., [18]. These factors, combined with careful standardization of measurement conditions, will be key to improving SpO\(_{\text{2,cam}}\) further.

Because of their relevance to the SpO\(_{\text{2,cam}}\) methodology, we continue this discussion with considerations on specular reflections and motion interference.

Specular reflections Polarizing filters are useful for optical skin measurements and are also seen in camera-based PPG works aimed at accurate amplitude measurements, e.g. [16]. When such filters are placed in front of the cameras and light sources in a cross-polarizing scheme, specular reflections are cancelled out, which ensures that the normalized PPG amplitudes are not underestimated.
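To make the underestimation explicit, assume (a simplification for illustration, not a model from this study) that the detected intensity consists of a diffuse component with pulsatile part \(AC\) and non-pulsatile part \(DC\), plus a specular component \(S \ge 0\) that reflects off the skin surface and therefore carries no pulsatile information. The normalized amplitude measured without polarizers is then

\[
\frac{AC}{DC + S} \;=\; \frac{AC}{DC}\cdot\frac{1}{1 + S/DC} \;\le\; \frac{AC}{DC},
\]

with equality only when \(S = 0\). For example, a specular component equal to 20% of the diffuse \(DC\) level lowers the normalized amplitude by a factor \(1/1.2\), i.e. by about 17%.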

In the study for which the video recordings were originally intended [20], polarizing filters were not essential because red and NIR light penetrate deeply into the skin, which makes the fraction of specularly reflected light small in comparison with the total scattered light.

However, the absence of polarizing filters in this particular study suggests that the relatively lower performance of the SpO\(_{\text{2,cam}}\) method by RoG (in comparison with its RoIR-based counterpart) may be partly attributed to contamination by specular reflections. Including polarizers could have improved measurements at green wavelengths, in particular in subjects with oily skin and/or darker skin tones. Future work should quantify the performance gains enabled by polarizing filters in Vis-based calibrations.
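Specular contamination can also distort the RoG ratio itself, not only the individual amplitudes. Assume, for illustration only, that specular light adds a non-pulsatile offset equal to a fraction \(s_\lambda = S_\lambda/DC_\lambda\) of the diffuse DC level in each channel. The measured red-over-green ratio then becomes \(\text{RR}\cdot(1+s_G)/(1+s_R)\). Because green light is more strongly absorbed, its diffuse DC level tends to be lower, so \(s_G > s_R\) is plausible; the specific fractions below are hypothetical, and the linear calibration form \(\text{SpO}_2 = C_1\cdot\text{RR} + C_2\) is an assumption, evaluated with the RoG coefficients reported in the motion-interference paragraph.

```python
def measured_ratio(true_ratio, spec_frac_red, spec_frac_green):
    """Red-over-green ratio of normalized amplitudes with specular offsets.

    spec_frac_* is the (assumed non-pulsatile) specular intensity expressed
    as a fraction of the diffuse DC level in that channel (S / DC).
    """
    if spec_frac_red < 0 or spec_frac_green < 0:
        raise ValueError("specular fractions must be non-negative")
    # Each normalized amplitude is scaled by 1 / (1 + s); the ratio keeps the quotient.
    return true_ratio * (1.0 + spec_frac_green) / (1.0 + spec_frac_red)


def rog_to_spo2(ratio, c1=-96.4, c2=104.8):
    """Assumed linear RoG calibration SpO2 = C1 * RR + C2 (in %)."""
    return c1 * ratio + c2


true_rr = 0.10  # hypothetical specular-free RoG value
contaminated_rr = measured_ratio(true_rr, spec_frac_red=0.05, spec_frac_green=0.20)
shift = rog_to_spo2(contaminated_rr) - rog_to_spo2(true_rr)
print(f"RoG {true_rr:.3f} -> {contaminated_rr:.3f}, SpO2 shift {shift:+.2f} points")
```

Here the contaminated ratio exceeds the true one, so the (negative-slope) calibration reads about 1.4 points lower. Because the specular fraction varies between subjects, for instance with skin oiliness, such contamination would add spread around a shared calibration line rather than only a fixed offset.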

Motion interference The importance of motion robustness is well emphasized in the camera-based PPG literature, e.g., [5, 16, 18]. Therefore, the cautious reader may seek reassurance that motion artifacts did not affect the calibration results provided in this paper.

Making use of the IR(t) signal, which can be regarded as a third reference signal for algorithmic purposes, we developed a means to estimate and cancel out a motion interference component in normalized camera-based PPG signals (see Appendix A for details). This enabled the calibratability analysis to be repeated based on “uncontaminated” signals. Interestingly, for SpO\(_{\text{2,cam}}\) by RoG, the calibration coefficients were almost unperturbed (\(C_1\,=\,-\,96.4\), \(C_2\,=\,104.8\)), with an \(A^{*}_{rms}\) of 3.0% and \(R^2\,=\,0.57\) (\(p<0.001\)). Similar observations held for SpO\(_{\text{2,cam}}\) by RoIR (\(C_1\,=\,-\,43.2\), \(C_2\,=\,115.9\); \(A^{*}_{rms}\) = 1.7%; \(R^2\,=\,0.86\); \(p<0.001\)). Thus, as far as motion interference is concerned, the RRs methodology can be used for SpO\(_{\text{2}}\) estimation from recordings that contain no more than sporadic motion. We hope that our study will be followed by algorithmic developments aimed at coping with larger amounts of motion interference.
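The coefficients above can be read as linear calibration curves. The sketch below assumes the form \(\text{SpO}_2 = C_1\cdot\text{RR} + C_2\) (which fits the signs and magnitudes of the reported coefficients but is not restated in this section) and computes RR as the ratio of the normalized amplitudes AC/DC, with red as the numerator, following the ratio-of-ratios method. The class and function names and the amplitude values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Assumed linear calibration SpO2 = c1 * RR + c2 (SpO2 in %)."""

    c1: float  # slope, % SpO2 per unit ratio
    c2: float  # intercept, % SpO2

    def spo2(self, ratio: float) -> float:
        return self.c1 * ratio + self.c2


# Coefficients reported after motion-interference cancellation.
ROG = Calibration(c1=-96.4, c2=104.8)
ROIR = Calibration(c1=-43.2, c2=115.9)


def ratio_of_ratios(ac_red: float, dc_red: float, ac_ref: float, dc_ref: float) -> float:
    """(AC/DC) of the red channel over (AC/DC) of the reference channel (green or NIR)."""
    if dc_red <= 0 or dc_ref <= 0 or ac_ref == 0:
        raise ValueError("DC levels must be positive and the reference AC non-zero")
    return (ac_red / dc_red) / (ac_ref / dc_ref)


# Hypothetical red and green amplitudes in arbitrary units (NOT study data).
rr = ratio_of_ratios(ac_red=0.4, dc_red=100.0, ac_ref=4.0, dc_ref=100.0)
print(f"RoG = {rr:.3f} -> SpO2 ~ {ROG.spo2(rr):.1f}%")
```

With these made-up amplitudes the normalized red PPG amplitude is one tenth of the green one (RoG = 0.1), which the RoG coefficients map to about 95% SpO2. Keeping the coefficients in one small immutable object makes it straightforward to swap in site- or condition-specific calibrations.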

5 Conclusion

The calibratability of camera-based pulse oximetry using red and green wavelengths was demonstrated by a robust linear regression on data from 46 healthy individuals measured under normoxic and hypoxic conditions (SpO\(_{\text{2}}\) range 85–100%). This extends the earlier work in which the calibratability of SpO\(_{\text{2}}\) measurements using NIR and red was demonstrated. The results obtained in this paper indicate an \(A^{*}_{rms}\) error of 3.0% for RoG-based SpO\(_{\text{2}}\) measurements, which is compatible with ISO 80601-2-61:2011 but larger than that of the counterpart measurements performed in the red–NIR window (\(A^{*}_{rms}\), 1.7%).

Errors increased when we measured at lower temperatures or at a slightly different skin site (cheek instead of forehead). We attribute these sensitivities to the different depths at which red and green light probe the vasculature. Our results support the conclusion that a single calibration curve for estimating SpO\(_{\text{2,cam}}\) using red and green is feasible, though estimation errors are higher than with red and NIR.