Introduction

Respiratory rate is an important indicator for early detection and diagnosis of potentially dangerous conditions such as sleep apnea,24 sudden infant death syndrome,18 cardiac arrest3 and chronic obstructive pulmonary disease.5 In addition, for some patients who undergo surgery, relative changes in respiratory rate are much greater than changes in heart rate or systolic blood pressure; thus, respiratory rate can be an important vital sign.21 Respiratory rate is most accurately measured using transthoracic impedance plethysmography,1 nasal thermocouples20 or capnography.16 However, these methods all rely on expensive external sensors that may require donning a mask, nasal cannula or chest-band sensor. More importantly, because these devices may disturb natural breathing and sleep positions, they are mostly applicable in constrained environments such as operating rooms and intensive care units.

Recently, photoplethysmography (PPG) has been widely considered for respiratory rate extraction due to its simplicity and non-invasive measurement capability.11–13 The PPG signal contains components that are synchronous with the respiratory and cardiac rhythms. Indeed, the respiratory rhythm modulates the frequency and/or amplitude of the cardiac rhythm; these temporal variations in frequency and amplitude are characteristic of respiratory sinus arrhythmia.6 Thus, the respiratory rate can be obtained by detecting the presence of either amplitude modulation (AM) or frequency modulation (FM) in the PPG signal.2

Numerous advanced signal processing algorithms (both parametric and nonparametric approaches) have been applied to extract respiratory rates by looking for AM or FM signatures in a PPG signal.2,19 For a parametric approach, the autoregressive (AR) model has been shown to provide relatively good respiratory rate estimation.7–10 For nonparametric approaches, time–frequency spectrum (TFS) methods such as the continuous wavelet transform (CWT) and the variable frequency complex demodulation method (VFCDM) have also been shown to provide accurate respiratory rate estimation.2,11–13

To our knowledge, respiratory rate estimation using the camera of either a smartphone or a tablet has not previously been demonstrated or discussed in the literature. We have recently shown that a pulsatile signal (PS) with dynamics similar to those of a PPG signal can be obtained from a smartphone’s camera when a fingertip is pressed onto it.4,19 Using the PS derived from an iPhone, we have also shown that atrial fibrillation can be detected accurately.17 Given these advances, the aims of this work were: (1) to systematically examine the quality of the PS derived from the video cameras of several devices, namely the iPhone 4S, iPad 2, iPod 5, and Galaxy S3; and (2) to determine whether accurate respiratory rates can be estimated directly from the PS of these devices. The challenge is that PPG signals are typically sampled at greater than 100 Hz, whereas most smartphone video cameras sample at no more than 30 Hz. Since previous studies have shown good estimation of respiratory rates from a PPG signal using the AR model, CWT, and VFCDM, we use these same methods to compare the accuracy of breathing rates estimated from the PS of various smartphones, an MP3 player (iPod 5), and a tablet.

Methods

Data Collection

Data were collected on 10 healthy subjects on 2 separate occasions using 4 different devices: iPhone 4S, iPad 2, iPod 5, and Galaxy S3. Only two devices were used simultaneously for data collection in a given experimental setting. Worcester Polytechnic Institute’s Institutional Review Board approved the data collection protocol. For the PS acquisition, we used the Objective-C programming language and the Xcode platform for the iPhone 4S, iPad 2, and iPod 5; Java was used for the Galaxy S3 on the Android 4.1 (Jelly Bean) mobile platform. Specifically, we used the Eclipse IDE Indigo R2 as the development environment and a Samsung Galaxy S3 for development and debugging. For the video recordings of the iPhone, iPad, and iPod, we examined four different pixel-region sizes to determine the optimal signal quality: 50 × 50, 320 × 240 (QVGA), 320 × 480 (vertical HVGA), and 640 × 480 (VGA). For each pixel region, the PS was obtained by averaging over all pixels in the region for each of the three color bands (red, green and blue) for every frame. All four devices provided a sampling rate close to 30 frames per second. However, when the video sampling rate was lower than 30 Hz, a cubic spline algorithm was used to interpolate the signal to 30 Hz.
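For illustration, the following minimal Python sketch shows how such a PS could be formed by averaging a chosen pixel region of each frame per color channel; the function name, the assumed frame format (H × W × 3 RGB arrays), and the use of NumPy are our own assumptions rather than the original Objective-C/Java implementation.

```python
import numpy as np

def pulsatile_signal(frames, region=None):
    """Average a pixel region of each video frame, separately for each color channel.

    frames : iterable of H x W x 3 uint8 arrays (R, G, B), one per frame.
    region : optional (row0, row1, col0, col1) crop, e.g. a 50 x 50 block; None uses the full frame.
    Returns an (n_frames, 3) array whose columns are the red, green and blue PS.
    """
    samples = []
    for frame in frames:
        if region is not None:
            r0, r1, c0, c1 = region
            frame = frame[r0:r1, c0:c1, :]
        # Mean over all pixels in the region, one value per color band.
        samples.append(frame.reshape(-1, 3).mean(axis=0))
    return np.asarray(samples)
```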

No subject had cardiorespiratory pathologies. All four devices were tested on the same subject, at the same location, and under the same test conditions. Data were collected in the sitting upright position, with the device’s camera placed in proximity to the subject’s left index or middle finger as shown in Fig. 1. All subjects were instructed to breathe in time with a metronome-timed beeping sound, i.e., to begin inspiring when a beep sounded and to finish expiring before the next beep. Data were collected for breathing frequencies ranging from 0.2 to 0.9 Hz in increments of 0.1 Hz. Prior to data collection, all subjects were acclimated to each breathing frequency being measured. Three minutes of data were collected for each frequency for each subject. Electrocardiogram (ECG) recordings were collected with an HP 78354A acquisition system using a standard 5-lead electrode configuration. A respiration belt was placed around the subject’s chest and abdomen to monitor the true breathing rate (Respitrace Systems, Ambulatory Monitoring Inc.). Respiratory and ECG recordings were obtained using the LabChart software (ADInstruments) at a sampling rate of 400 Hz. Figure 1 shows data collection on the four devices by placing a fingertip on the video camera.

Figure 1
figure 1

General scheme to acquire video from the four devices

Extraction of Respiratory Rates

VFCDM

Detection of AM and FM in a PS using the power spectral density (PSD) is difficult because the dynamics are time-varying; hence, high-resolution time–frequency spectral (TFS) methods are required to resolve them. We have recently shown that because the VFCDM method provides one of the highest TFS resolutions, it can identify AM and FM dynamics. Consequently, Fourier transform of either the AM or FM time series extracted from the heart rate frequency band can lead to accurate estimation of respiratory rates when the acquired signal is PPG data.23
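To make this last step concrete, the following sketch estimates the respiratory rate as the dominant spectral peak of an AM or FM time series taken from the heart-rate band of the TFS; SciPy's periodogram is used here as a simple stand-in for the Fourier transform step, and the function name and band limits are our assumptions.

```python
import numpy as np
from scipy.signal import periodogram

def respiratory_rate_from_series(series, fs, f_lo=0.15, f_hi=1.0):
    """Respiratory rate (Hz) taken as the dominant spectral peak of an AM or FM series."""
    series = np.asarray(series) - np.mean(series)      # remove the DC component
    f, pxx = periodogram(series, fs=fs, nfft=4096)     # zero-padded FFT for a finer frequency grid
    band = (f >= f_lo) & (f <= f_hi)                   # plausible breathing-rate band
    return f[band][np.argmax(pxx[band])]
```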

Details concerning the VFCDM algorithm are described in Wang et al.23; hence, we only briefly describe its essence here. The VFCDM starts with the assumption that a signal x(t) is a narrow-band sinusoidal oscillation with center frequency \( f_{0} \), instantaneous amplitude A(t), phase \( \phi (t) \), and direct current component dc(t), as follows:

$$ x(t) = {{dc}}(t) + A(t)\cos \,(2\pi f_{0} t + \phi (t)) $$
(1)

For a given center frequency, instantaneous amplitude information A(t) and phase information \( \phi (t) \) can be extracted by multiplying Eq. (1) by \( e^{{ - j2\pi f_{0} t}} \), resulting in the following:

$$z(t) = x(t)e^{{ - j2\pi f_{0} t}} = dc(t)e^{{ - j2\pi f_{0} t}} + \left( {\frac{A(t)}{2}} \right)e^{j\phi (t)} + \left( {\frac{A(t)}{2}} \right)e^{{ - j(4\pi f_{0} t + \phi (t))}}.$$
(2)

A leftward shift by \( e^{{ - j2\pi f_{0} t}} \) results in moving the center frequency, \( f_{0} \), to zero frequency in the spectrum of z(t). If z(t) in Eq. (2) is subjected to an ideal low-pass filter (LPF) with a cutoff frequency \( f_{c} < f_{0} \), then the filtered signal \( z_{\text{lp}}(t) \) will contain only the component of interest and the following Eqs. (3a)–(3c) are obtained:

$$ z_{\text{lp}} (t) = \left( {\frac{A(t)}{2}} \right)e^{j\phi (t)} $$
(3a)
$$ A(t) = 2|z_{\text{lp}} (t)| $$
(3b)
$$ \phi (t) = \arctan \left( {\frac{{{\text{imag}}(z_{\text{lp}} (t))}}{{{\text{real}}(z_{\text{lp}} (t))}}} \right). $$
(3c)

When the center frequency is not fixed, as assumed above, but varies as a function of time, the signal x(t) can be written in the following form:

$$ x(t) = {{dc}}(t) + A(t)\cos \, \left( {\int\limits_{0}^{t} {2\pi f(\tau )d\tau + \phi (t)} } \right) , $$
(4)

Similar to the operations in Eqs. (1) and (2), multiplying Eq. (4) by \( e^{{ - j\int_{0}^{t} {2\pi f(\tau )d\tau } }} \) yields both instantaneous amplitude A(t) and instantaneous phase \( \phi (t), \) as described in the following equation:

$$z(t) = x(t)e^{{ - j\int_{0}^{t} {2\pi f(\tau )d\tau } }} = {{dc}}(t)e^{{ - j\int_{0}^{t} {2\pi f(\tau )d\tau } }} + \left( {\frac{A(t)}{2}} \right)e^{j\phi (t)} + \left( {\frac{A(t)}{2}} \right)e^{{ - j\left( {\int_{0}^{t} {4\pi f(\tau )d\tau + \phi (t)} } \right)}}$$
(5)

From Eq. (5), if z(t) is filtered with an ideal LPF with a cutoff frequency \( f_{c} < f_{0} \), then the filtered signal \( z_{\text{lp}}(t) \) will be obtained with the same instantaneous amplitude A(t) and phase \( \phi (t) \) as provided in Eqs. (3b) and (3c). The instantaneous frequency is given by:

$$ f(t) = f_{0} + \frac{1}{2\pi }\frac{d\phi (t)}{dt}. $$
(6)

The VFCDM method thus involves a two-step procedure. The first step is to use complex demodulation (CDM) or what we termed the fixed frequency CDM (FFCDM) to obtain an estimate of the TFS, and the second step is to select only the dominant frequencies of interest for further refinement of the time–frequency resolution using the VFCDM approach. In the first step of the VFCDM method, a bank of LPFs is used to decompose the signal into a series of band-limited signals. The analytic signals that are obtained from these, through use of the Hilbert transform, then provide estimates of the instantaneous amplitude, frequency, and phase within each frequency band.
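As an illustration of the fixed-frequency demodulation step, the sketch below follows Eqs. (1)–(3) and (6), with a zero-phase Butterworth filter standing in for the ideal LPF of the derivation; the function name and filter order are our choices, not the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def complex_demodulate(x, fs, f0, fc):
    """Fixed-frequency complex demodulation of x around center frequency f0 (Hz)."""
    t = np.arange(len(x)) / fs
    z = x * np.exp(-1j * 2 * np.pi * f0 * t)              # shift f0 down to 0 Hz, Eq. (2)
    b, a = butter(4, fc / (fs / 2))                       # LPF with cutoff fc < f0
    z_lp = filtfilt(b, a, z.real) + 1j * filtfilt(b, a, z.imag)
    amplitude = 2 * np.abs(z_lp)                          # Eq. (3b)
    phase = np.unwrap(np.angle(z_lp))                     # Eq. (3c)
    inst_freq = f0 + np.gradient(phase, t) / (2 * np.pi)  # Eq. (6)
    return amplitude, phase, inst_freq
```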

CWT

As described in the Introduction, numerous studies11–13 have shown relatively good results using the CWT for extraction of respiratory rates directly from a pulse oximeter. The Morlet wavelet, with a half-length of five samples at the coarsest scale, was applied to estimate the scalogram of the PS.22 The procedure for extracting respiratory rates with the CWT is nearly identical to that of the VFCDM in that the identified AM and FM series are Fourier transformed to estimate respiratory rates.
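A minimal sketch of the Morlet scalogram computation is given below using the PyWavelets package, which is an illustrative substitute for the authors' own CWT implementation; the frequency grid and function name are our assumptions.

```python
import numpy as np
import pywt

def morlet_scalogram(x, fs, f_lo=0.01, f_hi=0.5, n_freqs=100):
    """Continuous wavelet transform (Morlet) of a signal over a chosen frequency band.

    Returns the magnitude scalogram (n_freqs x len(x)) and the analysis frequencies in Hz.
    """
    dt = 1.0 / fs
    freqs = np.linspace(f_hi, f_lo, n_freqs)              # analysis frequencies, high to low
    # Convert target frequencies to wavelet scales for the Morlet ('morl') wavelet.
    scales = pywt.scale2frequency('morl', 1) / (freqs * dt)
    coeffs, freqs_out = pywt.cwt(x - np.mean(x), scales, 'morl', sampling_period=dt)
    return np.abs(coeffs), freqs_out
```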

AR Modeling

This approach involves estimation of AR model parameters using the optimal parameter search (OPS) criterion.15 The AR parameters are formulated as a transfer function, which is then factored into its pole terms. The real and complex-conjugate poles define the power spectral peaks, with larger-magnitude poles corresponding to higher-magnitude peaks. The resonant frequency of each spectral peak is given by the phase angle of the corresponding pole. Among the poles, we set the region of interest for respiratory rates between 0.15 and 1 Hz. The details of the respiratory algorithm using the AR model are described in Lee and Chon.7
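The pole-based frequency estimate can be sketched as follows; Yule–Walker fitting with a fixed model order is used here as a simplifying stand-in for the OPS criterion of the original algorithm, and the function name and default order are assumptions.

```python
import numpy as np
from statsmodels.regression.linear_model import yule_walker

def ar_respiratory_rate(x, fs, order=12, f_lo=0.15, f_hi=1.0):
    """Respiratory rate (Hz) from the dominant AR spectral pole in the 0.15-1 Hz band."""
    x = np.asarray(x) - np.mean(x)
    rho, _ = yule_walker(x, order=order)                 # AR coefficients
    poles = np.roots(np.concatenate(([1.0], -rho)))      # roots of the AR characteristic polynomial
    freqs = np.angle(poles) * fs / (2 * np.pi)           # pole phase angle -> frequency in Hz
    mags = np.abs(poles)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    if not np.any(in_band):
        return None
    # Larger-magnitude poles correspond to higher spectral peaks.
    return freqs[in_band][np.argmax(mags[in_band])]
```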

Data Analysis

Using PPG signals with sampling rates of at least 250 Hz to derive heart rates has previously been shown to be a good alternative to ECG monitoring.14 However, sampling rates for most smartphone and tablet video cameras range from 25 to 30 Hz. Given these low sampling rates, it is necessary to determine the accuracy of the smartphone and tablet devices in estimating heart rates and respiratory rates. Comparisons of derived heart rates were made between the standard ECG and the smartphones and tablets. We used our own peak detection algorithm to determine R-wave peaks from the ECG signals and cardiac pulse peaks from the camera-derived PS. Because of frame rate variability, we interpolated the PS to 30 Hz using a cubic spline algorithm prior to peak detection. The peak detection algorithm incorporated a filter bank with variable cutoff frequencies, spectral estimates of the heart rate, rank-order nonlinear filters and decision logic.
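The interpolation and a simplified peak detection are illustrated below; SciPy's find_peaks is a greatly simplified stand-in for the filter-bank detector described above, and the refractory-period setting and function names are assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import find_peaks

def heart_rate_from_ps(t_frames, ps_green, fs_out=30.0):
    """Resample an unevenly sampled green-channel PS to 30 Hz and detect pulse peaks.

    Returns the uniform time base, the resampled signal, and the mean heart rate (beats/min).
    """
    t_uniform = np.arange(t_frames[0], t_frames[-1], 1.0 / fs_out)
    ps_uniform = CubicSpline(t_frames, ps_green)(t_uniform)   # cubic-spline interpolation
    # Enforce a refractory period of ~0.33 s (about 180 beats/min maximum) between peaks.
    peaks, _ = find_peaks(ps_uniform, distance=int(0.33 * fs_out))
    rr = np.diff(t_uniform[peaks])                             # pulse-to-pulse intervals (s)
    return t_uniform, ps_uniform, 60.0 / np.mean(rr)
```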

Three minutes of data sampled at 30 Hz were low-pass filtered to 1.78 Hz and then downsampled to 15 Hz. Respiratory rates were extracted from every 1-min segment of PS, with the window shifted by 10 s across the entire 3 min of recording, i.e., consecutive 1-min segments overlapped by 50 s. Thus, for each 3-min recording, we had thirteen 1-min segments to analyze for all methods compared. Three minutes of data were sufficiently long to test the efficacy of each method but not so long as to fatigue the subjects, whose task was to breathe on cue with a metronome-timed beep. For the VFCDM and CWT methods, the initial and final 5 s of the TFS of every 1-min segment were not considered because the TFS has an inherent end effect which leads to inaccurate time–frequency estimates. For the CWT method, the lower and upper frequency bounds of the analyzed signal were set to 0.01 and 0.5 Hz, respectively. The filter parameters of the VFCDM were set to the first cutoff frequency \( F_{w} \) = 0.03 Hz, second cutoff frequency \( F_{v} \) = 0.015 Hz, and filter length \( N_{w} \) = 64. We have previously shown that the parameter \( F_{v} = F_{w}/2 \), and that \( N_{w} \) is chosen to be approximately half the data length. For each of these categories, detection errors were computed for each frequency and for all subjects using the different methods. The error ε is calculated as follows:

$$ \varepsilon = \frac{{\sum\nolimits_{i = 1}^{n} {|R_{\text{D}}^{i} - R_{\text{T}}^{i} |} }}{n}, $$
(7)

where n is the number of 1-min segments, and \(R_{\text{D}}^{i} \) and \(R_{\text{T}}^{i} \) denote the detected and true breathing rates of the i-th 1-min segment, respectively.
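A minimal sketch of this preprocessing, the sliding-window segmentation, and the error of Eq. (7) is given below; the filter order and the use of simple decimation after low-pass filtering are our assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def sliding_windows(ps, fs=30.0, window_s=60, step_s=10):
    """Low-pass filter to 1.78 Hz, downsample to 15 Hz, and yield 1-min windows every 10 s."""
    b, a = butter(4, 1.78 / (fs / 2))                 # cutoff frequency used in the paper
    ps_filt = filtfilt(b, a, ps - np.mean(ps))
    ps_ds, fs_ds = ps_filt[::2], fs / 2               # 30 Hz -> 15 Hz (signal is already band-limited)
    win, step = int(window_s * fs_ds), int(step_s * fs_ds)
    for start in range(0, len(ps_ds) - win + 1, step):
        yield ps_ds[start:start + win]                # thirteen windows from a 3-min recording

def mean_abs_error(detected, true):
    """Eq. (7): mean absolute difference between detected and true breathing rates."""
    return np.mean(np.abs(np.asarray(detected) - np.asarray(true)))
```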

Results

Selection of the Best Color Band and the Optimal Video Pixel Size for Estimation of Heart Rates

Figure 2a shows the orientation of the Field of View (FOV) of each camera relative to the location of the camera flash. All references to “left” and “right” in this paper are relative to the camera FOV, regardless of whether the camera itself was on the front or rear of the device. Note that when a device’s front video camera is on, the device’s LCD displays a mirror image of the front camera’s FOV; the stored video reverts to the FOV view, but while videotaping is in progress, the LCD shows the mirror image of the actual front camera FOV. This is done to match people’s expectations as they look at the display while photographing themselves. However, this display reversal was not taken into account, both to avoid confusion and because we used the video feed directly, before it was processed for display on the device’s LCD.

Figure 2
figure 2

Example of different regions of the iPhone 4S, iPad 2, iPod 5, and Galaxy S3. The top panel (Fig. 2a) represents the camera’s FOV and the relative position of the flash LEDs. The middle panel (Fig. 2b) shows the locations of the nine 50 × 50 pixel regions in the camera’s FOV. The bottom panel (Fig. 2c) shows the division of the FOV into left and right vertical halves, each of HVGA resolution

Figures 2b and 2c provide details of the video pixel regions examined on all four devices, which consist of the following 12 regions: left top (LT), left middle (LM), left bottom (LB), right top (RT), right middle (RM), right bottom (RB), middle top (MT), center (C), middle bottom (MB), vertical left half-VGA (vertical left HVGA), vertical right half-VGA (vertical right HVGA) and VGA.

All results shown are based on average values from 10 subjects. When the flashlight was on (rear cameras of the iPhone 4S, iPod 5 and Galaxy S3), the green color consistently provided significantly higher mean amplitude values than either the blue or red color.19 Table 1 shows experimental results for R–R intervals (RRIs) extracted from the ECG and the three color-band PS from an iPhone 4S. As shown in Table 1, the PS values from the smartphone are an excellent surrogate for RRIs derived from the ECG for all colors. There was no statistical difference between RRIs derived from the ECG and each of the three color PS; the median errors calculated using Eq. (7) are also very small for all three color-band signals. Figure 3 shows the Bland–Altman plot for the mean HR data from the iPhone 4S (green color) and the ECG. The Bland–Altman plot shows a mean difference of 0.074, and most of the data lie within the 95% limits of agreement.
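For reference, the Bland–Altman agreement statistics can be computed as in the following short sketch; the conventional 1.96 × SD definition of the 95% limits of agreement is assumed here.

```python
import numpy as np

def bland_altman_limits(hr_device, hr_ecg):
    """Mean difference and 95% limits of agreement between device-derived and ECG heart rates."""
    diff = np.asarray(hr_device) - np.asarray(hr_ecg)
    bias = np.mean(diff)                                # mean difference (bias)
    loa = 1.96 * np.std(diff, ddof=1)                   # half-width of the limits of agreement
    return bias, bias - loa, bias + loa
```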

Table 1 Experimental results of heart rate extracted from ECG and three-color band signals obtained from iPhone 4S (Resolution: HVGA)
Figure 3
figure 3

Example Bland–Altman plot with a mean difference of 0.074, showing the 95% limits of agreement (dashed lines: mean difference ± the limits of agreement) between the continuous HR from a smartphone and the subject’s corresponding ECG signal

Having established that the green color signal provides the best signal amplitude values for the iPhone 4S, we next systematically investigated which pixel regions of the green color give the best signal quality, as determined by the largest amplitude values, for all four devices. Specifically, nine different regions of 50 × 50 pixels, the left and right HVGA pixel regions, and the entire VGA pixel region were investigated for the best signal quality. Table 2 shows the mean amplitude values of the green color pulse signal for different pixel regions of the four devices. For the iPhone 4S, the left HVGA region had the largest amplitude value among the twelve regions, as expected, since the LED flash is placed on the left side of the camera’s FOV (see Fig. 2a). For the iPad 2, the device was held upright on a desk in landscape mode, so we also chose to consider the FOV in landscape mode. In this case, the right side of the portrait-mode FOV was turned to be on top, and the left side was on the bottom. The RT and RM 50 × 50 pixel regions and the right HVGA region have among the largest amplitude values, since the light source was the room’s ceiling light, i.e., closest to the top in landscape mode. For the iPod 5, the LT and LM 50 × 50 pixel regions and the VGA region have the largest amplitude values. All left values exceed right values because the flash is on the left side of the camera’s FOV (see Fig. 2). For the Galaxy S3, the RT, RM and RB regions have the largest amplitude values among the twelve regions, as expected, since the LED flash is placed to the right of the camera’s FOV (see Fig. 2). Hence, depending on the location of the LED flash, the left or right HVGA or 50 × 50 regions of the green color PS have the highest intensity values among all regions tested.

Table 2 The mean amplitude values of the green color pulse signals with flash on except for iPad 2

Heart Rate, Frequency Spectrum and Power Spectrum

Figures 4a–4c show a representative 1-min segment of iPhone 4S PS data, the TFS of its green-band signal via the VFCDM, and the PSD of the AM and FM signals derived from the HR frequency band (~1 to 1.5 Hz), respectively, while a subject was breathing at a metronome rate of 18 breaths/min. Note the similarity of the PS in Fig. 4a to commercially available PPG signals. As shown in Fig. 4c, the PSDs of the extracted AM and FM time series show their largest peaks at 0.3 Hz; these peaks correspond accurately to the true respiratory rate of 18 breaths/min.

Figure 4
figure 4

PPG signal, estimated instantaneous frequencies, and PSD. (a) Pulsatile signal, (b) estimated instantaneous frequencies using VFCDM, with prominent frequency oscillations seen near the heart rate (1.3 Hz), and (c) PSD of the PS signal

Respiratory Rate

The true respiratory rates were derived by taking the PSD of the respiratory impedance trace signals during the metronome breathing experiments. True respiratory rates from the respiration trace and the breathing rates estimated from the green signal using both the FM and AM sequences from the VFCDM and CWT were compared for metronome rates ranging from 0.2 to 0.9 Hz. In order to evaluate the computational methods, we provide figures and tables that show the accuracy and repeatability of each method as a function of the true breathing rate. For tabulating results, we grouped the results for 0.2–0.3 Hz together and designated them as the low frequency (LF) breathing rates. Likewise, the results for 0.4–0.6 Hz breathing rates were grouped together and designated as the high frequency (HF) breathing rates. Since the percentage errors were found to be not normally distributed, we report the median and interquartile range (IQR) values.

Figure 5 shows the subjects’ variations of percentage detection error in the form of box plots for the left region of the HVGA pixel resolution with flash on since this region was found to have the best signal quality as shown in Table 2. The top and bottom panels of Fig. 5 represent results for the LF and HF breathing rates, respectively. The lower boundary of the box closest to zero indicates the 25th percentile, a line within the box marks the median, and the upper boundary of the box farthest from zero indicates the 75th percentile. Whiskers (error bars) above and below the box indicate the 90th and 10th percentiles. Therefore, the area of the blue box is an indication of the spread, i.e., the variation in median error (or IQR), across the population. These figures indicate how well the algorithms perform across the entire population. Red crosses represent the 5th and 95th percentiles.

Figure 5
figure 5

Median and IQR errors measured from the iPhone 4S, iPad 2, Galaxy S3, and iPod 5 when the flashlight was turned on and off. (a)–(e) represent LF (12 and 18 breaths/min) and HF (24, 30 and 36 breaths/min) breathing rates, respectively

As shown in Fig. 5, the AR model approach is the least accurate, followed by CWT-AM, CWT-FM, and VFCDM (both AM and FM approaches), when we consider all breathing frequencies. Note that the variances of the median values as determined by ε [the average respiratory estimation error as defined in Eq. (7)] are significantly lower for both VFCDM and CWT than for the AR model approach. Although there was no significant difference in the median error between the CWT and VFCDM methods at 0.3 Hz, ε was found to be lowest for VFCDM-FM at 0.2 Hz. In general, ε is larger for HF than for LF breathing rates for all computational methods. For HF breathing rates, ε is lowest for CWT-FM, followed by VFCDM, CWT-AM, and the AR model. While there is no significant difference in the variance between VFCDM-FM and CWT-FM, both methods have significantly less variance than CWT-AM, VFCDM-AM, or the AR model. Thus, gauging the accuracy as defined by the median errors and their variances, as shown in Fig. 5, we observed that for HF breathing rates, CWT-FM consistently provides the lowest median errors and variance values.

Figure 5 also shows the subjects’ variation in percentage detection error in the form of box plots for data extracted from the front cameras of an iPhone 4S and an iPad 2 (no flash), respectively, for the left HVGA region. While not shown, the left HVGA region also had the best signal quality with the flashlight off for the iPhone 4S. The AR model approach is the least accurate, followed by the CWT and VFCDM methods, when we consider all breathing frequencies. For LF breathing rates, there was no significant difference in the median error between the VFCDM methods. However, the variances of the median values as determined by ε are significantly lower for both VFCDM and CWT than for the AR model approach. In general, ε is larger for HF than for LF breathing rates. For HF breathing rates, ε is lowest for CWT-FM, followed by VFCDM, CWT-AM, and the AR model. While there is no significant difference in the variance between VFCDM-FM and VFCDM-AM for LF breathing rates, the median errors of VFCDM-FM are significantly lower than those of VFCDM-AM. Thus, gauging the accuracy as defined by the median errors and their variances, as shown in Fig. 5, we observed that for both LF and HF breathing rates, CWT-FM consistently provides the lowest median errors and variance values.

Figure 5 also shows the subjects’ variation in percentage detection error in the form of box plots for data extracted from the front cameras of a Galaxy S3 and an iPod 5, respectively, from the 50 × 50 pixel regions (LT for the former and LM for the latter). The AR model approach is the least accurate, followed by the CWT and VFCDM methods, when we consider all breathing frequencies. For LF breathing rates, there was no significant difference in the median error between the VFCDM methods. However, the variances of the median values as determined by ε are significantly lower for both VFCDM and CWT than for the AR model approach. ε is larger for HF than for LF breathing rates. For HF breathing rates, ε is lowest for CWT-FM. While there is no significant difference in the variance between VFCDM-FM and VFCDM-AM for LF breathing rates, the median errors of VFCDM-FM are significantly lower than those of VFCDM-AM. Thus, gauging the accuracy as defined by the median errors and their variances, as shown in Fig. 5, we observed that for both LF and HF breathing rates, VFCDM-FM most often provides the lowest median errors and variance values.

Table 3 shows the numerical statistics (IQR) for the “repeatability” across the population of test subjects. The results for 0.2–0.4 Hz breathing rates (LF breathing range) are much better than those for 0.5–0.6 Hz (HF breathing range); in addition, the tracking ability of the breathing rate detection is much better when the CWT and VFCDM methods are used for the LF range. Even though the AR method shows significantly lower IQR errors than all the other methods studied here, it provided relatively high median errors. For each of the four devices, the VFCDM-FM method has significantly lower IQR errors (ε < 7) and median errors (ε < 6) than the other methods in the 0.2–0.4 Hz breathing rate range.

Table 3 Population statistics for IQR detection errors for each method

ANOVA and the Bonferroni t test were used to analyze differences between the medians of the five methods. Statistical significance was identified as p < 0.05. Tables 4 and 5 provide a summary of the statistical analysis comparing the performance of the five methods (AR, CWT-AM, CWT-FM, VFCDM-AM and VFCDM-FM) to each other. For Tables 4 and 5, we list only those comparisons that show a significant difference among the five computational methods for each device for both the LF and HF breathing ranges. Regarding accuracy, for both the LF and HF breathing ranges, the tables show that the AR method is significantly less accurate than either the AM or the FM version of the CWT and VFCDM methods for all four devices. Further, the FM versions of CWT and VFCDM are significantly more accurate than their AM counterparts for all four devices, but only for the HF breathing range. The repeatability results shown in Table 5 are similar to the accuracy results. For example, for both the LF and HF breathing ranges, the AR method is significantly less repeatable than either the AM or FM versions of the CWT and VFCDM methods for all four devices. For the HF breathing range, the FM versions of CWT and VFCDM are significantly more repeatable than their AM counterparts for all four devices.
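The statistical comparison described above could be reproduced roughly as in the sketch below, using one-way ANOVA followed by pairwise t tests with a Bonferroni-corrected threshold; the data structure and function names are assumptions, and the exact post hoc procedure of the paper may differ.

```python
import numpy as np
from scipy import stats

def compare_methods(errors_by_method, alpha=0.05):
    """One-way ANOVA across methods, then pairwise t tests with Bonferroni correction.

    errors_by_method : dict mapping method name -> array of per-subject errors.
    Returns the ANOVA p value and the pairs that remain significant after correction.
    """
    names = list(errors_by_method)
    groups = [np.asarray(errors_by_method[n]) for n in names]
    _, p_anova = stats.f_oneway(*groups)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    alpha_corr = alpha / len(pairs)                    # Bonferroni-adjusted significance level
    significant = []
    for a, b in pairs:
        _, p = stats.ttest_ind(errors_by_method[a], errors_by_method[b])
        if p < alpha_corr:
            significant.append((a, b, p))
    return p_anova, significant
```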

Table 4 Statistical significance (accuracy) among the five methods for four devices
Table 5 Statistical significance (repeatability across time) among the five methods for four devices

Table 6 summarizes these measures of median and IQR errors for 0.7, 0.8, and 0.9 Hz breathing rates, i.e., rates above what we termed the HF rates. As presented numerically in the table, CWT-FM provides the lowest median error at the 0.7 Hz breathing rate and might be acceptable. However, no method provided reasonably good estimates of breathing rates above 0.7 Hz.

Table 6 Accuracy as determined by median errors at 42, 48, 54 breaths/min (iPhone 4S, flashlight: On). The error values listed for each method represent breaths/min

Figure 6 shows the subjects’ variation in percentage detection error in the form of box plots for data extracted from the rear camera (with flashlight on) of an iPhone 4S during spontaneous breathing. The true respiration rate was found from the respiration belt by computing the PSD of the impedance respiration trace signal and finding the frequency at the maximum amplitude. The variances of the median values as determined by ε are significantly lower for both VFCDM and CWT than for the AR model approach. In the normal range (11–27 breaths/min), VFCDM-FM consistently provides the lowest median errors and variance values. As shown in Table 7, there was no significant difference in the median error among CWT-AM, CWT-FM, VFCDM-FM, and VFCDM-AM during spontaneous breathing; the accuracy of the AR approach was lower than that of the other approaches.

Figure 6
figure 6

Spontaneous respiratory rate

Table 7 Statistical significance (accuracy and repeatability across time) among the five methods for spontaneous respiratory rate

In general, the ability of the methods to provide consistent results is highest for both the CWT-FM and VFCDM methods, for both LF and HF breathing rates. As with the accuracy results, the repeatability is also better for the LF than for the HF breathing rates for all methods. Both CWT-FM and VFCDM provide significantly more repeatable results than either CWT-AM or the AR model.

Computation Time

Table 8 shows the computational time for heart rate extraction based on the choice of pixel resolution and the number of color bands used. As shown in the table, pixel resolutions of QVGA and HVGA result in a frame rate of 25 frames/s when only one color is selected. The frame rates extracted from two and three colors are 23 and 20 frames/s, respectively, in the case of HVGA resolution.

Table 8 Computation time of heart rate extracted from color band signal of iPhone 4S depending on different resolutions

The clock speed of the CPU used in the iPhone 4S and iPod 5 is 800 MHz. The latest iPhone 5 is clocked at 1.02 GHz, and the recently released Samsung Galaxy S4 is equipped with a 1.9 GHz quad-core processor. Thus, for most new smartphone and tablet cameras, processing rates higher than 30 frames/s can be achieved, suggesting that a choice of higher pixel resolution will not be a significant obstacle to accurate, real-time detection of heart rates and respiratory rates.

Discussion

In this work, we tested several smartphones and a tablet for their feasibility in estimating respiratory rates using the PS derived from the built-in video camera and flashlight, when available. The motivation for this work is based on several recent studies which showed that accurate respiratory rates, especially at normal breathing rates, can be obtained from pulse oximeters.11–13 The characteristics of the PS derived from smartphone and tablet cameras are similar to those of PPG signals; hence, similarly accurate respiratory rates should, in theory, be obtainable. Our results indicate that, at least for normal breathing ranges (0.2–0.3 Hz), this is indeed feasible with the PS derived from smartphone and tablet video cameras.

We optimized the accuracy of the respiratory rate estimates by first systematically analyzing which pixel resolution of the video signal yields the strongest PS. It is logical to assume that the greater the amplitude of the PS, the better the signal quality, provided that care is taken to minimize motion artifacts during measurements. Our results showed that a choice of larger pixel resolutions does not necessarily result in higher PS amplitude. For example, for the Galaxy S3, iPod 5 and iPad 2, the 50 × 50 resolution provided either the highest pulsatile amplitude or one statistically equivalent to that of the HVGA resolution. In fact, HVGA resolution was the best choice only for the iPhone 4S. The important implication of a smaller pixel region providing signal quality as good as or better than a larger pixel region is the significant reduction in computational time, so that real-time calculation of respiratory rates can be attained.

Commercial pulse oximeters in either transmittance or reflectance mode normally employ a single photodetector (PD) element, typically with an active area of about 6–10 mm². The image sensor size of the iPhone 4S is 4.54 × 3.42 = 15.5268 mm². Consequently, when signals are extracted in HVGA (320 × 480 pixels) video mode, the active area is 2.27 × 3.42 = 7.7634 mm². Hence, we initially thought that motion artifact and noise could be significantly reduced by increasing the active area of the sensor. However, our investigation revealed that larger pixel resolutions do not necessarily result in a higher signal-to-noise ratio.

We compared the AR-based approach, CWT, and VFCDM for respiratory rate estimation from smartphones and a tablet because these techniques have been shown to provide good results for PPG signals. Consistent with the PPG signal results, VFCDM-FM provided the most accurate respiratory rate estimation, with the fastest computational time of any of the methods compared in this study, for the LF breathing rates. For HF breathing rates, the CWT and VFCDM methods provided comparable results. The CWT approach using either the FM or AM signals fared better than the AR method, but at the expense of higher computational time.

Due to the inherent non-stationarity of the respiratory rate, a time–frequency method is needed and appears to be the most appropriate approach. Another advantage of the TFS approach to estimating respiratory rates is that, unlike most filtering approaches, tuning of a number of parameters specific to each subject is not required. Note that in our work, we used the same parameters (as described in the Methods section) for both CWT and VFCDM for all subjects and for all breathing rates.

As was the case with respiratory rate estimation using the PPG signal, the computational speed of the VFCDM method is faster than that of the wavelet method for smartphone and tablet data. The average time to calculate the respiration frequency using the VFCDM method was found to be around 1.4 s, while using the wavelet method took 37.8 s on average (programs running on MATLAB R2012a). The AR spectral method was the fastest as it took only 0.2 s on average using MATLAB, and this computation time includes the time needed to calculate the model order based on an initial model order selection of 50. However, the AR method is the least accurate in respiratory rate estimation.

All three methods showed increased estimation errors as the breathing rates increased, for all devices tested. This observation was also noted for the PPG signal.2 We also examined breathing rates of 0.7, 0.8 and 0.9 Hz, and the results showed significant deviation from the true breathing rates for all three methods. The CWT and VFCDM methods provided comparable results, with significantly worse estimates from the AR method, as was also the case for the LF and HF breathing rates. Hence, our results show that it is feasible to obtain good results for normal breathing rates but not for higher breathing rates (i.e., >0.5 Hz). We speculate that there are two reasons for the inaccurate results at high breathing rates. First, detection of both the AM and FM phenomena requires persistent oscillations over several cycles, and with faster respiratory rates, our decision to limit the data segment to 1 min may not be sufficient. More importantly, with faster breathing rates, the AM or FM phenomenon becomes less apparent, and thus it becomes more difficult to detect no matter how sophisticated the detection method may be.

We have not considered device-to-device variation in obtaining respiratory rates. However, we do not believe this is a concern, because the specifications of the camera are tightly controlled by the manufacturers from one device to another and hence should vary little, if at all. Moreover, the examined cameras have more than sufficient pixel resolution to resolve pulse changes; hence, small variations in pixel resolution, if they occur, should not appreciably affect the respiratory rate estimates. We therefore believe that device-to-device variation is minimal and should not affect respiratory rate estimation.

In summary, our work was undertaken to determine the optimal pixel resolution, location and color band for obtaining the best quality signal, so as to maximize the accuracy of respiratory rate estimation from a video signal from either smartphones or tablets. It was found that a larger pixel resolution does not necessarily result in better signal quality; in fact, in most scenarios a 50 × 50 pixel resolution was as good as or better than HVGA resolution. In addition, we found that the region closest to the flash in most cases yielded higher signal quality, which is logical and expected. Finally, using the optimal pixel size, location and color band of the PS, we obtained accurate respiratory rate estimates, especially in the normal breathing range. Estimation errors increased as respiratory rates rose above 0.5 Hz, with unreliable results at 0.6 Hz or higher. When both computational time and estimation accuracy are taken into account, VFCDM-FM provided the best results among all approaches examined in this work. This work enables at least two vital sign measurements directly from a finger pressed onto the video camera of either a smartphone or a tablet: heart rate and respiratory rate. It is expected that future work by our laboratory or others will add further vital sign capabilities derived directly from a video signal acquired from either a smartphone or a tablet.