1 Introduction

In physiology, respiration can be defined as the bidirectional exchange of gases: oxygen is delivered from the outside air to the cells in tissues, and carbon dioxide is transported from the cells to the outside air. The exchange of gases is driven by pressure differences between the lungs and the surrounding atmosphere. During inspiration (inhalation), air enters the lungs because the air pressure in the lungs (within the alveolar spaces) is lower than the atmospheric pressure. When the air pressure becomes higher than the atmospheric pressure, expiration (exhalation) is observed. Therefore, the breathing process can be monitored by observing two fundamental activities: mechanical changes of chest/abdomen volume and airflow changes in the nose/mouth region. Such observations can be described using quantitative parameters representing properties of the breathing process: respiration rate, respiration regularity, presence and length of apnea events, etc. Respiration rate (RR) can be defined as the number of breaths per minute ("breaths per minute", bpm). Respiratory rate can characterize the breathing process, indicating whether respirations are normal, too fast (tachypnea), too slow (bradypnea), or nonexistent (apnea). However, threshold values used to differentiate categories of abnormal respiratory rates are defined separately for different categories of subjects; for example, the normal respiration rate changes with age. Therefore, some organizations propose the following tachypnea threshold values [1, 2]:

  • Newborn to 2 months: 60 bpm

  • Infant 2 months–1 year: 50 bpm

  • Preschool Child 1–5 years: 40 bpm

  • School age Child: 20–30 bpm

  • Adults: 20 bpm.

Respiratory rate is one of three fundamental vital signs (body temperature, heart rate and RR) and it is a very important parameter indicating potential health problems. For example, an RR above 27 bpm can be a predictor of cardiac arrest [3]. An increased RR is used in the prediction of pneumonia [4] or of lower respiratory tract infection [5]. In basic epidemiology, WHO guidelines recommend that pneumonia case detection can be based on clinical signs alone, mainly respiratory rate [6]. It has also been shown that the respiratory rate is more discriminatory between stable and unstable patients than the pulse rate [3]. Cretikos et al. [7] specified many recommendations about the measurement of respiratory rates for patients staying in hospitals. For example, they claimed that "the respiratory rate should be measured and documented accurately in all hospital patients at least once a day, and should always be documented when other vital signs are measured".

Apnea is defined as the cessation of respiratory airflow and is especially dangerous during sleep. The length of time required to classify the cessation of respiratory airflow as a true apneic episode is measured in seconds [8], e.g., >10 s for Central Sleep Apnea [9].

Respiration regularity is characterized by the periodic appearance of inspiration/expiration events and by similar amplitudes (depths) of those events. Abnormalities in respiration may occur in rate, rhythm, and in the effort of breathing. Different respiration patterns have been observed for some illnesses or injuries, including [10, 11]: Cheyne-Stokes respirations, Biot's breathing, Kussmaul's respirations, apneustic respirations, and ataxia respirations. Cheyne-Stokes respirations are characterized by periods of respiration during which breathing gets progressively deeper and then progressively shallower (crescendo–decrescendo pattern). Such series of breaths of variable depth are separated by periods of significant apnea (Fig. 1a). This respiration pattern can result from strokes, brain tumors or injuries, carbon monoxide poisoning, or high altitude sickness, and can be observed as a side effect of morphine administration. The Biot's breathing (or cluster respiration) pattern has clusters of similar rapid respirations separated by apnea periods (Fig. 1b). It can also result from stroke or trauma. Kussmaul's respirations are characterized by deep and fast breathing (hyperventilation) (Fig. 1c). They are typically observed in the late stages of severe metabolic acidosis, for example in diabetic ketoacidosis. Prolonged inspiration and expiration phases are observed in apneustic respirations (Fig. 1d). The prolonged expiration phase and the following pause phase are interpreted as apneic phases due to the long cessation of air inflow. This pattern is commonly caused by damage to the central nervous system (CNS). Finally, ataxia (chaotic) respirations constitute a very irregular respiration pattern with irregular pauses and increasing episodes of apnea (Fig. 1e). They can be caused by damage to the CNS, typically to the medulla oblongata.

Fig. 1

Respiration patterns: a T1—Cheyne-Stokes respirations, b T2—Biot’s breathing, c T3—Kussmaul’s respirations, d T4—Apneustic respirations, e T5—Ataxia respirations

A parametric description of respiration patterns should represent changes in the rate and depth of breathing and should describe the presence and timing of apnea events. Different methods have been proposed in the literature for the monitoring and description of respiration-related parameters. In this chapter, we focus on the application of thermal imaging for remote monitoring of respiration rhythm. As presented earlier, respiration activities can be analyzed by observing mechanical changes of chest/abdomen volume or airflow changes in the nose/mouth region. Both categories of changes can usually be recorded using thermal cameras and analyzed to present respiration waveforms (patterns) and related parameters. This is described in the following sections of this chapter.

2 Review of the Current State of the Art

2.1 Respiratory Rate Estimation and Respiration Patterns Analysis

Thermal imaging has often been used to analyze different dynamic changes that can be observed in medical diagnostics or treatment. Some examples include: wound healing [12, 13], support in cardiosurgery interventions [14, 15], detection of tumors [16–18], and many others [19, 20]. Thermal imaging has also been applied to the monitoring of respiration activities.

In [21] the authors used a narrow band-pass filter to analyze thermal recordings. The side-view technique was used, observing breathing-jet dynamics in the volume of interest (or region of interest, ROI, in a frame) close to the nostrils or mouth. For the ROI of each frame the average value was calculated and normalized with respect to the mean and standard deviation. The autocorrelation sequence was calculated for the extracted and filtered thermal waveform. Finally, the Fourier transform was applied and the power density spectrum (PDS) was calculated. The frequency of the dominant peak in the PDS was used as the breathing frequency. The method was experimentally verified with the participation of 9 subjects (19 thermal clips). Results showed good correlation between the estimated breathing rate and the rate measured with the reference system (respiratory belt with a piezo-strap transducer). In these experiments a medium wave infrared (MWIR) camera was used (Focal Plane Array, FPA, resolution 640 × 512, 120 fps, 55 fps used in experiments, sensitivity 0.025 °C).
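The processing chain of [21] (ROI averaging, normalization, autocorrelation, and selection of the dominant peak of the power density spectrum) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the array layout, the ROI convention and the 0.1–1.0 Hz search band are assumptions made for the example.

```python
# Sketch (not the code of [21]): average the ROI per frame, normalize,
# autocorrelate, and take the dominant peak of the power density spectrum.
import numpy as np

def breathing_rate_psd(frames, roi, fps):
    """frames: (N, H, W) thermal array; roi: (r0, r1, c0, c1); returns bpm."""
    r0, r1, c0, c1 = roi
    s = frames[:, r0:r1, c0:c1].mean(axis=(1, 2))      # one averaged value per frame
    s = (s - s.mean()) / (s.std() + 1e-12)             # normalize to mean and std
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]  # autocorrelation sequence
    psd = np.abs(np.fft.rfft(ac)) ** 2                 # power density spectrum
    freqs = np.fft.rfftfreq(len(ac), d=1.0 / fps)
    band = (freqs > 0.1) & (freqs < 1.0)               # plausible breathing band (assumed)
    f_dom = freqs[band][np.argmax(psd[band])]          # dominant spectral peak
    return f_dom * 60.0                                # breaths per minute
```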

A similar measurement technique was presented in [22]. A statistical methodology was used to label thermal video frames as expiratory or nonexpiratory. In the training phase, a variant of the K-means clustering algorithm was used to cluster "hot" pixels (expiratory) and "cold" pixels (nonexpiratory) in the first M frames of the thermal video. Hot and cold pixel values were modeled using normal distributions with separate parameters (e.g., a mean) for expiratory (hot) and nonexpiratory (cold) pixels. In each iteration the statistical distance was calculated to the expiratory (D e) and nonexpiratory (D n) distributions from the previous step. The Jeffreys divergence measure was used with the smallest distance as the criterion. The parameters of the winning distribution were updated using an averaging operation. In the testing phase, each analyzed pixel in the ROI was modeled as a mixture of two distributions: D e + D n. Initially, both distributions were equiprobable. In subsequent iterations (t > 0), the current distribution was compared to the previous, existing expiration and nonexpiration distributions using the Jeffreys divergence measure and the minimal distance criterion. Finally, pixels in the ROI were labeled as expiratory or nonexpiratory, and frames were labeled accordingly. The breathing rate was calculated by counting the labeled frames for each breathing cycle. The method was verified in experiments with 3 subjects (8 thermal clips) for 3 different sizes of ROI. The results showed that the medium size ROI (21 × 9 pixels) outperformed the other ROI sizes. The accuracy achieved for this small number of subjects was 96.43%.

Later, the same group [23, 24] used thermal sequences recorded collinear to the subject's face. They proposed the use of the wavelet transform on the resampled and normalized thermal signal to analyze it at different scales. It was assumed that the breathing component exists at a scale S max, which is identified as the local maximum of the wavelet energy coefficients. The frequency f c that maximizes the transform for the mother wavelet is used to calculate the estimated respiration rate:

$$eRR \cdot S_{ \hbox{max} } = f_{c} \cdot \delta ,$$
(1)

where: eRR—estimated respiration rate, δ—downsampling factor (= 10 fps).
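A hedged sketch of how Eq. (1) can be applied: the scale with maximal wavelet energy is converted into a respiration rate via the pseudo-frequency relation. The centre frequency value and the example scale below are illustrative assumptions, and δ is interpreted as the frame rate of the downsampled signal (10 fps), consistent with the unit given above.

```python
# Sketch of Eq. (1): convert the wavelet scale with maximal energy into an
# estimated respiration rate. The centre frequency below is illustrative
# (e.g. ~0.81 for a Morlet-like mother wavelet).
def wavelet_scale_to_rr(s_max, f_c=0.8125, delta=10.0):
    """s_max: scale of maximal wavelet energy; f_c: mother-wavelet centre
    frequency; delta: frame rate of the downsampled signal in fps."""
    err_hz = f_c * delta / s_max      # eRR * S_max = f_c * delta  (Eq. 1)
    return err_hz * 60.0              # breaths per minute

# Example: scale 30 at 10 fps with f_c = 0.8125 gives about 16 bpm.
```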

Experiments with the participation of 20 subjects [24] were performed using the same MWIR thermal camera as described for the previous works. The mean of the absolute normalized difference between values obtained using the thermal imaging method and using a thermistor was 1.73% (accuracy 100 − 1.73% = 98.27%). The method was also used in experiments with pathological subjects [23].

Many papers of the same group (e.g., [24–26]) address the problem of, and possible solutions for, the automatic tracking of the nostril or mouth ROI. Some methods are described in the next section.

Abbas et al. [27] proposed to use a long wave infrared (LWIR) camera, underlining that in this range (7–14 μm) the emitted energy dominates the total signal, so it is better suited to measuring absolute or relative object irradiance or radiance. The proposed data acquisition and processing method was similar to the methods previously proposed by Pavlidis et al.: it extracts the respiratory waveform for the ROI of the nostrils, performs filtration and applies the wavelet transform. This was probably the first time the method was used for the remote monitoring of neonates. The method was applied to 5 subjects, with the RR extracted from the ECG signal as a reference. The mean absolute error was 1.32 bpm.

The automatic detection of respiration-related ROIs in thermal sequences was proposed by Pereira et al. [28]. First, the face image was segmented in three stages. The first stage used multi-level Otsu thresholding; next, background noise was removed and the largest area in the binary image was assumed to be the face region. The final stage was focused on finding the chin contour using the method described in [29] and selecting the ROI after detection of the nose edges with the Canny edge detector. The values in each ROI of the thermal video were averaged, producing the digital respiration waveform. After band-pass filtration, an adaptive short analysis window w was applied to the signal to estimate the local breath-to-breath interval. Three estimators were used to calculate these local intervals: adaptive window autocorrelation, adaptive window average magnitude difference function, and maximum amplitude pairs. The adaptive window autocorrelation method calculates the correlation between m interval samples to the right of the analysis window w[v] and to the left w[v − m] of the center of the window w[0]. The second estimator is based on the absolute differences between samples. The last estimator is a version of a peak detector: it calculates the maximum amplitude of any two samples and reaches its maximum if two peaks (at a distance of m) are included in the analysis window w. In the experimental verification an LWIR camera was used (resolution 1024 × 768, sensitivity 0.05 K, 30 fps). Eleven volunteers participated in the study. The reference measurement was performed using a piezo plethysmograph. The average breathing rate error for the experiment without user movements was 0.33 bpm with a mean error spread of 0.71 bpm.

A general method for the monitoring of respiration with a thermal imaging system was also described in the US Patent Application Publication (Xu, US2012/0289850 A1) [30]. In the method, temperatures of the extremities of the head and face (nose, mouth) are used to locate features associated with respiration. RGB values of pixels related to those features are tracked over time to generate a pattern of respiration. The respiration rate can be determined from the pattern of respiration by counting peaks over a pre-defined period of time. Two methods are mentioned: Fourier analysis and peak-and-valley detectors. However, only general methods are mentioned. In the document the authors underline the use of the R, G, B channels, suggesting that the R and G channels are more important since these channels are "associated with warmer temperatures" and "exhaled air is warmer". This assumes the use of color (pre-processed) images with a mapping of temperature values to colors. This is not the case in most other methods, which operate on single matrices of measured temperature values or intensities of radiation. Color is not a carrier of information in this case; it is used only for visualization. Additionally, the dynamics of the respiration waveform depends on the temperature gradient, so the inhalation and exhalation phases are both important and can be monitored as a change of temperature whose sign depends on the ambient temperature.

Lewis et al. [31] used a similar methodology to estimate the respiratory rate, detecting the frequency with the greatest spectral density after Fourier transformation of the average respiration signal obtained for the ROI of the nostrils. However, the authors additionally proposed that the "integration of the thermal time series generated a transformed time-series, which contained a component assumed to be linearly related to tidal volume". A cubic-polynomial filter was used to remove sources of variance in the thermal time series (existing due to thermal noise). The estimates were compared to the results of the reference method, the LifeShirt inductance plethysmograph. Two LWIR cameras were used: the TVS-700 with a resolution of 320 × 240 (sensitivity 0.08 °C) and the SC-6000 with a resolution of 640 × 480 (sensitivity 0.02 °C). The sampling rate was about 30 fps for both cameras. Thermal sequences were measured for 12 subjects with the TVS-700 camera and for 6 subjects with the SC-6000 camera. Similar mean within-subject correlations were obtained (≥0.90) between the results generated for thermal-based data (eRR and relative tidal volume) and for the reference system.

In a series of papers, AL-Khalidi et al. [32–35] presented similar methods of respiration rate estimation by monitoring skin surface temperature variations in the area centered on the tip of the nose. This round area (circle, ellipse) was divided into eight segments. Pixel values in each segment were averaged for each frame. As a result, 8 signals were obtained and filtered using a low-pass filter (5th-order Butterworth filter) with a cutoff frequency of 1 Hz. The respiration rate was estimated by calculating the average of the distances between peaks. The validation of the method was performed experimentally with the participation of 20 children. A high correlation was obtained (R 2 = 0.994) between the thermal imaging method and the standard respiratory monitoring method.

Hanawa et al. [36] proposed a similar breath detection system using a far infrared camera (NEC/Avio, TH7102MX, resolution 320 × 240, sensitivity 0.06 °C, 30 fps). The camera detects the temperature change at the nasal hole caused by respiratory activities. The authors focused on the practical use of the system, analyzing different factors that can contribute to the results: head rotation, the distance between the camera and the subject, and the camera angle. They used template ROIs representing the nasal cavity area that were extracted from the first frame. The templates differed in rectangle size. Template matching was then performed on each frame of the video recording. In all detected regions the average temperature was calculated. A thresholding operation was applied to the calculated average temperature values to detect frames indicating breaths. The method was verified with the participation of 5 subjects. Participants counted their breaths during the experiment. The mean absolute error was about 0.12 bpm. In their later papers [37, 38] they focused on nasal cavity detection methods, which are described in the next section.

2.2 Facial Tracking Methods for the Estimation of Respiratory Rate

Recently, research on face recognition has expanded rapidly because of a wide range of possible applications. Face and facial feature detection is the first step in many automatic face-processing systems [39], not only in computer vision, communication, or access control systems, but also in medicine. However, face detection is quite a challenging task because it suffers from large variations in image appearance. These variations include environmental influences, for example illumination conditions, and object influences, such as facial expressions or pose variations. Regardless of the kind of image (formed in different ranges of the electromagnetic spectrum), many novel solutions have been proposed in the literature to resolve object-related variations, such as template-matching methods [40], feature invariant approaches [41] or appearance-based methods [42]. Some solutions for eliminating typical problems (the effect of the background and of disturbances caused by the haircut) were also described by Marzec et al. [43]. Nevertheless, coping with environmental variations in visible light images is not straightforward, and the majority of the existing solutions are not robust enough to be used for face detection in visible light in uncontrolled environments [44, 45]. Thermal infrared (IR) images, in contrast, record the temperature distribution, which makes them insensitive to variations in illumination conditions [44]. This makes thermal image processing solutions attractive for various applications.

Considering the non-contact estimation of respiration rate, there is a need to detect and track facial features automatically [32]. Some approaches have already been proposed and described for detecting and tracking the human face and its characteristic points in thermal images. Many of them are threshold-based methods or at least use binarization in the preprocessing stage [32, 43–46], utilizing the fact that the face has intensities higher than other regions [43]. Different ideas for determining the proper threshold temperature were tested and described in [43]. Some of them were not satisfactory, but in some cases setting the threshold to 28.3 °C allowed most of the problems related to the background and haircut to be eliminated. Al-Khalidi et al. [32] proposed to use image processing techniques that include segmentation and median filtering to enhance the recorded thermal images and remove unwanted noise. The segmentation stage consisted of thresholding and edge detection. Then, the nostril ROI was identified by extracting the two warmest regions (points where the eye corners meet the nose) within a selected boundary (the region between the bridge of the nose and the inner corner of each eye). The results indicate that the ROI failed to be located in only a very small percentage of images (in almost all cases the failure rate was less than 1%). Another method for detecting the face in thermal images was described by Bhattacharjee et al. [46]. The preprocessing phase involved binarization of the acquired image and marking of the face area and its centroid. After this phase, specific facial features were extracted and classified using two techniques: the Haar Wavelet Transform and Local Binary Patterns.

Some other approaches are based on Haar-like features [39, 44], which are descriptors of the local appearance. These features are the main concept of the Viola–Jones algorithm, which is often used because of its high efficiency and precision. In [44] the authors proposed an automatic eye localization method for long wave infrared images. The described method included eyeglass detection based on a Support Vector Machine classifier trained on eyeglass feature vectors. Before eye localization, the face region was first detected according to the intensity difference between this region and the background. Intensity variations of specific facial regions were described by Haar features. The proposed algorithm achieved an eye localization accuracy of around 85%. A similar methodology was used by Mostafa et al. [39]. In the presented approach, Haar features and the AdaBoost algorithm were used to model the local texture around a given facial feature and create a texture-based model. The classifier was trained on labeled examples and used to detect a face. The face recognition process was performed using a nearest neighbor classifier in a feature space defined by three signature extraction approaches: LBP, SIFT and Binary Robust Independent Elementary Features (BRIEF). The presented results indicate that thermal images perform better under different illumination conditions but worse under expression variation. It is better to handle object (expression) variations in visible images, as geometric and appearance features in thermography are more blurred [44]. A different approach takes advantage of the temperature distribution together with some considerations about face symmetry [43]. This analysis allows characteristic facial points to be determined on thermograms and a specially prepared pattern to be applied to them. As a result, head orientation may be determined with satisfactory accuracy.

In order to estimate the respiration rate in mobile conditions, the face detection and tracking algorithm should be able to run in real time. Although some methods have already been proposed for detecting the face and its features in thermography, the time needed to process one frame has not been specified for most of them. However, this parameter is required in order to determine whether the computational performance of the methods allows them to run robustly in real time while achieving reliable feature detection.

3 Analysis of Respiration Waveforms

3.1 Heat Flow Near Nasal Cavities

Temperature differences observed at the level of the nostrils or mouth are a result of heat flow caused by respiration activities and of several components describing the local environment. Abbas [27] described the total heat flow rate related to one respiration cycle inside the nasal cavity, at the nostril level, as:

$$Q_{\text{RR}} (t) = Q_{\text{conv}} (t) + Q_{\text{rad}} (t) + Q_{\text{perf}} (t) + Q_{\text{evap}} (t) + Q_{\text{other}} (t),$$
(2)

where:

  • Q conv(t): convective heat flow related to airflow in the nasal cavities, proportional to the temperature difference between the body (nasal cavity tissue) and the environment,

  • Q rad(t): the radiation heat flow,

  • Q perf(t): heat flow resulting from blood perfusion/flow,

  • Q evap(t): heat flow caused by evaporation at the nasal surface,

  • Q other(t): other, secondary heat flow/loss sources.

The convective heat transfer is a result of temperature differences between the body (nasal cavity tissue) and the environment:

$$Q_{\text{conv}} (t) = k \cdot \left( {T_{\text{e}} (t) - T_{\text{nc}} (t)} \right) \, \cdot A_{\text{nc}} = - k \cdot \Delta T(t) \cdot A_{\text{nc}} ,$$
(3)

where: k is the heat transfer coefficient, T e is the local environment temperature, T nc is the temperature of nasal cavity tissue, A nc is the internal surface area of the nasal cavity.

The net radiation loss rate at the nostrils region can be described by

$$Q_{\text{rad}} (t) = \varepsilon \cdot \sigma \cdot (T_{\text{nc}}^{4} {-}T_{\text{e}}^{4} ) \cdot A_{\text{nc}} ,$$
(4)

where: ε is the emissivity of the nasal tissue and σ = 5.6703 × 10−8 W/(m2 K4) is the Stefan–Boltzmann constant.

The heat flow related to blood perfusion can usually be treated as a distributed heat source:

$$Q_{\text{perf}} (t) = \omega \cdot \rho_{\text{b}} \cdot c_{\text{b}} \cdot ( 1- k)\cdot\left( {T_{\text{a}} (t) - T_{\text{nc}} (t)} \right),$$
(5)

where: ω is the perfusion rate (volumetric flow rate of blood per volume of tissue), ρ b is the blood density, c b is the specific heat capacity of the blood, k < 1 is a factor representing the incomplete thermal equilibrium between blood and tissue, and T a is the arterial blood temperature.
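For illustration only, the components in Eqs. (2)–(5) can be evaluated numerically; all parameter values in the sketch below (heat transfer coefficient, area, emissivity, perfusion rate, temperatures) are assumed round numbers, not measured data.

```python
# Illustrative evaluation of Eqs. (3)-(5); all parameter values are assumptions.
SIGMA = 5.6703e-8            # Stefan-Boltzmann constant, W/(m^2 K^4)

def q_conv(k, t_e, t_nc, a_nc):
    """Convective heat flow, Eq. (3)."""
    return k * (t_e - t_nc) * a_nc

def q_rad(eps, t_nc, t_e, a_nc):
    """Net radiation loss rate, Eq. (4); temperatures in kelvin."""
    return eps * SIGMA * (t_nc**4 - t_e**4) * a_nc

def q_perf(omega, rho_b, c_b, k, t_a, t_nc):
    """Distributed perfusion heat source, Eq. (5)."""
    return omega * rho_b * c_b * (1 - k) * (t_a - t_nc)

# Example with assumed values: 25 C air, 34 C nasal tissue, 37 C arterial blood.
print(q_conv(k=10.0, t_e=298.15, t_nc=307.15, a_nc=2e-4))
print(q_rad(eps=0.98, t_nc=307.15, t_e=298.15, a_nc=2e-4))
print(q_perf(omega=1e-3, rho_b=1060.0, c_b=3600.0, k=0.5, t_a=310.15, t_nc=307.15))
```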

The overall heat flow is therefore mainly related to the changing air temperature and to blood perfusion. The initial or hypothetical steady state can be defined here as a lack of airflow due to apnea (cessation of respiratory actions). Inspiration or expiration actions result in airflow that, depending on the values of the parameters in Eqs. 3–5, can produce observable changes of the intensity of radiation or temperature in the nasal/nostril region of interest (ROI). Due to the dynamic character of this process, the observable intensity (I(x, y, t)) or temperature (T(x, y, t)) change is a function of time and location. In Fig. 2, examples of thermal images taken during inspiration (a) and during expiration (b) are presented. There is an observable difference in the temperature distributions of the two thermograms, visible in the highlighted region of the nostrils.

Fig. 2

Examples of thermal images taken during inspiration (a) and during expiration (b). Color legend is presented in (c). The ambient temperature was 25 °C

The heat flow dynamics at the nostril or mouth level can be observed in measured sequences of thermal images. These sequences are further processed to extract respiration-related waveforms (signals).

3.2 Data Acquisition and Preprocessing

Sequences of thermal images are recorded using a thermal camera, usually with LWIR detectors. The goal of the presented work was to evaluate the accuracy of respiration rate analysis using small and portable thermal cameras that can be embedded in smart glasses. Within the eGlasses project we are developing an experimental smart glasses platform dedicated to research activities. It can be easily modified: for example, different electronic modules can be exchanged, another cover can be printed using a 3D printer, sensors or electrodes can be added, the display can be changed, etc. The current prototype of eGlasses uses an OMAP 4460 processor with a 1024 × 768 transparent display (Elvision Company), 1 GB RAM, a 5 MPx camera, WiFi and Bluetooth 4 wireless interfaces, additional sensors (accelerometer, gyroscope, magnetometer, etc.), an eye-tracker and extension slots. The Android 4.1 OS and Linux Ubuntu OS have already been tested. For the goals of this work, two thermal cameras were used: the TAMARISK 320 LWIR camera and the FLIR Lepton LWIR camera module. The first camera, TAMARISK 320, has a spatial resolution of 320 × 240 and sensitivity <50 mK, and was connected using a frame grabber. The second camera, FLIR Lepton, has a smaller spatial resolution of 80 × 60, sensitivity <50 mK and a 14-bit dynamic range, and was connected over the SPI (Serial Peripheral Interface) with the use of a specially designed electronic circuit. Figure 3 presents both cameras located in the frames of two prototypes of the eGlasses platform.

Fig. 3

Smart glasses with thermal cameras : a TAMARISK 320, b FLIR Lepton module

In the experimental studies it was assumed that the thermal cameras observe subjects from short distances (<1.1 m) with at least partially visible nostrils. Thermal sequences were recorded for several groups of healthy volunteers. Measurements took place in laboratory rooms at an ambient temperature between 23 and 27 °C. All subjects were asked to rest and not to move during the experiment. Thermal images were recorded for 60 s with the sampling frequency (f s, frames per second, fps) set to about 25 Hz (frame grabber, TAMARISK 320) and 13 Hz (Lepton). In parallel, during all experiments, respiration activities were additionally monitored using a respiration pressure belt (Vernier RMB).

The first step of data preprocessing was the extraction of intensity-of-radiation changes that could represent respiration changes. Since motion compensation was not used in this experiment, changes were observed inside regions of interest (ROI) manually selected at the level of the nostrils or mouth. It was assumed that, due to respiration activities, intensities of radiation change in the region of the nose and/or mouth. For each video frame the region of interest is extracted and the corresponding values are averaged (one value per frame):

$$s(t_{i} ) = \frac{1}{{N_{\text{ROI}} }}\sum\limits_{{x = c_{\text{s}} }}^{{c_{\text{e}} }} {\sum\limits_{{y = r_{\text{s}} }}^{{r_{\text{e}} }} {I(x,y),} }$$
(6)

where: N ROI—number of pixels in the nose ROI, r s, c s—first (start) row and column of the ROI rectangle, r e, c e—last (end) row and column of the ROI rectangle, I(x, y)—pixel value of the data matrix of the ROI, i—the frame number (i = 0 … K − 1, K—number of frames).

Finally, the set of digital values (the respiration-related waveform) is calculated and normalized to the mean value:

$$s_{n} (t_{i} ) = s(t_{i} ) - \mu (s(t)).$$
(7)
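A minimal sketch of Eqs. (6)–(7) for a sequence of frames stored as a NumPy array; the array layout (frame, row, column) and the inclusive ROI bounds are assumptions made for this example.

```python
# Sketch of Eqs. (6)-(7): spatial averaging over the manually selected ROI for
# every frame, followed by subtraction of the temporal mean.
import numpy as np

def roi_waveform(frames, r_s, r_e, c_s, c_e):
    """frames: (K, H, W) array of thermal frames; ROI rows r_s..r_e, cols c_s..c_e."""
    roi = frames[:, r_s:r_e + 1, c_s:c_e + 1]
    s = roi.mean(axis=(1, 2))          # Eq. (6): one averaged value per frame
    return s - s.mean()                # Eq. (7): normalize to the mean value
```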

The ROI selection plays a very important role in the extraction of respiration-related waveforms. In Fig. 4, examples of 3 different ROI locations or sizes are presented together with the derived s(t i ) signals.

Fig. 4

The location of ROIs and the signals extracted (using the average operator) for: a the single-pixel ROI in the middle of the nose (no respiration waveform expected), b the ROI covering the nostrils, c the ROI below the nostrils. Data acquired using the TAMARISK 320 camera. The decrease in the periodic signals (b, c) is caused by inspirations (cooling, lower intensity values)

In the state of the art, the averaging operation is commonly used to calculate the final aggregate of pixel intensities inside a ROI. This is justified, as it is a very fast operation to implement (near real-time estimation of respiration rate) and it spatially low-pass filters the data. To obtain a high ratio of signal (respiration-related waveform) to noise (thermal interference), the ROI should contain many pixels representing the skin surface where the respiration-related heat flow changes the local temperature. Therefore, the size of the ROI should be big enough to compensate for small movements of the subject (and other related small temperature interferences) and small enough that the majority of its pixels represent the respiration-related change of intensity. However, other aggregation operators can be used. For example, we have experimentally verified that higher-order moments can be successfully used to extract respiration waveforms, assuming that the ROI covers a relatively large area in which there are no changes other than respiration-related ones. The best results were obtained for the adjusted Fisher-Pearson coefficient of skewness, calculated as:

$$s_{\text{s}} (t_{i} ) = \frac{{N_{\text{ROI}} }}{{(N_{\text{ROI}} - 1) \cdot (N_{\text{ROI}} - 2)}} \cdot \sum\limits_{{x = c_{\text{s}} }}^{{c_{\text{e}} }} {\sum\limits_{{y = r_{\text{s}} }}^{{r_{\text{e}} }} {\left( {\frac{{I(x,y) - \mu_{\text{ROI}} }}{{\sigma_{\text{ROI}} }}} \right)^{3} } } ,$$
(8)

where: μ ROI is the average pixel value in the ROI and σ ROI is the standard deviation of pixel values in the ROI.

Skewness is a measure of the asymmetry of a distribution. The skewness of a normal distribution is zero, and any data symmetric around the mean should have a skewness coefficient near zero. Inspiration causes local changes of the data distribution in the analyzed ROI, and the data are skewed more to the left or right with respect to "expiration" or "pause" frames. Subtracting the mean value of the skewness leads to a representation of skewness changes that reflect temperature changes in the ROI. In Fig. 5, some examples of respiration-related signals extracted using the skewness operator are presented.
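A sketch of the skewness-based aggregation of Eq. (8), computed per frame and mean-subtracted over time; the array layout and the small epsilon guarding against a zero standard deviation are assumptions of this example.

```python
# Sketch of Eq. (8): adjusted Fisher-Pearson coefficient of skewness of the ROI
# pixel values, one value per frame, with the temporal mean removed.
import numpy as np

def roi_skewness_waveform(frames, r_s, r_e, c_s, c_e):
    roi = frames[:, r_s:r_e + 1, c_s:c_e + 1].reshape(frames.shape[0], -1).astype(float)
    n = roi.shape[1]                                        # N_ROI
    mu = roi.mean(axis=1, keepdims=True)                    # mu_ROI per frame
    sigma = roi.std(axis=1, ddof=1, keepdims=True) + 1e-12  # sigma_ROI per frame
    g = ((roi - mu) / sigma) ** 3
    s_s = n / ((n - 1) * (n - 2)) * g.sum(axis=1)           # Eq. (8)
    return s_s - s_s.mean()                                 # skewness changes over time
```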

Fig. 5

The location, size and extracted (using the skewness operator) signals for: a the biggest ROI covering the nostrils, mouth and cheeks, b the middle-size ROI covering the nostrils but not the mouth, c the small ROI covering the nostrils. The waveform at the bottom was extracted for the biggest ROI using the average operator. Data acquired using the FLIR Lepton camera

As can be observed in Fig. 5, there are only small differences between the signals extracted for different sizes of ROI. For comparison, the last waveform shown in Fig. 5 presents the waveform extracted using the averaging operator for the biggest ROI in Fig. 5a. It is practically useless for the analysis of respiration changes, since the averaging operation performed on many pixels smoothed out the changes generated by respiration activities. In contrast, the waveforms extracted with the skewness operator are practically insensitive to the size of the ROI, assuming that (1) it covers the nostrils and (2) it is not too small. It was experimentally verified that the width of the ROI should be at least equal to the width of the nose and that the height of the ROI can be set equal to its width (which simplifies calculations). It is important to underline that when using the skewness operator it is not necessary to precisely locate the ROI or to classify pixels as respiration-related or not. The ROI can be automatically detected using some predefined proportions with reference to the detected face area. Methods of face detection for thermal images are described later in this chapter.

The extracted respiration-related waveforms are usually corrupted by higher frequency noise and by baseline drift. Therefore, additional signal filtration is typically used. Baseline removal was performed using a 4th-order high-pass Butterworth filter with the cutoff frequency set to 0.1 Hz. The low-pass filtration was implemented using a repeated moving average operation with a window size of f s/2. The preprocessed signals are further analyzed to estimate the respiration rate and other parameters describing the respiration waveform.
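The preprocessing described above can be sketched as follows; the use of zero-phase filtering (filtfilt) and of two passes of the moving average are choices made for this illustration rather than details taken from the original implementation.

```python
# Sketch of the preprocessing: 4th-order Butterworth high-pass at 0.1 Hz for
# baseline removal, then a repeated moving average (window f_s/2) as a low-pass.
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(s, fs):
    b, a = butter(4, 0.1 / (fs / 2.0), btype="highpass")   # cutoff 0.1 Hz
    s = filtfilt(b, a, s)                                   # zero-phase baseline removal
    win = max(int(fs / 2), 1)
    kernel = np.ones(win) / win
    for _ in range(2):                                      # repeated moving average
        s = np.convolve(s, kernel, mode="same")
    return s
```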

3.3 Respiration Rate Estimators

Different methods have previously been proposed for the determination of the main frequency (period) of a periodic signal. In the presented studies, short time windows were analyzed in the context of respiration rate estimation using a thermal camera embedded in smart glasses. The typically used frequency estimator is based on the detection of the frequency value (f RR) of the dominant peak (maximum value) in the frequency domain. It assumes that the respiration signal dominates the analyzed signal spectrum. The method has some disadvantages. For short signals it has low frequency resolution. For example, assuming that the acquisition time T a is equal to 15 s, the sampling frequency f s = 15 Hz, and the number of samples N = 225, then the frequency resolution in the frequency domain is equal to:

$$\begin{aligned} \Delta f & = \frac{1}{{T_{\text{a}} }} = \frac{{f_{\text{s}} }}{N} = \frac{1}{15} = 0.066(6)\;{\text{Hz}}\,{\text{or}} \\ \Delta f & = 0.066(6)*60\;{\text{s}} = 4\;{\text{bpm}}. \\ \end{aligned}$$
(9)

Therefore, to increase the estimation accuracy of the respiration rate, longer acquisition times are required. For example, assuming T a = 30 s, the resolution would be Δf = 1/30 * 60 s = 2 bpm. Such a resolution corresponds in practice to an accuracy of ±1 bpm (the actual value is moved to the nearest left or right discrete frequency). In most medical applications, especially those used for screening purposes, such accuracy is acceptable and can be much better than clinical observations. For example, in [47] 54 doctors from London were asked to evaluate 3 video recordings of different respiration activities of mock patients. The observed mean differences between values measured by the doctors and the known respiration rate values were up to 5.43 bpm (i.e., 0.02 for video no. 1, 2.46 for video no. 2, and 5.43 for video no. 3).

The RR estimation method based on the dominant peak in the frequency spectrum (later labeled eRR_sp) also has another disadvantage. It practically always returns a result, even for a signal that does not represent respiration activities (e.g., noise). Therefore, additional measures are required to evaluate whether the analyzed signal indeed represents respiration activities and whether the estimated RR value is plausible. This is analyzed later in this chapter.
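A minimal sketch of the eRR_sp estimator; the 0.1–1.5 Hz band used to restrict the peak search is an assumption of this example (a plausible range of breathing rates), not a value given in the text.

```python
# Sketch of eRR_sp: the respiration rate is taken from the frequency of the
# dominant peak of the magnitude spectrum of the filtered, mean-free signal.
import numpy as np

def err_sp(s_fn, fs, f_lo=0.1, f_hi=1.5):
    spec = np.abs(np.fft.rfft(s_fn))
    freqs = np.fft.rfftfreq(len(s_fn), d=1.0 / fs)   # resolution df = fs / N = 1 / T_a
    band = (freqs >= f_lo) & (freqs <= f_hi)
    f_rr = freqs[band][np.argmax(spec[band])]        # dominant peak in the band
    return f_rr * 60.0                               # breaths per minute
```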

Respiratory rate is clinically determined by counting the number of times the chest rises or falls per minute. Therefore, other respiration rate estimators can rely on counting events related to inspiration and/or expiration. Some examples were described in the state-of-the-art section. Here, we analyze three additional estimators used in the time-domain analysis of signals: eRR_zc—an estimator based on the number of zero-crossings, eRR_pk—an estimator based on the number of detected peaks, and eRR_ap—an estimator based on the periodicity of peak locations of the autocorrelation function in the time domain.

The respiratory rate estimator based on the total number of zero-crossings (nZC) in the filtered signal computes the frequency as:

$$f_{ZC} = 0.5 \cdot \left( {nZC\left( {\bar{s}_{fn} (t)} \right) - 1} \right) \cdot f_{\text{s}} /N$$
(10)
$$eRR\_zc = f_{ZC} \cdot 60$$
(11)

The reliable use of this estimator assumes that the analyzed signal is smooth (without high frequency noise/interference) and free of baseline drift.
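A sketch of the eRR_zc estimator following Eqs. (10)–(11); it assumes the input waveform is already filtered and mean-free.

```python
# Sketch of eRR_zc (Eqs. 10-11): count sign changes of the filtered, mean-free
# waveform; two zero-crossings correspond to one breath.
import numpy as np

def err_zc(s_fn, fs):
    sgn = np.signbit(s_fn).astype(int)
    n_zc = np.count_nonzero(np.diff(sgn))            # number of zero-crossings
    f_zc = 0.5 * (n_zc - 1) * fs / len(s_fn)         # Eq. (10)
    return f_zc * 60.0                               # Eq. (11), bpm
```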

Another respiratory rate estimator based on signal morphology uses the detection of signal peaks. Typically, it calculates the number of inspiration/expiration peaks in the filtered signal. Assuming that inspiration activities are more easily detected in thermal-based respiration waveforms (the ambient temperature is lower than the body temperature), the respiration frequency can be estimated as:

$$f_{PK} = \left( {nPK_{d} \left( {\bar{s}_{fn} (t)} \right) - 1} \right) \cdot f_{\text{s}} /N_{\text{d}}$$
(12)
$$eRR\_pk = f_{PK} \cdot 60$$
(13)

where: nPK d —number of inspiration peaks, N d—the total number of samples between the first detected inspiration start and the last one.

The method requires the use of a peak detector, so in practice many algorithms could be proposed. In the presented studies, a multistep detector was used. First, it looks for a local minimum and the following local maximum of the analyzed signal whose difference is greater than the threshold value T:

$$d_{j} = \bar{s}_{fn} (t_{j + 1} ) - \bar{s}_{fn} (t_{j} ),\quad d_{j} > T$$
(14)

where: \(\bar{s}_{fn} (t_{j} )\)—filtered signal value at the local minimum j, \(\bar{s}_{fn} (t_{j + 1} )\)—filtered signal value at the local maximum j + 1.

Peak and valley points are labeled in two phases. In the first phase the threshold value T = T 1 is calculated as:

$$T_{1} = T_{K1} \cdot \left( {\hbox{max} \left( {\bar{s}_{fn} (t)} \right) - \hbox{min} \left( {\bar{s}_{fn} (t)} \right)} \right)$$
(15)

where T K1 is a scaling value set to 0.33.

The calculated threshold value is used to detect valleys and the corresponding peaks in the analyzed signal. In the second phase, gradients between the corresponding valleys and peaks are calculated and their median value is computed. Next, the calculated median value is used to find the value of the T 2 threshold as:

$$T_{2} = T_{K2} \cdot {\text{median}}(\{ \Delta i\} )$$
(16)

where \(\{ \Delta i\}\) is a set of gradient values between the corresponding peaks and valleys.

The new threshold value T 2 is then used in the detector based on the first-derivative criterion (Eq. 14). In the reported study, the scaling factor T K2 = 0.25 was used. The detected peaks are used to calculate the number of peaks (nPK d) and the points in time of the first and the last inspiration events in the analyzed signal window. This enables the calculation of the total number of samples between the first detected inspiration start and the last one (N d).
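One possible realization of the two-phase peak detector and the eRR_pk estimator (Eqs. 12–16) is sketched below. It approximates the valley-to-peak gradient by the peak prominence computed with scipy.signal.find_peaks and approximates N d by the number of samples between the first and last detected peak, so it illustrates the idea rather than reproducing the exact detector described above.

```python
# Sketch of the two-phase peak detector and the eRR_pk estimator.
import numpy as np
from scipy.signal import find_peaks

def detect_inspiration_peaks(s_fn, t_k1=0.33, t_k2=0.25):
    # Phase 1: coarse threshold from the signal range (Eq. 15).
    t1 = t_k1 * (s_fn.max() - s_fn.min())
    peaks1, props1 = find_peaks(s_fn, prominence=t1)
    if len(peaks1) < 2:
        return peaks1
    # Phase 2: refined threshold from the median valley-to-peak rise (Eq. 16);
    # peak prominence is used here as a proxy for the valley-to-peak gradient.
    t2 = t_k2 * np.median(props1["prominences"])
    peaks2, _ = find_peaks(s_fn, prominence=t2)
    return peaks2

def err_pk(s_fn, fs):
    peaks = detect_inspiration_peaks(s_fn)
    if len(peaks) < 2:
        return 0.0
    n_d = peaks[-1] - peaks[0]               # samples between first and last peak
    f_pk = (len(peaks) - 1) * fs / n_d       # Eq. (12)
    return f_pk * 60.0                       # Eq. (13), bpm
```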

The next respiration rate estimator used in this study was based on the autocorrelation function. It is known that the autocorrelation sequence of a periodic signal has the same cyclic characteristics as the signal itself. Therefore, the autocorrelation is calculated for different time lags. The period can then be determined using the Fourier transform and an analysis similar to that of the eRR_sp estimator. This estimator will be further designated as eRR_af. This method has practically the same disadvantages as eRR_sp. However, the period can also be determined by computing the average time period between detected peaks in the time domain. Therefore, a further estimator is used (eRR_ap) that detects peaks of the autocorrelation function using the peak detector method presented above.
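A sketch of the eRR_ap estimator: the autocorrelation is computed for all non-negative lags, its peaks are detected in the time domain (here with scipy's peak detector rather than the two-phase detector above), and the average peak spacing gives the respiration period. The prominence threshold is an assumed value.

```python
# Sketch of eRR_ap: autocorrelation peaks detected in the time domain; the
# average lag between successive peaks (and lag 0) estimates the period.
import numpy as np
from scipy.signal import find_peaks

def err_ap(s_fn, fs):
    ac = np.correlate(s_fn, s_fn, mode="full")[len(s_fn) - 1:]
    ac = ac / (ac[0] + 1e-12)                          # normalize to lag 0
    peaks, _ = find_peaks(ac, prominence=0.1)          # threshold is an assumption
    if len(peaks) == 0:
        return 0.0
    lags = np.diff(np.concatenate(([0], peaks)))       # spacings between peaks
    period = np.mean(lags) / fs                        # average period in seconds
    return 60.0 / period                               # breaths per minute
```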

The estimated frequencies were multiplied by 60 (s) to obtain results in breaths per minute (bpm). All estimators were calculated for thermal-based signals and for signals measured using the reference pressure belt. The reference signals were visually inspected to manually calculate the respiration rate as the number of respiration events in time. The first and the last inspiration events were visually detected in the analyzed signal window. It was assumed that one respiration event lasts from the start of one inspiration event to the start of the next. The number of respiration events in the analyzed time segment was counted (N RE) and the total time of all respiration events was calculated (T RE). The reference respiration rate was calculated as

$${\text{RR}} = (N_{\text{RE}} *60)/T_{\text{RE}}$$
(17)

The Mean Absolute Error (MAE) was used in the evaluation of different estimators. It is defined as:

$${\text{MAE}} = \frac{1}{L}\sum\limits_{l = 1}^{L} {\left| {eRR\_xx_{l} - {\text{RR}}_{l} } \right|}$$
(18)

where: L—number of data recordings, eRR_xx—the evaluated estimator (e.g. eRR_sp), RR—manually calculated respiration rate using belt data (the reference).

Similarly, the standard deviation of the absolute errors was calculated.

In [48] we demonstrated the results of a study focused on the analysis of different respiration rate estimators. Sequences of thermal images were recorded for 16 healthy volunteers (avg. age = 34.75 years ± 13.16) using the TAMARISK 320 LWIR camera. All subjects were asked to breathe naturally and not to move during the acquisition time (1 min). In parallel, reference data were collected using the chest pressure belt (Vernier RMB). Next, the data were processed using the methods described above in this section (the average operator was used as the aggregation operator in the ROIs). The mean absolute error was calculated as the difference between manually calculated respiration rates and the values computed using the different respiration rate estimators. The best results were achieved for the eRR_ap estimator: MAE = 0.415 bpm (std. dev. 0.398). The worst results were obtained for the estimator based on counting zero-crossings, eRR_zc; the achieved MAE was 1.291 bpm (std. dev. 0.93). The same estimators applied to the data collected using the reference belt gave very similar results as for the thermal-based data. For example, the MAE for the eRR_ap estimator was 0.295 bpm (std. dev. 0.368), while for the worst estimator, eRR_zc, the MAE was 1.584 bpm (std. dev. 0.816). An error lower than 2 bpm is fully acceptable for medical screening, which is the main application of the proposed methodology. It is worth underlining that the implemented estimators worked properly, giving similar results for thermal-based data and for belt data. The small differences in results between the best estimator and the manually calculated values were also caused by the different number of samples analyzed by those methods. The estimators automatically analyzed the whole 30 s long data window, whereas for the manually calculated respiration rates only full respiration periods were selected from the 30 s long data windows.

Similar experiments were described in [49]. Sequences of thermal images were recorded for 11 healthy volunteers (mean age: 39.73 years ± 11.98) using the FLIR Lepton LWIR camera. All subjects were asked to breathe naturally and not to move during the acquisition time (1 min). Also in this experiment, reference data were collected using the chest pressure belt (Vernier RMB). All data were processed using the methods described above in this section (the average operator was used as the aggregation operator in the ROIs), but only two estimators were evaluated: eRR_sp and eRR_ap. The mean absolute error was calculated as the difference between manually calculated respiration rates (using Eq. 17) and the values computed using the two respiration rate estimators. Similar, good results were obtained for both estimators. For thermal-based data the MAE for the eRR_ap estimator was 0.501 bpm (std. dev. 0.504) and for the eRR_sp estimator it was 0.525 bpm (std. dev. 0.454). For belt data the results were slightly better: the MAE for the eRR_ap estimator was 0.194 bpm (std. dev. 0.143) and for the eRR_sp estimator it was 0.418 bpm (std. dev. 0.368).

The results shown above were achieved assuming that subjects do not move and do not speak. In [50] we wanted to investigate whether it is possible to estimate the respiration rate when subjects are talking. We asked 12 healthy volunteers (avg. age = 36.25 years ± 12.08) to speak continuously (small head movements were allowed). This condition is similar to the situation in which a patient describes his/her problem during an interview. Analyzing breathing patterns during natural speech could be interesting for medical purposes but also for proper speech training. In this study, the general methodology used in data processing was the same as previously described; however, the average operator was applied to ROIs covering the mouth area. Three respiration rate estimators were evaluated: eRR_zc, eRR_sp and eRR_ap. The interesting finding of this study was that the results automatically obtained for thermal-based data were generally better than those for belt data. The MAE for the best estimator, eRR_ap, was 0.728 bpm (std. dev. 0.597); for the eRR_sp estimator it was 2.089 bpm (std. dev. 2.346) and for the eRR_zc estimator it was 3.575 bpm (std. dev. 2.864). The results obtained for belt data were: for the eRR_ap estimator MAE = 2.553 bpm (std. dev. 2.373), for the eRR_sp estimator 2.496 bpm (std. dev. 2.153) and for the eRR_zc estimator 1.423 bpm (std. dev. 1.377). The overall results are worse than those obtained when subjects were not speaking, because the extracted respiration-related signals were much noisier and the respirations were sometimes irregular. In such conditions the signals were not stationary. Some examples are presented in Fig. 6.

Fig. 6

Results of extracted respiration waveforms for speaking subjects S02 and S08

The results described above were obtained using the average as the aggregation operator applied to the ROIs of the nostrils or mouth. However, we have also compared the previously described estimators for signals extracted using the skewness operator. In this study the FLIR Lepton camera was used. Data were recorded for 10 healthy volunteers (age: 38 years ± 9.3; recording time 1 min, sampling frequency f s = 13 Hz). The respiration pressure belt (Vernier RMB) was used for reference measurements. The best results were obtained for three estimators: eRR_sp, eRR_af, and eRR_ap. In most cases the results of eRR_sp and eRR_af were similar, due to the similar frequency estimation method used. Theoretically, for periodic signals without noise, the values calculated by these estimators should be the same, because the autocorrelation sequence of a periodic signal has the same cyclic characteristics as the signal itself, so the dominant peaks should be observed at the same frequency in the frequency spectrum. In Fig. 7, examples of filtered signals, their frequency spectra and the autocorrelation signal as a function of time lag are presented.

Fig. 7

The frequency spectrum, the filtered signal and (bottom) the autocorrelation signal as a function of time lag for subject S09 for: a belt data, b thermal data processed using the average operator, c thermal data processed using the skewness operator

In Table 1, the values of the mean absolute error, the standard deviation of the absolute error and the coefficient of determination, R 2 (representing the correlation between estimated and reference data), are presented for the best estimators in the study.

Table 1 Results of the study with the analysis of signals extracted using the skewness operator

The best results were achieved using the signals extracted from thermal recordings as a sequence of normalized skewness values of the ROI data. It should also be underlined that these good results were observed for all estimators. The results obtained using the skewness operator for thermal data were almost identical to those obtained for the reference belt data. The eRR_ap estimator gave the best results for belt data and for thermal data processed using the skewness operator. In Fig. 8, the values of the MAE are illustrated and compared graphically.

Fig. 8

The comparison of values of mean absolute error for particular methods

Other aggregation operators were proposed in [51]; however, better results were obtained for the skewness operator.

The obtained results indicate that the respiration rate can be reliably estimated using the analysis of thermal recordings. Different estimators were evaluated for thermal sequences recorded using two small, portable cameras: the TAMARISK 320 and the FLIR Lepton. In all cases the most accurate estimates of respiration rates were achieved for the eRR_ap estimator. This estimator is based on the calculation of the autocorrelation for different time lags. Peaks of the derived signal are detected in the time domain, so there are no such limitations as for methods based on frequency-domain analysis (e.g., limited frequency resolution). The periodicity of the derived signal is analyzed, so additional measures could be proposed to evaluate whether the analyzed signal is periodic enough to reliably estimate the respiration rate. It is also important to underline that if respirations are irregular, better results should be obtained using estimators based on the detection (and counting) of peaks in the time domain. In such cases the respiration signals are not stationary and results based on dominant-frequency analysis could lead to higher errors in the respiration rate estimates.

Additionally, a very interesting finding is the possibility of estimating the respiration rate when the observed subject is speaking. In practice, smart glasses with an embedded thermal camera and the required software could be a very useful tool for a healthcare professional. They can estimate the respiration rate more naturally, during a typical interview, without devices "artificially" connected to the patient.

Another interesting observation was related to data aggregation in the ROIs. For thermal sequences recorded using the FLIR Lepton camera module, better results were obtained by calculating the skewness value instead of the average value for the ROI of each frame. It was also significant, from a practical point of view, that the size of the ROI did not strongly affect the extraction of a signal containing respiration-related changes. This can partially reduce the computational complexity related to the determination of the best ROI size and location in data frames. However, it still does not solve the problem of patient movements. In such situations face/nostril detection and tracking algorithms are required.

3.4 Respiration Pattern Analysis

Respiration rate is the most important parameter that can be computed from the respiration signal. However, other parameters could be valuable for medical diagnostics. Some examples include: the number and length of apnea events, the depth of breathing or the amplitudes of inspiration/expiration events, the length of the inspiration phase, the length of the expiration phase, the regularity of respiration events, etc. These parameters can also describe, and allow discrimination between, the different respiration patterns presented in Sect. 1. Most of the parameters are mainly based on the detection of three events: the start of the inspiration event, the start of the expiration event and the end of the expiration event. For example, the apnea period can be defined here as the time period between the start of an inspiration event and the end of the previous expiration event.

In the study presented in [49] we investigated whether it is possible to reliably detect apnea events from respiration waveforms extracted from sequences of thermal images. During the experiments, 12 healthy participants (avg. age = 37.15 years ± 9.16) were asked to follow the T1–T5 respiration patterns. Thermal sequences were recorded using the TAMARISK 320 camera following the procedure described earlier for the analysis of respiration rate. To analyze the possibility of apnea event detection, the volunteers were asked to hold their breath to simulate apnea periods in the T1, T2, T4, and T5 patterns. They could decide when to hold their breath and how long the apnea event should last. During apnea periods, temperature variations were still observed in the signals extracted from the thermal recordings. This is caused by many internal (e.g., blood flow) and external (heat flow due to ambient temperature changes) thermal conditions. In reference to the baseline, such changes can be positive (trend with a positive slope), negative (trend with a negative slope) or neutral (without slope, normalized mean about 0). The observed rate of such changes is typically smaller than the respiration rate, and the observed temperature gradient is also significantly smaller than for respiration activity. We proposed an apnea event detection algorithm (Algorithm 1).

It is based on the first derivative of the filtered signal. The absolute values of the first derivative signal are normalized with reference to the maximum signal value. The algorithm then counts all successive samples whose values are smaller than the threshold value. The threshold value, T, is calculated as a weighted (K) value of the interquartile range (IQR) of the processed signal. An apnea event is detected if the number of samples (or the time period) is higher than the assumed parameter value, Tapnea (e.g., >10 s).
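A sketch of Algorithm 1 as described above; the values of K and Tapnea, and the handling of a sub-threshold run that reaches the end of the window, are assumptions of this example.

```python
# Sketch of Algorithm 1: the normalized absolute first derivative is compared
# with a threshold K * IQR, and runs of sub-threshold samples longer than
# T_apnea are reported as apnea intervals (in seconds).
import numpy as np

def detect_apnea(s_fn, fs, k=0.6, t_apnea=4.0):
    d = np.abs(np.diff(s_fn))
    d = d / (d.max() + 1e-12)                          # normalize to the maximum value
    q1, q3 = np.percentile(d, [25, 75])
    thr = k * (q3 - q1)                                # threshold T = K * IQR
    below = d < thr
    events, start = [], None
    for i, flag in enumerate(below):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if (i - start) / fs >= t_apnea:            # run longer than T_apnea
                events.append((start / fs, i / fs))
            start = None
    if start is not None and (len(below) - start) / fs >= t_apnea:
        events.append((start / fs, len(below) / fs))
    return events
```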

Some results of the apnea detection algorithm are presented in Fig. 9a, b.

Fig. 9

Original respiration waveforms (top) and detected apnea periods using Algorithm 1 for: a the signal recorded using the pressure belt, b the signal derived from the thermal recording. The presented signals were recorded for subject S01 using the T1 respiration pattern

A very interesting result can be observed from the analysis of the signals presented in Fig. 9. For example, the first train of respiration events in the thermal recording has more events (8 inspiration events) than the signal measured using the respiration belt (7 events). A similar situation can be observed for the last (3rd) train. For the pattern T1, subjects were asked to first increase the respiration effort and then decrease it. The very shallow respirations at the end of the 1st and 3rd trains were not observed using the respiration belt: the pressure difference was too small to be observable. If the initial air pressure in the belt were higher, the pressure difference would probably be visible; however, that would be very uncomfortable for the participant of the experiment. It can be concluded that respiration monitoring using thermal imaging is sensitive to inspirations even when the respiration effort is small.

In Fig. 10, examples of the results of the apnea detection algorithm are presented for different respiration patterns extracted from the recorded sequences of thermal images.

Fig. 10

Original respiration waveforms (top) and detected apnea periods using Algorithm 1 for signals derived from the thermal recording for subject S09 and for: a T1 pattern, b T2 pattern, c T4 pattern, and d T5 pattern. The results for the T3 pattern are not presented since it does not contain simulated apnea periods

The mean absolute error calculated for the differences between the automatically detected lengths of apnea periods in the thermal-based signals and the manually calculated lengths of apnea events in the belt-based signals was 0.44 (assuming Tapnea = 4 s and K = 0.6 in Algorithm 1). The standard deviation was 0.39. The obtained results were very similar to the results of the automatic processing of belt-based data. Therefore, it can be concluded that apnea periods can be reliably detected for the different respiration patterns that can be extracted from thermal recordings. In the description of respiration patterns we showed that respiration rates (in given time windows) and the lengths of apnea events can be accurately detected from signals extracted from thermal recordings. Additionally, the relative amplitude values of the respiration waveforms obtained using the reference belt and using the thermal camera were analyzed. The amplitudes were compared manually by the comparison of signal plots in the time domain (for the T1 pattern). As can be observed in Fig. 9, signals derived from thermal recordings do not follow the crescendo–decrescendo pattern that is easily observable in the signal recorded by the pressure belt. Results obtained for all volunteers confirmed that it is not possible to reliably correlate amplitude variations between signals measured with the pressure belt and signals extracted from thermal recordings. As described earlier, the thermal recording is sensitive to small temperature changes (respiration with very small effort) but it is not proportional to different effort levels. This can be explained by the heat flow mechanism, assuming a cooling process during inspiration (at room temperatures lower than the body temperature). In the first phase of inhalation there is a high temperature gradient (air to nasal cavity tissue) that decreases with the time of inhalation. Since the ambient temperature is not changing and, in parallel, the nasal cavity tissue is heated by blood perfusion (and also by other mechanisms), the observable temperature change becomes saturated. Therefore, it is practically impossible to quantitatively compare breaths of different efforts (depths).

In [51] we additionally compared methods for the detection of inspiration and expiration periods, contrasting the results obtained for signals recorded with the respiration belt and signals extracted from thermal recordings (using the TAMARISK 320 camera). The peak-and-valley detector was used twice: once analyzing the signal from the start to the end, and once from the end to the start. In the first pass, the starts and ends of inspiration events were detected. In the second pass (from the last sample towards the first sample), the ends and starts of expiration events were detected. The inspiration period $I_t$ was calculated as the difference between the time of the inspiration end and the inspiration start. The same method was used to calculate the expiration period $E_t$. Values of $I_t$ and $E_t$ were calculated for signals recorded with the respiration belt and extracted from thermal recordings. The normalized mean absolute difference between values calculated for the belt-based signal and for the thermal-based signal was then computed; the normalization was performed by dividing the absolute difference by the inspiration or expiration period obtained for the reference pressure belt, i.e.:

$$\Delta I_{t} = \frac{\left| I_{t\mathrm{B}} - I_{t\mathrm{T}} \right|}{I_{t\mathrm{B}}}, \quad \Delta E_{t} = \frac{\left| E_{t\mathrm{B}} - E_{t\mathrm{T}} \right|}{E_{t\mathrm{B}}}$$
(19)

where:

  • $I_{t\mathrm{B}}$, $E_{t\mathrm{B}}$: inspiration/expiration period calculated for the belt signal,

  • $I_{t\mathrm{T}}$, $E_{t\mathrm{T}}$: inspiration/expiration period calculated for the signal extracted from the thermal recording.
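
As a small illustration, Eq. (19) can be computed directly once corresponding inspiration (or expiration) periods have been paired between the two signals; the function name and the array-based interface below are assumptions.

```python
import numpy as np

def normalized_abs_difference(periods_belt, periods_thermal):
    """Per-breath normalized absolute differences of Eq. (19) and their mean.

    periods_belt, periods_thermal: inspiration (or expiration) periods in
    seconds for corresponding breaths detected in the belt-based and
    thermal-based signals."""
    b = np.asarray(periods_belt, dtype=float)
    t = np.asarray(periods_thermal, dtype=float)
    delta = np.abs(b - t) / b        # Eq. (19), one value per breath
    return delta, float(delta.mean())
```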

In Fig. 11, examples of respiration waveforms with automatically detected inspiration/expiration beginnings and ends are presented. It can be observed that, for the original belt signal, a respiration pause is visible due to the small pressure changes measured at the end of expiration and at the beginning of inspiration.

Fig. 11

Examples of original (left) and filtered (right) respiration waveforms obtained for the respiration belt (top) and extracted from the thermal recording (bottom). Automatically detected inspiration/expiration beginnings and ends are indicated. After low-pass filtering, inspiration ends practically coincide with expiration beginnings, and expiration ends with inspiration beginnings

The results obtained for data recorded from 12 healthy volunteers (avg. age = 36.25 ± 12.08 years) showed that inspiration and expiration beginning and end events can be detected with reasonable accuracy. Some differences were obtained between inspiration and expiration periods calculated for signals recorded using the respiration belt and signals extracted from thermal recordings. The normalized mean absolute difference was about 19% for inspiration periods and about 15% for expiration periods. This relatively large difference has several causes. First, the two measurement methods differ. The start of inspiration can be detected earlier by thermal imaging, since even a very small inspiration effort is clearly visible as cooling (if the ambient temperature is lower than the body temperature). For the respiration belt, a small pressure change is visible only if the belt firmly adheres to the body and the initial air pressure in the belt is high enough for the difference related to very small respiration movements to be observable. Another reason for the observed mean absolute differences is signal filtering. Proper use of the peak detector assumes that the signal is smoothed. The filtered signal has smoothed edges, so the accuracy of detecting the exact points in time when a change starts is lower than for an ideal step signal. Additionally, the analyzed signals have limited sampling resolution. For example, if the inspiration period lasts 1 s (13 samples), then a shift of 1 sample in both the inspiration start and stop events leads to 2/13 ≈ 15.4% of total difference.

Additionally, inspiration-related slopes (S) were calculated and compared for signals recorded with the thermal camera and with the reference pressure belt. The slope was calculated as the ratio of the signal change between the corresponding inspiration start and end events to the time difference between the moments when those events occurred. The mean difference between the inspiration-related slopes obtained for signals recorded using the respiration belt and signals extracted from thermal recordings was 5.04° (±5.24°). The smallest difference was 0.97°, while the highest was 18.32°. Again, the reasons for such differences are similar to those described previously, since the calculation of slopes relies on the accurate detection of the inspiration beginning and end.
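
A minimal sketch of such a slope computation is shown below; it assumes the waveform amplitude has been normalized so that an angle in degrees is meaningful, and the function name is an assumption.

```python
import math

def inspiration_slope_deg(t_start, t_end, y_start, y_end):
    """Angle (in degrees) of the straight line joining the detected
    inspiration start (t_start, y_start) and end (t_end, y_end) points.
    The amplitude scale is assumed to be normalized against time."""
    return math.degrees(math.atan2(y_end - y_start, t_end - t_start))
```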

3.5 Automatic Detection of the Nostril Region

To estimate the respiration rate of subjects, the thermal facial image sequences have to be preprocessed to detect and track the face and the nostril region. An approach for real-time nostril area tracking has already been discussed and described in detail [52, 53]. The flow of the described solution is presented in Fig. 12. In face-processing systems, face detection is usually the first step [39]. After extracting the facial area from the background, it can be processed further in order to analyze its features.

Fig. 12

The flow of the proposed solution for tracking the nostril region

In thermal imagery, the face is usually distinct from other parts of the image and can be easily marked. Most existing methods extract the face by segmentation, which can be easily achieved with thresholding [32]. On the other hand, Haar-like features are often used as descriptors of local appearance because of the high precision and computational speed of the Viola-Jones algorithm [39, 44, 54]. This algorithm, which can run practically in real time, consists of two steps: training and detection. In the first phase, a classifier with a cascade structure is trained on labeled images. Features are extracted from each image by encoding the presence of oriented contrasts between two regions with Haar features. The resulting classifier is then used to detect objects in the test data set.
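
In OpenCV, such a trained cascade can be applied to new frames with a few lines of code; the sketch below assumes a hypothetical cascade file trained on thermal face images and illustrative detection parameters.

```python
import cv2

# Hypothetical cascade file trained on thermal face images (file name assumed).
face_cascade = cv2.CascadeClassifier("thermal_face_cascade.xml")

def detect_faces(frame_gray):
    """Viola-Jones detection on a single grayscale thermal frame."""
    faces = face_cascade.detectMultiScale(
        frame_gray,
        scaleFactor=1.1,   # step of the image pyramid
        minNeighbors=5,    # overlapping detections required to accept a face
        minSize=(60, 60),  # ignore very small candidate regions
    )
    return faces           # list of (x, y, w, h) rectangles
```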

In the study presented in [53], thermal video sequences were recorded using the TAMARISK 320 long-wave thermal camera (resolution 320 × 240, sensitivity <50 mK, 25 fps) on a group of 19 volunteers (age: 23.7 ± 5.2). During the experiment, each volunteer was asked to stand still, turn the head slightly left, and turn the head slightly right. Then, 12,000 thermal images portraying male and female faces (positive cases) and 3000 images of other objects (negative cases) were extracted from the recorded sequences and used to train the classifier. Examples of acquired images are presented in Fig. 13. The result of the training step was a Haar-feature classifier capable of face detection in the test data set, which consisted of 480 images (20 for each volunteer).

Fig. 13

Examples of acquired images: from the left—2 positive and 2 negative cases

The research presented in [53] aimed at validating the possibility of tracking the nostril region with acceptable accuracy in real time. In order to measure the precision of the algorithm, the mean value of pixel intensities in the detected and tracked area was compared with the mean value of pixel intensities in the nostril area marked manually at a fixed position (not tracked), see Fig. 14.

Fig. 14

The nostril area marked manually

Moreover, the mean squared error (MSE) and root mean squared error (RMSE) of the mean values were calculated separately for the series of images portraying each volunteer performing each movement (quiescence, turning left, or turning right).
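
The exact reference used in the MSE/RMSE computation is not restated here; as one plausible reading, the sketch below measures the fluctuation of the per-frame mean ROI intensity around the average of the series, computed separately for the tracked and the fixed series (the function name and this choice of reference are assumptions).

```python
import numpy as np

def fluctuation_rmse(mean_intensity_series):
    """RMSE of a series of per-frame mean ROI intensities around the
    average of that series (one possible stability measure)."""
    m = np.asarray(mean_intensity_series, dtype=float)
    return float(np.sqrt(np.mean((m - m.mean()) ** 2)))
```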

In thermography, most facial features are usually blurred and indistinguishable [39]. Moreover, the usable data may be represented by values with very similar contrast. To make the detection algorithm more robust, the authors of [53] increased the image contrast with two operations: conversion of the image to gray scale and histogram equalization. After extracting the facial region and enhancing the valuable data, interest point detectors (ORB, SIFT, SURF and the Harris corner detector) were applied to this area to find specific facial features. Each detector was tested for processing time and accuracy, where accuracy was measured as the displacement of the detected area from its expected location (specified by an expert) divided by the image height. For each interest point detector, the difference between the image with found features and the original image was calculated. The resulting image is presented in Fig. 15.
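
The preprocessing and interest point detection steps map directly onto standard OpenCV calls; the sketch below uses ORB as the example detector (SIFT, SURF and the Harris detector were evaluated in [53] as well), and the parameter values are illustrative.

```python
import cv2

def find_face_keypoints(face_roi_bgr):
    """Contrast enhancement followed by interest point detection on the
    extracted facial region (ORB shown; other detectors are analogous)."""
    gray = cv2.cvtColor(face_roi_bgr, cv2.COLOR_BGR2GRAY)  # gray-scale conversion
    equalized = cv2.equalizeHist(gray)                     # histogram equalization
    orb = cv2.ORB_create(nfeatures=200)
    keypoints = orb.detect(equalized, None)
    return equalized, keypoints
```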

Fig. 15

The subtraction of the original image and the image with detected interest points (Harris corner detection)

Then, the image was divided into blocks (30 × 30) and the mean value of pixel intensities in each block was calculated. In the next step, the authors applied thresholding as a segmentation technique aimed at partitioning the image into different components: the image was divided into parts with values higher and lower than a selected threshold. This operation allowed marking the blocks that contained interest points. After that, blocks that were close to each other were assigned to the same group. The most numerous groups were formed by the facial contour and did not contain information about facial features, so they were removed. From the remaining groups, interest area templates were extracted. The whole procedure was repeated for the N initial frames and the average location of each template was calculated. The resulting locations and sizes were used to extract the final templates, which were then matched in subsequent frames using a pattern matching technique. The best match was defined as the global minimum over all comparisons between templates and image patches slid across the tested image. At each location, the template was compared against the overlapped patch using the 'CV_TM_SQDIFF' metric from the OpenCV library [55].
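
The pattern matching step corresponds to OpenCV's squared-difference template matching; the short sketch below shows how the best (minimum) match can be located, with the function name being an assumption.

```python
import cv2

def match_template_sqdiff(frame_gray, template):
    """Locate a template in a frame with the squared-difference metric
    (cv2.TM_SQDIFF, i.e. the 'CV_TM_SQDIFF' method); the best match is
    the global minimum of the response map."""
    response = cv2.matchTemplate(frame_gray, template, cv2.TM_SQDIFF)
    min_val, _max_val, min_loc, _max_loc = cv2.minMaxLoc(response)
    x, y = min_loc
    h, w = template.shape[:2]
    return (x, y, w, h), min_val   # matched rectangle and its score
```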

The number of matched regions was limited to two by preserving only those whose distance to the manually marked nostril area was smallest. Next, the detected nose area was located at the midpoint of the horizontal distance between the two matched areas (in all cases they represented the eyes) and directly underneath them. Then, this nostril area was tracked by applying the same pattern matching method.
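
The geometric rule for placing the nostril ROI between the two matched eye regions can be written compactly as below; the rectangle format and the ROI size are illustrative assumptions.

```python
def nose_roi_from_eyes(eye_a, eye_b, roi_size=(40, 30)):
    """Place the nose ROI at the horizontal midpoint between the two matched
    eye regions and directly underneath them. Rectangles are (x, y, w, h)."""
    ax, ay, aw, ah = eye_a
    bx, by, bw, bh = eye_b
    cx = ((ax + aw / 2.0) + (bx + bw / 2.0)) / 2.0  # horizontal midpoint
    top = max(ay + ah, by + bh)                     # just below the lower eye box
    w, h = roi_size
    return int(cx - w / 2.0), int(top), w, h
```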

Undoubtedly, the biggest advantage of the described system for automatic tracking of the nostril area is its very short processing time [27.7 ms (Harris), 23.9 ms (ORB), 19.7 ms (SIFT), 27.6 ms (SURF)], while preserving satisfactory accuracy of region detection (displacement: Harris 7.2 ± 4.3%, ORB 9.9 ± 2.2%, SIFT 7.0 ± 1.9%, SURF 8.9 ± 2.7%).

Each interest point detector was used to detect the nostril area separately for each movement. Consequently, the mean values of pixel intensities in the tracked region and in the nostril area marked at a fixed position vary depending on the detector used and the pose of the volunteer. Example values of pixel intensities for all detectors while turning the head slightly right are presented in Fig. 16.

Fig. 16

Mean values of pixel intensities for one volunteer while turning head slightly right (all methods for tracked and fixed localization of nostril region)

Values of pixel intensities for a chosen detector, depending on the performed movement, are presented in Fig. 17.

Fig. 17

Mean values of pixel intensities for one volunteer for chosen detector (all movements for tracked and fixed localization of nostril region)

As can be seen, changes in the mean values were much larger for nostril regions marked at fixed positions. This result was also confirmed by the calculated RMSE values (see Table 2). In almost all cases (except for 2 pairs of values marked in red in the table), errors were higher for the non-tracked areas. A similar situation can be observed for the different pose variations. For tracked regions, fluctuations of the mean values were smaller. However, the movements analyzed in [53] were quite small, and the achieved results may differ for other disturbances (for example, background or haircut influence, as described in [43]). We are currently testing an algorithm similar to the solution presented by the authors of [53] for more noticeable movements (also in other planes).

Table 2 RMSE for each movement: rows represent each volunteer (Female F; Male M followed by age) T—tracked region, NT—not tracked region

4 Conclusions

In clinical observations, the respiratory rate is often estimated by counting the number of times the chest rises or falls per minute [56]. Other, quantitative methods use different algorithms and techniques, including inductive plethysmographs or thoracic impedance systems [57], oxygen masks [58], bioacoustic sensors [59], accelerometers or gyroscope sensors [60], etc. Respiration activities are often recorded together with other biomedical signals. For example, in sleep studies a set of signals can be recorded, including the electroencephalogram, electro-oculogram, electromyogram, nasal airflow, abdominal and/or thoracic movements, body position, snore acoustic signal, electrocardiogram, and blood oxygen saturation [61, 62]. The respiration rate can be estimated not only from one of those methods (e.g. nasal airflow) but also from other recorded signals (body movements, electrocardiogram, etc.). Additionally, in [63] 3D breathing waveforms were recovered from thermal sequences, allowing visualization of subtle pathological patterns.

The remote measurement of respiration rate is another very important and useful possibility. It can be especially valuable for medical screening purposes (e.g. severe acute respiratory syndrome (SARS), pandemic influenza, etc.). In this chapter we presented many different studies focused on the measurement and estimation of respiration rate using thermal imaging methods. All of the methods demonstrated very good results in the estimation of respiratory rates. In our work we focused on the evaluation of different respiration rate estimators for processing image sequences recorded by small, mobile thermal cameras. The miniaturization of thermal cameras allowed embedding such cameras in smart glasses. In several studies we demonstrated that, using image sequences recorded by the thermal cameras of smart glasses, not only the respiration rate but also other parameters that describe respiration patterns can be reliably estimated.

Respiration rate estimation should be performed as quickly as possible. However, it requires data acquisition over a specific period of time. This is especially important for the typically used analysis in the frequency domain (due to the limited frequency resolution of short recordings). The presented results of our experiments showed that the best accuracies were obtained using the analysis of autocorrelation as a function of time lag (the eRR_ap estimator). Ideally, the correlation of the signal with its shifted version should produce a value of 1 if the shift is equal to the signal period. In practice, except for the time offset equal to 0 (the same signal), the values are lower than 1. However, the absolute correlation values obtained for time offsets equal to successive multiples of the signal period can be further used to evaluate whether the original signal is really periodic. This could be used to assess whether the signal is likely to contain respiration-related information. Other similar measures or parameters can be used to evaluate how periodic the signal is. Some examples include Hjorth parameters [64] or spectral “purity” indexes [65]. We have used them with success in the evaluation of signals for pulse rate estimation [66, 67].
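
As an illustration only, the following sketch estimates the respiration rate from the dominant autocorrelation peak and returns the peak value as a crude periodicity indicator, in the spirit of the eRR_ap estimator described earlier in the chapter; the lag search range and the function name are assumptions.

```python
import numpy as np

def rr_from_autocorrelation(signal, fs, rr_range_bpm=(6, 60)):
    """Respiration rate (bpm) from the dominant autocorrelation peak,
    plus the peak value itself as a simple periodicity indicator."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags
    ac = ac / ac[0]                                     # normalize: ac[0] == 1

    lag_min = max(1, int(fs * 60.0 / rr_range_bpm[1]))  # shortest expected period
    lag_max = min(len(ac) - 1, int(fs * 60.0 / rr_range_bpm[0]))
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return 60.0 * fs / lag, float(ac[lag])              # bpm and peak value
```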

In the analysis of respiration rate it is also important to locate the pixels whose values change due to respiratory activity. Sometimes data classification procedures are used, but they require relatively more computational resources. More often, pixels are aggregated within a manually or automatically specified region of interest (ROI). In all papers discussed in the state of the art, the average operator was used. This requires that the ROI contains a majority of pixels whose values change due to respiratory activity. In such cases, the specification of the ROI location and size can be critical. In our work we asked whether other aggregation operators, based on parameters describing the distribution of values in the ROI, could be useful. We successfully evaluated the skewness parameter calculated for ROI data of thermal sequences recorded using the FLIR Lepton camera. Interestingly, the signals extracted using the skewness operator were not as strongly dependent on the size of the ROI as those obtained with the average operator. Other parameters could be analyzed in the future.
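
A minimal sketch of the two aggregation operators is given below, assuming each thermal frame is available as a 2-D array and the ROI as an (x, y, w, h) rectangle; function and variable names are assumptions.

```python
import numpy as np
from scipy.stats import skew

def roi_signals(frames, roi):
    """Aggregate the ROI pixels of each thermal frame with the average and
    the skewness operators, producing two candidate respiration signals."""
    x, y, w, h = roi
    mean_signal, skew_signal = [], []
    for frame in frames:
        values = np.asarray(frame, dtype=float)[y:y + h, x:x + w].ravel()
        mean_signal.append(values.mean())
        skew_signal.append(skew(values))
    return np.array(mean_signal), np.array(skew_signal)
```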

Automatic detection and tracking of respiration-related sources of thermal changes (nostrils, mouth) are also very important in the context of mobile applications, especially when a patient is not cooperating. Different methods were presented in the state of the art. In our work we focused on methods that are fast and have previously been successful for image sequences captured in visible light.

The analysis performed for detecting and tracking the nostril region showed that it is possible to process one frame in less than 30 ms for all detectors [27.7 ms (Harris), 23.9 ms (ORB), 19.7 ms (SIFT), 27.6 ms (SURF)]. This high computational performance indicates that the analyzed methods could be used for tracking the nostril region in real-time applications. Moreover, the accuracy of the tracking algorithm was also measured by calculating the root mean squared error of pixel intensities in the tracked and fixed locations of the nostril area. In almost all cases, errors for tracked regions were smaller than for the corresponding non-tracked area (for a chosen movement, volunteer and method), which allowed for reliable feature tracking. However, the analyzed movements were rather small and the achieved results are only preliminary. Considering future work in this area, in order to ensure the correctness of results, the algorithm should also be tested for more noticeable movements and other disturbances. Furthermore, for reliable and efficient medical applications we would like to track facial features without manual alignment, calibration or initialization. Therefore, a fully automatic system for detecting, tracking and calculating respiration rate parameters should be designed and implemented in the future. Recently, very small thermal cameras have been developed, so a system of this kind could use them after embedding them into wearable devices, such as smart glasses.

In this chapter we analyzed respiration rate estimators that can be used to process sequences of thermal images captured by small thermal camera modules embedded in or connected to smart glasses. After calibration of the thermal camera modules and using algorithms to estimate pulse rate from video (recorded in visible light) [68], additional vital signs can be estimated. This could allow obtaining the three most important vital signs: body temperature, pulse rate and respiration rate. Using intelligent patient identification [68, 69], such data can be automatically stored in a Hospital Information System [70] or another system for the management of Electronic Health Records or Personal Health Records.