Keywords

6.1 Introduction and Background

Hemodialysis is a renal replacement therapy which replaces the lost function of the kidneys for individuals with acute or chronic kidney disease. For those with end-stage renal disease (ESRD), hemodialysis is essential for survival unless a kidney transplant is available. Despite the mortality risk of ESRD, successful hemodialysis can greatly prolong patient lifespans and increase the chance of receiving a donor transplant (Leypoldt 2005). During hemodialysis, arterial blood is filtered through a dialyzer to remove waste products and excess fluid before being returned to the venous system. For individuals with ESRD, hemodialysis is required typically three times per week, which requires a high-flow vascular access so core blood can be filtered efficiently. To improve hemodialysis, permanent vascular access is usually obtained using arteriovenous fistulas or grafts or central venous catheters (Fig. 6.1).

Fig. 6.1
figure 1

The hemodialysis circuit removes arterial blood, filters it externally, and returns it to the body through the vascular access

Patency of a hemodialysis vascular access is the “Achilles Heel” of hemodialysis treatment (Pisoni et al. 2015). Access dysfunction accounts for two hospital visits/year (Cayco et al. 1998; Sehgal et al. 2001) for dialysis patients, and the loss of access patency greatly increases mortality risk (Lacson et al. 2010). Maintenance of vascular access is therefore a key objective in clinical guidelines for dialysis care and is often handled by dedicated vascular clinics to deal with the high volumes of individuals needing emergency interventions (Feldman et al. 1996). The predominant causes of access dysfunction are stenosis (vascular narrowing) and thrombosis (vascular occlusion), which occur in 66–73% of arteriovenous fistulas (AVFs) and 85% of arteriovenous grafts (AVGs) (Al-Jaishi et al. 2017; Huijbregts et al. 2007; Bosman et al. 1998). Venous stenosis near the artery-vein anastomosis occurs in 50–71% of grafts and fistulas, but stenoses can occur anywhere along the vascular access or central veins (Duque et al. 2017; Roy-Chaudhury et al. 2006). Clinical monitoring is essential to identify at-risk accesses for diagnostic imaging and treatment planning and to avoid emergencies, missed treatments, or loss of the access (H. Inc for OSORA CMS n.d.; Hemodialysis | NIDDK n.d.). Doppler ultrasonic imaging, for example, is a noninvasive method for characterizing vascular access function but requires a visit to a healthcare center and evaluation by specifically trained personnel (Sequeira et al. 2017). The promise of efficient, point-of-care monitoring is to proactively identify which patients might need this specialized examination to minimize vascular access dysfunction or loss.

Monitoring for vascular access dysfunction relies on data efficiently gathered in the dialysis center, most regularly through physical exam. When blood flows through a constricted vessel, the resulting high-speed flow jet induces turbulence and pressure fluctuations in the vessel wall (Seo n.d.). This produces distinct bruits which can be heard with a stethoscope during physical examination. Access surveillance, occasionally performed monthly using flow-measuring equipment, cannot detect fast-growing lesions or restenosis after angioplasty and is often a late indicator of access risk (Krivitski 2014) which reduces utility (Krivitski 2014; White et al. 2006; Moist and Lok 2019). Higher-frequency monitoring for access dysfunction would be ideal for early detection of stenosis but must be balanced against the labor and time required. Existing monitoring techniques have variable sensitivities (35–80%), in part due to the expertise dependence of bruit interpretation and physical exam techniques (Tessitore et al. 2014a). Since listening to bruits is an important aspect of physical exams, clinicians have sought to identify auditory features of bruits for quantitative analysis since the 1970s (Duncan et al. 1975).

Recording and mathematical analysis of bruits—sometimes referred to as phonoangiograms (PAGs)—is called phonoangiography because it has the same objectives of characterizing vascular stenosis as angiographic images (Seo n.d.; Kan et al. 2015; Majerus et al. 2018; Doyle et al. n.d.). The primary motivation behind phonoangiography is efficiency and objectivity, because sounds can be recorded easily from the skin surface without particular need for expertise. Signal analysis of PAGs can then be used to objectively describe the underlying turbulent flow and degree of stenosis. Recent advances in spectral and multiresolution analysis, autoregressive models, and machine learning make real-time PAG analysis feasible at the point of care for rapid patient screening. PAG monitoring has the potential to provide widespread, objective screening of hemodialysis vascular access function for early detection of accesses at-risk for thrombosis. This chapter covers relevant signal processing in the analog and digital domains and strategies for extracting classification features from an array of microphone recording sites (Figs. 6.2 and 6.3).

Fig. 6.2
figure 2

Chronic hemodialysis is best achieved using an arteriovenous fistula vascular access, or an arteriovenous graft for individuals with compromised vascular structure. The vascular access is surgically created and monitored clinically to detect the symptoms of dysfunction such as stenosis. Note: for simplicity this image shows the venous and arterial needles at differing angles and positions; in practice, hemodialysis needles are generally placed in the venous segment of the access with the arterial needle antegrade to flow

Fig. 6.3
figure 3

Vascular access stenosis may be detected and quantified using flexible microphone arrays capable of detecting regions of turbulent blood flow produced in the region distal to stenosis

The chapter is organized beginning with a brief summary of prior work using PAGs to locate and classify vascular stenotic lesions. Next, an analysis of recorded bruits is presented to determine the minimum signal bandwidth and dynamic range for analog signal processing prior to digitization. Digital signal processing methods for feature extraction is reviewed, demonstrating feature extraction in spectral, temporospectral, and spatial domains based on recording site location. Finally, three digital analysis strategies are presented to locate, classify, and estimate the actual degree of stenosis using machine-learning methods. While estimation of degree of stenosis provides clinically actionable data, classification enables simpler user notification, for example, with at-home monitoring. Therefore, we highlight these differing approaches to using machine learning for stenosis characterization from acoustic analysis.

6.2 Prior Work in Phonoangiograhic Detection of Stenosis

PAGs have been analyzed for decades, but there is still wide variance in the descriptions of relevant spectral properties in functional and dysfunctional vascular accesses. However, there is a clear relationship between changing acoustic spectra relative to the dimensions of the stenosis. Further, there is relative agreement that PAGs recorded close to the location of stenosis have the distinctive shift in acoustic timber introduced by turbulent flow. Previous studies have analyzed PAGs from humans, from vascular bench phantoms, and from computer simulations of blood flow. Here, we describe two topics which have been studied previously: the spectral properties of PAGs in normal and stenosed cases and the impact of recording location on PAG spectra.

6.2.1 Classification of Degree of Stenosis from Phonoangiograms

Because the degree of stenosis (DOS) in a blood vessel influences the level of turbulent flow, PAG properties are related to DOS. DOS is defined as the ratio of the stenosed cross-sectional area of the blood vessel to the proximal (non-stenosed) luminal area but is also clinically calculated as the ratio in minimum diameter of the stenosed vessel section to the non-stenosed lumen diameter. When angiography is used to determine DOS, linear vessel and stenosis diameter measurements are generally used to estimate DOS within 10% (Allon and Robbin 2009). In our work, because we used computerized tomography (CT) scans of vascular stenosis phantoms (described below), we calculated DOS as the ratio in luminal area in the stenosed and non-stenosed vessel segments, because this accounted for stenosis phantoms that were not circular.

Much early work in PAG analysis represented the combined frequencies generated during systolic and diastolic phases of turbulent blood flow. Because clinical interpretation of pathologic bruits relies on detecting a high-pitched whistling character, it was hypothesized that stenosis would shift spectral power within a certain frequency band (Sung et al. 2015). Although all studies agree that the frequency range of interest is in the 20–1000 Hz band, and that DOS enhanced high-frequency spectral power, identification of specific frequency bands varied widely (Sung et al. 2015; Du et al. 2015; Du et al. 2014; Wu et al. 2015; Mansy et al. 2005; Shinzato et al. 1993; Hsien-Yi Wang et al. 2014; Chen et al. 2013; Akay et al. 1993; Obando and Mandersson 2012; Wang et al. 2011; Clausen et al. 2011; Sato et al. 2006; Gram et al. 2011; Milsom et al. 2014; Rousselot 2014; Gaupp et al. 1999; Gårdhagen n.d.).

Despite the disagreement in the precise effect of stenosis on bruit spectra, these prior studies confirmed that stenosis definitively changes PAG amplitude and pitch. The change, however, could be an enhancement or a suppression of certain frequencies depending on the impact of stenosis on blood flow. Other patient-dependent variables such as PAG amplitude and the recording location relative to stenosis must also be accounted for and are described below.

6.2.2 Localization of Vascular Stenosis from Phonoangiograms

Bruits are only detectable close to where they are created due to the low acoustic amplitude and acoustic attenuation of human tissue. Fluid dynamic simulations have established to a high degree of precision that stenosis induces turbulent flow at physiologic blood pressures, flow rates, and nominal lumen diameters of vascular accesses (Gaupp et al. 1999; Gårdhagen n.d.). These simulations have been confirmed by Doppler ultrasound measurements, which agree that turbulent flow occurs within 2–5 times the diameter of the unoccluded vessel distal to stenosis. Therefore, the presence of a bruit indicates stenosis or some other vascular malformation is nearby.

An important effect is that turbulence and decreased pressure occurs on the downstream side of the stenosis—for an arteriovenous vascular access, this is closer to the venous outflow tract. Therefore, bruits recorded proximally and distally to stenosis have different frequency spectra due to stenosis turbulence (Du et al. 2015). However, the acoustical influence of biomechanical properties and thickness of tissue over the vascular access varies between patients. Because tissue acts as a low-pass filter at auditory frequencies, it is presumed that the most accurate bruit recordings would be obtained in the 1–3 cm region distal to stenosis (assuming unoccluded tube diameter to be 6 mm), where turbulent flow is maximal (Gaupp et al. 1999; Gårdhagen n.d.).

6.3 In Vitro Reproduction of Vascular Bruits

The spectral content of bruits produced by human blood flow is affected by a wide range of uncontrollable factors such as vascular anatomy, blood pressure, blood concentration (hematocrit), and flow rate. We developed an in vitro vascular phantom to reproduce bruits so that relevant acoustic features and classifiers could be matched with known degree of stenosis. Acoustic recordings from the phantom system were used to validate the stenosis classification strategies described below. The reproduction performance of the phantom was validated against 3283 unique 10-s recordings obtained from 24 hemodialysis patients over 18 months (Majerus et al. 2000). Human and phantom bruits were recorded using the same digital stethoscope (Littman 3200) and compared based on aggregate power spectral density. Peak arterial pressure in the phantom was controlled using an adjustable pressure dampening system. Cardiac stroke volume was varied by changing the duty factor of a pulsatile pump. The acoustic power spectra of phantom bruits were validated against reference recordings taken from humans, as previously described (Chin et al. 2019).

Specific construction details of the vascular phantom were previously described (Chin et al. 2019; Panda et al. 2020) and are briefly introduced here (Fig. 6.4). The phantom consisted of a 6 mm silicone tube banded by a silk suture at one location to simulate an abrupt vascular narrowing. Phantoms were produced with DOS from 10% to 85%. The banded tube was then encased below 6 mm of tissue-mimicking silicone rubber (Ecoflex 00-10). The tissue-mimicking portion also extended at least 10 cm in all directions from the stenosis. The final DOS for each phantom was then calculated from images slices taken by CT scan.

Fig. 6.4
figure 4

(a) Vascular stenosis phantom flow diagram. Two pumping systems produced pulsatile flows in a vascular access phantom within physiological ranges of flow and pressure. The recording sites are shown in the stenosis phantom diagram (b). 10–85% stenosis is simulated in the center of phantom by tying a band around 6-mm silicone tubing (c)

Each phantom was connected to a pulsatile flow pumping system (Cole Parmer MasterFlex L/S, Shurflo 4008). Pulsatile pressures and aggregate flow rate were measured with a pressure sensor (PendoTech N-038 PressureMAT) and flow sensor (Omega FMG91-PVDF), respectively. Pulsatile waveforms were delivered to one of the pumps at a rate of 60 beats per minute using a solid-state relay to produce flows from 600 to 1200 mL/min at peripheral peak blood pressures of 110–200 mmHg.

6.4 Signal Processing: Considerations in the Transduction of Bruits

While the main focus of this chapter is signal processing of bruits to produce phonoangiograms for classification, system-level consideration of the signal processing requirements can help optimize performance and avoid over-design. Therefore, this section will review the design considerations for a transducer and front-end interface amplifier to best capture the relevant acoustic signals to the accuracy needed for classification (Fig. 6.5).

Fig. 6.5
figure 5

Signal processing is required first in the analog domain to maximize signal-to-noise ratio and prevent aliasing in analog-to-digital conversion. After digitization, digital signal processing is used to extract features for classification

6.4.1 Skin-Coupled Recording Microphone Array

Fabrication details for recording arrays were detailed previously (Panda et al. 2019a), so this section will introduce new data on the bandwidth considerations for these sensors. The true spectral bandwidth and dynamic range of vascular sounds may still be unknown since only stethoscopes have been used to record these signals previously. Published analyses of PAGs report higher-pitched sounds associated with vascular stenosis (Sung et al. 2015; Du et al. 2015; Du et al. 2014; Wu et al. 2015; Mansy et al. 2005; Shinzato et al. 1993; Hsien-Yi Wang et al. 2014; Chen et al. 2013; Akay et al. 1993; Obando and Mandersson 2012; Wang et al. 2011; Clausen et al. 2011; Sato et al. 2006; Gram et al. 2011; Milsom et al. 2014; Rousselot 2014; Gaupp et al. 1999; Gårdhagen n.d.), which suggests that the reduced frequency range of stethoscopes might be insufficient for blood sounds. Therefore, acoustic recordings from the in vitro phantom were made with a reference transducer (Tyco Electronics CM-01B) with a flat frequency response to at least 2 kHz. For each recording, the 95% power bandwidth was calculated by integrating the power spectral density. To compute the power bandwidth, the power spectral density was computed using fast Fourier transform and then cumulatively integrated by frequency bin until the integration met 95% of the total power in all bins. Because electronic circuits suffer from increased flicker noise at low frequencies, and because all prior reports of PAGs indicate increased power above 100 Hz associated with vascular stenosis (Sung et al. 2015; Du et al. 2015; Du et al. 2014; Wu et al. 2015; Mansy et al. 2005; Shinzato et al. 1993; Hsien-Yi Wang et al. 2014; Chen et al. 2013; Akay et al. 1993; Obando and Mandersson 2012; Wang et al. 2011; Clausen et al. 2011; Sato et al. 2006; Gram et al. 2011; Milsom et al. 2014; Rousselot 2014; Gaupp et al. 1999; Gårdhagen n.d.), we adopted a lower integration bound of 25 Hz. This had a further benefit of enabling shorter-duration recordings (e.g., 10 s), which otherwise do not accurately capture extremely low-frequency signal components. For this analysis 10-s recordings were taken 1 cm before the simulated stenosis, at the stenosis, and 1 and 2 cm after the stenosis relative to the direction of blood flow.

Signal bandwidth was related to the degree of stenosis, as expected, but also to recording location (Figs. 6.6 and 6.7). Both effects were expected based on prior measurements and simulations indicating turbulent flow existing up to 1–2 cm from a typical stenotic lesion (Gaupp et al. 1999; Gårdhagen n.d.). These results suggest the need to record from multiple locations to accurately detect the presence and severity of a stenotic lesion. In an analysis of 156 recordings, the maximum interquartile range for 95% bandwidth was 25 Hz–1.2 kHz; the lower-frequency bound correlated with phantoms with low DOS producing little turbulent flow (Fig. 6.8). These data suggest that a signal bandwidth of at least 1.5 kHz is appropriate for measuring vascular bruits. With a safety factor, we adopted a bandwidth of 25–2.25 kHz.

Fig. 6.6
figure 6

Power spectral density recorded at different locations relative to a 75% stenosis show a site-specific signal bandwidth. In general, sites after stenosis have wider signal bandwidths because of the local presence of turbulent blood flow

Fig. 6.7
figure 7

The 95% power bandwidth for 156 PAG recordings for DOS 10-90% were aggregated based on recording site. Recordings at sites 2 and 3 indicate wider bandwidth independent of degree of stenosis or flow rate. This forms the basis of the classifier methodology, as there is a distinct correlation between elevated power and frequency content in the presence of stenosis

Fig. 6.8
figure 8

Analysis of interquartile range for PAGs recorded with DOS 10–90% showed a required bandwidth of at least 1600 Hz to accurately capture signal dynamics in the analog signal processing section. Including a safety factor, the interface amplifier was designed for 2.25 kHz bandwidth to limit noise

The required bandwidth was achieved with a signal-to-noise ratio of 24 dB using a polyvinylidene fluoride (PVDF) film as a 2-mm diameter circular transducer. This transducer was developed to be coupled directly to the skin to measure blood sounds through direct piezoelectric transduction (Panda et al. 2019b). The small size of the transducer allowed it to be fabricated in recording arrays (M and Panda 2019). In this work we describe testing from arrays arranged as 1×5 channels spaced by 1 cm laterally (Fig. 6.4).

6.4.2 Transducer Front-End Interface Amplifier Design

Each PVDF microphone in the recording array must be coupled to an interface amplifier to amplify the signal amplitude before digital conversion. The analog performance of the interface amplifier is driven by three constraints: the electrical impedance of the PVDF transducer, the required signal bandwidth, and the required dynamic range. In this case, the dynamic range constraint is driven by the minimum signal accuracy needed for the digital signal processing and classification strategy. In a retrospective analysis of blood sounds measured from hemodialysis patients and an in vitro phantom, we determined that a minimum dynamic range of 60.2 dB was needed for accurate classification of stenosis severity (Panda et al. 2019a), which is roughly equivalent to 10-bit accuracy after digital conversion. As described in the previous section, a bandwidth of 2.25 kHz is needed to capture most of the energy in the PAG signals.

The amplifier input impedance constraint is based on the electrical model for each 2-mm transducer which was extracted using an impedance analyzer (Hioki IM3570). The PVDF transducer was modeled electrically as a resistor and capacitor in parallel (Fig. 6.9). Measured values of the sensor resistance, capacitance, and the equivalent sensor output current when recording PAGs are shown in Table 6.1.

Fig. 6.9
figure 9

The PVDF transducer is modeled simply as a resistor (RS) and capacitor (Cs) in parallel with output current Isignal based on measured impedance at 100 Hz

Table 6.1 Measured transducer parameters for an electrical model for the PVDF sensor

Because the PVDF transducer has a large impedance with a small signal current, a transimpedance amplifier (TIA) was designed to convert the piezoelectric sensor current to a voltage that can be digitized. Each microphone within the array feeds a dedicated TIA. The TIA converts the current produced by the transducer to an output voltage while minimizing the input referred noise power. The TIA is an ideal interface to high impedance, current output devices, but certain critical design considerations must be made to optimize the total signal-to-noise ratio of the output signal. The most important design consideration, which has a direct impact on the sensitivity, is the input-referred noise of the TIA. In feedback TIAs built using general voltage amplifiers such as an op-amp with a shunt-shunt feedback, the input referred noise is a function of the input-referred voltage and current noise of the op-Amp (Binkley 2008). Therefore, op-amps with high input-referred voltage (\( nV/\sqrt{Hz} \)) and/or current noise (\( nA/\sqrt{Hz} \)) should be avoided.

The design specifications for the TIA were chosen assuming it would be followed by a 2nd-stage programmable gain amplifier and a 10-bit analog-to-digital converter. Therefore, a small-signal output level was chosen to limit harmonic distortion which can occur with large signal swing. The performance of the TIA dominates the analog noise floor and linearity, so these later stages are not described here. Design requirements for the TIA are summarized in Table 6.2 based on measured properties from PAGs in humans and the vascular phantom (Panda et al. 2019a).

Table 6.2 Design specifications for transimpedance interface amplifier

In addition to the inherent noise of the op-amp, the feedback resistor plays a key role in the overall input-referred noise power of the TIA. Increasing the feedback resistance not only reduces the noise current associated with the resistance but also results in higher TIA gain which helps lower the overall input-referred noise of the TIA. Nevertheless, the requirement imposed on the frequency response of the TIA when interfacing with the transducer limits the amount of resistance that can be used in the feedback path. Still, optimizing the feedback resistance will lead to lower input-referred noise within the required gain bandwidth (GBW) of the TIA (Fig. 6.10).

Fig. 6.10
figure 10

Example of op-amp open loop transfer function and noise transfer functions versus frequency. Ideally, the noise transfer function will be flatter until the op-amp gain begins to roll off (e.g., “B”)

The critical performance metrics are important in completing the design process. Major small-signal TIA performance metrics are the transimpedance gain, the 3-dB bandwidth, and input-referred noise power. Considering the transimpedance gain and the bandwidth, the feedback network is the first physical parameter that must be determined. The feedback network generally consists of a resistor and capacitor that are connected in parallel. The resistive part helps set the transimpedance gain of the TIA, while the capacitive component helps with setting the frequency response, particularly the bandwidth and the stability. The frequency response affects the TIA noise transfer function, and consequently, the input referred noise of the TIA, too. Eqs. 5, 6 demonstrate how to optimize feedback capacitor ranges, e.g.,

$$ \mathrm{If}\ \left(\frac{R_f}{R_s}+1\right)\ge 2\sqrt{R_f\ast {C}_s\ast GBW}\kern2em {C}_f=\frac{C_s}{2\left(\frac{R_f}{R_{in}}+1\right)} $$
$$ \mathrm{If}\ \left(\frac{R_f}{R_s}+1\right)\le 2\sqrt{R_f\ast {C}_s\ast GBW}\kern2.5em {C}_f=\sqrt{\frac{C_s}{R_f\ast GBW}} $$

Another critical consideration for feedback capacitor value is the desired cutoff frequency. This cutoff frequency determines the TIA’s -3db bandwidth, f −3dB, expressed as

$$ {f}_{-3 dB}=\frac{1}{2\pi \ast {R}_f\ast {C}_f} $$

Input referred power is defined by the ratio of the output noise power, divided by the TIA transfer function. This can be calculated using the SNR of the circuit (Fig. 6.11):

$$ SNR=\frac{Power_{signal}}{Power_{noise}}=\frac{{I_s}^2\ast {R}_f}{\overline{i_n^2}\ast {R}_f}=\frac{{I_s}^2}{\overline{i_n^2}\ } $$
Fig. 6.11
figure 11

Equivalent transducer and transimpedance amplifier circuit model for input-referred noise calculations with parallel current source, \( \overline{i_n} \), functioning as input noise source

The TIA design process is to maximize SNR given constraints on required bandwidth, available supply voltage/current, and necessary dynamic range. The transfer function of output voltage level (V out) and input current (I signal) is dependent on the feedback resistance:

$$ {V}_{out}=-\left[{I}_{signal}\ast {R}_f\right]+{V}_{ref.} $$

In this example, V ref is generated by a voltage divider of R 1 and R 2. Both were selected to be 10kΩ to set the reference at half of the supply voltage, i.e.,

$$ {V}_{ref}={V}_{supply}\ast \frac{R_2}{R_1+{R}_2}\ \left(\mathrm{where}\ {R}_2={R}_1\right) $$

The DC value of the output for this stage of amplification was selected to be 2.1 V. From this parameter, the feedback resistance was calculated as:

$$ {R}_f=\frac{V_{out}-{V}_{ref}}{I_{signal}}=\frac{2.1V-1.65V}{0.63\mu A}=715k\Omega $$

The value of the feedback capacitance was determined from the required signal bandwidth. Rearranging Eq. 7 for C f, we arrive at:

$$ {C}_f=\frac{1}{2\pi \ast {R}_f\ast {f}_{-3 dB}}=\frac{1}{2\pi \ast 715k\Omega \ast 2.25 kHz}=100\rho F $$

The minimum op-amp bandwidth for this circuit was calculated using the feedback resistance and capacitance, R f and C f, as well as the capacitance of the input pin of the selected op-amp (Texas Instruments OPA2378). The IN-pin capacitance is the sum of the sensor capacitance (C s), common-mode input capacitance (C CM), and differential mode capacitance (C Diff) as:

$$ {C}_{IN}={C}_s+{C}_{CM}+{C}_{Diff}=1000\rho F+5\rho F+4 pF\cong 1000\rho F $$
$$ {f}_{GBW}\ge \frac{C_f+{C}_{in}}{2\pi \ast {R}_f\ast {C}_f^2}\ge 24.7\ kHz $$

Therefore, the op-amp must have a minimum bandwidth of roughly 25 kHz. The OPA2378’s 900 kHz bandwidth satisfies this requirement and is a viable component for this application. The OPA2378 has an input voltage noise density of \( \frac{20 nV}{Hz^{1/2}} \). The input referred voltage noise was calculated as \( \frac{183 nV}{Hz^{1/2}} \) which meets the 60 dB dynamic range requirement over the signal bandwidth of 2.25 kHz.

6.5 Signal Processing and Feature Classification Strategies for Acoustic Detection of Vascular Stenosis

The preceding sections described how phonoangiograms can be efficiently transduced through arrays of flexible microphones and the bandwidth and dynamic range needed for interface and data conversion electronics. After a bruit is recorded, a wide range of digital signal processing strategies can be used to extract meaningful features. Prior examples have reported that autoregressive spectral envelope estimation, wavelet sub-band power ratios, and wavelet-derived acoustic features correlate to degree of stenosis (Sung et al. 2015; Du et al. 2015; Du et al. 2014; Wu et al. 2015; Mansy et al. 2005; Shinzato et al. 1993; Hsien-Yi Wang et al. 2014; Chen et al. 2013; Akay et al. 1993; Obando and Mandersson 2012; Wang et al. 2011; Clausen et al. 2011; Sato et al. 2006; Gram et al. 2011; Milsom et al. 2014; Rousselot 2014; Gaupp et al. 1999; Gårdhagen n.d.). Features can be extracted from multiple signal processing branches and compared using machine-learning techniques, e.g., radial basis functions or random forests. However, feature extraction and model training must be constrained to prevent over-fitting on limited datasets and to improve generalized use. In this section we provide an overview of how two derived time domain signals—acoustic spectral centroid (ASC) and acoustic spectral flux (ASF)—have unique properties for bruit classification. Importantly, ASC and ASF are derived directly from the discrete wavelet transform coefficients, which reduce feature dimensionality and aid scalar feature extraction.

A specific physical system implementation provides constraints on computational complexity, accuracy, and ease of implementation which can guide the selection of features. In this section, we review the fundamental approach for extracting spectral features from a single acoustic recording site. We will then expand this signal processing into other domains, specifically into time and space, by leveraging time-synchronized recordings from an array of microphones.

6.5.1 Multi-domain Phonoangiogram Feature Calculation

Because PAGs are time domain waveforms, they can be analyzed in both the temporal or spectral domains, i.e., as one-dimensional signals in either domain. Spectral transforms such as discrete cosine transform and continuous wavelet transform combine these domains to form a two-dimensional waveform along time and frequency (or scale) axes. However, when PAGs are acquired at multiple sites along a vascular access, the spatial distribution of PAG properties provides an additional analysis domain. If PAGs are also sampled simultaneously, time domain differences between signals are correlated and can be analyzed. When features are extracted from different domains, they can be compared to each other using clustering and classifier techniques as long as they are reduced to scalar form.

In this section we review how features can be extracted from each domain with dimensional reduction to scalar values. The spectral domain provides scalar features such as average pitch. The temporospectral (combined time-spectral) domain allows segmentation of blood sounds in cardiac cycles to provide sample indices for systole onset. After temporospectral segmentation, spectral features can be separately calculated in systolic and diastolic phases. Finally, the spatial domain provides features describing the time delay between PAGs at different recording sites. Spatial analysis also enables detection of spectral changes between sites to predict where turbulent blood flow is occurring.

6.5.1.1 Spectral Domain Feature Extraction

Spectral domain feature extraction is likely the most common approach in PAG signal processing. This is intuitive because humans perceive frequency content with great sensitivity, and PAG processing seeks to replicate traditional auscultation by ear. In this section we review spectral domain feature extraction using continuous wavelet transform (CWT) to describe the spectral variance over time.

CWT over k scales W[k, n] is computed as:

$$ \boldsymbol{W}\left[k,n\right]={x}_{PAG}\left[n\right]\ast \psi \left[n/k\right], $$

where ψ[t/k] is the analyzing wavelet at scale k. We used the complex Morlet wavelet because it has good mapping from scale to frequency, defined as:

$$ \psi \left[n\right]={e}^{-{\left(n/2{f}_c\right)}^2}{e}^{j2\pi {f}_cn}, $$

where f c is the wavelet center frequency. In the limit f c → ∞, the CWT with Morlet wavelet becomes a Fourier transform. Because of the construction of the Morlet wavelet as the wavelet ψ[n] is scaled to ψ[n/k], and k is a factor of 2, the wavelet center frequency will be shifted by one octave. Therefore, CWT analysis with the Morlet wavelet can be described by the number of octaves (N O) being analyzed (frequency span) and the number of voices per octave N V (divisions within each octave, i.e., frequency scales). Mathematically the set of scale factors k can be expressed as:

$$ k\left[{i}_O,{i}_V\right]={2}^{\left({i}_O+{i}_V/{N}_V\right)}. $$
$$ {i}_O={k}_0,{k}_0+1,{k}_0+2\dots {N}_O $$
$$ {i}_V={k}_0,{k}_0+1,{k}_0+2\dots {N}_V $$

Where k 0 is the starting scale and defines the smallest scale value and the total number of scales K= N O N V. For PAG analysis, we compute CWT with N O = 6 octaves and N V = 12 voices/octave, starting at k 0 = 3. After computing the CWT, pseudofrequencies F[k] across all K scales are calculated as:

$$ F\left[k\right]={f}_c/k. $$

Because the CWT involves time domain convolution, each discrete sample n has a paired sequence of k CWT coefficients, i.e., it is a 2-dimensional sequence. In the context of phonoangiogram classification, features must be extracted from W[k, n] that are of singular dimension. Dimension reduction of W[k, n] can operate over all or part of the k scales at each discrete sample n, over a single k scale for all n samples, over all points of W[k, n], or through a more complex combination of summation over k and n.

The systolic and diastolic portions of pulsatile blood flow contain differing spectral information on turbulent flow, so we have chosen to first reduce the CWT dimensionality to n to produce time domain waveforms. This preserves the spectral differences between different times in the cardiac flow cycle. Two n-point waveforms are calculated from W[k, n]: auditory spectral flux (ASF) and auditory spectral centroid (ASC). From these waveforms, we can compute time-independent features such as RMS spectral centroid, or we can extract time domain spectral features as explained in the next section.

ASF describes the rate at which the magnitude of the auditory spectrum changes and approximates a spectral first-order derivative. It is calculated as the spectral variation between two adjacent samples, i.e.,

$$ ASF\left[n\right]=\frac{1}{K}\sqrt{\sum_{k=1}^K{\left(\left|\boldsymbol{W}\left[k,n\right]\right|-\left|\boldsymbol{W}\left[k,n-1\right]\right|\right)}^2} $$

where W[k, n] is the continuous wavelet transform obtained over k total scales.

To intuitively demonstrate how ASF describes a signal, Fig. 6.12 shows ASF calculated from a stepped single tone test waveform. The tone changes over [100, 200, 400, 800, 1000] stepping every 2 s. At every tonal change, the spike in the ASF waveform corresponds to the time of the spectral shift and the magnitude. The ASF waveform, therefore, describes how when, and how quickly, spectral power is shifting between bands. This is useful in mapping large variations in a PAG signal, such as the systole and diastole phases. Segmentation of these phases, therefore, uses the ASF waveform (described below).

Fig. 6.12
figure 12

Spectrogram of artificially generated test waveform with 6 single-tone frequencies from 100-1500 Hz. The ASF curve (lower) shows a spike at every change of frequency, approximating the spectral first derivative

ASC describes the spectral “center of mass” at each n sample in time. For Gaussian-distributed white noise, ASC will be constant at pseudofrequency F[K/2]. ASC is commonly used to estimate the average pitch of audio recordings, where a higher value corresponds to “brighter” acoustics with more high frequency content (Tzanetakis and Cook 2002). ASC is calculated as:

$$ ASC\left[n\right]=\frac{\sum_{k=1}^K\left(\left|\boldsymbol{W}\left[k,n\right]\right|. fc\left[k\right]\right)}{\sum_{k=1}^K\left|\boldsymbol{W}\left[k,n\right]\right|\ } $$

where W[k, n] is the continuous wavelet transform obtained over K total scales of the PAG and f C[k] is the center frequency.

ASC for the same test waveform is plotted to intuitively describe how this waveform describes the time domain spectral energy of a signal (Fig. 6.13). Because only a single tone is used at each time point, ASC consistently describes the frequency of the sine wave until it changes. Because F[k] represents pseudofrequencies, there is not a perfect mapping between ASC pseudofrequency and real auditory frequency. The use of the Morlet waveform in the CWT improves the pseudofrequency accuracy, but for PAG classification, absolute frequency accuracy is not needed (discussed below).

Fig. 6.13
figure 13

Spectrogram of artificially generated test waveform with 6 single-tone frequencies from 100-1500 Hz. The ASC curve describes the frequency of the sine wave at each time point

Example computations of ASC and ASF waveforms, compared to the time domain and spectral domain PAG recording, demonstrate feature calculation (Fig. 6.14). After the three-dimensional W[k, n] is computed, time domain ASC and ASF waveforms are calculated. From these waveforms simple, time-invariant scalar values such as RMS or peak amplitude are calculated and used for stenosis classification.

Fig. 6.14
figure 14

Time-domain bruit (a) and continuous wavelet transform spectral domain (b). The descriptive signals auditory spectral centroid and flux were extracted from CWT coefficients (c,d). The RMS value of the descriptive signals is one example of a scalar feature derived from the time-domain waveform

6.5.1.2 Temporospectral Domain Feature Extraction

For PAG analysis we are primarily interested in identifying the time onset of systolic and diastolic phases. This allows separate spectral feature extraction in each phase, ratioed features by comparing spectral changes between phases, and time domain comparisons such as lengths of cardiac phases, or time shifts between recording sites. This analysis is useful because blood flow acceleration occurs in the high-pressure systolic pulse, which gives rise to turbulence producing high spectral power. As a spectral derivative, the ASF waveform is well suited to describe the onset of systolic turbulence and is used for temporospectral segmentation.

Segmentation simply used a thresholding procedure; systolic ASF onset is defined as the time when the ASF waveform exceeds a threshold in each pulse cycle (Fig. 6.15). A suitable threshold of 25% of the ASFRMS value was determined empirically using data recorded from human patients and the vascular phantom (Panda et al. 2019b). Pulse width is also used to reduce false threshold crossings. The times between threshold crossings are calculated, and any crossings which produce pulse widths less than 40% of the mean are discarded (Lázaro et al. 2013).

Fig. 6.15
figure 15

Auditory spectral centroid (ASC) varies with degree of stenosis but also between systolic and diastolic phases (a). The auditory spectral flux (ASF) waveform enables segmentation between pulsatile phases so that the RMS value of ASC (ASCRMS) can be separately calculated (b)

Temporospectral segmentation produces a set of i indices (n ASF,i) describing systolic and diastolic pulse widths, which themselves can be used as features. However, the indices can also be used to segment spectral waveforms such as ASF and ASC to split them into systolic ASFS and ASCS, and diastolic ASF D and ASC D. Features for each phase can be calculated by combining all segments or by averaging the feature for each segment. As an example, consider an ASC waveform segmented into P systolic segments each with length n. The RMS value of ASC in the systolic phase only is then:

$$ {ASC}_{S, RMS}=\frac{1}{P}\sum_{i=1}^P\sqrt{\frac{1}{n}\sum {ASC}_{S,P,n}^2} $$

In practice, because systolic segments do not all have the same length n, any derived features are calculated for each segment independently and averaged over P segments.

Ratiometric features can also be calculated as ratios or differences between successive systolic/diastolic pairs. This reduces the effect of interference caused by recording which is correlated between adjacent segments or can be a less individual-specific feature because the diameter of the blood vessel and absolute flow rate contribute to ASC and differ between people. For example, ASC and ASF waveforms show significant differences in systolic and diastolic phases (Fig. 6.15), especially as DOS increases.

6.5.1.3 Spatial Domain Feature Extraction

The final domain analyzed in this model of PAG signal processing is the spatial domain. Features are not extracted directly from the spatial domain; rather, new features are derived as the difference in features between sites (Fig. 6.16). This is a powerful technique because not only does it accentuate regions of turbulent flow, but also the proportional feature changes between recording locations are themselves related to degree of stenosis. Therefore, spatial domain features are useful for both physical localization of stenosis and classification of degree of stenosis. Furthermore, ratiometric site-to-site feature comparisons remove some of the individual variation in features attributed to differences in anatomy. For example, the dimensionless change in systolic ASC (ASC S) between sites 1 and 2 can be calculated as \( \raisebox{1ex}{${ASC}_{2,S}$}\!\left/ \!\raisebox{-1ex}{${ASC}_{1,S}$}\right. \). To obtain a similar comparison in approximate units of Hertz, a difference is used, i.e., ASC 2, S − ASC 1, S.

Fig. 6.16
figure 16

Features can be derived for each recording site, or based on differences between sites. Since all features are scalar, they can be combined into the same featureset and used for classification

This spatial domain technique can be generalized to produce composite features for any multi-site measurement with little complications as long as the compared features are independent scalars. However, any site-to-site calculations relying on time require synchronization in sample rates between sites, or alignment of waveforms based on a reference symbol so that relative time differences can be calculated. For example, composite temporospectral features require time invariance in the calculation. Once this condition is met, composite spatial domain features based on time shifts are simple to calculate. For example, the time delay in ASF systolic onset (n ASF) between sites 1 and 2 can be calculated as:

$$ {t}_{d,1-2}=\raisebox{1ex}{$1$}\!\left/ \!\raisebox{-1ex}{${F}_s$}\right.\left({\boldsymbol{n}}_{ASF,1}-{\boldsymbol{n}}_{ASF,2}\right). $$

This calculation is easily performed from feature calculations for each recording site (Fig. 6.17) and is transformed to a continuous time difference in units of seconds by dividing by the sample rate F S. Scalar features from multiple domains can be combined to form a single feature set (Fig. 6.16), especially if a machine-learning classifier will be used because the scalar features can be analyzed as if they are unitless.

Fig. 6.17
figure 17

ASF calculated at proximal and distal locations showed an inversion in T d. At moderate and severe DOS, T d became negative, suggesting flow velocity increase

6.6 Classification of Vascular Access Stenosis Location and Severity In Vitro

The clinical goal for multi-site recordings of PAGs is to both locate and describe the severity of stenosis. In our previous work, we showed that binary or ternary classification using single features was sufficient to classify DOS as mild, moderate, or severe. Analysis of this method using receiver operating characteristic (ROC) revealed detection sensitivities as high as 88–92% and specificities as high as 96–100% (Panda et al. 2020), but classification was only accurate at certain recording locations. Therefore, feature selection for an array of recording sites is important to detect differences between recording sites. This section demonstrates comparing features between sites using hyperdimensional classifiers to greatly improve the stenosis classification accuracy from PAG recordings.

6.6.1 Multi-domain Feature Selection

The previous sections described how phonoangiograms are transduced and processed as analog signals, prior to being digitized for digital signal processing. Features are then extracted from multiple dimensions to yield a final set of M features F[S.M], which are site-specific to each of S recording sites (Fig. 6.18). In previous work we and others have described more than 15 features that are correlated with degree of stenosis in humans and in bench phantoms of vascular stenosis (Sung et al. 2015; Du et al. 2015; Du et al. 2014; Wu et al. 2015; Mansy et al. 2005; Shinzato et al. 1993; Hsien-Yi Wang et al. 2014; Chen et al. 2013; Akay et al. 1993; Obando and Mandersson 2012; Wang et al. 2011; Clausen et al. 2011; Sato et al. 2006; Gram et al. 2011; Milsom et al. 2014; Rousselot 2014; Gaupp et al. 1999; Gårdhagen n.d.; Chin et al. 2019; Panda et al. 2020; Panda et al. 2019a).

Fig. 6.18
figure 18

As features are extracted, the dimensionality of the dataset is reduced to yield a final set of features. Since each site has features extracted from site-specific features and intra-site feature differences, a total featureset of F[S,M] is produced with S features over M sites

Machine-learning classifiers require optimized feature selection through numerous methods. Feature selection improves the performance of classifier algorithms and reduces the likelihood of over-fitting to a data set of limited size. Numerical methods such as principal component analysis are powerful tools, as is supervised feature selection which relies on trained experts to select the features describing most of the variance in the observed effect. In this work we used both automated and supervised feature selection to select the most appropriate features. In the following classification examples, we explain the rationale behind feature selection for the given classification task.

6.6.2 Stenosis Spatial Localization Using Acoustic Features

Because the presence of stenosis produces turbulent flow in blood, a characteristic high-frequency sound is produced locally within 1–2 cm of the lesion (Gaupp et al. 1999; Gårdhagen n.d.). Spatial domain feature analysis is ideal to detect differences between recording sites caused by dramatic changes in blood flow patterns. To demonstrate the feasibility of detecting the location of stenosis using acoustic features alone, we tested eight stenosis phantoms on the vascular phantom previously described over variable blood flow rates of 700–1200 mL/min. This range of flows was tested at each degree of stenosis to simulate the nominal levels of human blood flow rates in arteriovenous vascular accesses. DOS for the phantoms ranged from 10% to 85%.

A vascular access is typically a uniform segment of blood vessel with few collateral veins, so we simply tested a one-dimensional recording array with five locations along the path of blood flow (Fig. 6.4). Recording sites were spaced by 1 cm and used skin-coupled microphones as previously described. While we analyzed over 15 features for stenosis localization, we found many features were correlated (Chin et al. 2019) and therefore adopted the site-to-site change in mean systolic ASC (\( \overline{\Delta {ASC}_S} \)) as the sole feature for localization (Fig. 6.19). This feature was intuitively selected because it is well documented that the presence of stenosis causes high-pitched blood sounds. Therefore, we expect that an abrupt stenosis in an otherwise smooth vessel will produce higher pitch at sites within several centimeters. Five site-to-site features for each flow rate and DOS were calculated, and including replications this yielded 370 total samples for statistical analysis.

Fig. 6.19
figure 19

Stenosis localization uses spatial features derived from feature differences between adjacent sites. In this example, the shift in \( \overline{ASCS} \) between sites is used to detect the presence of stenosis beneath a specific recording site

In this experiment, the actual stenosis was located directly under location 2; location 1 was recorded 1 cm proximal, and locations 3, 4, and 5 were 1, 2, and 3 cm distal to stenosis. The interval plot (Fig. 6.20) indicated a positive shift between \( \overline{\Delta {ASC}_{S,}} \)differences from proximal to distal locations (p < 0.001 for 30% < DOS < 90%) (Panda et al. 2019a). Confidence intervals and differences in group means were calculated using ANOVA followed by Tukey’s test with 95% confidence intervals (α = 0.05). Because sample data followed a normal distribution, Tukey’s test was used to adjust confidence intervals based on the number of comparisons tested. Statistical analysis was performed in Minitab software (Minitab, LLC, State College, PA, USA). In general, differences between locations 3 and 4 and 4 and 5 were positive by 50–70 Hz, while the other site differences were negative. This suggested that a simple threshold difference of 70 Hz in \( \overline{\Delta {ASC}_{S,}} \)between adjacent array recording locations could identify stenosis proximally to the recording sites within 1–2 cm.

Fig. 6.20
figure 20

Difference in ASC S between adjacent locations showed no significant variation for 0% DOS (p>0.05) (a). A large spectral shift at locations distal to stenosis (stenosis center at location 2) (b). Data plotted for phantoms with 30%<DOS<90%, p<0.001 for all locations. Analysis of variance and Tukey’s test were identified statistically significant differences in ASC means at significance level α=0.05

6.6.3 Stenosis Severity Classification from Acoustic Features

While the location of stenosis can be estimated by comparing feature shifts between sites to a threshold, classification of the degree of stenosis is more challenging from a single feature. This is in part because the degree of stenosis and the nonlinear properties of blood interact such that DOS nonlinearly impacts overall flow rate and turbulence pattern (Gaupp et al. 1999; Gårdhagen n.d.), introducing time-dependent changes to both acoustic spectra and intensity. Many classification strategies have been proposed and studied for a single recording site (Sung et al. 2015; Du et al. 2015; Du et al. 2014; Wu et al. 2015; Mansy et al. 2005; Shinzato et al. 1993; Hsien-Yi Wang et al. 2014; Chen et al. 2013; Akay et al. 1993; Obando and Mandersson 2012; Wang et al. 2011; Clausen et al. 2011; Sato et al. 2006; Gram et al. 2011; Milsom et al. 2014; Rousselot 2014; Gaupp et al. 1999; Gårdhagen n.d.), e.g., showing classification accuracy of about 84% using binomial Gaussian modeling (Sung et al. 2015). Here we extend classification to leverage temporospatial domain features drawn from multiple recording sites.

We chose to classify PAG data using a quadratic support vector machine (SVM) (Joachims 1998). The quadratic SVM is widely used in natural language processing tasks and is suitable for PAGs which have similar autoregressive properties as speech (Majerus et al. 2018). As a machine-learning algorithm, the SVM defines a hyperplane which is used to separate clusters of data points in a high-dimensional space. The hyperplane is used as a decision surface and is optimized to maximize the separation distance between the classes of data.

Because the data are not linearly separable, the SVM transforms the input data points into a higher dimension using a kernel function. For the quadratic SVM, the kernel K is a polynomial of order 2, i.e.,

$$ K\left({x}_1,{x}_2\right)={\left({x_1}^{\mathrm{T}}{x}_2+1\right)}^2. $$

Expanding this kernel reveals how data are expanded into higher dimension through interaction terms:

$$ \begin{aligned} K\left({x}_1,{x}_2\right)&={\left(\sum_{i=1}^n{x_1}^{\mathrm{T}}{x}_2+1\right)}^2\\ &=\sum_{i=1}^n{x_{1,i}}^2{x_{2,i}}^2+\sum_{i=2}^n\sum_{j=1}^{i-1}\left(\sqrt{2}{x}_{1,i}{x}_{1,j}\right)\left(\sqrt{2}{x}_{2,i}{x}_{2,j}\right)\\ &\quad +\sum_{i=1}^n\left(\sqrt{2}{x}_{1,i}\right)\left(\sqrt{2}{x}_{2,i}\right)+1. \end{aligned} $$

This dimensional expansion changes the distances between data points in the higher-dimensional space and allows a decision surface to be constructed. The decision surface is a hyperplane optimized to the distance between the hyperplane and the nearest data points in each class. Because this quadratic optimization problem involves significant computation, SVMs are developed using machine-learning strategies and generally tuned iteratively.

For the case of DOS classification, we trained the SVM in MATLAB using the same dataset of 370 recordings described above. For each of S recording sites, a set of M features was calculated giving a total feature array F[S,M]. However, after detecting the location of stenosis, only recordings from the nearest site need to be classified, i.e., the SVM was only trained on a single feature vector F[M]. In our example with 5 recording sites, this reduced the total number of observations (recordings) to 50.

Training of the SVM was performed in MATLAB in three phases. First, PAG features were transformed to a high-dimensional space using the polynomial kernel. Then feature selection was performed to reduce the total number of features (and hence the dimensionality) of the SVM. This reduced the overall model complexity, reduced the numerical instability risk inherent to SVMs, and reduced the risk of over-fitting. Principal component analysis was used to define the three features which described variance between the data classes: \( \overline{ASC\cdot ASF} \) (mean ASC multiplied by mean ASF), \( \overline{ASC_S} \) (mean value of ASC in systole), and t d (time shift in ASF onset compared to first recording site). The computation of these features is illustrated in Fig. 6.21. Then, quadratic optimization was performed to fit an optimal hyperplane between the classes of data. Model validation was performed using fivefold cross-validation such that the model was trained on ten observations and tested by classifying the remaining 40.

Fig. 6.21
figure 21

Scalar features are derived from the time-domain ASC and ASF descriptive waveforms, including interaction features such as \( \overline{ASC\cdot ASF} \). Temporo-spectral features such as systolic width can be derived, or compared to adjacent sites to compute spatial features such as t d which describes the time shift at ASF onset in systole between time-synchronous recordings

The quadratic SVM was designed to classify PAGs into three output classes for DOS: mild, moderate, and severe. Because these classes were ordinal (monotonic) and known a priori, quadratic SVM was selected (versus, e.g., clustering methods). Further, while DOS is a continuous variable, we chose to bin it into classification ranges because clinical monitoring does not require precise quantification of DOS; imaging is then used after a lesion is identified to more precisely determine treatment options (Sequeira et al. 2017). However, acoustic features can also be used to continuously estimate the DOS using regression, as described in the following section. Thresholding after regression can be used to similarly classify estimated DOS into ranges for clinical action.

Class definitions were chosen to be consistent with our prior work (Panda et al. 2020; Panda et al. 2019a): DOS < 30% (mild), 30% ≤ DOS ≤ 70% (moderate), and DOS > 70% (severe). Validation accuracy of the quadrative SVM on this data was 100% even though the features were not linearly separable (Fig. 6.22). Importantly, most of the classification accuracy came from the ASC and ASF features; however, adding the temporospatial measure t d helped prevent misclassifications at high DOS which occur when the stenosis greatly reduces vascular flow rate (Table 6.3).

Fig. 6.22
figure 22

The quadratic SVM classified DOS as mild (< 30%), moderate (30%<DOS<60%), and severe (DOS>60%) with 100% accuracy. This demonstrates the advantage of SVM as the included features are not fully separable linearly in the feature space (a, b)

Table 6.3 Performance of Quadratic SVM versus included features

However, while t d boosts classification accuracy only slightly, multiple recording locations for stenosis localization are still essential to accurate classification. For example, Table 6.4 indicates how classification accuracy drops significantly when applied to PAGs recorded more than 2 cm from the actual site of stenosis and dropping the spatial feature t d. This suggests that accurate PAG classification requires either a priori knowledge of stenosis location or multi-site recordings to detect locations for analysis.

Table 6.4 Classifier accuracy of quadratic SVM versus single recording sites

While this analysis suggested that machine-learning can be used for accurate classification of PAGs, it must be noted that cross-validation alone is only sufficient to optimize the hyperplane on the training data. The model was trained using data from a set of vascular phantoms with variable rates of blood flow, but this does not account for the wide anatomical variance seen in humans. Therefore, it is still unclear how accurately this model will function on unseen data. This remains an opportunity for future work.

6.6.4 Degree of Stenosis Estimation from Acoustic Features

The previous section discussed using acoustic features from PAGs to classify stenosis into clinically actionable ranges, but features can also be used to predict the actual degree of stenosis. Previous work in this area demonstrated that DOS could be estimated within 6% given a priori knowledge of the stenosis location (Du et al. 2015). Here, we demonstrate how features from multiple domains can be used to further improve DOS estimation using Gaussian process regression (GPR).

GPR is a regression modeling method, but unlike linear or nonlinear regression—which seeks to fit a least-squares model to minimize prediction error to a dataset f(x)—GPR is a Bayesian process which models f(x)as a Gaussian process (Rasmussen and Williams 2006). Thus, the value f(x) at each point x is represented as a random variable with a Gaussian distribution (Applebaum et al. 2002). The actual values used to train the model are therefore considered simply as independent observations drawn from the underlying normal probability distribution at each point. For example, observation-response pairs (x 1, y 1) and (x 2, y 2) are represented by normal distributions P(y 1| x 1) and P(y 2| x 2). Regression of a new response y 3 based on a new observation x 3 is then calculated as the conditional probability P(y 3| (y 1, y 2), (x 1, x 2, x 3))

Assuming the mean of the joint distribution of all input features F[M] is zero (accomplished through normalization without losing information between each recording), training the GPR involves solving for the unknown covariance matrix using a radial basis function kernel K(x m, x n), i.e.,

$$ Cov\left(f\left({x}_m\right),f\left({x}_n\right)\right)=K\left({x}_m,{x}_n\right)={\alpha}^2{e}^{-\frac{1}{2{l}^2}{\left({x}_m-{x}_n\right)}^2}. $$

In this example the parameter α2 is the output variance of the data, while l 2 represents the length scale of the data variance. Generally, α2 indicates the average distance of the function from its mean, while l determines the memory length of the modeled GPR. For a GPR trained on time-invariant features, e.g., PAG features, l = 1. Similarly to the quadratic SVM, training data are transformed by the basis function to a higher-dimensional space. Optimization of the basis function is then performed iteratively to minimize the RMS predicted error to the input data. Model training was performed in MATLAB on the same 50 recordings used to train the quadratic SVM classifier. The RMS error of the optimized GPR was calculated using fivefold cross-validation.

While the SVM classifier was demonstrated in the previous section, SVM regression was not used for stenosis estimation. GPR was selected after feature distribution analysis, which indicated that due to the chaotic nature of turbulent fluid flow, and the dependency on variable blood flow rate, features measured at each degree of stenosis spanned a range of observations around a defined central value. Generally, for DOS > 50% extracted features followed a normal distribution when pooled across all recording sites and all flow rates. Although GPR would suffer from finite bounds on confidence intervals because DOS is bounded on the range of 0–100%, because the model was only validated on the range of DOS from 10% to 90%, GPR out-performed other regressions, perhaps due to estimation of the underlying variance for each feature. For example, using the same features as in Table 6.5, quadratic SVM regression only achieved a best-case 8.3% RMS error.

Table 6.5 RMS error from exponential GPR model versus features and recording sites included

As in the quadratic SVM classifier, the addition of more features reduced the RMS error of the regression. However, unlike the SVM, the regression required data from sites around the stenosis to improve accuracy. In this example, the actual stenosis lesion was located under Site 2 with turbulent flow occurring beneath Site 3 and Site 4 based on established models (Gaupp et al. 1999; Gårdhagen n.d.). Including features from recordings proximal and distal to the lesion greatly improved the estimation accuracy. For all tested DOS, error was in the range [−11% 14%], and for DOS>50% error was [−11% 3%] (Fig. 6.23). From this outcome we conclude two things. First, this in vitro experiment clearly demonstrates the need for multiple recording sites for accurate phonoangiographic estimation of degree of stenosis. In humans with more variable vascular anatomy, the need for the multiple recording sites is likely greater because the location or presence of stenosis is not known a priori. Second, the achieved accuracy is sufficient for clinical monitoring, which generally only needs to detect when stenosis exceeds 50% or is rapidly progressing (Sequeira et al. 2017; Valliant and McComb 2015; Tessitore et al. 2014b). Clinical imaging would still be used, so the objective for phonoangiographic monitoring is simply to identify which patients to select for imaging.

Fig. 6.23
figure 23

Exponential Gaussian process regression estimated degree of stenosis for each in vitro vascular stenosis phantom (a). The trained model estimated degree of stenosis with RMS error of 4.3% (b) and error range of [−11% 14%] and [−11% 3%] for all tested stenoses and for stenoses > 50%, respectively (c)

6.7 Conclusion

This chapter discussed a new technique for point-of-care clinical monitoring of a vascular access using an array of microphones. Turbulent blood flow produces bruits that are recorded by each microphone and analyzed as phonoangiograms to detect the location and severity of stenosis. Signal processing spans several domains, beginning with the analog signal processing needed to amplify and filter the PVDF microphone signals before digital conversion. In the digital domain, continuous wavelet transform was used to produce acoustic spectral centroid and acoustic spectral flux analytic signals, from which acoustic features were derived. Systolic-diastolic segmentation provided additional features or the calculation of ratiometric features. Techniques to calculate features from multiple domains—spectral, temporospectral, and spatial—were feasible because of time-synchronous recordings from the microphone array.

A 1×5 microphone array was used to record bruits from a vascular phantom using stenosis models of 10–90% and blood-mimicking fluid at physiologic flow rates and pressures. This produced a dataset of recordings from which features were calculated. Stenosis localization was demonstrated using a simple binary classifier against a pitch-shift threshold to detect which recording site was nearest the stenotic lesion. A quadratic support vector machine classifier was trained using multi-domain features from a single recording site and achieved 100% accuracy when classifying the degree of stenosis as mild, moderate, or severe. Finally, estimation of the actual degree of stenosis was demonstrated using an exponential Gaussian process regression. The regression model combined features recorded from four sites to estimate degree of stenosis with 4.3% RMS error. Because the clinical threshold for elective surgery for vascular stenosis is 50% (Sequeira et al. 2017; Valliant and McComb 2015; Tessitore et al. 2014b) (and clinical monitoring for stenosis does not need to be as accurate as angiographic imaging), this suggests that phonoangiographic analysis is feasible for point-of-care monitoring.