1 Introduction

Optical coherence tomography (OCT) is a non-invasive and high-resolution optical imaging method which can form two- or three-dimensional images from optically scattering media such as biological tissues. OCT works based on interferometric measurement of the pathlength differences and the intensities of back-scattered light from tissue microstructures [1]. The fine imaging resolution (1–10 µm) and the high speed of OCT allow for rapid and three-dimensional (3D) visualization of tissue internal structures. In medicine, OCT systems are widely used for research and clinical purposes, specifically in the fields of ophthalmology, cardiology, dermatology, oncology, urology, dentistry, and gastroenterology [1]. In the field of ophthalmology, OCT has achieved greatest success, making OCT a gold standard method for diagnosis of retinal diseases [2, 3]. In cardiology, intravascular OCT (IV-OCT) has become one of the three most important intravascular imaging methods, alongside intravascular ultrasound (IVUS) and coronary angiography methods [4]. IV-OCT is a non-contact catheter-based imaging method that has 10 times greater resolution in comparison with IVUS [5]. In IV-OCT, a small OCT probe located inside a catheter is administered through the peripheral artery and navigated to reach desired regions of interest (e.g., coronary arteries, or aorta) to scan the artery walls and acquire the in vivo morphology at an unprecedented level of detail.

While OCT offers key advantages over competing medical imaging technologies, it forms images based on the scattering of light from structural inhomogeneities rather than tissue composition. As a result, OCT is quite sensitive to structural alterations caused by early stages of disease such as atherosclerosis and early dental caries, but it frequently lacks the desired level of diagnostic specificity (i.e., it yields too many false positives). This shortcoming results in poor diagnostic performance at early stages of disease and restricts the applications of OCT. For example, Shokouhi et al. compared the diagnostic performance of OCT with that of a molecular-contrast imaging method based on absorption of light (thermo-photonic lock-in imaging, or TPLI) in the detection of early dental caries [6]. The results showed better detection specificity and significantly fewer false positives with TPLI than with OCT, owing to the more specific nature of the light absorption sensing mechanism (TPLI) compared with light scattering (OCT). As another example, several works have demonstrated that relatively benign and vulnerable stages of atherosclerotic plaques (i.e., intimal thickening and fibro-atheroma) yield similar OCT structural images, even though the chemical compositions of these two stages are different [7, 8].

Photo-thermal optical coherence tomography (PT-OCT) is a functional extension of OCT with the promise to overcome the nonspecific nature of conventional OCT by forming 3D images based on both scattering and absorption of light [9]. Compared to other molecular-specific extensions of OCT, PT-OCT is intrinsically able to visualize depth-resolved maps of targeted molecules of interest (MOI). In PT-OCT, an intensity-modulated photo-thermal (PT) laser with a wavelength within the absorption band of a molecule of interest is added to the conventional OCT system. In such a configuration, absorption of the PT laser by the molecule of interest induces a localized modulated temperature field (aka thermal-wave field). This thermal-wave field, in turn, yields a modulated variation in the local refractive index and a thermo-elastic expansion that causes a physical sample displacement. Combined, these phenomena modulate the optical path length (OPL) at the modulation frequency of the PT laser, with a modulation amplitude on the order of tens of nanometers [9]. This modulation in OPL is then sensed and quantified by the OCT phase signal.

To detect a molecule of interest and form depth-resolved maps with molecular specificity, PT-OCT signals can be acquired either from a sample labeled with exogenous PT agents or label-free, using the intrinsic light absorption bands of the tissue. To date, nanoparticles have been employed as exogenous agents in PT-OCT to boost the photo-thermal signal from human [10] and rabbit [11] tissues ex vivo, to visualize blood capillaries of mouse ear in vivo [12], and to detect cancer cells in vitro [13]. Detection and depth-resolved visualization of both endogenous (melanin) and exogenous (gold nanorods) absorbers in mouse retina have also been reported in vivo [14]. More importantly, the feasibility of semi-quantitative label-free PT-OCT imaging has been reported for blood oxygen saturation in vessel phantoms [15] and for melanin concentration in zebrafish eye [16]. While these studies signal the ability of PT-OCT to perform depth-resolved functional imaging of MOI in samples, the state of the art in PT-OCT still suffers from several important limitations. First, as the physics underlying PT-OCT is multifactorial, the behavior of PT-OCT signals in a complex medium such as biological tissue is not well understood. Without a deep knowledge of this multifactorial physics, extracting accurate information from the received PT-OCT signals is not feasible. Although some theoretical models have been suggested for explaining how PT-OCT signals depend on MOI concentration, the key effects of sample and system influence parameters other than MOI concentration cannot be explained by the previous models. Another challenge of PT-OCT is its true specificity to a targeted MOI. That is, conventional PT-OCT is successful in depth-resolved sensing of absorption of PT light. Such absorption, however, is not unique to a single MOI, as other, non-targeted tissue constituents (aka pseudo-MOIs) may also absorb the PT light.
As there is no metric in conventional PT-OCT to distinguish between signals originating from MOIs and pseudo-MOIs, conventional PT-OCT images are not entirely specific to a given tissue constituent. Obtaining quantitative insight into the chemical composition of tissue is another key challenge of PT-OCT. Given the multifactorial nature of PT-OCT signals, several system parameters and sample properties, aside from the tissue absorption spectrum, also influence the acquired PT-OCT signals. Quantitative interpretation of tissue composition from PT-OCT signals, therefore, requires decoupling the influences not related to chemistry and absorption spectrum from the received signals. For example, for quantitative imaging of atherosclerotic plaques with PT-OCT, both system parameters (e.g., lipid pool distance to the system focal plane) and sample parameters (e.g., the thickness of the fibrous cap) significantly affect the strength of the acquired PT-OCT signal, in addition to lipid concentration. Developing strategies for signal processing in light of a comprehensive opto-thermo-mechanical theory is expected to enable more accurate, and perhaps quantitative, PT-OCT imaging of biological tissues. Lastly, due to the necessity of extended temporal sampling of responses, PT-OCT inherently suffers from low imaging speed. This limitation hinders translation of PT-OCT into the clinic. Attempts made to date to enhance the imaging speed of PT-OCT suffer from key limitations such as increased complexity and cost of the system, or a compromise in signal-to-noise ratio (SNR) and imaging depth in exchange for enhanced imaging speed. A clinically viable PT-OCT technology must be fast enough to capture images from moving samples (for example, cardiac tissues moving due to the heartbeat).

The abovementioned major challenges of PT-OCT have long held back the translation of the technique to the clinic. In this paper, we report and review the results of our recent studies aimed at addressing the key limitations of PT-OCT. First, we will review our theoretical models that relate system parameters, sample geometry, and opto-thermo-mechanical sample properties to PT-OCT signals. Such comprehensive models enable a deep understanding of the variables affecting PT-OCT signals and their relation to the chemical composition and structure of biological tissues. Next, we discuss spectroscopic approaches to PT-OCT imaging as a means of overcoming the detection specificity limitation. The idea of spectroscopic PT-OCT was first introduced by the group of Prof. Milner for the purpose of measuring blood oxygen saturation both in phantoms and in vivo [15, 17, 18]. The spectroscopic results offered and discussed in this manuscript, however, focus on specific detection of tissue constituents relevant to cardiology applications. The manuscript subsequently discusses how the findings of the discussed studies can be utilized in machine learning (ML)-based strategies for decoupling the effect of influence parameters from PT-OCT images, and thus gaining refined insight into tissue chemical composition. In this section, we demonstrate the ability of ML platforms to learn characteristic signal trends and features that are most correlated with tissue composition. To date, only a few efforts have been made to quantify MOI concentrations from PT-OCT signals, notably determining the concentration of dissolved oxygen in blood [18] or of ICG in water [19]. The design of experiments in these works, however, was disconnected from clinical scenarios, as all the sample and system influence parameters (other than MOI concentration) were kept constant.
This type of single-parameter quantification is not normally valid in the clinic because two or more influence parameters (e.g., surrounding medium, distance to the focal plane, depth of MOI, etc.) are expected to vary from one sample to another. To the best of our knowledge, the feasibility of classifying MOI concentrations with machine learning models has not yet been examined in the field of PT-OCT. Lastly, we will review a strategy that we recently proposed for enhancing the imaging speed of PT-OCT by orders of magnitude, which enables PT-OCT imaging of moving samples. We hope that the recent knowledge and technologies reviewed in this manuscript open the door to more accurate assessment of atherosclerotic plaques based on co-registered tomography of tissue structure and chemical composition.

2 Methodology

2.1 Principles of OCT

From a mathematical point of view, OCT relies on calculating the cross-correlation of back-reflected light with a reference light. This operation is normally carried out in OCT systems with a Michelson interferometer configuration. In spectral-domain (SD)-OCT systems (Fig. 1), the light source is normally a broadband coherent light source. The output light is divided by a beam splitter into two arms, called the reference and sample arms. In the sample arm, light is focused on the sample surface by an objective lens. In the reference arm, the light illuminates a fixed mirror that serves as the reference. After back-reflection from the sample/tissue and the reference mirror, the two beams are recombined in the beam splitter and redirected to a spectrometer, which records the interference pattern of the two beams as a function of wavelength. The intensity of the interference pattern at the spectrometer, I(k), can theoretically be expressed as [20]:

Fig. 1

Schematic of an SD-OCT system. The output light from the laser source illuminates the sample and the reference mirror after passing through the beam splitter. The back-reflected light from the sample and the mirror is merged and delivered to the spectrometer. In the spectrometer, the modulation frequency of the recorded spectrum correlates with the depth of layers in the sample. After applying Fourier transformation to the received signals, an A-line is obtained. The location of each pixel is formed within the coherence gate length (lc). For details, see the mathematical expressions in the main text

$$I\left(k\right)=S\left(k\right)\left({I}_{R}+\sum_{i=0}^{m}\left[{I}_{Si}+2\sqrt{{I}_{Si}{I}_{R}}\cos\left(2k\Delta {L}_{i}\right)\right]\right)$$
(1)

where S(k) is the spectral power of the light source, IR is the intensity of the reference arm, ISi is the intensity reflected from the ith surface of a sample with m reflective surfaces in depth, k is the wavenumber, and ΔLi is the optical path length (OPL) difference between the beams reflected from the ith surface and the reference (\(\Delta {L}_{i}={n}_{2}{L}_{2}-{n}_{1}{L}_{1}\), where n is the refractive index and L is the physical length). As such, the term 2kΔLi in Eq. 1 represents the relative phase of the light between the reference and sample beams. To form the OCT image based on the intensity of reflections from individual reflectors, the signals in k-space (wavenumber/Fourier space) need to be converted to z-space (optical path length space). Therefore, to obtain the depth profile of reflections in the sample (aka A-line), an inverse Fourier transformation is applied to the k-space data, which gives:

$$\left|i(z)\right|=\left|\gamma\left(z\right){I}_{R}+\sum_{i=0}^{m}\gamma(z)\otimes\left[{I}_{Si}+2\sqrt{{I}_{Si}{I}_{R}}\,\delta\left(z\pm 2\left({z}_{R}-{z}_{Si}\right)\right)\right]\right|$$
(2)

Here δ is the Dirac delta function, γ is the inverse Fourier transform of the spectral power S(k), and zR and zSi denote the positions of the reference mirror and the ith reflector, respectively. Equation 2 shows that an OCT A-line, i(z), is indeed the cross-correlation of the light source point spread function with the spatial distribution of reflectors/structures along the sample depth. Figure 1 schematically demonstrates the OCT signal processing procedure for a sample with three subsurface reflectors. As required by Eq. 1, as the depth of a reflective surface increases, the modulation frequency of the acquired interference spectrum (set by ΔLi through the cosine argument 2kΔLi) increases, resulting in spatial separation of signals along depth after inverse Fourier transformation. Eventually, by raster scanning the beam in one direction, a cross-sectional image (aka B-scan or tomogram) is formed. A volumetric 3D image (C-scan) of the sample can be made by stacking several parallel 2D tomograms/B-scans.
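As a rough numerical illustration of Eqs. 1 and 2, the sketch below simulates the interference spectrum of three reflectors and recovers their depths with an inverse FFT. It is only a sketch under stated assumptions: the Gaussian source spectrum, the wavenumber range, the reflectivities, the reflector depths, and all function names are illustrative choices that do not come from the setup described in this paper, and the spectrum is assumed to be already sampled uniformly in k.

```python
import numpy as np

def simulate_spectrum(k, source, i_ref, i_samples, delta_l):
    """Interference spectrum of Eq. 1 for reflectors at OPL differences delta_l (m)."""
    fringes = sum(
        i_s + 2.0 * np.sqrt(i_s * i_ref) * np.cos(2.0 * k * dl)
        for i_s, dl in zip(i_samples, delta_l)
    )
    return source * (i_ref + fringes)

def reconstruct_aline(k, spectrum, background):
    """Magnitude of the inverse FFT of the background-subtracted spectrum (Eq. 2)."""
    dk = k[1] - k[0]
    aline = np.abs(np.fft.ifft(spectrum - background))
    # cos(2 k dL) oscillates at dL/pi cycles per unit k, hence depth = pi * frequency
    depth = np.pi * np.fft.fftfreq(k.size, d=dk)
    half = k.size // 2                       # keep the positive-depth half only
    return depth[:half], aline[:half]

def find_reflectors(depth, aline, min_depth=30e-6, rel_threshold=0.3):
    """Local maxima beyond the DC region that exceed a relative threshold."""
    valid = depth[1:-1] > min_depth
    mid = aline[1:-1]
    floor = rel_threshold * mid[valid].max()
    is_peak = valid & (mid > aline[:-2]) & (mid >= aline[2:]) & (mid > floor)
    return depth[1:-1][is_peak]

# Assumed, illustrative parameters: 2048 samples uniform in k, spanning 1235-1385 nm
k = np.linspace(2 * np.pi / 1385e-9, 2 * np.pi / 1235e-9, 2048)
source = np.exp(-0.5 * ((k - k.mean()) / ((k[-1] - k[0]) / 6.0)) ** 2)  # Gaussian S(k)
true_depths = np.array([100e-6, 250e-6, 400e-6])   # assumed OPL differences (m)
reflectivities = [0.02, 0.01, 0.015]                # assumed I_Si, with I_R = 1

spectrum = simulate_spectrum(k, source, 1.0, reflectivities, true_depths)
depth, aline = reconstruct_aline(k, spectrum, background=source * 1.0)
found = find_reflectors(depth, aline)
print(np.round(found * 1e6, 1))   # recovered depths in micrometres
```

The recovered depths are quantized to the depth bin size, π/(N·Δk), so they match the simulated positions only to within about one bin. Deeper reflectors produce faster fringes in k and therefore land at larger depth indices, which is the separation mechanism described above.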

In OCT, phase images can also be formed from the phase of OCT signals; note the complex-valued nature of \(i(z)\) in Eq. 2. Phase images are intrinsically more sensitive to changes in OPL than amplitude images. The phase of the signals (2kΔLi in Eq. 1) can detect relative displacements on the order of tens of picometers to a few nanometers, while the detection limit for changes in OPL via the amplitude of the signals is directly linked to the system axial resolution (i.e., on the order of several micrometers). As such, in PT-OCT the structural information of the sample is captured via the signal amplitude, while insight into the light absorption behavior of the sample is deciphered from the phase signals.

2.2 Principles of PT-OCT

Figure 2 schematically illustrates the sequence of physical phenomena that lead to PT-OCT signals. In PT-OCT, the sample is simultaneously illuminated with the OCT and intensity-modulated PT lasers. In this configuration, at positions where the modulated PT laser is absorbed, a modulated temperature field (aka thermal-wave field or TW field) is established. This TW field is modulated over time at the modulation frequency of the PT laser source and is governed by the bio-heat conduction differential equation [12]:

Fig. 2

Schematic representation of the physical phenomena taking place in PT-OCT. In this system, the wavelength of the PT laser is selected within the absorption band of the molecule of interest in the sample. As a result of PT light absorption by the molecule of interest, a modulated thermal field is generated. The produced heat then causes a local change in the OPL near the molecule of interest that can be tracked by the OCT phase. For more details about the principle of PT-OCT, see text

$$\frac{\partial T}{\partial t}=\frac{{I}_{PT}{\mu }_{a}}{\rho c}+\alpha {\nabla }^{2}T$$
(3)

Here T is the temperature, t is time, µa is the absorption coefficient at PT laser wavelength, IPT is the PT power fluence rate, ρ is the medium density, c is the specific heat of the medium, and α is the thermal diffusivity of the medium. The consequence of presence of a TW field at and around an absorber is a change of the local refractive index and elastic deformation due to thermal expansion, and consequently, a net change of OPL [21]:

$$\Delta OPL\left(z\right)={OPL}_{{T}_{0}+\Delta T}-{OPL}_{{T}_{0}}={\int }_{0}^{z}\left(\left[n\left({T}_{0}\right)+\frac{dn}{dT}\Delta T\right]\bullet \left[1+\beta \Delta T\right]-n({T}_{0})\right)dZ$$
(4)

Here T0 is the initial temperature, n represents the refractive index, dn/dT stands for the thermo-optic coefficient, and β is the thermal expansion coefficient. The presence of the integral in Eq. 4 indicates that the absolute phase at a specific depth is the sum of phase changes from the sample surface to the interrogated depth. Numerically, \(\Delta OPL\) normally ranges from a few nanometers to a few hundred nanometers. Such small \(\Delta OPL\) variations, compared to the ~ 10 µm axial resolution of OCT, cannot shift the location of peaks in the A-line amplitude channel. The A-line phase signal, however, offers sufficient sensitivity to reveal the small variation in \(\Delta OPL\). Thus, upon absorption of modulated PT excitation by the MOI, the phase of the OCT signal changes as a consequence of the variation in the OPL according to the following equation [22]:

$$\Delta \phi =\frac{4\pi \,\Delta OPL}{{\lambda }_{0}}$$
(5)

where λ0 is the center wavelength of the OCT laser. Equation 5 suggests that if the amplitude of the PT laser is modulated in sinusoidal form at a specific frequency, the ensuing temperature field due to absorption of PT light by MOIs leads to modulation of OPL and \(\Delta \phi\) at the same sinusoidal frequency (Fig. 2). Therefore, by monitoring the phase over time, PT-OCT can obtain insight into the extent of absorption of PT light along the depth of samples.
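To make the chain from temperature rise to phase change concrete, the following sketch evaluates Eqs. 4 and 5 numerically for a simple depth profile of ΔT. The material values (PDMS-like n, dn/dT, and β), the 1 K temperature rise confined to the top 100 µm, and the function names are assumptions chosen for illustration only and are not measurements from this work.

```python
import numpy as np

def delta_opl(z, delta_t, n0, dn_dt, beta):
    """Cumulative OPL change of Eq. 4 along depth z (m) for a temperature rise delta_t(z) (K)."""
    integrand = (n0 + dn_dt * delta_t) * (1.0 + beta * delta_t) - n0
    dz = np.gradient(z)
    # Running (rectangle-rule) integral from the surface down to each depth
    return np.cumsum(integrand * dz)

def delta_phase(d_opl, lambda0):
    """Phase change of Eq. 5 in radians."""
    return 4.0 * np.pi * d_opl / lambda0

# Assumed, order-of-magnitude values (not taken from the paper)
z = np.linspace(0.0, 300e-6, 3001)            # depth axis, 0.1 um steps
delta_t = np.where(z < 100e-6, 1.0, 0.0)      # 1 K rise in the top 100 um only
n0, dn_dt, beta = 1.41, -4.5e-4, 3.0e-4       # refractive index, 1/K, 1/K
lambda0 = 1310e-9                             # OCT center wavelength (m)

d_opl = delta_opl(z, delta_t, n0, dn_dt, beta)
d_phi = delta_phase(d_opl, lambda0)
print(d_opl[-1] * 1e9, d_phi[-1])             # nm and rad at the deepest pixel
```

Two features of Eq. 4 show up in this sketch. First, the thermo-optic term (dn/dT, typically negative) and the expansion term (nβ) act in opposite directions, so the net ΔOPL can be much smaller than either contribution alone. Second, because of the integral, pixels below the heated layer carry the same accumulated ΔOPL as the bottom of that layer even though no heating occurs there.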

2.3 PT-OCT Instrumentation

Given the importance of phase stability in PT-OCT, systems are frequently based on the SD-OCT configuration. The schematic of our PT-OCT setup is shown in Fig. 3. In this setup, the OCT light source is a broadband superluminescent diode centered at 1310 nm (± 75 nm at 10 dB; Exalos, Switzerland). To capture the signals, a 2048-pixel line scan camera with a maximum acquisition rate of ~ 147 kHz (Wasatch Photonics; USA) is used. Three intensity-modulated, single-mode PT lasers are installed in the setup, emitting at 806 nm (30 mW; Thorlabs; USA), 1040 nm (500 mW; Innolume; Germany), and 1210 nm (500 mW; Innolume; Germany). An optical circulator (OC) directs the OCT light to the fiber coupler. The PT light is connected to the other input of the fiber coupler. By passing through the coupler, the OCT and PT light are merged and split into the reference and the sample beams. In the sample arm, the output light illuminates the sample surface after passing through a reflective beam collimator (Thorlabs; USA), a 2-DOF galvo mirror, and an objective lens (LSM02, Thorlabs; USA). The 2-DOF galvo mirror is used to raster scan the sample surface. The reference arm contains a polarization controller, a dispersion compensation (DC) block, and a gold-coated reference mirror. The back-reflected light from the two arms is merged again after passing through the beam splitter and redirected to the spectrometer by the optical circulator. The interference pattern formed in the spectrometer is captured by a line scan camera (LSC). The captured signal is digitized and sent to the computer for processing. The axial and lateral resolutions of the developed OCT system in air were measured as 10 µm and 11.5 µm, respectively.

Fig. 3

Schematic of the designed and developed PT-OCT setup including: OCT laser, optical circulator (OC), spectrometer (spec) and 2048-pixel line scan camera (LSC), photo-thermal laser (PT), 50:50 fiber coupler, polarization controller, dispersion compensation (DC) block, reference mirror, reflective beam collimator (RBC), 2 degree of freedom galvo mirrors, objective lens (OL), and computer

2.4 Signal Processing and Imaging Protocol

The flowchart of conventional PT-OCT signal processing is depicted in Fig. 4. To form the tomogram of the sample, in light of Eqs. 1 and 2, spectrometer signals need to be mapped to k-space (wavenumber space, \(2\times \pi /\lambda\)) before transformation to z-space (physical length). To do so, after the received signals are loaded on the PC, the captured background is subtracted from them. The background subtraction removes the intensity of the reference beam (term IR in Eq. 1). At the spectrometer, sampling along wavelength is uniform; however, when the information is mapped to wavenumber, the sampling interval is no longer constant. Therefore, the k-space signal is resampled to a uniform spacing in k before application of the Fourier transformation. Applying the inverse fast Fourier transformation (iFFT) to the signal in k-space converts it to z-space (Eq. 2). The amplitude of the complex numbers produced by the iFFT forms the OCT amplitude image. The phase of the complex numbers for each pixel over time (M-scan) is the basis for forming PT-OCT images. Due to bulk heating, the phase drifts over time. To remove this drift, the phase difference in the temporal direction is computed. Removing the signal drift allows more sensitive measurement of MOI-induced phase modulations, which are preserved by the difference operation. By applying the FFT to the phase signal, evaluating the response at the modulation frequency of the PT laser, and using Eq. 5 [23], the pixel amplitude value of the PT-OCT image is calculated as:

$$Amp\left(z\right)=\frac{\left|p(z,{f}_{0})\right|{\lambda }_{0}}{4{\pi }^{2}{f}_{0}\Delta t}$$
(6)
Fig. 4

The flowchart of data processing to form OCT and PT-OCT images from the raw data, scale bar = 60 μm

Here \(\left|p\right|\) is the normalized FFT amplitude of the temporal difference of the phase signal at the PT laser modulation frequency \({f}_{0}\), \({\lambda }_{0}\) is the center wavelength of the OCT laser, and \(\Delta t\) is the acquisition time for one A-line. The PT-OCT experiments discussed in this manuscript were carried out at an A-line rate of 21.6 kHz. Each M-scan consisted of 5000 data points acquired in ~ 230 ms. The PT laser was modulated sinusoidally at a desired frequency (e.g., 500 Hz).
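The phase-processing steps above (temporal difference, evaluation at the modulation frequency, and Eq. 6) can be sketched for a single pixel as follows. This is a minimal sketch rather than the processing code used in this work: the synthetic phase traces, the drift rate, the modulation depths, and the function name are assumptions, and a single-frequency DFT evaluated exactly at \({f}_{0}\) stands in for picking the corresponding FFT bin. Only the A-line rate, M-scan length, modulation frequency, and center wavelength follow the values quoted in the text.

```python
import numpy as np

def ptoct_amplitude(phase, f0, dt, lambda0):
    """Evaluate Eq. 6 for one pixel from its M-scan phase trace (radians)."""
    dphi = np.diff(np.unwrap(phase))          # temporal difference suppresses slow drift
    n = np.arange(dphi.size)
    # Single-frequency DFT at f0, scaled so that a sinusoid of amplitude a gives |p| ~ a
    p = 2.0 / dphi.size * np.sum(dphi * np.exp(-2j * np.pi * f0 * n * dt))
    return np.abs(p) * lambda0 / (4.0 * np.pi**2 * f0 * dt)

# Acquisition settings quoted in the text
line_rate, n_points, f0, lambda0 = 21.6e3, 5000, 500.0, 1310e-9
dt = 1.0 / line_rate
t = np.arange(n_points) * dt

# Assumed synthetic phase traces: slow drift with and without PT modulation
drift = 2.0 * t                                         # rad, stands in for bulk heating
absorber = drift + 0.05 * np.sin(2 * np.pi * f0 * t)
strong_absorber = drift + 0.10 * np.sin(2 * np.pi * f0 * t)
background = drift

amp_abs = ptoct_amplitude(absorber, f0, dt, lambda0)
amp_strong = ptoct_amplitude(strong_absorber, f0, dt, lambda0)
amp_bg = ptoct_amplitude(background, f0, dt, lambda0)
print(amp_abs, amp_strong, amp_bg)                      # metres
```

In this synthetic case the output scales linearly with the phase modulation depth, while the drifting but unmodulated pixel yields a value that is orders of magnitude smaller, because the difference operation turns the linear drift into a constant offset that carries almost no energy at \({f}_{0}\).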

2.5 Samples

Phantom experiments presented and reviewed in this manuscript were acquired from 2 categories of controlled phantoms: PDMS-based phantoms, and biological phantoms. Additional results from biological tissues are also offered and discussed.

To evaluate the efficacy of developed theoretical models, PDMS-based phantoms were made using 806 nm dye absorber (Sigma Aldrich, USA) and titanium oxide scattering powder to achieve absorber concentration of 2.2 mg/ml. The details of preparation process and sample properties can be found in Ref [24].

Biological phantoms consisted of dominant constituents of cardiac tissues such as water, lipid, collagen, and elastin. These phantoms were used for evaluating the enhancement in detection specificity of PT-OCT with spectroscopic methods. Mayonnaise (Kraft) was chosen as a rich source of lipid (> 80%) because its primary lipid composition is similar to that of coronary plaques [25,26,27]. To prepare various dilutions of the lipid-water compound, droplets of mayo and ultrasound gel with appropriate weight ratios were dispensed in a petri dish and stirred well to make homogeneous samples at various weight concentrations of mayo. To make water-based tissue-like phantoms, agar powder (Sigma Aldrich, USA) was dissolved in water to make a solution with a water weight ratio of 98%. The solution was then heated and stirred well for 5 min until the water was boiling. Afterward, the solution was poured into a petri dish and put in a refrigerator (4 °C) to solidify. To make two-layer samples, a wedge of the agar-water sample was placed on a mayonnaise substrate. The collagen sample was obtained from chicken cartilage as a rich source of collagen [28]. A fresh chicken leg was cut at the knee joint, the tendons above the cartilage were removed, and the sample was washed with phosphate-buffered saline (PBS). The tissue was placed on a glass slide at room temperature (24 °C) for imaging with the system. The elastin foam sample was made with elastin from bovine neck ligament (Sigma Aldrich, USA). After the elastin was dissolved in 0.05% acetic acid solution and frozen at − 20 °C for 24 h, the sample was sublimated to make a foam scaffold of elastin.

A piece of fresh human aorta tissue was procured from the National Disease Research Interchange (NDRI; PA, USA), belonging to a woman (88 years old) who died of cardiac complications. The fresh sample was shipped to our lab at York University within 24 h after tissue collection. The acquisition of the human sample and the preparation steps were carried out at the Hybrid Biomedical Optics (HBO) lab under ethics protocols approved by York University (e2020-234 and e2020-250). After unpacking and washing the aorta sample with PBS, a longitudinal cut was made with dissection scissors to make a rectangular flat tissue from the pipe-shaped aorta. This sample was then placed on a glass slide and imaged by the PT-OCT system at room temperature. During imaging, 10 cc of PBS was poured on the sample every 15 min to prevent tissue dehydration.

2.6 The Machine Learning Model

Given the significance of lipid concentration to the vulnerability of atherosclerotic plaques [29], an ML model was designed for labeling/classifying the lipid composition of pixels in a PT-OCT dataset. To do so, the influence parameters contributing significantly to the PT-OCT signals were first identified (aka the feature selection step). Based on the findings of theoretical modeling and the associated experimental validations, the following parameters of an acquired PT-OCT dataset were deemed to be of significant relevance to the ML model:

  • the distance from the selected pixel to the focal plane (Df in Fig. 5(a)).

  • the distance of the selected pixel to the top surface of the sample (Dtop in Fig. 5(a)).

  • the gray level of the selected pixel in the OCT image (Grpix in Fig. 5(a)).

  • the average of gray level of the pixels above the selected pixel (Grline in Fig. 5(a)).

  • the amplitude(s) of pixel’s PT-OCT signal (PTAmp).

Fig. 5

(a) An illustration of the spatial parameters of a single pixel in an OCT image that are deemed of significance to proper predictions by the ML model. These parameters provide the ML model with information about the strength of the pixel’s OCT signal, the depth of the selected pixel in the sample, the distance of the selected pixel relative to the focal plane, and the scattering of the medium above the selected pixel. (b) An illustration of the designed classifier model. The five significant inputs are extracted from the raw OCT and PT-OCT images and are sent to the SVM classifier model. The model is trained for labeling pixels in PT-OCT images into three bins, labeled as the “low,” “mid,” and “high” classes

The information provided to the model by the above parameters was expected to offer the model an estimation of the medium’s light attenuation properties, the roll-off of OCT, PT-OCT signal sensitivity with depth, and the strength of PT light absorption by the MOI. Using this information, the model was designed to classify a given pixel into one of the following groups/classes of lipid concentration: low (~ [0–30% lipid]), medium (~ [30–70% lipid]), and high (~ [70–100% lipid]).
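The five per-pixel inputs listed above and the three concentration classes can be gathered along the following lines. This is a hedged Python sketch, not the MATLAB code used in this work: the function names, the image layout (rows as depth, columns as lateral position), the signed definition of Df, and the choice to average the gray level between the detected surface and the pixel for Grline are all assumptions made for illustration.

```python
import numpy as np

def detect_surface(oct_img, threshold):
    """First row in each column whose gray level exceeds the threshold.

    Note: np.argmax returns 0 for a column in which no pixel exceeds the threshold.
    """
    return np.argmax(oct_img > threshold, axis=0)

def pixel_features(oct_img, pt_img, row, col, focal_row, surface_rows, dz):
    """Return [Df, Dtop, Grpix, Grline, PTAmp] for one pixel of a B-scan."""
    top = surface_rows[col]
    d_f = (row - focal_row) * dz              # signed distance to the focal plane
    d_top = (row - top) * dz                  # depth below the sample surface
    gr_pix = oct_img[row, col]                # OCT gray level of the pixel
    above = oct_img[top:row, col]             # pixels between the surface and the pixel
    gr_line = above.mean() if above.size else gr_pix
    return np.array([d_f, d_top, gr_pix, gr_line, pt_img[row, col]])

def lipid_class(lipid_percent):
    """Map lipid concentration (%) to a class index: 0 low, 1 medium, 2 high."""
    return int(np.digitize(lipid_percent, [30.0, 70.0]))

# Tiny synthetic B-scan (assumed data) with the sample surface at row 2
oct_img = np.zeros((8, 3))
oct_img[2:, :] = 0.5
oct_img[5, 1] = 0.9
pt_img = np.zeros((8, 3))
pt_img[5, 1] = 12.0

surface = detect_surface(oct_img, threshold=0.25)
features = pixel_features(oct_img, pt_img, row=5, col=1,
                          focal_row=4, surface_rows=surface, dz=10e-6)
print(features, lipid_class(85.0))
```

Keeping the class boundaries in a single helper makes it easy to change the bin edges (here 30% and 70%) without touching the feature extraction.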

The values of these parameters for each pixel were extracted from OCT and PT-OCT B-mode images via code developed in MATLAB. First, the level of the focal plane in the OCT images was entered manually in the code. Then, the top edges of the sample in the OCT image were detected automatically by a thresholding method. Finally, the values of the 5 features for each pixel were extracted and listed in a database amounting to ~ 27,000 pixels. To train the model, 90% of these datapoints (≃ 24,000) were selected randomly. To train the SVM model, 95% of the training dataset for each class was used for training and the remaining 5% was treated as outliers. This outlier removal strategy enables the model to find better support vectors in the data, leading to higher precision in drawing the class boundaries. A polynomial kernel was used for the SVM model to allow learning of the non-linear behaviors that are intrinsic to PT-OCT responses [30]. To analyze generalization on the training dataset and to optimize the trained model, k-fold cross-validation (here, k = 10) was performed on the trained model. The cross-validation process is repeated k times, so that each of the k subsets is used exactly once as the validation data. These k training results can then be averaged to produce a single, optimized model. The advantage of this method over random subsampling is that all data points in the dataset are used for both training and validation. This enables us to train a model that generalizes better.
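The training protocol described above (a 90% training split, a polynomial-kernel SVM, and 10-fold cross-validation) could look roughly like the following in Python with scikit-learn. This is a sketch under stated assumptions and not the MATLAB implementation used in this work: the synthetic features and their dependence on lipid content are invented for illustration, the polynomial degree and feature scaling are arbitrary choices, and the 5% per-class outlier step is not reproduced. Here cross-validation only estimates generalization, and the final model is refit on the full training split.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the pixel database (assumed, for illustration only)
rng = np.random.default_rng(0)
n = 600
lipid = rng.uniform(0.0, 100.0, n)                 # lipid concentration (%)
d_f = rng.uniform(-0.5, 0.5, n)                    # distance to focal plane (a.u.)
d_top = rng.uniform(0.0, 1.0, n)                   # depth below surface (a.u.)
gr_pix = rng.uniform(0.2, 1.0, n)                  # OCT gray level of the pixel
gr_line = rng.uniform(0.2, 1.0, n)                 # mean gray level above the pixel
# Hypothetical signal model: PT amplitude falls with depth and defocus
pt_amp = lipid * np.exp(-d_top) / (1.0 + 4.0 * d_f**2) + rng.normal(0.0, 2.0, n)

X = np.column_stack([d_f, d_top, gr_pix, gr_line, pt_amp])
y = np.digitize(lipid, [30.0, 70.0])               # 0 low, 1 medium, 2 high

# 90% of the data for training, 10% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0)

# Polynomial-kernel SVM; features are standardized first
model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, coef0=1.0))

# 10-fold cross-validation on the training split
cv_scores = cross_val_score(model, X_train, y_train, cv=10)

model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(cv_scores.mean(), test_accuracy)
```

The spread of `cv_scores` across folds gives a quick indication of how sensitive the classifier is to the particular training subset, which is the generalization check that the k-fold procedure is meant to provide.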

3 Results and Discussion

3.1 Theoretical Models

To address some of the limitations of current theoretical PT-OCT models, two new models were proposed. The first model took a first step in understanding the relation between composition and PT-OCT signals by considering the interplay between the opto-thermo-physical properties of tissue as a function of its composition [30, 31]. This work offered a theoretical formulation for estimating the PT-OCT response in a two-component tissue-like sample. Validation of the proposed model was carried out using experimental PT-OCT results of mayonnaise (mayo)-ultrasound gel mixtures at various component ratios. Mayo was chosen to mimic the lipid-rich necrotic-core material present in atheromatous coronary atherosclerotic lesions [25, 32]. Mayo, indeed, is primarily composed of lipids, which provide an absorption signature that can be targeted with PT-OCT, and its lipid composition is similar to that of atherosclerotic plaques [25, 32]. In this model, a solution for the bio-heat equation (Eq. 3) was presented as a function of concentration-dependent material properties as follows [30]:

$$\Delta T\left(\Psi \right)=\frac{2P{t}_{L}}{\pi {W}_{f}^{2}}\times \frac{{\mu }_{mean}}{{\rho }_{mean}{c}_{mean}}\times \left(1+\left[{\mu }_{C}-{\rho }_{C}-{c}_{C}\right]\Psi +{M\left({\mu }_{C}, {\rho }_{C}, {c}_{C}\right)\Psi }^{2}+\dots \right)$$
(7)

where Ψ is the relative concentration with respect to a 1:1 mixture, as follows:

$$\zeta =\left({\zeta }_{2}-{\zeta }_{1}\right)\Psi +\frac{\left({\zeta }_{2}+{\zeta }_{1}\right)}{2}={\zeta }_{range}\Psi +{\zeta }_{mean},\quad -0.5\le \Psi \le 0.5,\quad \text{and}\quad {\zeta }_{C}=\frac{{\zeta }_{range}}{{\zeta }_{mean}}$$
(8)

Here \(\zeta\) is the concentration-dependent material parameter (e.g., \(\rho\)) and \({\zeta }_{1}\), \({\zeta }_{2}\) are the values of the material parameters of pure individual components, respectively. \({\zeta }_{C}\) is the contrast of the material property in the two-component mixture which determines the sensitivity of the parameter to variation of concentration. Eventually, the OPL can be expressed as a function of relative concentration by:

$$\Delta OPL(\Psi )=\underset{{L}_{0}}{\overset{{L}_{1}}{\int }}n\left(T,\Psi \right)dl=\left(n\left(\Psi \right)+\frac{dn}{dT}\left(\Psi \right)\Delta T\left(\Psi \right)\right)\left({L}_{0}+{L}_{0}\beta \left(\Psi \right)\Delta T\left(\Psi \right)\right)-n(\Psi ){L}_{0}$$
(9)

Here, T0 is the initial temperature of the sample, dn/dT is the thermo-optic coefficient, and β is its linear thermal expansion coefficient. Figure 6 depicts theoretical/model and experimental results for a lipid-water mixture at various mixture ratios. As seen, the results obtained from the model agree well with the experimental signals, demonstrating a non-linear response. The previous models, however, predicted a linear dependence of the PT-OCT signal on MOI concentration, as a consequence of neglecting the changes in material physical properties (e.g., density and heat capacity) that occur when the MOI concentration changes. The more accurate prediction of the proposed model becomes especially important when the overarching goal is quantification of MOI concentration from the acquired PT-OCT signals. For instance, the composition and lipid content of atherosclerotic plaques play a critical role in determining the propensity of a plaque to rupture [33], and the developed model offers the prospect of leveraging the same contrast mechanism as that used by near-infrared spectroscopy [34] and photoacoustic imaging [29, 35, 36] (i.e., absorption of lipid) for plaque composition imaging, albeit with much finer resolution. Another major outcome of the developed model was shedding light on the reason behind the relatively small PT-OCT signals from water, which moderately absorbs the commonly used 1210 nm PT laser. The new theory showed that the large heat capacity of water (i.e., the main constituent of biological tissues) acts against the rise of the temperature, limiting the induced \(\Delta OPL\) and consequently the PT-OCT signals. The significance of these findings is that they opened the door to correlating MOI concentration with PT-OCT signals. The details of this study can be found in our published paper [30].
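The origin of the non-linear concentration dependence can be illustrated numerically by comparing the exact ratio μ/(ρc) of the mixed properties (Eq. 8) with the expansion of Eq. 7 truncated after its linear term. The sketch below is illustrative only: the water-like and lipid-like property values and the function names are assumptions, not the values used in Ref. [30].

```python
import numpy as np

def mix(zeta1, zeta2, psi):
    """Eq. 8: linear mixing of a material property, psi in [-0.5, 0.5]."""
    return (zeta2 - zeta1) * psi + 0.5 * (zeta1 + zeta2)

def contrast(zeta1, zeta2):
    """Eq. 8: zeta_C = zeta_range / zeta_mean."""
    return (zeta2 - zeta1) / (0.5 * (zeta1 + zeta2))

def thermal_factor(psi, mu, rho, c):
    """Concentration-dependent factor mu/(rho*c) that scales Delta T in Eq. 7."""
    return mix(*mu, psi) / (mix(*rho, psi) * mix(*c, psi))

def thermal_factor_first_order(psi, mu, rho, c):
    """Eq. 7 truncated after the term that is linear in psi."""
    mean = thermal_factor(0.0, mu, rho, c)
    slope = contrast(*mu) - contrast(*rho) - contrast(*c)
    return mean * (1.0 + slope * psi)

# Assumed properties: component 1 water-like, component 2 lipid-like
mu = (1.0e2, 1.7e2)       # absorption coefficient at the PT wavelength, 1/m
rho = (1000.0, 900.0)     # density, kg/m^3
c = (4180.0, 2000.0)      # specific heat, J/(kg K)

psi = np.linspace(-0.5, 0.5, 11)    # -0.5: pure component 1, +0.5: pure component 2
exact = thermal_factor(psi, mu, rho, c)
linear = thermal_factor_first_order(psi, mu, rho, c)
print(np.round(exact / linear, 3))  # ratio departs from 1 away from the 1:1 mixture
```

With these assumed values, the first-order expression matches the exact ratio at the 1:1 mixture and drifts away from it toward either pure component, which is the kind of departure that the higher-order terms of Eq. 7 account for. The lower heat capacity of the lipid-like component also makes the thermal factor grow faster than the absorption coefficient alone would suggest.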

Fig. 6

The experimental (*) and simulated PT-OCT signals (previous and proposed model) as a function of concentration of MOI (here, lipid)

Expanding on the findings of the first theoretical model, a comprehensive model was then proposed for prediction of PT-OCT signals in heterogeneous multi-layer samples, considering the 3D distribution of opto-thermo-mechanical properties [24]. This model had a serial hierarchy of three blocks: predicting the PT laser light field, determining the thermal-wave field induced upon absorption of PT light by MOIs, and evaluating the subsequent thermo-mechanical expansion field due to the induced temperature change in the sample. The model was validated under various conditions of light attenuation, size of the induced thermal field in the sample, and mechanical and thermal boundary conditions. The key differentiator of this model from previously reported models was the inclusion of the mechanical properties of the material (e.g., Poisson’s ratio) and the heat flux between sample layers [9, 23, 37], both of which significantly affect the PT-OCT signals. For instance, Fig. 7(a) depicts the experimental signals and model predictions at different depths of a PDMS-based sample with different top layers. When the top layer is air, the PDMS sample can freely expand upward, resulting in larger measured and predicted PT-OCT signal values at all depths. When the top layer is glass, however, both the mechanical and thermal conditions at the boundary change, resulting in smaller measured and predicted PT-OCT values. As such, the intercept value in Fig. 7(a) can be used as a measure of the rigidity of the overlayer on a target MOI, which is of significant value for vulnerability assessment of the fibrous cap overlayer in atherosclerosis. Figure 7(b) depicts the significance of considering the mechanical properties of the sample (e.g., Poisson’s ratio) in predicting PT-OCT responses. Details of the proposed comprehensive model can be found in Ref. [24]. The comprehensive 3D model opened the door for in-depth understanding of the sample and system parameters that influence PT-OCT signals.
The knowledge we obtained from the modeling endeavors was subsequently used for developing experimentation and machine learning strategies for specific detection of molecules of interest and their classification based on concentration. Given the significance of atherosclerosis in cardiology, our developed strategies were mostly geared toward specific detection and quantification of atherosclerotic plaque constituents such as lipid.

Fig. 7

(a) The simulated and experimental signals versus depth of the PDMS sample, obtained at different mechanical and thermal boundary conditions; (b) the effect of consideration of Poisson’s ratio on simulated PT-OCT signals. Without involving the Poisson’s ratio in the calculations, the simulated signals significantly underestimate

3.2 Spectroscopic PT-OCT for Detecting Tissue Compositions

To enhance specificity in detection of MOIs, spectroscopic PT-OCT was proposed. Here, guided by the absorption spectra of MOIs and pseudo-MOIs, two (or more) PT wavelengths are integrated into the system. PT-OCT experiments are carried out sequentially at the selected PT wavelengths, and the results are interpreted against the known characteristic absorption signatures of MOIs to produce depth-resolved maps of the distribution of MOIs within the OCT structural images. For example, based on the absorption spectra of the principal chemicals of cardiac tissues, PT lasers at 1040 nm and 1210 nm can be used for enhancing the detection performance [29]. Exciting lipid with the 1210 nm PT laser is expected to produce a strong PT-OCT signal, while PT-OCT signals from water, elastin, and collagen at this excitation are expected to be moderate. PT excitation at 1040 nm, on the other hand, is expected to yield a moderate PT-OCT signal for elastin and collagen while producing minimal PT-OCT signal for lipid and water. As such, interrogation of PT-OCT signals at these PT laser wavelengths enables depth-resolved insight into the chemical composition of tissues with higher specificity.
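The qualitative interpretation rule described above can be sketched as a small lookup table. This is only an illustration, not the processing pipeline used in the cited works: the ordinal levels ("minimal", "moderate", "strong") are transcribed from the expectations stated in the text, while the helper name `candidate_constituents` and the exact-match rule are assumptions made for this example.

```python
# Expected PT-OCT response level of each constituent at the two PT wavelengths (nm),
# transcribed from the qualitative expectations stated in the text.
EXPECTED_RESPONSE = {
    "lipid":    {1040: "minimal",  1210: "strong"},
    "water":    {1040: "minimal",  1210: "moderate"},
    "collagen": {1040: "moderate", 1210: "moderate"},
    "elastin":  {1040: "moderate", 1210: "moderate"},
}


def candidate_constituents(level_1040, level_1210):
    """Hypothetical helper: list constituents whose expected dual-wavelength
    signature matches the observed response levels exactly."""
    observed = {1040: level_1040, 1210: level_1210}
    return sorted(
        name for name, signature in EXPECTED_RESPONSE.items()
        if signature == observed
    )


lipid_like = candidate_constituents("minimal", "strong")
fibrous_like = candidate_constituents("moderate", "moderate")
```

In this sketch collagen and elastin share the same signature, which reflects that the 1040/1210 nm pair alone is not expected to separate them from each other.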

The OCT B-mode and PT-OCT B-mode images of the standard phantoms are depicted in Fig. 8. The PT-OCT B-mode images were captured in three conditions: when both PT lasers were off (baseline), when only the 1040 nm PT laser was on, and when only the 1210 nm PT laser was on. Comparing the dual-wavelength PT-OCT responses of the phantoms with the baseline allows us to study the characteristic absorption signatures of the samples. For example, both the water-based agarose (Fig. 8(a.1) to (a.4)) and lipid (Fig. 8(b.1) to (b.4)) samples absorbed the PT laser at 1210 nm but did not absorb at 1040 nm. These dual-wavelength PT-OCT images of water and lipid are consistent with the known absorption spectra of lipid and water at these two wavelengths [29]. Next, to compare the amplitudes of the signals acquired from water and lipid, a two-layer sample consisting of a lipid substrate under a wedge of water-agarose was imaged (see Fig. 8(c.1) to (c.4)). Although both water and lipid generate PT-OCT signals at 1210 nm, the amplitude of the lipid signal is much greater than that of water. The reason can be found in their material properties: both the optical (e.g., absorption coefficient) and thermal (e.g., specific heat and diffusivity) properties of lipid contribute to the generation of stronger thermal fields in lipid than in water. Similarly, the results for collagen (chicken cartilage) in Fig. 8(d.1)–(d.4) and elastin foam in Fig. 8(e.1)–(e.4) show moderate absorption at both PT wavelengths, which is consistent with expectations from the reference absorption spectra [29]. The results depicted in Fig. 8 can be seen as the first PT-OCT signature table of MOIs in atherosclerosis.

Fig. 8

The results of spectroscopic PT-OCT imaging of the controlled phantoms. The OCT and the PT-OCT images from (a) water, (b) lipid, (c) two-layer water–lipid sample, (d) cartilage/collagen, and (e) elastin samples. By comparing the received signals at 1040 and 1210 nm with the PT:off mode, it is seen that there is great consistency between the absorption spectra of these materials [29] and the received PT-OCT signals, scale bar = 80 μm

Figure 9 shows images corresponding to a healthy human aorta sample. The histology image and the corresponding OCT image demonstrate a high degree of similarity (i.e., a single-layer structure). The histology image (Fig. 9(a)), however, shows an accumulation of collagen/elastin in this sample (yellow and green for collagen and purple for elastin with Movat stain). The obtained PT-OCT responses are consistent with the presence of collagen/elastin in the fresh tissue sample; comparison of the signals acquired at 1040 nm and 1210 nm with the baseline (Fig. 9(c)) suggests moderate light absorption at 1040 nm (Fig. 9(d)) due to absorption by collagen and elastin, and stronger light absorption at 1210 nm (Fig. 9(e)) due to absorption by collagen, elastin, and water in this fresh tissue. The obtained dual-wavelength PT-OCT results demonstrate the potential for enhancement of detection specificity using spectroscopic approaches. Nevertheless, accurate interpretation of spectroscopic datasets can be challenging because several system and sample parameters, in addition to the absorption spectra of tissue constituents, contribute to the magnitudes of the acquired PT-OCT responses.

Fig. 9

The results of spectroscopic PT-OCT imaging of a normal/healthy human aorta sample: (a) histology, (b) OCT tomogram, (c) PT-OCT baseline, (d) PT-OCT at 1040 nm, and (e) PT-OCT at 1210 nm. Comparison of the signals acquired at 1040 nm and 1210 nm with the baseline (c) suggests moderate light absorption at 1040 nm (d) due to absorption by collagen and elastin, and stronger light absorption at 1210 nm (e) due to absorption by collagen, elastin, and water in this fresh tissue. These preliminary and promising results demonstrate the potential of spectroscopic PT-OCT as a new method for digitized histology; scale bar = 200 μm

3.3 Machine Learning Model for Classifying Concentration of MOI

The above theoretical studies and experiments speak to the highly multifactorial nature of PT-OCT signals. As discussed in Sect. 3.1, the PT-OCT signal depends on various system and sample parameters that make the signal a highly non-linear function of the input parameters; these non-linearities interact with each other and often increase the order of non-linearity. Analytic inversion of such multifactorial problems is inherently challenging and noise sensitive. Machine learning (ML) has demonstrated, time and again, that it offers a compelling solution to this type of multifactorial and non-linear problem. As such, our next work focused on developing a machine learning model that can learn the multifactorial aspects of PT-OCT signals in order to decouple the effects of influence parameters not linked to MOI concentration (e.g., MOI depth or distance to the focal plane) and ultimately predict/classify the concentration of desired MOIs. A key idea behind this work was to design the model and the training database based on the outcomes of the theoretical models in order to enhance the prediction performance of the machine learning model (see Sect. 2.6). After model development and training, the generalization ability of the classification model was evaluated on the unseen test dataset both qualitatively and quantitatively. The concentrations of lipid in the unseen datasets were 10, 50, and 95%. To visualize the prediction performance of the model, ground truth labels were assigned to each pixel based on the known lipid concentrations. To analyze the performance of the model, the confusion matrix over the lipid concentration classes was also calculated, which allowed us to compare the classification accuracy in each class.
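As a concrete illustration of the quantitative evaluation step, the sketch below builds a confusion matrix from per-pixel labels and derives a per-class accuracy. It is a minimal example under stated assumptions: the per-class accuracy is taken here to be the diagonal entry divided by the row sum, which may differ from the exact definition used in the original study, and the toy labels are made up rather than taken from the reported data.

```python
from collections import Counter

CLASSES = ("low", "mid", "high")


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Rows are true classes, columns are predicted classes (pixel counts)."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_accuracy(matrix, classes=CLASSES):
    """Fraction of pixels of each true class that received the correct label
    (assumed definition: diagonal entry divided by the row sum)."""
    result = {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])
        result[name] = matrix[i][i] / row_total if row_total else float("nan")
    return result


# Toy labels for six pixels (made-up values, not data from the study).
truth = ["low", "low", "mid", "mid", "high", "high"]
pred = ["low", "mid", "mid", "mid", "high", "high"]

cm = confusion_matrix(truth, pred)
acc = per_class_accuracy(cm)
```

Here one of the two "low" pixels is mislabeled as "mid", so the low-class accuracy is 0.5 while the other two classes are classified perfectly.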

Figure 10(a)–(c) depict the spatial distribution of the ground truth information (i.e., the known concentrations of the unseen phantoms) using three different colors. The 10, 50, and 95% concentrations of the unseen test phantoms fall into the three classes defined for the trained model (low: 0–30%, mid: 30–70%, and high: 70–100% concentration), respectively. The classification results of the model on the test dataset are depicted in Fig. 10(d)–(f). The color coding used to label the pixels is the same as that of the ground truth labels. The classification performance of the model can be assessed qualitatively by comparing the predictions against the ground truth. This comparison indicates that the model correctly labeled most of the pixels. However, as seen in Fig. 10(d)–(f), the classification accuracy is lower in deeper regions. The reason for this discrepancy is the significant attenuation of OCT signals with depth in highly scattering media, which results in a dramatic decrease in SNR and, in turn, a dramatic increase in the noise of the phase signals. Misclassifications also take place at pixels close to the surface because the PT-OCT signals have not accumulated enough at such shallow depths to be measurable.

Fig. 10

The ground truth of the samples with lipid concentrations of (a) 10%, (b) 50%, and (c) 95%, and the prediction results for the samples with lipid concentrations of (d) 10%, (e) 50%, and (f) 95%. The green windows show a region with good prediction accuracy. In the deeper regions, due to significant attenuation of signals and poor SNR, the accuracy of predictions is diminished; scale bar = 180 μm

To justify using the ML approach, we compared the classification results of the ML approach against those of a conventional thresholding method. In the thresholding method, classification is based on the pixel amplitudes in the PT-OCT images. The class boundaries of the thresholding method were defined based on the distributions of pixel intensities for all samples, Fig. 11(a). The histogram plots show that while the high-class pixel values can clearly be distinguished from those of other concentrations, distinguishing the low and mid concentration classes is challenging. To create a fair comparison between ML and thresholding, the threshold of the most trivial class (the high class) was first selected such that it results in the same accuracy as that of the ML method. The classification accuracy values were obtained from the confusion matrix. Based on this approach, PT-OCT signal values greater than the threshold value of 9 nm were assigned to the high concentration class. Next, to define the thresholds for the low and mid classes, the histograms of pixel amplitudes were plotted in the range of 0 to 9 nm (see the inset of Fig. 11(a)). As seen, the distributions of pixels for the low and mid classes overlap significantly; as such, the threshold values of the remaining classes were defined as: 0 \(\le\) low < 4.5 nm, 4.5 \(\le\) mid < 9 nm. The classification results of the thresholding method for the test samples (10, 50, and 95%) are depicted in Fig. 11(b). Qualitative analysis of the results suggests that the classification performance for the high class is good (as expected). However, for the 50% and 10% phantoms, the classification appears to be random and inaccurate. The results of Fig. 11 highlight the limitation of classic thresholding for classification of non-linear and multifactorial signals.
Qualitative comparison of the ML and thresholding classifications suggests that the ML model classifies more accurately in the low and mid concentration classes because the trained ML model has learned how to compensate pixel signal values in light of the 5 influence parameters shown in Fig. 5(b).
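The one-parameter thresholding baseline described above amounts to the following rule. The class boundaries (4.5 nm and 9 nm) come from the text; the function name `threshold_class` is hypothetical, and assigning a value of exactly 9 nm to the high class is an assumption, since the text leaves that boundary case open.

```python
MID_THRESHOLD_NM = 4.5   # boundary between the "low" and "mid" classes (from the text)
HIGH_THRESHOLD_NM = 9.0  # boundary between the "mid" and "high" classes (from the text)


def threshold_class(signal_nm):
    """Classify one pixel from its PT-OCT signal amplitude alone (in nm)."""
    if signal_nm < 0:
        raise ValueError("PT-OCT signal amplitude must be non-negative")
    if signal_nm < MID_THRESHOLD_NM:
        return "low"
    if signal_nm < HIGH_THRESHOLD_NM:
        return "mid"
    # Assumption: a value of exactly 9 nm is treated as "high".
    return "high"


labels = [threshold_class(a) for a in (1.2, 4.5, 8.9, 12.0)]
```

Because the rule looks only at amplitude, two pixels with the same concentration but different depths or distances to the focal plane can land in different classes, which is the limitation the ML model is meant to address.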

Fig. 11

The results of the conventional 1-parameter thresholding method. (a) Histogram of pixel amplitude in PT-OCT images for samples with various concentrations of lipid; inset shows the histogram of the training dataset with finer intervals, (b) the results of classification with the thresholding approach for the samples with 95, 50, and 10% lipid. This classification approach offers poor accuracy, especially at mid and low classes; scale bar = 150 μm

To analyze the performance of the two classification approaches quantitatively, the confusion matrices of predictions versus ground truth on the test data (the unseen data) were calculated, Fig. 12. The two confusion matrices were obtained from the data selected by the green windows in Fig. 10(d) to (f), where PT-OCT signals are measurable. The accuracy in classification of the “high” class is approximately equal in both approaches (~91%). However, the accuracies for the “low” and “mid” classes with the ML approach are significantly greater than those of the thresholding method (Low: 64.4% vs. 43.8%; Mid: 68.2% vs. 36.6%). That is, classification of low and mid concentrations via the ML approach yielded relative improvements in accuracy of 47% and 86.3%, respectively. It is worth pointing out that the accuracy of thresholding for the mid class is only slightly better than that of an arbitrary/random classification as, from a statistical point of view, the probability of correctly classifying a pixel into one of 3 bins is 1/3 or 33.3%.
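The improvement figures quoted above are relative to the thresholding accuracy, not differences in percentage points. A short check of the arithmetic, using only the accuracies reported in the text (the function name is chosen for illustration):

```python
def relative_improvement(new_accuracy, baseline_accuracy):
    """Relative gain of `new_accuracy` over `baseline_accuracy`, in percent."""
    return (new_accuracy - baseline_accuracy) / baseline_accuracy * 100.0


# Accuracies in percent, as reported for the ML and thresholding approaches.
low_gain = relative_improvement(64.4, 43.8)   # low class
mid_gain = relative_improvement(68.2, 36.6)   # mid class

print(f"low: {low_gain:.1f}%  mid: {mid_gain:.1f}%")
```

In absolute terms these correspond to gains of 20.6 and 31.6 percentage points for the low and mid classes, respectively.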

Fig. 12

Comparison of the performance of the SVM model and conventional thresholding. Confusion matrix results between the true and predicted classes with the SVM model (left) and the thresholding method (right). Although the performance of both methods is comparable for the high-class predictions, the performance of the ML method in the other two classes is significantly better

3.4 The Strategy for Increasing Imaging Rate in PT-OCT

Limited imaging speed has long been a key inhibitor of the translation of PT-OCT to the clinic. The last section of this review paper focuses on our recently introduced strategy for enhancing the effective imaging speed of PT-OCT by orders of magnitude; the new PT-OCT variant is called transient-mode PT-OCT (TM-PT-OCT) and does not require any modification to the instrumentation of the conventional PT-OCT system [38, 39]. In TM-PT-OCT, images are formed based on the transient thermal responses of absorbers to pulsed laser excitations (~100 µs long) from conventional low-cost laser diodes. TM-PT-OCT image pixel values are defined as the maximum deviation of the recorded transient response from the initial noise floor in the acquired phase signals. Details of the method can be found in Refs. [38, 39]. Because transient photothermal responses are stronger, signals of sufficient SNR can be acquired in TM-PT-OCT in much shorter acquisition times than in conventional PT-OCT. Consequently, the effective PT-OCT imaging line rate can be drastically improved in TM-PT-OCT to the kHz range, which is over one order of magnitude faster than the current state of the art in photo-thermal optical lock-in OCT [40] and over two orders of magnitude faster than conventional PT-OCT methods. Such enhancement in effective imaging speed enables co-registered structural and molecular-specific imaging of moving samples at video rate. As a demonstration of this method, images of a moving sample captured with the conventional method and with TM-PT-OCT are shown in Fig. 13. The sample consisted of two regions with two different percentages of lipid, Fig. 13(a). As seen, while the OCT tomogram (Fig. 13(b)) and the conventional PT-OCT image (Fig. 13(d)) cannot discern between the two concentrations, these regions can clearly be distinguished in the TM-PT-OCT image (Fig. 13(c)). In TM-PT-OCT, the sample is excited with a low-energy square pulse that adds no cost or complexity to the setup.
In previous efforts to enhance PT-OCT imaging speed, complex and costly instrumentation had to be added to PT-OCT setups to only marginally increase the imaging speed [41, 42]. Given the significant enhancement of imaging speed with TM-PT-OCT, we also demonstrated the possibility of performing spectroscopic PT-OCT imaging at video rate, which is of significant relevance to downstream clinical applications of PT-OCT [39].
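The pixel-value definition used in TM-PT-OCT (maximum deviation of the transient response from the initial noise floor) can be sketched as follows. This is a simplified illustration, not the published implementation of Refs. [38, 39]: it assumes that the noise floor can be estimated as the mean of the first `n_baseline` phase samples recorded before the excitation pulse, and the trace values are made up.

```python
from statistics import mean


def tm_pixel_value(phase_signal, n_baseline):
    """Maximum absolute deviation of the transient from the pre-pulse level.

    phase_signal: sequence of phase (or displacement) samples for one pixel.
    n_baseline:   number of leading samples recorded before the pulse
                  (assumed way of estimating the initial noise floor).
    """
    if not 0 < n_baseline < len(phase_signal):
        raise ValueError("n_baseline must leave samples on both sides")
    baseline = mean(phase_signal[:n_baseline])
    return max(abs(sample - baseline) for sample in phase_signal[n_baseline:])


# Made-up trace: four pre-pulse samples followed by a transient rise and decay.
trace = [0.0, 0.2, -0.2, 0.0, 1.0, 3.0, 2.0, 0.5]
value = tm_pixel_value(trace, n_baseline=4)
```

Because only a single short transient per pixel is needed, a rule of this kind requires far fewer samples than a steady-state measurement, which is consistent with the speed advantage described in the text.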

Fig. 13

(a) Schematic of the moving sample. A frame of the recorded video of the moving sample with (b) OCT, (c) TM-PT-OCT, and (d) the conventional PT-OCT methods. These frames clearly demonstrate that while the OCT and conventional PT-OCT methods cannot provide enough contrast, the two different concentration regions in the sample can clearly be distinguished by the TM-PT-OCT method; scale bars = 100 μm

4 Summary, Limitations, and Future Directions

PT-OCT is a functional extension of OCT with the promise to provide micron-resolution structural images of tissue co-registered with molecular contrast maps. As such, PT-OCT has great potential to overcome the limited specificity of OCT, which is of great value to a broad spectrum of clinical applications. To date, applications of this imaging technique have been demonstrated in several studies by qualitatively assessing the presence of an MOI in a sample. Analyzing PT-OCT signals quantitatively, however, has the potential to offer deeper insight into the chemical properties of samples in a depth-resolved manner. Quantitative interpretation of PT-OCT signals has been carried out in limited cases and conditions such as concentration measurements in liquid samples [15, 19]; however, quantitative PT-OCT imaging in biological tissues is complex because, in addition to the concentration of absorber(s), several system and sample parameters also influence the acquired PT-OCT signals. Therefore, the first step toward quantitative PT-OCT imaging is the development of comprehensive theoretical models that reveal the nature and characteristics of the PT-OCT system and sample influence parameters. Another key limitation of PT-OCT is its intrinsically slow effective imaging rate (i.e., a few Hz), which impedes translation of this molecular extension toward in vivo and clinical imaging.

In this review paper, we presented and discussed the outcomes of our recent works aimed at addressing some of the key limitations of PT-OCT. To gain a deeper understanding of the multifactorial physics of PT-OCT, two theories were developed and discussed. In the first theory, the relation between PT-OCT signal amplitude and sample properties was modeled for a general sample state, including liquid and tissue-like samples. To generate this model, all significant opto-thermo-physical properties of samples were considered. Using this model, PT-OCT signals were modeled as a function of sample properties (e.g., absorption coefficient or thermal diffusivity) to study how certain opto-thermo-physical properties influence PT-OCT signals. In addition, a second, comprehensive theory for PT-OCT was introduced for predicting PT-OCT signals of multi-layer samples in 3D. In this case, by considering the sequence of physical phenomena taking place in PT-OCT, the light, thermal, and mechanical fields were modeled to finally predict the ensuing PT-OCT responses. This comprehensive model enabled understanding of the effects of parameters such as the mechanical stiffness, the heat flux at layer boundaries, and Poisson’s ratio on the signals, which could not be understood from previous models. The validation results for both theories indicated satisfactory correlation between experimental signals and model predictions, suggesting that the models can be used as simulators for designing and optimizing PT-OCT systems and experiments.

Next, two strategies were presented for obtaining refined insight into tissue chemical composition. In the first strategy, we introduced a spectroscopic approach to PT-OCT imaging to increase the specificity of detecting and differentiating between MOIs (e.g., lipid and collagen/elastin). In brief, two PT laser wavelengths were integrated into the setup to sequentially excite the sample at two different wavelengths. Interpretation of the dual-wavelength responses in light of the known absorption spectrum of the targeted MOI enabled more specific detection. This work demonstrated the ability of the system to achieve depth-resolved and specific detection of lipid as the most important constituent of arterial plaques. It was also noted that designing a spectroscopic PT-OCT system involves a compromise in light of sample properties, sample geometry, and targeted tissue constituents. In general, to decrease the detection uncertainty, at least two PT wavelengths are needed at which the target MOI absorbs moderately or strongly but with different ratios. That is, a PT wavelength that is only poorly absorbed by the target MOI may not be very helpful in the spectroscopic PT-OCT approach. Probing at more than two PT wavelengths is generally expected to enhance detection performance at the expense of added complexity and cost of the setup and longer imaging and processing times. A possible way to decrease the imaging time is to shine all PT lasers simultaneously but with slightly different modulation frequencies so that individual responses can later be decoupled from the acquired data in the frequency domain.
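The frequency-domain decoupling suggested above can be illustrated with a simple digital lock-in demodulation. This is only a sketch of the idea, not a demonstrated PT-OCT result: it assumes that the responses to the two PT lasers add linearly in the measured signal, and the modulation frequencies, sampling rate, and amplitudes below are hypothetical values chosen so that the record contains an integer number of periods of both modulations.

```python
import math


def lock_in_amplitude(signal, sample_rate_hz, freq_hz):
    """Amplitude of the sinusoidal component of `signal` at `freq_hz`."""
    n = len(signal)
    in_phase = 0.0
    quadrature = 0.0
    for k, s in enumerate(signal):
        angle = 2.0 * math.pi * freq_hz * k / sample_rate_hz
        in_phase += s * math.cos(angle)
        quadrature += s * math.sin(angle)
    return 2.0 * math.hypot(in_phase, quadrature) / n


SAMPLE_RATE_HZ = 10_000.0      # hypothetical sampling rate
F1_HZ, F2_HZ = 500.0, 700.0    # hypothetical modulation frequencies of the two PT lasers
N_SAMPLES = 1000               # 0.1 s record: 50 and 70 full periods, respectively

# Synthetic measurement: response of amplitude 3.0 at F1 plus amplitude 1.0 at F2.
mixed = [
    3.0 * math.sin(2.0 * math.pi * F1_HZ * k / SAMPLE_RATE_HZ)
    + 1.0 * math.sin(2.0 * math.pi * F2_HZ * k / SAMPLE_RATE_HZ)
    for k in range(N_SAMPLES)
]

amp_f1 = lock_in_amplitude(mixed, SAMPLE_RATE_HZ, F1_HZ)
amp_f2 = lock_in_amplitude(mixed, SAMPLE_RATE_HZ, F2_HZ)
```

In this idealized case each amplitude is recovered independently of the other; in practice the separation of the modulation frequencies and the record length would need to be chosen so that the two components remain distinguishable in the presence of noise.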

In the next section of this review paper, quantification of the signals received from lipid was carried out using a machine learning model. The theories developed by our team showed that quantification of the PT-OCT signal is a multifactorial and non-linear problem. Therefore, quantification of the PT-OCT signal under general conditions was not fruitful with classic signal processing approaches. We employed an SVM classifier to label pixels in PT-OCT images based on their lipid concentrations. The results showed that our trained SVM model, qualitatively, performed well in classifying pixels from single- and multi-layer samples. Further analysis of the results showed that, relative to conventional classification with thresholding, the SVM model improved the classification accuracy by approximately 47% and 86% for the low and mid classes, respectively. The strategies presented in this study can help us gain more reliable molecular information from tissues, which is essential for early diagnosis of various diseases. For instance, in cardiology, the concentration of the lipid pool and the thickness of the fibrous cap are significant factors in risk assessment of an arterial plaque [36]. As AI is a fast-growing field, countless efficient AI-powered methods have been presented. For our purposes we used an SVM; however, there is an opportunity to apply or combine other methods to achieve better performance. A more comprehensive ablation study on the inputs of the SVM model can be done to analyze the impact of each input parameter. Including texture features of the OCT images in the model may also prove helpful as a new channel of information. More complex phantoms should also be imaged to evaluate the generalization performance of the model.

In the last section, a new imaging variant, named transient-mode (TM) PT-OCT, was discussed for addressing the low imaging rate of PT-OCT. In this method, the thermo-elastic response of the sample to a low-power square pulse excitation is used for forming PT-OCT images. The discussed results show that while TM-PT-OCT enhances the effective imaging speed by orders of magnitude, it does not require any modification to the instrumentation of conventional PT-OCT systems. Such significant enhancement in imaging speed became possible because the transient response of the sample has a significantly higher amplitude and occurs over a much shorter time than the steady-state response used in conventional PT-OCT. Previously, the fastest reported PT-OCT acquisition rates were on the order of 10–100 Hz [41, 42]. With TM-PT-OCT, we showed that the PT-OCT imaging rate can be increased to 3–7 kHz (i.e., a 1–2 orders of magnitude improvement). Our recent publications in this field focused on introducing the concept and, as such, further enhancement of imaging speed may still be possible. For example, the scanning path of the galvo mirrors can be further optimized. In our works, the galvo mirrors were moved in a raster scan manner, which necessitates the inclusion of delays to allow for dissipation of the induced transient thermal fields before the next measurement can take place. Such delays made the effective A-line rate at least 3 times slower. As a future direction, the motion path of the galvo mirrors can be engineered in a zigzag (or other) pattern to avoid the required delays. Another important achievement discussed in this review paper is the first-time demonstration of video-rate PT-OCT imaging of a moving sample with TM-PT-OCT. This video showed the structure of a moving sample co-registered with depth-resolved molecular information.
As a complementary study, spectroscopic TM-PT-OCT was also demonstrated in our published paper, which enabled video-rate imaging of tissue structure and insight into chemical composition with high specificity [39].

We anticipate the knowledge and strategies presented in this review paper to open the door for downstream translation of the PT-OCT technology into the clinics. The promise of PT-OCT in generating depth-resolved structural images co-registered with molecular maps is specifically attractive for detection of early stages of diseases when structural abnormalities are minimal.