1 Introduction

To decrease the total radiation exposure caused by intra-operative fluoroscopy and to provide real-time three-dimensional (3D) guidance, ultrasound (US) has been incorporated as an alternative imaging modality into various computer assisted orthopedic surgery (CAOS) procedures [1]. Nevertheless, because extracting relevant anatomical information from US data remains challenging, most of the proposed US-based CAOS guidance systems have not succeeded in clinical settings. Ultrasound images typically contain significant speckle and imaging artifacts that do not correspond to any specific anatomy, which complicates image interpretation and automatic processing. Furthermore, the orientation of the US transducer with respect to the imaged anatomy and the elevational beam width strongly influence the bone surface response profile, so the corresponding bone boundaries appear several millimeters thick. To overcome some of these challenges, bone segmentation or enhancement methods have been proposed by various groups.

Fig. 1. Bone surface response appearance in ultrasound. (a), (b) High intensity soft tissue interfaces above the bone surface with similar intensity profile as the bone surfaces and reverberation artifacts inside the shadow region. (c) Separate low intensity spine bone surfaces. (d) Low intensity bone surface obtained due to non-optimal orientation of the US transducer.

The previously proposed image-based segmentation or enhancement methods can be classified into three groups: (i) methods using image intensity/gradient information [2, 3], (ii) methods based on local phase image features [4, 5], and (iii) hybrid approaches which combine the strengths of intensity- and phase-based methods [6,7,8]. Intensity-based approaches are not robust to low contrast bone responses and high intensity soft tissue interfaces (Fig. 1). One of the distinct features in bone US data is the shadow region. A large transition in acoustic impedance between the tissue and the bone causes most of the acoustic signal to be reflected back, creating a low intensity region extending from the bone boundary to the bottom of the image. Incorporating this shadow information improved the accuracy and robustness of the proposed intensity- and phase-based methods [3, 6,7,8]. In [6], the overlap between the manually segmented and automatically extracted surfaces was 62.5%. The hybrid approach proposed in [7] integrated machine learning into the framework. That method was validated on 35 US scans obtained from a single subject, achieving an accuracy of \(86\%\) at 1 mm tolerance with a 0.59 mm localization error at this tolerance. Its computation time was 2 min. In [8], computed tomography (CT) derived bone surfaces were registered to US derived bone surfaces. The reported average surface fit error for the in vivo pelvis data was close to 0.5 mm.

Although previously reported results are promising, acquisition of high quality US data in clinical settings continues to be an ongoing challenge in US-based CAOS procedures, affecting the accuracy and robustness of the segmentation methods. In this work, we propose a bone localization method which is accurate and robust to different US imaging artifacts. Local phase-based image features are utilized to enhance the bone surface response profile and suppress the soft tissue interfaces and imaging artifacts. The enhanced images are used as an input to an \(L_{1}\) norm-based contextual regularization method which emphasizes uncertainty in the shadow regions. The enhanced bone response and shadow region images are incorporated into a dynamic programming solution for localizing the bone surfaces. Qualitative and quantitative validation results on scans collected from seven volunteers are presented. The proposed method is also compared against previously developed intensity-based [3] and phase-based [9] methods.

2 Methods

The flowchart of the proposed method is provided in Fig. 2 and is based on our previous experience where local phase image features are used for bone enhancement and/or segmentation.

Fig. 2. Flowchart of the proposed bone localization method.

2.1 Enhancement of Bone Surface Response

The bone surface response profile in US is highly affected by the orientation of the beam with respect to the imaged bone boundary and by the 3D anatomy of the imaged surface. If the US beam is perfectly aligned and the attenuation from the soft tissue interface is low, the bone response profile appears as a dominant ridge edge along the scanline direction. However, when imaging bone surfaces with complex shape, such as the spine, or if the attenuation from the soft tissue interface is large, the bone response profile can be dominated by different edge profiles. The first step in our framework involves the enhancement of the low intensity bone surfaces by constructing a local phase enhancement metric, similar to [9], as:

$$\begin{aligned} USE(x,y)=\frac{\sum _{r}\sum _{s}\left\lfloor [e_{rs}(x,y)-o_{rs}(x,y)] -T_{r} \right\rfloor }{\sum _{r}\sum _{s}\sqrt{e^{2}_{rs}(x,y)-o^{2}_{rs}(x,y)}+\epsilon }. \end{aligned}$$
(1)

Here \(e_{rs}(x,y)\) and \(o_{rs}(x,y)\) represent the even and odd symmetric filter responses, respectively, and are obtained by filtering the B-mode US image, US(x,y), in the frequency domain using a Log-Gabor filter [10]. Since the first step in the proposed framework is to provide an initial general ultrasound enhancement, this new metric does not use the absolute values of the even and odd filter responses, as was done previously for the enhancement of bone interfaces [9]. Filter orientation and scale are represented with r and s, respectively. \(\epsilon \) is a small constant included to avoid division by zero. \(T_{r}\) is a noise dependent threshold calculated as a specified number of standard deviations above the mean of the local energy distribution due to noise [11]. The standard deviation and mean of the local energy are calculated for each orientation separately using the response of the smallest scale filter [10].
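The threshold rule described above can be illustrated with a minimal sketch. It assumes the simplest reading of the text, namely \(T_{r}=\mu _{r}+k\sigma _{r}\), where \(\mu _{r}\) and \(\sigma _{r}\) are the mean and standard deviation of the smallest-scale local energy for orientation r. The function name, the multiplier `k`, and the sample values are hypothetical and are not taken from the paper or from [10, 11].

```python
import statistics


def noise_threshold(smallest_scale_energy, k=2.0):
    """Return mean + k * std of the given local energy samples.

    smallest_scale_energy: iterable of energy values from the smallest
    scale filter for ONE orientation (assumed input, see lead-in).
    k: number of standard deviations above the mean (hypothetical default).
    """
    values = list(smallest_scale_energy)
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return mean + k * std


# One threshold per orientation r, as the text prescribes.
energy_by_orientation = {
    0: [1.0, 2.0, 3.0, 4.0, 5.0],   # illustrative values only
    1: [0.5, 0.5, 0.5, 0.5, 0.5],
}
thresholds = {r: noise_threshold(v) for r, v in energy_by_orientation.items()}
```

A larger `k` suppresses more of the weak responses at the price of possibly removing low intensity bone boundaries, so in practice the value would have to be tuned on data.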

Fig. 3. Local phase image bone features. Top row shows the extracted local phase image features where the enhanced USE(x,y) image was used to extract LPT(x,y), LPE(x,y), LwPA(x,y), and LP(x,y) image features. Bottom row shows the extracted local phase image features where the B-mode US(x,y) image was used to extract LPT(x,y), LPE(x,y), LwPA(x,y), and LP(x,y) image features. Red arrows point to bone surfaces and soft tissue interfaces where the improvement was achieved. Distance map is shown on the far right. (Color figure online)

Figure 3 shows that USE(x,y) results in the enhancement of low intensity bone surfaces and soft tissue interfaces. Hacihaliloglu et al. [12] recently proposed a tensor-based feature descriptor, called local phase tensor (LPT(x,y)), for the enhancement of bone features while suppressing high intensity soft tissue interfaces. The second step in the bone enhancement framework is to calculate the LPT(x,y) image. LPT(x,y) is obtained using even and odd filter responses which are defined as:

$$\begin{aligned} \begin{aligned} T_{even}&= \left[ {{\varvec{H}}}(US_{DB}(x,y))\right] \left[ {{\varvec{H}}}(US_{DB}(x,y))\right] ^{T},\\ T_{odd}&= -0.5\times (\left[ \nabla US_{DB}(x,y)\right] \left[ \nabla \nabla ^2 US_{DB}(x,y)\right] ^{T}\\&\quad +\left[ \nabla \nabla ^2 US_{DB}(x,y)\right] \left[ \nabla US_{DB}(x,y)\right] ^{T}). \end{aligned} \end{aligned}$$
(2)

Here \(T_{even}\) represents the symmetric features and \(T_{odd}\) represents the asymmetric features. H, \(\nabla \) and \(\nabla ^2\) denote the Hessian, gradient and Laplacian operations, respectively. \(US_{DB}(x,y)\) is obtained by masking the band-pass filtered USE(x,y) image with a distance map, which improves the enhancement of bone surfaces located deeper in the image while masking out soft tissue interfaces close to the transducer. Band-pass filtering was performed using a Log-Gabor filter [12]. The final LPT(x,y) image is obtained using \(LPT(x,y)=\sqrt{T_{even}^2+T_{odd}^2}\times \cos (\varphi )\), where \(\varphi \) is the instantaneous phase obtained from the symmetric (\(T_{even}\)) and asymmetric (\(T_{odd}\)) feature responses [12]. Investigating the obtained LPT(x,y) image (Fig. 3), we can see that the descriptor enhances soft tissue interfaces close to the bone surface as well. In order to provide an enhancement with fewer soft tissue interfaces and a more compact bone representation, local phase energy (LPE(x,y)) and local weighted mean phase angle (LwPA(x,y)) image features are extracted using monogenic signal theory, where the monogenic signal image (\(US_{M}(x,y)\)) is formed by combining the band-pass filtered LPT(x,y) image (\(LPT_{B}(x,y)\)) with the Riesz filtered components as:

$$\begin{aligned} \begin{aligned} US_{M}(x,y)\,&=\,\big [US_{M1}(x,y),\,US_{M2}(x,y),\,US_{M3}(x,y)\big ] \\&=\,\big [LPT_{B}(x,y),\,LPT_{B}(x,y)*h_{1}(x,y),\,LPT_{B}(x,y)*h_{2}(x,y)\big ]. \end{aligned} \end{aligned}$$
(3)

Here \(h_{1}\) and \(h_{2}\) represent the components of the vector valued odd filter (Riesz filter) [13]. For band-pass filtering, \(\alpha \)-scale space derivative (ASSD) quadrature filters are used, which have been shown to produce improved edge detection results on simulated US images [14]. The LPE(x,y) image is obtained by averaging the phase sum of the response vectors over many scales using:

$$\begin{aligned} LPE(x,y)=\sum _{sc}\big |US_{M1}(x,y)\big |-\sqrt{US_{M2}^{2}(x,y)+US_{M3}^{2}(x,y)}. \end{aligned}$$
(4)

In the above equation, sc represents the number of scales. LPE(x,y) encodes the underlying shape of the bone boundary by accumulating the local energy of the image along several filter responses. LwPA(x,y) is calculated using:

$$\begin{aligned} LwPA(x,y)=\arctan \left( \frac{\sum _{sc}US_{M1}(x,y)}{\sqrt{\sum _{sc}US_{M1}^{2}(x,y)+\sum _{sc}US_{M2}^{2}(x,y)}}\right) \end{aligned}$$
(5)

During the calculation of the LwPA(x,y) feature map, noise compensation is not performed, so the LwPA(x,y) image preserves all the structural details of the US image, such as the soft tissue interfaces and the bone surface. The final improved local phase bone image (LP(x,y)) is obtained using: \(LP(x,y)=LPT(x,y) \times LPE(x,y) \times LwPA(x,y)\). Figure 3 shows the obtained local phase feature images (LPT(x,y), LPE(x,y), LwPA(x,y)). One common property of the extracted local phase feature images is that the enhanced bone surfaces are well localized in all three images while the soft tissue interfaces are not. Therefore, the combination of these three phase feature images results in the suppression of soft tissue interfaces while keeping the bone surfaces more compact and localized. In Fig. 3 (bottom row) we also show the bone enhancement results obtained if the US(x,y) image is used as the input to the tensor-based phase descriptor. Red arrows point to the enhanced soft tissue artifacts and missing bone boundaries; these are the locations in the B-mode US image (US(x,y)) where the bone response is weaker than the soft tissue interfaces above the bone surface. The obtained LP(x,y) image is used in the next section for the enhancement of the bone shadow region.
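The monogenic decomposition of Eq. (3) and the energy accumulation of Eq. (4) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the implementation used in this work: a radial Log-Gabor filter stands in for the ASSD band-pass filter, the Riesz filters are applied in the frequency domain with one common sign convention, and the function names, wavelengths and bandwidth value are hypothetical.

```python
import numpy as np


def log_gabor_radial(shape, wavelength, sigma_on_f=0.55):
    """Radial log-Gabor band-pass filter (assumed stand-in for ASSD)."""
    rows, cols = shape
    u = np.fft.fftfreq(cols)[np.newaxis, :]   # horizontal frequencies
    v = np.fft.fftfreq(rows)[:, np.newaxis]   # vertical frequencies
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                        # avoid log(0) at DC
    f0 = 1.0 / wavelength
    filt = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_on_f) ** 2))
    filt[0, 0] = 0.0                          # remove the DC component
    return filt, u, v, radius


def monogenic_components(image, wavelength):
    """Return (M1, M2, M3) as in Eq. (3) for one scale."""
    filt, u, v, radius = log_gabor_radial(image.shape, wavelength)
    spectrum = np.fft.fft2(image)
    h1 = -1j * u / radius                     # Riesz filter, x component
    h2 = -1j * v / radius                     # Riesz filter, y component
    m1 = np.real(np.fft.ifft2(spectrum * filt))
    m2 = np.real(np.fft.ifft2(spectrum * filt * h1))
    m3 = np.real(np.fft.ifft2(spectrum * filt * h2))
    return m1, m2, m3


def local_phase_energy(image, wavelengths=(4, 8, 16)):
    """Accumulate |M1| - sqrt(M2^2 + M3^2) over scales, as in Eq. (4)."""
    lpe = np.zeros(image.shape, dtype=float)
    for wavelength in wavelengths:
        m1, m2, m3 = monogenic_components(image, wavelength)
        lpe += np.abs(m1) - np.sqrt(m2 ** 2 + m3 ** 2)
    return lpe


# Synthetic ridge: a single bright horizontal line, a crude bone-like feature.
demo = np.zeros((32, 32))
demo[16, :] = 1.0
demo_lpe = local_phase_energy(demo)
```

On the synthetic ridge the even response dominates at the ridge center and the odd (Riesz) responses vanish there, so the accumulated energy peaks on the ridge row. The same structure would apply to the band-pass filtered LPT image in place of `demo`.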

Fig. 4. (a) Enhanced bone shadow image BSE(x,y). (b) Bone probability image obtained by masking LP(x,y) with BSE(x,y). (c) Bone localization presented as curve BL(s). The curve BL(s) is overlaid on the actual bone surface for better representation. (d) Localized bone surface is overlaid on the B-mode ultrasound image of in vivo knee. (Color figure online)

2.2 Enhancement of Shadow Region

Automatic identification of shadow regions is important since it can be used as an additional feature to improve the robustness and accuracy of segmentation or registration methods. The bone shadow enhancement is based on a modification of the previously proposed US confidence map (CM) approach [15]. However, instead of using the US image intensity information, we use LP(x,y) image features. We achieve this by modeling the interaction of the US signal within the tissue using scattering and attenuation information. The model, denoted as the US signal transmission map (\(US_{A}(x,y)\)), maximizes the visibility of high intensity features inside a local region and satisfies the constraint that the mean intensity of the local region is less than the echogenicity of the tissue confining the bone. The scattering and attenuation effects in the tissue are combined as: \(CM_{LP}(x,y)=US_{A}(x,y)BSE(x,y)+(1-US_{A}(x,y))\rho \). Here \(CM_{LP}(x,y)\) represents the CM image obtained from LP(x,y) using [15], \(\rho \) is a constant value representative of echogenicity in the tissue surrounding the bone, and BSE(x,y) is the enhanced bone shadow image which we are trying to calculate. In order to calculate BSE(x,y), \(US_{A}(x,y)\) is estimated first by minimizing the following objective function [16]:

$$\begin{aligned} \frac{\lambda }{2}\big \Vert US_{A}(x,y)-CM_{LP}(x,y)\big \Vert ^2_{2} +\sum _{j\in \chi }\big \Vert W_{j}\circ (D_{j} * US_{A}(x,y)) \big \Vert _{1}.\ \end{aligned}$$
(6)

Here \(\circ \) represents element-wise multiplication, \(\chi \) is an index set, and \(*\) is the convolution operator. \(D_{j}\) is calculated using a bank of high order differential filters [17]. The filter bank results in the enhancement of bone features in the local region while attenuating the image noise. \(W_{j}\) is a weighting matrix calculated using: \(W_{j}(x,y)=\exp (-{\mid }D_{j}(x,y) * CM_{LP}(x,y){\mid }^2)\). In (6), the first term measures the dependence of \(US_{A}(x,y)\) on \(CM_{LP}(x,y)\) and the second term models the contextual constraints of \(US_{A}(x,y)\). These two terms are balanced using a regularization parameter \(\lambda \) [16]. After estimating \(US_{A}(x,y)\), the BSE(x,y) image is obtained using: \(BSE(x,y)=[(CM_{LP}(x,y)-\rho )/[\max (US_{A}(x,y),\epsilon )]^\delta ]+\rho \). Here \(\delta \) is related to the tissue attenuation coefficient (\(\eta \)) and \(\epsilon \) is a small constant used to avoid division by zero [16]. Figure 4 shows the enhanced bone shadow image BSE(x,y), where the soft tissue interface above the bone surface is represented with uniform intensity and the shadow region is represented with low intensity values, corresponding to a low probability of the signal reaching back to the transducer imaging array (high intensity is denoted with dark red and low intensity with blue color coding). Investigating the BSE(x,y) image, we can see that the transition from the soft tissue interface to the bone shadow region is represented with a sharp intensity change, clearly differentiating the two regions. The enhanced bone shadow region image (BSE(x,y)) and the local phase bone image (LP(x,y)) are used during the bone surface localization, which is explained in the next section.
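The recovery step follows from the combination model: solving \(CM_{LP}=US_{A}\,BSE+(1-US_{A})\rho \) for BSE gives \((CM_{LP}-\rho )/US_{A}+\rho \), and the expression above adds the exponent \(\delta \) and the floor \(\epsilon \). The sketch below applies that expression element-wise. It assumes \(US_{A}(x,y)\) has already been estimated (the minimization of (6) is not shown), and the function name and the default values of `delta` and `eps` are hypothetical.

```python
import numpy as np


def recover_bse(cm_lp, us_a, rho, delta=1.0, eps=1e-3):
    """Element-wise BSE = (CM_LP - rho) / max(US_A, eps)**delta + rho.

    cm_lp: confidence map computed from LP(x, y)
    us_a:  estimated signal transmission map, same shape as cm_lp
    rho:   constant representing the surrounding tissue echogenicity
    delta, eps: assumed defaults, not the values used in the paper
    """
    transmission = np.maximum(us_a, eps) ** delta
    return (cm_lp - rho) / transmission + rho


# Toy 1 x 2 example: half transmission in the first pixel, full in the second.
cm_lp = np.array([[0.5, 0.9]])
us_a = np.array([[0.5, 1.0]])
rho = 0.9
bse = recover_bse(cm_lp, us_a, rho)
```

With `delta = 1` the result satisfies the combination model exactly, and wherever the transmission equals one the output reduces to the input confidence map. Low transmission values amplify the deviation of \(CM_{LP}\) from \(\rho \), which is what sharpens the transition into the shadow region.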

2.3 Bone Surface Localization

The localization of the bone feature within a column s, denoted as BL(s), is achieved by minimizing a cost function composed of two energy functions, denoted as internal energy (\(E_{int}(x,y)\)) and external energy (\(E_{ext}(x,y)\)). \(E_{int}(x,y)\) is obtained by masking the LP(x,y) image with the BSE(x,y) image, which provides a bone probability map (Fig. 4(b)). The external energy (\(E_{ext}(x,y)\)) is constructed by dividing the US image into three regions, denoted as the bone region, the boneless region and the jump region (the region between the first two regions) (Fig. 4(c)). \(E_{ext}(x,y)\) is constructed using these three regions as [3]:

$$\begin{aligned} E_{ext}(i,j)= \left\{ \begin{array}{lcl} \nu ||\frac{dBL}{ds}||^{2}+\xi ||\frac{d^{2}BL}{ds^2}||^2 +\varsigma ; &{} ~~~ &{} \text {Bone region}, \\ JumpCost; &{} &{} \text {Jump region},\\ \nu D_{1}^2+\xi D_{2}^2; &{} &{} \text {Boneless region}. \end{array}\right. \end{aligned}$$
(7)

Here \(\nu \) and \(\xi \) are the weights of the smoothness (the first derivative of BL(s)) and the curvature (the second derivative of BL(s)), and \(\varsigma \) is a small negative scalar ensuring that larger connected bone regions stay connected. Bone connectivity is further maintained with the JumpCost constant, which penalizes frequent jumps between bone and boneless regions. As there is no bone information present in the boneless region, the first and second order derivatives are assigned constant values \(D_{1}\) and \(D_{2}\). Dynamic programming optimization is used to solve:

$$\begin{aligned} BLmin(i,j) = E_{int}(i,j) + \min _{k}\big [BLmin(k,j-1)+E_{ext}(k,j)\big ]. \end{aligned}$$
(8)

BLmin(i,j) represents the minimum cost of moving from the first column to the pixel in the ith row and jth column. The row index in the previous column is represented with k. The index k that attains the minimum is stored in \(Indexmin(i,j) = \mathop {\mathrm{argmin}}\nolimits _{k}[BLmin(k,j-1)+E_{ext}(k,j)]\). Dynamic programming provides a fast optimization of the cost function. The final optimized bone localization is obtained by tracing back from the last column of the US image using:

$$\begin{aligned} BL_{opt}(s)=\left\{ \begin{array}{lcl} NR+1 &{} ~~~ &{} s=NC; \\ Indexmin[s+1,BL_{opt}(s+1)]; &{} &{} s=1,\ldots ,(NC-1). \end{array}\right. \end{aligned}$$
(9)

\(BL_{opt}\) is the optimized segmentation path for which the energy cost function is minimized. NR and NC denote the number of rows and columns of the B-mode US image and therefore also index its last row and last column. The final localized bone surface is shown in Fig. 4.
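The recursion in (8) and the traceback in (9) can be sketched in plain Python. This is a simplified illustration, not the implementation evaluated here: the curvature term is omitted, the boneless region is modeled as one extra state with index NR (0-based), the constant `boneless_step` stands in for \(\nu D_{1}^2+\xi D_{2}^2\), the input is treated as a cost map (lower means more bone-like, e.g. one minus the bone probability), and the traceback starts from the cheapest state in the last column rather than from the fixed start used in (9). All names and default values are hypothetical.

```python
def localize_bone(cost, nu=0.1, varsigma=-0.05, jump_cost=0.8,
                  boneless_step=0.5):
    """Return one state per column: a row index, or NR for 'no bone'.

    cost[i][j]: internal cost of placing the bone at row i, column j.
    """
    nr, nc = len(cost), len(cost[0])
    no_bone = nr                      # extra state for the boneless region
    states = nr + 1

    def node(i, j):
        # The boneless state carries no internal cost in this sketch.
        return 0.0 if i == no_bone else cost[i][j]

    def transition(k, i):
        if k == no_bone and i == no_bone:
            return boneless_step      # boneless region
        if k == no_bone or i == no_bone:
            return jump_cost          # jump region
        return nu * (i - k) ** 2 + varsigma   # bone region (no curvature)

    best = [[0.0] * nc for _ in range(states)]
    back = [[0] * nc for _ in range(states)]
    for i in range(states):
        best[i][0] = node(i, 0)

    # Forward pass, Eq. (8): accumulate the minimum cost column by column.
    for j in range(1, nc):
        for i in range(states):
            k_best = min(range(states),
                         key=lambda k: best[k][j - 1] + transition(k, i))
            back[i][j] = k_best
            best[i][j] = node(i, j) + best[k_best][j - 1] + transition(k_best, i)

    # Backward pass, analogous to Eq. (9): follow the stored indices.
    path = [0] * nc
    path[-1] = min(range(states), key=lambda i: best[i][nc - 1])
    for j in range(nc - 1, 0, -1):
        path[j - 1] = back[path[j]][j]
    return path
```

The forward pass costs on the order of NC × NR² operations, which is why the smoothness weight and the jump penalty can be evaluated exhaustively for every pair of rows. Including the curvature term would require tracking the two previous rows per state.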

2.4 Data Acquisition and Experiments

After obtaining institutional review board (IRB) approval, a total of 150 different US images from seven healthy subjects were collected using a SonixTouch US machine (Analogic Corporation, Peabody, MA, USA). Depending on the anatomical region of interest, two different transducers were used (C5-2 curvilinear, L14-5 linear transducer). Depth settings and image resolutions varied between 3–8 cm and 0.12–0.19 mm, respectively. All the proposed image enhancement and localization methods were implemented using the MATLAB 2014a software package and run on a 2.3 GHz Intel(R) \(\mathrm {Core^{TM}}\) i5 CPU, 16 GB RAM Windows PC. The localized bone surfaces were compared to manual localization results obtained from an expert user. The quality of the localization was evaluated by computing the average Euclidean distance (AED) between the two surfaces. We also compare the localization results against the methods proposed in [3, 9]. For bone shadow enhancement, \(\lambda \,{=}\,2\) and \(\rho \), the constant related to tissue echogenicity, was chosen as 90% of the maximum intensity value of \(CM_{LP}(x,y)\). LPT(x,y) images were calculated using the filter parameter values defined in [12]. The CM(x,y) and \(CM_{LP}(x,y)\) images were obtained using the constant values: \(\eta \,{=}\,2\), \(\beta \,{=}\,90\), \(\gamma \,{=}\,0.03\). For bone surface localization the constant values were chosen as: \(\nu \,{=}\,50\), \(\xi \,{=}\,100\), \(JumpCost\,{=}\,0.8\), \(\varsigma \,{=}\,0.15\), \(D_{1}\,{=}\,D_{2}\,{=}\,1\). These values were determined empirically and kept constant during qualitative and quantitative analysis.
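One possible way to compute the AED is sketched below. The exact definition used in this work is not spelled out, so the sketch assumes a one-directional nearest-point distance: for every point of the automatically localized surface, the distance to the closest point of the manual surface is taken, and the distances are averaged. Points are assumed to be given in millimeters, and the function name is hypothetical.

```python
import math


def average_euclidean_distance(auto_pts, manual_pts):
    """Mean nearest-point distance from auto_pts to manual_pts.

    auto_pts, manual_pts: non-empty lists of (x, y) tuples in mm.
    """
    total = 0.0
    for ax, ay in auto_pts:
        total += min(math.hypot(ax - mx, ay - my) for mx, my in manual_pts)
    return total / len(auto_pts)


# Toy example: the automatic curve lies 0.3 mm above the manual one.
auto_curve = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
manual_curve = [(0.0, 0.3), (1.0, 0.3), (2.0, 0.3)]
aed = average_euclidean_distance(auto_curve, manual_curve)
```

Pixel coordinates would first be scaled by the image resolution (0.12–0.19 mm per pixel in this data set) so that the result is expressed in millimeters. A symmetric variant would average the distance in both directions.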

Fig. 5. Qualitative results. First, third and fifth rows represent the B-mode ultrasound image of in vivo radius, spine and femur respectively. Second, fourth and sixth rows present the localization result. Green represents manual expert segmentation and red is obtained using the proposed algorithm. (Color figure online)

3 Results

Investigating the qualitative results, we can see that the surfaces localized with the proposed method are well aligned with the expert manual localization (Fig. 5). The combination of enhanced local phase bone features and shadow region information provides a robust estimate even in the presence of (i) intensity variations in the shadow region (Fig. 5; femur and spine), (ii) disconnected bone surfaces (Fig. 5; spine), (iii) low intensity bone boundaries (Fig. 5; radius, spine and femur), and (iv) high intensity soft tissue interfaces (Fig. 5; femur, spine and radius). The overall AED error for the proposed method was 0.26 mm (SD: 0.22). The overall AED errors for [9] and [3] were 0.78 mm (SD: 0.68) and 4.5 mm (SD: 4.39), respectively. The maximum AED was 1.36 mm for the proposed method, 19.08 mm for [3], and 4.2 mm for [9] (Table 1). Table 1 also shows the 95% confidence level calculated for the localization results obtained for all three methods compared. We can see that the proposed method outperforms [3, 9]. The average computation time was 9.4 s.
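From these mean AED values, the relative error reduction of the proposed method can be worked out. The calculation below assumes that improvement is measured as the relative decrease in mean AED with respect to each reference method:

$$\begin{aligned} 1-\frac{0.26}{0.78}\approx 0.67 \quad \text {(phase-based method [9])}, \qquad 1-\frac{0.26}{4.5}\approx 0.94 \quad \text {(intensity-based method [3])}. \end{aligned}$$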

4 Discussion and Conclusion

We have presented a method for accurate, robust and fully automatic localization of bone surfaces in two-dimensional US data based on enhanced local phase bone and shadow region information. The method was validated on 150 in vivo US images, obtained from seven volunteers, and achieved an overall AED error of 0.26 mm. We achieved a 67% improvement in surface localization over the state-of-the-art phase-based method [9] and a 94% improvement over the intensity-based localization method [3]. Although we have not directly compared our method to machine learning-based approaches [7], our reported localization results have 54% improved accuracy. Moreover, the proposed shadow enhancement method and the local phase features extracted in this work can be incorporated into existing machine learning approaches as additional features, which could result in an improvement of the localization results reported for these methods. The specific contributions include: (1) the use of \(\alpha \)-scale filters for extraction of bone phase features, (2) calculation of a new bone probability map for improved bone surface localization, and (3) combination of enhanced bone shadow features with three different image phase features for bone localization. Previously, it was shown that by optimizing the filter parameter selection, using information derived from the collected data, improvements can be achieved in terms of surface localization and robustness to artifacts [9]. Therefore, the filter parameter selection process should be automated. Another limitation of the proposed method is the mean computation time, which was around 9.4 s. This is a large computational cost considering that intra-operative procedures require real-time feedback. Future work will involve (i) improvement of the computation speed, (ii) validation on more in vivo scans, and (iii) optimization of the filter parameters.
Finally, we would like to mention that, although there were no failed cases for the proposed method, a more extensive validation is required in order to fully address the clinical challenges that can be faced during the application of the method. Specifically, volunteers with a high body mass index will require a special investigation, which we will perform as part of our future work.

Table 1. Comparative results of the proposed approach.