1 Introduction

Today, a steadily increasing number of US device vendors are dedicating their efforts to Point-of-Care Ultrasound (POCUS), including Philips, Butterfly, Clarius, UltraSee, and others. In general, the development of these systems is hardware-driven and primarily aims at bringing conventional scanning modes (B-mode, color Doppler) to previously inaccessible settings [1].

At the same time, significant work on improving non-point-of-care US has been presented in recent years [2]. Among these advances, three-dimensional (3D) US relying on external hardware tracking is already translating into clinical routine, enabling advanced live reconstruction of arbitrary anatomy [3]. Naturally, the trend of employing deep learning tools has not stopped short of US either, with remarkable progress in segmenting challenging anatomies and classifying suspicious lesions, as shown in the review by Litjens et al. [4] and references therein.

Together, these breakthroughs in hardware and image processing allow us to look beyond the conventional usage of US data. In this work, we summarize recent advances in POCUS and interventional US using innovative image analysis and machine learning technologies, implemented within our medical imaging framework ImFusion Suite.

For instance, very long 3D US scans facilitate automatic vessel mapping, cross-section and volume measurements, as well as interventional treatment planning (now available on an actual medical device, PIUR tUS). Brain-shift compensation based on multi-modal registration of 3D US to pre-operative MR images enables accurate neuro-navigation, which has been successfully demonstrated on real patients during surgery [5].

In the remainder of the paper, we start with a brief overview of the important features of our ImFusion software development kit (SDK) that allow for such developments, and then highlight the following applications in greater detail: (i) Employing deep learning and, optionally, inertial measurement units (IMU), we show that 3D reconstruction is possible even without external tracking systems. (ii) For orthopedic surgery, precise bone surface segmentation facilitates intra-operative registration with sub-millimeter accuracy, in turn allowing for reliable surgical navigation. (iii) Last but not least, ultrasound uniquely allows us to close the loop on the acquisition pipeline by actively influencing how the tissue is insonified and how the image is formed. We perform a tissue-specific speed-of-sound calibration, apply learning-based filtering to enhance image quality, and optimally tune the acquisition parameters in real time.

2 ImFusion SDK as Research Platform

A variety of open-source C++ platforms and frameworks for medical imaging and navigation with US have evolved over the years, including 3D Slicer [6] with the SlicerIGT extension [7], the PLUS toolkit [8], CustusX [9], and more recently SUPRA [10]. All of these have a research focus and have successfully helped to prototype novel algorithms and clinical workflows, some with a very active development community striving for continuous improvement. Nevertheless, turning an algorithm from a research project into a user-friendly, certified medical product may be a long path.

Complementary to the above, we present the ImFusion Suite & SDK, a platform for versatile medical image analysis research and product-grade software development. The platform is based on a set of proprietary core components, on top of which openly accessible plugins contributed by the research community can be developed. In this work, we emphasize the platform's capabilities to support academic researchers in rapid prototyping and in translating scientific ideas to clinical studies and potential subsequent commercialization in the form of university spin-offs. The SDK has already been employed by various groups around the world [5, 11,12,13,14].

It offers a radiology-workstation look and feel, ultra-fast DICOM loading, seamless CPU/OpenGL/OpenCL synchronization, advanced visualization, and various technology modules for specialized applications. To deal with real-time inputs such as ultrasound imaging, tracking sensors, and other sensory information, the streaming sub-system is robust, thread-safe on both CPU and GPU, and easily extensible. Research users may script their algorithms using XML-based workspace configurations or a Python wrapper; custom plugins can be added through the C++ interface. For 3D ultrasound in particular, further key features that go beyond what is otherwise available include robust image-based calibration tools similar to [15] and various 3D compounding methods that allow for on-the-fly reconstruction of MPR cross-sections [16]. Last but not least, the handling of tracking sensors includes various synchronization, filtering, and interpolation methods on the stream of homogeneous transformation matrices, as sketched below. Having all of the above readily available allows researchers to focus on advancing the state of the art with their key contribution, as demonstrated in the following examples.
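To make one such stream operation concrete, the following minimal Python sketch interpolates a tracked probe pose at an image timestamp from two surrounding tracking samples. This is an illustration of the general technique, not the ImFusion API; the function name and signature are our own assumptions.

```python
# Minimal sketch (not the ImFusion API): interpolating a tracked probe pose
# at an image timestamp, one of the stream operations described above.
# Assumes 4x4 homogeneous matrices and SciPy for rotation SLERP.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_pose(t0, T0, t1, T1, t_query):
    """Interpolate between two homogeneous transforms at t_query in [t0, t1]."""
    alpha = (t_query - t0) / (t1 - t0)
    # Rotation: spherical linear interpolation between the two orientations.
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix([T0[:3, :3], T1[:3, :3]]))
    R = slerp(alpha).as_matrix()
    # Translation: plain linear interpolation.
    p = (1 - alpha) * T0[:3, 3] + alpha * T1[:3, 3]
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, p
    return T
```

Interpolating the rotational and translational parts separately in this way keeps the result a valid rigid transform, which naive element-wise matrix interpolation would not.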

3 3D POCUS Without External Tracking

Most POCUS systems are currently based on 2D ultrasound imaging, which greatly restricts the variety of clinical applications. While systems enabling the acquisition of three-dimensional ultrasound data exist, they all come with drawbacks. 3D matrix-array ultrasound probes are very expensive and produce images with limited field-of-view and quality. On the other hand, optical or electromagnetic tracking systems are expensive, not easily portable, or hinder usability by requiring a permanent line of sight. Finally, the inertial measurement units (IMU) embedded in most current US probes provide a good estimate of the probe orientation, but their acceleration data is not accurate enough to compute its spatial position, as the sketch below illustrates.
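The following back-of-the-envelope Python sketch shows why: double-integrating even a small residual accelerometer bias makes the position error grow quadratically with time. The sampling rate and bias value are illustrative assumptions, not measured IMU specifications.

```python
# Illustrative sketch: position drift from double-integrating a small
# constant accelerometer bias over a freehand sweep.
import numpy as np

dt = 0.005                      # 200 Hz IMU sampling (assumed)
t = np.arange(0, 10, dt)        # 10 s freehand sweep
bias = 0.02                     # 0.02 m/s^2 residual accelerometer bias (assumed)

velocity = np.cumsum(bias * np.ones_like(t)) * dt   # first integration
position_error = np.cumsum(velocity) * dt           # second integration

print(f"Position drift after 10 s: {position_error[-1]:.2f} m")
# ~1 m of drift from a 2 cm/s^2 bias: orientation (from gyro + gravity)
# remains usable, but absolute position does not.
```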

Fig. 1. (From [17], modified.) (a) Overview of our method for frame-to-frame trajectory estimation of the probe. (b) Architecture of the neural network at the core of the method. (c) Reconstructed trajectories (without any external tracking) on several sweeps acquired with complex motion.

Therefore, over the past decades, there has been a significant effort in the research community to design a system that does not require additional, cumbersome hardware [18, 19], yet still allows for 3D reconstruction with a freehand-swept 2D probe. The standard approach for purely image-based motion estimation is known as speckle decorrelation, since it exploits the frame-to-frame correlation of the speckle pattern present in US images (see the sketch below). However, due to the challenging nature of the problem, even recent implementations of this approach have not reached an accuracy compatible with clinical requirements.
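As a minimal sketch of the speckle-decorrelation idea: the normalized cross-correlation between corresponding patches of successive frames drops as the elevational distance between them grows, so a measured correlation can be looked up on a phantom-derived calibration curve to obtain a (noisy) distance estimate. The calibration curve here is a hypothetical placeholder, not data from a real phantom.

```python
# Sketch of classic speckle decorrelation (simplified; real systems work
# on many small patches and fit the full decorrelation curve).
import numpy as np

def patch_ncc(a, b):
    """Normalized cross-correlation of two equally sized image patches."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def decorrelation_to_distance(ncc, calib_curve):
    """Look up elevational distance from a pre-measured phantom curve.
    calib_curve: (correlations, distances_mm), correlations decreasing."""
    correlations, distances = calib_curve
    # np.interp needs increasing x values, hence the reversal.
    return float(np.interp(ncc, correlations[::-1], distances[::-1]))
```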

Once again, deep learning enabled a breakthrough by boosting the performance of image-based motion estimation. As we have shown in [17], it is possible to train a network to learn the 3D motion of the probe between two successive frames in an end-to-end fashion: the network takes the two frames as input and directly outputs the parameters of the translation and rotation of the probe (see Fig. 1a and b). By applying such a network sequentially to a whole freehand sweep, we can reconstruct the complete trajectory of the probe and therefore compound the 2D frames into a high-resolution 3D volume. We also show that the IMU information can be embedded into the network to further improve the accuracy of the reconstruction. On a dataset of more than 700 sweeps, our approach yields trajectories with a median normalized drift of merely 5.2%, resulting in unprecedentedly accurate length measurements with a median error of 3.4%. Example comparisons to ground-truth trajectories are shown in Fig. 1c.
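The sequential application step can be summarized in a few lines of Python. The network of [17] is not reproduced here; `predict_motion` is a stand-in for the trained model and is an assumption of this sketch.

```python
# Sketch: chaining per-frame relative transforms into an absolute trajectory.
import numpy as np

def compound_trajectory(frames, predict_motion):
    """frames: list of 2D US images; predict_motion(f0, f1) -> 4x4 relative
    transform between consecutive frames. Returns absolute poses."""
    poses = [np.eye(4)]                      # first frame defines the origin
    for f0, f1 in zip(frames[:-1], frames[1:]):
        T_rel = predict_motion(f0, f1)       # network output as a 4x4 matrix
        poses.append(poses[-1] @ T_rel)      # accumulate frame-to-frame motion
    return poses
```

Since errors accumulate multiplicatively along the chain, small per-frame inaccuracies translate into the drift figures reported above, which is why the network's per-pair accuracy is so critical.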

4 Ultrasound Image Analysis

A core feature of the ImFusion SDK is its capability for real-time image analysis. Provided that the employed US system allows for raw data access, the processing pipeline from live in-phase and quadrature (IQ) data typically starts with demodulation, log-compression, scan-line conversion, and denoising.
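The first two of these stages are standard signal processing and can be sketched compactly; the sketch below uses NumPy, with the dynamic range as an illustrative default rather than a value from our pipeline.

```python
# Minimal sketch of the first pipeline stages: envelope detection
# (demodulation) of IQ data followed by log-compression. Scan conversion
# and denoising would follow.
import numpy as np

def iq_to_bmode(iq, dynamic_range_db=60.0):
    """iq: complex-valued array (samples x scan lines) of IQ data."""
    envelope = np.abs(iq)                          # demodulated amplitude
    envelope /= envelope.max() + 1e-12             # normalize to [0, 1]
    db = 20.0 * np.log10(envelope + 1e-12)         # log-compression
    db = np.clip(db, -dynamic_range_db, 0.0)       # apply dynamic range
    return ((db + dynamic_range_db) / dynamic_range_db * 255).astype(np.uint8)
```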

Image Filtering. Instead of relying on conventional non-linear image filters, it is possible to use convolutional neural networks (CNNs) for denoising. Simple networks with a U-net architecture [20] can be trained with an l2 loss to perform powerful, anatomy-independent noise reduction, as sketched below. Figure 2a depicts an exemplary B-mode image of a forearm in raw and filtered form. More complex, application-specific models could be used to emphasize a desired appearance or to highlight suspicious lesions automatically.
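A compact PyTorch sketch of such a setup follows: a small U-net-style encoder-decoder paired with an MSE (l2) loss. Depth, channel counts, and the loss pairing of raw versus filtered images are illustrative assumptions, not the configuration of the deployed model.

```python
# Sketch: tiny U-net-style denoiser trained with an l2 (MSE) loss.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = block(1, 16), block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)             # 32 = 16 skip + 16 upsampled
        self.out = nn.Conv2d(16, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.out(d)

model, loss_fn = TinyUNet(), nn.MSELoss()    # l2 loss on noisy vs. target pairs
```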

Fig. 2. (From [21], modified.) (a) Raw B-mode image of a volunteer forearm cross-section (left) and the result of the CNN-based denoising filter (right). (b), (c) Examples of automatic bone segmentation in various US images (different bones and acquisition settings), along with the neural network detection map.

Bone Surface Segmentation and Registration. As presented in [21], the automatic segmentation of bone surfaces in US images is highly beneficial in Computer-Assisted Orthopedic Surgery (CAOS) and could replace X-ray fluoroscopy in various intra-operative scenarios. Specifically, a fully convolutional neural network was trained on a set of labeled images in which the bone area had been roughly drawn by several users. Because the network turned out to be very reliable, simple thresholding and extraction of the center pixel between the maximum gradient and the maximum intensity proved sufficient to determine the bone surface line; see example results in Fig. 2b, c, and the sketch below. Once a 3D point cloud of the bone surface is assembled using an external optical tracking system, pre-operative datasets such as CT or MRI can be registered by minimizing the point-to-surface error. An evaluation on 1382 US images from different volunteers, different bones (femur, tibia, patella, pelvis), and various acquisition settings yielded a median precision of 0.91 and a recall of 0.94. On a human cadaver with fiducial markers for ground-truth registration, the method achieved sub-millimetric surface registration errors and mean fiducial errors of 2.5 mm.
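The surface extraction step can be sketched as follows: per scan line, threshold the CNN detection map, then place the surface point midway between the maximum intensity and the maximum depth gradient. The threshold value is an assumption of this sketch.

```python
# Sketch: bone surface line from a per-pixel CNN detection map.
import numpy as np

def extract_bone_surface(bmode, detection_map, threshold=0.5):
    """Returns (column, row) surface points, one per scan line with bone."""
    points = []
    grad = np.abs(np.gradient(bmode.astype(float), axis=0))  # depth gradient
    for col in range(bmode.shape[1]):
        rows = np.where(detection_map[:, col] > threshold)[0]
        if rows.size == 0:
            continue                              # no bone in this scan line
        region = slice(rows.min(), rows.max() + 1)
        r_int = rows.min() + np.argmax(bmode[region, col])   # max intensity
        r_grad = rows.min() + np.argmax(grad[region, col])   # max gradient
        points.append((col, 0.5 * (r_int + r_grad)))         # center pixel
    return np.array(points)
```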

5 Speed-of-Sound Calibration

In conventional delay-and-sum US beamforming, speed-of-sound inconsistencies across tissues can distort the image along the scan-line direction. The reason is that US machines assume a constant speed-of-sound for human tissue; in reality, the speed-of-sound across human soft tissues varies over a range of approximately 150 m/s (Fig. 3a). To improve the quality of the spatial information, we have developed a fast speed-of-sound calibration method based on the bone surface detection algorithm outlined in the previous section.
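The distortion itself follows from simple physics: the scanner converts echo time to depth using its assumed speed, so a structure at true depth d appears at d multiplied by the ratio of assumed to true speed. The concrete values below are illustrative (1540 m/s is a common scanner default).

```python
# Back-of-the-envelope sketch of the depth distortion from a wrong
# speed-of-sound assumption. Values are illustrative.
c_assumed = 1540.0            # m/s, typical scanner default
c_true = 1590.0               # m/s, e.g. a muscle-dominated tissue path
depth_true = 0.06             # m (6 cm)

depth_shown = depth_true * c_assumed / c_true
print(f"Apparent depth: {depth_shown * 100:.2f} cm "
      f"(shift: {(depth_true - depth_shown) * 1000:.1f} mm)")
```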

As presented in [21], two steered US frames, one with a positive and one with a negative steering angle, are acquired in addition to the main image. The bone surface is then detected in the steered images, and they are interpolated into one single frame. An incorrect speed-of-sound causes both vertical and horizontal misplacement of the bone surface in the steered images. The correct speed-of-sound is estimated by maximizing the image similarity in the detected bone region captured from the different angles (Fig. 3b), as sketched below. This method is fast enough to facilitate real-time speed-of-sound compensation and hence to improve the spatial information extracted from US images during POCUS procedures.
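A minimal sketch of the estimation loop: evaluate candidate speeds, re-form both steered frames accordingly, and keep the speed that maximizes their similarity in the bone region. Here `rescale_depth` is an assumed placeholder standing in for true re-beamforming at a given speed, and the candidate range is illustrative.

```python
# Sketch: speed-of-sound search maximizing bone-region similarity
# between the two steered frames.
import numpy as np

def ncc(a, b):
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def calibrate_speed(frame_pos, frame_neg, bone_mask, rescale_depth,
                    candidates=np.arange(1400.0, 1601.0, 5.0)):
    """Return the candidate speed-of-sound maximizing bone-region similarity."""
    scores = []
    for c in candidates:
        a = rescale_depth(frame_pos, c)       # steered frame at +angle
        b = rescale_depth(frame_neg, c)       # steered frame at -angle
        scores.append(ncc(a[bone_mask], b[bone_mask]))
    return candidates[int(np.argmax(scores))]
```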

Fig. 3. (a) The difference in fat-to-muscle ratio between two patients; red and green lines show the extent of fat and muscle tissue. Considering the average speed-of-sound in human fat and muscle (1470 m/s and 1620 m/s), one can compute the average speed-of-sound for both images, resulting in 1590 m/s and 1530 m/s, respectively. At a depth of 6 cm, this difference can produce a vertical shift of around 1 mm in the structures. (b) Superimposed steered US images before (left) and after (right) the speed-of-sound calibration; red and green intensities depict the individual steered frames with angles of ±15°. Note the higher consistency of the bone in the right image. (Color figure online)

6 Acquisition Parameter Tuning

One last obstacle to a wider adoption of ultrasound is the inter-operator variability of the acquisition process itself. The appearance of the formed image depends on a number of parameters (frequency, focus, dynamic range, brightness, etc.) whose tuning requires significant knowledge and experience. While we have already shown above that, thanks to deep learning, US image analysis algorithms can be made very robust to sub-optimal tuning of such parameters, we can go one step further and close the loop of the acquisition pipeline.

Fig. 4. Automatic tuning of the US acquisition parameters based on the real-time bone detection presented in Sect. 4; sub-optimal settings are marked with red lines. (Color figure online)

Just like standard cameras use face-detection algorithms to adjust the focus plane and the exposure of a picture, we can leverage a real-time detection of the object of interest in the ultrasound frame to adjust the acquisition parameters automatically, as shown in Fig. 4. Using machine learning to assess the quality of an ultrasound image has been proposed before (e.g. [22]), but a real-time detection allows us to tailor the parameter tuning in an explicit and straightforward way.

More specifically, knowing the position of the object in the image allows us to directly set the focus plane of the ultrasound beams to the correct depth. It also enables us to adjust the frequency empirically: the shallower the object, the higher we can set the frequency (and vice versa). Finally, we can choose an adequate brightness and dynamic range based on statistics within a region of interest that includes the target structure. These rules are sketched below.
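The following Python sketch turns a detected target bounding box into parameter suggestions; the frequency breakpoints and percentile choices are illustrative assumptions, not the values deployed in our system.

```python
# Sketch: detection-driven acquisition parameter tuning.
import numpy as np

def tune_parameters(image, box, depth_per_row_mm):
    """box = (row_top, row_bottom, col_left, col_right) of the detection;
    rows are assumed to map linearly to depth."""
    r0, r1, c0, c1 = box
    focus_mm = 0.5 * (r0 + r1) * depth_per_row_mm     # focus on the target
    # Shallower target -> higher frequency (resolution vs. penetration).
    freq_mhz = 10.0 if focus_mm < 30 else 7.5 if focus_mm < 60 else 5.0
    roi = image[r0:r1, c0:c1].astype(float)
    # Brightness / dynamic range derived from ROI statistics.
    lo, hi = np.percentile(roi, [5, 95])
    return {"focus_mm": focus_mm, "frequency_mhz": freq_mhz,
            "gain_low": lo, "gain_high": hi}
```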

We believe such an algorithm could allow less experienced users to acquire ultrasound images of satisfactory quality, and therefore make the modality attractive for a larger number of clinical applications.

7 Conclusion

We have presented a number of advanced POCUS and interventional US applications built on the ImFusion Suite. While many aspects of 3D ultrasound with and without external tracking have been thoroughly investigated by the community in the past, dealing with such data is by no means trivial; in our experience, dedicated software was therefore crucial to achieving these results.