1 Introduction

Cancer is one of the main causes of death worldwide. In the case of head and neck squamous cell carcinoma (HNSCC), correct staging includes the careful examination of the nearby lymphatic system. As the current clinical and radiological assessment procedures are limited in their accuracy [1], many specialized centers still rely solely on the complete removal of all lymph nodes in the head and neck area [2]. This procedure, called elective neck dissection (END), is often unnecessary and carries the risk of damaging the intricate nervous and lymphatic systems near the intervention site [3]. An alternative is sentinel lymph node biopsy (SLNB), often used in breast cancer surgery and melanoma, which is a minimally invasive standard staging procedure based on radioactive tracers (e.g. technetium-99m; 99mTc). It has also been successfully applied to selected head and neck tumors and is the most accurate histologic staging procedure with the highest success rate [4–6]. The goal of SLNB is to examine the first draining lymph node after a tumor site. This sentinel lymph node (SLN) is most likely the first to receive metastases [7]. If the SLN is tumor free, no END needs to be carried out and the specialist speaks of a “clinically negative neck” (cN0-neck). It is therefore of utmost importance that SLNB be as accurate as possible in order to correctly stage the malignancy in the subsequent histopathologic work-up of the excised lymph nodes.

Current preoperative HNSCC assessment is performed using either lymphoscintigraphy (LS) or, more recently, single photon emission computed tomography (SPECT) combined with computed tomography (CT). Thanks to the integration of SPECT/CT into the procedure, a higher spatial resolution and therefore a better anatomical orientation is achieved compared to classical LS alone. SPECT/CT is nowadays considered the standard method for preoperative imaging before any SLNB [2]. During an SLNB procedure the specialist is guided by a handheld 1-dimensional audio-based gamma detection probe (HGDP) indicating activity of the radioactive tracer in the lymphatic system. For the intervention to be effective, the surgeon relies on the SPECT/CT images from the preoperative assessment. Because the tracer infiltrates and makes visible not only the target node (5 mm–10 mm in diameter) but also neighboring tissue, identifying the target proves to be especially challenging. This is accentuated where there are no prominent visible anatomic landmarks. The general consensus is that the lymph node with the highest signal-to-background ratio in gamma radiation is the true SLN. Given these intricacies of SLNB, there is a need for more sophisticated imaging devices and better data visualization.

The company SurgicEye sells a commercially available system based on freehand SPECT (fhSPECT) that solves some of the aforementioned problems of SLNB. Freehand SPECT depends on a preparation step needed to register the gamma probe with the patient followed by gamma activity acquisition in order to build a correct synthetic 3D model of the tumor and its location. This model is then used to produce the augmentation of the surgical scene, displayed on an external screen. After each tissue removal, the model needs to be updated by a consecutive gamma activity acquisition in order to correct the augmentation [8].

The research group of Prof. Dr. Nassir Navab from the TU in Munich has a strong background in augmented reality (AR) for radioguided surgery in general and sentinel lymph node biopsy in particular.

The approach of fhSPECT was developed by these researchers [9]. One of their recently published review articles compares clinical applications of 3D scintigraphic imaging (fhSPECT) and navigation in radioguided surgery [10].

An overview of the current technologies, different modalities, devices, tools and staging procedures to treat each type of malignancy is also provided by the IAEA’s Guided Intraoperative Scintigraphic Tumour Targeting (GOSTT) report.

Fig. 1. Artistic rendering of an intraoperative setup of the mobile hand-held scintigraphy unit (e.g. tablet computer). The collimator and its endoscopic cameras on the back of the unit are pointing towards the patient. A live view of the patient’s tracer enriched SLN, overlaid on top of the incision, is presented to the surgeon on the display.

The aim of this research project is to develop an advanced mobile hand-held scintigraphy unit with AR capabilities (Fig. 1). This unit directly fuses gamma activity images with optical images of the incision for better localization and identification of tracer enriched lymph nodes during SLNB. These augmented images are then displayed on a tablet computer or similar device attached to the unit. To the best of our knowledge, such an intraoperative AR representation of the surgical scene of an SLNB intervention is unique. The presented preliminary results show the potential of our concept.

2 Materials and Methods

In this study, we used NanEye endoscopic cameras that measure only 1 mm \(\times \) 1 mm \(\times \) 1.7 mm in width, depth and height, respectively. Their resolution is 250 \(\times \) 250 pixels with a pixel size of 3 \(\upmu \mathrm{m}\) \(\times \) 3 \(\upmu \mathrm{m}\), giving an aspect ratio of 1:1. The effective focal length is 660 \(\upmu \mathrm{m}\). The built-in optics are fisheye-based lenses with an f-number of 2.7, an aperture of 244 \(\upmu \mathrm{m}\) and a depth of focus of 8 mm–75 mm.

Our industrial collaborator DECTRIS provided us with a gamma detector prototype with a native resolution of \(487 \times 195\) pixels and a pixel size of 172 \(\upmu \mathrm{m}\) \(\times \) 172 \(\upmu \mathrm{m}\). DECTRIS’ detector technology is based on Hybrid Photon Counting (HPC) and cadmium telluride (CdTe) sensor material. Furthermore, a Linux PC based data processing unit as well as the necessary software packages to operate the detector were part of their contribution.

To effectively detect high energy photon (\(\gamma \)-ray) sources and to be able to correctly map the tracer enriched target tissue, a \(\gamma \)-detector also depends on a particular collimator design. In a preliminary study [11], the parallel-hole collimator that was used showed fundamental limitations with regard to the correct mapping between the optical cameras of the system and the detector producing the activity- or \(\gamma \)-image. The opto-geometric properties of a parallel-hole collimator (PHC) are such that \(\gamma \)-rays hitting the detector yield an orthographic or distance invariant view of the object. The augmentation using standard optical cameras is thus more complex, as it requires an accurate mapping between the orthographic projection of the collimator and the pinhole (perspective) projection of the optical camera. For this, an accurate 3D depth map of the surgical scene is needed, which is virtually impossible to obtain in the OR without a tracking infrastructure. Our hypothesis is that the above mentioned steps could be substantially simplified if the collimator and the optical camera had the same, aligned projection geometry. We thus propose using a pinhole collimator with geometrically aligned optical cameras. A pinhole collimator acts as an idealized “camera obscura” which produces a perspective projection of the \(\gamma \)-image. As such, the pinhole approach has opto-geometric properties comparable to those of an optical camera, which simplifies image augmentation.

A particular disadvantage of the standard pinhole collimator is its reduced photon yield. Our custom built collimator for this research project uses multiple pinholes simultaneously, which results in an aggregated photon sensitivity and thus increases its overall performance. In order for such a multi-pinhole collimator (MPC) design to remain optically valid, multiple small (endoscopic) optical cameras are inserted into the pinholes, see Fig. 2. As \(\gamma \)-rays easily penetrate the relatively small cameras, no significant attenuation is to be expected. The overlap of the fields of view of the endoscopic cameras will in the future be exploited by stitching algorithms to present a unified image. The optical parameters of the endoscopic cameras are such that their diagonal field of view (FOV) is \({90}^{\circ }\), which corresponds to a vertical and horizontal FOV of approx. \({71}^{\circ }\), given an aspect ratio of 1:1. The half-angle \(\alpha \) of the FOV of an idealized pinhole camera is thus given by

$$\begin{aligned} \tan (\alpha ) = \frac{w}{2} \times \frac{1}{h} \end{aligned}$$

where w denotes the width and h the height of the pinhole compartment. This model is adequate for the envisioned solution to combine the two modalities (i.e. \(\gamma \)- and optical images), see Fig. 2a.
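The FOV figures above can be checked with a few lines of Python. This is only a sketch: it treats the camera as an ideal rectilinear (pinhole) camera, which is an approximation for the fisheye-based optics, and the function names as well as the compartment dimensions in the usage lines are hypothetical, since the actual values of w and h are not stated here.

```python
import math

def axis_fov_from_diagonal(diag_fov_deg, aspect=1.0):
    """Horizontal and vertical FOV (degrees) of an ideal rectilinear camera,
    given its diagonal FOV and its width:height aspect ratio."""
    half_diag = math.tan(math.radians(diag_fov_deg) / 2.0)
    norm = math.hypot(aspect, 1.0)  # length of the image diagonal in units of the height
    horizontal = 2.0 * math.degrees(math.atan(half_diag * aspect / norm))
    vertical = 2.0 * math.degrees(math.atan(half_diag / norm))
    return horizontal, vertical

def pinhole_half_angle(w, h):
    """Half-angle alpha (degrees) from tan(alpha) = (w / 2) * (1 / h)."""
    return math.degrees(math.atan(w / (2.0 * h)))

def compartment_height_for_fov(w, full_fov_deg):
    """Height h that yields a given full FOV (2 * alpha) for a compartment of width w."""
    return w / (2.0 * math.tan(math.radians(full_fov_deg) / 2.0))

# Diagonal FOV of 90 degrees at an aspect ratio of 1:1
h_fov, v_fov = axis_fov_from_diagonal(90.0, aspect=1.0)
print(f"horizontal/vertical FOV: {h_fov:.1f} deg / {v_fov:.1f} deg")

# Hypothetical compartment width of 10 mm, matched to the camera's horizontal FOV
h_needed = compartment_height_for_fov(10.0, h_fov)
print(f"required compartment height: {h_needed:.2f} mm")
```

Matching the pinhole's 2α to the camera FOV in this way is what allows both view cones to overlap symmetrically.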

The septa (vertical compartment separations along h) and the front and side plates need to be thick enough to absorb \(\gamma \)-rays which do not pass through the pinholes. This avoids the generation of an unwanted background signal on the actual detector. Our calculations suggest a thickness of 1 mm for the front and side plates and a thickness of 0.25 mm for the septa. As such, only 5 % of the emitted photons are able to penetrate the shielding. Lead and tungsten are standard materials in nuclear medicine to build collimators [12]. Tungsten has very good mechanical properties and a high density of 19.25 g/cm\(^3\) and is thus used for our collimator. The assembled multi-pinhole collimator and the cameras are shown in Fig. 2b.
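The relation between shielding thickness and photon penetration can be sketched with the narrow-beam exponential attenuation law \(I/I_0 = e^{-\mu x}\). The example below is an illustration only, not the calculation used for the collimator: it assumes that the quoted 5 % penetration refers to the 1 mm plates, ignores scatter and oblique incidence, and uses hypothetical function names. The attenuation coefficient is backed out from those two numbers and is not a tabulated material constant.

```python
import math

def transmission(mu_per_mm, thickness_mm):
    """Fraction of photons passing a slab (narrow-beam law I/I0 = exp(-mu * x))."""
    return math.exp(-mu_per_mm * thickness_mm)

def implied_mu(transmitted_fraction, thickness_mm):
    """Effective linear attenuation coefficient (1/mm) implied by a measured
    or quoted transmitted fraction through a slab of known thickness."""
    return -math.log(transmitted_fraction) / thickness_mm

def thickness_for_transmission(mu_per_mm, target_fraction):
    """Slab thickness (mm) needed to reach a target transmitted fraction."""
    return -math.log(target_fraction) / mu_per_mm

# Assumption: 5 % penetration corresponds to the 1 mm front and side plates.
mu_eff = implied_mu(0.05, 1.0)
print(f"effective attenuation coefficient: {mu_eff:.2f} per mm")

# Thickness that would be needed to reduce the penetration to 1 %
t_1pct = thickness_for_transmission(mu_eff, 0.01)
print(f"thickness for 1 % penetration: {t_1pct:.2f} mm")
```

Under these assumptions, the same functions can be used to explore how much thicker the shielding would have to be to reduce the background signal.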

Fig. 2. (a) Pinhole compartment of a multi-pinhole collimator with its field of view (FOV, 2 \(\times \alpha \)), given the specific geometric properties. The FOV of a miniature camera is drawn in comparison. (b) Rendering of the used camera placement layout with respect to the pinholes of the collimator. The endoscopic optical cameras measure 1 mm \(\times \) 1 mm in length and width and 1.7 mm in height. The collimator has dimensions 85.9 mm \(\times \) 36 mm \(\times \) 36 mm, pointing towards the surgical scene (arrow).

The basic workflow and the specific challenges of combining a PHC produced \(\gamma \)-image with an optical camera image are illustrated qualitatively in Fig. 3. The two gamma sources shown in the image (originating from e.g. two neighboring lymph nodes) are mapped on top of each other in the orthographic projection of the PHC. As they do not share the same depth in the tissue, the detected gamma activity should be split in two in the optical cameras looking from one side. This, however, can only be resolved with accurate depth information of the tracer activity, which is not available (Fig. 3a).

Fig. 3. (a) Optical camera with its view cone, perspective projection. Parallel-hole collimator, orthographic (parallel) projection. The two \(\gamma \)-sources are at an unknown depth which cannot be measured. (b) Our approach: both view cones of the camera and the pinhole overlap symmetrically. An incident \(\gamma \)-ray is seen by the detector under the same viewing angle as the corresponding object point seen by the camera. No depth information is needed for the augmentation.

Our approach is based on the hypothesis that similar opto-geometric properties of pinhole modeled cameras yield similar or identical projections. Any detection of an incident \(\gamma \)-ray on the MPC detector belongs to the same homogeneous trace as the corresponding pixel on the optical image (Fig. 3b). As long as both viewing axes are properly aligned, and hence there is a symmetric overlap of the FOV of the pinhole and the optical camera, the activity and the pixels are correlated and provide a correct registration. A priori depth information is therefore not needed.
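The geometric argument can be illustrated numerically. The sketch below assumes an idealized setup in which the pinhole and the optical camera share the same center of projection and viewing axis; the focal lengths `F_GAMMA` and `F_CAM`, the source positions and the function names are hypothetical, and the image inversion of a real pinhole is ignored.

```python
import numpy as np

def perspective_project(points, focal):
    """Ideal pinhole projection: (x, y, z) -> focal * (x / z, y / z)."""
    points = np.asarray(points, dtype=float)
    return focal * points[:, :2] / points[:, 2:3]

def orthographic_project(points):
    """Parallel-hole (orthographic) projection: (x, y, z) -> (x, y)."""
    return np.asarray(points, dtype=float)[:, :2]

F_GAMMA = 20.0  # hypothetical pinhole-to-detector distance (mm)
F_CAM = 0.66    # hypothetical camera focal length (mm)

# Case (a): two sources stacked along the collimator axis at different depths.
stacked = np.array([[10.0, 5.0, 30.0], [10.0, 5.0, 60.0]])
ortho = orthographic_project(stacked)        # both land on the same detector position
persp = perspective_project(stacked, F_CAM)  # the camera sees two different positions

# Case (b): sources at 30, 50 and 70 mm on one ray through the shared center.
ray = np.array([0.2, -0.1, 1.0])
on_ray = np.outer([30.0, 50.0, 70.0], ray)
gamma_uv = perspective_project(on_ray, F_GAMMA)
camera_uv = perspective_project(on_ray, F_CAM)
scale = gamma_uv / camera_uv                 # constant ratio, independent of depth

print("orthographic:", ortho)
print("perspective :", persp)
print("gamma/camera scale:", scale)
```

In case (a) the mapping between the two images depends on the unknown depth, whereas in case (b) it reduces to a fixed scale factor, which is why no depth information is required.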

To test and verify our assumption on fusing the two modalities, we set up an experiment consisting of a vial filled with 0.1 ml of metastable technetium (99mTc). The liquid in the tip of the inner tube served as an idealized point source with a radioactivity of 15 MBq. This technetium isotope is a standard tracer for SPECT/CT and LS and is therefore routinely used in the domain of nuclear medicine. The vial was then placed at specific distances (3 cm, 5 cm and 7 cm) from the combined \(\gamma \)-detector/camera unit and observed using different exposure times (Fig. 4). The distances were chosen such that they correspond to the expected distance from the device to the surgical site (neck of patient).

In this experiment one endoscopic camera/pinhole pair was used to present the working principle. The camera was calibrated in a separate step using Zhang’s algorithm [13] and a chessboard pattern to obtain its intrinsic and extrinsic parameters. The intrinsic parameters of an optical camera need to be known in order to compensate for lens distortions and to obtain the principal point. The principal point is the point where the optical axis intersects the image plane; it is often offset from the image center due to imperfections in the alignment between the camera chip and the optics.

Fig. 4. Setup of the experiment on the workbench, as seen from above. The vial is placed at a distance of 5 cm and roughly aligned with an endoscopic camera, which is slightly offset to the axis of the pinhole (smaller arrow), due to manual placement. Other measured distances are also indicated (dashed circles). The photon acquisition time for each position is 30 s and 60 s, respectively.

The pinhole image patch, as it is acquired through the ideal camera model of the pinhole, does not suffer from distortion or shifts and can serve as ground truth orientation for the scintigraphy device. Therefore, the center of the pinhole image patch can be further used to fine tune the alignment of both images. As the point sources (vials, Fig. 4) and the optical camera were manually placed and inserted, offsets between both images are inevitable and need to be corrected by the algorithm. Currently, each off-line augmentation or fusion uses a specific offset correction factor.

Implementation. A basic off-line image overlay algorithm is used to translate an image patch \(\mathbf {P_n}\) of \(pinhole_n\) from the \(\gamma \)-detector to the principal point of the optical image \(\mathbf {I_n}\) of the corresponding \(camera_n\). \(\mathbf {P_n}\) has size \(60 \times 64\) pixels, \(\mathbf {I_n}\) has size \(250 \times 250\) pixels. As the pinhole characteristics are such that the pinhole image \(\mathbf {P_n}\) is mirrored, a rotation of \({180}^{\circ }\) around its center needs to be applied. Let \(\mathbf {t}\) \(:=\) \(principalPoint(\mathbf {I_n}) - size(\mathbf {P_n})/2\) be the translation vector, which centers the patch on the principal point, and \(\mathbf {R}\) the rotation matrix. For every pixel \(\mathbf {p}\) of \(\mathbf {P_n}\) we then obtain

\(\mathbf {p'} = \mathbf {t} + \mathbf {R} * \mathbf {p}\)

where \(\mathbf {p'}\) indicates an activity in the coordinate system of \(\mathbf {I_n}\). Correction factors \(s * e_x, s * e_y\) can be added to \(\mathbf {t}\) to visually align the activity image in case of slight offsets. For every \(\mathbf {p'}\) its activity value is evaluated by a hard-coded color look-up table. The augmentation is thus \(\mathbf {I_n}(\mathbf {p'}) = color(\mathbf {p'})\).
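The overlay step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, the grayscale look-up (0 counts black, maximum count white) stands in for the hard-coded color table, the patch is taken as 60 rows by 64 columns, and the `threshold` mask, which leaves the optical image visible where no activity was counted, is an addition for readability.

```python
import numpy as np

def activity_to_rgb(counts, max_count):
    """Hypothetical grayscale look-up table: 0 -> black, max_count -> white."""
    level = np.clip(counts / max(max_count, 1), 0.0, 1.0)
    gray = (255 * level).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)

def overlay_patch(image, patch, principal_point, offset=(0, 0), threshold=0):
    """Rotate the gamma patch by 180 degrees about its center, center it on the
    principal point (x, y) plus a manual offset, and paint it into the image."""
    out = image.copy()
    rotated = patch[::-1, ::-1]          # 180 degree rotation about the patch center
    ph, pw = rotated.shape
    x0 = int(round(principal_point[0] + offset[0] - pw / 2))
    y0 = int(round(principal_point[1] + offset[1] - ph / 2))
    height, width = out.shape[:2]
    xs, ys = max(x0, 0), max(y0, 0)      # clip the patch to the image bounds
    xe, ye = min(x0 + pw, width), min(y0 + ph, height)
    if xs >= xe or ys >= ye:
        return out                       # patch lies completely outside the image
    sub = rotated[ys - y0:ye - y0, xs - x0:xe - x0]
    colors = activity_to_rgb(sub, patch.max())
    mask = sub > threshold               # assumption: only draw pixels with activity
    region = out[ys:ye, xs:xe]           # view into `out`, modified in place
    region[mask] = colors[mask]
    return out

# Synthetic example: black 250 x 250 optical image, one hot pixel in the gamma patch
optical = np.zeros((250, 250, 3), dtype=np.uint8)
gamma_patch = np.zeros((60, 64), dtype=np.int64)
gamma_patch[0, 0] = 21
fused = overlay_patch(optical, gamma_patch, principal_point=(125, 125))
shifted = overlay_patch(optical, gamma_patch, principal_point=(125, 125), offset=(30, 0))
```

Passing `threshold=-1` paints every patch pixel, which corresponds to assigning a color to every \(\mathbf {p'}\). The `offset` argument plays the role of the manual correction factors added to \(\mathbf {t}\).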

Fig. 5. Vial at 3 cm, min. photon count: 0 (black), max. photon count: 21 (white), pinhole offset: 30 pixels to the right, exposure time of the detector: 60 s. The background signal is high as the photon penetration of 5 % is accentuated at this distance. The pinhole cannot fully capture the source as it is partially out of its view cone.

Fig. 6. Vial at 5 cm, min. photon count: 0 (black), max. photon count: 19 (white), pinhole offset: 30 pixels to the right, exposure time of the detector: 60 s. With increasing distance, fewer unwanted photons hit the shielding and thus their absorption probability is higher. The source is now completely in the view cone.

Fig. 7. Vial at 5 cm, min. photon count: 0 (black), max. photon count: 16 (white), pinhole offset: -10 pixels to the left, exposure time of the detector: 30 s. The pinhole cannot fully capture the source as it is partially out of its view cone.

3 Results

The following image series (Figs. 5, 6, 7, 8) shows the augmentation or fusion of an optical image with the corresponding \(\gamma \)-image. The distances from the detector to the vial are given, following the experimental setup according to Fig. 4. The same middle pinhole/camera pair was used throughout the experiments.

Fig. 8. Vial at 7 cm, min. photon count: 0 (black), max. photon count: 14 (white), pinhole offset: 30 pixels to the right, exposure time of the detector: 60 s. At larger distances the other pinholes also begin to capture the source, given the opening angles of their view cone.

Each experiment was done with the same radioactive source of approx. 15 MBq, measured at the start of the first experiment. The fusion algorithm presented in the Implementation Section was applied. The images to the right (c) show the principal point (white circle) as well as the pinhole center (white x). The difference is the offset applied to match the activity pixels \(\mathbf {p'}\) with the vial inner tube where the tracer (technetium-99m; 99mTc) liquid is stored. No scaling of either image is applied. The \(\gamma \)-detector has some defective cells (middle gray column, a). Specific observations are given for each augmentation run.
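Because the activity was measured only once, at the start of the first experiment, later acquisitions see a slightly weaker source. The following sketch shows how the remaining activity could be estimated from the physical half-life of 99mTc (about 6.01 h). The elapsed times in the usage lines are hypothetical, as the timing of the individual runs is not reported here, and the function names are illustrative.

```python
TC99M_HALF_LIFE_H = 6.01  # physical half-life of 99mTc in hours

def decayed_activity(a0_mbq, elapsed_h, half_life_h=TC99M_HALF_LIFE_H):
    """Activity (MBq) remaining after elapsed_h hours of radioactive decay."""
    return a0_mbq * 2.0 ** (-elapsed_h / half_life_h)

def decay_correction_factor(elapsed_h, half_life_h=TC99M_HALF_LIFE_H):
    """Factor by which photon counts can be multiplied to refer them
    back to the activity at the reference time."""
    return 2.0 ** (elapsed_h / half_life_h)

# Hypothetical elapsed times (hours) since the 15 MBq reference measurement
for elapsed in (0.0, 0.5, 1.0, 2.0):
    activity = decayed_activity(15.0, elapsed)
    factor = decay_correction_factor(elapsed)
    print(f"t = {elapsed:.1f} h: {activity:.2f} MBq, correction factor {factor:.3f}")
```

Such a correction would matter mainly when the maximum photon counts of different runs are compared directly.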

4 Discussion and Conclusion

First tests with our device show promising results. Our hypothesis of using a multi-pinhole based collimator with axis-aligned endoscopic cameras can explain the good spatial matching of the \(\gamma \)-activity images with the corresponding optical images. This can be seen in the Results section. However, the background signal at short distances is considerable and makes it difficult for image segmentation to properly display SLNs. Further, the augmentation depends on a proper axis alignment between pinhole and camera. Compared to the state of the art, the image quality of the endoscopic cameras at the envisioned operating distances is at least as good as that of normal cameras for capturing the patient’s neck, provided there is sufficient illumination. The sensitivity of the gamma detector, in terms of photon conversion rate, is superior to what is currently available on the market. This particular combination makes it possible to provide a system that meets the criteria to advance SLNB in a truly AR fashion.

Current limiting factors of sentinel lymph node biopsy are its strong dependence on accurate preoperative imaging modalities and the crude orientation provided by 1-dimensional audio-based gamma detectors for the localization of sentinel lymph nodes. The augmentation of the gamma activity of the tracer enriched lymph nodes with the optical images of the surgical scene, as presented in this feasibility study, helps the surgeon to better orient himself and focus the biopsy towards the relevant regions of the intervention site. A more targeted biopsy enables a more reliable post-operative histopathologic staging and therefore a more effective analysis of potential cancerous tissue. Breast cancer and melanoma staging based on SLNB face similar challenges. Our approach is therefore also applicable in these domains and could provide a step forward for sentinel lymph node biopsy in general.

Next development steps of the project focus on building an improved collimator shielding that better blocks unwanted photons and thereby reduces the background signal. Furthermore, clustering and stitching algorithms need to be developed in order to take advantage of the multiple pinhole configuration and the endoscopic camera array to improve the capturing process of the gamma activity as well as the incision. To correct for any axis misalignments between the pinholes and the cameras they house, we envision a calibration scheme to further improve the augmentation process. A chessboard patterned object combined with a weak radioactive source and radiation-transparent corner points could be used to establish a correct point-to-point correspondence between the optical cameras and the \(\gamma \)-detector. Lastly, the vision of a truly mobile hand-held device, as shown in Fig. 1, requires integrating the crucial hardware parts into a relatively small form factor.