1 Introduction

Histology images are the gold standard for medical diagnosis and analysis, as they contain key information such as the cause and severity of disease. With the advancement of deep learning, computers can now analyze medical images and extract this key information. However, traditional 2D images lose important information, such as the vascular structure in 3D space. Moreover, because task requirements and machine specifications vary among hospitals and institutions, a general 3D reconstruction system is needed. Current 3D reconstruction tasks, especially those involving high-resolution images, require extensive computational resources and are extremely time-consuming, with registration and semantic segmentation as the bottlenecks of real-time visualization for gigabyte WSIs [1]. In this work, we propose a computationally efficient method for WSI 3D reconstruction using point clouds, i.e., discrete sets of data points in 3D space. The process comprises semantic segmentation, point cloud sampling, point cloud registration, and 3D rendering. It outperforms existing reconstruction pipelines because it combines the sampling and modeling steps by constructing point clouds before registration is performed, greatly reducing the computational and time costs of the process.

2 Related Work

Many approaches have recently been proposed for 3D reconstruction. For example, [2] developed techniques to inspect the surface of organs by reconstruction from endoscope videos. A pipeline named CODA [1] captures the spatial distribution of tumors in organs such as the pancreas and liver. ITA3D reconstructs tissues from non-destructive 3D pathology images [3]. Comparative studies have been published on reconstructing 3D organs in the disciplines of ultrasound [4, 5], radiology [6,7,8], and orthodontics [9, 10]. Notably, due to factors such as the quality of the loaded glass slides and manual operation during the preparation of pathological sections, 3D reconstruction of WSIs must include image registration, which makes 3D reconstruction methods based on CT images, as in [11], unsuitable for direct application to WSIs. Despite many AI-powered applications, accuracy and performance remain the dominant challenges for real-time diagnosis. In the setting of gigabyte pathology images, cellular-level segmentation and image registration must be produced in a short time to keep up with high-throughput scanners and minimize the waiting time for final confirmation by pathologists.

Fig. 1. The 3D visualization pipeline of tumor tissue. To visualize the tumor volume, the raw WSIs are first split into image patches and processed by the binary segmentation network, with the gated axial transformer as the encoder and a CNN as the decoder. The patches are then rejoined to form segmentation maps representing the tumor (positive) area. A color convention is applied to the binary WSI to visualize the entire tissue volume. (a) and (b) show one layer of point clouds generated from the binary images, representing the density of tumor (red) and tissue (blue). (c) and (d) are the 3D tissue volumes generated from the representative point clouds. (Color figure online)

3 Method

WSI-Level Tissue Segmentation. The Medical Transformer, namely the gated axial-attention transformer [12, 13], employs a position-sensitive axial-attention mechanism and incorporates a shallow global branch and a deep local branch.

Inspired by this design, we trained a network with two branches, a gated axial transformer and a CNN-transformer hybrid, as the backbone to extract global and local information. The segmentation ground truths are derived from 2D WSI segmentation maps labeled manually in QuPath [14]. The 2D WSIs are then cropped into image patches and curated to feed the segmentation network, as patch-based deep learning networks are currently the mainstream structures in histology image analysis. The raw images and paired segmentation masks are cropped into \(128 \times 128\) pixel image patches for input. The network consists of two branches: the gated axial transformer learns global information by capturing feature correlations, while the CNN-transformer hybrid branch employs a transformer encoder and a CNN decoder, where the latter is deepened with multiple layers to allow a clear separation of tumor tissue (positive) and dense tissue (negative), as shown in Fig. 1. After binary segmentation, the output patches are rejoined into the WSI for the later tumor visualization.
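As a concrete reading of this patching scheme, the sketch below tiles a WSI into \(128 \times 128\) patches, segments each patch, and rejoins the binary outputs into a WSI-level mask; `segment_patch` is a hypothetical stand-in for the two-branch network, which is not reproduced here.

```python
# A minimal sketch of the patch pipeline; `segment_patch` is a hypothetical
# callable standing in for the two-branch segmentation network.
import numpy as np

PATCH = 128  # patch side length used in this work

def segment_wsi(wsi: np.ndarray, segment_patch) -> np.ndarray:
    """Tile a WSI into PATCH x PATCH patches, segment each, and rejoin."""
    h, w = wsi.shape[:2]
    # Pad on the bottom/right so the image divides evenly into patches.
    ph, pw = (-h) % PATCH, (-w) % PATCH
    padded = np.pad(wsi, ((0, ph), (0, pw)) + ((0, 0),) * (wsi.ndim - 2))
    mask = np.zeros(padded.shape[:2], dtype=np.uint8)
    for y in range(0, padded.shape[0], PATCH):
        for x in range(0, padded.shape[1], PATCH):
            tile = padded[y:y + PATCH, x:x + PATCH]
            mask[y:y + PATCH, x:x + PATCH] = segment_patch(tile)  # 0/1 map
    return mask[:h, :w]  # crop the padding away
```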

Point Clouds. Point clouds are used for 3D modeling of objects such as buildings [15] and human bodies [11, 16, 17]. This research generates layered point clouds from down-sampled semantic segmentation results. The pixels of the tumor (positive) masks are appended to the layered point cloud: the x and y coordinates of the points are taken from the segmented images, and the z coordinate is interpolated from the stacked WSIs. The computed point clouds are then reconstructed in three dimensions for WSI registration. Compared with another commonly used 3D representation, the voxel grid (a 3D 0/1 matrix), the point cloud is more suitable for modeling high-resolution images with enormous data volumes thanks to its sparser data. In the current task, point cloud reconstruction also serves to extract feature points. If the WSI itself were registered, then even when only a few feature points are selected and a simple translation and rotation is computed, the entire WSI would have to be transformed accordingly and the model re-sampled. By building the model first and then applying the transformation to it, only the coordinates of the points in 3D space need to be transformed, and a model usable for subsequent processing is obtained directly (Fig. 2).
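A minimal sketch of this layer construction follows, with the simplifying assumption that z comes from the slice index scaled to the xy resolution (the paper interpolates z from the stacked WSIs; see also the factor of 4 in the Axial Registration subsection):

```python
# Illustrative layered point-cloud construction from per-slice binary masks.
import numpy as np

def masks_to_point_cloud(masks: list, z_scale: float = 4.0) -> np.ndarray:
    """Stack per-slice binary masks into one (N, 3) layered point cloud."""
    layers = []
    for z, mask in enumerate(masks):
        ys, xs = np.nonzero(mask)            # positive (tumor) pixels only
        zs = np.full(xs.shape, z * z_scale)  # slice index mapped to xy units
        layers.append(np.column_stack([xs, ys, zs]).astype(np.float64))
    return np.vstack(layers)
```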

Fig. 2. An example of rendering a point cloud into a model, where darker colors indicate higher point density, which corresponds to potential tumor tissue in the WSI for this task. Panels (a) and (b) show models generated from two serial WSI sequences, respectively.

Axial Registration. Current registration methods employ the Radon transform and cross-correlation, where WSIs are cropped and subjected to rigid and elastic registration [1]. This computational workload is often massive and is redundant for unimportant regions. Moreover, elastic registration may cause image distortion and segmentation inaccuracy. By contrast, we optimize the overall framework by moving the segmentation and point cloud generation before the registration.

Specifically, we incorporate the ICP (Iterative Closest Point) strategy to register the layered point clouds generated from the segmentation output. As each point cloud for registration uses exclusively one layer, we apply the point-to-point strategy [18] without employing normal vectors. A brief review of the point-to-point strategy is as follows:

$$\begin{aligned} P_{fix} = RP_{mov} + T \end{aligned}$$
(1)

\(P_{fix}\) and \(P_{mov}\) are the fixed and moving point clouds; \(R\) and \(T\) are the rotation matrix and translation vector.

$$\begin{aligned} V_{i,fix} = P_{i,fix} - C_{fix} \end{aligned}$$
(2)
$$\begin{aligned} V_{i,mov} = P_{i,mov} - C_{mov} \end{aligned}$$
(3)

\(P_{i,fix}\) and \(P_{i,mov}\) \((1 \le i \le N)\) are the paired points in the point clouds; \(C_{fix}\) and \(C_{mov}\) are the centers of the two point clouds; and \(V_{i,fix}\) and \(V_{i,mov}\) are the vectors from each point to its center.

$$\begin{aligned} \mathscr {L}(R,T) = \frac{1}{N} \sum _{i = 1}^{N} || P_{i,fix} - R P_{i,mov} - T || ^ 2 \end{aligned}$$
(4)

\(N\) is the number of points in \(P_{mov}\), and \(\mathscr {L}\) is the registration loss. Expanding the loss in terms of the centered vectors and eliminating the cross terms, which vanish because \(V_{i,fix}\) and \(V_{i,mov}\) have zero mean, we obtain the following formulas for the values of \(R\) and \(T\) that minimize the loss.

$$\begin{aligned} R^* = \arg \min _{R} \left( \frac{1}{N-1} \sum _{i = 1}^{N-1}|| V_{i,fix} - R V_{i,mov}|| ^ 2 \right) \end{aligned}$$
(5)
$$\begin{aligned} T^* = C_{fix} - R^* C_{mov} \end{aligned}$$
(6)

where \(R^*\) and \(T^*\) are the rotation matrix and translation vector that minimize the loss. The minimum is obtained via SVD or nonlinear optimization.
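For reference, below is a minimal sketch of this closed-form step via SVD, assuming the correspondences are already paired row by row; the sign correction that excludes reflections is the standard Kabsch detail and is an addition not spelled out in the text.

```python
# One point-to-point step (Eqs. 1-6) solved in closed form via SVD.
import numpy as np

def best_rigid_transform(p_mov: np.ndarray, p_fix: np.ndarray):
    """Return R, T minimizing the mean squared residual of Eq. (4)."""
    c_mov, c_fix = p_mov.mean(axis=0), p_fix.mean(axis=0)  # cloud centers
    v_mov, v_fix = p_mov - c_mov, p_fix - c_fix            # Eqs. (2)-(3)
    u, _, vt = np.linalg.svd(v_mov.T @ v_fix)              # cross-covariance SVD
    d = np.ones(p_mov.shape[1])
    d[-1] = np.sign(np.linalg.det(vt.T @ u.T))             # exclude reflections
    r = vt.T @ np.diag(d) @ u.T                            # optimal rotation (Eq. 5)
    t = c_fix - r @ c_mov                                  # optimal translation (Eq. 6)
    return r, t
```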

Fig. 3. An example of the registration process. The fixed WSI (pink point cloud) and the corresponding moving WSI (green) are computed in the current iteration, and the registration is performed iteratively from bottom (grey) to top (silver). Our selective algorithm pinpoints the essential points (blue) for the matrix computation, speeding up the ICP translation. (Color figure online)

To speed up the processing, we select a representative subset of each layered point cloud, determined by spatial density and 2D coordinates, and apply the resulting transformation to the entire layer. In each iteration from bottom to top, we select horizontal and vertical band-shaped areas in the moving point cloud, as shown in Fig. 3. For a consistent spatial presentation of the tumor tissue, interpolation is required to reconcile the different resolutions in x, y, and z. In this case study, the z values of the points are multiplied by a factor of 4 to match the xy resolution. The point cloud is interpolated based on the nearest layered point cloud. The layered point clouds are then re-registered iteratively in the same manner.

Algorithm 1. The axial registration
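Since the algorithm figure is not reproduced here, the following sketch illustrates the per-layer loop it describes, under simplifying assumptions of ours: registration is restricted to the xy plane (each layer lies at a single z), correspondences come from nearest-neighbour queries, and the band bounds default to the example values reported in Sect. 5. It reuses `best_rigid_transform` from the previous snippet.

```python
# Illustrative bottom-to-top axial registration over layered point clouds.
import numpy as np
from scipy.spatial import cKDTree

def band_mask(pts, x_band, y_band):
    """Keep points inside the horizontal or vertical band (blue in Fig. 3)."""
    in_x = (pts[:, 0] >= x_band[0]) & (pts[:, 0] <= x_band[1])
    in_y = (pts[:, 1] >= y_band[0]) & (pts[:, 1] <= y_band[1])
    return in_x | in_y

def register_stack(layers, x_band=(2250, 2750), y_band=(6750, 7250), iters=20):
    """Register layered point clouds bottom-to-top, moving whole layers."""
    registered = [layers[0]]                       # the bottom layer stays fixed
    for mov in layers[1:]:
        fix_xy = registered[-1][:, :2]
        sel_xy = mov[band_mask(mov, x_band, y_band)][:, :2]  # representative points
        tree, cur = cKDTree(fix_xy), sel_xy.copy()
        for _ in range(iters):                     # simplified ICP loop
            _, idx = tree.query(cur)               # nearest-neighbour pairing
            r, t = best_rigid_transform(cur, fix_xy[idx])
            cur = cur @ r.T + t
        # Recover the net in-plane transform and apply it to the entire layer.
        r, t = best_rigid_transform(sel_xy, cur)
        moved = mov.copy()
        moved[:, :2] = moved[:, :2] @ r.T + t
        registered.append(moved)
    return registered
```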

4 Implementation

We employ the Open3D library [19] to generate point clouds that visualize the spatial tissue distribution. The model stores points as arrays of xyz coordinates, and the library's functions produce colored point clouds and 3D meshes. The 3D visualization demonstrates the comprehensive information interpreted by the deep learning structures, including the spatial distribution of tumors and tissues.
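A minimal sketch of this Open3D usage, with placeholder coordinates and colors standing in for the arrays produced by the earlier steps:

```python
# Render an (N, 3) point array with per-point colors using Open3D.
import numpy as np
import open3d as o3d

points = np.random.rand(1000, 3)                      # placeholder coordinates
colors = np.tile([1.0, 0.0, 0.0], (len(points), 1))   # e.g., red for tumor

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)       # xyz coordinates
pcd.colors = o3d.utility.Vector3dVector(colors)       # per-point RGB in [0, 1]
o3d.visualization.draw_geometries([pcd])              # interactive 3D viewer
```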

5 Quantitative Results

Segmentation. The loss and training time of the segmentation network are reported in Fig. 4. WSIs are cropped into \(128 \times 128\) image patches to feed the network, then rejoined to generate the layered point clouds, as shown in the segmentation image in Fig. 3.

Fig. 4. The loss and training time, reported every fifty epochs while training the segmentation network. Our model converges steadily in the loss function, and only a few seconds are required to obtain a reliable segmentation network.

Registration Speedup. Two metrics, speedup and accuracy, evaluate the registration performance; the latter is measured by the Root Mean Square Error (RMSE) of the point pairs. For the axial registration example demonstrated in Fig. 5, the representative points are sampled with x values from 2,250 to 2,750 or y values from 6,750 to 7,250 at the bottom layer, about one third of the total points employed for registration. Overall, the axial registration achieves a smaller RMSE on average, as shown in Fig. 5.
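For clarity, the RMSE over matched point pairs can be written as follows (an illustrative definition over paired arrays, not the authors' exact evaluation code):

```python
# RMSE between corresponding rows of the fixed and registered moving clouds.
import numpy as np

def registration_rmse(p_fix: np.ndarray, p_mov: np.ndarray) -> float:
    """Root mean square error over paired points after registration."""
    return float(np.sqrt(np.mean(np.sum((p_fix - p_mov) ** 2, axis=1))))
```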

This pipeline achieves a significant decrease in registration computation, requiring 1.54 s per layer, about 10.94% of the time required for regular ICP registration [18], a tremendous advantage compared with WSI-level registration [1], which takes about 40 min per image. Overall, the WSI stack registration workflow takes only several minutes on average, whereas the state-of-the-art approach requires a couple of hours [1], as shown in Fig. 5. Consequently, registration will no longer be the bottleneck of 3D tissue reconstruction.

Fig. 5. The experimental results on the two datasets. The left column shows the RMSE-frequency histograms and the right column the time-frequency histograms. Our method outperforms in both speed and accuracy, with a lower average value for each metric.

6 Conclusion and Future Work

In this work, we have optimized and integrated existing 3D reconstruction pipelines for WSI (Whole Slide Imaging) and CT (Computed Tomography) into a more efficient pipeline for 3D reconstruction of high-resolution images. By merging point clouds and using them to assist registration, this pipeline significantly reduces redundant computation, decreases data volume compared with voxel methods, and minimizes the time consumed by registration. While the pipeline is designed for the unique requirements of WSI, it can potentially be adapted to CT and MRI images through semantic segmentation, point cloud sampling, and 3D rendering while omitting registration. The 3D reconstruction in [11, 20, 21] also acquires layered images and stacks and aligns them to generate a 3D model. Although the specific implementations differ, this suggests that our method could in principle be applied to the 3D reconstruction of other medical images, such as immunohistochemistry images. Therefore, as long as appropriate training models and data are available, this pipeline is adaptable to 3D reconstruction tasks for different types of images and tissues.