
1 Introduction

Automatic lung cancer cell detection is the basis of many computer-assisted methods for cell-based experiments and diagnosis. However, little work has so far focused on lung cancer cell detection. The difficulty of the problem is three-fold. First, the density of lung tumor cells in histopathological images is generally very high. Second, cell size varies and cells frequently clump together. Third, the time cost of cell detection, especially on high-resolution histopathological images, is very high in cell-based diagnosis. Given these challenges, efficient and robust lung cancer cell detection methods are still in great demand. To address these problems, we propose an efficient and robust lung cancer cell detection method based on the Deep Convolution Neural Network (DCNN) [1]. Unlike computationally intensive frameworks [2, 3] or region of interest (ROI)-based detection methods [4, 5], our method exploits a deep architecture to learn hierarchical discriminative features, an approach that has recently achieved significant success in biomedical image analysis [6, 7].

In the proposed method, training is performed only on local patches centered at the weakly annotated dot in each cell area, together with an equal number of non-cell patches. Consequently, only a weak annotation of each cell area (a single dot near its center) is required during labeling, which significantly relieves the manual annotation burden. This technique also reduces over-fitting and keeps the proposed method general enough to capture rough cell shape information in the training images, which benefits further applications such as cell counting, segmentation and tracking.

During the testing stage, applying the conventional sliding window to every local pixel patch is inefficient because of the considerable redundant convolution computation. To accelerate testing, we present a fast forwarding technique within the DCNN framework. Instead of performing DCNN forwarding on each pixel patch separately, the proposed method performs the convolution computation on the entire testing image with modified sparse convolution kernels. This technique eliminates almost all of the redundant convolution computation of conventional pixel-wise classification, which significantly accelerates DCNN forwarding. Experimental results show that the proposed method requires only around 0.1 s to detect lung cancer cells in a \(512\times 512\) image, whereas the traditional DCNN-based sliding window method requires around 40 s.

To sum up, we propose a novel DCNN-based model for lung cancer cell detection. Our contributions are three-fold: (1) We build a deep learning-based framework for lung cancer cell detection with a modified sliding window scheme in both the training and testing stages. (2) We modify the training strategy to require only weak annotations of the samples, which decreases both the labeling and the training cost. (3) We present a novel accelerated DCNN forwarding technique that removes redundant convolution computation, making the testing process several hundred times faster than the traditional DCNN-based sliding window method. To the best of our knowledge, this is the first study to report the application of an accelerated DCNN framework to lung cancer cell detection.

2 Methodology

Given an input lung cancer histopathological image I, the problem is to find a set \(D=\{d_1, d_2, \ldots , d_N\}\) of detections, each reporting the centroid coordinates of a single cell area. The problem is solved by training a detector on training images with weakly annotated ground truth \(G=\{g_1, g_2, \ldots , g_M\}\), each element being a manually annotated coordinate near the center of a cell area. In the testing stage, each pixel is assigned one of two classes: cell, for pixels in cell areas, or non-cell, for all other pixels. Our detector is a DCNN-based pixel-wise classifier: for each pixel p, the DCNN predicts its class from the raw RGB values of the local square image patch centered on p.
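The text states that each detection reports the centroid of a cell area, but it does not say how centroids are extracted from the pixel-wise cell/non-cell label map. A minimal sketch, assuming (hypothetically) that every 4-connected region of cell pixels is one detection and that its centroid is the mean pixel coordinate; the function name `label_map_to_centroids` is illustrative only:

```python
from collections import deque

import numpy as np


def label_map_to_centroids(label_map):
    """Return (row, col) centroids of the 4-connected regions of cell pixels (value 1).

    Hypothetical post-processing step; not described in the paper.
    """
    label_map = np.asarray(label_map, dtype=bool)
    h, w = label_map.shape
    seen = np.zeros_like(label_map)
    centroids = []
    for r in range(h):
        for c in range(w):
            if not label_map[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill over one connected cell region.
            queue = deque([(r, c)])
            seen[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and label_map[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            ys, xs = zip(*pixels)
            centroids.append((sum(ys) / len(ys), sum(xs) / len(xs)))
    return centroids


# Toy label map with two cell regions.
m = np.zeros((6, 6), dtype=int)
m[0:2, 0:2] = 1
m[3:6, 4] = 1
print(label_map_to_centroids(m))  # [(0.5, 0.5), (4.0, 4.0)]
```

Under this assumption, touching cells that merge into one region yield a single detection; separating such clumps would need an additional step that the paper does not describe.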

2.1 Training the Detector

Using the weakly annotated ground truth G, we label each patch centered on a ground-truth point \(g_m\) as a positive (cell) sample. We then randomly sample negative (non-cell) patches whose centers lie outside all positive patches; the number of negative patches equals the number of positive ones. If a patch window lies partly outside the image boundary, the missing pixels are taken from the mirror-padded image.

Since only very few patches per image are fed into the proposed model, training is greatly accelerated. Moreover, because each image is heavily under-sampled, this technique also partly mitigates over-fitting (Fig. 1).
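The sampling rule described above can be sketched as follows. This is a hypothetical illustration rather than the authors' code: `sample_training_patches` is an illustrative name, the patch size `n` is assumed odd, and NumPy's `reflect` padding mode is assumed to be what "mirror padding" means (`symmetric` would be an equally plausible reading).

```python
import numpy as np


def sample_training_patches(image, dots, n, rng=None):
    """Hypothetical sampler: one positive patch per annotated dot, equally many negatives.

    image: (H, W, 3) array; dots: list of (row, col) weak annotations; n: odd patch size.
    """
    rng = np.random.default_rng(rng)
    half = n // 2
    h, w = image.shape[:2]
    # Mirror padding (assumed 'reflect' mode) so that border patches are complete.
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")

    def patch(r, c):
        # Original pixel (r, c) sits at (r + half, c + half) in `padded`.
        return padded[r:r + n, c:c + n]

    # Mark every pixel covered by a positive patch; negative centers must lie outside.
    covered = np.zeros((h, w), dtype=bool)
    for r, c in dots:
        covered[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1] = True
    candidates = np.argwhere(~covered)
    if len(candidates) < len(dots):
        raise ValueError("not enough non-cell pixels to balance the classes")
    chosen = candidates[rng.choice(len(candidates), size=len(dots), replace=False)]

    patches = [patch(r, c) for r, c in dots] + [patch(r, c) for r, c in chosen]
    labels = [1] * len(dots) + [0] * len(chosen)
    return np.stack(patches), np.array(labels)


img = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
patches, labels = sample_training_patches(img, [(5, 5), (20, 20), (0, 0)], n=7, rng=0)
print(patches.shape, labels.tolist())  # (6, 7, 7, 3) [1, 1, 1, 0, 0, 0]
```

Balancing the two classes this way means the number of training patches per tile grows with the number of annotated cells, not with the number of pixels.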

Fig. 1.

Generation of training samples: (1) Tiles are randomly sampled from the whole slide images. (2) The sampled tiles are manually annotated by well-trained pathologists, which provides the weakly annotated information. (3) Only the local pixel patches centered on the annotated pixels, together with an equal number of randomly sampled non-cell patches, are fed into the model.

2.2 Deep Convolution Neural Network Architecture

Our DCNN model contains two pairs of convolution and max-pooling layers, followed by a fully connected layer, a rectified linear unit layer and another fully connected layer as output. Figure 2 illustrates the network architecture used in the training stage. Each convolution layer performs a 2D convolution with square filters. If the previous layer outputs more than one activation map, each map is convolved with its own filter slice and the responses are summed. In the training process, the stride of each max-pooling layer is set equal to its kernel size to avoid overlap, provide more non-linearity and reduce the dimensionality of the previous activation map. The fully connected layer combines the outputs of the previous layer into a feature vector. It is followed by a rectified linear unit layer because of its superior non-linearity. The output layer is simply another fully connected layer with just two neurons (one for the cell class, the other for the non-cell class), activated by a softmax function to provide the final probability map for the two classes. The layer type, neuron size, filter size and filter number of the proposed DCNN are detailed in the left part of Table 1.
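The actual layer sizes are listed in Table 1 and are not reproduced here. To show how the spatial size shrinks through a training-stage network of this shape, the sketch below traces 'valid' stride-1 convolutions and non-overlapping max pooling. The 28×28 patch and every kernel size and channel count are hypothetical placeholders, not the values of Table 1.

```python
def trace_shapes(patch_size, layers, in_channels=3):
    """Return (layer, spatial size, channels) after each layer of the stack."""
    size, channels = patch_size, in_channels  # RGB input by default
    trace = [("input", size, channels)]
    for kind, k, out_ch in layers:
        if kind == "conv":
            size, channels = size - k + 1, out_ch  # 'valid' convolution, stride 1
        elif kind == "pool":
            if size % k:
                raise ValueError("pool kernel must divide the map size")
            size //= k  # stride == kernel size: no overlap
        elif kind == "fc":
            # Sees the whole size x size map, i.e. a convolution with a size x size kernel.
            size, channels = 1, out_ch
        elif kind != "relu":  # ReLU keeps the shape
            raise ValueError(f"unknown layer type {kind!r}")
        trace.append((kind, size, channels))
    return trace


# Hypothetical sizes, NOT the values of Table 1.
HYPOTHETICAL_LAYERS = [
    ("conv", 5, 20), ("pool", 2, None),
    ("conv", 5, 50), ("pool", 2, None),
    ("fc", None, 500), ("relu", None, None), ("fc", None, 2),
]

for row in trace_shapes(28, HYPOTHETICAL_LAYERS):
    print(row)
```

With these placeholder sizes, the first fully connected layer receives a 4×4×50 map, so it is equivalent to a convolution with 4×4 kernels; this is the view that allows it to be accelerated at test time.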

Fig. 2.

The DCNN architecture used in the training process of the proposed framework. C, MP, FC and ReLU denote the convolution layer, max-pooling layer, fully connected layer and rectified linear unit layer, respectively.

Table 1. Backward (left) and accelerated forward (right) network architecture. M: the number of patch samples, N: the number of testing images. Layer type: I - Input, C - Convolution, MP - Max Pooling, ReLU - Rectified Linear Unit, FC - Fully Connected

2.3 Acceleration of Forward Detection

The traditional sliding window approach scans all pixels of an image patch by patch. It feeds the patches to the DCNN sequentially and independently, repeating the forward propagation for every local pixel patch. This strategy is time-consuming because adjacent patches overlap heavily, so many convolution operations are computed redundantly.
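To make the redundancy concrete, the sketch below counts the first-layer convolution outputs (per filter) computed by the two strategies. The sizes are hypothetical (a 28×28 patch and a 5×5 first-layer kernel), and only the first layer is counted, so the ratio is not the end-to-end speedup reported in the paper.

```python
def first_layer_outputs(h, w, n, k):
    """First-layer conv outputs (per filter): h x w image, n x n patches, k x k kernel."""
    per_patch = (n - k + 1) ** 2  # 'valid' conv output of one patch
    sliding_window = h * w * per_patch  # one patch per pixel, each processed independently
    whole_image = (h + n - k) * (w + n - k)  # one 'valid' conv over the (h+n-1) x (w+n-1) padded image
    return sliding_window, whole_image


# Hypothetical sizes: 512 x 512 tile, 28 x 28 patch, 5 x 5 first-layer kernel.
sw, wi = first_layer_outputs(512, 512, 28, 5)
print(sw, wi, round(sw / wi))  # 150994944 286225 528
```

With these assumed sizes, patch-by-patch scanning computes more than 500 times as many first-layer outputs as a single pass over the padded image, because every output of the whole-image pass is recomputed once for each patch that contains it.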

To remove the redundant convolution operations, we exploit the relations between adjacent local image patches. At the testing stage, the proposed acceleration model takes the whole image as input and predicts the whole label map with a single pass of accelerated forward propagation. If the DCNN takes \(n\times n\) image patches as input, a testing image of size \(h\times w\) is padded to size \((h+n-1)\times (w+n-1)\) so that every pixel, including those at the image boundary, has a complete \(n\times n\) patch. At the testing stage, the proposed method uses exactly the weights learned in the training stage and generates exactly the same result as the traditional sliding window method. To achieve this, we incorporate the k-sparse kernel technique [8] into the convolution and max-pooling layers. A k-sparse kernel is created by inserting all-zero rows and columns into the original kernel so that every two originally neighboring entries are k pixels apart. To accelerate the forward pass of the fully connected layer, we treat it as a special convolution layer, which can then be accelerated in the same way as the modified convolution layers. The proposed fast forwarding network is detailed in the right part of Table 1. Experimental results show that a speedup of around 400 times is achieved for forward propagation on \(512\times 512\) testing images (Fig. 3).
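The equivalence between patch-wise and whole-image forwarding can be checked numerically on a toy network. The sketch below is a single-channel illustration under assumed sizes (11×11 patches, 2×2 convolution kernels, 2×2 max pooling and a 2×2 "fully connected" kernel); it is not the network of Table 1. The k-sparse kernels are implemented with a dilation factor instead of literally inserting zeros: a dilation of k places the original kernel entries k pixels apart, which is the same operation. Each stride-2 pooling layer of the patch network doubles the dilation of all later layers, while every layer of the whole-image network uses stride 1.

```python
import numpy as np


def conv2d(x, kernel, dilation=1):
    """'Valid' 2-D cross-correlation with stride 1; dilation > 1 gives a sparse kernel."""
    kh, kw = kernel.shape
    oh = x.shape[0] - (kh - 1) * dilation
    ow = x.shape[1] - (kw - 1) * dilation
    out = np.zeros((oh, ow))
    for u in range(kh):
        for v in range(kw):
            # Each kernel tap multiplies a shifted view of the input.
            out += kernel[u, v] * x[u * dilation:u * dilation + oh, v * dilation:v * dilation + ow]
    return out


def maxpool2d(x, size, stride=None, dilation=1):
    """Max pooling; stride=None means stride == size (non-overlapping)."""
    stride = size if stride is None else stride
    span = (size - 1) * dilation + 1
    oh = (x.shape[0] - span) // stride + 1
    ow = (x.shape[1] - span) // stride + 1
    out = np.full((oh, ow), -np.inf)
    for u in range(size):
        for v in range(size):
            tap = x[u * dilation:, v * dilation:][::stride, ::stride][:oh, :ow]
            out = np.maximum(out, tap)
    return out


def patch_net(patch, k):
    """Training-stage network on one n x n patch; returns a single score."""
    x = maxpool2d(conv2d(patch, k["c1"]), 2)  # 11 -> 10 -> 5
    x = maxpool2d(conv2d(x, k["c2"]), 2)  # 5 -> 4 -> 2
    return conv2d(x, k["fc"])[0, 0]  # FC over the whole 2 x 2 map == 2 x 2 conv


def dense_net(padded, k):
    """Accelerated pass: whole padded image, stride-1 layers, sparse (dilated) kernels."""
    x = maxpool2d(conv2d(padded, k["c1"]), 2, stride=1)
    x = maxpool2d(conv2d(x, k["c2"], dilation=2), 2, stride=1, dilation=2)
    return conv2d(x, k["fc"], dilation=4)


rng = np.random.default_rng(0)
n = 11  # hypothetical patch size
kernels = {name: rng.standard_normal((2, 2)) for name in ("c1", "c2", "fc")}
image = rng.standard_normal((16, 20))
half = n // 2
# Pad to (h + n - 1) x (w + n - 1) so that every pixel has a full n x n patch.
padded = np.pad(image, ((half, n - 1 - half), (half, n - 1 - half)), mode="reflect")

fast = dense_net(padded, kernels)
slow = np.array([[patch_net(padded[i:i + n, j:j + n], kernels)
                  for j in range(image.shape[1])] for i in range(image.shape[0])])
print(fast.shape, np.allclose(fast, slow))  # (16, 20) True
```

ReLU and the final two-neuron fully connected layer act independently at each position (the latter as a 1×1 convolution), so appending them to both paths would preserve the equivalence.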

Fig. 3.

Illustration of the accelerated forward network: (1) In the testing stage, the proposed method takes the whole image as input. (2) The input image is mirror-padded in the same way as in the sampling process of the training stage. (3) The padded image is then fed into the accelerated forward network, which generates the whole label map (rightmost). Note that the fully connected layer is implemented as a modified convolution layer to achieve acceleration.

3 Materials, Experiments and Results

3.1 Materials and Experiment Setup

Data Set. The proposed method is evaluated on part of the National Lung Screening Trial (NLST) data set [9]. In total, 215 tile images of size \(512\times 512\) are selected from the original high-resolution histopathological images. The nuclei in these tiles are manually annotated by well-trained pathologists. The selected data set contains 83,245 nuclei in total.

Experimental Setup. We partition the 215 images into three subsets: a training set (143 images), a validation set (62 images) and an evaluation set (10 images). Results are reported on the evaluation set. We compare the proposed method with the state-of-the-art cell detection method [4] and the traditional DCNN-based sliding window method [1].

3.2 Results

Training Time Cost. The mean training time of the proposed method on the training set described above is 229 s. The unaccelerated version with the same training strategy takes the same time, since the acceleration only affects forward propagation at test time. In contrast, the state-of-the-art MSER-based method [4] takes more than 400000 s, roughly 5 days, to train on the 143 images of size \(512\times 512\). Thanks to the proposed training strategy, the proposed method reduces the training time by a factor of more than 1,700 compared with the MSER-based method.
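The training-time comparison can be checked with simple arithmetic. The values below are taken from the text; since the MSER-based training time is reported as a lower bound, the resulting ratio is a lower bound as well.

```python
mser_seconds = 400_000  # "more than 400000 s" for the MSER-based method [4] (a lower bound)
ours_seconds = 229  # mean training time of the proposed method

print(f"{mser_seconds / 86_400:.1f} days")  # 4.6 days
print(f"ratio >= {mser_seconds / ours_seconds:.0f}")  # ratio >= 1747
```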

Table 2. \(F_1\) scores on the evaluation set
Table 3. Mean time cost comparison on the evaluation set
Fig. 4.

Visual comparison between the proposed method and the MSER-based method [4]. Green areas denote the cell areas detected by each method, and blue dots denote the ground-truth annotations. Red circles mark cell areas that the proposed method detects but the MSER-based method misses. Best viewed in the PDF at \(\times 4\) zoom (color figure online).

Accuracy of Testing. Table 2 compares the \(F_1\) scores of the proposed method and the MSER-based method. The proposed method outperforms the state-of-the-art method on almost all evaluation images in terms of \(F_1\) score. We also visually compare our results with those of the MSER-based method in Fig. 4. The proposed method detects almost all cell regions, even in images with densely packed cells.
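The text does not specify when a detection counts as correct when computing \(F_1\). A common choice, assumed here purely for illustration, is to match each detected centroid one-to-one to the nearest unmatched ground-truth dot within a hypothetical radius `radius` (in pixels); \(F_1\) is then the harmonic mean of precision and recall. The function name `detection_f1` is illustrative.

```python
import math


def detection_f1(detections, ground_truth, radius):
    """F1 score under a hypothetical greedy one-to-one matching within `radius` pixels."""
    unmatched = list(ground_truth)
    tp = 0
    for dy, dx in detections:
        # Closest still-unmatched annotation, if any lies within `radius`.
        best, best_dist = None, radius
        for idx, (gy, gx) in enumerate(unmatched):
            dist = math.hypot(dy - gy, dx - gx)
            if dist <= best_dist:
                best, best_dist = idx, dist
        if best is not None:
            unmatched.pop(best)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = [(10, 10), (30, 30), (50, 50)]
det = [(11, 10), (29, 31), (80, 80)]  # two hits, one false positive, one miss
print(round(detection_f1(det, gt, radius=5), 3))  # 0.667
```

Greedy matching is simple but order-dependent; an optimal assignment (e.g. the Hungarian algorithm) would be a stricter alternative if detections are very close together.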

Testing Time Cost. As shown in Table 3, the proposed method takes only around 0.1 s per \(512\times 512\) tile image, the fastest of the three methods. Owing to the accelerated forwarding technique, it speeds up forward propagation by around 400 times compared with the traditional pixel-wise sliding window method.

4 Conclusion

In this paper, we propose an efficient and robust lung cancer cell detection method. The proposed method is based on the Deep Convolution Neural Network framework [10] and provides state-of-the-art accuracy with only weakly annotated ground truth. For each cell area, only one local patch containing the cell area is fed into the detector for training. This training strategy significantly reduces the training time, because only around one percent of all pixel labels are used. In the testing stage, by exploiting the relation between adjacent patches, the proposed method produces exactly the same results as the sliding window method around 400 times faster. Experimental results demonstrate the efficiency and effectiveness of the proposed method for large-scale lung cancer cell detection. In the future, we will attempt to combine structured techniques [11–13] to further improve the accuracy.