Introduction

Diagnostic imaging of primary liver cancers is important, because these cancers are often treated on the basis of imaging diagnosis alone, without pathological diagnosis [1]. Furthermore, the therapeutic strategy can differ significantly depending on the pathological subtype. For example, it has been reported that macro-trabecular and compact types, which are common in poorly differentiated hepatocellular carcinoma (HCC), exhibit higher rates of recurrence after transarterial catheter embolization than after hepatectomy or radiofrequency ablation [2].

A convolutional neural network (CNN) is a machine learning algorithm that has attracted considerable attention in diagnostic imaging, because it can perform as well as or better than humans in image classification tasks [3]. Compared with conventional deep learning using untrained CNNs, transfer learning (TL) using pretrained CNNs has the advantage that it can achieve a high classification performance with a relatively small dataset [3]. The usefulness of TL with pretrained CNNs for liver disease has been demonstrated by several studies [4, 5]. However, the effect of misregistration of input images on the diagnostic performance of CNNs has not been fully investigated. This is an important issue for radiologists when multiple images are employed as input images for a CNN, as in dynamic contrast-enhanced computed tomography (DCE-CT), because respiratory misregistration frequently occurs in DCE-CT imaging of the liver. Furthermore, manual registration is laborious and time-consuming for radiologists.

The purpose of this study was to evaluate the effects of image registration on the diagnostic performance of TL using a pretrained CNN and three-phasic DCE-CT for primary liver cancers.

Materials and methods

Subjects

We retrospectively evaluated 215 consecutive patients (median age = 70 years; age range = 34–85 years; male/female = 165:52) with histologically proven primary liver cancers at a single institution (Shinshu University Hospital, Matsumoto, Japan) from 2005 to 2010. The lesions comprised six early (eHCC), 58 well-differentiated (wHCC), 109 moderately differentiated (mHCC), and 29 poorly differentiated (pHCC) HCCs, and 13 non-HCC malignant lesions containing cholangiocellular components (CCC). Written informed consent was obtained from all patients when preoperative DCE-CT was performed. Patients who did not undergo preoperative DCE-CT within 1 month before hepatectomy were excluded from the study.

DCE-CT protocol

Three-phasic DCE-CT was performed using a 64-row CT scanner and comprised a pre-contrast phase and two phases acquired at 40 s (early phase) and 130 s (delayed phase) after intravenous contrast agent injection. The scan parameters were as follows: the range was the whole abdomen from the upper level of the diaphragm; the tube voltage was 120 kVp; the tube current was 500 mA; the matrix was 512 × 512 pixels; the field of view was 320 × 320 mm; the collimation was 0.625 mm; and the reconstruction thickness was 2.5 mm. A nonionic iodinated contrast agent (Iopamiron 370 mg/mL; Bayer Healthcare, Berlin, Germany) was administered intravenously through a 22-gauge catheter in the median cubital vein. The total dose was 100 mL, and the rate of injection was 3 mL/s.

TL using pretrained CNNs

TL was performed using various pretrained CNNs (Alexnet, VGG-16, VGG-19, GoogLeNet, Inception-v3, ResNet-50, and ResNet-101) and preoperative three-phasic DCE-CT images at the maximal cross-sectional lesion area. To prepare the input images for TL, the three-phasic DCE-CT DICOM images were manually registered to correct respiratory motion by an abdominal radiologist (A.Y.) with 18 years of diagnostic experience. The registered images were then cropped at the hepatic lesion and assigned to the three color channels of an input JPEG image as follows: the pre-contrast, early phase, and delayed phase images were assigned to the blue, red, and green channels, respectively. The window level and width for the DICOM images were fixed at 80 and 350 Hounsfield units, respectively. The images were resized to the input size of each pretrained CNN (227 × 227 pixels for Alexnet, 299 × 299 pixels for Inception-v3, and 224 × 224 pixels for the other CNNs). To evaluate the effect of registration, the manually registered input images were intentionally misaligned to various degrees in the three color channels by pixel shifts (0, 1, 2, 4, 8, 16, and 32 pixels), rotations (0, 1, 2, 4, 8, 16, and 32 degrees), and skews (0%, 1%, 2%, 4%, 8%, 16%, and 32%) (Fig. 1). The image with a 0-pixel shift, 0-degree rotation, and 0% skew represents the original registered image. The input images at each degree of misalignment were divided into training (70%) and test (30%) sets such that the proportion of histological subtypes was the same in both sets.

In the TL procedure, the final three layers of each pretrained CNN, originally developed for the ImageNet dataset (1000 classes), were replaced by a fully connected layer, a softmax layer, and a classification output layer with five classes (eHCC, wHCC, mHCC, pHCC, and CCC). So that the new layers learned faster than the transferred layers, the initial learning rate was set to a small value (0.0001), whereas the learning rate factor for the new fully connected layer was set to a large value (20). The mini-batch size was set to 10. After 500 iterations of TL, a classification test was performed on each CNN using the fivefold cross-validation method, and the mean value of the obtained results was used for the statistical analysis. All procedures were carried out using MATLAB software (2018a, MathWorks, Natick, MA, USA).
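The division into training (70%) and test (30%) sets with matched subtype proportions can be illustrated with a short sketch. The study itself was carried out in MATLAB; the Python code below is only an illustration that uses the subtype counts from the Subjects section. The function name `stratified_split`, the random seed, and the rounding of each class to the nearest integer are assumptions, as the rounding rule is not stated in the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split case indices so that each class contributes ~train_fraction to training."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption: round to the nearest integer within each class.
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Subtype counts reported in the Subjects section (215 lesions in total).
counts = {"eHCC": 6, "wHCC": 58, "mHCC": 109, "pHCC": 29, "CCC": 13}
labels = [name for name, n in counts.items() for _ in range(n)]

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Under these assumptions, every subtype appears in both sets, which matters for the small eHCC class (six lesions).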

Fig. 1

Illustration of input image preparation from three-phasic DCE-CT for transfer learning using a pretrained CNN. Three-phasic DCE-CT images were manually registered to correct respiratory motion (misaligned value = 0). The registered three-phasic DCE-CT images were then assigned into the three color channels of an input image for transfer learning as follows: pre-contrast, early phase, and delayed phase images for the blue, red, and green channels, respectively. The manually registered input images were intentionally misaligned in the three color channels by pixel shifts, rotations, and skews with various misaligned values, to generate misaligned input images for transfer learning. DCE-CT dynamic contrast-enhanced computed tomography, CNN convolutional neural network
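The input image preparation shown in Fig. 1 can be sketched in a few lines. The following Python code is an illustrative sketch rather than the MATLAB procedure used in the study: it applies the fixed window (level 80 HU, width 350 HU), assigns the early, delayed, and pre-contrast phases to the red, green, and blue channels, and optionally shifts one channel. The function names, the choice to shift only the early-phase channel to the right, and the zero padding of vacated pixels are assumptions; the text does not specify which channels were displaced or in which direction.

```python
def apply_window(hu, level=80.0, width=350.0):
    """Map a Hounsfield value to an 8-bit intensity using a fixed display window."""
    low = level - width / 2.0  # lower window bound (-95 HU for the default window)
    scaled = (hu - low) / width * 255.0
    return int(round(min(255.0, max(0.0, scaled))))


def shift_right(plane, pixels, fill=0):
    """Shift a 2D plane to the right by a non-negative number of pixels."""
    shifted = []
    for row in plane:
        keep = max(0, len(row) - pixels)
        shifted.append([fill] * (len(row) - keep) + list(row[:keep]))
    return shifted


def compose_rgb(pre, early, delayed, early_shift=0):
    """Build rows of (R, G, B) pixels: R = early, G = delayed, B = pre-contrast."""
    red = [[apply_window(v) for v in row] for row in early]
    green = [[apply_window(v) for v in row] for row in delayed]
    blue = [[apply_window(v) for v in row] for row in pre]
    # Assumption: only the early-phase (red) channel is displaced.
    red = shift_right(red, early_shift)
    return [list(zip(r, g, b)) for r, g, b in zip(red, green, blue)]


# Hypothetical 1 x 3 pixel planes in Hounsfield units, one per phase.
pre = [[-95, 255, -95]]
early = [[255, -95, -95]]
delayed = [[-95, -95, 255]]

registered = compose_rgb(pre, early, delayed)
misaligned = compose_rgb(pre, early, delayed, early_shift=1)
print(registered[0])
print(misaligned[0])
```

In the misaligned image, the early-phase signal lands on a different pixel from the one it occupies in the registered image, which is the kind of color-channel mismatch the CNNs were tested against.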

Statistical analysis

The diagnostic performances (DP = [number of correctly classified cases]/[total number of cases] × 100) of the pretrained CNNs after TL in the test set were compared with those of three general radiologists (GRs) and two experienced abdominal radiologists (ARs). The observer agreement was tested by weighted kappa. The effects of misalignments (pixel shifts, rotations, and skews) in the input image and the type of pretrained CNN on DP were statistically evaluated by two-way analysis of variance (ANOVA) and a multiple comparison test using Tukey’s honest significant difference criterion. A probability value of less than 0.05 or no overlap between 95% confidence intervals was regarded as statistically significant. All procedures were carried out using MATLAB software (2018a, MathWorks, Natick, MA, USA).
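As a worked illustration of the DP definition above, the following Python sketch computes DP for one test set and averages it over cross-validation folds. It is not the MATLAB code used in the study; the function names and the example labels are hypothetical.

```python
def diagnostic_performance(predicted, actual):
    """DP = correctly classified cases / total cases x 100."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def mean_dp(fold_dps):
    """Average the DP values obtained from the cross-validation folds."""
    return sum(fold_dps) / len(fold_dps)


# Hypothetical labels for four test cases; three of four are classified correctly.
actual = ["mHCC", "mHCC", "mHCC", "pHCC"]
predicted = ["mHCC", "wHCC", "mHCC", "pHCC"]
dp = diagnostic_performance(predicted, actual)
print(dp)
```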

Results

The mean DPs of the GRs and ARs for the classification of histological subtype and differentiation of primary malignant liver tumors on DCE-CT were 39.1% and 47.9%, respectively. The mean weighted kappa between observers was 0.92 (range 0.90–0.95).

When pixel-shift misalignment was applied, two-way ANOVA revealed that the type of pretrained CNN had a significant effect on the DP (P < 0.0001), whereas the degree of misalignment in the input images for TL did not (P = 0.17) (Table 1). The multiple comparison test showed that GoogLeNet exhibited the highest mean DP (44.1%) using input images misaligned by pixel shift. Statistical significance was observed between GoogLeNet and some other pretrained CNNs (VGG-16, VGG-19, ResNet-50, and ResNet-101) (Fig. 2).

Table 1 Mean diagnostic performances of pretrained CNNs after transfer learning in classification of primary liver malignant tumors using three-phasic DCE-CT misaligned by pixel shifts

Fig. 2

Multi-comparison of diagnostic performances of CNNs according to pixel-shift values in misaligned input images and the type of pretrained CNN. Circles and bars indicate the mean values and 95% confidence intervals, respectively. There was no significant difference between the diagnostic performances of CNNs for registered and misaligned input images. GoogLeNet exhibited the highest mean diagnostic performance (44.1%) using input images misaligned by pixel shift. Statistical significance was observed between GoogLeNet and some other pretrained CNNs (VGG-16, VGG-19, ResNet-50, and ResNet-101). CNN convolutional neural network

When rotation misalignment was applied, two-way ANOVA revealed significant effects on the DP for both the type of pretrained CNN (P < 0.0001) and the degree of misalignment of the input images for TL (P = 0.001) (Table 2). However, the multiple comparison test showed no significant difference in the DPs of CNNs between registered and misaligned input images (Fig. 3). GoogLeNet exhibited the highest mean DP (44.2%) using input images misaligned by rotation. Statistical significance was observed between GoogLeNet and some other pretrained CNNs (Alexnet, VGG-16, VGG-19, and ResNet-50) (Fig. 3).

Table 2 Mean diagnostic performances of pretrained CNNs after transfer learning in classification of primary liver malignant tumors using three-phasic DCE-CT misaligned by rotation

Fig. 3

Multi-comparison of diagnostic performances of CNNs according to the rotation value in misaligned input images and the type of pretrained CNN. Circles and bars indicate the mean values and 95% confidence intervals, respectively. There was no significant difference between the diagnostic performances of CNNs for registered and misaligned input images. GoogLeNet exhibited the highest mean diagnostic performance (44.2%) using input images misaligned by rotation. Statistical significance was observed between GoogLeNet and some other pretrained CNNs (Alexnet, VGG-16, VGG-19, and ResNet-50). CNN convolutional neural network

When skew misalignment was applied, two-way ANOVA revealed that both the type of pretrained CNN (P < 0.0001) and the degree of misalignment in the input images for TL (P < 0.0001) had a significant effect on the DP (Table 3). There was a significant decrease in the DPs of CNNs when the skew ratios in the input images were 4% and 8% (Fig. 4). Inception-v3 and GoogLeNet exhibited higher mean DPs (43.7% and 43.4%, respectively) even when skew misalignment was applied. Statistical significance was observed between these two pretrained CNNs and the others (Alexnet, VGG-16, VGG-19, ResNet-50, and ResNet-101) (Fig. 4).

Table 3 Mean diagnostic performances of pretrained CNNs after transfer learning in classification of primary liver malignant tumors using three-phasic DCE-CT misaligned by skewing

Fig. 4

Multi-comparison of diagnostic performances of CNNs according to skew values in misaligned input images and the type of pretrained CNN. Circles and bars indicate the mean values and 95% confidence intervals, respectively. There was a significant decrease in the diagnostic performances of CNNs when skew values in misaligned input images were 4% and 8%. Inception-v3 and GoogLeNet exhibited higher mean diagnostic performances (43.7% and 43.4%) using input images misaligned by skewing. Statistical significance was observed between these two pretrained CNNs and others (Alexnet, VGG-16, VGG-19, ResNet-50, and ResNet-101). CNN convolutional neural network

Discussion

Our results demonstrate that the diagnostic performance of TL using a pretrained CNN is comparable to that of experienced ARs in classifying primary liver cancers using three-phasic DCE-CT. Our results also show that TL using particular pretrained CNNs (GoogLeNet and Inception-v3) was robust against misregistration of DCE-CT images, even though the pretrained CNNs had been trained using RGB images without misalignment in the color channels [6]. One feature common to GoogLeNet and Inception-v3 is the inception architecture, which enables efficient parameter reduction and allows for training high-quality networks on relatively modest-sized training sets [7, 8]. This architecture may be related to the robustness against misregistration and the higher diagnostic performance of TL using multiphasic DCE-CT images. However, further study is required to confirm this.

Our findings can accelerate the application of TL using pretrained CNNs, not only in dynamic contrast-enhanced studies, but also in multiparametric imaging, such as magnetic resonance imaging. This is because the approach can be applied more easily in a clinical setting, without time-consuming registration procedures and with smaller training datasets than conventional deep learning algorithms using untrained CNNs require [3]. However, special caution should be exercised when applying this approach to hollow organs, such as the heart or alimentary tract, which are frequently accompanied by skew-type deformations, because some CNNs were not robust to skew misregistration.

In conclusion, TL using pretrained CNNs is robust against misregistration and comparable to experienced ARs in the classification of primary liver cancers using three-phasic DCE-CT. Therefore, correction of misregistration is not necessary for TL using pretrained CNNs in this setting, although some CNNs were less robust to skew-type misalignment.