1 Introduction

Drug development and disease diagnosis rely on the microscopic examination of surgical samples or biopsies. This analysis of biopsies is termed histopathology and is generally performed manually by pathologists. To reach a diagnosis, pathologists study various properties of the biopsy, such as tissue structure, the count of tissue cells, or variation in cell shape [1, 2]. However, this procedure is time-consuming and costly. Moreover, the manual analysis is guided by the knowledge of the individual pathologist and is therefore prone to bias [3]. Automatic analysis is thus of utmost importance for unbiased and fast disease diagnosis [4]. With digitization, biopsies are captured as images through a microscope-mounted camera; these are termed histopathological images. The analysis of such images through advanced computing technologies has resulted in better diagnosis. Histopathological image analysis is therefore a prime area of medical research, in which accurate classification of histopathological images is the key step for meticulous diagnosis [5]. However, histopathological image classification is a challenging problem due to the complexity of the images [2, 6]. To illustrate this complexity, Fig. 1 shows representative histopathological images of four types of cancers, taken from the publicly available colorectal cancer histology dataset [7].

Fig. 1 Representative histopathological images taken from the colorectal cancer histology dataset [7]

In the literature, machine learning models are widely preferred for histopathological image analysis: a set of biopsy images is used to train a classifier, which then infers the class of an unknown image. A traditional classification system generally consists of three phases, namely image pre-processing, feature extraction, and classification. Extracting features from training images and modeling the optimal decision boundary for classification is still quite successful in medical research. However, the success of such approaches is highly dependent on the extracted features [8]. Moreover, the extracted features depend on the extraction method, which is usually specified by humans. Many such techniques exist, including principal component analysis, clustering of image patches, and dictionary approaches. A brief review of these techniques can be found in [9].

Zhang et al. [10] assembled an ensemble of two classifiers, namely support vector machine (SVM) and multi-layer perceptron, to classify biopsy images. For validation of the method, a dataset of 361 images was used, which includes 119 normal tissue images, 102 carcinoma in situ images, and 140 lobular carcinoma images. The proposed system achieved a classification accuracy of 99.25% on the considered dataset. Further, Kowal et al. [11] used three different classification techniques after nuclei segmentation to categorize breast cancer images into benign and malignant classes. They first segmented the nuclei using four clustering methods and then extracted the features to train the classifiers. The trained classifiers gave 96% accuracy on the dataset. Similarly, Filipczuk et al. [12] discriminated benign from malignant biopsies using four traditional learning models, namely KNN (K-nearest neighbor), naive Bayes classifier, decision tree, and SVM, with an accuracy of 98.51%. Moreover, Asri et al. [13] compared the performance of four machine learning methods, namely SVM, decision tree, naive Bayes, and KNN, on the Wisconsin breast cancer image dataset, which has a total of 699 images of benign and malignant classes. Of these four models, SVM obtained the best accuracy of 97.13%. Although traditional machine learning models perform well in histopathological image classification, their accuracy is highly dependent on the extracted features [14], which are chosen by humans and may be biased towards human knowledge and experience. Instead of relying on human involvement, a better approach is to let the machine learn the optimal features from the input data and perform the required analysis. This automated feature extraction is the main reason for the success of deep learning models. Deep learning based models have been successfully applied in various applications such as image classification, machine translation, and speech recognition.

Deep learning models are composed of a large network of layers made up of neurons and perform classification by learning features internally. They have reported superior results in histopathology image analysis tasks such as mitosis detection [15], tissue grading (classification) [8], and nuclei segmentation [16] from high-resolution images. In general, the convolutional neural network (CNN) has been a quite successful deep learning model for histopathological image analysis, especially for detection [17, 18] and classification [19,20,21]. The architecture of a CNN transforms the input data to the output by using a combination of different layers such as convolution, pooling, and drop-out. Lo et al. [22] were the first to use a CNN on medical images. However, the first CNN that succeeded in a real-world application was LeNet [23], which solved hand-written digit recognition. With the advancement in computing systems, there has been considerable growth in the use of CNN based methods for automated classification of histopathology images, specifically after the introduction of AlexNet, which won the ImageNet challenge by a large margin. Saha et al. [24] used handcrafted features, such as intensity, morphological, and textural features, together with a deep learning model and achieved superior accuracy in the detection of mitoses from histopathological breast images. Further, Han et al. [21] presented a new deep learning model for multi-class cancer classification from histopathological breast images. Zheng et al. [25] introduced a new CNN based architecture for breast tumor classification. Litjens et al. [26] review various such models for histopathological image analysis.

Although CNNs show better performance for various image classification problems, they still fall short for histopathological image classification because too few labeled histopathological images are available. A CNN has a large number of parameters to be tuned, which may lead to over-fitting. To reduce over-fitting, a large number of labeled histopathological images is required for training. However, obtaining labeled images is costly because it depends on pathologists. Therefore, for a limited histopathological image dataset, an efficient CNN model is required that has fewer parameters to tune and can perform well on a smaller dataset. Hence, this paper presents an efficient lightweight CNN model designed for histopathological image classification with a small dataset. The performance of the proposed model is validated on H&E stained histopathological cancer images taken from the colorectal cancer histology dataset and compared with different traditional machine learning methods.

The rest of the paper is organized as follows: Sect. 2 briefly describes the standard layers of a convolutional neural network. The proposed convolutional neural network is detailed in Sect. 3. The experimental results are discussed in Sect. 4. Finally, the conclusion is drawn in Sect. 5.

2 Preliminaries

2.1 Convolutional neural network

A convolutional neural network (CNN) is a sequence of multiple layers, where each layer belongs to one of five main types, namely convolutional, non-linear activation, pooling, drop-out, and fully connected. A CNN takes the input image and models the most representative features to attain high accuracy. It is generally used for image classification tasks, while its other application domains include transfer learning, wherein a pretrained CNN is applied to a new problem domain for either feature extraction or classification. The architecture of a typical CNN is illustrated in Fig. 2. Each layer is discussed in detail below.

Fig. 2 The architecture of a typical convolutional neural network [27]

2.2 Convolution layer

This layer applies the convolution operation to the input values. Specifically, the input to this layer is a matrix that is convolved with ‘K’ learnable filters (or kernels) to generate ‘K’ new feature maps. Each value of a feature map is the sum of the element-wise products between the filter values and the input values under the filter, plus an added bias. Figure 3a represents the working of the convolution operation.
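The operation described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the paper: the function name `conv2d`, the stride of 1, and the absence of padding are assumptions made for brevity, and, as in most deep learning libraries, the kernel is applied without flipping.

```python
import numpy as np

def conv2d(image, filters, biases):
    """Stride-1 convolution without padding (illustrative assumptions).

    image:   (H, W, C) input values
    filters: (K, kh, kw, C) learnable kernels
    biases:  (K,) one bias per kernel
    returns: (H - kh + 1, W - kw + 1, K), i.e. K feature maps
    """
    H, W, _ = image.shape
    K, kh, kw, _ = filters.shape
    out = np.zeros((H - kh + 1, W - kw + 1, K))
    for k in range(K):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + kh, j:j + kw, :]
                # Sum of element-wise products with kernel k, plus its bias
                out[i, j, k] = np.sum(patch * filters[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))                # toy 5 x 5 RGB input
filters = rng.standard_normal((2, 3, 3, 3))  # K = 2 kernels of size 3 x 3
biases = np.zeros(2)
feature_maps = conv2d(image, filters, biases)
print(feature_maps.shape)  # (3, 3, 2)
```

With K = 2 kernels the output holds two feature maps, one per kernel, which matches the one-map-per-filter description in the text.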

Fig. 3 Functionalities of convolution and ReLU layers of CNN

2.3 Activation layer

In this layer, each value of the generated feature map is passed through a non-linear activation function. In CNNs, the rectified linear unit (ReLU) is the most widely used activation function. It returns zero if the input value is less than zero; otherwise, it returns the input value unchanged. Figure 3b depicts this function. Other commonly used activation functions are tanh and sigmoid. Usually, a convolutional layer and an activation layer are used in combination.
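As a small illustration of the activation functions named above, the following NumPy sketch applies them element-wise to a toy feature map. The function names and the sample values are chosen here for illustration only.

```python
import numpy as np

def relu(x):
    # Zero for negative inputs, identity otherwise
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Squashes inputs into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])
activated = relu(feature_map)
print(activated.tolist())  # [[0.0, 0.5], [3.0, 0.0]]
print(np.tanh(feature_map))   # tanh is available directly in NumPy
print(sigmoid(feature_map))
```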

2.4 Pooling layer

In the pooling layer, input values are down-sampled with a focus on retaining the relevant and important features. This layer reduces the computational complexity by reducing the spatial dimensions of the given input. Generally, there are two types of pooling layers, namely average pooling and max-pooling, of which max-pooling is the more popular. In max-pooling, the maximum value of a region of the input is selected by placing a kernel (usually of size 2 × 2) over the considered region. Figure 4 depicts the max-pooling operation.
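A minimal NumPy sketch of max-pooling on a single feature map follows. The kernel size of 2 × 2 comes from the text; the stride of 2 (non-overlapping windows) and the function name `max_pool2d` are assumptions made for this example.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max-pooling over a 2-D feature map (stride is an assumed default)."""
    H, W = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = window.max()  # keep only the largest value
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
pooled = max_pool2d(x)
print(pooled.tolist())  # [[6, 8], [3, 4]]
```

The 4 × 4 input shrinks to 2 × 2, which is the spatial dimensionality reduction the paragraph refers to.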

Fig. 4 The max-pooling operation [28]

2.5 Drop-out layer

In this layer, a set of neurons is randomly de-activated during training of the CNN, so that their output is zero. The main purpose of this layer is to avoid over-fitting and thereby improve the generalization of the model.
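The random de-activation can be sketched as below. The rescaling of the surviving activations by 1 / (1 - rate), often called inverted drop-out, is an assumption borrowed from common practice and is not stated in the text; the function name and arguments are likewise illustrative.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Randomly zero a fraction `rate` of the activations during training."""
    if not training or rate == 0.0:
        return x  # the layer is inactive at inference time
    keep_mask = rng.random(x.shape) >= rate
    # Assumed convention: rescale kept values so the expected sum is unchanged
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, rate=0.3, rng=rng)
print(np.mean(dropped == 0))  # fraction of de-activated neurons, near 0.3
```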

2.6 Fully connected layer

Each neuron of this layer is connected to every neuron of the previous layer, analogous to a hidden layer of a conventional multi-layer neural network.
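Because every neuron sees every input, a fully connected layer reduces to a matrix product plus a bias. The sketch below illustrates this with NumPy; the softmax function is included only because the dense layer of the proposed model (Sect. 3) uses it, and all names, sizes, and random values are illustrative assumptions.

```python
import numpy as np

def dense(x, weights, bias):
    # Each output neuron is a weighted sum over all inputs, plus a bias
    return x @ weights + bias

def softmax(z):
    z = z - np.max(z)  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.random(4)               # flattened input with 4 values
weights = rng.standard_normal((4, 3))  # 4 inputs fully connected to 3 neurons
bias = np.zeros(3)
probs = softmax(dense(features, weights, bias))
print(probs.shape, probs.sum())
```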

3 Proposed lightweight CNN

This paper proposes a new convolutional neural network architecture for histopathological image classification, depicted in Fig. 5. The presented CNN model contains one input layer, five subsequent blocks (each made up of convolutional layers, a drop-out layer, and a max-pooling layer), and one fully connected layer. In total, the model has 16 convolutional layers, 5 drop-out layers, 5 max-pooling layers, and 1 fully connected layer. As shown in Fig. 5, the first layer is the input layer, containing 150 × 150 × 3 neurons. The number of neurons in the input layer generally equals the number of pixel values in the input image; in this work, each input color image contains three channels, each of size 150 × 150. The input layer is followed by the first block, which contains four successive convolutional layers, one drop-out layer, and one max-pooling layer. Each convolutional layer of the first block consists of 16 filters of size (3 × 3) with ReLU activation and same padding. To counter over-fitting, the sequence of convolution operations is followed by a drop-out layer with probability 0.3. The drop-out layer is in turn connected to a max-pooling layer with a filter size of (3 × 3), which reduces the dimensions of the feature maps generated by the convolution operations.

Fig. 5 The architecture of the proposed CNN

The output of the first block is passed to the second block, which also contains four convolutional layers, here with 32 filters of size (3 × 3), followed by a drop-out layer with probability 0.2 and a max-pooling layer. The third block similarly uses four convolutional layers with 64 filters of size (3 × 3), followed by a drop-out layer with probability 0.1 and a max-pooling layer. The fourth block contains three convolutional layers with 128 filters of size (3 × 3), a drop-out layer with probability 0.05, and a max-pooling layer. The output of this block is used by the last block, which has a single convolutional layer with 256 filters of size (3 × 3), a drop-out layer with probability 0.05, and a max-pooling layer. Lastly, a dense layer with softmax activation performs the classification. Figure 5 summarizes the complete architecture. In the proposed model, the drop-out probability is reduced from 0.3 in the first block to 0.05 in the last, since the dependencies that cause over-fitting generally occur at the initial layers. Furthermore, the number of filters increases from 16 in the first block to 256 in the last to capture significant feature maps.
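To make the size of the described architecture concrete, the following sketch tabulates the five blocks and counts the trainable parameters of the convolutional layers. It is a back-of-the-envelope aid, not the authors' code: one bias per filter is assumed, and the final dense layer is left out because its size depends on the pooling stride and padding, which the text does not specify.

```python
# (conv layers, filters, drop-out probability) per block, as listed in the text
BLOCKS = [
    (4, 16, 0.30),
    (4, 32, 0.20),
    (4, 64, 0.10),
    (3, 128, 0.05),
    (1, 256, 0.05),
]
KERNEL = 3          # all filters are 3 x 3
INPUT_CHANNELS = 3  # RGB input of size 150 x 150 x 3

def conv_parameters(blocks, in_channels, kernel=KERNEL, use_bias=True):
    """Trainable parameters of the convolutional layers, per block."""
    per_block = []
    for n_layers, n_filters, _rate in blocks:
        total = 0
        for _ in range(n_layers):
            weights = kernel * kernel * in_channels * n_filters
            biases = n_filters if use_bias else 0  # assumed: one bias per filter
            total += weights + biases
            in_channels = n_filters  # the next layer sees n_filters channels
        per_block.append(total)
    return per_block

per_block = conv_parameters(BLOCKS, INPUT_CHANNELS)
n_conv_layers = sum(n for n, _, _ in BLOCKS)
print(n_conv_layers)   # 16
print(per_block)       # [7408, 32384, 129280, 369024, 295168]
print(sum(per_block))  # 833264
```

Under these assumptions the 16 convolutional layers hold roughly 0.83 million parameters, most of them in the last two blocks.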

4 Experimental results

4.1 Considered dataset

This paper uses the colorectal cancer histology dataset made publicly available by Kather et al. [29]. The dataset consists of histopathological images of human patients with colorectal cancer and represents different texture patterns. It comprises eight categories, namely stroma, debris, adipose, mucosa, tumor, lympho, complex, and empty. The dataset is a collection of RGB color images with 0.495 µm per pixel, captured at a magnification of 20× and digitized with an Aperio ScanScope (Aperio/Leica biosystems) [29].

4.2 Results

To validate the performance of the proposed CNN model, the confusion matrix it generates is shown in Fig. 6. In the confusion matrix, the x-axis represents the predicted labels and the y-axis the true labels. As there are eight classes of images, an 8 × 8 confusion matrix is generated. The confusion matrix shows that for the mucosa, tumor, and debris classes, the classification accuracy is greater than 90%. For the stroma, complex, and adipose classes, 84%, 77%, and 74% of the images, respectively, are correctly identified. However, for the empty and lympho classes, fewer than 70% of the images are predicted correctly, owing to the large variation among the images of these classes. Moreover, to judge the efficiency of the proposed method, precision, recall, F1-score, and support are computed and presented in Table 1. The table shows that the minimum precision is 0.55 for the complex class, while the highest is 0.97 for the mucosa class. The other measures are similarly good for all classes, as most of the values are greater than 70%, which signifies the efficiency of the proposed CNN.
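The per-class measures named above follow directly from a confusion matrix whose rows are true labels and whose columns are predicted labels. The sketch below shows the computation with NumPy; the 3 × 3 matrix is a made-up toy example and does not reproduce any value from Fig. 6 or Table 1, and the function name is an assumption.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, F1, support, and overall error from a confusion matrix.

    cm[i, j] = number of images of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)             # correctly classified images per class
    support = cm.sum(axis=1)     # number of true images per class
    predicted = cm.sum(axis=0)   # number of predictions per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    error = 1.0 - tp.sum() / cm.sum()  # classification error = 1 - accuracy
    return precision, recall, f1, support, error

# Toy 3-class confusion matrix (illustrative values only)
toy_cm = [[8, 1, 1],
          [2, 6, 2],
          [0, 1, 9]]
precision, recall, f1, support, error = per_class_metrics(toy_cm)
print(precision)  # [0.8, 0.75, 0.75]
print(recall)     # [0.8, 0.6, 0.9]
print(round(error, 4))  # 0.2333
```

The same classification error, expressed as a percentage, is the quantity used later to compare methods.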

Fig. 6 The confusion matrix generated by the proposed CNN

Table 1 Performance of the proposed CNN with respect to precision, recall, F1-score, and support

To compare the performance of the proposed method, four classifiers are considered, namely 1-nearest neighbor (1-NN), linear support vector machine (linSVM), radial basis function support vector machine (rbfSVM), and ensemble of decision trees (ensTree). As stated, a traditional machine learning model learns from a set of features extracted from the input dataset rather than from the input directly. In this paper, five feature extraction methods are considered, namely higher-order histogram features (HOHF), local binary patterns (LBP), gray-level co-occurrence matrix (GLCM), Gabor filters (GF), and perception-like features (PF). The comparison models are named accordingly, i.e., 1-NN-HOHF, ensTree-HOHF, linSVM-HOHF, and rbfSVM-HOHF for higher-order histogram features (HOHF). The other names are formed in the same way from the classifiers and the respective feature extraction methods, which gives a total of 20 methods for comparison. For performance analysis, the classification error of the proposed and considered methods has been computed on the same dataset. Table 2 tabulates the classification error of the proposed CNN against the considered models. Since the comparison models are deterministic, their results are taken from [7]. The table shows that the 1-NN classifier performs best with HOHF features, with an error rate of 35.6%. For the ensTree classifier, the GLCM features provide the best error rate of 40.9%. For linSVM, LBP features give an error rate of 24.6%, which is the lowest among the feature extraction methods. Similarly, for rbfSVM, LBP features show the minimum error rate of 23.8%. The worst performance, an error rate of 52.4%, is given by the 1-NN classifier with PF features. This signifies that no single feature extraction method gives the optimum features for every classifier and that classification performance varies across classifiers. For this reason, deep learning methods are preferred for classifying histopathological images. The table shows that the proposed CNN achieves the lowest classification error among all methods, i.e., 22.7%. For visual analysis, the error rates of the various methods are depicted as bar graphs in Fig. 7. The bar graphs also show that the proposed CNN has the smallest bar compared with all four classifiers and their respective feature extraction methods. Therefore, the proposed CNN may serve as an alternative solution for histopathological image classification.
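The naming scheme and the headline comparison can be reproduced with a few lines of Python. The sketch enumerates the 20 classifier and feature combinations and ranks only the error rates quoted in the text above (the full Table 2 is not reproduced here); the variable names are illustrative.

```python
from itertools import product

CLASSIFIERS = ["1-NN", "linSVM", "rbfSVM", "ensTree"]
FEATURES = ["HOHF", "LBP", "GLCM", "GF", "PF"]

# Every classifier paired with every feature extraction method
methods = [f"{clf}-{feat}" for clf, feat in product(CLASSIFIERS, FEATURES)]
print(len(methods))  # 20

# Classification errors (%) quoted in the text: best per classifier, plus the CNN
reported_error = {
    "1-NN-HOHF": 35.6,
    "ensTree-GLCM": 40.9,
    "linSVM-LBP": 24.6,
    "rbfSVM-LBP": 23.8,
    "Proposed CNN": 22.7,
}
best = min(reported_error, key=reported_error.get)
runner_up = min((m for m in reported_error if m != best), key=reported_error.get)
margin = round(reported_error[runner_up] - reported_error[best], 1)
print(best, runner_up, margin)  # Proposed CNN rbfSVM-LBP 1.1
```

The margin over the best traditional combination, rbfSVM with LBP features, is 1.1 percentage points.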

Table 2 Classification error of the proposed CNN and considered machine learning methods

Fig. 7 Graphical comparison among the proposed CNN and considered methods

5 Conclusion

This paper presents a new convolutional neural network architecture for the classification of histopathological images. The proposed network is built from multiple combinations of convolutional, activation, max-pooling, drop-out, and dense layers. Its experimental analysis has been conducted on the publicly available colorectal cancer histology dataset, which contains RGB color images of eight classes. The performance has been analyzed in terms of precision, F1-score, recall, support, confusion matrix, and classification error. For a fair analysis, the proposed method has been compared with 20 different methods, each formed from an existing machine learning model that works on manually extracted features. The experimental results show that the proposed convolutional neural network provides the lowest error rate, 22.7%, among the considered methods. In future, different layers and their combinations may be considered for further improvement.