1 Introduction

Histopathology is essential in medical imaging. Consequently, automatic histopathological computer vision can significantly improve the overall affordability, reliability, and accessibility of healthcare. When a person first experiences symptoms of a disease or cancer, he or she is typically examined with CT scans, MRIs, or X-rays; however, suspicious findings on these scans often remain inconclusive to radiologists and examiners. Tissue samples are then taken from the organ and sent to histopathology laboratories, where a histopathology (biopsy) test provides the final confirmation of disease or cancer. Machine learning (ML), and specifically deep learning (DL), models have excelled in a variety of disciplines, including clinical applications and feature learning in healthcare.

1.1 Whole Slide Imaging Technology in Histopathology

Under the heading of “digital pathology,” a number of technologies have been developed, such as whole slide imaging (WSI), which involves scanning whole histologic sections with a digital slide scanner to create “digital slides”. With WSI, wide-field, high-resolution microscopy images can be acquired quickly, providing highly precise information about tissue morphology. Multiple histological parameters can be quantified using WSI and computational technologies, offering insights into disease pathophysiology and tissue/organ biology.

A WSI scanner is a robotic microscope that can digitize an entire glass slide by combining, or stitching together, individual images using software. Once the digital file has been produced, the captured image of a slide can be viewed, zoomed, and panned spatially on a computer screen, much like with a traditional light microscope. The two processes that make up WSI are, in general, the digitization of glass slides using specialized hardware (a scanner consisting of an optical microscope and a computer-connected digital camera), which produces a digital image, and the viewing and/or analysis of that image through software responsible for image creation and management. The images obtained with this technology are frequently referred to as whole slide images, WSIs, whole slide scans, or digital slides; each virtual image represents a complete glass slide.

The last ten years have seen steady advancements in WSI technology, and today a number of commercially available scanners can capture digital images. There are primarily two methods of creating them: most models use a tiling system, in which the original slide is acquired as tiles, while some employ a line-scanning approach that produces linear scans of the tissue sections. With either method, the tiles or line scans must be stitched together and smoothed by specialized software to produce a single digital image of the histologic slice.

Transplantation medicine is another field in which the use of WSI as a tool for histological assessment is growing. The liver is the second most transplanted organ in the US, after the kidney.

Histopathological examination is a central diagnostic and research tool for tissue disorders, and integrated suites of histopathological samples have greatly aided doctors and researchers in clinical research. The identification of cancerous tissues is a critical issue for clinicians in providing appropriate oncology treatments. A whole slide image (WSI) is a digitized scan of the tissue on a glass slide, which allows the sample to be stored on a computer system as a digital picture. India alone is witnessing more than one million new cancer cases per year. WSI processing and storage have greatly aided professionals while also encouraging researchers to develop more reliable and efficient fully automated analysis and cancer-diagnosis models.

1.2 Deep Learning

It is a machine learning discipline based entirely on ANNs (artificial neural networks). Just as neural networks are designed to mimic the human brain, deep learning likewise mimics it (Sharma et al. 2021). The beauty of deep learning is that we do not have to program everything explicitly. Instead, we train a model on the training dataset and improve it until it predicts nearly correctly on both the validation and testing datasets. Deep learning models can identify precise features on their own, with hardly any input from the programmer, and are quite effective at handling the dimensionality problem (Kaur et al. 2022). A DL model can be broadly divided into two components.

Feature extraction phase: In this phase, deep architectures are trained on a large dataset, extracting features through a cascade of layers. The images are simply given as input and fed through the successive layers (Sharma et al. 2018).

Classification: In this phase, the images are assigned to their respective classes. Machine learning is a subset of artificial intelligence, while deep learning is a subset of machine learning, as seen in Fig. 15.1.

Fig. 15.1 Relation between AI, ML, and DL (onion chart: deep learning nested within machine learning, which in turn is nested within artificial intelligence)

2 Literature Survey

In Kather et al. (2019), the authors evaluated and fine-tuned various DL models, training them on NCT-HE-100K, composed of 100,000 H&E patches, and testing them on CRC-VAL-HE-7K, which contains about 7,000 H&E patches. VGG19 showed outstanding performance, with 94.3% accuracy in classifying the nine tissue classes. The authors in Mishra et al. (2018) carried out a comparative study of several deep learning models and a self-built model for classifying osteosarcoma tumor images, with the aim of arriving at an efficient and accurate classifier. The self-built model performed markedly better than the fine-tuned models. The dataset used contains about 64k image samples, annotated manually with the help of experienced pathologists. The accuracies of VGGNet, LeNet, AlexNet, and the self-built model were 67%, 67%, 73%, and 92%, respectively.

Further, the authors in Babaie et al. (2017) introduced a dataset named Kimia Path24, which contains 24 different tissue classes selected on the basis of their texture patterns. Three different methods were employed for patch retrieval and classification: the bag-of-words method did not perform well, while the local binary pattern (LBP) and CNN-based methods achieved comparatively good accuracies of 41.33% and 41.80%, respectively. Irum et al. implemented patch-based DL for the detection and classification of breast cancer on a relatively small, publicly available dataset of about 300 images from four breast cancer classes; 70k patches were created from this dataset for training and validation, and an accuracy of 86% was achieved (Modeling 2021). Similarly, the authors in Kumar et al. (2017) introduced a dataset that is freely accessible from the KIMIA Lab official site, composed of about 960 histopathological image samples from 20 tissue types. The LBP method performed moderately well at 90.62%, while BoVW and CNN achieved better accuracies of 96.50% and 94.72%, respectively. The authors in Tsai and Tao (2020) collected colorectal cancer (CRC) histopathological samples as an exploratory dataset to validate optimized parameters; the five most widely used DL models were evaluated for classifying colorectal cancer tissues by comparing performance on the CRC-VAL-HE-7K and CRC-VAL-HE-100K datasets, achieving accuracies of 77% and 79%.

Sobhan et al. used the Kimia Path24C dataset, which contains 24 WSIs from various tissue classes (Riasatian et al. 2021). The whole dataset is designed to resemble retrieval tasks in clinical practice. Color is a vital feature in histopathology, yet it was completely disregarded in the original Kimia Path24 dataset, where all patches were stored as gray-scale even though they were retrieved from colored WSIs. In the Kimia Path24C dataset, the color feature is given due significance. To extract the patches of interest, K-means clustering and Gaussian mixture model (GMM) segmentation algorithms were employed. VGG16, Inception, and DenseNet models were used as feature extractors to provide initial findings for setting a benchmark, achieving accuracies of 92%, 92.45%, and 95.92%, respectively. Computer science has delivered promising results in science and technology over the decades. The authors in Bukhari et al. (2020) trace the first use of computers in medical diagnosis to 1995. Later, the authors in Khvostikov et al. (2021) describe digital chest X-rays applied to the diagnosis of lung cancer; cancer diagnosis was mostly done through X-rays throughout the 1970s and 1980s. Many studies have since been proposed for the classification and identification of various cancers using DL and ML approaches.

Further, research by Borkowski et al. (2019) compared models for classifying colon cancer in the LC25000 dataset and achieved an accuracy of 99.67% with MobileNetV2 at a loss of 1.24%. Sun et al. carried out their research on the LIDC lung cancer database, using a CNN to examine lung nodules, and attained an accuracy of 89.9%. In Howard et al. (2017), the authors used six different datasets and compared a CNN with DFCNet for classifying lung cancer, attaining accuracies of 77.6% and 84.6%.

2.1 Research Gaps

There are some problems in the detection process:

  1. Small nodules and low-contrast nodules are difficult to observe in CT scans.

  2. Some nodules or cancer-causing cells can be missed by the models.

  3. Some lung tissues and arteries can be wrongly detected as lung nodules.

  4. Applying the models directly to the data often yields false positives.

  5. A key challenge is that Whole Slide Imaging (WSI) produces gigapixel files with resolutions around 100,000 × 100,000 pixels; the resulting morphological variance makes visual understanding and learning from the images difficult.

  6. The studies and algorithms surveyed above were performed on smaller datasets.

  7. Image-quality enhancement for WSI and its application with CNNs is required.

  8. Various models, such as AlexNet and VGG16, face vanishing-gradient problems.

3 Methodology and Implementation

3.1 Problem Definition

WSI processing and storage have greatly aided professionals while also encouraging researchers to develop more reliable and efficient, fully automated analysis and cancer-diagnosis models. ML, and specifically DL-based models, strengthen medical imaging analysis software solutions. DL with CNNs is a rapidly growing area of histopathologic image analysis (Filho et al. 2018).

The continuous technological advancement of digital scanners and image visualization methods, together with the incorporation of AI-powered methodologies, offers opportunities for new and emerging technologies (Masood et al. 2018a). Their advantages are numerous, including easy accessibility via the web, elimination of physical storage, and no risk of smudging, fading, or slide breakage, to name a few. Several hurdles, including high cost, technical difficulties, and specialists’ reluctance to adopt an innovation, have hampered its use in pathology (Masood et al. 2018b).

As per cancer statistics reports, millions of people die of lung and colon cancer, as both have low survival chances; contributing factors include smoking, lung dysfunction, and alcohol consumption. It is almost impossible to cure cancer in its terminal stages, whereas early diagnosis can increase the survival rate. Years of technological advancement enabled computed tomography to produce high-resolution images, but CT scans still needed to be observed and interpreted by radiologists, and screening X-rays and CT scans takes considerable time. Substantial technological advancements have led to novel digital imaging solutions in histopathology, and WSI is gaining popularity among many pathologists for diagnostic, academic, and scientific research (Bejnordi et al. 2018). Automatic analysis of histopathology image data has greatly aided doctors and researchers in medicine, primarily because of the abundance of labeled data and analysis technology, together with specialists from various fields of computer science. Despite the advancements in healthcare, there remains scope for further research to improve early diagnosis of cancer and the survival rate (Zoph et al. 2017).

3.2 Proposed Work Methodology

The overall methodology of our proposed work is discussed below.

Step 1: Data Pre-processing

Pre-processing refers to the removal of extraneous data before it is given to the classifier. The motive of preprocessing is to enhance the data by suppressing unwanted distortions or enhancing particular visual properties that are important for further analysis and processing. The main forms of data preprocessing include outlier identification, missing-value treatment, and removal of undesired or noisy data. In essence, the data are filtered to improve the accuracy of the model.
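To make this step concrete, the following is a minimal loading-and-normalization sketch in Keras/TensorFlow; the directory layout and path are hypothetical, and this is not the exact pipeline used in this study.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)  # input size used by most models in this chapter

# Load images from class-labelled sub-folders, resizing on the fly.
# "data/train" is a hypothetical path.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    image_size=IMG_SIZE,
    batch_size=32,
    label_mode="categorical",
)

# Suppress unwanted intensity variation by rescaling pixels to [0, 1].
normalize = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = train_ds.map(lambda x, y: (normalize(x), y))
```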

Step 2: Data Augmentation

Data augmentation refers to significantly increasing the number of images without acquiring new ones. We augment the image dataset by flipping the images from left to right or right to left, rotating them, zooming in or out, and cropping; these are the data augmentation techniques we have used in our proposed work. This technique also helps reduce overfitting.
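Below is a sketch of these augmentation operations using Keras preprocessing layers; the specific factors (rotation fraction, zoom range, dropout of crops) are illustrative assumptions, not the study's exact settings.

```python
import tensorflow as tf

# Random flips, rotations, zooms, and crops, as described above.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # left-right flipping
    tf.keras.layers.RandomRotation(0.1),       # rotate up to ±10% of a full turn
    tf.keras.layers.RandomZoom(0.2),           # zoom in/out by up to 20%
    tf.keras.layers.RandomCrop(224, 224),      # cropping (a no-op unless sources are larger)
])

# Apply on the fly to the training pipeline from the pre-processing step.
augmented_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```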

Step 3: Data Splitting

Data splitting is the process of dividing the data into two or more subsets. It is a fundamental step in data science, especially for constructing data-driven models, as it supports the design of data models and data-driven processes. We have divided the dataset into three subsets: the training, validation, and test sets.
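A sketch of such a three-way split is shown below; the 70/15/15 ratios are an assumption (the chapter does not state the exact proportions), and the file lists are dummy stand-ins.

```python
from sklearn.model_selection import train_test_split

# Dummy stand-ins for real image paths and their 5-class labels.
paths = [f"img_{i}.png" for i in range(100)]
labels = [i % 5 for i in range(100)]

# First carve out 30%, then split it evenly into validation and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    paths, labels, test_size=0.30, stratify=labels, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
```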

The proposed methodology is visually depicted in Fig. 15.2.

Fig. 15.2 Proposed methodology (flowchart: dataset → pre-processing → augmentation → splitting into training, validation, and test data → DL models → model evaluation, performance visualization, and accuracy comparison; the test data are evaluated directly)

3.3 Models Used in Our Study (DL Models)

MobileNetV2

MobileNetV1 was introduced to decrease complexity and was designed to be light enough to run on mobile devices, reducing the required memory. MobileNetV1 uses depthwise separable convolutions. MobileNetV2 combines three structures:

(1) Depthwise Separable Convolution

This operation has two parts: a depthwise convolution, in which a filter is applied to each channel separately, and a pointwise (1 × 1) convolution applied to the output of the depthwise step.

(2) Linear Bottleneck

The pointwise convolution is combined with a bottleneck, and a linear activation is used.

(3) Inverted Residual

An expansion layer is added at the beginning of the block, and the block’s input is added to its output to form the output of the whole block.

Each of the two block types has three layers: the first is a 1 × 1 convolution with ReLU, the next is a depthwise convolution, and the third is a 1 × 1 convolution without nonlinearity. We have used the Adam optimizer in this model with the categorical_crossentropy loss function. The general architecture of MobileNetV2 is shown in Fig. 15.3.

Fig. 15.3 Architecture of MobileNetV2 (input image → preprocessing → 3 × 3 convolution → ReLU → 2 × 2 max-pooling → flatten → fully connected classifier → softmax over n classes)

This model is similar to MobileNet except that it is based on an inverted residual structure (Hatuwal and Thapa 2020). It performs depthwise separable convolution operations, which are building blocks for many neural network architectures (Kieffer et al. 2017), and improves the performance of mobile models. It is a lightweight model with a smaller parameter count than MobileNet (Komura and Ishikawa 2018). There are 2,264,389 parameters in this model, of which 2,238,597 are trainable. The weights of the first 20 layers were frozen, and the model was then fine-tuned using the cross-entropy loss function and the Adagrad optimizer with a learning rate of 0.001. The last layer was truncated and replaced with a dense layer with a SoftMax activation function to classify the images into five classes, as in the sketch below.
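A minimal sketch of this setup, assuming Keras/TensorFlow and ImageNet weights; the head wiring beyond what the text states is an assumption.

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the weights of the first 20 layers, as described above.
for layer in base.layers[:20]:
    layer.trainable = False

# Replace the truncated top with a 5-way SoftMax dense head.
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(base.input, outputs)

model.compile(
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```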

Xception Model

It was developed by researchers at Google and is inspired by the Inception model and ResNet. Its convolution operations are depthwise and pointwise, and it performs shortcut connections like ResNet. We have used softmax as the activation function and categorical_crossentropy as the loss function in this model. The architecture of Xception is shown below in Fig. 15.4.

Fig. 15.4 Architecture of the Xception model (input image → convolution, batch normalization, activation, and pooling layers → flatten and dense layers → class outputs: colon ACA, lung ACA, …, colon N)

This model is inspired by Inception V3, and its name stands for ‘Extreme Inception’ (Farahani et al. 2015). Both models have similar parameter counts, but performance depends on how the parameters are used (Chollet 2017). It consists of a linear stack of 36 depthwise separable convolutional layers with residual connections. There are 20,871,725 parameters in this model, of which 20,187,197 are trainable. The weights of the first 16 layers were frozen, and the model was fine-tuned using the cross-entropy loss function and the Adam optimizer with a learning rate of 0.001; the last layer was truncated and replaced with a dense layer with a SoftMax activation function to classify the images into five classes, analogous to the sketch below.
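The analogous sketch for Xception (first 16 layers frozen, Adam at 0.001); the input size and head wiring are assumptions carried over from the rest of the study.

```python
import tensorflow as tf

base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers[:16]:  # freeze the first 16 layers
    layer.trainable = False

x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # 5 classes
model = tf.keras.Model(base.input, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
```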

EfficientNetV2L

This family of CNNs trains faster than its predecessors. EfficientNetV2 uses a 3 × 3 kernel size. In this model, we have used a GlobalAveragePooling2D layer, softmax as the activation function, and SGD as the optimizer. The architecture of EfficientNetV2L is given in Fig. 15.5.

Fig. 15.5 Architecture of EfficientNetV2L (input image → rescaling, convolution, batch normalization, activation, and reshaping layers → flatten and dense layers → class outputs: colon ACA, lung ACA, …, colon N)

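A sketch of the head just described (GlobalAveragePooling2D, SoftMax output, SGD optimizer); the ImageNet weights, input size, and 5-class setting are assumptions carried over from the rest of this study.

```python
import tensorflow as tf

base = tf.keras.applications.EfficientNetV2L(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# GlobalAveragePooling2D followed by a SoftMax output, trained with SGD.
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(base.input, outputs)
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])
```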

NasNet Large

This model is one of the finest in the CNN family and has been trained on millions of images. It has learned to classify thousands of image categories from large datasets and reuses this knowledge when classifying new problems. Its default input shape is (331, 331, 3). In this model, we have used SOFTMAX as the activation function. There are 84,936,983 total parameters, of which 84,740,315 are trainable.

We popped the last three layers and added a new dense layer with one neuron per class, followed by a softmax layer to classify the images. We used the Adam optimizer and categorical cross-entropy. The learning rate was set to 0.0001, as it is typically set between 0.01 and 0.0001; a smaller learning rate yields smaller, more stable loss updates. The model was trained for 30 epochs to update the internal model parameters, and the batch size was set to 128 because the dataset is very large.

A pre-trained version of this network, trained on more than a million images from the ImageNet dataset, can be loaded. The pre-trained model can categorize images into 1,000 classes. The network accepts images at a resolution of 331 × 331.

The general architecture of NasNet Large is shown in Fig. 15.6.

Fig. 15.6 Architecture of NasNet Large (input image → convolution, batch normalization, activation, zero-padding, and pooling layers → flatten and dense layers → class outputs: colon ACA, lung ACA, …, colon N)

This model is based on the Neural Architecture Search (NAS) framework; the network was designed by stacking copies of searched convolutional cells and trained on the ImageNet dataset. NasNet achieved top-1 and top-5 accuracies of 82.7% and 96.2% on ImageNet. Different versions of NASNet can be created by varying the convolution layers and filter sizes, and these can outperform hand-designed models in accuracy (Modeling 2021). There are 84,936,983 total parameters in this model, of which 84,740,315 are trainable. The model was fine-tuned by truncating the last two layers and replacing them with a dense layer with a SoftMax activation function to classify the images into 5 classes, using the Adam optimizer with a learning rate of 0.0001 and categorical_crossentropy as the loss function; a sketch of this setup follows.
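A sketch of this configuration; the pooling layer and head wiring beyond what the text states are assumptions.

```python
import tensorflow as tf

# NASNetLarge with its default 331 x 331 input, as noted above.
base = tf.keras.applications.NASNetLarge(
    weights="imagenet", include_top=False, input_shape=(331, 331, 3))

x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # 5 classes
model = tf.keras.Model(base.input, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# Training regime from the text: 30 epochs, batch size 128, e.g.
# model.fit(train_ds, validation_data=val_ds, epochs=30)
```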

CNN (Proposed Model)

In this study, we propose a small CNN model of just 15 layers, which is very small compared with the fine-tuned models used in this study but shows good results in classifying cancers. The input image dimensions are (224, 224, 3). There are five convolution layers in total, and the output of every convolution layer is passed through an activation function: ReLU is used throughout, except for the last dense layer, which uses SoftMax. The model has 42,756,997 parameters in total, of which 42,754,117 are trainable.

The first layer is a convolution layer that analyzes the input image using 32 kernels of size 3 × 3; convolution layers detect patterns in the given input. Its output is activated with ReLU, the default activation function in many neural networks owing to its good performance, and then normalized by batch normalization, which helps regularize the model and counteracts overfitting; such normalization is typically placed after a convolution layer. The next layer is a max-pooling layer with a 3 × 3 pool size, which reduces the image dimensions by selecting the maximum element from each region of the feature map. A dropout layer follows to prevent overfitting and enhance learning, although switching off more than 50% of the neurons can harm learning. The second and third convolution layers perform convolutions with 64 kernels of size 3 × 3, followed by the activation function, batch normalization, and dropout, and the fourth and fifth convolution layers use 128 kernels of size 3 × 3 followed by the same layers. Two stacked dense layers of 1,024 neurons follow, and the final layer uses the SoftMax activation function to predict a multinomial probability distribution over the 5 classes. A sketch of this architecture is given after this paragraph.
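The following is a minimal sketch consistent with this description; the exact placement of pooling and dropout layers and the dropout rate are assumptions, so the parameter count will not match exactly.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(224, 224, 3)),
    # Block 1: 32 kernels of size 3 x 3.
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(3, 3)),
    layers.Dropout(0.25),                # kept below 0.5, per the text
    # Blocks 2-3: 64 kernels each.
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(3, 3)),
    layers.Dropout(0.25),
    # Blocks 4-5: 128 kernels each.
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.BatchNormalization(),
    # Two stacked 1,024-neuron dense layers and a 5-way SoftMax output.
    layers.Flatten(),
    layers.Dense(1024, activation="relu"),
    layers.Dense(1024, activation="relu"),
    layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```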

3.4 Model Evaluation

Model assessment is the process of analyzing a deep learning model’s performance, along with its strengths and limitations, using various evaluation criteria. Model evaluation is critical for determining a model’s performance during the early stages of research, as well as for model monitoring. The performance of the proposed models is listed in Tables 15.1 and 15.2.

Table 15.1 Validation accuracy on KimiaPath24C
Table 15.2 Observed results of proposed models (LC25000)
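As a generic illustration of such an evaluation (the dataset and model names are assumed from the earlier sketches, and the test pipeline must not be shuffled for the label/prediction alignment to hold):

```python
import numpy as np
from sklearn.metrics import classification_report

# Overall loss and accuracy on the held-out test set.
loss, acc = model.evaluate(test_ds)
print(f"test accuracy: {acc:.4f}")

# Per-class precision, recall, and F1 scores.
y_true = np.concatenate([np.argmax(y.numpy(), axis=1) for _, y in test_ds])
y_pred = np.argmax(model.predict(test_ds), axis=1)
print(classification_report(y_true, y_pred))
```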

3.5 Accuracy Comparison

We have implemented five CNN models, namely MobileNetV2, EfficientNetV2L, our proposed CNN, Xception, and NasNet Large, and achieved validation accuracies of 98%, 96%, 94%, 99%, and 99%, respectively.

4 Experimentation and Results

Extensive experiments are carried out in this chapter on two publicly available datasets to assess the efficacy of the proposed automated cancer detection and classification systems developed in this work. The experimentation and results are discussed in two parts: first on the KimiaPath24C dataset, followed by the LC25000 dataset.

5 Conclusion and Future Scope

In this study, we fine-tuned deep learning models and proposed a small CNN for classifying cancer images in the LC25000 and KimiaPath24C datasets. Five deep learning models were compared on both datasets, achieving accuracies ranging from 93 to 99%. The highest accuracy on the LC25000 dataset, 99.69%, was achieved by NasNet Large, a 343 MB model with 88.9M parameters that is 533 layers deep. On KimiaPath24C, the highest accuracy of 96% was achieved by EfficientNetV2L, the latest member of the EfficientNet family, developed in 2021. We have saved .h5 files of the architectures and weights of the models used in this study. The proposed study yields better accuracy than most of the cancer-classification methods covered in the literature survey. Computer-vision techniques and deep learning models can assist pathologists in diagnosing different types of cancer at low cost and in less time. In future work, we would like to extend this work to other datasets of histopathological images and further improve the performance of cancer classification.