1 Introduction

The skin is the body’s largest organ, protecting all of the inner organs from the environment. It aids in temperature regulation and infection protection. The skin has three layers: the epidermis, the dermis, and the hypodermis.

Cancer is a life-threatening disease that can result in death. Various types of cancer can occur in the human body, and skin cancer is one of the most rapidly developing tumors that can lead to death. It is triggered by a variety of circumstances, including smoking, alcohol consumption, allergy, infections, viruses, physical stress, changes in the environment, and sensitivity to ultraviolet (UV) rays, among others. UV radiation from the sun can damage the DNA in skin cells. Skin cancer can also be caused by unusual inflammation of the human body. According to the World Health Organization (WHO), one in every three diagnosed cancers is a skin cancer [23]. In the United States, Canada, and Australia, the number of people diagnosed with skin cancer has been growing steadily over the past few years. In the United States, it is estimated that 5.4 million cases of skin cancer will be detected each year. The pressure for fast and accurate clinical testing grows every day [29].

As a consequence, timely identification of skin cancer may lead to earlier treatment, potentially saving lives. Various computer-aided diagnosis (CAD) methods have been designed to detect skin cancer over the last few years. To identify cancer, conventional computer vision techniques are mostly employed as detectors that capture a large number of attributes such as shape, size, color, and texture. In recent years, artificial intelligence (AI) has become capable of addressing these issues. Several architectures are widely used in the medical field, such as the deep neural network (DNN), the convolutional neural network (CNN), long short-term memory (LSTM), and the recurrent neural network (RNN). All of these models can be used to check for skin cancer, and CNN and DNN in particular produce satisfactory results in this field. The most often used method is the CNN, which combines feature learning and classification. Results can be improved further by using transfer learning with vast datasets.

The primary contributions of our paper are summarized as follows:

  • We present a transfer learning model based on the GoogLeNet model that more effectively detects skin cancer in patients, even at an early stage.

  • With a large dataset, our suggested transfer learning model performs better in terms of accuracy than other deep learning (DL) models that are currently available.

The remainder of this paper is organized as follows: Sect. 2 presents the literature review. Section 3 explains the methodology. Section 4 contains the results and discussion, while Sect. 5 contains the conclusion and future work.

2 Related Work

In [6], A. Enezi et al. proposed a system that combines two machine learning algorithms: feature extraction is done using a convolutional neural network (CNN), and classification is done with a support vector machine (SVM). The proposed system detects three skin diseases with a reported accuracy of 100%. The system was fast and accurate. However, it can detect only three diseases and, being web-based, was not accessible to everyone.

In [33], Vijayalakshmi, M.M. et al. offered three alternative techniques to identify and categorise melanoma skin cancer effectively. The model is designed in three phases. In the initial phase, pre-processing removes hair, glare, and shade from the photos. Segmentation and classification form the second phase. Features are extracted using a convolutional neural network (CNN), and a neural network and a support vector machine are used to classify the photos. They report an accuracy of 85%.

In [28], Rathod et al. proposed an automated image-based system for skin disease recognition using machine learning classification. The proposed system extracts features using a convolutional neural network (CNN) and classifies the image with a softmax classifier. An initial training gives an output accuracy of approximately 70%. The system was initially tested on five diseases. The accuracy could be increased to more than 90% with a larger dataset.

In [10], Bhadula et al. applied five different machine learning techniques to a dataset of skin infections to predict skin diseases. These algorithms are random forest, naive Bayes, logistic regression, kernel SVM, and CNN. Of these, the convolutional neural network (CNN) gives the best training and testing accuracy, 99.05%. Early diagnosis and classification of skin diseases helps to lessen the disease’s effects. The study was limited by the access to and availability of medical information.

In [24], Padmavathi, S., Mithaa, E.M., et al. proposed convolutional neural networks (CNN) and residual neural networks (ResNet) to predict skin disease. A dataset of 10,015 dermatoscopic images divided into seven classes was used in this study. The experimental results show that the convolutional neural network has an accuracy of 77%, whereas ResNet has an accuracy of 68%. The authors conclude that convolutional neural networks perform better than residual neural networks in diagnosing skin diseases.

In [13], El Saleh, R., Bakhshi, et al. used a convolutional neural network model named VGG-16 for facial skin disease identification. A dataset comprising ten classes, each containing 1200 photos, is used to test and analyse the suggested approach. The model can successfully identify eight facial skin diseases. The algorithms are implemented in Python, and OpenCV is employed for pre-processing. The model achieves an accuracy of 88%. The model could be improved further by increasing the dataset size and applying a new deep neural network.

In [5], Samuel Akyeramfo-Sam, Derrick Yeboah et al. proposed an intelligent way to detect skin diseases with machine learning (ML), using a convolutional neural network (CNN), decision trees (DT), an artificial neural network (ANN), and support vector machines (SVM). The CNN model and the learned patterns are used to classify the test dataset. The system is successful in detecting three types of diseases. The average accuracy is 85.9%.

In [12], Jinen Daghrir, Lotfi Tlig et al. proposed an automated system to detect melanoma using three different methods. It relies on a convolutional neural network and two classical machine learning methods. After feature extraction, a training phase is necessary to create a classification model for melanoma detection. The support vector machines (SVMs) process the image and calculate complexity. The authors also compared a K-nearest-neighbor (KNN) classifier with an artificial neural network (ANN), which showed that the ANN was more accurate than the KNN. Although the proposed system is fast and successful, it works only on some diseases. A CNN could be used to improve the system.

In [31], Xiaoxiao Sun et al. proposed an automated skin disease detection system. They work on several datasets to detect some skin diseases. Their paper does not present a specific method or system. Instead, it introduces a dataset for identifying some skin diseases. Its main contribution is data collection, which we will use in the future.

3 Methodology

This section outlines the technique we propose for detecting skin cancer. The approach is separated into several stages. We first gather a training dataset. We then pre-process the dataset to obtain clean image data as input and carry out data augmentation. Data analysis is the last step before the learning model is created and classification is performed. Figure 1 illustrates the overall approach.

Fig. 1. Proposed Model

3.1 Dataset Description

The data were obtained from Kaggle [1]. The collection contains a wide variety of pictures. The dataset covers nine types of skin cancer: Actinic Keratosis, Basal Cell Carcinoma, Dermatofibroma, Melanoma, Nevus, Pigmented Benign Keratosis, Seborrheic Keratosis, Squamous Cell Carcinoma, and Vascular Lesion. The system takes an input picture, compares it with the dataset, and performs the classification.

3.2 Data Preprocessing

The pictures in the collection cannot be fed to the network as they are: GoogLeNet is built to accept colour images with an input layer size of 224\(\,\times \,\)224\(\,\times \,\)3, so pre-processing is necessary. Z-score normalization was first used to standardize the intensity levels of the images. Equation 1 was used to normalize each image’s values to zero mean and unit standard deviation.

$$\begin{aligned} z=\frac{x-\mu }{s} \end{aligned}$$
(1)

where x represents the training sample, \(\mu \) represents the mean of the training sample, and s represents its standard deviation.
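As an illustration of the z-score step, the following minimal sketch standardizes a batch of images by subtracting the sample mean and dividing by the sample standard deviation. It assumes NumPy; the function name `zscore_normalize`, the small `eps` guard, and the toy data are assumptions made for this example and are not taken from the original implementation.

```python
import numpy as np


def zscore_normalize(images, mean=None, std=None, eps=1e-8):
    """Standardize pixel intensities as (x - mean) / std.

    When `mean` and `std` are not given they are computed from `images`;
    in practice they would be computed once on the training set and then
    reused for validation and test images.
    """
    images = np.asarray(images, dtype=np.float64)
    if mean is None:
        mean = images.mean()
    if std is None:
        std = images.std()
    # eps (an assumption of this sketch) avoids division by zero
    # for a constant image.
    return (images - mean) / (std + eps), mean, std


# Toy "training set": four 2x2 RGB images with random 8-bit intensities.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(4, 2, 2, 3))
z, mu, s = zscore_normalize(train)
```

After this step the standardized array has a mean close to zero and a standard deviation close to one, so the values are not confined to a fixed interval.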

3.3 Data Augmentation

Deep learning models such as the GoogLeNet transfer learning model for skin disease classification are hampered by insufficient training data. To increase the stability and expand the functional variety of the model, more data is needed. To achieve this, we employ augmentation [30] to significantly enlarge the dataset. The image augmentation techniques include rotation, width shift, shear range, height shift, and zoom. The model can now generalize more effectively thanks to the augmented data. For this purpose, we used ImageDataGenerator. The settings for data augmentation used in this study are as follows:

Table 1. Data augmentation settings
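The geometric augmentations listed above (rotation, width and height shift, shear, and zoom) can each be written as an affine transform, and a random augmentation amounts to sampling one parameter per transform and composing them. The sketch below shows this idea in plain Python. The ranges in `RANGES` are hypothetical placeholders, not the settings of Table 1, and the helper names are assumptions of this example rather than part of any library.

```python
import math
import random


def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def affine_matrix(rotation_deg=0.0, shift_x=0.0, shift_y=0.0,
                  shear_deg=0.0, zoom=1.0):
    """Compose rotation, shift, shear and zoom into one homogeneous matrix."""
    r = math.radians(rotation_deg)
    s = math.radians(shear_deg)
    rotation = [[math.cos(r), -math.sin(r), 0.0],
                [math.sin(r), math.cos(r), 0.0],
                [0.0, 0.0, 1.0]]
    shift = [[1.0, 0.0, shift_x], [0.0, 1.0, shift_y], [0.0, 0.0, 1.0]]
    shear = [[1.0, -math.sin(s), 0.0], [0.0, math.cos(s), 0.0], [0.0, 0.0, 1.0]]
    scale = [[zoom, 0.0, 0.0], [0.0, zoom, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(matmul3(matmul3(rotation, shift), shear), scale)


# Hypothetical ranges for illustration only (NOT the values of Table 1).
# Shifts are expressed as a fraction of the image size.
RANGES = {"rotation_deg": 20.0, "shift": 0.1, "shear_deg": 10.0, "zoom": 0.2}


def sample_params(rng, ranges=RANGES):
    """Draw one random set of augmentation parameters within the ranges."""
    return {
        "rotation_deg": rng.uniform(-ranges["rotation_deg"], ranges["rotation_deg"]),
        "shift_x": rng.uniform(-ranges["shift"], ranges["shift"]),
        "shift_y": rng.uniform(-ranges["shift"], ranges["shift"]),
        "shear_deg": rng.uniform(-ranges["shear_deg"], ranges["shear_deg"]),
        "zoom": rng.uniform(1.0 - ranges["zoom"], 1.0 + ranges["zoom"]),
    }


rng = random.Random(0)
params = sample_params(rng)
matrix = affine_matrix(**params)
```

Each call to `sample_params` yields a different transform, so the same source image can contribute many distinct training samples.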

3.4 GoogLeNet

GoogLeNet is a 22-layer deep convolutional neural network (CNN). It features nine linearly stacked inception modules. The architecture ends with global average pooling, which replaces fully connected layers with the average of each feature map. GoogLeNet is now used for various computer vision tasks, including face detection and identification, adversarial training, and so on. GoogLeNet took first place in the ILSVRC 2014 competition thanks to the inception block, which uses parallel convolutions to increase the width and depth of the network. The specifics of each inception block are shown in Fig. 2. Each inception block employs four paths to obtain detailed spatial information. 1\(\,\times \,\)1 convolutions are used to reduce feature dimensions and processing costs. Because features are concatenated after each inception block, computation costs would grow within a few steps as feature dimensions increased if no constraints were put in place. The dimensions of the intermediate features are therefore reduced using 1\(\,\times \,\)1 convolutions. The units on each path have different filter widths after convolution, which guarantees that separate local spatial feature sets can be extracted and combined. Noteworthy is the use of max-pooling in the final path, which can extract new features without requiring additional parameters. With all of these design choices combined, GoogLeNet topped the ImageNet classification challenge.

Fig. 2. Inception Block.
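To make the cost argument for 1\(\,\times \,\)1 convolutions concrete, the sketch below counts the weights of a 5\(\,\times \,\)5 convolution applied directly to its input and compares this with first reducing the channels with a 1\(\,\times \,\)1 convolution. The channel counts (192, 16, and 32) are illustrative values chosen for this example, and biases are ignored.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


# Illustrative channel counts, chosen for this example only.
IN_CH, REDUCED_CH, OUT_CH = 192, 16, 32

# 5x5 convolution applied directly to all input channels.
direct = conv_weights(5, IN_CH, OUT_CH)

# 1x1 convolution that shrinks the channels first, then the 5x5 convolution.
bottleneck = conv_weights(1, IN_CH, REDUCED_CH) + conv_weights(5, REDUCED_CH, OUT_CH)

print(direct, bottleneck)  # 153600 15872
```

Under these assumed sizes the reduced path needs roughly a tenth of the weights, which is why the dimensionality reduction keeps the concatenated paths affordable.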

3.5 Xception Model

Depthwise separable convolutions are employed in the Xception deep convolutional neural network design [11]. François Chollet, an employee at Google, Inc., introduced this network. It is described as an extreme version of the Inception module, from which it is derived. The name of the Inception module was inspired by the film Inception (directed by Christopher Nolan), released in 2010. Xception is a 71-layer convolutional neural network, and its accuracy is reported to exceed that of Inception V3. Xception, which stands for “extreme inception,” pushes Inception’s core concepts to their limit. In Inception, 1\(\,\times \,\)1 convolutions were used to compress the original input, and different sorts of filters were applied to each depth space from each of those input spaces. With Xception, the opposite happens: Xception applies the filters to every depth map separately before compressing the input space all at once with a 1\(\,\times \,\)1 convolution. This method is quite similar to a depthwise separable convolution. There is another difference between Inception and Xception: whether or not there is a non-linearity after the first operation. While Xception does not introduce any non-linearity, the Inception model has a ReLU non-linearity that follows both operations.
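The saving from filtering each depth map separately and then mixing channels with a 1\(\,\times \,\)1 convolution can be seen by counting weights. The following sketch compares a standard convolution with a depthwise separable one. The kernel size and channel counts are arbitrary illustrative values, and biases are ignored.

```python
def standard_conv_weights(kernel, in_channels, out_channels):
    """Weights of a regular convolution: every filter spans all input channels."""
    return kernel * kernel * in_channels * out_channels


def separable_conv_weights(kernel, in_channels, out_channels):
    """Weights of a depthwise separable convolution."""
    depthwise = kernel * kernel * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels     # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative sizes, chosen for this example only.
K, C_IN, C_OUT = 3, 128, 256

standard = standard_conv_weights(K, C_IN, C_OUT)
separable = separable_conv_weights(K, C_IN, C_OUT)

print(standard, separable)  # 294912 33920
```

For these assumed sizes the separable form uses fewer than one eighth of the weights of the standard convolution.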

3.6 DenseNet Model

The Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion [35]. The main focus of this model is to go deeper while making the network more efficient to train. A conventional neural network with L layers has L connections, whereas a DenseNet with L layers has L(L+1)/2 direct connections. Image classification is a fundamental and essential computer vision task. VGG has 19 layers, the original LeNet5 had 5, and Residual Networks (ResNet) have crossed the 100-layer threshold. These models can encounter issues including too many parameters, vanishing gradients, and difficult training. In comparison to models like VGG and ResNet, DenseNet exhibits dense connections. Direct connections from any layer to all subsequent layers distinguish the DenseNet model from other CNNs and potentially enhance the information flow between layers. As a result, DenseNet can reduce the number of parameters, improve feature map propagation, and alleviate the vanishing gradient problem.
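The connection count L(L+1)/2 can be checked by enumerating the links directly. In the sketch below, position 0 stands for the block input and positions 1 to L for the layers; every position feeds all later positions. The function name and this indexing convention are assumptions made for the example.

```python
def dense_connections(num_layers):
    """Count direct connections when every layer feeds all later layers.

    Position 0 is the block input; positions 1..num_layers are the layers.
    """
    return sum(1
               for src in range(num_layers)
               for dst in range(src + 1, num_layers + 1))


# The enumeration agrees with the closed form L(L+1)/2.
for L in (1, 5, 12):
    assert dense_connections(L) == L * (L + 1) // 2

print(dense_connections(5))  # 15
```

A plain feed-forward chain of five layers has only five connections, one entering each layer, whereas the dense pattern has fifteen.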

3.7 Inception-Resnet V2 Model

Inception-ResNet V2 is a convolutional neural network trained on more than one million images from the ImageNet database [34]. The 164-layer network can classify images into 1000 different object categories, including pencil, mouse, keyboard, and many animals. Through a comparative investigation and examination of the classification model’s structure, an improved Inception-ResNet-v2 model based on CNN was created to increase the accuracy of convolutional neural networks in image classification [27]. The improved model can extract features under various receptive fields while lowering the number of model parameters. In addition, it includes a channel filtering module, based on a comparison of all available data, that filters and combines channels to achieve efficient feature extraction.

3.8 Transfer Learning

Transfer learning is a technique in which a model that has already been trained on one dataset is reused to learn a new task [32]. Consider a source domain (Ds) with a source task (Ts) and a target domain (Dt) with a target task (Tt). Transfer learning seeks to improve the learner’s performance on Tt by using the knowledge gained from Ds and Ts. Different transfer learning settings are established depending on the type of task and the nature of the data available in the source and target domains. The method is called “inductive transfer learning” when both the source and target domains have labelled data available for a classification task [25]. The domain in this instance is D = (xi, yi), where xi is the feature vector of the ith training sample and yi is its class label. There are 24 million trainable parameters in the 164-layer GoogLeNet. Training and optimizing a deep model of this kind requires a sizable dataset, which is why GoogLeNet was trained on the ImageNet dataset, which has over 1.2 million images organised into 1000 different categories.

As a result, with smaller datasets, such as the skin cancer dataset, overfitting is likely to be a problem for the model. This is where transfer learning plays a role.

We use pretrained weights to create the model and then fine-tune it so that it can complete the task at hand, which in our instance was skin cancer classification. For smaller datasets, such as brain tumour datasets, we do not need to train the model from scratch. Because the GoogLeNet model was designed for a different purpose, some structural changes are needed to classify skin cancer. The last three layers of the GoogLeNet model were modified to fit the intended purpose. The average pooling layer was replaced by a flatten layer, and the fully connected (FC) layer of GoogLeNet, which was designed to categorise 1000 separate classes, was removed. A new FC layer with an output size of four was added. The softmax layer following the FC layer was likewise replaced with a new one.

4 Result and Discussion

This section describes our approach’s experiments and outcomes.

4.1 System Configuration

We implemented the convolutional neural network with TensorFlow, chiefly because training involves a large number of matrix multiplications. CPU-only processing proved to be the main obstacle in our work, so we moved to the Google Colaboratory cloud server. Working there through a Jupyter Notebook, we were able to train and evaluate our proposed deep learning approach.

4.2 Training and Test Datasets

In this experiment we use 2357 colour images of skin cancer, each 521\(\,\times \,\)512 in size. The total number of images in each class is given in the following table:

Table 2. Different parameters

4.3 Transfer Learning Model’s Hyperparameters

Categorical cross-entropy is the loss function we employ to train our model. We train our model for a maximum of 50 epochs because the training and validation accuracies show no further variation beyond that point. The loss function is optimized using the Adam optimizer. Table 3 lists the best-configured hyperparameters of our method. The total number of epochs and the batch size in our test are 50 and 16, respectively.

Table 3. Hyperparameters
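For reference, categorical cross-entropy compares a one-hot class label with the probabilities produced by a softmax output layer. The minimal sketch below computes both in plain Python for a nine-class problem. The logits are made-up values, and the function names are assumptions of this example rather than the framework calls used in the experiments.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


NUM_CLASSES = 9  # nine skin cancer classes in the dataset

# Hypothetical network output that favours class 3.
logits = [0.0] * NUM_CLASSES
logits[3] = 2.0
probs = softmax(logits)

target = [0.0] * NUM_CLASSES
target[3] = 1.0
loss = categorical_crossentropy(target, probs)

# A model that guesses uniformly has loss log(9) for any target class.
uniform_loss = categorical_crossentropy(target, [1.0 / NUM_CLASSES] * NUM_CLASSES)
```

The loss falls below the uniform-guess value as soon as the network assigns more probability to the correct class, which is the quantity the optimizer drives down during training.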

4.4 Performance Metrics

A number of performance metrics are commonly used to evaluate a classification model. The most often used metric is classification accuracy, which is the ratio of correctly classified observations to the total number of observations. Precision, recall (or sensitivity), and specificity, which are all significant measures in classification problems, can be calculated using the following equations. The numbers of true positives, false positives, true negatives, and false negatives are denoted by TP, FP, TN, and FN, respectively. The F-score, a useful summary statistic for classification, is the harmonic mean of precision and recall.

The following equations can be used to determine accuracy, precision, recall, and F-score.

$$\begin{aligned} Accuracy=\frac{TP+TN}{TP+FN+FP+TN} \end{aligned}$$
(2)
$$\begin{aligned} Precision=\frac{TP}{TP+FP} \end{aligned}$$
(3)
$$\begin{aligned} Recall=\frac{TP}{TP+FN} \end{aligned}$$
(4)
$$\begin{aligned} F\text {-}Score=\frac{2 \times Precision \times Recall}{Precision + Recall} \end{aligned}$$
(5)
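Equations 2 to 5 translate directly into code. The sketch below computes the four metrics from the confusion counts of a single class. The counts are hypothetical values chosen for illustration, and the function names are assumptions of this example.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (Eq. 5)."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F-score from confusion counts.

    Assumes the denominators are non-zero, i.e. the class has at least
    one predicted positive and one actual positive.
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # Eq. 2
    precision = tp / (tp + fp)                  # Eq. 3
    recall = tp / (tp + fn)                     # Eq. 4
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score(precision, recall),
    }


# Hypothetical confusion counts for one class (not results from this study).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

For a multi-class problem such as the nine-class task here, these quantities would typically be computed per class and then averaged.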

4.5 Result

We evaluated the final model and obtained a training accuracy of 91.67% with a loss of 1.11%. Similarly, we obtained 89.93% accuracy and 1.99% loss on the testing data. These results are given in Table 4.

Table 4. Accuracy and Loss

Classification accuracy is most informative when the test dataset contains an equal number of observations for each class. Because this is not guaranteed for our dataset, the proposed strategy needs a more in-depth analysis using additional performance metrics. Table 5 shows the precision, recall, and F1-score of our proposed transfer learning method alongside alternative methods such as DenseNet, Xception, and InceptionResNetV2. Precision, recall, and F1-score for our method are 0.785, 0.687, and 0.733, respectively. As seen in the table, our approach outperforms the other three pretrained models.

Table 5. Performance Metrics

Fig. 3. Training and Validation Accuracy of GoogLeNet

Fig. 4. Training and Validation Loss of GoogLeNet

Our CNN-based transfer learning model was trained for up to 50 epochs. We set the number of epochs to 50 because the training and validation accuracies did not increase beyond that point. The accuracy and loss of the model are depicted in Figs. 3 and 4. In Fig. 3, the training accuracy is less than 72.5% and the validation accuracy is less than 80.00% at the first epoch. In Fig. 4, the training loss is more than 1.4 and the validation loss is greater than 1.1 at the first epoch. As the number of epochs increases, accuracy improves and loss decreases.

4.6 Comparison with Existing Work

In [13], the VGG-16 model was applied to identify skin defects. A database of 12,000 photos was used to train and verify the model, which achieved an 88% accuracy rate. In [8], skin diseases were predicted with 77% and 68% accuracy using a convolutional neural network (CNN) and a residual neural network (ResNet), respectively. It was additionally found that convolutional neural networks outperform residual neural networks in diagnosing skin diseases. To increase the accuracy, the authors might need to create a hierarchical classification algorithm utilizing retrieved photos; by utilizing ensemble features and deep learning, predictions may then improve on earlier models. In [10], the system analyses an image, performs feature extraction, the most important step, using a CNN, and applies a softmax classifier to identify diseases. An initial training results in an output accuracy of about 70%.

Table 6. Comparison with Existing Work

5 Conclusion and Future Work

This work discussed the categorization of skin malignancies using transfer learning with GoogLeNet. We categorised skin cancer into nine kinds in our study, which is the most comprehensive categorization of skin cancer to date. We applied data augmentation techniques to the existing dataset because a large amount of data is needed for effective training and deployment of a CNN-based architecture. We were able to obtain the required result with this method. According to the experimental results, the suggested approach greatly outperforms state-of-the-art models, with precision, recall, and F1 scores of 76.16%, 78.15%, and 76.92%, respectively. In addition, the model demonstrates its capability across a variety of performance metrics, including the weighted average and the overall accuracy. More research can be done to analyze and understand the results. In the future we will collect more data for skin cancer detection and will work with other deep learning methods and other approaches [2,3,4,7,8,9,14,15,17,19,20,22,26,36].