1 Introduction

Cancer is a major global health problem. Every sixth death worldwide is due to cancer, making it the second leading cause of death after cardiovascular diseases. Cancer (a malignant tumor) is an abnormal growth of cells that reproduce in an uncontrolled way and spread to other tissues (Roser and Ritchie 2019). Skin cancer is an abnormal growth of skin cells; it typically develops on skin exposed to sunlight. It can affect anyone, but people with light skin are at higher risk. Skin cancer can occur at any time of life, spread to deeper cells of the skin, and damage cellular DNA. The major cause is exposure to ultraviolet (UV) radiation; other causes include ozone layer depletion, chemical exposure, and genetic factors (Yogita et al. 2018; Maiti and Chatterjee 2019).

Deep learning based medical image analysis techniques help dermatologists with skin lesion segmentation, classification, and detection (Indra et al. 2020). Machine learning techniques classify, predict, and detect different skin lesions using annotated dermoscopic images (Kittler et al. 2002; Salerni et al. 2013; Pan and Yang 2009; Tschandl et al. 2018). However, automatic recognition of dermoscopic images is difficult for two reasons. First, lesions vary widely in texture, color, size, location, and shape, which complicates segmentation. Second, surrounding artefacts such as hair, ruler marks, and veins interfere with analysis (Hosny et al. 2018).

In recent years, deep neural networks have been widely used for medical image segmentation. A convolutional neural network (CNN) is a feedforward neural network augmented with operations such as convolution, max-pooling, batch normalization, and dropout; it typically takes a 2D input image and applies 2D filters. Transfer learning is a technique for challenging problems where a large medical dataset is not available: models trained on a large source dataset are adapted by freezing most layers and replacing the last layer with a new softmax layer that classifies the lesion images into seven classes.
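For illustration, these building blocks can be composed in a few lines of Keras code; the layer sizes below are arbitrary and do not correspond to the networks used in this study:

```python
# A minimal Keras sketch of the CNN building blocks named above
# (convolution, max-pooling, batch normalization, dropout).
# Layer sizes are illustrative, not the networks used in this study.
from tensorflow.keras import layers, models

toy_cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(224, 224, 3)),   # 2D filters on a 2D image
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(7, activation="softmax"),      # seven lesion classes
])
```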

With advances in image processing and machine learning, convolutional neural networks can now perform some classification tasks better than humans (Gavrilov 2017; Nagarajan et al. 2020). The IBM Watson system, used at a New York oncology center to support diagnostic and treatment decisions for patients with oncological diseases, is a well-known example of such deep learning based decision-making systems in medicine (Tsvetkova 2017).

We propose a deep learning approach based on transfer learning that classifies skin lesions into seven classes. We selected three architectures, VGG19, VGG16, and Inception V3, pre-trained on ImageNet, which achieve high accuracy. In this study, we first apply pre-processing steps such as normalization, resizing, and data augmentation. Second, deep neural networks are built from the pre-trained networks. Finally, the performance of the three pre-trained architectures is evaluated.

The remainder of the paper is organized as follows: Sect. 2 briefly discusses background work, Sect. 3 describes the dataset, Sect. 4 details the methodology, Sect. 5 presents the experimental results, and Sect. 6 concludes.

2 Related work

Hosny et al. (2018) proposed a pre-trained AlexNet architecture in which the last layer is replaced by a softmax layer with three classes: common nevus, atypical nevus, and melanoma.

Dorj et al. (2018) present a rapid skin cancer classification system that combines a deep convolutional neural network with an ECOC SVM classifier. The collected images contain noise and are cropped to remove it, which improves results. A pre-trained AlexNet model is used for transfer learning, and results are reported for a total of 3753 images covering four kinds of skin cancer. Gavrilov et al. (2019) proposed an Inception V3 architecture in which the last layer is removed and a single neuron is added to classify skin lesions. The model achieves an accuracy of 91%, comparable to diagnoses made by highly qualified dermatologists.

Guo et al. (2018) propose a novel multiple convolutional neural network (MCNN) model that classifies dermoscopic images of different diseases; several models are trained using an additive sample learning strategy. Alheejawi et al. (2019) propose a deep learning algorithm for automatic measurement of proliferation index (PI) values in Ki-67 stained biopsy images: a trained convolutional neural network (CNN) segments the nuclei and measures the PI, with manually segmented nuclei of Ki-67 images used to train the model. Ech-Cherif et al. (2019a, b) propose binary classification of skin lesions into benign or malignant using a MobileNetV2 model trained with transfer learning; a mobile application for iOS devices was developed using the Core ML library, and the model achieves high accuracy.

Najat et al. (2019) proposed an improved U-Net architecture of 46 layers split into two parts, a contraction path and an expansion path, containing convolutions, activation functions, down-sampling, and up-sampling operators; it predicts whether a given pixel belongs to the skin lesion. Vesal et al. (2018) proposed SkinNet, a modified U-Net with dilated and dense-block convolutions that incorporates multi-scale and global context information during training.

Zhang et al. (2019) developed a fully automatic framework coupling a deep fully convolutional network (FCN) with a shallow network of textons derived from domain-specific filter kernels. The shallow network injects clinical prior knowledge (textures) into the deep FCN, and a fusion strategy combines the domain-specific hand-crafted texton features with the deep network. Abbes and Sellami (2019) developed a deep neural network based on fuzzy modelling of Bag-of-Words (BoW), with features extracted by the ABCD rule; membership degrees for each lesion BoW are determined by Fuzzy C-Means (FCM), and the model achieves an accuracy of 87.5%. Kavimathi (2016) developed a model evaluated with several classifiers, including an ensemble classifier, a support vector machine, a probabilistic neural network, and an adaptive neuro-fuzzy inference system, to identify the best-performing classifier.

Akram et al. (2020) used pre-trained models to generate fused feature vectors and developed a framework for feature selection and dimensionality reduction that selects principal components and removes redundant and irrelevant data; the framework achieves accuracies of 98.8%, 99.2%, 97.1%, and 95.9%. Thao and Quang (2017) developed an architecture to segment skin tumors from the surrounding skin and used a pre-trained VGG-16 network for lesion classification. Ahmed et al. (2019a) propose a pre-trained MobileNetV2 architecture to classify skin lesions as benign or malignant; a mobile application for iOS devices was built on the trained model, which achieves an accuracy of 91.33%. Mahbod et al. (2019) used three pre-trained networks, VGG16, AlexNet, and ResNet-18, to extract features for training support vector machine classifiers, whose outputs were then combined to classify the skin lesion.

3 Dataset

Our pre-trained architectures are trained on a dataset taken from the International Skin Imaging Collaboration as part of ISIC 2018 (ISIC Archive), consisting of seven types of skin lesion. The models classify an input image into one of the following categories (Fig. 1): Melanoma (MEL), 271 images; Melanocytic Nevus (NV), 2061 images; Basal Cell Carcinoma (BCC), 151 images; Benign Keratosis (BKL), 345 images; Dermatofibroma (DF), 36 images; Actinic Keratosis (AKIEC), 113 images; Vascular Lesion (VASC), 45 images.

Fig. 1 Sample images of the ISIC dataset

3.1 Pre-processing

The images in our dataset are resized to 224 × 224 pixels. We apply standard pre-processing steps: normalization, resizing, and data augmentation. The images are normalized to match the inputs the pre-trained networks were originally trained with. The modified pre-trained architectures are trained on 2418 training images spanning the seven types of skin lesion, which are used to distinguish different types of skin cancer. To enlarge the dataset, data augmentation techniques such as shifting, zooming, compression, flipping, and altering the lighting are used; augmentation exposes the network to new training images and improves model accuracy. After augmentation we obtain 2487 images.
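A sketch of these pre-processing steps using the Keras ImageDataGenerator is shown below; the directory path and the exact augmentation parameter values are illustrative assumptions:

```python
# Hedged sketch of the pre-processing described above: resizing to
# 224 x 224, normalization, and augmentation by shift, zoom, flip and
# lighting changes. Path and parameter values are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rescale=1.0 / 255,             # normalization to [0, 1]
    width_shift_range=0.1,         # horizontal shift
    height_shift_range=0.1,        # vertical shift
    zoom_range=0.2,                # zoom (compression/stretch)
    horizontal_flip=True,          # flip
    brightness_range=(0.8, 1.2),   # altering lighting
)
train_gen = augmenter.flow_from_directory(
    "data/train",                  # hypothetical class-per-folder layout
    target_size=(224, 224),        # resizing
    batch_size=32,
    class_mode="categorical",
)
```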

4 Methodology

The convolutional neural network is among the most effective deep learning architectures for image classification and feature extraction (LeCun et al. 2015). Dermoscopic images may contain noise (Nirmalraj and Nagarajan 2020, 2019), aberrations, and artefacts, in addition to variation within the same lesion type and visual similarity between different lesion types. To overcome these problems, a large dataset should be used for training (Yang et al. 2018; Greenspan et al. 2016).

CNN performance can be improved by using pre-trained architectures such as VGG16, VGG19, ResNet50, MobileNet, and Inception V3. In the proposed network, VGG19, VGG16, and Inception V3 are used, as they give better performance.

Because the dataset is too small to train a model from scratch, pre-trained architectures are used for skin lesion classification. Transfer learning retrains only the last layer of an existing architecture, which reduces training time on a limited dataset. The Adam optimizer is used with an initial learning rate of 4e-5. Among the most popular pre-trained models are Inception V3, VGG16, and VGG19, which give excellent performance in skin lesion classification.
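A minimal sketch of this transfer-learning setup, assuming the TensorFlow/Keras API (VGG16 is shown; the other backbones are swapped in the same way):

```python
# Hedged sketch: freeze a pre-trained ImageNet backbone, replace the
# classifier with a new 7-class softmax, and compile with Adam at the
# stated initial learning rate of 4e-5.
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16

backbone = VGG16(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
backbone.trainable = False                     # freeze pre-trained layers

model = models.Sequential([
    backbone,
    layers.Flatten(),
    layers.Dense(7, activation="softmax"),     # new softmax layer
])
model.compile(optimizer=optimizers.Adam(learning_rate=4e-5),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```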

4.1 Inception V3 CNN model

The Inception V3 network (Fig. 2) is an image classification model developed for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). It consists of 42 layers and includes symmetric and asymmetric building blocks comprising convolution layers, average and max pooling, dropout, and fully connected layers. In the modified pre-trained Inception V3 network, the last layer is removed and replaced with a softmax layer that classifies the seven classes of skin lesion.
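A hedged sketch of this modification follows; the global average pooling layer is one possible bridge between the convolutional base and the new head, since only the softmax replacement is specified:

```python
# Sketch of the modified Inception V3: drop the original 1000-class top
# (include_top=False) and attach a 7-class softmax. The pooling layer is
# an assumption, not stated in the architecture description.
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import InceptionV3

base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(224, 224, 3))
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(7, activation="softmax")(x)  # seven lesion classes
inception_model = Model(inputs=base.input, outputs=outputs)
```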

Fig. 2 Architecture of Inception V3 (Szegedy et al. 2015)

4.2 VGG19 CNN model

The VGG19 network (Fig. 3) consists of 19 weight layers, sixteen convolution layers and three fully connected layers, along with five max-pooling layers and a softmax layer. It takes a 224 × 224 input image. The network uses 3 × 3 kernels with stride 1 and spatial padding, and max pooling is performed with stride 2. Of the three fully connected layers, the last is removed and replaced with a softmax layer that classifies the seven classes of skin lesion.
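Because VGG19 and VGG16 (Sect. 4.3) differ only in depth, a single hedged sketch covers both; the helper name build_vgg is our own, for illustration:

```python
# Hedged sketch covering both VGG variants: load the chosen ImageNet
# backbone, freeze it, and replace the final layer with a 7-class softmax.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16, VGG19

def build_vgg(name="vgg19", num_classes=7):
    base_cls = VGG19 if name == "vgg19" else VGG16
    base = base_cls(weights="imagenet", include_top=False,
                    input_shape=(224, 224, 3))  # 224 x 224 input
    base.trainable = False                      # freeze pre-trained layers
    return models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])

vgg19_model = build_vgg("vgg19")
```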

Fig. 3 VGG19 framework

4.3 VGG16 CNN model

VGG16 (Fig. 4) is a simpler architecture whose convolution layers use 3 × 3 filters with stride 1, and whose 2 × 2 pooling layers use SAME padding with stride 2. As with VGG19, the last fully connected layer is removed and replaced with a softmax layer that classifies the seven classes of skin lesion.
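Using the illustrative helper sketched in Sect. 4.2, the VGG16 variant is obtained in the same way:

```python
# VGG16 variant via the build_vgg helper sketched in Sect. 4.2.
vgg16_model = build_vgg("vgg16")
vgg16_model.summary()
```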

Fig. 4 VGG16 framework

5 Results

The modified architectures used in our work are the Inception V3, VGG16, and VGG19 networks, which extract features to classify skin lesions into seven classes. Deep learning models need large datasets to obtain good results. Our dataset consists of 2487 training images and 604 testing images, photographic images of various dimensions, split 80%/20% between training and testing. The performance of the three pre-trained architectures is evaluated on the skin lesion dataset from the ISIC archive (Figs. 5, 6).
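One way to realize this split with Keras generators is sketched below; the directory path and class-per-folder layout are assumptions:

```python
# Sketch of the 80/20 train/test split using Keras' validation_split.
# The directory path and layout are assumptions for illustration.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

splitter = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = splitter.flow_from_directory(
    "data/isic2018", target_size=(224, 224), batch_size=32,
    class_mode="categorical", subset="training")    # ~80% for training
test_gen = splitter.flow_from_directory(
    "data/isic2018", target_size=(224, 224), batch_size=32,
    class_mode="categorical", subset="validation")  # ~20% for testing
```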

Fig. 5 Classification rates for the model

Fig. 6 Sample visualized output of the model

5.1 Classification results

5.1.1 Inception V3

The model is trained on the 2487 training images for 60 epochs with a batch size of 32. Figure 7 shows the accuracy and loss values produced by the Inception V3 pre-trained architecture. The graph shows training and test accuracy gradually increasing while loss decreases for both. The model reaches a best training accuracy of 82% and a test accuracy of 74%.
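A hedged sketch of this training run is shown below (the same pattern applies to the VGG models, with 50 epochs for VGG16); the model and generators are those sketched in earlier sections:

```python
# Sketch of the reported run: 60 epochs, batch size 32 (set in the
# generators above). inception_model is the network from Sect. 4.1.
from tensorflow.keras import optimizers

inception_model.compile(optimizer=optimizers.Adam(learning_rate=4e-5),
                        loss="categorical_crossentropy",
                        metrics=["accuracy"])
history = inception_model.fit(train_gen, validation_data=test_gen,
                              epochs=60)
# history.history holds per-epoch accuracy/loss for plots such as Fig. 7.
```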

Fig. 7 Accuracy and loss graph of the modified Inception V3 network

5.1.2 VGG19

The model is trained on the 2487 training images for 60 epochs with a batch size of 32. Figure 8 shows the accuracy and loss values of the VGG19 pre-trained architecture. The graph shows training and test accuracy gradually increasing while loss decreases for both. The model reaches a best training accuracy of 86% and a test accuracy of 76%.

Fig. 8 Accuracy and loss graph of the modified VGG19 network

5.1.3 VGG16

The model is trained on the 2487 training images for 50 epochs with a batch size of 32. Figure 9 shows the accuracy and loss values of the VGG16 pre-trained architecture. The graph shows training and test accuracy gradually increasing while loss decreases for both. The model reaches a best training accuracy of 92% and a test accuracy of 77%.

Fig. 9 Accuracy and loss graph of the modified VGG16 network

5.2 Recall, precision and F-1 score

The results of the models are analyzed by computing recall, precision, and F1-score.

$$ Precision = \frac{\text{true positive}}{\text{true positive} + \text{false positive}} $$
(1)
$$ Recall = \frac{\text{true positive}}{\text{true positive} + \text{false negative}} $$
(2)
$$ F\text{-}measure = 2 \times \frac{precision \times recall}{precision + recall} $$
(3)
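These metrics can be computed, for example, with scikit-learn; macro averaging over the seven classes is one common choice and is assumed here:

```python
# Precision, recall and F-measure as in Eqs. (1)-(3), via scikit-learn.
# The label arrays are placeholders; macro averaging is an assumption.
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array([0, 1, 2, 5, 5, 4])  # placeholder ground-truth labels
y_pred = np.array([0, 1, 2, 5, 4, 4])  # placeholder model predictions

p = precision_score(y_true, y_pred, average="macro", zero_division=0)
r = recall_score(y_true, y_pred, average="macro", zero_division=0)
f = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```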

5.3 Confusion matrix

The performance of the models is also evaluated using confusion matrices, shown for the three models in Figs. 10, 11 and 12. In these figures, Actinic Keratosis = 0; Basal Cell Carcinoma = 1; Benign Keratosis = 2; Dermatofibroma = 3; Melanoma = 4; Melanocytic Nevus = 5; Vascular Lesion = 6.
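With this label mapping, a confusion matrix can be computed as follows (reusing the placeholder label arrays from Sect. 5.2):

```python
# Confusion matrix with the class ordering of Figs. 10-12
# (0 = Actinic Keratosis, ..., 6 = Vascular Lesion).
from sklearn.metrics import confusion_matrix

class_names = ["AKIEC", "BCC", "BKL", "DF", "MEL", "NV", "VASC"]
cm = confusion_matrix(y_true, y_pred, labels=list(range(7)))
for name, row in zip(class_names, cm):
    print(f"{name:>5}", row)
```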

Fig. 10 Confusion matrix for the Inception V3 architecture

Fig. 11 Confusion matrix for the VGG16 architecture

Fig. 12 Confusion matrix for the VGG19 architecture

6 Conclusion

In the proposed deep learning approach we used three pre-trained models to classify seven classes of skin lesion, obtaining good accuracy on our dataset. On the test data, comparing the VGG19, VGG16, and Inception V3 pre-trained architectures, VGG16 gives the best accuracy of 77%, VGG19 gives 76%, and Inception V3 gives 74%. Since pre-trained deep learning architectures achieve good accuracy in medical image processing, the proposed work can be used to classify different types of skin lesions.