1 Introduction

Human skin is the largest organ of the body and is exposed to many factors, including ultraviolet radiation, sunburn, lifestyle, smoking, alcohol consumption, physical activity, viruses, and the working environment. These factors compromise its integrity and can have serious, even life-threatening, effects on human health. Diseases that directly affect the skin are among the most common human illnesses, affecting nearly a third of the world's population, around 1.9 billion people at any one time, which motivates research in this discipline. Skin diseases account for roughly 1.79% of the global burden of disease, measured in disability-adjusted life years. In Great Britain, 60% of the population experiences a skin disease at some point in their lives. Skin conditions can be malignant, inflammatory, or infectious and affect people of all ages, particularly older people and young children. Skin disorders have many consequences, for example death (in the case of melanoma), loss of relationships, impact on daily activity, and damage to internal organs. They also pose a psychological threat, leading to loneliness, depression, and even reckless behaviour. Dermal diseases should be treated at an early stage to reduce the associated consequences, costs, mortality, and morbidity rates. According to Dr Macrene Alexiades-Armenakas, cancer and eczema are among the five most common skin conditions [1]. Therefore, our main emphasis is on building a system that can automatically analyse and classify these types of diseases.

Visual assessment with the naked eye during a skin disease check-up makes it difficult to distinguish skin lesions from normal tissue, which can produce a misdiagnosis. Dermoscopy is a more reliable imaging technique for skin lesions. The basic idea of dermoscopy is to obtain a magnified high-resolution image while removing reflections from the skin's surface. Compared with visual assessment, dermoscopy imaging improves the overall perception of skin lesions as well as sensitivity (correct identification of disease) and specificity (correct identification of the absence of disease); nevertheless, further improvement is needed for analysing skin disease. Manual examination of dermoscopy images is frequently time-consuming, error-prone, complex, and subjective (i.e., it may produce different diagnostic outcomes). Hence, an automated and consistent computer-aided diagnostic (CAD) framework to recognize skin cancer has become an important evaluation tool that offers dermatologists a second opinion to support their decisions [2].

There are numerous studies in the literature on diagnosing skin disease using deep learning and machine learning strategies. In recent times, complex medical problems have been handled with deep convolutional neural networks (CNNs) [3,4,5,6], particularly dermoscopic image analysis [7,8,9,10,11] for melanoma recognition. Several well-established, deep learning based CNN classifiers exist, for example AlexNet [12], VGGNet [13], GoogLeNet [14], ResNet [15], and DenseNet [16]. These CNNs have been further extended to identify the five challenging tasks more accurately [17, 18]. One line of research uses a two-stage model with a deep residual network to segment and classify skin lesions [11]. Another approach passes a segmented mask of skin lesions, together with descriptors of clinical criteria such as colour, texture, and morphological characteristics, to the CNN through U-Net [19]. Later, a two-step model for the classification of skin lesions was introduced [20], in which colour, texture, and shape features were extracted from images through U-Net and fed to a Support Vector Machine (SVM) classifier to label dermoscopy images as benign or malignant. A hybrid system for classifying skin lesions has also been introduced, combining a CNN and an SVM with sparse coding (SC) [21]. In recent years, researchers have also built a deep learning network based on a hybrid approach that encodes local descriptors [11]; deep and statistical features from ResNet were fed to an SVM classifier with a chi-square kernel to recognize the different skin lesions. The combined use of handcrafted features obtained through traditional image processing and learned features from ResNet-50 was proposed to improve classification [8]. Most recently, the authors of [9] detect seven classes of cancer using a support vector machine with an artificial bee colony method, also utilizing RNA sequencing.

Building on this literature, we classify seven different skin diseases, namely Actinic keratoses and intraepithelial carcinoma, Basal cell carcinoma, Benign keratosis, Dermatofibroma, Melanoma, Melanocytic nevi, and Vascular lesions, using fourteen types of deep learning networks. The objective of this paper is to evaluate the performance of these networks using standard performance measures. The networks are trained, validated, and tested on the ISIC 2018 dataset using Matlab.

The contributions of this study are as follows:

  1. We applied transfer learning by replacing the final layers of each pre-trained network with a new fully connected layer, a softmax layer, and a cross-entropy (classification) layer. Together, these three layers classify skin lesions into seven classes.

  2. To improve classification performance, we added pre-processing steps consisting of data normalization, resizing, and augmentation.

  3. Fourteen types of deep learning networks are compared to build an automated classification model.

The remainder of this paper is organized as follows. Section 2 presents the methodology, with a description of the dataset and details of the fourteen deep learning models. Section 3 presents the experiments, implementation, and results. Section 4 discusses the performance of the deep learning networks for the classification of multiple skin lesions. Finally, the paper ends with a conclusion in Sect. 5.

2 Methodology

This section briefly describes the dataset, its pre-processing, and the different deep convolutional neural networks. All networks are compared using the same number of epochs, learning rate, and batch size.

2.1 Dataset

This study uses the ISIC 2018 dataset of dermoscopy images [22, 23], which contains pigmented lesions from different populations and consists of 10,015 images in total. The images are scaled to 224 × 224 × 3 or 227 × 227 × 3, as required by each network. The diagnostic classes in this dataset are as follows [22, 23]:

  1. Actinic keratoses and intraepithelial carcinoma, abbreviated as ‘akiec’.

  2. Basal cell carcinoma, abbreviated as ‘bcc’.

  3. Benign keratosis, abbreviated as ‘bkl’.

  4. Dermatofibroma, abbreviated as ‘df’.

  5. Melanoma, abbreviated as ‘mel’.

  6. Melanocytic nevi, abbreviated as ‘nv’.

  7. Vascular lesions, abbreviated as ‘vasc’.

Figures 1, 2, 3, 4, 5, 6 and 7 show example images for the akiec, bcc, bkl, df, mel, nv, and vasc classes, respectively. All examples are taken from [22, 23].

Fig. 1 The dataset for akiec [22, 23]

Fig. 2 The dataset for bcc [22, 23]

Fig. 3 The dataset for bkl [22, 23]

Fig. 4 The dataset for df [22, 23]

Fig. 5 The dataset for mel [22, 23]

Fig. 6 The dataset for nv [22, 23]

Fig. 7 The dataset for vasc [22, 23]

The number of images in each class is given in Table 1.

Table 1 Number of images per class in the dataset

As shown in Table 1, most of the images belong to the sixth class (nv), and the distribution across classes is not balanced. Consequently, the training phase is biased towards the sixth class, and the network tends to learn this class better than the others.

2.2 Data pre-processing

In our approach, the pre-processing steps are kept minimal to ensure better generalizability when testing on the dermoscopic skin lesion dataset. Only the three standard pre-processing steps commonly used for transfer learning are applied. First, the images are normalized by subtracting the mean RGB values of the dataset. Next, the images are resized using bicubic interpolation to the network input size (227 × 227 or 224 × 224). Finally, the training set is augmented by randomly flipping the training images along the vertical axis and translating them by up to 30 pixels horizontally and vertically. Data augmentation keeps the network from overfitting and memorizing the exact details of the training images. The block diagram of the classification model is shown in Fig. 8.

Fig. 8 Block diagram of the classification model
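
The following MATLAB sketch illustrates how these three steps could be set up with the Deep Learning Toolbox. The folder name 'ISIC2018' and the 227 × 227 input size are assumptions for illustration; the mean-subtraction step is handled by the zero-center normalization built into the pretrained networks' input layers.

```matlab
% Hypothetical folder layout: one subfolder per class under 'ISIC2018'.
imds = imageDatastore('ISIC2018', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

% Augmentation: random flips about the vertical axis and translations
% of up to 30 pixels horizontally and vertically.
augmenter = imageDataAugmenter( ...
    'RandXReflection', true, ...
    'RandXTranslation', [-30 30], ...
    'RandYTranslation', [-30 30]);

% Resize to the network input size (227 x 227 here, e.g. for AlexNet) and
% attach the augmentation; mean subtraction is performed by the 'zerocenter'
% normalization of the pretrained network's image input layer.
augTrain = augmentedImageDatastore([227 227 3], imds, ...
    'DataAugmentation', augmenter);
```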

2.3 AlexNet

This network contains 8 learned layers (5 convolutional and 3 fully connected layers) [24, 25]. A non-linear activation function (ReLU) is applied after every convolutional and fully connected layer. Because of the fully connected layers, the input size is fixed to RGB images of 224 × 224 × 3, i.e., 150,528 values. The first two fully connected layers each contain 4096 neurons, with each neuron encoding a different feature of the image. Dropout is applied in the fully connected layers to avoid overfitting.
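
As a concrete illustration of the layer replacement applied to all the pretrained networks in this study, the sketch below swaps the last three layers of MATLAB's pretrained AlexNet for a new fully connected layer, softmax layer, and classification (cross-entropy) layer for the seven lesion classes; the layer names are illustrative.

```matlab
net = alexnet;                                 % pretrained AlexNet (Deep Learning Toolbox)
layersTransfer = net.Layers(1:end-3);          % keep all layers except the last three
layers = [
    layersTransfer
    fullyConnectedLayer(7, 'Name', 'fc_skin')    % new FC layer: 7 skin lesion classes
    softmaxLayer('Name', 'softmax_skin')
    classificationLayer('Name', 'output_skin')]; % cross-entropy classification layer
```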

2.4 GoogleNet

The GoogleNet architecture is 22 layers deep and contains 9 Inception modules [26]. Each Inception module uses three kernel sizes for convolution, 5 × 5, 3 × 3 and 1 × 1, together with 3 × 3 filters for pooling. The 1 × 1 convolutions reduce the depth of the feature maps, which in turn reduces the number of computations across the many Inception modules. The receptive field of the network is 224 × 224 in the RGB colour space with mean subtraction. Like other CNNs, this network learns its convolutional filters through stochastic gradient descent (SGD) during the training stage, extracting increasingly abstract features as the image passes through the hierarchical structure of the network.
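
Because GoogleNet is a directed acyclic graph rather than a simple layer stack, the same layer replacement is performed on a layer graph. The sketch below assumes the layer names used by MATLAB's pretrained googlenet ('loss3-classifier' and 'output'), which can be confirmed with analyzeNetwork.

```matlab
net = googlenet;
lgraph = layerGraph(net);
lgraph = replaceLayer(lgraph, 'loss3-classifier', ...
    fullyConnectedLayer(7, 'Name', 'fc_skin'));      % 7 lesion classes
lgraph = replaceLayer(lgraph, 'output', ...
    classificationLayer('Name', 'output_skin'));     % new cross-entropy layer
```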

2.5 ResNet50

This network consists of an input stem, followed by four stages and an output layer [27]. It has 50 layers and a fixed input image size of 224 × 224. The stem applies a 7 × 7 convolution producing 64 channels with a stride of 2, followed by a 3 × 3 max pooling layer with a stride of 2; it therefore reduces the width and height of the input by a factor of 4 and increases the number of channels to 64. Each stage starts with a downsampling block followed by several residual blocks. The downsampling block consists of path A and path B. Path A consists of three convolutions with kernel sizes of 1 × 1, 3 × 3, and 1 × 1. Path B uses a 1 × 1 convolution with a stride of 2 to project the input to the same shape as the output of path A. Finally, the outputs of the two paths are summed to form the output of the downsampling block.
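
One such downsampling block could be sketched as follows; the input size and the 128/128/512 filter counts roughly follow the second stage of ResNet50, and the layer names are illustrative.

```matlab
% Path A: 1x1 -> 3x3 -> 1x1 bottleneck (the first 1x1 carries the stride of 2).
pathA = [
    convolution2dLayer(1, 128, 'Stride', 2, 'Name', 'a_conv1')
    batchNormalizationLayer('Name', 'a_bn1')
    reluLayer('Name', 'a_relu1')
    convolution2dLayer(3, 128, 'Padding', 'same', 'Name', 'a_conv2')
    batchNormalizationLayer('Name', 'a_bn2')
    reluLayer('Name', 'a_relu2')
    convolution2dLayer(1, 512, 'Name', 'a_conv3')
    batchNormalizationLayer('Name', 'a_bn3')];

% Path B: strided 1x1 projection so both paths have matching shapes.
pathB = [
    convolution2dLayer(1, 512, 'Stride', 2, 'Name', 'b_conv')
    batchNormalizationLayer('Name', 'b_bn')];

% Sum the two paths to form the downsampling block output.
lgraph = layerGraph([
    imageInputLayer([56 56 256], 'Normalization', 'none', 'Name', 'in')
    pathA
    additionLayer(2, 'Name', 'add')
    reluLayer('Name', 'out_relu')]);
lgraph = addLayers(lgraph, pathB);
lgraph = connectLayers(lgraph, 'in', 'b_conv');
lgraph = connectLayers(lgraph, 'b_bn', 'add/in2');
```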

2.6 VGG16/VGG19

The VGG-16 architecture is suitable for GPUs with sufficient local memory [28]. It consists of 13 convolutional layers, 5 max pooling layers, and 3 fully connected layers, which sums to 21 layers but only 16 weight layers. The input layer expects a 224 × 224 × 3 RGB image. All convolutional layers use 3 × 3 filters and are followed by the ReLU function. The VGG19 network [29] contains 19 trained layers, combining convolutional and fully connected layers with max pooling and dropout layers.

2.7 ResNet101

The ResNet101 network is built from residual connections and is generally used for classification. The residual connections let gradients flow directly through the layers and prevent them from vanishing after repeated applications of the chain rule [30]. A total of 104 convolutional layers are present in the network, arranged in 33 blocks; 29 of these blocks use the output of a previous block directly through a residual connection. These residual connections are applied at the end of each block by summing the block input with the block output.

2.8 InceptionV3

This network is based on Inception modules and is 48 layers deep [31]. The Inception blocks contain parallel convolutions with diverse kernel sizes to extract features and finally aggregate the results. The input image first passes through a sequence of convolution, batch normalization, and ReLU layers, followed by pooling and several Inception layers for feature extraction. Finally, in the classification part, a dropout layer is applied to reduce overfitting, followed by softmax and cross-entropy output layers.

2.9 InceptionResNetV2

This network [32] is 164 layers deep and uses Inception blocks with residual connections. The input image size accepted by the network is 299 × 299. The network consists of a stem module followed by Inception-ResNet-A, Reduction-A, Inception-ResNet-B, Reduction-B, and Inception-ResNet-C modules. The stem module consists of a series of 3 × 3 convolutions, a 3 × 3 max pooling layer, and filter concatenation, providing the input to the Inception-ResNet-A block. The ReLU activation function is applied in each Inception block, and the reduction blocks contain 1 × 1 and 3 × 3 convolutions with a pooling layer. Inception-ResNet uses batch normalization only on top of the traditional layers, but not on top of the summations.

2.10 SqueezeNet

This network [33] starts with a standalone convolution layer (conv1), followed by 8 Fire modules (fire2–fire9), and ends with a final convolution layer (conv10). The number of filters per Fire module increases gradually from the beginning to the end of the network. A Fire module is composed of a squeeze convolution layer with only 1 × 1 filters, feeding into an expand layer with a combination of 1 × 1 and 3 × 3 convolution filters. Max pooling is applied after the conv1, fire4, fire8, and conv10 layers.
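
A single Fire module could be sketched as follows; the filter counts (16 squeeze filters, 64 + 64 expand filters) correspond to fire2 in the SqueezeNet paper, while the input size and layer names are illustrative.

```matlab
% Squeeze layer: 1x1 convolutions only.
squeezeLayers = [
    convolution2dLayer(1, 16, 'Name', 'squeeze1x1')
    reluLayer('Name', 'relu_squeeze')];

% Expand layer: parallel 1x1 and 3x3 convolutions whose outputs are concatenated.
expand1x1 = [
    convolution2dLayer(1, 64, 'Name', 'expand1x1')
    reluLayer('Name', 'relu_expand1x1')];
expand3x3 = [
    convolution2dLayer(3, 64, 'Padding', 'same', 'Name', 'expand3x3')
    reluLayer('Name', 'relu_expand3x3')];

lgraph = layerGraph([
    imageInputLayer([55 55 96], 'Normalization', 'none', 'Name', 'in')
    squeezeLayers
    expand1x1
    depthConcatenationLayer(2, 'Name', 'concat')]);
lgraph = addLayers(lgraph, expand3x3);
lgraph = connectLayers(lgraph, 'relu_squeeze', 'expand3x3');
lgraph = connectLayers(lgraph, 'relu_expand3x3', 'concat/in2');
```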

2.11 DenseNet201

In this network (DenseNet) [34, 35], the convolutional layers are connected directly to each other. Each layer receives the feature maps of the preceding layers as input, and the feature maps it generates are passed as input to the subsequent layers. The network consists of four dense blocks whose feature maps have sizes of 56 × 56, 28 × 28, 14 × 14, and 7 × 7. To improve computational efficiency, the number of input feature maps is reduced by a 1 × 1 convolution before each 3 × 3 convolution, known as the bottleneck layer.
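
One layer of a dense block, including its bottleneck, could be sketched as below; the growth rate of 32 feature maps and the 4 × 32 bottleneck width follow the DenseNet paper, while the input size and layer names are illustrative.

```matlab
% BN-ReLU-1x1 bottleneck followed by BN-ReLU-3x3 convolution.
denseLayer = [
    batchNormalizationLayer('Name', 'bn1')
    reluLayer('Name', 'relu1')
    convolution2dLayer(1, 128, 'Name', 'conv1x1')                      % bottleneck: 4 x growth rate
    batchNormalizationLayer('Name', 'bn2')
    reluLayer('Name', 'relu2')
    convolution2dLayer(3, 32, 'Padding', 'same', 'Name', 'conv3x3')];  % growth rate = 32

% Dense connection: the 32 new feature maps are concatenated with the layer's input.
lgraph = layerGraph([
    imageInputLayer([56 56 64], 'Normalization', 'none', 'Name', 'in')
    denseLayer
    depthConcatenationLayer(2, 'Name', 'concat')]);
lgraph = connectLayers(lgraph, 'in', 'concat/in2');
```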

2.12 ResNet18

This network starts with a 7 × 7 convolution and 3 × 3 max pooling, followed by four stages with four convolutional layers each [36]. There are 18 layers in total, including the first convolution layer and the last fully connected layer. The image input size is 224 × 224. The first two layers are similar to those of GoogLeNet: a 7 × 7 convolutional layer with 64 output channels and a stride of 2, followed by a 3 × 3 max pooling layer with a stride of 2 and a ReLU function. The difference is that a batch normalization layer is added after every convolutional layer in the network. The network consists of residual blocks built from 3 × 3 convolutional layers with the same number of output channels.

2.13 MobileNetV2

The MobileNetV2 model [37] is based on depthwise separable convolutions, which factorize a standard convolution into a depthwise (dw) convolution and a pointwise (pw) convolution. In the depthwise step, a single filter is applied to each input channel; in the pointwise step, a 1 × 1 convolution combines the outputs of the depthwise convolution. A standard convolution filters and combines the inputs in one step, whereas the separable convolution splits this into two layers, one for filtering and one for combining. This factorization significantly reduces computation and model size.
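
A depthwise separable convolution could be sketched as below, assuming 3 × 3 depthwise filters and 64 pointwise output channels; the numbers and layer names are illustrative, and MobileNetV2 additionally wraps such convolutions in inverted residual blocks with linear bottlenecks.

```matlab
dsConv = [
    % Depthwise step: one 3x3 filter per input channel ('channel-wise' groups).
    groupedConvolution2dLayer(3, 1, 'channel-wise', 'Padding', 'same', 'Name', 'conv_dw')
    batchNormalizationLayer('Name', 'bn_dw')
    reluLayer('Name', 'relu_dw')
    % Pointwise step: 1x1 convolution combines the filtered channels.
    convolution2dLayer(1, 64, 'Name', 'conv_pw')
    batchNormalizationLayer('Name', 'bn_pw')
    reluLayer('Name', 'relu_pw')];
```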

2.14 ShuffleNet

ShuffleNet [38] utilizes pointwise group convolution and channel shuffle to reduce computation cost while maintaining accuracy. By shuffling the channels, ShuffleNet outperforms MobileNetV1. The network contains a stack of network units assembled in three stages. It has two unit types: a bottleneck unit based on depthwise convolution and a ShuffleNet unit based on pointwise group convolution.
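
The channel shuffle operation itself is a simple reshape–permute–reshape, as in the sketch below; the function name channelShuffle is illustrative, and it assumes the channel count C is divisible by the number of groups g.

```matlab
function Y = channelShuffle(X, g)
% X: H-by-W-by-C activation whose channels are arranged as g groups of C/g.
% The shuffle interleaves channels across groups so that the next grouped
% convolution sees information from every group.
    [H, W, C] = size(X);
    n = C / g;                        % channels per group
    Y = reshape(X, H, W, n, g);       % split the channel dimension into (n, g)
    Y = permute(Y, [1 2 4 3]);        % swap group and within-group dimensions
    Y = reshape(Y, H, W, C);          % flatten back to C interleaved channels
end
```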

2.15 NasNetMobile

NASNet [39] is an architecture composed of basic building blocks, i.e., cells, whose structure is optimized through reinforcement learning. The network has an image input size of 224 × 224. A cell is composed of convolution and pooling operations, and the capacity of the network depends on how many times these cells are repeated. The mobile variant contains 12 cells and requires millions of multiply-accumulate operations (MACs). There are two types of cell: the Normal cell, which returns a feature map of the same dimensions, and the Reduction cell, which returns a feature map whose height and width are reduced by a factor of two. The operations applied within the cells include identity, convolution (1 × 7, 7 × 1), average pooling (3 × 3), max pooling (5 × 5), depthwise separable convolution (3 × 3, 5 × 5, 7 × 7), and dilated convolution (3 × 3).

3 Experiments and results

The experiments are performed on a Dell-compatible computer equipped with a Core i7 processor. The ISIC 2018 dataset, defined in RGB space, is used. The dataset is divided into seven classes: Actinic keratoses and intraepithelial carcinoma, Basal cell carcinoma, Benign keratosis, Dermatofibroma, Melanoma, Melanocytic nevi, and Vascular lesions. Fourteen types of deep convolutional networks are used, in which the last layers are replaced with a new fully connected layer, softmax layer, and cross-entropy layer for classification. The experiments are performed for all networks using the 10,015 dermoscopic images.

In each experiment, the batch size, number of epochs, and initial learning rate were 64, 10, and 0.0001, respectively; these values are fixed for all experiments. The images are randomly divided: 70% are used for training and 30% for validation and testing. All images are pre-processed using resizing, normalization, and augmentation. The networks are evaluated using the metrics described in Eqs. 1, 2, 3 and 4:

$${\text{Accuracy}} = \frac{{{\text{tp}} + {\text{tn}}}}{{{\text{tp}} + {\text{fp}} + {\text{fn}} + {\text{tn}}}}$$
(1)
$${\text{Recall}} = \frac{{{\text{tp}}}}{{{\text{tp}} + {\text{fn}}}}$$
(2)
$${\text{Precision}} = \frac{{{\text{tp}}}}{{{\text{tp}} + {\text{fp}}}}$$
(3)
$${\text{F1 score}} = \frac{{2 \times {\text{Recall}} \times {\text{Precision}}}}{{{\text{Recall}} + {\text{Precision}}}}$$
(4)

where tp, tn, fp, and fn refer to true positives, true negatives, false positives, and false negatives, respectively.
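
A minimal MATLAB sketch of this training and evaluation pipeline is given below. It assumes the SGDM solver, the datastore and augmenter from Sect. 2.2, and a modified network 'layers' as in Sect. 2 (a layer graph would be passed instead for DAG networks); the even split of the held-out 30% into validation and test sets and all variable names are illustrative.

```matlab
% 70% training, 30% held out; the held-out part is split into validation and test.
[imdsTrain, imdsHeldOut] = splitEachLabel(imds, 0.7, 'randomized');
[imdsVal, imdsTest]      = splitEachLabel(imdsHeldOut, 0.5, 'randomized');

inputSize = [227 227 3];   % e.g. AlexNet; 224 x 224 x 3 for most other networks
augTrain = augmentedImageDatastore(inputSize, imdsTrain, 'DataAugmentation', augmenter);
augVal   = augmentedImageDatastore(inputSize, imdsVal);
augTest  = augmentedImageDatastore(inputSize, imdsTest);

% Fixed hyperparameters used for every network: batch size 64, 10 epochs, lr 1e-4.
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 64, ...
    'MaxEpochs', 10, ...
    'InitialLearnRate', 1e-4, ...
    'ValidationData', augVal, ...
    'Plots', 'training-progress', ...
    'Verbose', false);

trainedNet = trainNetwork(augTrain, layers, options);
predLabels = classify(trainedNet, augTest);

% Per-class metrics of Eqs. 1-4 from the confusion matrix
% (rows = true classes, columns = predicted classes).
C  = confusionmat(imdsTest.Labels, predLabels);
tp = diag(C);
fp = sum(C, 1)' - tp;
fn = sum(C, 2) - tp;
accuracy  = sum(tp) / sum(C(:));                                  % overall accuracy
recall    = tp ./ (tp + fn);                                      % Eq. 2
precision = tp ./ (tp + fp);                                      % Eq. 3
f1score   = 2 * (recall .* precision) ./ (recall + precision);    % Eq. 4
```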

The AlexNet training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 9. The performance of AlexNet, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 10 and Table 2; the best result is highlighted in bold. Predictions on random samples made by transfer learning on AlexNet are shown in Fig. 11.

Fig. 9 AlexNet training progress with accuracy and error loss over 10 epochs

Fig. 10 Confusion matrix of predictions made by transfer learning on AlexNet

Table 2 Performance measures of AlexNet

Fig. 11 Predictions on random samples made by transfer learning on AlexNet

The GoogleNet training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 12. The performance of GoogleNet, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 13 and Table 3; the best result is highlighted in bold. Predictions on random samples made by transfer learning on GoogleNet are shown in Fig. 14.

Fig. 12 GoogleNet training progress with accuracy and error loss over 10 epochs

Fig. 13 Confusion matrix of predictions made by transfer learning on GoogleNet

Table 3 Performance measures of GoogleNet

Fig. 14 Predictions on random samples made by transfer learning on GoogleNet

The ResNet50 training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 15. The performance of ResNet50, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 16 and Table 4; the best result is highlighted in bold. Predictions on random samples made by transfer learning on ResNet50 are shown in Fig. 17.

Fig. 15 ResNet50 training progress with accuracy and error loss over 10 epochs

Fig. 16 Confusion matrix of predictions made by transfer learning on ResNet50

Table 4 Performance measures of ResNet50

Fig. 17 Predictions on random samples made by transfer learning on ResNet50

The VGG16 training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 18. The performance of VGG16, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 19 and Table 5; the best result is highlighted in bold. Predictions on random samples made by transfer learning on VGG16 are shown in Fig. 20.

Fig. 18 VGG16 training progress with accuracy and error loss over 10 epochs

Fig. 19 Confusion matrix of predictions made by transfer learning on VGG16

Table 5 Performance measures of VGG16

Fig. 20 Predictions on random samples made by transfer learning on VGG16

The VGG19 training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 21. The performance of VGG19, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 22 and Table 6; the best result is highlighted in bold. Predictions on random samples made by transfer learning on VGG19 are shown in Fig. 23.

Fig. 21 VGG19 training progress with accuracy and error loss over 10 epochs

Fig. 22 Confusion matrix of predictions made by transfer learning on VGG19

Table 6 Performance measures of VGG19

Fig. 23 Predictions on random samples made by transfer learning on VGG19

The ResNet101 training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 24. The performance of ResNet101, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 25 and Table 7; the best result is highlighted in bold. Predictions on random samples made by transfer learning on ResNet101 are shown in Fig. 26.

Fig. 24 ResNet101 training progress with accuracy and error loss over 10 epochs

Fig. 25 Confusion matrix of predictions made by transfer learning on ResNet101

Table 7 Performance measures of ResNet101

Fig. 26 Predictions on random samples made by transfer learning on ResNet101

The InceptionV3 training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 27. The performance of InceptionV3, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 28 and Table 8; the best result is highlighted in bold. Predictions on random samples made by transfer learning on InceptionV3 are shown in Fig. 29.

Fig. 27 InceptionV3 training progress with accuracy and error loss over 10 epochs

Fig. 28 Confusion matrix of predictions made by transfer learning on InceptionV3

Table 8 Performance measures of InceptionV3

Fig. 29 Predictions on random samples made by transfer learning on InceptionV3

The InceptionResNetV2 training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 30. The performance of InceptionResNetV2, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 31 and Table 9; the best result is highlighted in bold. Predictions on random samples made by transfer learning on InceptionResNetV2 are shown in Fig. 32.

Fig. 30 InceptionResNetV2 training progress with accuracy and error loss over 10 epochs

Fig. 31 Confusion matrix of predictions made by transfer learning on InceptionResNetV2

Table 9 Performance measures of InceptionResNetV2

Fig. 32 Predictions on random samples made by transfer learning on InceptionResNetV2

The SqueezeNet training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 33. The performance of SqueezeNet, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 34 and Table 10; the best result is highlighted in bold. Predictions on random samples made by transfer learning on SqueezeNet are shown in Fig. 35.

Fig. 33 SqueezeNet training progress with accuracy and error loss over 10 epochs

Fig. 34 Confusion matrix of predictions made by transfer learning on SqueezeNet

Table 10 Performance measures of SqueezeNet

Fig. 35 Predictions on random samples made by transfer learning on SqueezeNet

The DenseNet201 training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 36. The performance of DenseNet201, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 37 and Table 11; the best result is highlighted in bold. Predictions on random samples made by transfer learning on DenseNet201 are shown in Fig. 38.

Fig. 36 DenseNet201 training progress with accuracy and error loss over 10 epochs

Fig. 37 Confusion matrix of predictions made by transfer learning on DenseNet201

Table 11 Performance measures of DenseNet201

Fig. 38 Predictions on random samples made by transfer learning on DenseNet201

The ResNet18 training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 39. The performance of ResNet18, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 40 and Table 12; the best result is highlighted in bold. Predictions on random samples made by transfer learning on ResNet18 are shown in Fig. 41.

Fig. 39 ResNet18 training progress with accuracy and error loss over 10 epochs

Fig. 40 Confusion matrix of predictions made by transfer learning on ResNet18

Table 12 Performance measures of ResNet18

Fig. 41 Predictions on random samples made by transfer learning on ResNet18

The MobileNetV2 training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 42. The performance of MobileNetV2, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 43 and Table 13; the best result is highlighted in bold. Predictions on random samples made by transfer learning on MobileNetV2 are shown in Fig. 44.

Fig. 42 MobileNetV2 training progress with accuracy and error loss over 10 epochs

Fig. 43 Confusion matrix of predictions made by transfer learning on MobileNetV2

Table 13 Performance measures of MobileNetV2

Fig. 44 Predictions on random samples made by transfer learning on MobileNetV2

The ShuffleNet training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 45. The performance of ShuffleNet, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 46 and Table 14; the best result is highlighted in bold. Predictions on random samples made by transfer learning on ShuffleNet are shown in Fig. 47.

Fig. 45 ShuffleNet training progress with accuracy and error loss over 10 epochs

Fig. 46 Confusion matrix of predictions made by transfer learning on ShuffleNet

Table 14 Performance measures of ShuffleNet

Fig. 47 Predictions on random samples made by transfer learning on ShuffleNet

The NasNetMobile training progress, with accuracy and error loss over 10 epochs, is shown in Fig. 48. The performance of NasNetMobile, with class-wise recall, precision, F1 score, and overall accuracy derived from the confusion matrix, is shown in Fig. 49 and Table 15; the best result is highlighted in bold. Predictions on random samples made by transfer learning on NasNetMobile are shown in Fig. 50.

Fig. 48 NasNetMobile training progress with accuracy and error loss over 10 epochs

Fig. 49 Confusion matrix of predictions made by transfer learning on NasNetMobile

Table 15 Performance measures of NasNetMobile

Fig. 50 Predictions on random samples made by transfer learning on NasNetMobile

Table 16 compares the fourteen deep convolutional networks in terms of the number of layers, size, parameters, input size, training time, accuracy, recall, precision, and F1 score, each averaged into a single value per network. DenseNet201 performs best on the performance measures and is highlighted in bold in Table 16.

Table 16 Comparison of fourteen deep convolutional networks

4 Discussion

The results obtained from the fourteen deep convolutional networks are comparable in terms of accuracy, recall, precision, and F1 score. Comparing these performance measures, the DenseNet201 network performs best, with an accuracy of 0.825. This network is efficient because it alleviates the vanishing gradient problem, strengthens feature propagation, encourages feature reuse, and reduces the number of parameters. In addition to the improved efficiency, the flow of information and gradients through the network is smooth, which eases training and reduces overfitting.

The performance of the networks is evaluated on the ISIC 2018 dataset, which consists of 10,015 dermoscopic images. The training progress, the per-class confusion matrices, and the results on test images show that DenseNet201 performs best. The ResNet50 model takes second place, requiring less time while achieving good accuracy. Similarly, the other models can be assessed by trading off accuracy against training time and parameter complexity.

5 Conclusion

This study compared the performance of fourteen deep convolutional neural networks for classifying seven types of skin disease. Their performance was measured on the ISIC 2018 dataset. The pre-trained networks were modified by replacing their last layers with a new fully connected layer, a softmax layer, and a cross-entropy layer for classification into seven classes. The results, summarized in Table 16, show that DenseNet201 performs best for this classification task. Given the encouraging results of this study and of earlier studies using deep learning techniques, computer-aided diagnostic systems for classifying skin lesions are likely to become more widespread in the coming years. Models designed on this baseline are expected to give high accuracy with minimal computation time. In future work, our goal is to achieve better results by increasing the number of epochs and the number of skin lesion categories.