1 Introduction

Skin cancer is the uncontrolled growth of malignant cells in the epidermis, the skin’s outermost layer, as a result of unrepaired DNA damage that generates mutations. Skin cells with these abnormalities develop fast, resulting in malignant tumours. Because malignant cancer cells are lethal, they are treated with chemotherapy, radiation, and surgery. Tumours may be distributed differently depending on gender and age, and cancer presents itself in a variety of ways depending on the species and organ affected [1]. As per the World Health Organization (WHO), approximately 14 million cancer cases and 9.6 million cancer-related deaths have been reported worldwide, and statistics show that cancer is the foremost cause of human mortality [2]. In 2022, an estimated 197,700 new cases were detected in the United States: 97,920 in situ (non-invasive) cases limited to the epidermis (the top layer of skin) and 99,780 invasive cases that penetrate the epidermis and enter the second layer of skin (the dermis).

The most prevalent types include basal cell carcinoma (BCC), nevus, melanoma, and pigmented benign keratosis (PBK). BCCs are uncontrolled, abnormal growths that arise from the basal cells in the skin’s outermost layer; BCC is the most frequently identified type of skin cancer in the United States. Melanoma develops in melanocytes, the cells responsible for producing the pigment melanin, and appears in different colours, viz. rose pink, royal purple, azure, and colourless [3]. White people are more than 20 times more likely than black people to develop melanoma: the lifetime risk of melanoma is 2.6% (1 in 38) for whites, 0.1% (1 in 1,000) for blacks, and 0.58% (1 in 172) for Hispanics [4]. Understanding one’s skin condition, and what precautions and steps to take, can help in early diagnosis. Skin cancer is caused by air pollution, UV radiation, and an unhealthy lifestyle [5, 41]. Regularly applying sunscreen with an SPF of 15 or higher can lower the incidence of melanoma and squamous cell carcinoma by 50% and 40%, respectively [6]. Dermoscopy allows skin lesions to be viewed without being impeded by skin surface reflections [7]. It is usually utilized to evaluate pigmented lesions in order to distinguish benign melanocytic nevi and seborrheic keratosis from malignant melanoma and pigmented BCC [8]. Any unusual skin patch that progresses over time should be noted [9]. Manual dermoscopy image evaluation is laborious, error-prone, and discretionary, and it can yield widely disparate diagnostic conclusions [10].

In addition, many techniques have been introduced in dermatology to prevent skin cancer, elucidating a set of elements through which confirmed cases of melanoma can be identified [11]. A computer-aided diagnostic procedure is more objective and reliable than a human-expert diagnosis, which is subjective and not necessarily reproducible [12, 38]. Several meta-analyses have found that dermoscopy improves the accuracy of melanoma diagnosis compared to naked-eye assessment, but the practice is time-consuming, relies on the clinician’s interpretation, is therefore highly subjective, and may deliver significantly different diagnostic conclusions [13]. Computer-aided design (CAD) is a tool that helps in design creation, modification, analysis, and optimization; it is also utilised to boost designer productivity, design quality, documentation, and the creation of a manufacturing database, and CAD technology allows medical procedures to be customized to an individual’s specific needs. The techniques and technologies designed to automate the visual inspection of skin lesions, typically performed by dermatologists, are intended to help clinicians i) detect early signs of cancer; ii) allow patients to properly evaluate lesions; iii) focus on promoting melanoma prevention; and iv) support awareness programs [14]. Deep neural networks have had a substantial effect on healthcare data, reaching high accuracy in the categorization of a variety of illnesses, including skin cancer [15]. Deep learning algorithms have reached human-level performance on a variety of complex computer vision and image classification tasks [17]. Computer-based detection employs imaging techniques and artificial intelligence; its phases include collecting dermoscopic images, hair filtering, noise removal, entropy-threshold segmentation, feature extraction, and classification. A Back-Propagation Neural Network (BPN) has been utilised for classification: it determines whether a lesion is malignant or not. In a BPN, weights are randomly assigned at the start of training, and various machine-learning techniques have built on this idea. During the forward pass of the signal, the network generates an output based on the initial weights and the activation function used; the result is compared to the expected result, and if they do not match, the error is calculated by subtracting the desired output from the actual output [22, 40], as illustrated in the sketch below. This study presents a weighted ensemble strategy using an ensemble of seven DL algorithms.
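The sketch below illustrates this back-propagation step under stated assumptions: the 4–3–1 network shape, sigmoid activation, and all values are illustrative placeholders, not the study’s configuration.

```python
import numpy as np

# Hypothetical one-pass sketch of the BPN idea described above: weights
# start random, a forward pass produces an output, and the error is the
# actual output minus the desired output.
rng = np.random.default_rng(seed=0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = rng.random(4)                 # example input features
w_hidden = rng.random((4, 3))     # randomly assigned initial weights
w_out = rng.random((3, 1))

hidden = sigmoid(x @ w_hidden)    # forward pass through the hidden layer
y_pred = sigmoid(hidden @ w_out)  # actual network output

y_true = np.array([1.0])          # desired output (e.g., malignant = 1)
error = y_pred - y_true           # error used to update the weights backwards
```

The following are the research contributions: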

  • A weighted ensemble approach has been proposed using an ensemble of seven deep learning algorithms, including InceptionV3, Xception, ResNet50, EfficientNetB4, and MobileNet. The suggested methodology classifies skin cancer data into two categories: melanoma and nevus.

  • The ISIC skin lesion image dataset has been used for experimentation.

  • To evaluate the efficacy and efficiency of the proposed technique, the evaluation metrics used are accuracy, precision, recall, F1 score, sensitivity, and specificity.

  • The robustness of the proposed method has also been checked on another ISIC dataset.

  • On the same set of parameters, comparisons have been made between modern approaches and conventional algorithms.

The paper is organized as follows: Section 2 reviews the related literature, and Section 3 describes the detailed architecture of the methodology. Section 4 elucidates the results and analysis, which is followed by the conclusion.

2 Literature review

Detailed research has been conducted using machine learning (ML) and deep learning (DL) algorithms for skin lesion classification and identification. Satin Jain et al. proposed six different transfer learning models and applied multi-class classification on the HAM10000 skin cancer dataset; the methodology achieved an accuracy of 90.48% [9, 30], and it was concluded that Xception outperformed the other transfer learning networks employed in the study. The authors of [3] proposed a Region of Interest (ROI) based methodology that uses a Convolutional Neural Network (CNN) with data augmentation for ROI images from the DermIS and DermQuest datasets; the suggested approach provided accuracies of 97.9% and 97.4% for DermIS and DermQuest, respectively. Fengying Xie et al. developed an ensemble method for binary classification of melanocytic tumours as benign or malignant, with an accuracy of 94.17% for xanthous races and 91.11% for Caucasian races [18]. On the Dermweb dataset, Suganya R et al. introduced a support vector machine classifier that binary classifies images with an accuracy of 96.80% [19]. The authors of [2] proposed GoogleNet, which categorises eight types of skin lesions with an accuracy of 94.92%. M. A. Farooq et al. used a probabilistic strategy in which active contours and a watershed merged mask were applied, after which SVM and neural classifiers were used to categorize the segmented mole; the DermIS and DermQuest datasets were used during experimentation to categorise melanoma or non-melanoma images with an accuracy of 80% [20]. Fikret Ercal et al. used a feed-forward neural network to extract discriminant features from 326 images, attaining an accuracy of 86% [21]. J. Abdul Jaleel et al. used an ANN trained with back-propagation to obtain an accuracy of 93% [22]. The authors of [11] proposed graphics processors for clinical skin image analysis, using an ANN to analyse the image and study borders or contours to determine a final diagnosis; the collection included 730 images of benign and malignant cases from the International Skin Imaging Collaboration’s (ISIC) MED-NODE project, and the approach achieved 76.67% accuracy, with a 78.79% success rate on melanoma cases and a 74.07% success rate on benign lesions. The authors of [17] proposed a set of procedures for segmenting skin lesions and assessing the observed area and surrounding skin tissue for melanoma detection; tested on the ISIC 2016 dataset, the method achieved a sensitivity of 95%. Ardan Adi Nugroho et al. put forward a CNN-based identification system using dermoscopy images, with accuracies of 80% and 78% at training and testing time, respectively [23]. Amirreza Rezvantalab et al. studied the effectiveness and capability of CNNs to classify eight skin diseases using pre-trained models on the HAM10000 and PH2 datasets, obtaining an accuracy of 94.40% for melanoma and BCC [13]. Yuexiang Li et al. proposed a DL system that performs segmentation and coarse classification simultaneously using a fully convolutional residual network (FCRN); a lesion index calculation unit was designed to refine the results, and a basic CNN was presented for dermoscopy image feature extraction. Evaluated on the ISIC 2017 dataset, the proposed frameworks reached accuracies of 75.3%, 84.8%, and 91.2%, respectively [24]. Gerald Schaefer et al. suggested an ensemble method that achieved a sensitivity of 93.76% on 564 skin lesion images [25]. E. Nasr-Esfahani et al. implemented a computationally complex CNN-based method on several clinical images, resulting in an accuracy of 81% [26]. Titus J. Brinker et al. proposed a CNN classifier using the International Symposium on Biomedical Imaging 2016 challenge dataset, giving an average precision of 70.9% [27]. Shorfuzzaman applied deep learning models (MobileNet, Xception, ResNet50, ResNet50V2, and DenseNet121) pre-trained on ImageNet for transfer learning, evaluating a classifier for each model [32]. Shahsavari fused four different CNN models and evaluated classification results on 934 and 200 images from the ISIC and PH2 test data, with average accuracies of 97.1% and 96%, areas under the receiver operating characteristic curve (AUC) of 98.6% and 98.1%, precisions of 87.1% and 90.2%, and recalls of 86.7% and 85.4% for the ISIC and PH2 test data, respectively [33]. Jin proposed a cascade knowledge diffusion network (CKDNet) to transfer and aggregate knowledge learnt from different tasks and simultaneously boost the performance of classification and segmentation [34]. Imran deployed a model using VGG, CapsNet, and ResNet learners for skin cancer detection [35]. Tembhurne designed a model that achieves a higher accuracy of 93%, with individual recall scores of 99.7% and 86% for the benign and malignant forms of cancer, respectively [36]. Basak used MFSNet (Multi-Focus Segmentation Network), with differently scaled feature maps, to compute the final segmentation mask from raw input RGB images of skin lesions [37].

3 Weighted ensemble framework

The weighted average ensemble, also known as the weighted sum ensemble, combines predictions from multiple models, where each model’s weight is based on its performance. The weight for each learner (YInceptionV3, YMobileNet, YXception, YVGG16, YCNN, YResNet50, YEfficientNetB4) in the weighted ensemble is determined from its average accuracy on the testing dataset. The resulting weights Yk; k = 1, …, 7 are scaled to sum to one; this scaling does not influence the weighted ensemble result. The weights allocated to each model in the weighted ensemble model are shown in Table 1, and the algorithm for the same is shown in Fig. 1.

Table 1 Weights used in second weighted ensemble model
Fig. 1 The flow diagram of the proposed weighted ensemble model

The individual learners’ decision values for each image in the test dataset are stacked to create the ensemble decision mapping. For the two proposed ensemble approaches, the average ensemble model and the weighted ensemble model, the ensembled decision values are calculated using an indicator function for the ith class of the dataset, which aligns the predicted value of the kth model with the associated class label, as in Eq. (1).

$$\Delta \left({p}_{k}^{j},{l}^{i}\right)= \begin{cases}1; & {p}_{k}^{j} \in {l}^{1}\\ 2; & {p}_{k}^{j} \in {l}^{2}\end{cases}$$
(1)

Here, \({p}_{k}^{j}\) is the predicted value of the kth model for image I(j), i.e., \({p}_{InceptionV3}^{j}\)= InceptionV3(I(j)), \({p}_{MobileNet}^{j}\)= MobileNet(I(j)), \({p}_{Xception}^{j}\)= Xception(I(j)), \({p}_{VGG16}^{j}\)= VGG16(I(j)), \({p}_{CNN}^{j}\)= CNN(I(j)), \({p}_{ResNet50}^{j}\)= ResNet50(I(j)), \({p}_{EfficientNetB4}^{j}\)= EfficientNetB4(I(j)), where \(j \in \{1, \dots, M\}\) and M is the number of test images.

$${final\_pred}_{(i)}^{\left(j\right)}=\sum\nolimits_{k=1}^{7}{Y}_{k}\times \Delta \left({p}_{k}^{j},{l}^{i}\right),\quad \mathrm{for}\; i\in [1,2]$$
(2)

The weighted ensemble model’s final predicted output for an image I(j) from the ith class is obtained by summing the products of the individual predictions and their respective weights across all models, and then selecting the class with the highest weighted sum.

$${output}^{\left(j\right)}=\underset{i}{\mathrm{arg\,max}}\left({final\_pred}_{(i)}^{\left(j\right)}\right)$$
(3)
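A minimal sketch of Eqs. (1)–(3), assuming each learner outputs per-class scores for every test image; the weight values below are placeholders, not those reported in Table 1.

```python
import numpy as np

# Placeholder weights Y_k for the seven learners, scaled to sum to one.
Y = np.array([0.10, 0.12, 0.13, 0.14, 0.15, 0.17, 0.19])

def weighted_ensemble(per_model_scores, Y):
    """per_model_scores: shape (7, n_images, n_classes), one row per learner."""
    # Eq. (2): weighted sum of the individual predictions over the 7 models.
    final_pred = np.tensordot(Y, per_model_scores, axes=1)  # (n_images, n_classes)
    # Eq. (3): pick the class with the highest weighted score for each image.
    return final_pred.argmax(axis=-1)

# Example: 7 learners, 4 test images, 2 classes (melanoma, nevus).
scores = np.random.rand(7, 4, 2)
print(weighted_ensemble(scores, Y))
```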

3.1 Algorithm: proposed CNN model

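The original presents the algorithm as a figure; the following is a minimal Keras sketch of the custom CNN described in Section 4.3. The filter counts (32/64) and dense width (128) are assumptions, while the kernel size, input shape, pooling and dropout placement, activations, loss, and optimizer follow the text.

```python
from tensorflow.keras import layers, models

def build_custom_cnn(input_shape=(100, 100, 3)):
    """4-layer CNN, 3x3 kernels: max pooling on layers 2 and 4,
    dropout 0.25 on layers 2-4, two dense layers with dropout 0.5."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),   # conv layer 1
        layers.Conv2D(32, (3, 3), activation="relu"),   # conv layer 2
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), activation="relu"),   # conv layer 3
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), activation="relu"),   # conv layer 4
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="sigmoid"),        # dense layer 1
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),          # dense layer 2: melanoma/nevus
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```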

4 Proposed methodology

This study proposes an ensemble of classifiers for identifying skin cancer lesions on the ISIC dataset, focusing on the weighted ensemble technique, which combines the classifiers’ predicted outputs in a weighted manner to determine the final output [28]. Ensemble techniques built from a variety of pre-trained CNN models have been employed for skin cancer classification before: the approach investigated in [4] obtained an accuracy of 76%, and [25] produced a sensitivity of 93.76%. This section also provides a comparison of the effectiveness of several transfer learning networks. Figure 1 shows the proposed methodology.

4.1 Dataset description

The dataset comprises 11449 skin lesion images in .jpg format. The ISIC dataset used in the experimentation has been taken from a publicly available repository (https://www.kaggle.com/datasets/qikangdeng/isic-2019-and-2020-melanoma-dataset). It is divided into two classes of skin cancer, melanoma and nevus; sample images of both classes are displayed in Fig. 2a, b.

Fig. 2 a, b Sample images of melanoma and nevus classes from the ISIC dataset

Figure 3 represents the distribution of images across classes, where ~44% of the data belongs to the melanoma class and ~56% to the nevus class. The dataset was split into training and testing sets with an 80–20 split; the complete distribution is detailed in Table 2, and the class distribution is depicted in Fig. 3.

Fig. 3 Class distribution of skin cancer images

Table 2 A complete distribution of skin cancer images

4.2 Data preprocessing

The skin lesion images have been pre-processed before training and validation using various image pre-processing techniques. This is done to reduce the complexity of the data provided to the model. Each image in the dataset was resized to 100 × 100 pixels with three channels: Red, Green, and Blue (RGB). The data was first separated into training and testing sets, with 80% of the data used for training and 20% for testing. The next preprocessing phase was label encoding, the act of converting labels into a numeric format so that machines can read them; ML techniques can then make better use of those labels. It is a necessary pre-processing step for structured datasets in supervised learning. In the suggested model, the labels melanoma = 0 and nevus = 1 were assigned to the two classes, as sketched below.
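A minimal sketch of this preprocessing, assuming the images are loaded with OpenCV; the path list, label strings, rescaling to [0, 1], and random seed are assumptions, not the authors’ exact pipeline.

```python
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

def preprocess(image_paths, label_strings):
    """Resize to 100x100 RGB, label-encode (melanoma=0, nevus=1), split 80-20."""
    images = []
    for path in image_paths:
        img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)  # BGR -> RGB
        images.append(cv2.resize(img, (100, 100)))
    X = np.array(images, dtype="float32") / 255.0  # scaling is an assumption
    y = np.array([0 if s == "melanoma" else 1 for s in label_strings])
    return train_test_split(X, y, test_size=0.2, random_state=42)
```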

4.3 Model architecture

An ensemble of classifiers employs several deep learning classifiers in order to outperform a single classifier [29]. This study develops a weighted ensemble technique in which the final output decision is made from the weighted sum of the predicted outputs of the classifiers. The weighted ensemble model of the proposed methodology combines predictions from numerous CNN models, including InceptionV3, MobileNet, Xception, ResNet50, VGG16 and others, with weights assigned to each model based on performance and expertise [28]. The weighted ensemble is a variation of voting: rather than assuming that all models are equally capable and contribute equally to the ensemble’s predictions, each model is given a specific weight, which is multiplied by its predicted value and used to form the summed or averaged forecast. The suggested classification model added up the expected probabilities for each class and selected the class with the highest probability. Figure 4 depicts the suggested methodology’s architecture.

Fig. 4 Architecture of the weighted ensemble deep learning model

To make a weighted average prediction, each ensemble member must first be assigned a fixed weight coefficient. This can be a percentage expressed as a floating-point value between 0 and 1, or an integer starting from 1 that represents the number of votes each model should receive. The first weighted ensemble model employed fixed, equal weights, whereas the second used weights computed by a search method that evaluated the model’s performance over various weight combinations. Models pre-trained on the ImageNet dataset were used for transfer learning, along with a basic CNN model, to build the weighted ensemble approach. The proposed methodology combines the pre-trained models mentioned below using transfer learning, changing the output layers to fit the dataset; these models are briefly discussed in this section. The InceptionV3 model includes convolutions, average pooling, max pooling, concatenations, dropouts, and fully connected layers. Batch normalization is applied to the activation inputs, and loss is calculated using the softmax function [13], depicted in Eq. (4):

$${\sigma (z)}_{i}=\frac{{e}^{{z}_{i}}}{\sum_{j=1}^{k}{e}^{{z}_{j}}},$$
(4)

where i = 1, 2, …, k and z = (z1, z2, …, zk) ∈ ℝ^k.

MobileNet uses a 53-layer deep CNN architecture targeted at mobile and embedded vision applications. The Xception architecture is a 71-layer CNN with the same number of parameters as InceptionV3; its implementation advantage comes from more effective use of model parameters rather than increased capacity. VGG16 is a deep CNN with 16 layers that can be imported pre-trained on the ImageNet database; the network classifies 1000 different object categories, so it contains rich feature representations for a wide range of images. ResNet50 has 48 convolutional layers along with one max-pool and one average-pool layer [31]. EfficientNet has a CNN architecture that evenly balances the depth, width, and resolution dimensions using a compound coefficient with preset scaling factors. A 4-layer deep CNN model was fed into the ensemble model, with a kernel size of 3 × 3 throughout, an input image size of 100 × 100, and a default batch size of 32. Max pooling was performed on layers 2 and 4, along with a dropout of 0.25 on layers 2 to 4. The model has two dense layers with batch size 32 and a dropout of 0.5. It used ReLU as the activation function in the hidden layers and the sigmoid function for the dense layers [30]; the ReLU function is shown in Eq. (5):

$$\mathrm{ReLU}(x)=\begin{cases}0; & x<0\\ x; & x\ge 0\end{cases}$$
(5)
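As a sketch of how each pre-trained backbone can be adapted by replacing the output layers, the snippet below wires InceptionV3 to the two-class task; the global-pooling head and the frozen base are assumptions rather than the authors’ exact configuration.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

# Load ImageNet weights without the original 1000-class classification head.
base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(100, 100, 3))
base.trainable = False  # keep the pre-trained features fixed during training

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),  # melanoma vs. nevus output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```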

5 Results and analysis

This section presents the analysis and performance of the suggested methodology on the ISIC dataset. All of the images were fed into both ensemble models, the first using equal weights and the second using different weights. The dataset was split into training and testing sets with an 80:20 ratio, and the results were compared using the following measures, whose formulas are presented in Eqs. (6)–(10):

$$\mathrm{Accuracy }=\frac{(TP+TN)}{\left(TP+FP+FN+TN\right)}$$
(6)
$$\mathrm{Precision }=\frac{(TP)}{\left(TP+FP\right)}$$
(7)
$$\mathrm{Recall }=\frac{(TP)}{\left(TP+FN\right)}$$
(8)
$$\mathrm{F}1-\mathrm{Score }=\frac{2 \times precision \times recall}{precision+ recall}$$
(9)
$$\mathrm{Specificity }=\frac{(TN)}{\left(TN+FP\right)}$$
(10)
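A minimal sketch computing Eqs. (6)–(10) from a binary confusion matrix; the label arrays are placeholders, and scikit-learn’s convention of treating label 1 as the positive class is an assumption.

```python
from sklearn.metrics import confusion_matrix

def evaluate(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    accuracy = (tp + tn) / (tp + fp + fn + tn)           # Eq. (6)
    precision = tp / (tp + fp)                           # Eq. (7)
    recall = tp / (tp + fn)                              # Eq. (8); equals sensitivity
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (9)
    specificity = tn / (tn + fp)                         # Eq. (10)
    return accuracy, precision, recall, f1, specificity
```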

The performance of all six transfer learning networks, as well as the basic CNN model with self-constructed layers, is shown in Tables 3 and 4.

Table 3 Comparison of various CNN models on ISIC dataset
Table 4 Comparison of various CNN models on another ISIC dataset

All the models were given 11449 images in total as input from the first dataset, divided into two categories, 5106 MEL images and 6343 NEV images, each of size 100 × 100 in RGB configuration. In all the models, the inputs were first flattened, and the ‘Sigmoid’ activation function, the ‘BinaryCrossEntropy’ loss function, and the ‘Adam’ optimizer were used.

The first model is InceptionV3. It uses a CNN design from the Inception family that includes label smoothing and factorized 7 × 7 convolutions, with an auxiliary classifier used to propagate label information to the lower layers of the network. At best it gives an accuracy of 79.69%, precision of 81%, recall of 80%, F1 score of 79%, sensitivity of 65%, and specificity of 92%. The second model is MobileNet. It has a streamlined architecture with low latency that uses depth-wise separable convolutions, and at best achieves an accuracy of 84.54%, precision of 85%, recall of 85%, F1 score of 84%, sensitivity of 77%, and specificity of 91%. Xception obtained a precision of 86%, recall of 85%, F1 score of 85%, sensitivity of 75%, specificity of 94%, and an accuracy of 85.20%; for all input weights, the default pre-trained model was set to "xception". VGG16 achieved an accuracy of 87.68%, precision of 88%, recall of 88%, F1 score of 88%, sensitivity of 88%, and specificity of 88%; the default pre-trained model was set to "imagenet" for all input weights. The fifth model is the custom CNN. The ensemble model was fed uniformly with an input image size of 100 × 100 and a default batch size of 32 using a custom 4-layer deep CNN model with a kernel size of 3 × 3. Layers 2 and 4 were max pooled, and layers 2 to 4 had a dropout of 0.25. Two dense layers with a batch size of 32 and a dropout of 0.5 complete the model; ReLU was the chosen activation function for the hidden layers, and sigmoid was used for the dense layers. This model achieves 89.87% accuracy, 90% precision, 90% recall, 90% F1 score, 85% sensitivity, and 94% specificity. ResNet50 is the sixth model; its residual blocks, or skip connections, transform the design into a residual network. This model has a 90.61% accuracy, a 91% precision, a 91% recall, a 91% F1 score, an 88% sensitivity, and a 93% specificity. EfficientNetB4 is the seventh and final model. It has a CNN design and scaling strategy that uniformly scales all depth, width, and resolution dimensions using a compound coefficient. With the best attainable accuracy of 91.79%, precision of 92%, recall of 92%, F1 score of 92%, sensitivity of 90%, and specificity of 93%, it outperforms all other models.

The suggested methodology was also applied to another ISIC dataset (https://www.kaggle.com/datasets/jaiahuja/skin-cancer-detection) to assess the model’s robustness, given the rapid growth in melanoma-related mortality over recent decades. Since the suggested methodology distinguishes between two types of skin cancer, melanoma and nevus, only these classes were used to compare the two datasets. This collection contains 827 images divided into two categories: melanoma (454) and nevus (373). All of the CNN models used in the proposed methodology were trained and tested on this dataset as well. Table 4 illustrates the performance of the six pre-trained transfer learning models as well as the custom CNN on this dataset, and Fig. 5 presents comparative graphs of the metrics in Tables 3 and 4.

As can be seen from Table 4, ResNet50 was the best-performing algorithm, with an accuracy of 78.31%, precision of 78%, recall of 78%, sensitivity of 84%, and specificity of 70%. VGG16 comes second, followed by Xception and the custom CNN. The InceptionV3 and MobileNet algorithms have the lowest accuracy, averaging 55.42% and 67.46%, respectively. In terms of sensitivity and specificity, the EfficientNetB4 model has the highest specificity (79%) and the highest sensitivity (84%).

In the comparative graphs, the blue bars show the accuracy, orange bars the precision, grey bars the recall, yellow bars the F1-score, light blue the sensitivity, and green the specificity of the respective algorithms. Figure 5a, b are comparative graphs for the first and second ISIC datasets, respectively, showing the accuracy, precision, recall, F1-score, sensitivity, and specificity of the seven algorithms.

Fig. 5 a Comparative graph of algorithms used on the first ISIC dataset. b Comparative graph of algorithms used on the second ISIC dataset

Figure 6 shows the training (blue line) and testing (orange line) accuracy curves on the first ISIC dataset, which contains 5106 images in the MEL class and 6343 images in the NEV class. The input shape of the images was taken to be the same as the image size, with RGB configuration. After convergence at 76 epochs, EfficientNetB4 and ResNet50 achieve the highest accuracies of 91.79% and 90.61%, respectively, as shown in the figure. The InceptionV3 model has the lowest accuracy, followed by MobileNet, Xception, and VGG16. The best accuracy of the self-constructed CNN model was found to be around 90.04%.

Fig. 6 Model accuracy curves on the first ISIC dataset: a InceptionV3 training accuracy (in blue) with testing accuracy (in orange). b MobileNet training accuracy (in blue) with testing accuracy (in orange). c Xception training accuracy (in blue) with testing accuracy (in orange). d VGG16 training and testing accuracy (in blue and orange, respectively). e Custom CNN training accuracy (in blue) with testing accuracy (in orange). f ResNet50 training and testing accuracy (in blue and orange, respectively). g EfficientNetB4 training accuracy (in blue) with testing accuracy (in orange)

Figure 7 presents the loss curves, which show how poorly the model behaves as training progresses; the better the model’s accuracy, the less the loss curve deviates. From the curves it can be observed that in Fig. 7a, c, g the loss deviates similarly throughout the epochs, whereas in Fig. 7b, d, f the curves keep getting poorer as their height rises sharply, so the loss is greater in those cases. On analysing Table 4, it was seen that VGG16 and ResNet50 performed equally well on the other ISIC dataset. Figure 8 shows the model accuracy curves of all the CNN models mentioned in Table 4, and the corresponding model loss curves are shown in Fig. 9.

Fig. 7 Model loss curves on the first ISIC dataset: a Training loss (in yellow) and testing loss (in red) of InceptionV3. b Training loss (in yellow) and testing loss (in red) of MobileNet. c Training loss (in yellow) and testing loss (in red) of Xception. d Training loss (in yellow) and testing loss (in red) of VGG16. e Training loss (in yellow) and testing loss (in red) of the custom CNN. f Training loss (in yellow) and testing loss (in red) of ResNet50. g Training loss (in yellow) and testing loss (in red) of EfficientNetB4

Fig. 8 Model accuracy curves on another ISIC dataset: a InceptionV3 training (blue) and testing (orange) accuracy. b MobileNet training (blue) and testing (orange) accuracy. c Xception training (blue) and testing (orange) accuracy. d VGG16 training (blue) and testing (orange) accuracy. e Custom CNN training (blue) and testing (orange) accuracy. f ResNet50 training (blue) and testing (orange) accuracy. g EfficientNetB4 training (blue) and testing (orange) accuracy

Figure 8 shows the training (blue line) and testing (orange line) accuracy curves on the second ISIC dataset, which contains 454 images in the MEL class and 373 images in the NEV class. The input shape of the images was taken to be the same as the image size, with RGB configuration. After convergence at 76 epochs, the best-performing algorithm turned out to be ResNet50, with an accuracy of 78.31%, precision of 78%, recall of 78%, sensitivity of 84%, and specificity of 70%. The second best is VGG16, followed by Xception and the custom CNN. InceptionV3 and MobileNet show the lowest accuracies, close to 55.42% and 67.46%, respectively; the corresponding loss curves are shown in Fig. 9.

Fig. 9 Model loss curves on another ISIC dataset: a Training loss (in yellow) and testing loss (in red) of InceptionV3. b Training loss (in yellow) and testing loss (in red) of MobileNet. c Training loss (in yellow) and testing loss (in red) of Xception. d Training loss (in yellow) and testing loss (in red) of VGG16. e Training loss (in yellow) and testing loss (in red) of the custom CNN. f Training loss (in yellow) and testing loss (in red) of ResNet50. g Training loss (in yellow) and testing loss (in red) of EfficientNetB4

Table 5 shows the accuracy, precision, recall, and F1 score of the suggested average ensemble model and the weighted ensemble model. From Table 5(a) it is clear that the weighted ensemble model outperformed the average ensemble model using the weights listed in Table 1. The weighted ensemble model’s accuracy was 93.36%; its precision, recall, and F1-score were all 93%, while its sensitivity and specificity were 97%. Table 5(b) reveals that both models performed well on the dataset used to test their robustness, with the weighted ensemble model reaching an accuracy of 85.54% and a precision, recall, and F1-score of 86% each, with a sensitivity of 88% and a specificity of 82%.

Table 5 Performance metrics of proposed ensemble models

A confusion matrix (CM) is a tabular representation that summarizes the performance of a classification method. It is made up of four major numbers that define the classifier’s metric of measurement: TP, TN, FP, and FN. A true positive (TP) is an individual record correctly identified as a malignant sample; a true negative (TN) is a non-cancerous sample correctly predicted as such. A false positive (FP) occurs when the model incorrectly identifies a healthy patient as a malignant instance, and a false negative (FN) when a malignant case is recognized as a normal one [39]. Figure 10 displays the confusion matrices of the suggested average and weighted ensemble models on both datasets.

Fig. 10 a Confusion matrix of the average ensemble model on the first dataset. b Confusion matrix of the weighted ensemble model on the first dataset. c Confusion matrix of the average ensemble model on the second dataset. d Confusion matrix of the weighted ensemble model on the second dataset

Figure 10 depicts the confusion matrices of the average and weighted ensemble models on both datasets. The average accuracy of all the models was taken, and then the ensemble model was applied. The first dataset consists of 5106 MEL and 6343 NEV images, and the second of 454 MEL and 373 NEV images. The x-axis shows the predicted labels and the y-axis the true labels (Fig. 11).

Fig. 11 Misclassified samples from the proposed ensemble model. The samples are from the melanoma class but were incorrectly assigned to the nevus class

Model training run time refers to the duration required to train a machine learning model on a given dataset. It depends on factors such as model complexity, dataset size, hardware resources, and the optimization techniques used. Smaller models with limited data may train quickly, in minutes or hours, whereas larger models, such as deep neural networks trained on vast datasets, may need days, weeks, or even months. Parallel processing, distributed computing, and accelerators such as GPUs can significantly reduce training time. Efficient algorithms, transfer learning, and advances in optimization also play a crucial role in decreasing training run time, making model development more feasible and scalable (Table 6, Fig. 12).

Table 6 Performance metrics of proposed ensemble models
Fig. 12 Model-by-model ROC curves with their AUC values. The AUC represents the likelihood that a random positive (green) example is ranked above a random negative (red) example; it can take any value between 0 and 1

An ROC curve has been generated for each model in the proposed ensemble. The curves show that EfficientNetB4 gives the best predictions, with an AUC of about 0.917, whereas the remaining models have broadly similar AUC values, with InceptionV3 and MobileNet presenting the lowest, about 0.785 and 0.839, respectively. Overall, all seven models suit this dataset well, as the AUC values cluster between 0.8 and 0.9.
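A minimal sketch of how the per-model ROC curves and AUC values behind Fig. 12 can be produced; the model names and predicted-probability arrays are placeholders.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

def plot_roc_curves(y_true, scores_by_model):
    """scores_by_model: dict mapping a model name to its predicted probabilities."""
    for name, scores in scores_by_model.items():
        fpr, tpr, _ = roc_curve(y_true, scores)
        auc = roc_auc_score(y_true, scores)
        plt.plot(fpr, tpr, label=f"{name} (AUC = {auc:.3f})")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()
```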

5.1 Analysis using state-of-the-art methods

The implementation of the proposed weighted ensemble model is compared and analysed against existing work. The authors of [4] worked with deep learning-based models using the PNASNet-5-Large, InceptionResNetV2, SENet154, and InceptionV4 neural networks; dermoscopic images were properly processed and enhanced before being fed into the network. The authors of [7] proposed a melanoma screening system that segmented and classified skin lesions as malignant or benign using a convolutional neural network (CNN) architecture, as well as other ML classification methodologies; for each test image, the segmentation technique creates a binary mask, which is then used to segment out the lesion. A multi-scale convolutional neural network was utilised in [8]; their method relied on an Inception-v3 network pre-trained on the ImageNet dataset and fine-tuned for skin lesion classification using two distinct scales of input images. Similarly, in [17] the authors proposed a set of procedures for segmenting skin lesions and assessing the observed area and surrounding skin tissue for melanoma detection. The authors of [11] proposed graphics processors for medical skin image analysis, using an ANN system that detects similar patterns through a set of components tasked with detecting the features of the object in the image and studying edges or contours to determine a final diagnosis; the collection includes 730 images of benign and malignant cases from the International Skin Imaging Collaboration’s (ISIC) MED-NODE project. The GAN-based data generation unit in [14] relied largely on the processing system for removing image occlusions and on the processing unit for populating limited lesion classes or, equivalently, constructing remote patients with pre-defined types of lesions. [15] suggested a progressive generative adversarial network that produced a substantially larger range of augmentations. Similarly, [16] looked into the prospect of using generative adversarial networks (GANs) to create realistic-looking dermoscopic images; the images were then used to supplement a deep CNN’s existing training set in order to improve its performance on the skin lesion classification task. [24] describes a deep learning system that uses two fully convolutional residual networks (FCRN) to achieve segmentation and coarse classification simultaneously; a lesion index calculation unit (LICU) refines the coarse classification findings by computing a distance heat map, and a basic CNN was presented for the dermoscopy image feature extraction task. The proposed methodology’s performance was examined against these state-of-the-art approaches; Table 7 depicts the comparative analysis.

Table 7 A comparison of the proposed methodology with the state-of-the-art methods

6 Discussion & conclusion

Skin cancer is defined as the abnormal proliferation of malignant cells in the epidermis, the skin’s outer layer, resulting from unrepaired DNA damage that causes mutations. Dermoscopy images must be manually evaluated by dermatologists, which takes time and is imprecise and subjective. The study’s main objective was to combine DL algorithms to detect skin cancer more effectively. The authors presented a weighted ensemble classifier that performs binary classification using an ensemble of seven deep learning neural networks, including InceptionV3, VGG16, Xception, ResNet50, and others. Applied to the first ISIC dataset, the suggested ensemble technique obtains a maximum accuracy of 93.36%, precision, recall, and F1-score of 93% each, and sensitivity and specificity of 97% each. Another ISIC dataset has been used to assess and analyse the efficacy of the proposed method; on it, the proposed weighted ensemble classifier’s accuracy was 85.54%.

In summary, a weighted ensemble technique is one in which the final output decision is based on the weighted sum of the predicted outputs of the classifiers. Under weighted-ensemble voting, each model is given a different weight, which is multiplied by the model’s predicted outcome to produce the total or average prediction. By summing the predicted probabilities for each class, the suggested classification model selects the one with the highest probability.

Future research will present a technique to evaluate the classifier’s performance by combining machine learning and deep learning methodologies. A more effective data purification technique will be devised to make the images occlusion-free. Finally, the classifier’s accuracy can be increased further by combining the proposed future method with a better data purification technique and a sizable, evenly distributed dataset.