1 Introduction

Melanoma is the most serious type of skin cancer; it develops in the pigment-producing skin cells called melanocytes [43]. A study [41] reported about 10,000 annual deaths from melanoma skin cancer in the US alone. Melanoma lesions arise from the abnormal, uncontrolled growth of these pigment-producing cells [46]. Depending on the severity of the melanoma, the affected skin moles vary in texture and color, appearing brown, pink, red, or black. If a mole grows beyond 6 mm or shows abnormal coloring, a detailed examination by a dermatologist is needed for possible melanoma containment. Melanoma lesions are divided into two classes, benign and malignant. Benign lesions represent an early, curable stage of skin cancer, whereas malignant melanoma is a dangerous form that can eventually cause the death of the patient. Traditionally, dermatologists analyzed such abnormal moles with the naked eye, examining their texture, color, and size [34]. However, the screening process for melanoma is often delayed because of the limited availability of dermatologists. Identifying melanoma at an early stage is crucial, as it not only increases the patient's survival chances but can also spare them difficult surgical procedures [37]. The availability of sophisticated computer vision algorithms has motivated researchers to develop effective automated methods for melanoma detection [10, 37].

Existing automated approaches for melanoma detection can be broadly categorized into handcrafted-feature and deep learning (DL) based approaches. Handcrafted approaches use key-point extraction methods for the recognition of skin moles [6, 13, 47]. However, these techniques cannot accurately detect skin lesions because of deviations in the size, texture, and color of the moles. To increase the accuracy of melanoma identification systems, classification is performed after segmenting the melanoma region from the normal skin, as expert dermatologists do [8]. Region-of-interest (ROI) based techniques are employed in [40, 44] to segment melanoma regions. These techniques provide a better representation of melanoma attributes during feature extraction and enable precise classification of the affected regions. Effective segmentation is therefore a mandatory requirement for accurate melanoma detection systems [19, 20, 42]. However, the performance of ROI or threshold-based methods degrades significantly for low-resolution images and under variations in contrast, illumination conditions, and chrominance. In real-world settings it is practically difficult to capture images with uniform properties, so melanoma detection systems must be designed to overcome these limitations.

In the last few years, we have witnessed the effectiveness of DL-based approaches in various domains, including medical image processing [3, 32, 36, 48]. A deep convolutional neural network (CNN) automatically learns complex keypoints directly from the input samples and provides improved recognition of melanoma-affected regions. Because of these advantages, DL-based skin lesion detection has attracted researchers' attention [49]. However, most DL-based approaches require preprocessing of the input images to overcome the problem of feature map saturation [4]. To avoid the preprocessing step, [5] used a SegNet-based network that maps the input image to pixel-wise semantic labels through feature learning. In [4], recurrent networks and CNNs are employed, whereas [9] uses a semi-automated fully convolutional network for melanoma detection.

Timely and precise automated identification and classification of melanoma lesions remains a challenging task because of the low-contrast boundary between the melanoma lesion and the surrounding skin, and the high visual similarity between affected and non-affected body parts. In this paper, we address these challenges by employing a faster regional convolutional neural network (Faster-RCNN) to compute deep features of the input images and to localize the melanoma moles. These deep features are then used to train an SVM for classification. The proposed method is robust to variations in chrominance, intensity, contrast, and illumination, to hair and tiny blood vessels, to blurring, and to high-density noise. The major contributions of the proposed work are as follows:

  • Accurate localization of melanoma regions due to the region proposal network of Faster-RCNN.

  • Efficient classification of melanoma owing to the SVM classifier's ability to cope with over-fitted training data.

  • Rigorous experimentation against several recent melanoma recognition methods on the standard ISIC-2016 database, whose images contain distortions such as blurring, chrominance and intensity variations, hair and tiny blood vessels, and high-density noise, to show the efficacy of the introduced framework. Moreover, we performed cross-dataset validation on the ISIC-2017 dataset to show the applicability of our method to real-world scenarios.

  • To the best of our knowledge, this is the first time in medical image analysis that Faster-RCNN has been employed for skin lesion detection. The reported results demonstrate its ability to detect melanoma moles and to compute a deep, discriminative set of features with improved performance.

The rest of the paper is structured as follows: Section 2 presents the related work, while the proposed framework is discussed in detail in Section 3. Performance evaluation of our framework is presented in Section 4, and finally, Section 5 concludes the work.

2 Related work

This section presents a critical review of recent melanoma recognition methods. Existing works have used either conventional machine learning (ML) techniques or DL-based techniques for melanoma detection.

Conventional ML classifiers have been employed in [7, 9, 14, 39, 45] for melanoma detection. In [9, 14], hand-crafted key-point extraction approaches were used. Codella et al. [14] employed edge and color histograms along with local binary patterns (LBP) for melanoma lesion identification, and finally trained an SVM classifier for melanoma classification. Similarly, Barata et al. [7] introduced a framework for skin lesion localization using local and global features extracted from the input images. More specifically, Laplacian pyramids and gradient histograms were used for global key-point extraction, whereas Bag of Features (BoF) was employed for local key-point extraction. Based on the obtained features, KNN, SVM, and AdaBoost classifiers were trained for skin lesion classification. It is concluded in [7] that global features are more effective for melanoma detection than local features. Rehman et al. [39] presented a method for melanoma localization in which the speeded-up robust features (SURF) descriptor, along with the histogram of oriented gradients (HOG), was employed for feature extraction; the resulting features were used to train an SVM for classifying the melanoma moles. This technique is robust for low-contrast skin lesion images, at the expense of increased feature computation cost. Singh et al. [45] employed Zernike Moments (ZM) and Pseudo Zernike Moments (PZM) features to train an SVM for the classification of skin lesions. Alcon et al. [1] presented a method that uses the patient's medical history before performing the diagnostic procedure. Otsu's thresholding algorithm [31] was used for lesion segmentation, and the ABCD rule of dermatoscopy was employed for feature extraction. Finally, a hybrid classification model consisting of a Decision Tree Learner (J48), Decision Stump, Logistic Model Trees, and Bayesian Networks was trained to classify the skin moles. This approach [1] exhibits good accuracy for melanoma classification; however, it achieves poor specificity. Cavalcanti et al. [12] proposed a two-stage technique to classify skin lesions: after segmentation using a 3-channel image representation [11], ABCD rules of dermatoscopy along with melanin variation features were employed to compute the key points, and a KNN classifier was trained to discriminate between the skin moles. This method [12] is unable to achieve high classification accuracy because of the morphological variations of skin moles. Giotis et al. [21] employed both automated and manual features for melanoma detection. Automated features were computed from the color and texture information of the skin moles, whereas manual features, such as size, color, shape, and the body part where a mole exists, were decided by the dermatologist after examination. Finally, feature selection between the manual and automated features was performed through a voting process. This method performs well for melanoma lesion detection; however, hand-crafted key-feature extraction methods in general exhibit lower localization performance because of changes in the size, shape, and chrominance of the skin moles. Hu et al. [28] presented an approach for the automated classification of melanoma lesions.
Initially, [28] introduced an efficient codebook learning technique based on FSM, which employed linear independence and linear prediction (LP) to compute keypoint similarity. Secondly, an RGB color histogram and SIFT were adopted to calculate the final feature vector. Finally, an SVM classifier was trained over the computed keypoints to classify the input samples into various classes. The approach in [28] is computationally economical; however, it may not perform well on images with intense color variations.

Recently, DL methods have become common in various computer vision applications due to their high accuracy [52]. Nida et al. [35] introduced an automated framework for the identification and segmentation of skin lesions: an RCNN was employed to detect the melanoma moles, followed by Fuzzy C-means (FCM) clustering for lesion segmentation. Although this method provides superior melanoma segmentation accuracy, it does so at the overhead of increased computational cost. Gulati et al. [24] proposed a DL approach for automated skin lesion classification, employing two CNN architectures, AlexNet and VGG16, for feature extraction; the extracted key points were used to train an ECOC-SVM classifier to differentiate the melanocytic moles. A hybrid CNN model [33] based on AlexNet, ResNet-18, and VGG16 was employed for feature extraction, after which the features were used to train an SVM for melanoma classification. Similarly, Harangi et al. [26] presented a hybrid CNN model consisting of GoogLeNet, AlexNet, ResNet, and VGGNet to classify skin moles, fusing the outputs of the individual networks to generate the final result. These methods [24, 26, 33] exhibit good melanoma classification performance but are computationally complex. Li et al. [30] presented a methodology for the automated classification of melanoma lesions: a Lesion Feature Network (LFN) computed the image features, a DL architecture containing two fully convolutional residual networks (FCRN) consecutively generated the segmentation and classification results, and a lesion index calculation unit (LICU) refined the classification output by computing a distance heat-map. Hosny et al. [27] introduced a CNN-based framework for skin lesion classification in which data augmentation and transfer learning, along with AlexNet, were applied to classify skin moles into three classes. Yap et al. [54] presented a methodology for melanoma classification that employed a ResNet DL model to extract image features and then combined the extracted key points with the patient's medical history. This method [54] provides reasonable melanoma classification accuracy; however, it is unable to perform well in scenarios where the patient's metadata is missing. Al-Masni et al. [2] utilized the VGGNet model for the classification of skin cancer moles; this approach [2] exhibits good classification performance but suffers from over-fitting. Yang et al. [53] proposed a DL-based framework for the automated segmentation and classification of melanoma moles. Initially, a region average pooling (RAPooling) approach computed the keypoints from the region of interest; then an end-to-end classification network was designed together with the segmentation process, using the localized lesion area to guide the classification through RAPooling; finally, a RankOpt-based classifier performed the melanoma classification. The approach in [53] performs well for skin mole classification; however, it is computationally inefficient. Overall, the existing literature on melanoma detection struggles to achieve good classification performance under variations in illumination conditions, the color and texture of skin moles, contrast, blurring, and high-density noise. Additionally, DL-based approaches are computationally complex. Thus, to address these limitations, there is a dire need for a more effective and efficient melanoma detection method.

3 Materials and methods

This work presents a novel technique for the automatic detection and classification of melanoma lesions using the Faster-RCNN deep-learning model and an SVM classifier. First, the input image is preprocessed to eliminate unnecessary artifacts that could degrade the classification results of the presented methodology. We then apply Faster-RCNN to the processed images to compute the deep features and localize the melanoma moles. Finally, these deep features are used to train a binary SVM that classifies melanoma lesions into benign and malignant classes. The architecture of the proposed framework is shown in Fig. 1.

Fig. 1 Architecture of the proposed method

3.1 Preprocessing

In real-life scenarios, it is practically impossible to obtain images without lighting or chrominance variations, which occur due to sudden changes in the illumination source or light reflected by the skin. The presence of such unnecessary information in the training images can affect the detection performance of the proposed network, and the presence of hair and tiny blood vessels in skin mole images likewise affects classification performance. Therefore, a light adjustment step is performed on the input images to address these problems. For this purpose, a morphological closing operation is applied to remove the undesirable information from the images, and an unsharp filter [38] is then employed to reduce the smoothing introduced by the closing operation and restore the quality of the images. We apply the morphological closing operator as follows:

$$ I_m(x,y)=(I(x,y)\oplus S)\ominus S $$
(1)

where I(x, y) is the input image, x and y are pixel locations, and S is a square structuring element of size 10 applied at angles of 90° and 180° for every pixel. Im(x, y) is the processed image, free of hair and tiny blood vessels. Although this operation effectively extracts the skin mole region without hair or tiny blood vessels, the closing operation blurs the processed image. To reduce this blurring effect, we apply the unsharp filter as follows:

$$ I_{us}(x,y)=I_m(x,y)\times \varpi(x,y) $$
(2)

Where

$$ \varpi(x,y)=-\frac{1}{\pi \sigma^4}\left[1-\frac{x^2+y^2}{2\sigma^2}\right]e^{-\frac{x^2+y^2}{2\sigma^2}} $$
(3)

The final processed image If(x, y), which retains only the information required for melanoma lesion classification, is obtained using Eq. (4):

$$ I_f(x,y)=I(x,y)-I_{us}(x,y) $$
(4)
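For concreteness, the following is a minimal sketch of this preprocessing stage, assuming OpenCV and NumPy, and treating the unsharp step of Eq. (2) as spatial filtering of the closed image with the kernel of Eq. (3). The structuring element size follows the text; the σ value is an illustrative choice.

```python
import cv2
import numpy as np

def preprocess(image, size=10, sigma=1.0):
    """Hair/vessel removal (Eq. 1) followed by unsharp correction (Eqs. 2-4)."""
    # Eq. (1): morphological closing with a square structuring element S
    S = cv2.getStructuringElement(cv2.MORPH_RECT, (size, size))
    I_m = cv2.morphologyEx(image, cv2.MORPH_CLOSE, S).astype(np.float32)

    # Eq. (3): negative Laplacian-of-Gaussian kernel w(x, y)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2
    w = -(1.0 / (np.pi * sigma ** 4)) * (1 - r2 / (2 * sigma ** 2)) \
        * np.exp(-r2 / (2 * sigma ** 2))

    # Eq. (2): response of the closed image to the unsharp kernel
    I_us = cv2.filter2D(I_m, -1, w.astype(np.float32))

    # Eq. (4): subtract the unsharp response from the original image
    I_f = image.astype(np.float32) - I_us
    return np.clip(I_f, 0, 255).astype(np.uint8)
```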

After image enhancement, the processed images are resized to 227 × 227 pixels using bicubic interpolation to reduce the computational complexity of the proposed approach. Finally, an augmentation step increases the number of samples to eight times the original 900 images in the dataset: the resized images are rotated by 0°, 90°, 180°, and 270° and flipped horizontally, as sketched below.
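A minimal sketch of the resizing and eight-fold augmentation described above, again assuming OpenCV and NumPy:

```python
import cv2
import numpy as np

def augment(image):
    """Resize to 227x227 and produce the 8 augmented views per sample."""
    resized = cv2.resize(image, (227, 227), interpolation=cv2.INTER_CUBIC)
    views = []
    for k in range(4):                    # rotations of 0, 90, 180, 270 degrees
        rotated = np.rot90(resized, k)
        views.append(rotated)
        views.append(np.fliplr(rotated))  # horizontally flipped copy
    return views                          # 8 views in total
```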

3.2 Feature extraction using Faster-RCNN

Effective feature extraction is mandatory for the accurate classification of melanoma lesions. However, finding an effective feature descriptor is complex: with a large feature set, the model may learn incoherent characteristics such as noise and illumination changes, while with a small feature set it may skip important attributes, i.e., the texture and chrominance variations that distinguish melanoma from normal skin. To obtain a discriminative and efficient feature set, an automated feature extraction technique should be selected rather than hand-coded key-feature extraction approaches, which cannot efficiently detect melanoma moles because of variations in their size, texture, and color. To overcome these limitations, we employ the Faster-RCNN deep-learning model, which learns a set of deep features directly from the images; its convolution filters compute the keypoints of the input image effectively by examining the image structure. The motivation for employing Faster-RCNN over RCNN [23] and Fast-RCNN [22] for lesion classification is that the latter two depend on generic object proposals computed through hand-coded techniques such as selective search [17] or EdgeBox [50]; RCNN and Fast-RCNN are therefore computationally more complex than Faster-RCNN.

Faster-RCNN, which better addresses the limitations of RCNN and Fast-RCNN, comprises two parts: i) a Region Proposal Network (RPN), and ii) Fast-RCNN. The RPN, being a fully convolutional module, automatically generates object proposals for the input image, which are then passed to and refined by the Fast-RCNN module. Both modules share the same convolutional layers, so the input image passes through the CNN only once to produce and refine its object proposals. For melanoma skin lesion classification, obtaining the features of interest is challenging for two reasons: i) determining the actual position of the objects (i.e., lesions) in the input image, and ii) determining the class of each object (i.e., benign or malignant).

Faster-RCNN can efficiently detect and classify melanoma lesions of different classes by employing its fully convolutional RPN and Fast-RCNN modules, which replace the selective search algorithm. The RPN module uses the main attributes of melanoma moles, such as shape, color, size, and texture, to locate their positions in the input image. It works with fewer selected windows while achieving higher recall, which helps reduce the feature computation cost of the proposed framework.

The Faster-RCNN approach performs the following steps:

Convolution layers

Faster-RCNN uses a fully convolutional network consisting of 13 convolutional and ReLU layers along with 4 pooling layers. These convolutional layers compute the feature map of the input image, which is later shared with the RPN module and related layers.

RPN

This step generates the object proposals. The RPN module consists of a 3 × 3 fully convolutional network that creates the anchors and bounding-box regression offsets. The module employs a softmax function to determine whether each computed anchor belongs to the foreground or the background. Finally, the produced anchors and bounding boxes are used to compute the object proposals.

RoI pooling

This layer takes the feature map computed by the convolutional layers and the proposals from the RPN module, generates the proposal feature maps, and shares them with the subsequent layers of the network.

Classification

Finally, the classification step determines the class of each detected object (skin lesion) from the output of the RoI pooling layer. Bounding-box regression gives the final location of each detected box.

The convolution layers perform the mapping between the input and output by using Eq. (5).

$$ x_{j+1}=I(x_j,w_j) $$
(5)

Here xj and xj+1 are the input and output of the jth layer, respectively, and wj denotes the weights and biases of that layer. I(xj, wj) denotes the dot product between the weights and the input regions. Next, the activation function transforms the weighted input of each node into its output. Here, we use the ReLU layer as the activation function, which performs pixel-wise activation on x as follows:

$$ I(x)=\max(0,x) $$
(6)

Where

$$ I(x)=\begin{cases}0 & \text{if } x<0\\ x & \text{otherwise}\end{cases} $$
(7)

Additionally, max-pooling layers are used to perform downsampling operations to reduce the spatial size by using Eq. (8).

$$ I(X)=\max(x_1,x_2,x_3,\dots,x_n),\quad \text{where } x_n\in X $$
(8)

Here, I(X) denotes the pooled feature vector. A detailed description of the employed Faster-RCNN network is given in Table 1. The workflow of melanoma lesion localization through Faster-RCNN consists of four steps. First, the input image is passed to the convolutional layers to extract the feature map; second, the computed features are passed to the RPN module to obtain the keypoint information of the region proposals; next, the RoI pooling layer generates the proposal feature maps from the convolutional feature map and the RPN proposals; finally, bounding-box regression gives the resultant location of the detected melanoma lesion.

Table 1 The Faster-RCNN architecture
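For concreteness, the sketch below shows a comparable two-module detection setup using torchvision's reference Faster-RCNN. Note the assumptions: a ResNet-50-FPN backbone stands in for the 13-layer network of Table 1, which torchvision does not provide, and the detector separates only background from lesion, since the benign/malignant decision is made later by the SVM.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Build the detector and replace its box predictor with a 2-class head:
# 0 = background, 1 = lesion (benign vs. malignant is left to the SVM).
model = fasterrcnn_resnet50_fpn(weights=None)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# Inference on one preprocessed 227x227 RGB sample.
model.eval()
with torch.no_grad():
    image = torch.rand(3, 227, 227)         # stand-in for a real image tensor
    detections = model([image])[0]          # dict with boxes, labels, scores
print(detections["boxes"].shape, detections["scores"][:5])
```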

3.3 Training parameters

To minimize the melanoma classification error, we use stochastic gradient descent for weight optimization. After thorough investigation, the training employs a mini-batch size δ = 128, a learning rate α = 0.001, and a learning-rate drop factor β = 0.1 with a drop period of 130. During training, the learning rate is adjusted automatically using a piecewise learning schedule. To eliminate the lesion localization error and obtain an optimized cost function, we train for 100 epochs; that is, Faster-RCNN training is repeated 100 times over the data to achieve robust localization results. During the testing phase, to accurately localize the melanoma moles, Faster-RCNN uses a greedy overlap criterion between the ground-truth and predicted boxes, the intersection-over-union (IoU), with a threshold of 0.5: the deep keypoints are fed into the softmax layer of Faster-RCNN to calculate confidence scores, and only scored proposals with IoU values greater than 0.5 are selected. Furthermore, a TensorFlow Faster-RCNN implementation is used together with the TensorBoard tool to determine the number of batch cycles required to reach an acceptable loss value. Fig. 2 presents the training loss curve of our work; the training loss reaches an optimal value of 0.0021 by batch 3000, indicating robust learning of our model. A sketch of these optimizer settings follows.
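For illustration only, this PyTorch-style sketch mirrors the stated settings; the paper used a TensorFlow implementation, and both the placeholder network and the reading of the 130-step drop period are our assumptions.

```python
import torch

net = torch.nn.Linear(10, 2)  # placeholder for the Faster-RCNN network

# SGD with alpha = 0.001; the piecewise schedule drops the rate by
# beta = 0.1 every 130 steps (our reading of the stated drop settings).
optimizer = torch.optim.SGD(net.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=130, gamma=0.1)

for epoch in range(100):  # 100 training epochs
    # ... iterate mini-batches of size 128, forward/backward, optimizer.step() ...
    scheduler.step()
```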

Fig. 2 Training loss graph

3.4 Classification using SVM

SVM [16] is a mathematical framework that constructs hyperplanes to find the decision boundary for classification. Compared with other traditional approaches [18, 29], SVM handles the curse of dimensionality effectively and reduces empirical error while keeping the complexity of the mapping function in check. These properties allow it to generalize well to new data samples. The main reasons for selecting SVM for melanoma classification are its robustness and its ability to cope with over-fitted training data.

After extracting the deep features, we use them to train the SVM to classify melanoma into two classes, i.e., benign and malignant. The training data consists of N melanoma feature vectors (x(i), y(i)), i = 1, …, N, where y(i) ∈ {+1, −1} indicates the malignant and benign classes, respectively. For the feature vectors x(i), SVM finds a hyperplane that linearly separates the two classes:

$$ w^T x^{(i)}+\beta \ge 1\quad \text{if } y^{(i)}=+1 $$
(9)
$$ w^T x^{(i)}+\beta < 1\quad \text{if } y^{(i)}=-1 $$
(10)

where w is the weight vector and β is the bias. The objective is to maximize the margin between the two classes by minimizing the norm ‖w‖, which can be posed as the quadratic optimization problem in Eq. (11):

$$ \min \left\Vert w\right\Vert,\ \text{such that } y^{(i)}\left(w^T x^{(i)}+\beta\right)\ge 1 $$
(11)

The two melanoma classes (benign and malignant) are identified by applying the discriminant function f(x(i)) = sign(wT x(i) + β) as follows:

$$ \begin{cases}\text{malignant}, & f\left(x^{(i)}\right)=+1\\ \text{benign}, & f\left(x^{(i)}\right)=-1\end{cases} $$
(12)
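As a sketch of this classification stage, assuming scikit-learn; the feature dimensionality is hypothetical and random vectors stand in for the Faster-RCNN deep features:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4096))             # stand-in deep feature vectors
y = np.where(rng.random(200) > 0.5, 1, -1)   # +1 = malignant, -1 = benign

clf = SVC(kernel="linear")                   # linear hyperplane w^T x + beta
clf.fit(X, y)
print(clf.predict(X[:5]))                    # Eq. (12): +1 malignant, -1 benign
```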

4 Results

This section gives a thorough description of the results obtained after evaluating the performance of the proposed method. Additionally, the details of the dataset are also provided in this section.

4.1 Dataset

The presented technique is evaluated on the standard International Skin Imaging Collaboration (ISIC) dataset from the "International Symposium on Biomedical Imaging (ISBI) challenge on skin lesion analysis towards melanoma detection 2016" [25]. This dataset consists of 1279 samples belonging to two melanoma classes, i.e., malignant and benign. The training collection contains 900 images, of which 173 belong to the malignant class and the remaining 727 to the benign class; the test collection contains 379 images, of which 75 are malignant and the remaining 304 benign. All images in the ISIC-2016 dataset were reviewed by a panel of dermatologists. ISIC-2016 is a pathology-verified database for melanoma classification whose images contain various artifacts, i.e., variations in lesion size, color, texture, and illumination conditions, and the presence of hair and tiny blood vessels, which makes it a challenging dataset. Sample images from the ISIC-2016 dataset are shown in Fig. 3.

Fig. 3 Sample images from the ISIC-2016 dataset

4.2 Evaluation metrics

We employed the sensitivity (sen), specificity (spe), accuracy (acc), mean average precision (mAP), and intersection over union (IOU) metrics to evaluate the results of the introduced technique, computed as follows:

$$ sen=\frac{tp}{tp+fn} $$
(13)
$$ spe=\frac{tn}{tn+fp} $$
(14)
$$ acc=\frac{tp+tn}{tp+fp+tn+fn} $$
(15)
$$ mAP=\operatorname{mean}\left(\frac{tp}{tp+fp}\right) $$
(16)
$$ IOU=\frac{tp}{tp+fn+fp} $$
(17)

where tp, tn, fp, and fn are representing the true positive, true negative, false positive, and false-negative cases, respectively.
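A small sketch computing these metrics from confusion-matrix counts (standard definitions, as given above; the counts below are illustrative):

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Eqs. (13)-(17) from raw confusion-matrix counts."""
    sen = tp / (tp + fn)                       # sensitivity, Eq. (13)
    spe = tn / (tn + fp)                       # specificity, Eq. (14)
    acc = (tp + tn) / (tp + tn + fp + fn)      # accuracy, Eq. (15)
    precision = tp / (tp + fp)                 # averaged per class for mAP, Eq. (16)
    iou = tp / (tp + fn + fp)                  # intersection over union, Eq. (17)
    return sen, spe, acc, precision, iou

print(evaluation_metrics(tp=70, tn=290, fp=14, fn=5))  # illustrative counts
```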

4.3 Melanoma localization

This section discusses the experiment used to evaluate melanoma localization performance. For this experiment, we used all samples from the ISIC-2016 dataset and report visual results for 30 images. Because of the imbalanced distribution of samples in the database, we employed the data augmentation approach explained in Section 3.1, which improves sample diversity and reduces the risk of overfitting. The proposed approach is then evaluated on all test samples of the database. Moreover, the localization power of Faster-RCNN enables it to accurately detect and differentiate melanoma moles from the surrounding skin.

We use boxplots to show the localization results in terms of mAP and mean IOU, as shown in Fig. 4. The regression layer of Faster-RCNN localizes the melanoma moles well: we achieved an average mAP of 0.902 and a mean IOU of 0.932, indicated by the red line inside each box (Fig. 4). The qualitative results are shown in Fig. 5. The resultant images show that the presented framework precisely localizes melanoma moles even in the presence of skin marks, hair, blood vessels, and clinical swatches. Moreover, our approach accurately detects melanoma lesions of varying sizes and orientations.

Fig. 4 Melanoma localization results

Fig. 5 Localization of melanoma regions

Although our approach is robust in localizing skin lesions under noise, hair, tiny blood vessels, and contrast and chrominance variations, there are still cases in which it may not work accurately. Fig. 6 shows samples on which our method fails to detect the melanoma lesion. The false detections are due to intense variations in light intensity, which result in an extreme match between the skin color and the melanoma lesion.

Fig. 6 Sample images of inaccurately localized melanoma lesions

4.4 Melanoma classification

This section discusses the classification results, demonstrating the effectiveness of the presented solution in classifying melanoma moles. To evaluate classification performance, several experiments were performed on the test images of the ISIC-2016 dataset using the SVM classifier. As discussed earlier, the training database was enlarged through data augmentation, so more than 7000 images localized through Faster-RCNN were used to train the SVM. The SVM classifier's ability to deal with over-fitted training data enables it to classify the melanoma lesions well. Classification results are shown in Fig. 7, which presents the input images, the localized melanoma regions, and the classification of each localized region as benign or malignant; the third column shows benign cases and the last column malignant cases. The reported results show that the SVM classifier accurately classifies both benign and malignant cases. Table 2 reports the class-wise evaluation results in terms of accuracy, sensitivity, and specificity. Our method achieves an average accuracy of 0.891 together with average sensitivity and specificity of 0.859 and 0.870, respectively, showing the proficiency of the introduced framework. Moreover, the confusion matrix is reported in Fig. 8.

Fig. 7 Classification results

Table 2 Stage-wise performance of the proposed method
Fig. 8 Confusion matrix of the proposed method

4.5 Comparative analysis

We performed an experimental analysis comparing the proposed method against the work of the top eight teams [25] of the ISBI-2016 challenge; the results are reported in Fig. 9. The teams are ranked by their specificity scores. As Fig. 9 shows, our approach attains the highest specificity and average AUC values: a specificity of 0.870 and an average AUC of 0.843, greater than those achieved by all the compared works [25]. Moreover, these networks are computationally more complex than our approach, as they have deeper architectures. From this comparative analysis, we conclude that the presented methodology achieves effective and efficient classification owing to the accurate localization of melanoma lesions by Faster-RCNN.

Fig. 9 Comparison of classification results with other techniques

4.6 Comparison with base models

We also evaluated our approach against recent base models: DenseNet-201 [2], ResNet-50 [2], Inception-v3 [2], and Inception-ResNet-v2 [2]. The results are reported in Table 3 and Fig. 10. The presented framework achieves an average specificity of 0.870, whereas DenseNet-201, ResNet-50, Inception-v3, and Inception-ResNet-v2 achieve 0.652, 0.679, 0.662, and 0.714, respectively. The average sensitivity of our method is 0.859, against 0.812, 0.799, 0.770, and 0.818 for the same models, and our average AUC is 0.843, against 0.736, 0.739, 0.716, and 0.766. These results show that our method outperforms the comparative techniques in terms of AUC, sensitivity, and specificity. The comparative methods employ very densely layered deep networks that can easily suffer from over-fitting; since the SVM classifier in our technique can cope with an over-fitted model, our method is more robust for melanoma classification. Furthermore, the presented framework is applicable to future medical practice, as it requires minimal time (0.004 s) to process a dermoscopic image, less than all the other approaches (Table 3).

Table 3 Comparative results
Fig. 10 Graphical representation of comparative results

4.7 Performance comparison with state-of-the-art techniques

This experiment evaluates our approach for melanoma lesion classification against the latest state-of-the-art methods [51, 55,56,57], with all experiments run on an Nvidia GTX 1070 GPU-based system. The comparative results in terms of acc, mAP, and AUC are reported in Fig. 11. Our approach achieves an accuracy of 89.1%, an mAP of 90.2%, and an AUC of 84.3%, the highest among all compared methods. The introduced framework attains an average AUC of 0.843 against an average of 0.818 for the comparative approaches, a 2.47% performance gain. Similarly, the mAP of our method is 0.902 against an average of 0.647 for the comparative approaches, a 25.4% gain. In terms of average accuracy, our work achieves 0.891, whereas the other approaches average 0.849, a 4.1% gain, so our method is more robust for skin lesion classification than the comparative approaches. The stable performance of the presented framework stems from generating feature maps over region proposals: the methods in [51, 55,56,57] operate directly on entire images, which leads to misclassification in the presence of complex artifacts (i.e., hair, tiny blood vessels, noise, etc.), whereas our region proposal network contributes to precise localization of the melanoma moles even for input samples suffering from such artifacts. Hence, our approach is more accurate for skin lesion classification than the methods used for comparison (Fig. 11).

Fig. 11 Comparison of the presented model with the latest techniques

4.8 Cross-dataset validation

We conclude the evaluation of our framework with cross-dataset validation, which estimates the robustness of the presented method in handling training and testing complexities and shows its suitability for real-world scenarios. We use the ISIC-2017 dataset [15] for this purpose. The ISIC-2017 database contains 2750 samples, of which 517 are malignant and the remaining 2223 benign. Its images are diverse in viewing angle and lighting conditions and contain artifacts such as dark corners, skin hair, and ruler markers. We consider the following scenarios: (a) training on the ISIC-2016 dataset and testing on the ISIC-2017 database, and (b) training on the ISIC-2017 dataset and testing on the ISIC-2016 database. Fig. 12 presents classified visual results for the ISIC-2017 dataset. In the cross-dataset validation process, our work attains average training accuracies of 0.9125 and 0.9122 and test accuracies of 0.8941 and 0.9070 on the ISIC-2017 and ISIC-2016 databases, respectively (Figs. 13 and 14). Through cross-database validation, we therefore conclude that our approach can be used in real-world scenarios to assist dermatologists with any presentation of a melanoma lesion.

Fig. 12 Classification results over the ISIC-2017 dataset

Fig. 13 Cross-validation over the ISIC-2017 dataset

Fig. 14 Cross-validation over the ISIC-2016 dataset

5 Conclusion and future work

This paper presented a novel approach for the automated classification of melanoma lesions that combines the Faster-RCNN deep-learning technique with an SVM classifier. To our knowledge, this is the first application of Faster-RCNN to melanoma lesion classification: we employ Faster-RCNN for deep feature extraction and melanoma detection, and then use the deep features to train the SVM for classification. The proposed method effectively localizes the melanoma region in the input image and classifies it as benign or malignant. Our approach is robust to various artifacts, i.e., noise, blurring, chrominance changes, light variations, melanoma size, and the presence of hair and tiny blood vessels. Experimental results confirm that the presented framework outperforms the latest existing techniques. In the future, we plan to extend our technique to other skin diseases as well.