1 Introduction

Skin cancer arises from the uncontrolled proliferation of skin cells with damaged DNA. It is among the most pervasive cancers in the world [40] and is the most widespread malignant disease among white people, although its incidence is increasing on a global scale [32]. Malignant melanoma is the type of skin cancer with the highest mortality rate: about 55,000 deaths from melanoma are reported worldwide each year, equal to 0.7% of all cancer deaths, although death rates vary widely from country to country. Due to excessive exposure to ultraviolet rays, the annual incidence of melanoma increased by 53% from 2008 to 2018 [14].

Although melanoma is rare, it is the most malignant type of skin cancer, and the number of deaths is increasing. The American Cancer Society reported that 100,350 new cases were detected in 2020 and that an estimated 6,850 people died of melanoma [44]. Although melanoma is one of the most fatal kinds of skin cancer, early detection greatly increases the chances of survival. The low incidence of the disease results in limited real image data, which is a major handicap in the application of image processing and machine learning techniques. The first step in diagnosing a malignant lesion by a dermatologist is a visual inspection of the suspicious skin area. Accurate diagnosis is crucial because of the similarity of some lesion types, and diagnostic correctness correlates with the expert's professional experience [20]. When skin cancer is detected early, definitive treatment is highly likely. Non-invasive optical methods are available for skin cancer screening and give a quick response; among them, the most widely used is dermoscopy. Dermoscopy is an imaging technique that produces a magnified and illuminated image of a pigmented area on the skin for accurate diagnosis. Removing the surface reflection of the skin improves the visibility of deeper skin levels and gives a more detailed view of skin lesions. In doubtful cases, visual inspection is assisted by dermoscopic images taken with magnifying, high-resolution cameras; to see the deeper skin layers, the illumination is controlled with a filter that minimizes reflections on the skin during recording. Dermoscopic evaluation gives much higher accuracy than naked-eye evaluation, and dermoscopic images are mostly analyzed by visual inspection; the correctness of skin lesion diagnosis can be improved thanks to this technical support [7, 8, 33, 47, 52]. Traditional methods such as visual inspection, clinical screening, biopsy, histopathological examination, and dermoscopic analysis of skin lesions require a high degree of skill, concentration, and time [1, 8, 52]. Even when the diagnosis of skin cancer is made by expert dermatologists, it can be erroneous due to factors such as varying shapes, indistinct borders, low contrast, skin hairs, oils, and air bubbles in skin lesions. Under these circumstances, the development of rapid and accurate computer-aided diagnostic systems for skin cancer detection and classification is becoming increasingly important. Moreover, diagnostic accuracy can vary widely among professionals with different levels of experience. Consequently, there is great interest in screening programs and in the development of semi- or fully automated computer-aided diagnostic systems that can serve as a stand-alone second opinion. Artificial intelligence models are the most widely used approaches in such computer-aided diagnosis systems [11, 12, 15]. In particular, the use of deep learning approaches in medical image classification increased after the success of the AlexNet model by Krizhevsky et al. [27] in the ImageNet 2012 competition.

Deep learning models have been frequently used in the classification of skin cancer images. Brinker et al. [6] split skin cancer images into two classes (melanoma and nevus). In their ResNet-50 model, a distinct learning rate was used for each layer instead of a constant learning rate, and cosine-based schedules were used to reduce the learning rates; a sensitivity of 82.3% was achieved with this method. Hosny et al. [24] augmented each image in the dataset using data augmentation and used a transfer learning approach, achieving a classification accuracy of 95.91%. Esteva et al. [13] divided the dataset into two classes; in the preprocessing step, the images were processed with Gaussian filtering, and 87.81% accuracy was achieved with a deep learning model called AdNet. Nugroho et al. [34] divided the HAM10000 skin cancer dataset into seven classes; their designed and trained Convolutional Neural Network (CNN) model reached 78% classification accuracy. Alqudah et al. [5] used pre-trained AlexNet and GoogLeNet models to recognize three classes of skin cancer images. The dataset was used in two formats, unsegmented and segmented; classification accuracy was 89.8% for the unsegmented dataset and 92.2% for the segmented dataset. Moataz et al. [30] proposed a fine-tuned pre-trained Xception model to recognize seven classes of skin cancer images. They performed data augmentation to improve model performance, using 30,294 images for training, 7,574 for validation, and 7,714 for testing, and obtained 96% average classification accuracy on the augmented and balanced HAM10000 dataset. Chaturvedi et al. [10] proposed fine-tuning the Xception architecture on the HAM10000 dataset (10,015 images in total: 8,912 for training and 1,103 for validation). They modified the Xception model with a dense layer with ReLU activation, a softmax layer for seven classes, and the Adam optimizer; the proposed method detected cancer with 91.47% accuracy. Aldwgeri and Abubacker [3] used and modified deep learning models (VGG16, VGG19, ResNet50, DenseNet121, InceptionV3, and Xception) to classify skin lesions on the HAM10000 dataset, for both balanced and unbalanced versions; the highest reported accuracy was 80% for their ensemble model. Kassani et al. [26] studied different deep learning models to detect melanoma on the augmented HAM10000 dataset and reported the highest accuracies as 92% with ResNet50 and 90% with the Xception model. Cengil et al. [9] used AlexNet and ResNet architectures and created hybrid architectures from these two models; instead of the softmax classifier in the last layer, they used a decision tree, kNN, and SVM for classification. The highest reported accuracy was 77.8% with AlexNet+SVM on the HAM10000 dataset.

The literature review shows that although there are different diagnosis and classification methods for skin cancer, many gaps still need to be addressed, for instance, complex configurations, the high complexity of some studies, and low accuracy. Most of the skin lesion diagnosis systems in the literature give reasonable classification results for distinguishing malignant melanoma from benign lesions. However, the performance of most machine learning techniques depends on the features selected to characterize the cancerous region and requires high computation time: most of these studies were trained on a set of handcrafted image features and used simple classifiers. With deep learning techniques and CNNs, impressive results have been achieved in image classification for skin lesion analysis and automatic diagnosis of cancer types. In skin lesion classification, transfer learning techniques have been used to reduce computational and memory requirements, and data augmentation techniques have been used to overcome the lack of data. Rather than training a CNN from scratch, which requires large amounts of data and high computational cost, it is efficient to use a pre-trained CNN architecture (e.g., AlexNet, DenseNet, Inception, or ResNet) and fine-tune it to speed up the process.

The scope and contributions of this study could be summarized as follows:

  • In this study, effective data augmentation and a pre-trained deep learning approach are proposed for skin lesion classification.

  • A hybrid network model, Inception-ResNet-v2, is proposed to classify skin cancer images.

  • By applying the affine transformation technique, the number of images in the dataset is increased, and the effect of this augmentation on skin cancer classification accuracy is analyzed.

  • Performance comparison of the proposed method with other pre-trained methods is performed on an augmented skin cancer dataset.

The rest of this paper is organized as follows. Section 2 presents the material and method, explaining in detail the original and augmented datasets and the pre-trained Inception-ResNet-v2 architecture. Experimental results are given in Section 3, and the discussion and conclusion are given in Section 4.

2 Material and method

Deep learning-based models have recently been performing above human-level accuracy in classification tasks [49]. Hyperparameters have a great impact on the performance of these models, as does the size of the dataset on which they are trained.

In this study, effective data augmentation and a pre-trained deep learning approach are proposed for skin lesion classification. Figure 1 shows the general flowchart of the system design. A hybrid network model, Inception-ResNet-v2, is proposed to classify skin cancer images. We increased the number of images in the dataset by applying the affine transformation technique and analyzed its effect on the skin cancer classification system. The datasets used in this study are referred to below as the original and augmented datasets. The original dataset contains images to which no preprocessing has been applied; the augmented dataset consists of the images in the original dataset plus the new images obtained by applying the affine transform to them.

Fig. 1 General flowchart of system design

2.1 Original dataset

In this study, the public skin cancer MNIST HAM10000 dataset [48] was used to classify skin cancer. The dataset is an extensive catalog of multi-source dermoscopic images of pigmented lesions collected from distinct populations, as shown in Fig. 2. The classes contained in the dataset are given in Table 1, and the number of images per class is given in Table 2. The original dataset contains 10,015 dermoscopic images of skin lesions from the seven classes, gathered from different sources. The images have a size of 600 × 450 pixels in RGB format and are rescaled to 224 × 224 pixels for the model.
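For concreteness, the following is a minimal sketch of how the images can be loaded and rescaled to the 224 × 224 input size used in this study. The directory name, metadata file name, and helper function here are assumptions based on the public Kaggle release of the dataset, not part of the original pipeline.

```python
# A minimal sketch of loading HAM10000 images and rescaling them to 224 x 224.
# DATA_DIR and the metadata file name are assumptions (Kaggle release layout).
import os
import numpy as np
import pandas as pd
from PIL import Image

DATA_DIR = "HAM10000_images"                      # hypothetical image directory
metadata = pd.read_csv("HAM10000_metadata.csv")   # columns include image_id, dx

def load_image(image_id, size=(224, 224)):
    """Read one 600x450 RGB lesion image and rescale it to `size`."""
    path = os.path.join(DATA_DIR, image_id + ".jpg")
    img = Image.open(path).convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0   # normalize to [0, 1]

# Loading everything into memory is simplest for a sketch; a generator
# would be preferable for the full 10,015-image dataset.
X = np.stack([load_image(i) for i in metadata["image_id"]])
y = metadata["dx"].values   # seven diagnostic classes (akiec, bcc, ..., vasc)
```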

Fig. 2 Some example images from the skin cancer dataset

Table 1 The classes of the skin cancer MNIST HAM10000 dataset
Table 2 The number of images before and after the data augmentation process

2.2 Augmented dataset

The biggest problem that machine and deep learning algorithms face is insufficient data to train the model. The lack of sufficient data creates an overfitting problem, which occurs frequently in these algorithms: the network memorizes the training data and fails when it encounters an input other than the training data. One of the most important ways to mitigate this problem is data augmentation. This method is applied to the training set, and many images are obtained artificially by changing the properties of the available data [4].

The size of the dataset affects deep learning and classification models, and creating a skin cancer dataset from scratch is a difficult and time-consuming task. Also, some classes contain far more images than others, as shown in Table 2. Such unbalanced data can lead to lower performance on the minority classes and to misclassification in most machine and deep learning approaches. The aim of our study is to generate new images from existing images with the affine transformation technique and to analyze the effect of these images on skin cancer classification. At this stage, we performed data augmentation to improve the performance of the Inception-ResNet model. Such image augmentations include brightness adjustment, rotation, shift, flip, and zoom [19, 50]. In this section, random rotation augmentation is used to enlarge the dataset.

Rotation augmentation randomly rotates the image clockwise by a certain number of degrees, from 0 to 360. Rotation moves pixels out of the image frame, leaving areas of the frame without pixel data that need to be filled in. Figure 3 shows random rotations between 0 and 90 degrees applied to an image. In this stage, each image in every class except the nv class was randomly rotated nine times, as shown in Fig. 3; with the original image included, the data in each of these classes therefore increased tenfold, as shown in Table 2. At the end of this process, the number of images in the dataset increased from 10,015 to 39,787, as shown in Table 2. The dataset image distributions before and after augmentation are given in Tables 3 and 4. The dataset is split into 70% training, 10% validation, and 20% testing, as shown in Tables 3 and 4.
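The rotation augmentation described above can be sketched with Keras' ImageDataGenerator, as below. Note this is an approximation: rotation_range=90 in Keras draws angles from the range [-90, 90], whereas the text describes clockwise rotations between 0 and 90 degrees; fill_mode handles the empty frame areas that rotation leaves behind.

```python
# A sketch of random-rotation augmentation producing nine variants per image,
# following the procedure described in the text (not the exact original code).
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# rotation_range=90 samples a random angle in [-90, 90] degrees per image;
# fill_mode="nearest" fills the frame areas left without pixel data.
augmenter = ImageDataGenerator(rotation_range=90, fill_mode="nearest")

def augment_image(image, n_variants=9):
    """Return `n_variants` randomly rotated copies of one (H, W, 3) image."""
    batch = np.expand_dims(image, axis=0)      # generator expects a batch axis
    flow = augmenter.flow(batch, batch_size=1)
    return [next(flow)[0] for _ in range(n_variants)]
```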

Fig. 3 A sample original image (a) from the vasc class and nine randomly rotated augmented images (b-j)

Table 3 Data split and dataset image distribution before augmentation (original dataset)
Table 4 Data split and dataset image distribution after augmentation (augmented dataset)

2.3 Method

In this study, a hybrid network called Inception-ResNet-v2, composed of Inception and residual modules, is proposed for skin lesion classification. A pre-trained model has already been trained on a dataset and includes learned weights and biases; the model represents the features of the dataset on which it was trained.

As shown in Fig. 4(a), the Inception network uses many convolution kernels of different sizes to improve the adaptability of the network and extract a rich set of feature representations. The Inception structure reduces the parameters of the model without losing feature representation capacity, keeping the number of convolution kernels as small as possible. Figure 4(b) shows a residual network structure. To make network training and parameter optimization fast, signals from different units and layers can be transmitted directly, forward and backward, to any layer. Residual connections are necessary in deep networks to prevent the degradation problem. Using residual networks helps the network learn depth and weights at the same time: the output of the previous layer (l) is passed unchanged to the new layer (l+1), so the new layer only has to learn something new (the residual) on top of it. Thus, this technique overcomes both the degradation and vanishing gradient problems in very deep networks. The number of feature maps of Xi may differ from the number of feature maps in the residual convolution branch, so a 1 × 1 convolution is required to increase or decrease the dimension. The residual operation is stated by (1), (2), and (3) as follows [36, 51]:

$$ F(X_{i}) = X_{i} *w + \alpha $$
(1)
$$ Y_{i} = R(F) + h(X_{i}) $$
(2)
$$ X_{i + 1} = R(Y_{i}) $$
(3)
$$ R(z) = \max (0,z) $$
(4)
$$ R(z) = \begin{cases} 0, & z < 0 \\ z, & z \ge 0 \end{cases} $$
(5)
Fig. 4 (a) An Inception building block [46]; (b) a residual building block [22]

In (1), Xi is the input, w is the weight, α is the offset, and F(Xi) denotes the convolution operation. In (2), R is the ReLU function, h(Xi) is a basic transformation of the input Xi, and Yi is the sum of the two branches. In (3), Xi+1 is the final output of the residual module. Many different activation functions could be used; the three most common are tanh, sigmoid, and the rectified linear unit (ReLU). In this study, the ReLU activation function (4) is used because it is simple, increases nonlinearity, and prevents network saturation. ReLU works well because it removes vanishing gradients, and it is used in the hidden layers. Its weak point is dead neurons: ReLU thresholds all negative values to zero, and its positive side has a fixed gradient of 1, as in (5). While z ≥ 0, R(z) = z and its derivative is 1; while z < 0, R(z) = 0 with a derivative of 0. As a result, ReLU does not saturate on the positive side. However, the gradient of ReLU with respect to the input is zero on the negative side, which means that once a ReLU neuron's pre-activation becomes negative, the gradient flowing to that neuron is zero and its weights can never be updated [51].

$$ \frac{\partial X_{n}}{\partial X_{i}} = \frac{\partial \left( X_{i} + \sum\limits_{j = i}^{n - 1} F(X_{j},\omega_{j},\alpha_{j}) \right)}{\partial X_{i}} = 1 + \frac{\partial}{\partial X_{i}} \sum\limits_{j = i}^{n - 1} F(X_{j},\omega_{j},\alpha_{j}) $$
(6)

The aim of using the residual learning unit is to prevent the gradient from disappearing entirely while training the Inception network model. When the performance of the network model reaches saturation, the residual layers can simply perform identity mapping, which enables the training network to converge faster and more easily. From the learning characteristics in (6), going from a shallow layer i to a deep layer n, we see that no matter how deep the network is, the gradient never reaches zero. In (6), Xi denotes the input of the i-th residual unit, Xn denotes the input of the n-th unit, and F(·) is the residual function [51].
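To make the two building blocks of Fig. 4 concrete, the following is a minimal Keras sketch written directly from Eqs. (1)-(3); it is an illustration of the principles, not the exact internals of Inception-ResNet-v2. The function names and filter counts are illustrative.

```python
# Sketches of the Inception block (Fig. 4a) and the residual block (Fig. 4b).
# In the residual block, F(X_i) is a convolution (Eq. 1), h(X_i) is a 1x1
# convolution when the number of feature maps changes (identity otherwise),
# and R is the ReLU of Eq. (4).
from tensorflow.keras import layers

def inception_block(x, filters):
    """Parallel convolution kernels of different sizes, concatenated (Fig. 4a)."""
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    pool = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    return layers.Concatenate()([b1, b3, b5, pool])

def residual_block(x, filters):
    """Residual unit following Eqs. (1)-(3) (Fig. 4b)."""
    f = layers.Conv2D(filters, 3, padding="same")(x)   # F(X_i) = X_i * w + alpha (Eq. 1)
    shortcut = x
    if x.shape[-1] != filters:                         # feature-map counts differ
        shortcut = layers.Conv2D(filters, 1, padding="same")(x)  # h(X_i) as 1x1 conv
    y = layers.Add()([layers.ReLU()(f), shortcut])     # Y_i = R(F) + h(X_i) (Eq. 2)
    return layers.ReLU()(y)                            # X_{i+1} = R(Y_i) (Eq. 3)
```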

In this study, it is proposed to classify skin cancer images with the Inception-ResNet-v2 architecture shown in Fig. 5. Inception-ResNet-v2 is a model created by combining the Inception and ResNet architectures, with improved recognition and classification performance: the Inception and residual modules benefit from each other to enhance detection accuracy and decrease the number of computations. Inception-ResNet-v2 is a convolutional neural network (CNN) that has been trained on the very large ImageNet database; the network has 164 layers and can classify images into a thousand object categories such as keyboard, pencil, mouse, and many animals [53]. Consequently, the network has learned rich feature representations for a wide variety of images. ResNet and Inception boost image recognition performance at low computational cost compared to other models: the ResNet architecture is about growing deep, while Inception is about growing wide. Therefore, with the Inception-ResNet-v2 architecture, we can achieve the optimum result in going both deep and wide. Inception-ResNet-v2 is based on the Inception architecture and includes residual connections. These connections allow shortcuts while the model is being trained, so researchers can set up deeper neural networks for better performance; they also significantly simplify the initial blocks. This structure allows optimization of the residual layer by changing the size of the first convolution operation to 1 × 1. The transfer of the previous activation value to the output continues even if learning stops [17, 22].

$$ f(\vec{z})_{i} = \frac{e^{z_{i}}}{\sum\limits_{j = 1}^{K} e^{z_{j}}}, \quad \text{for } i = 1,\ldots,K \text{ and } \vec{z} = (z_{1},\ldots,z_{K}) $$
(7)
Fig. 5 Proposed Inception-ResNet-v2 architecture for skin lesion classification

In this study, the Inception-ResNet-v2 model has 54,339,810 total parameters, of which 54,279,266 are trainable and 60,544 are non-trainable. As shown in Fig. 5, the top layers of the method contain a global average pooling layer, a fully connected layer (FCL) of 1024 neurons with the ReLU activation function, and a final layer of seven neurons with the softmax activation function that provides the classification into the seven classes. The softmax activation function is usually used when there are multiple classes to be predicted. For K classes, softmax is calculated by (7). In this equation, \(\vec{z}\) is the input vector to the softmax function, the zi values are the elements of the input vector, \(e^{z_{i}}\) is the standard exponential function of each element of the input vector, and \({\sum }_{j = 1}^{K} e^{z_{j}}\) is the normalization term in the denominator [31].
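A small numeric illustration of Eq. (7) may be helpful: the seven class scores (logits) produced by the final layer are mapped to probabilities that sum to 1. The logit values below are hypothetical.

```python
# Softmax of Eq. (7) applied to a hypothetical vector of seven class scores.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1, -0.5, 0.3, 1.5, -1.0])  # hypothetical scores
probs = softmax(logits)
print(probs, probs.sum())  # probabilities over the seven classes, summing to 1.0
```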

Global average pooling calculates the average of each feature map in the previous layer. A global average pooling layer precedes the fully connected layer at the end of the network to obtain the deep features, as shown in Fig. 5. This fairly simple operation significantly reduces the amount of data and prepares the model for the final classification layer; by averaging over the feature maps, it also helps to prevent overfitting. A dropout layer was used to further reduce overfitting during the training process: it randomly eliminates some nodes in the network to keep it from memorizing the training data, which improves the network's ability to generalize [28]. During the dropout process, randomly selected neurons in the network are assigned zero weight. The dropout ratio for this process was set to 0.5. Thus, the model becomes robust to small changes in the input and achieves a higher accuracy rate.
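The classification head described above can be sketched in Keras as follows. This is a minimal reconstruction from the description in the text (global average pooling, a 1024-neuron ReLU FCL, dropout 0.5, and a seven-class softmax output), not the authors' exact code.

```python
# A sketch of the proposed head on top of the pre-trained InceptionResNetV2
# backbone, following the architecture described in the text and Fig. 5.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionResNetV2

base = InceptionResNetV2(weights="imagenet", include_top=False,
                         input_shape=(224, 224, 3))

x = layers.GlobalAveragePooling2D()(base.output)     # deep features
x = layers.Dense(1024, activation="relu")(x)         # FCL with ReLU
x = layers.Dropout(0.5)(x)                           # dropout rate from the text
outputs = layers.Dense(7, activation="softmax")(x)   # seven lesion classes (Eq. 7)
model = models.Model(base.input, outputs)
```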

3 Experimental results

We compared the proposed Inception-ResNet-v2 method with others, seeking the best accuracy value among different deep networks. In this study, we ran each method ten times on different training and test sets, and each method was trained for 100 iterations. We performed all comparisons on the same machine and on the same dataset under the same conditions, and we recorded and compared the runtime of all methods in the experimental results. We used the classification report tool of the Python scikit-learn library to evaluate classification performance. All experiments were developed in the Python 3.10.0 Jupyter Notebook environment on a computer with an i7 (8700U) CPU @ 3.20 GHz, a 4 GB graphics card, and 16 GB of main memory. In addition, the Keras and TensorFlow libraries were used.

In the preprocessing stage, all images are uniformly rescaled to 224 × 224 to reduce the computational load. In the experimental studies, the data is split into 70% training, 10% validation, and 20% testing. We settled on the best parameters through quantitative experiments, and skin lesion classification was then carried out with those parameters. Our main aim is to demonstrate the effectiveness of the proposed model on a large skin cancer dataset by comparing it with well-known pre-trained deep learning models. Model evaluations are carried out using a running average of the parameters calculated over time. To allow a fair comparison between the various approaches, we kept the basic parameters fixed throughout all the experiments. Many hyperparameters help to adjust the accuracy of the approximation; we tuned them according to the accuracy of the model's experimental results. We used Adam, one of the most time-efficient and important adaptive optimizers for deep networks, with a learning rate of 0.0001 (β1 = 0.9, β2 = 0.999, ε = 10−8) and categorical cross-entropy as the loss function. We trained the models for 100 epochs with a batch size of 128 and used dropout (0.5) to help the network generalize. The momentum rate (0.9) and the weight decay parameter (10−5) were set accordingly, and we set the regularization parameter to 0.0001 to prevent overfitting.
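The training configuration above can be sketched as follows, reusing the model built in Section 2.3. The split variables (X_train, y_train, X_val, y_val) are hypothetical names for the 70/10/20 split described in the text.

```python
# A sketch of the stated training setup: Adam (lr=1e-4, beta_1=0.9,
# beta_2=0.999, epsilon=1e-8), categorical cross-entropy, 100 epochs,
# batch size 128.
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-4,
                             beta_1=0.9, beta_2=0.999, epsilon=1e-8),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(X_train, y_train,                    # 70% training split
                    validation_data=(X_val, y_val),      # 10% validation split
                    epochs=100, batch_size=128)
```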

At this stage, the performance of the method was evaluated with four criteria: recall, precision, F1-score, and accuracy (Acc). The confusion matrix gives these values for each class (6 = vasc, 5 = nv, 4 = mel, 3 = df, 2 = bkl, 1 = bcc, 0 = akiec). The performance of the method was evaluated according to the accuracy value calculated over the confusion matrices for the original and augmented datasets, as shown in Fig. 6. Performance scores of the Inception-ResNet-v2 model on the original dataset are given in Table 5, which presents the recall, F1-score, and precision for each class as well as the averaged results. In this experiment, the accuracy obtained with the Inception-ResNet-v2 model is 83.59% on the original dataset. Similarly, the performance scores of the Inception-ResNet-v2 model on the augmented dataset are given in Table 6; the accuracy obtained is 95.09%. A comparison of the overall accuracy rates of the Inception-ResNet-v2 model on the original and augmented datasets is shown in Table 7.
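A minimal sketch of this evaluation step with scikit-learn's classification report is shown below; X_test and one-hot y_test are hypothetical names for the 20% test split.

```python
# Per-class precision, recall, F1-score and the confusion matrix, as
# produced by scikit-learn's classification report tool mentioned above.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_prob = model.predict(X_test)
y_pred = np.argmax(y_prob, axis=1)
y_true = np.argmax(y_test, axis=1)   # recover class indices from one-hot labels

print(confusion_matrix(y_true, y_pred))
print(classification_report(
    y_true, y_pred,
    # class index order as stated in the text: 0=akiec ... 6=vasc
    target_names=["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]))
```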

Fig. 6 Confusion matrices of the Inception-ResNet-v2 model for the original and augmented datasets

Table 5 Performance scores of the Inception-Resnet-v2 model for original dataset
Table 6 Performance scores of the Inception-Resnet-v2 model for augmented dataset
Table 7 Comparison of overall accuracy rates from original and augmented datasets with Inception-Resnet-v2 model

Figure 7(a) shows the training/test accuracy and the training/test loss over 100 iterations of the Inception-ResNet-v2 model for the original dataset; Fig. 7(b) shows the same curves for the augmented dataset. Both the accuracy and the loss curves indicate that as the number of iterations increases, learning occurs: test accuracy improves and the test error rate decreases, as shown in Fig. 7(a) and (b).

Fig. 7 Accuracy and loss graphs of the Inception-ResNet-v2 model for (a) the original dataset and (b) the augmented dataset

In addition, the proposed Inception-ResNet-v2 model was compared with other pre-trained models in terms of accuracy. Table 8 shows the performance of the different pre-trained methods on the augmented skin cancer dataset: VGG16, VGG19, SqueezeNet, LeNet-5, AlexNet, and an established deep CNN model consisting of four sequential convolution-pooling layers, one flatten layer, three fully connected layers, and a softmax classifier. We compared the proposed method with the others in terms of the best accuracy achieved and the number of trainable parameters; other pre-trained models studied in our experiments that gave low accuracy are not included in this table. The proposed Inception-ResNet-v2 model achieved the highest accuracy, 95.09%, among all methods, as shown in Table 8 and the boxplot in Fig. 8. Each method was run ten times on different training and test sets for 100 iterations, the average accuracy of each method was calculated, and the boxplot was formed from the min-max and average values of each method.

Table 8 Performance comparison of the different pre-trained methods on augmented skin cancer dataset
Fig. 8 Boxplot graph of the studied pre-trained methods for accuracy

In addition, the execution times of the different pre-trained methods were analyzed, taking into account the training time over approximately 30k images for 100 iterations. Depending on the complexity of the method, the depth of the network, and the number of trainable parameters, high computation times can result. The execution time of the Inception-ResNet-v2 model was approximately 1 h 50 min 4 s, as shown in Table 8. According to this analysis, the lowest execution time during training was obtained with the Inception-ResNet-v2 model, and the lowest system response time in the test evaluations was also obtained with this model. Furthermore, Table 9 compares recent similar studies on the MNIST HAM10000 dataset. By applying effective data augmentation and a pre-trained deep learning approach informed by the best-practice methods in the literature, we achieved a classification accuracy of 95.09% on the augmented dataset; in comparison, the Inception-ResNet-v2 model gives 83.59% accuracy on the original dataset, as shown in Table 9.

Table 9 Performance comparison of similar studies in the literature on MNIST HAM10000 dataset

4 Discussion and conclusion

Skin cancer is a common and serious disease that causes death if left untreated; if it is not diagnosed early, it can lead to fatal cases. Dermoscopic images are of great importance in the early diagnosis of skin cancer: when skin cancer is detected early from these images, definitive treatment is highly likely. The low incidence of the disease results in limited real image data, which is a significant handicap in the application of deep learning techniques. We therefore increased the total number of images in the dataset using the data augmentation technique. Deep learning-based models have recently been performing above human-level accuracy in classification tasks, and both the hyperparameters and the size of the training dataset have a significant impact on their performance. The biggest problem encountered in machine and deep learning algorithms is insufficient data to train the model, which creates an overfitting problem, a frequent occurrence in these algorithms: the network memorizes the training data and fails when it encounters an input other than the training data.

In this study, data augmentation is applied to the training set, and more images are obtained artificially by changing the properties of the available data. In this context, effective data augmentation and a pre-trained deep learning approach are proposed for skin lesion classification, with a hybrid network model, Inception-ResNet-v2, used to classify skin cancer images. The aim is to increase the number of images in the dataset by applying the affine transformation technique and to analyze its effect on the skin cancer classification system. Creating a skin cancer dataset from scratch is a complex and time-consuming task, and many datasets are unbalanced; unbalanced data can lower performance on the minority classes and lead to misclassification in most deep and machine learning approaches. Our study generates new images from existing images with the affine transformation technique and analyzes the effect of these images on skin cancer classification; data augmentation was thus performed to improve the performance of the Inception-ResNet model. ResNet and Inception boost image recognition performance at low computational cost compared to other models: the ResNet architecture is about growing deep, while Inception is about growing wide, so the Inception-ResNet-v2 architecture achieves the optimum of going both deep and wide. The highest accuracy reported in this study with the augmented dataset is 95.09% for the Inception-ResNet-v2 model, while the same model achieved 83.59% with the original dataset.

Although these pre-trained deep learning models can be used to solve many important problems, their usage is still seriously criticized, since it is extremely difficult to determine which data descriptors are the most adequate to represent a particular phenomenon of interest. Classifiers can be unsuccessful when there are too many variables and high correlations exist between them. At this stage, the vectors of the representation set are kept at lower dimensions, and the number of random variables is reduced by dimension reduction techniques. Dimension reduction is a preprocessing step in machine learning that eliminates unwanted features and improves learning accuracy, and there are several data representation methods, each with its own advantages, for reducing redundant characteristics. In addition, imbalanced data and high dimensionality are common problems in pattern recognition and machine learning. Imbalance is a major problem in classification, and it becomes more complex when the dataset has numerous features: for attribute selection, traditional classification generally prefers the majority class, which leads to poor parameter settings or the selection of attributes that better describe the majority class [2, 16, 41]. To solve the imbalance problem, data-driven methods either reduce the majority class data to reach the expected balance or generate data from the minority class distribution [54]. In this context, data from the minority classes (for example akiec, bcc, df, and vasc) were augmented using the data-driven method in this study. Among the studies on this topic, Roccetti et al. [41] modified the training strategy by re-evaluating categorical data in the light of a Pareto analysis approach; they developed a tool that reshapes the dataset based on the Pareto rule and used the categorical descriptors as a tool, not as an input, to train their deep learning model, obtaining a more efficient model with this data arrangement. Akram et al. [2] proposed a new framework for skin lesion classification that incorporates in-depth feature information to generate the most distinguishing feature vectors while maintaining the original feature domain; they used entropy-controlled neighborhood component analysis to select distinctive features and reduce dimensionality, and tested the success of the method by examining the accuracy with different classifiers. Fattahi et al. [16] proposed a hybrid method that performs feature extraction and selection concurrently to reduce data dimensionality, formulated as a cost-sensitive optimization problem.

In conclusion, our main aim was to demonstrate the effectiveness of the proposed model on a large skin cancer dataset by comparing it with well-known pre-trained deep learning models. This model combines the advantages of the Inception and residual modules, expanding the network width while easing the training of the deep network, since these modules benefit from each other to increase detection accuracy and reduce the total number of calculations. Residual connections were observed to significantly increase the training speed of the Inception architecture. We achieved high accuracy by building a network that is both deep and wide with the proposed model on the augmented dataset. To the best of our knowledge, there are few studies that combine effective data augmentation and the Inception-ResNet-v2 model to increase the accuracy of skin lesion classification and compare it with other pre-trained deep learning models. In future studies, we will try to construct a cost-sensitive model for the misclassification of minority class data by proposing a new cost-sensitive or model-based method, and we will expand this work to effective dimension reduction for high-dimensional data. In addition, other deep learning models and hybrid methods will be studied, and comparisons will be made on augmented datasets that include more affine transformation techniques.