Introduction

The skin is the largest organ of the human body, covering a total area of about 20 square feet. It protects us from heat and cold and helps regulate body temperature. Erythemato-squamous diseases are a group of skin diseases whose identification and diagnosis are difficult because all classes share the same clinical features, scaling and erythema, with only minute differences. The classes of erythemato-squamous disease are C1: psoriasis, C2: seborrheic dermatitis, C3: lichen planus, C4: pityriasis rosea, C5: chronic dermatitis, and C6: pityriasis rubra pilaris. A biopsy is the basic means of diagnosing these diseases. In its initial stage, a disease of one class may also show the symptoms of other classes, which is the main problem dermatologists face when diagnosing skin diseases. Initially, a new patient was examined on 12 clinical attributes; if symptoms of disease were found, 22 histopathological attributes were then examined to obtain the skin disease parameters. The histopathological attributes were determined by analyzing skin samples under a microscope [1].

The use of expert systems built with machine learning methods to support decisions in medical applications is increasing as data become easier to obtain from the internet. Over the past few decades, the medical field has seen extensive growth in disease prediction using datasets collected from previous patients, based on external symptoms and laboratory tests without detailed internal examinations. Machine learning in medical decision support is proving useful for patients who lack the resources for medical tests. Machine learning algorithms now assist doctors in making better diagnoses: they help in interpreting complex diagnostic tests and in combining information from various sources (images, clinical data, and scientific knowledge) [2].

Several authors have worked on erythemato-squamous disease. A proposed model, FELM, for automatic detection of skin diseases combined a fuzzy logic approach with machine learning techniques. FELM was reported to outperform other methods in prediction accuracy and time complexity, with an accuracy of 93% [3].

Badrinath et al. [4] introduced a hybrid AdaBoost ensemble method for predicting erythemato-squamous disease. Neural networks, support vector machines (SVMs), and ANFIS were used as classifiers and then combined with the AdaBoost ensemble method, achieving a classification accuracy of 99.3%. The authors concluded that the proposed hybrid AdaBoost models predict erythemato-squamous disease with higher accuracy than the other methods considered.

Verma et al. [5] used six machine learning classification techniques to predict erythemato-squamous disease, and applied the ensemble methods bagging, AdaBoost, and gradient boosting to enhance the accuracy. A feature importance technique was also used to choose relevant features and obtain an optimal data subset; an accuracy of 99.68% was achieved by gradient boosting applied to radius neighbor classification.

A two-stage mixed feature selection technique was used to diagnose skin diseases. SVMs were used as classification tools, with extended sequential forward search, sequential forward floating search, and sequential backward floating search as search strategies. Normalized F-scores were used to measure the significance of each attribute. The developed model was claimed to achieve higher accuracy than previous studies [6].

Übeyli and Doğdu [7] experimented with five classes of erythemato-squamous disease (excluding the pityriasis rubra pilaris class) and developed a k-means clustering classifier to detect erythemato-squamous diseases using 33 attributes. The classification accuracy of k-means clustering was 94.22%. The authors concluded that, considering the misclassification rates, the k-means clustering algorithm can be used to detect erythemato-squamous diseases.

A computer-aided model was developed for the analysis of skin disease. In this model, 22 attributes were used and extended with other parameters. The expert system predicts eight different skin disease classes and can help doctors by reporting the confidence of the predicted disease. The data for the study were collected from a limited region, which helps in assessing the demographic dependence of the disease. The best prediction model was obtained with an algorithm developed using J48 [8].

Ozcift and Gulten [9] discussed a rotation forest ensemble of decision trees that encapsulates a best-first search strategy. The wrapper uses positive selection to select the best feature subset of the erythemato-squamous disease dataset. Machine learning techniques are used to evaluate the discriminative power of the selected features, and a bagging algorithm is used to assess the diversity of the training data. With the rotation forest ensemble algorithm, the accuracy reaches 98.91% for BNET and 98.64% for SL.

To find the optimal feature values of KSVM for the diagnosis of erythemato-squamous disease, a new method based on the catfish binary particle swarm optimization algorithm was proposed. To obtain an optimal subset of features, the AR method is applied for feature reduction in both the training and testing phases. In addition, using the RBF kernel can improve the classification performance. Compared with other methods such as AR-MLP and pure SVM, the experimental results show that the proposed method is more accurate, achieving an accuracy of 99.09% [10].

A framework for diagnosing erythemato-squamous disease was developed using artificial neural networks (ANNs). The developed system achieved a high success rate, with a reported accuracy of 90% [11].

A new mixed feature selection technique obtained better classification accuracy with 10 of the 34 attributes of the erythemato-squamous disease dataset. The optimal feature subset was then used to train different machine learning techniques; the accuracies obtained were 95.62% for a decision tree (CART), 97.26% for RBF neural networks, 98.36% for SMO with a polynomial kernel, and 98.08% with an RBF kernel [12].

Almarabeh and Amer [13] discussed the application of different machine learning techniques to the prediction of various types of disease. They calculated the accuracy for heart disease, breast cancer, lung cancer, diabetes, and skin disease; for skin disease, the best accuracy was obtained by ANN (97.17%).

Maryam et al. [14] proposed an erythemato-squamous disease detection model using mixed feature selection and multi-class support vector machines. The combination takes advantage of both the filter and the wrapper approach: a level-based chi-square is used as the evaluation criterion and a genetic algorithm is used to find the best feature subset. The selected features are evaluated on training and test sets using multi-class SVMs. The results show that the proposed model uses 18 features and achieves a high accuracy (99.18%).

The machine learning techniques classification and regression tree (CART), SVM, decision tree (DT), random forest (RF), and gradient boosting decision tree (GBDT) were used to predict skin disease. The best accuracy found was 95.90%, from GBDT. A multi-model ensemble method was then applied to combine these five data mining techniques, giving the highest accuracy of 98.64% [15].

A model based on logistic regression, linear discriminant analysis, k-nearest neighbor, classification and regression tree, Gaussian Bayesian, and support vector machine was developed to classify erythemato-squamous disease. Then, four ensemble machine learning algorithms (AdaBoost, gradient boosting, random forests, and extra trees) were used to improve the performance of the model. The maximum accuracy achieved was 98.64%, with gradient boosting [16].

The machine learning algorithms used in this study and their abbreviations are shown in Table 1.

Table 1 Machine learning techniques and their abbreviations

Methods

In this study, we applied three different feature selection methods, (1) univariate feature selection, (2) feature importance, and (3) correlation matrix with heat map, to the skin disease dataset obtained from the UCI Machine Learning Repository. The skin disease dataset contains 34 variables; by applying each of these three feature selection techniques, we obtained the 15 most important features (attributes) and thus new optimum subsets of the skin disease dataset. We then applied four machine learning techniques, (1) Gaussian Naïve Bayesian classifier, (2) decision tree classifier, (3) support vector machine, and (4) random forest classifier, to train on the optimum subsets of the skin disease dataset. The predictions of the four machine learning algorithms were then improved using a stacking ensemble method.

The predictions obtained with the three feature selection methods are compared to choose the best feature selection technique and the best prediction accuracy. The complete methodology used in this paper is described in Fig. 1.

Fig. 1 System framework of the proposed system

Dataset Analysis

The database was taken from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). It was collected for the examination of skin diseases and the classification of the types of erythemato-squamous disease. The database contains 34 attributes, 33 of which are linear valued and 1 of which is nominal. The family history attribute (f11) is set to 1 if any of these diseases has been observed in the patient's family and 0 otherwise. All other attributes (both clinical and histopathological) were assigned a value in the range 0 to 3 (0 = feature absent; 1, 2 = relative intermediate values; 3 = highest value). There are six classes of erythemato-squamous disease, with 366 instances and 34 features, as shown in Table 2.

Table 2 Erythemato-squamous disease dataset [1]

Feature Selection

Machine learning algorithms follow the rule of garbage in, garbage out: if we feed in garbage, we will only get garbage out. Here, garbage means errors in the data or an unnecessarily large number of features. Feature selection is especially important when the number of features is high. We need not use every attribute when applying a classification algorithm; we can improve the efficiency of an algorithm by supplying only those features that are really important for prediction. Many research papers report that, for the same algorithm, feature subsets give better predictions than the complete feature set. However, if the dataset already contains only important attributes, feature selection is not necessary and may not give better results.

The main reasons to use feature selection are:

  • It reduces the training time of the classification algorithm.

  • It reduces the model complexity and makes the model easier to interpret.

  • It enhances the prediction accuracy of the classifier if the right subset is selected.

  • It decreases over-fitting.

In this paper, we have used three feature selection techniques, each selecting 15 important features, to obtain the reduced subsets of the main dataset used for training. These techniques are described below.

Univariate Feature Selection

Statistical tests can be used to select the attributes that have the most significant relationship with the target attribute. The univariate feature selection method used here applies the chi-squared (χ2) test, which requires non-negative attributes, to select the 15 best features from the skin disease dataset.

The chi-square test is applied to categorical attributes in a dataset. The chi-square value is calculated between each attribute and the target attribute, and the required number of attributes with the best chi-square values is chosen. The chi-square score is given by the following quantity, summed over all classes:

$$ {\chi}^2=\sum \frac{{\left( Observed\ frequency- Expected\ frequency\right)}^2}{Expected\ frequency} $$

Here:

Observed frequency = Number of observations of a class

Expected frequency = Number of expected observations of a class if no relationship exists between the attribute and the target attribute.
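The score defined above can be illustrated with a small worked sketch. It uses the textbook contingency-table form of the statistic (observed versus expected counts for every attribute value and class); library implementations may compute the statistic differently. The function name `chi_square_score` and the toy data are assumptions made for illustration and are not taken from the dermatology dataset.

```python
from collections import Counter

def chi_square_score(feature, target):
    """Chi-square statistic between one categorical feature and the class label."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # observed count per (value, class) cell
    value_totals = Counter(feature)
    class_totals = Counter(target)
    score = 0.0
    for value, value_count in value_totals.items():
        for label, class_count in class_totals.items():
            # Count expected in this cell if feature and class were unrelated.
            expected = value_count * class_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score

# Toy data, invented for illustration only.
target = [1, 1, 1, 1, 2, 2, 2, 2]
informative = [3, 3, 3, 3, 0, 0, 0, 0]    # separates the two classes perfectly
uninformative = [0, 1, 0, 1, 0, 1, 0, 1]  # same distribution in both classes

scores = {
    "informative": chi_square_score(informative, target),
    "uninformative": chi_square_score(uninformative, target),
}
# Features ranked from highest to lowest score; the top k would be kept.
ranked = sorted(scores, key=scores.get, reverse=True)
```

A feature whose values are distributed identically across classes scores 0, so ranking by score and keeping the top k features mirrors the selection of the 15 best attributes.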

The important 15 features obtained by univariate feature selection method and their chi-square values are shown in Table 3.

Table 3 Univariate feature importance table

Feature Importance

Feature importance is calculated as the decrease in node impurity weighted by the probability of reaching that node. The probability of a node is calculated by dividing the number of observations that reach the node by the total number of observations. A higher value indicates a more important feature. The importance of each attribute in a decision tree is then evaluated as follows:

$$ {fi}_i=\frac{\sum \limits_j{ni}_j}{\sum \limits_k{ni}_k}\kern1.5em \mathrm{where}\ j\ \mathrm{runs}\ \mathrm{over}\ \mathrm{the}\ \mathrm{nodes}\ \mathrm{that}\ \mathrm{split}\ \mathrm{on}\ \mathrm{feature}\ i\ \mathrm{and}\ k\in \mathrm{all}\ \mathrm{nodes} $$
$$ \mathrm{where}\ {fi}_i=\mathrm{the}\ \mathrm{importance}\ \mathrm{of}\ \mathrm{feature}\ i $$
$$ {ni}_j=\mathrm{the}\ \mathrm{importance}\ \mathrm{of}\ \mathrm{node}\ j $$

These values are then normalized to lie between 0 and 1 by dividing by the sum of all feature importance values:

$$ \operatorname{norm}{fi}_i=\frac{fi_i}{\sum \limits_j{fi}_j}\kern1em \mathrm{where}\ j\in \mathrm{all}\ \mathrm{features} $$

At the random forest level, the final importance of a feature is its average over all trees: the normalized importance values of the feature on each tree are summed and divided by the total number of trees:

$$ \mathrm{RF}\ {fi}_i=\frac{\sum \limits_j\operatorname{norm}{fi}_{ij}}{T}\kern0.5em \mathrm{where}\ j\in \mathrm{all}\ \mathrm{trees} $$

where RF fi_i = the importance of feature i calculated from all trees in the random forest model

$$ \operatorname{norm}{fi}_{ij}=\mathrm{the}\ \mathrm{normalized}\ \mathrm{feature}\ \mathrm{importance}\ \mathrm{for}\ i\ \mathrm{in}\ \mathrm{tree}\ j $$

T = total number of trees
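The three formulas above can be traced on a tiny hand-made forest. The sketch below assumes each tree is given as a list of (feature index, node importance) pairs; the node importance values are invented for illustration, and the function names are hypothetical rather than part of any library.

```python
def tree_feature_importance(nodes, n_features):
    """Normalized importance of each feature in one tree.

    nodes: list of (feature_index, node_importance) pairs, one per split node.
    """
    per_feature = [0.0] * n_features
    for feature, node_importance in nodes:
        per_feature[feature] += node_importance        # sum of ni_j for nodes splitting on i
    all_nodes = sum(ni for _, ni in nodes)             # sum of ni_k over all nodes
    fi = [value / all_nodes for value in per_feature]  # fi_i
    total = sum(fi)
    return [value / total for value in fi]             # norm fi_i

def forest_feature_importance(trees, n_features):
    """Average the normalized per-tree importances over all T trees."""
    per_tree = [tree_feature_importance(nodes, n_features) for nodes in trees]
    n_trees = len(per_tree)
    return [sum(tree[i] for tree in per_tree) / n_trees for i in range(n_features)]

# Hypothetical node importances for a forest of two small trees and three features.
trees = [
    [(0, 0.5), (1, 0.25), (0, 0.25)],  # tree 1 splits twice on feature 0, once on feature 1
    [(1, 0.5), (2, 0.5)],              # tree 2 splits on features 1 and 2
]
rf_importance = forest_feature_importance(trees, n_features=3)
# Feature indices ordered from most to least important.
ranking = sorted(range(3), key=lambda i: rf_importance[i], reverse=True)
```

With these toy values the forest-level importances sum to 1, so they can be compared directly across features when choosing the top-ranked ones.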

The 15 most important features obtained by the feature importance method and their importance values calculated using random forest are shown in Fig. 2.

Fig. 2 Important attributes using feature importance

Correlation Matrix with Heat Map

Correlation shows how the attributes are associated with each other or with the target attribute. Correlation can be positive (an increase in the value of an attribute increases the value of the target attribute) or negative (an increase in the value of an attribute decreases the value of the target attribute). A heat map makes it easy to recognize which attributes are most related to the target attribute. The value of the correlation coefficient lies between −1 and 1.

  • A value near 0 shows weak correlation (0 means no correlation)

  • A value near 1 shows strong positive correlation

  • A value near −1 shows strong negative correlation

The heat map is shown in Fig. 3.

Fig. 3 Heat map

We choose only those attributes whose correlation coefficient has an absolute value greater than 0.4; the best 15 features are listed in Table 4.
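The thresholding rule can be sketched as follows, assuming the Pearson correlation coefficient is the one used (the text does not name the coefficient). The feature names and values are invented for illustration, and the helper assumes no attribute is constant, since a constant attribute would give a zero denominator.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

# Toy data, invented for illustration only.
target = [1, 1, 2, 2, 3, 3]
features = {
    "rises_with_class": [0, 0, 1, 2, 3, 3],
    "falls_with_class": [3, 3, 2, 1, 0, 0],
    "unrelated": [1, 2, 1, 2, 1, 2],
}

correlations = {name: pearson(values, target) for name, values in features.items()}
# Keep attributes whose absolute correlation with the target exceeds 0.4.
selected = [name for name, r in correlations.items() if abs(r) > 0.4]
```

Taking the absolute value keeps strongly negative attributes as well as strongly positive ones, which is why the rule is stated in terms of the absolute value.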

Table 4 Correlation and heat map feature importance table

Machine Learning Classifiers

Ensemble methods combine the predicted values of multiple classifiers into a single value to produce a stronger model. There are mainly two types of ensemble methods: those that combine multiple models of the same type and those that combine multiple models of different types. In this study, we use the stacking ensemble technique, which combines multiple models of different types; the four types of models used are described below.

Gaussian Naïve Bayesian Classifier

Gaussian Naïve Bayesian algorithms are used for classification. The probability of the attributes is assumed to be Gaussian:

$$ P\left({x}_i|y\right)=\frac{1}{\sqrt{2\pi {\sigma}_y^2}}\exp \left(-\frac{{\left({x}_i-{\mu}_y\right)}^2}{2{\sigma}_y^2}\right) $$

The values of σy and μy are calculated by maximum likelihood [17].
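A minimal sketch of this classifier, assuming per-class, per-attribute means and variances estimated by maximum likelihood and a prior equal to the class frequency, might look like the following. The function names and toy data are illustrative; the sketch also assumes every attribute has non-zero variance within each class, which practical implementations usually guard against with a small smoothing term.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, variance):
    """Gaussian likelihood P(x_i | y) for one attribute."""
    return exp(-((x - mean) ** 2) / (2 * variance)) / sqrt(2 * pi * variance)

def fit(X, y):
    """Maximum-likelihood prior, means, and variances for each class."""
    params = {}
    for label in set(y):
        rows = [row for row, row_label in zip(X, y) if row_label == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(column) / n for column in columns]
        variances = [
            sum((value - mean) ** 2 for value in column) / n
            for column, mean in zip(columns, means)
        ]
        params[label] = (n / len(y), means, variances)
    return params

def predict(params, x):
    """Return the class with the largest prior times product of likelihoods."""
    def score(label):
        prior, means, variances = params[label]
        value = prior
        for x_i, mean, variance in zip(x, means, variances):
            value *= gaussian_pdf(x_i, mean, variance)
        return value
    return max(params, key=score)

# Toy data, invented for illustration only.
X = [[0.0, 3.0], [1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]
y = [1, 1, 2, 2]
params = fit(X, y)
first_prediction = predict(params, [0.2, 2.8])
second_prediction = predict(params, [2.9, 0.1])
```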

Decision Tree Classifier

A decision tree classifier builds classification or regression models in the form of a tree structure. It divides a dataset into smaller and smaller subsets while the associated decision tree is developed. The end result is a tree with decision nodes, which have two or more branches, and leaf nodes, which represent a classification or decision. The topmost decision node in the tree corresponds to the best predictor and is called the root node. Decision trees can handle both categorical and numerical data [18].

Support Vector Machine

Support vector machines (SVMs) are supervised learning algorithms used for classification, regression, and outlier detection. An SVM is a discriminative classifier defined by a separating hyperplane. Given labeled training data (supervised learning), the algorithm outputs the best hyperplane, which is then used to classify new records [18].

Random Forest Classifier

A random forest classifier is a supervised learning technique that can be used for classification and regression analysis. The algorithm is simple and flexible to use. A forest is a collection of trees; the more trees it contains, the more robust the forest. Random forests create decision trees from randomly selected data, obtain a prediction from each tree, and choose the best solution by voting. The method also provides a good indication of feature importance [19].

Stacking Ensemble Technique

Stacking is an ensemble method that joins multiple classification models of different types through a meta-classifier. The four individual classification models, NB, DT, SVM, and RF, are trained using the complete training set; the meta-classifier is then fitted on the outputs ("meta-features") of the individual classification models in the ensemble. The meta-classifier can be trained either on the predicted class labels or on the predicted probabilities from the ensemble. The stacking method is shown in Fig. 4.

Fig. 4 Stacking ensemble technique
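The stacking procedure described above can be sketched with toy components. This is not the configuration used in the study: the two base learners below are simple single-attribute rules standing in for NB, DT, SVM, and RF, and the meta-classifier is a lookup table trained on the predicted class labels. All class names, function names, and data are hypothetical.

```python
from collections import Counter, defaultdict

class MeanRule:
    """Toy base learner: assigns the class whose mean on one attribute is nearer."""
    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        positive = [row[self.feature] for row, label in zip(X, y) if label == 1]
        negative = [row[self.feature] for row, label in zip(X, y) if label == 0]
        self.mean_positive = sum(positive) / len(positive)
        self.mean_negative = sum(negative) / len(negative)
        return self

    def predict(self, row):
        value = row[self.feature]
        nearer_positive = abs(value - self.mean_positive) < abs(value - self.mean_negative)
        return 1 if nearer_positive else 0

class LookupMeta:
    """Toy meta-classifier: most frequent true label per tuple of base predictions."""
    def fit(self, meta_features, y):
        table = defaultdict(Counter)
        for key, label in zip(meta_features, y):
            table[key][label] += 1
        self.table = {key: counts.most_common(1)[0][0] for key, counts in table.items()}
        self.default = Counter(y).most_common(1)[0][0]  # fallback for unseen tuples
        return self

    def predict(self, key):
        return self.table.get(key, self.default)

def fit_stack(X, y, base_models, meta):
    for model in base_models:
        model.fit(X, y)  # base models see the complete training set
    meta_features = [tuple(model.predict(row) for model in base_models) for row in X]
    meta.fit(meta_features, y)  # meta-classifier is fitted on the base outputs

def predict_stack(row, base_models, meta):
    return meta.predict(tuple(model.predict(row) for model in base_models))

# Toy data, invented for illustration only.
X = [[0, 0], [1, 0], [0, 1], [3, 3], [2, 3], [3, 2]]
y = [0, 0, 0, 1, 1, 1]
base_models = [MeanRule(0), MeanRule(1)]
meta = LookupMeta()
fit_stack(X, y, base_models, meta)
training_predictions = [predict_stack(row, base_models, meta) for row in X]
```

Fitting the meta-classifier on predictions for the same rows the base models were trained on, as in this sketch, can overfit; fitting it on cross-validated predictions is a common alternative.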

Results

We performed three experiments to obtain predictions for the skin disease dataset using three different feature selection techniques. In the first experiment, we used the skin disease data subset, taken from the UCI Machine Learning Repository, formed by choosing only the 15 most important attributes with the univariate feature selection method; the subset consists of 15 attributes and one target attribute, "Class." The second experiment was done with the feature importance method, and the third with the correlation matrix with heat map.

The important features obtained by three feature selection techniques used in three experiments are listed in Table 5 to illustrate the features selected.

Table 5 Important features selected by three different feature selection techniques

We visualize the distribution of attribute values with bar charts, as shown in Fig. 5. For each selected attribute, the bar chart shows how the values of the 366 instances are distributed among the 4 possible values, for the subsets obtained with univariate feature selection, feature importance, and correlation matrix with heat map.

Fig. 5 Visualization of skin disease data subset obtained using three feature selection techniques

Python code was developed to calculate the predictions, obtain the different metrics, and evaluate the performance of the classifiers on the skin disease data subsets obtained by the univariate feature selection, feature importance, and correlation matrix with heat map techniques. To measure the performance of the classifiers, the calculated values of the mean, standard deviation, and accuracy are shown in Table 6.

Table 6 Mean value, standard deviation, and accuracy

Table 6 shows that, with the univariate feature selection technique, the highest accuracy achieved is 89.18%, using Gaussian Naïve Bayesian classification, and the lowest is 77.02%, using random forest classification. With the feature importance technique, the highest accuracy is achieved by the SVM classifier and the lowest by the Naïve Bayesian classifier. With the correlation matrix with heat map, the highest accuracy, 95.94%, is obtained by the random forest classifier and the lowest by the decision tree classifier.

The overall highest accuracy achieved by a single classifier is 97.29%, by SVM using the feature importance technique. This shows that, for the individual classifiers, feature importance gave the best choice of important features for the skin disease data subset.

The accuracies obtained by the four classifiers with the three feature selection techniques are compared in the box and whisker plot in Fig. 6.

Fig. 6 Accuracy of different classifier algorithms

The accuracy and performance of a classifier or ensemble method are assessed with metrics. These metrics are calculated to measure the performance obtained with the features chosen by each feature selection technique. We have calculated three metrics: root mean square error (RMSE), kappa statistic error (KSE), and area under receiver operating characteristics (AUC).

Root Mean Square Error

RMSE plays an important role in assessing the performance of classifiers. It measures the difference between the values predicted by a classifier and the values actually observed. For a good classifier, the RMSE values on the training and testing datasets are similar; if the RMSE is much higher on the testing data than on the training data, the classifier is not good. For n observations with observed values y_i and predicted values ŷ_i, the RMSE is calculated using the formula

$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({y}_i-{\hat{y}}_i\right)}^2} $$
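As a worked example of the formula, the sketch below computes the RMSE for a handful of invented labels. Note that when the values are class labels, the result depends on how the classes are coded numerically; the function name and data are illustrative only.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    squared_errors = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return sqrt(squared_errors / n)

# Toy labels, invented for illustration only.
y_true = [1, 2, 3, 4]
y_pred = [1, 2, 3, 2]  # one prediction is off by two class codes

error = rmse(y_true, y_pred)          # sqrt((0 + 0 + 0 + 4) / 4)
perfect_error = rmse(y_true, y_true)  # identical predictions give zero error
```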

Kappa Statistic Error

The KSE is a metric that compares the observed accuracy with the expected accuracy (the accuracy expected by chance). It is calculated both for single classifiers and for ensemble classifiers, using the following formula:

$$ \mathrm{KSE}=\frac{\left( Observed\ Accuracy- Expected\ Accuracy\right)}{\left(1- Expected\ Accuracy\right)} $$

The value of KSE lies between −1 and 1. The closer the calculated KSE value is to 1, the more the accuracy of the classifier exceeds the accuracy expected by chance.
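The text does not spell out how the expected accuracy is obtained. The sketch below assumes the standard Cohen's kappa definition, in which the expected accuracy is the chance agreement implied by the class frequencies of the true and predicted labels. The function name and labels are invented for illustration.

```python
from collections import Counter

def kappa(y_true, y_pred):
    """Kappa statistic: agreement beyond chance between true and predicted labels."""
    n = len(y_true)
    observed_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    predicted_counts = Counter(y_pred)
    # Chance agreement: probability that both label a random sample as class c,
    # summed over classes (assumed definition, as in Cohen's kappa).
    expected_accuracy = sum(
        true_counts[c] * predicted_counts[c] for c in true_counts
    ) / (n * n)
    return (observed_accuracy - expected_accuracy) / (1 - expected_accuracy)

# Toy labels, invented for illustration only.
y_true = [1, 1, 2, 2]
y_pred = [1, 1, 2, 1]

partial_agreement = kappa(y_true, y_pred)  # observed 0.75, expected 0.5
perfect_agreement = kappa(y_true, y_true)
```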

Area Under Receiver Operating Characteristics

A confusion matrix distinguishes 4 types of results:

  • True negative: the observation is negative and we predicted that the class is negative.

  • False negative: the observation is positive but we predicted that the class is negative.

  • False positive: the observation is negative but we predicted that the class is positive.

  • True positive: the observation is positive and we predicted that the class is positive.

From these 4 types of outcome, we calculate

$$ True\ Positive\ Rate\ \left(\mathrm{TPR}\right)=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} $$
$$ True\ Negative\ Rate\left(\mathrm{TNR}\right)=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}} $$

For a classifier evaluated at a single operating point, the AUC is then calculated from the formula

$$ AUC=\frac{1}{2}\left( TPR+ TNR\right) $$
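The rates and the AUC formula above can be checked on a small example. Because the dataset has more than two classes, the sketch assumes a one-vs-rest reading in which one class is treated as positive and all others as negative; this reading, the function names, and the labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FN, FP, TN with `positive` as the positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fn, fp, tn

def rates_and_auc(tp, fn, fp, tn):
    """True positive rate, true negative rate, and AUC = (TPR + TNR) / 2."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return tpr, tnr, 0.5 * (tpr + tnr)

# Toy labels for three classes, invented for illustration only.
y_true = [1, 1, 1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 2, 2, 2, 3, 1]

counts = confusion_counts(y_true, y_pred, positive=1)
tpr, tnr, auc = rates_and_auc(*counts)
```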

The RMSE, KSE, and AUC values calculated for the different classification methods used in the stacking ensemble, for the three feature selection techniques, are shown in Table 7.

Table 7 RMSE, KSE, and AUC values for ensemble method

The accuracy of a classification method is evaluated from the confusion matrix using the following equation:

$$ Accuracy=\frac{Number\ of\ Correct\ Predictions\ }{Total\ Number\ of\ Predictions} $$

In terms of the confusion matrix entries, it can be written as

$$ Accuracy=\frac{TP+ TN}{TP+ TN+ FP+ FN} $$

The accuracy of the stacking ensemble method using the four classifier algorithms is given in Table 8.

Table 8 Output of evaluating algorithms on the ensemble dataset

Here, we combine the four classification algorithms into one model using a stacking classifier. Applying it to the skin disease data subsets obtained by the three feature selection techniques gives the accuracy, confusion matrix, precision, recall, F1 score, and support, where

$$ Precision=\frac{TP}{TP+ FP} $$
$$ Recall\ (Sensitivity)=\frac{TP}{TP+ FN} $$
$$ \mathrm{F}1- Score=2\left(\frac{Precision\times Recall}{Precision+ Recall}\right) $$

The support is defined as the number of observations of the true response obtained in that class.
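The per-class quantities defined above can be computed directly from the true and predicted labels. The sketch below treats each class in turn as the positive class; the function name and labels are invented for illustration, and the zero fallbacks for empty denominators are an assumption of this sketch.

```python
def per_class_report(y_true, y_pred):
    """Precision, recall, F1 score, and support for every class, plus accuracy."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of true samples of this class
        }
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return report, accuracy

# Toy labels for three classes, invented for illustration only.
y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 2, 3]
report, accuracy = per_class_report(y_true, y_pred)
```

In this toy example one sample of class 1 is predicted as class 2, which lowers the recall of class 1 and the precision of class 2 while leaving class 3 unaffected.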

The accuracy of the stacking ensemble technique and the other values are shown in Table 9.

Table 9 Accuracy, confusion matrix, and other values obtained by stacking ensemble method

Discussion

Feature selection plays a vital role in classification, as unnecessary attributes are removed from the dataset and the performance of the classification models is improved. Feature selection reduces the dataset to a subset that still contains enough information for classification. In this paper, we performed three experiments, using univariate feature selection, feature importance, and correlation matrix with heat map, to select 15 important features from the erythemato-squamous dataset, which has 366 instances and 34 features and covers six classes of skin disease. The three feature selection techniques give different attributes as important features because they are based on three different measures: chi-square, decision tree, and correlation coefficient. The features selected by the three techniques are presented in Table 5. With the univariate feature selection and correlation matrix with heat map techniques, 4 clinical and 11 histopathological features are selected; with the feature importance technique, 5 clinical and 10 histopathological features are selected.

After selecting the skin disease data subsets using the feature selection techniques, we applied four classification algorithms: NB, DT, SVM, and RF. We chose both linear and non-linear classification algorithms so that performance could be measured for both types. These classification algorithms are evaluated on the basis of mean value, standard deviation, accuracy, root mean square error, kappa statistic error, and area under receiver operating characteristics. The standard deviation describes how far the measurements are spread out from the mean or expected value: a lower standard deviation means that most values are close to the mean, and a higher one means that the values are more spread out. The values of the mean, standard deviation, and accuracy are presented in Table 6. Table 6 shows that, with the univariate feature selection technique, the highest accuracy achieved is 89.18%, for the Gaussian Naïve Bayesian algorithm, and the lowest is 77.02%, for random forest classification. With the feature importance technique, the highest accuracy is achieved by the SVM classifier and the lowest by the Naïve Bayesian classifier. With the correlation matrix with heat map, the highest accuracy, 95.94%, is obtained by the random forest classifier and the lowest by the decision tree classifier. The error measures RMSE and KSE, and the AUC value used to evaluate the effectiveness of the selected features, are given in Table 7. The closer the RMSE is to 0, the more accurate the algorithm. On this basis, it is easily found that the SVM classifier has the best RMSE for two out of four classifiers. The closer the KSE value of a classifier is to 1, the better its performance; on this basis, the reliability of SVM classification is better than that of the others. To evaluate the effectiveness of the classifiers, the AUC value is calculated for each classifier; the AUC values are also given in Table 7.

Finally, we combined all four classifiers using the stacking ensemble technique to improve the performance of the models on the three feature selection data subsets. Table 8 shows the accuracy obtained by the stacking ensemble method. Table 9 shows the accuracy, the confusion matrix, and the precision, recall, F1 score, and support for the six classes of the target variable. Precision is the fraction of instances predicted as a class that actually belong to it, whereas recall (sensitivity) is the fraction of instances of a class that the algorithm correctly classifies. The F1 score is the harmonic mean of precision and recall and therefore takes both false positives and false negatives into account. The support is the number of samples of the true response that lie in that class. The highest accuracy obtained by the stacking ensemble method, 99.86%, is for the correlation matrix with heat map feature selection technique. This shows that the correlation coefficient gave the best result among the feature selection techniques.

To demonstrate the performance of our proposed model, the results of the present study were compared with results from previous studies. A large number of studies of skin disease have used the same dataset with other classification and feature selection methods. As shown in Table 10, the accuracy obtained by this model is the highest among these studies.

Table 10 Results obtained by previous researchers who conducted experiments on skin disease

Conclusion

Machine learning plays an important role in healthcare. Machine learning algorithms use the knowledge contained in information previously stored by healthcare organizations to build systems that make effective decisions and help advance healthcare. This paper uses different machine learning techniques to improve skin disease prediction. The skin disease dataset consists of 34 attributes for the evaluation of disease, but not all of them are necessary, so we use three feature selection techniques, univariate feature selection, feature importance, and correlation matrix with heat map, to obtain the best data subset. Four machine learning techniques, NB, DT, SVM, and RF, are applied to the reduced data subsets to predict skin disease. The best accuracy found among these techniques is 97.29%, by SVM. We also calculated performance metrics: mean value, standard deviation, root mean square error (RMSE), kappa statistic error (KSE), and area under receiver operating characteristics (AUC). A stacking ensemble method was then applied to combine these four machine learning techniques, giving the highest accuracy of 99.86% with the correlation matrix with heat map feature selection technique. This is the highest accuracy in the literature available on this skin disease dataset, and we recommend the correlation coefficient with heat map as the best of the feature selection techniques considered.