Introduction

Machine learning algorithms are widely used in the medical field. Many classification algorithms have been developed to diagnose diseases with high accuracy and to predict various types of disease at an early stage by examining the attributes of the disease. These algorithms have been applied to breast cancer and other cancers, kidney diseases, thyroid disease, diabetes, erythemato-squamous diseases, and many other conditions. In this research paper, we choose erythemato-squamous disease for analysis. Several classification algorithms are applied first, and ensemble methods are then applied on top of them. In a second approach, feature selection is combined with the same classification algorithms, and the resulting prediction accuracy is evaluated with a view to building an expert system [1, 2]. Various studies have been done in this field, and some are discussed below.

Polat and Güneş [3] achieved a classification accuracy of 96.71% in the diagnosis of erythemato-squamous diseases using a novel hybrid intelligence method based on the C4.5 decision tree classifier and a one-against-all approach for the multi-class classification problem.

Immagulate and Vijaya [4] focused on non-melanoma skin cancer and classified its types, using support vector machines (SVM) to accurately predict disease types. The chrominance and texture features were extracted from pre-processed training datasets.

Chang and Chen [5] combined decision tree and neural network classification methods to construct the best predictive model for dermatology. The models predicted and analyzed six common skin conditions. All classification techniques predicted the disease fairly accurately, and the neural network model had the highest accuracy, 92.62%.

Ramya and Rajeshkumar [6] discussed the Gray-Level Co-Occurrence Matrix (GLCM) technique for extracting features from segmented disease regions and classifying skin disease with a fuzzy classifier, which they reported to be more accurate than existing methods.

Übeyli and Doğdu [7] reported a study in which they applied a k-means clustering approach to classify the erythemato-squamous diseases dataset. The results indicate an overall classification accuracy of approximately 94% when using 5 of the 6 decision classes (excluding pityriasis rubra pilaris, which has 20 instances).

Güvenir et al. [8] discussed the VFI5 classification algorithm which represents a concept description by a set of feature value intervals. The classification of a new instance is based on voting among the classifications made by the value of each feature separately.

Ahmed et al. [9] clustered pre-processed data using the k-means algorithm to separate data relevant to skin disease from irrelevant data. Frequent patterns were evaluated using the MAFIA algorithm. Decision tree and AprioriTid algorithms were used to extract frequent patterns from the clustered datasets.

Fernando et al. [10] discussed a disease prediction method, DOCAID, to predict malaria, typhoid fever, jaundice, tuberculosis, and gastroenteritis based on patient symptoms and complaints using the naïve Bayesian classifier algorithm. The authors reported an accuracy rate of 91% for predicting disease.

Jaleel et al. [11] extracted features using a 2D wavelet transform and then classified them using a back propagation neural network (BPNN). They classified each case in the dataset as cancer or non-cancer.

Theodoraki et al. [12] developed a predictive model to predict the final outcome of a seriously injured patient after an accident. The investigation includes a comparison of data mining techniques using classification, clustering, and association algorithms. Using this analysis, they obtained results in terms of sensitivity, specificity, positive predictive value, and negative predictive value, and compared results between different predictive models.

Sharma and Hota [13] used the SVM and ANN data mining techniques to classify various types of erythemato-squamous diseases. They used a confidence-weighted voting scheme to combine the two techniques and achieved the highest accuracy of 99.25% in the training phase and 98.99% in the testing phase.

Rambhajani et al. [14] used Bayesian classification to classify the erythemato-squamous disease dataset. The authors applied the best-first search feature selection technique, removed 20 features from the dermatology dataset in the University of California Irvine (UCI) repository, and then used the Bayesian technique to achieve 99.31% accuracy.

Bakpo and Kabari [15] used an ANN for the diagnosis of different skin diseases and achieved 90% accuracy. There are a few unique features for skin cancer regions.

Manjusha et al. [16] predicted different skin diseases using the naïve Bayesian algorithm. For automatic identification of the disease, dermatological features were extracted from affected skin images using local binary patterns and used for classification.

Yadav and Pal [17] discussed thyroid prediction in women using data mining techniques. They used two ensemble techniques: the first was generated from decision trees, and the second from bagging and boosting. They examined a dataset of thyroid symptoms and found improved accuracy.

Tuba et al. [18] proposed an automatic erythemato-squamous disease detection method based on an optimized support vector machine. The parameters of the support vector machine were tuned by a recent swarm optimization algorithm, elephant herding optimization. They tested the method on a standard dataset that contains data for 366 patients, each with one of six different erythemato-squamous diseases. The accuracy they achieved is 99.07%.

Zhang et al. [19] applied a hybrid approach that uses granular computing (GrC) and support vector machines (SVM). The authors reviewed and evaluated most of the past artificial intelligence systems used for the diagnosis of erythemato-squamous disease and tabulated the classification results of all these algorithms. Their approach achieved average sensitivity and specificity of 98.43% and 99.71%, respectively.

In this study, we attempt to use machine learning methods to build ensembles from six different classifiers, which are the following:

Passive Aggressive Classifier

Passive aggressive algorithms are a family of online learning algorithms (for both classification and regression) proposed by Crammer et al. The passive aggressive classifier (PAC) algorithm is well suited to classifying massive streams of data. It is easy to implement and very fast but does not provide global guarantees like the support vector machine (SVM). Passive: if the classification is correct, keep the model. Aggressive: if the classification is incorrect, update the model to adjust to the misclassified example.
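The passive and aggressive cases can be illustrated with a minimal sketch of one update step. It assumes the basic binary variant with hinge loss and labels in {-1, +1}; the function name `pa_update` is hypothetical, and this is not the implementation used in the experiments.

```python
def pa_update(w, x, y):
    """One passive-aggressive step for a binary label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss on this example
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return list(w)  # passive: the model is kept unchanged
    tau = loss / norm_sq  # aggressive: smallest step that removes the loss
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


# Illustrative call with made-up numbers.
weights = pa_update([0.0, 0.0], [1.0, 2.0], 1)
```

In this variant the update also fires for correctly classified examples whose margin is below 1, which is slightly stricter than the informal description above.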

Linear Discriminant Analysis

Linear discriminant analysis (LDA) is closely related to the analysis of variance (ANOVA) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data.

Radius Neighbors Classifier

Radius neighbors classifier (RNC) is very similar to a k-neighbors classifier, with the exception of two parameters. First, in the radius neighbors classifier, we need to specify the radius of the fixed area used to determine whether an observation is a neighbor. Unless there is some substantive reason for setting the radius to a particular value, it is best to treat it like any other hyper-parameter and tune it during model selection. The second useful parameter is the outlier label, which indicates what label to give an observation that has no observations within the radius; this can itself be a useful tool for identifying outliers.
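The roles of the radius and the outlier label can be sketched in a few lines. This is a simplified illustration that assumes Euclidean distance and unweighted majority voting; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def radius_neighbors_predict(train_X, train_y, query, radius, outlier_label=None):
    """Majority vote among training points within `radius` of `query`."""
    votes = Counter(
        label
        for point, label in zip(train_X, train_y)
        if math.dist(point, query) <= radius
    )
    if not votes:
        return outlier_label  # no neighbor inside the radius
    return votes.most_common(1)[0][0]


# Toy data, for illustration only.
X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
y = ["a", "a", "b"]
near = radius_neighbors_predict(X, y, (0.0, 0.5), radius=1.0)
far = radius_neighbors_predict(X, y, (10.0, 10.0), radius=1.0, outlier_label="outlier")
```

A radius that is too large makes every training point a neighbor, so each query receives the majority class of the whole training set. This is one reason to tune the radius rather than fix it.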

Bernoulli Naïve Bayesian

Bernoulli naïve Bayesian (BNB) implements the naïve Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; i.e., there may be multiple features, but each one is assumed to be a binary-valued (Bernoulli, Boolean) variable. Therefore, this class requires samples to be represented as binary-valued feature vectors; if handed any other kind of data, the input has to be binarized first. The decision rule for Bernoulli naïve Bayes is based on:

$$ P\left({x}_i|y\right)=P\left(i|y\right){x}_i+\left(1-P\left(i|y\right)\right)\left(1-{x}_i\right) $$

which differs from the multinomial NB’s rule in that it explicitly penalizes the non-occurrence of a feature i that is an indicator for class y, where the multinomial variant would simply ignore a non-occurring feature.
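As a small numerical illustration of this rule, the sketch below multiplies the per-feature terms under the naïve independence assumption. The function name and the probabilities are hypothetical.

```python
def bernoulli_likelihood(x, p):
    """P(x | y) for a binary vector x, where p[i] = P(feature i is present | y)."""
    likelihood = 1.0
    for xi, pi in zip(x, p):
        # An absent feature (xi = 0) contributes (1 - pi) instead of being ignored.
        likelihood *= pi * xi + (1.0 - pi) * (1 - xi)
    return likelihood


# Feature 0 is present and feature 1 is absent: 0.8 * (1 - 0.3).
value = bernoulli_likelihood([1, 0], [0.8, 0.3])
```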

Gaussian Naïve Bayesian

Gaussian naïve Bayesian (NB) implements the Gaussian naïve Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian:

$$ P\left({\mathrm{x}}_i|y\right)=\frac{1}{\sqrt{2\pi {\sigma}_y^2}}\exp \left(-\frac{{\left({x}_i-{\mu}_y\right)}^2}{2{\sigma}_y^2}\right) $$

The parameters \( {\sigma}_y \) and \( {\mu}_y \) are estimated using maximum likelihood.
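A minimal sketch of the estimation and the likelihood for a single feature follows. It assumes the maximum likelihood variance (division by n rather than n - 1); the function names are hypothetical.

```python
import math


def fit_gaussian(values):
    """Maximum likelihood estimates (mean, variance) of one feature within one class."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var


def gaussian_likelihood(x, mu, var):
    """Gaussian density of x for the given class mean and variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


mu, var = fit_gaussian([1.0, 3.0])         # toy values
density = gaussian_likelihood(2.0, mu, var)
```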

Extra Tree Classifier

An “extra trees” classifier (ETC), otherwise known as an “Extremely randomized trees” classifier, is a variant of a random forest. Unlike a random forest, at each step, the entire sample is used and decision boundaries are picked at random, rather than the best one. In real-world cases, performance is comparable with an ordinary random forest, sometimes a bit better.

Three different ensemble techniques, bagging classifier, AdaBoost classifier, and gradient boosting classifier, are applied to predict skin disease.

Methods

Figure 1 shows the overall structure of the methodology used in this research paper, including the data mining methods used in this study: (I) PAC, (II) LDA, (III) RNC, (IV) BNB, (V) NB, and (VI) ETC. The approach is completely data driven. We first applied the six classification algorithms to measure the accuracy and sensitivity of the predicted skin disease classes. The obtained values were then improved with ensemble techniques, using the bagging classifier, AdaBoost classifier, and gradient boosting classifier. The same techniques were then applied again to the skin disease dataset after feature selection, in which we obtained the 15 most important attributes using the feature importance method. The reduced dataset, with only these 15 attributes and the same 366 instances, was used to evaluate the prediction accuracy again. A comparative study was then performed to identify the best prediction.

Fig. 1 Methodological approach for skin disease

Dataset Analysis

The database used in this study is taken from the UCI machine learning repository (http://archive.ics.uci.edu/ml). Briefly, this dataset was formed to examine skin disease and classify the type of erythemato-squamous disease. The dataset contains 34 variables, of which 33 are linear-valued and 1 is nominal. The six classes of skin disease are the following: C1, psoriasis; C2, seborrheic dermatitis; C3, lichen planus; C4, pityriasis rosea; C5, chronic dermatitis; C6, pityriasis rubra pilaris. Biopsy is one of the basic procedures in diagnosing these diseases. In its initial stage, a disease may also show the properties of another class of disease, which is a further difficulty faced by dermatologists when performing the differential diagnosis of these diseases. Patients were first examined for 12 clinical features, after which 22 histopathological attributes were assessed using skin samples. Histopathological features were identified by analyzing the samples under a microscope. If any of the diseases is found in the family, the family history attribute in the dataset has a value of 1 (one), and if not, the value is 0 (zero). The age attribute records the age of the patient. All other attributes (both clinical and histopathological) were assigned a value in the range from 0 to 3 (0 = absence of the feature; 1, 2 = relative intermediate values; 3 = highest value). In total, the domain has six classes of erythemato-squamous disease, with 366 instances and 34 attributes. Table 1 summarizes the attributes.

Table 1 Skin disease dataset [8]

Feature Selection

Feature selection is a process that automatically selects only those features of the dataset which contribute most to the prediction variable or output. Irrelevant attributes in the dataset can decrease the accuracy of many models, especially linear algorithms such as linear and logistic regression. Three benefits of performing feature selection before modelling the dataset are the following:

  1. Reduces overfitting
  2. Improves accuracy
  3. Reduces training time

Many feature selection techniques are available (univariate selection, recursive feature elimination, principal component analysis, and feature importance), but in this research paper, we have applied the feature importance method to choose the 15 most important attributes from the skin disease dataset. Figure 2 shows an importance score for each attribute, where the larger the score, the more important the attribute.

Fig. 2 Important attributes

We can get the importance of each feature of the dataset by using the feature importance property of the model. Feature importance gives a score for each feature of the dataset; the higher the score, the more important or relevant the feature is to the output variable. The feature importance is calculated based on the following algorithm:

  1. Train an RF using \( D_{train} \) (all k features)
  2. Compute the average RMSE of the model on the cross-validation (C-V) data
  3. Rank performance by \( {VI}_k=\sum \limits_{\theta}^{\varphi}\frac{\left(\epsilon {\theta}_k-\epsilon {\theta}_{\pi k}\right)}{\varphi \delta} \)
  4. For each subset of features \( k_i = k-1, k-2, k-3, \dots, 1 \) do
  5. Train a new forest from the \( k_i \) features with the highest VI
  6. Calculate the average RMSE of the model on the C-V set
  7. Re-rank the features (k)
  8. Find \( j_i \) with the smallest RMSE
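The elimination loop in steps 4 to 8 can be sketched generically as follows. The callables `importance_fn` and `error_fn` are hypothetical stand-ins for "train a forest and read its importance scores" and "average RMSE on the cross-validation set"; the toy scores are invented for illustration and are not results from this study.

```python
def backward_elimination(features, importance_fn, error_fn):
    """Drop the least important feature each round; keep the subset with the lowest error."""
    current = list(features)
    best_subset, best_error = list(current), error_fn(current)
    while len(current) > 1:
        scores = importance_fn(current)  # re-rank the remaining features
        current.remove(min(current, key=lambda f: scores[f]))
        error = error_fn(current)        # error of the model on the smaller subset
        if error < best_error:
            best_subset, best_error = list(current), error
    return best_subset, best_error


# Toy stand-ins: fixed importance scores and an error that improves once f3 is removed.
toy_scores = {"f1": 0.5, "f2": 0.3, "f3": 0.01}


def toy_error(subset):
    if "f3" in subset:
        return 0.2
    return 0.1 if len(subset) == 2 else 0.3


subset, error = backward_elimination(["f1", "f2", "f3"], lambda fs: toy_scores, toy_error)
```

With real data, both callables would retrain the model on the current feature subset, which makes the loop expensive but keeps the ranking up to date.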

Ensembles Method

In this research paper, ensemble methods are used to improve the performance of the algorithms on the skin disease dataset. We apply ensemble methods to combine six different machine learning algorithms using the bagging classifier, AdaBoost classifier, and gradient boosting classifier.

  1. Bagging. Bootstrap aggregating, also known as bagging, is a machine learning model aggregation technique designed to improve the stability and accuracy of machine learning algorithms applied in regression and classification, and to reduce variance and thereby avoid overfitting.

    A. Takes original dataset D with N training examples
    B. Creates M copies \( {\left\{{\tilde{D}}_m\right\}}_{m=1}^M \)
      a. Each \( {\tilde{D}}_m \) is generated from D by sampling with replacement.
      b. Each dataset \( {\tilde{D}}_m \) has the same number of examples as dataset D.
      c. These datasets are reasonably different from each other.
    C. Trains models \( h_1, \dots, h_M \) using \( {\tilde{D}}_1, \dots, {\tilde{D}}_M \), respectively
    D. Uses an averaged model \( h=\frac{1}{M}{\sum}_{m=1}^M{h}_m \) as the final model

  2. AdaBoost, also known as adaptive boosting, is a machine learning meta-algorithm that combines individual hypotheses in sequential order to improve accuracy. Basically, boosting algorithms convert weak learners into strong learners. It is well suited to addressing bias problems.

    A. Given training data \( (x_1, y_1), \dots, (x_N, y_N) \) with \( y_n \in \{-1, +1\}, \forall n \)
    B. Initialize the weight of each example \( \left({x}_n,{y}_n\right) \): \( {D}_1(n)=\frac{1}{N},\forall n \)
    C. For round t = 1:T
      a. Learn a weak hypothesis \( h_t(x) \to \{-1, +1\} \) using training data weighted as per \( D_t \).
      b. Compute the weighted fraction of errors of \( h_t \) on this training data: \( {\epsilon}_t=\sum \limits_{n=1}^N{D}_t(n)\,\mathbb{1}\left[{h}_t\left({x}_n\right)\ne {y}_n\right] \).
      c. Set the “importance” of \( h_t \): \( {\alpha}_t=\frac{1}{2}\log \left(\frac{1-{\epsilon}_t}{\epsilon_t}\right) \).
      d. Update the weight of each example: \( {D}_{t+1}(n)\propto \begin{cases}{D}_t(n)\times \exp \left(-{\alpha}_t\right) & \text{if } {h}_t\left({x}_n\right)={y}_n\\ {D}_t(n)\times \exp \left({\alpha}_t\right) & \text{if } {h}_t\left({x}_n\right)\ne {y}_n\end{cases}={D}_t(n)\times \exp \left(-{\alpha}_t{y}_n{h}_t\left({x}_n\right)\right) \)
      e. Normalize \( D_{t+1} \) so that it sums to 1: \( {D}_{t+1}(n)=\frac{D_{t+1}(n)}{\sum \limits_{m=1}^N{D}_{t+1}(m)} \)
    D. Output the boosted final hypothesis \( H(x)=\operatorname{sign}\left({\sum}_{t=1}^T{\alpha}_t{h}_t(x)\right) \)

  3. Gradient boosting. It is an example of a generalized boosting algorithm. Gradient boosting is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.

    A. Given training data \( (x_1, y_1), \dots, (x_N, y_N) \).
    B. Initialize \( {\hat{f}}_0 \) with a constant.
    C. For t = 1 to M do the following:
      a. Compute the negative gradient \( g_t(x) \).
      b. Fit a new base-learner function \( h(x, \theta_t) \).
      c. Find the best gradient descent step size \( \rho_t \): \( {\rho}_t=\arg \underset{\rho }{\min}\sum \limits_{i=1}^N\Psi \left[{y}_i,{\hat{f}}_{t-1}\left({x}_i\right)+\rho h\left({x}_i,{\theta}_t\right)\right] \).
      d. Update the function estimate \( {\hat{f}}_t\leftarrow {\hat{f}}_{t-1}+{\rho}_th\left(x,{\theta}_t\right) \).
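Two of the steps listed above, the bootstrap resampling used in bagging and a single AdaBoost round, can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes binary labels in {-1, +1} and uses the standard AdaBoost update, in which a correctly classified example is scaled by exp(-α) and a misclassified one by exp(+α).

```python
import math
import random


def bootstrap_sample(data, rng):
    """Draw len(data) examples from data with replacement (bagging, step B)."""
    return [rng.choice(data) for _ in data]


def adaboost_round(weights, predictions, labels):
    """One AdaBoost round: returns (alpha, normalized new weights).

    predictions and labels hold values in {-1, +1}; assumes 0 < error < 1.
    """
    eps = sum(w for w, h, y in zip(weights, predictions, labels) if h != y)
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # exp(-alpha * y * h) shrinks correct examples and grows misclassified ones.
    raw = [w * math.exp(-alpha * y * h) for w, h, y in zip(weights, predictions, labels)]
    total = sum(raw)
    return alpha, [r / total for r in raw]


# Toy example: four equally weighted examples, the last one misclassified.
sample = bootstrap_sample(list(range(10)), random.Random(0))
alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
```

In the toy example the single misclassified example ends the round with half of the total weight, so the next weak learner is pushed to classify it correctly.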

Results

We conducted two experiments to obtain prediction results for the skin disease dataset. In the first experiment, we used the skin disease dataset obtained from the UCI machine learning repository, which consists of 34 variables and one target variable, “Class.” We then applied the ensemble methods to obtain the predicted values. In the second experiment, we used the feature selection method with the same classifiers and ensemble methods to check whether the results obtained are better. The same six classifiers are used again on the feature-selected dataset to obtain the predictions.

Experiment 1

Before analyzing the dataset, we first visualize the distribution of values as shown in Figs. 3 and 4. Figure 3 depicts the distribution of values in the skin disease dataset used in our study, which contains 366 instances and 35 attributes. Each feature shows the frequency distribution among the 4 values (0 to 3), except feature f34 (age) and feature f35 (the target variable, class).

Fig. 3 Visualization of skin disease dataset

Fig. 4 Density map of skin disease dataset

The density map is a smoothed, continuous version of the histogram estimated from the data. The most common form of estimation is kernel density estimation. In this method, a continuous curve (the kernel) is drawn at each individual data point, and all of these curves are then added together to give a single smoothed density estimate. The most commonly used kernel is Gaussian (which produces a Gaussian bell curve at each data point). The density map of the attributes is shown in Fig. 4.
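A minimal sketch of a Gaussian kernel density estimate is given below. The function name and the bandwidth value are assumptions made for illustration; the sum of the curves is divided by the number of points so that the estimate integrates to 1.

```python
import math


def gaussian_kde(data, x, bandwidth):
    """Average of Gaussian bumps centered at each data point, evaluated at x."""
    norm = len(data) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm


# Toy attribute values in the 0 to 3 range, for illustration only.
density_at_1 = gaussian_kde([0, 1, 1, 2, 3], 1.0, bandwidth=0.5)
```

A smaller bandwidth follows the individual data points more closely, while a larger one produces a smoother curve.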

We used Python code to run the predictions on the skin disease dataset and to calculate the mean, standard deviation, and accuracy of the six different classification techniques. The values obtained for the six classifiers are shown in Table 2.

Table 2 Mean and standard deviation of classifiers

The accuracy of any classification method is calculated using the confusion matrix and is defined as:

$$ \mathrm{Accuracy}=\frac{\mathrm{Number}\ \mathrm{of}\ \mathrm{Correct}\ \mathrm{Predictions}\ }{\mathrm{Total}\ \mathrm{Number}\ \mathrm{of}\ \mathrm{Predictions}} $$

Equivalently, in terms of the confusion matrix entries, it can be written as:

$$ \mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}} $$

where TP is true positives; TN, true negatives; FP, false positives; and FN, false negatives.
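For a multi-class problem such as this one, the same definition amounts to dividing the diagonal of the confusion matrix by the total count. The sketch below assumes rows are true classes and columns are predicted classes; the matrices are made-up examples, not results from this study.

```python
def accuracy_from_confusion(matrix):
    """matrix[i][j] = number of class-i examples predicted as class j."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Binary case laid out as [[TP, FN], [FP, TN]].
binary_accuracy = accuracy_from_confusion([[50, 2], [3, 45]])
# Three-class case: correct predictions lie on the diagonal.
multi_accuracy = accuracy_from_confusion([[10, 0, 0], [1, 8, 1], [0, 0, 10]])
```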

From Table 2, it is clear that the highest accuracy obtained is 98.64%, for the Bernoulli naïve Bayesian classification algorithm, and the lowest accuracy is 58.67%, for the radius neighbors classification algorithm with a radius of 10 (r = 10). The box and whisker plots of the six classifier methods are shown in Fig. 5.

Fig. 5 Accuracy of different classifier algorithms

Next, we applied the three ensemble methods, bagging, AdaBoost, and gradient boosting, and calculated the root mean square error (RMSE) values for the six classification methods. The RMSE is the square root of the mean of the squared errors. It indicates how close the predicted values are to the actual values; hence, a lower RMSE value signifies better model performance. The RMSE value is calculated by the formula:

$$ \sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({y}_i-{\hat{y}}_i\right)}^2} $$
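The formula translates directly into code. In this sketch the function name and the example values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared difference between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n)


# One prediction is off by 2, so the result is sqrt(4 / 3).
error = rmse([1, 2, 3], [1, 2, 5])
```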

The RMSE values for each classification algorithm using the ensemble techniques are shown in Table 3.

Table 3 RMSE values for ensemble methods

The accuracy of the three ensemble methods using the six machine learning classification algorithms is shown in Table 4. The accuracy values in Table 4 are higher than those in Table 2. In particular, the accuracy of RNC with the gradient boosting ensemble method increases markedly compared with the individual RNC accuracy of 58.67%.

Table 4 Output of evaluating algorithms on the scaled dataset

Next, we combine all six classification algorithms into an ensemble using the bagging classifier, AdaBoost classifier, and gradient boosting classifier. The results obtained after applying these three ensemble methods are shown in Table 5.

Table 5 Output of ensemble method

Experiment 2

In the new dataset obtained after feature selection using the feature importance method, we have 10 clinical features (f1, f3, f2, f7, f6, f10, f4, f5, f8, and f9) and 5 histopathological features (f14, f16, f20, f12, and f15). This suggests that clinical features are more important than histopathological features for the accuracy of disease prediction. We first visualize the distribution of values as shown in Figs. 6 and 7. Figure 6 depicts the distribution of values in the reduced skin disease dataset, which contains 366 instances and 16 attributes. Each feature shows the frequency distribution among the 4 values (0 to 3), except feature f35 (the target variable, class). Figure 7 depicts the density map of the features of the dataset.

Fig. 6 Visualization of skin disease dataset

Fig. 7 Density map of skin disease dataset

We again used Python code to run the predictions on the skin disease dataset obtained from the feature selection method and to calculate the mean, standard deviation, and accuracy of the six different classification techniques. The values obtained for the six classifiers are shown in Table 6.

Table 6 Mean and standard deviation of classifiers

From Table 6, it is clear that the highest accuracy obtained is 97.29%, for the Bernoulli naïve Bayesian classification algorithm, and the lowest accuracy is 81.08%, for the radius neighbors classification algorithm with a radius of 4 (r = 4). The box and whisker plots of the six classifier methods are shown in Fig. 8.

Fig. 8 Accuracy of different classifier algorithms

The root mean square error (RMSE) values obtained on the dataset produced by the feature selection method are shown in Table 7.

Table 7 RMSE values for ensemble methods

The accuracy of the three ensemble methods using the six machine learning classification algorithms on the reduced dataset obtained after feature selection is shown in Table 8.

Table 8 Output of accuracy on the reduced dataset

Here, we observe that the accuracy of the ensemble methods is higher than that of the data mining techniques without an ensemble. It is also clear that the accuracy obtained in experiment 2 (with feature selection, on the reduced dataset) is higher than the accuracy obtained in experiment 1 (three ensemble techniques on six classifiers, on the whole skin disease dataset). Next, we combine all six techniques into an ensemble using the bagging classifier, AdaBoost classifier, and gradient boosting classifier. The results obtained after applying these three ensemble methods are shown in Table 9.

Table 9 Output of evaluating ensemble method

The accuracy obtained with the feature selection method improved over the first experiment, which considered the full set of attributes. A possible cause is that the full dataset includes irrelevant features, which do not play an important role in the prediction of skin diseases.

The ROC curve is a plot of the true positive rate (sensitivity) as a function of the false positive rate (100-specificity) for different cut-off points of a parameter. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve (AUC) is a measure of how well a parameter can distinguish between two diagnostic groups (diseased/normal). The ROC curves of the three ensemble techniques, bagging, AdaBoost, and gradient boosting, are shown in Fig. 9.

Fig. 9 Receiver operating characteristic (ROC) curve
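How the points of an ROC curve and its AUC arise from classifier scores can be sketched for the two-group case. The function names and the toy scores are hypothetical; the sketch uses one threshold per distinct score and the trapezoidal rule for the area, with rates expressed as fractions rather than percentages.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for labels in {0, 1}."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy scores in which every diseased case (label 1) scores above every normal case.
area = auc(roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

For a six-class problem, a curve of this kind is usually drawn per class in a one-vs-rest fashion; how the curves in Fig. 9 were produced is not stated here, so that remains an assumption.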

Discussion

In this paper, we conducted two experiments: the first used three ensemble methods (bagging, AdaBoost, and gradient boosting) on six different classification algorithms (PAC, LDA, RNC, BNB, NB, and ETC), and the second added the feature selection method. The experiments were done on the UCI skin disease dataset, which contains 366 instances and 34 attributes. The skin dataset consists of six classes of skin disease: C1, psoriasis; C2, seborrheic dermatitis; C3, lichen planus; C4, pityriasis rosea; C5, chronic dermatitis; C6, pityriasis rubra pilaris. We analyzed the mean, standard deviation, root mean square error, and accuracy of six different machine learning algorithms and obtained the highest accuracy of 98.64% with the Bernoulli naïve Bayesian classification algorithm. The performance of the ensemble data mining techniques for skin disease prediction depends on the choice of input variables and the selection of the classification method. A root mean square error is calculated for each ensemble method. After applying the ensemble methods, we obtain 99.46% accuracy with the gradient boosting classifier method.

In the second experiment, we first choose the important attributes using the feature importance method to obtain the fifteen most important features and then reduce the dataset to this subset of the original attributes. The reduced dataset contains 15 attributes and 366 patient records. The same ensemble methods and machine learning algorithms are applied, and we get different results; the highest accuracy achieved in this experiment is 99.68%, with the gradient boosting classifier method.

A comparison of experiments 1 and 2 is shown in Table 10 (FS stands for feature selection).

Table 10 Comparison of accuracies in experiment 1 and experiment 2

In this comparison, we ran the six classifiers with the three different ensemble algorithms without feature selection (all 34 features used) and calculated the corresponding accuracy of each classifier. The accuracies obtained with all features and after feature selection in experiment 2 are shown in Table 10. The highest accuracy of each classifier is presented in the same row to make the comparison easier.

The comparison in Table 10 shows that, in each case, the accuracy with feature selection on the reduced dataset is higher than without feature selection.

To illustrate the success of our approach, the results obtained in this study were compared with other results given in the literature. To assess the proposed dermatological classification, we considered a number of studies that used the same dataset with different classification techniques and compared them with our multi-model ensemble method. The same partitions of the test datasets as in these studies were followed. The comparison of classification performance with previous studies is shown in Table 11.

Table 11 A few investigations which have dealt with skin disease mining

Although the predictions after applying feature selection in experiment 2 are better than those in experiment 1, this does not guarantee that the approach will perform as well in all cases, for example if more instances are used. The feature selection process was applied to only these 366 instances, so whether the selected features remain appropriate will have to be checked on more instances, which can be done in a future study.

Conclusion

Machine learning techniques play an important role in the diagnosis of diseases in the biotechnology field. Knowledge obtained using machine learning techniques can be used to develop expert systems, which help in predicting various types of disease. This paper describes different data mining techniques for skin disease prediction. Six machine learning classification techniques (PAC, LDA, RNC, BNB, NB, and ETC) are used to classify skin disease, and three ensemble techniques (bagging, AdaBoost, and gradient boosting classifiers) are applied to improve the accuracy obtained by the machine learning algorithms. A feature selection method is also applied to the skin disease dataset and yields a higher accuracy of 99.68% with the gradient boosting ensemble method applied to RNC. This is the highest accuracy among the available literature on the skin disease dataset.

This study achieved higher accuracy than previous research done in this field and can be useful for correctly predicting skin disease. Hence, we conclude that the proposed feature selection model can be used efficiently in the detection of erythemato-squamous diseases, with improvements in speed and accuracy.