Introduction

Skin diseases are among the most common illnesses worldwide; people contract them through inherited genes, environmental factors, or the body’s own immune response. Many people overlook the effects of a skin disease while it is still in its early stages. The prevalence of dermatological diseases has increased with environmental degradation and lifestyle diseases, and the cost of diagnosing them with advanced techniques is exorbitant. The present study was therefore motivated to develop a low-cost, comprehensive solution that uses artificial intelligence techniques to differentiate accurately and precisely between different types of dermatological conditions. In current practice, skin diseases are diagnosed through a biopsy, which is then examined, and doctors must manually prescribe medications. However, the data available in clinical records, patient demographic information, imaging examination results, and questionnaires represent a wealth of information with the potential to revolutionize personalized care in dermatology and in medicine in general. Dermatology is already rich in data from the genome, epigenome, transcriptome, proteome, and microbiome, fields of research that are sometimes referred to collectively as “omics.” Thanks to recent developments in faster processing and cheaper storage, machine learning (ML) algorithms with human-like intelligence have several applications in dermatology. Moreover, as the deep learning age has progressed, particularly with breakthroughs in convolutional neural networks (CNNs), CNNs have been used in medical domains such as radiology and pathology. The use of CNNs in dermatology, which is similarly dependent on images, is, however, quite limited. Psoriasis (Pso), eczema (Ecz), and atopic dermatitis (AD) are all inflammatory skin diseases that are easily misdiagnosed in practice.
The present research has two objectives. First, it aims to detect skin diseases accurately from prevalent datasets using advanced deep learning techniques. Second, it aims to link the system to dermatologists so that they can diagnose the disease appropriately and patients get the timely and affordable care they need to treat skin ailments as soon as possible. The paper is divided into eight sections; the “Introduction” section introduces the topic. The “Background” section briefly describes the different diseases that will be covered by the proposed system and the WHO stance on dermatological diseases. The “Review of literature” section reviews existing work and discusses the shortcomings that led to this research. The “Methodology” section focuses on the methodology of skin disease detection and prediction. The “Results” section gives the results generated by the deep learning models and compares the models on several performance metrics. The “Result analysis” and “Discussion” sections summarize the outcome of the results generated. Finally, the “Conclusion” section concludes the work.

Background

At its first global meeting on skin neglected tropical diseases (NTDs), WHO called for greater efforts to address the burden of NTDs, as they pose a major public health problem. Overall, skin conditions are estimated to affect 1.8 billion people at any point in time. In tropical and resource-poor settings, skin infections, which can be of bacterial, viral, fungal, or parasitic origin, are the commonest cause of disease. In most communities, skin NTDs form about 10% of skin diseases. It is therefore important that endemic countries adopt holistic, community-oriented approaches to comprehensively addressing skin NTDs and all other skin conditions as part of universal health coverage and leaving no one behind [1].

Patients’ well-being, mental health, capacity to function, and social participation, a measure of disability defined generally by the WHO as a person’s ability to be involved and engaged in contacts with others, are all threatened by skin disorders. The influence of medical problems on key determinants of health can be estimated using quality of life (QoL) instruments. There are a variety of QoL measuring instruments available, including the Dermatology Life Quality Index (DLQI) and the Skindex, which may be tailored to individual situations.

Skin illnesses remain the fourth greatest source of nonfatal disease burden worldwide, according to the Global Burden of Disease research. However, research efforts and funding are insufficient in comparison to the severity of skin disorders. International and national efforts, such as the WHO list of essential medicines, are crucial in decreasing the socioeconomic burden of skin disorders and improving access to treatment. Teledermatology, point-of-care diagnostic technologies, and task-shifting are examples of recent advances that make it possible to bring dermatological treatment to underprivileged areas in a cost-effective manner.

Skin disorders are a major cause of non-fatal disability all across the world, especially in resource-poor areas. Increased funding for research on the burden of skin illness in low-resource settings, as well as policy measures to supply high-quality care, are critical to reducing the burden of skin disorders.

Review of literature

Multiple studies have been carried out on image-based detection and classification of severe diseases. CNNs in dermatology arose from the introduction of ground-breaking technology to aid in melanoma detection. There is now a slew of AI technologies that help doctors diagnose cancer using data from dermoscopes and histological images of skin biopsy samples. However, to the best of our knowledge, no AI techniques have been used to help in the diagnosis of skin illnesses other than malignancies.

In [2], an extensive review is presented, which identifies five current areas of application for machine learning in dermatology. The goal of that study is to explain the foundations of machine learning and its wide variety of applications to dermatologists so that they may properly assess its possible benefits and drawbacks. In [3], a brief comparison was made between machine learning algorithms and convolutional neural networks. Three separate and well-known algorithms were used in each process. The Bagged Tree Ensemble, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) algorithms were employed in the machine learning process. Three pre-trained deep neural network models, ResNet50, VGG16, and GoogleNet, were employed in the deep learning process, and the accuracy of each was compared.

Some studies have developed smartphone-based apps for a limited number of diseases. The paper in [4] demonstrates that deep learning can be used in dermatology for purposes other than melanoma diagnosis. The research shows that CNNs can distinguish between illnesses, although only for three skin problems: Pso, AD, and Ecz. A smartphone app has been developed for physicians in China. In [5], the MobileNet model was used to construct a skin illness classification system in an Android application by applying transfer learning to seven skin disorders. The proponents acquired a total of 3406 images; because the number of photographs in each class was uneven, the dataset was considered unbalanced. The paper [6] attempts to create the WonDerM pipeline, which resamples preprocessed skin lesion pictures, develops a neural network architecture fine-tuned with segmentation task data, and employs an ensemble technique to identify the seven skin disorders.

The paper in [7] has a very limited scope and a restricted dataset; it provides a method for detecting skin diseases based on image processing and machine learning approaches. As an input to the prototype, the patient gives a photograph of the diseased skin region. Image processing techniques are applied to this picture to extract feature values, and the illness is predicted using a classifier model. The paper in [8] proposes a hands-on pedagogical exercise that dissects the steps involved in training a CNN using a dataset of photos of skin lesions linked with several skin cancer classifications. The activity is open-source, and it does not need the installation of any software. A step-by-step description of the algorithm and its functions is given, taking the reader through the execution of a realistic example, including visualization and assessment of the results, by tracking the creation of the computer code’s building blocks. Similarly, CNNs were employed in three different phases in paper [9], i.e., the feature extraction phase, the training phase, and the testing and validation phase, to perform classification of skin diseases. However, accuracy and efficiency were extremely limited, and the dataset was small.

The paper in [10] presents a deep learning-based automated approach for identifying skin conditions that combines MobileNet V2 and long short-term memory (LSTM). The MobileNet V2 model proved to have higher specificity; however, accuracy and efficacy were limited. Similarly, feature extraction was performed in [11], and a skin disease detection approach based on image processing was developed. The method takes a color image as input. The picture is resized, and features are then extracted using a pretrained convolutional neural network. Following that, a feature picture of the disease’s effect on the skin was categorized, and image analysis was used to determine the type of disease.

The application of CNNs to the detection of skin cancer has been quite prevalent, as demonstrated in [12,13,14,15]. In [12], the model was used to distinguish between keratinocyte carcinomas and benign seborrheic keratoses, as well as between malignant melanomas and benign nevi. In [13], dermoscopic pictures and diagnoses were used to train and verify Google’s Inception v4 CNN architecture. A 100-image test set was employed in a comparative cross-sectional reader study.

In [14], the study examines current practices, issues, and opportunities in image capture, pre-processing, segmentation, feature extraction and selection, and classification of dermoscopic pictures, as well as the state of the art in these systems. Finally, the paper in [15] evaluated the performance of various statistical classifiers on a large sample of pigmented skin lesions captured by four digital analyzers in two separate dermatological facilities. Similar studies in references [16,17,18,19,20,21,22] were consulted to critically understand the nature, prevalence, and detection of skin diseases using deep learning techniques.

Methodology

In this paper, deep learning algorithms were employed, and each algorithm was evaluated based on the various parameters as described. Deep learning is part of the machine learning family, which includes supervised, unsupervised, and semi-supervised learning.

Unlike traditional machine learning, deep learning employs a big dataset for learning and significantly reduces the number of classifiers employed. The use of a very big dataset increases the training time of the deep learning algorithm. Also unlike traditional machine learning, the deep learning algorithm selects its own features, which makes the prediction process easier for the end user because it does not require extensive pre-processing.

Data collection, architecture definition, modifying the deep learning algorithms, evaluation, and comparative analysis are all important steps in this research as depicted in Fig. 1.

Fig. 1 Flowchart of methodology

Dataset collection and processing

The dataset for skin disease classification was taken from a publicly available image dataset on the Kaggle portal. It consists of 20 classes, namely, acne and rosacea (717 images), alopecia (179), atopic dermatitis (366), bullous disease (336), cellulitis and impetigo (552), eczema (926), exanthems drug eruptions (382), fungal infection (975), herpes HPV and STDs (305), keratosis (862), light diseases disorders (426), lupus connective tissue (315), melanoma skin cancer (348), nail disease (780), poison ivy and contact dermatitis (195), psoriasis lichen planus (1055), scabies lyme disease (324), seborrheic keratoses (1029), systemic disease (455), and urticaria hives (160).
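
The class sizes listed above can be tallied with a short script to see how large and how unbalanced the dataset is. The counts are copied from the text; the helper name `summarize` and the "imbalance ratio" (largest class divided by smallest class) are illustrative choices, not part of the original study.

```python
# Image counts per class, copied from the dataset description above.
CLASS_COUNTS = {
    "acne and rosacea": 717,
    "alopecia": 179,
    "atopic dermatitis": 366,
    "bullous disease": 336,
    "cellulitis and impetigo": 552,
    "eczema": 926,
    "exanthems drug eruptions": 382,
    "fungal infection": 975,
    "herpes HPV and STDs": 305,
    "keratosis": 862,
    "light diseases disorders": 426,
    "lupus connective tissue": 315,
    "melanoma skin cancer": 348,
    "nail disease": 780,
    "poison ivy and contact dermatitis": 195,
    "psoriasis lichen planus": 1055,
    "scabies lyme disease": 324,
    "seborrheic keratoses": 1029,
    "systemic disease": 455,
    "urticaria hives": 160,
}


def summarize(counts):
    """Return basic size and imbalance statistics for a {class: count} dict."""
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    return {
        "classes": len(counts),
        "total": sum(counts.values()),
        "largest": largest,
        "smallest": smallest,
        # Hypothetical summary statistic: size of largest class / smallest class.
        "imbalance_ratio": counts[largest] / counts[smallest],
    }


summary = summarize(CLASS_COUNTS)
print(summary)
```

The listed counts add up to 10,687 images, and the largest class is roughly 6.6 times the size of the smallest, which is why accuracy alone is a weak indicator of quality for this dataset.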

Dataset preprocessing

Since the images had different dimensions, they were resized to a common size of 299 × 299 pixels. Data augmentation was also tried, but it did not improve the result.

Model selection

Different state-of-the-art pretrained networks, namely Xception, Inception-v3, Resnet50, DenseNet121, and Inception-ResNet-v2, were trained on the image dataset, and their performance was determined using accuracy, precision, recall, and macro average F1 score. Classification accuracy is the ratio of the number of correct predictions to the total number of input samples. For unbalanced data, however, accuracy or precision alone is insufficient. In this case, the F1 score, which is the harmonic mean of sensitivity (recall) and precision, provides a value that indicates the overall quality of the technique.

Inception-v3 is a deep learning model based on convolutional neural networks that has 42 layers and is used for image classification. It is an improved version of the basic model, Inception V1. Xception is a convolutional neural network that is 71 layers deep. It is an extension of the Inception architecture that replaces the standard Inception modules with depthwise separable convolutions.

Inception-ResNet-v2 is a convolutional neural network that is trained on more than a million images from the ImageNet database [1]. The network is 164 layers deep and can classify images into 1000 object categories. ResNet50 is a variant of the ResNet model which has 48 convolution layers along with one MaxPool and one average pool layer. It has 3.8 × 10^9 floating point operations.

DenseNet (dense convolutional network) is an architecture that focuses on making deep learning networks go even deeper while making them more efficient to train, by using shorter connections between the layers. All the models were trained for 70 epochs with the Adam optimizer (RMSprop did not perform as well as Adam). The results are tabulated in Tables 1, 2, 3, 4, and 5, and based on them, the Xception model was selected.

Optimizer deployment

The network with a SoftMax output is optimized by gradient descent or stochastic gradient descent. Its loss function, however, is a multi-peak, strongly nonlinear function, so the solution found depends on the gradient and is only a local optimum rather than the global optimum.
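
To make the link between the SoftMax output and gradient descent concrete, the following is a minimal sketch of a single gradient step. It assumes a cross-entropy loss, which the text does not state explicitly, and it updates the raw class scores (logits) directly as a toy stand-in for updating network weights; the function names and the learning rate are illustrative only. The study itself reports using the Adam optimizer, not the plain update shown here.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the true class (assumed loss)."""
    return -math.log(probs[true_index])


def gradient_wrt_logits(probs, true_index):
    """Gradient of cross-entropy with respect to the logits: p - one_hot(y)."""
    return [p - (1.0 if i == true_index else 0.0) for i, p in enumerate(probs)]


def gradient_step(logits, true_index, learning_rate=0.5):
    """One plain gradient descent update on the logits (toy illustration)."""
    grad = gradient_wrt_logits(softmax(logits), true_index)
    return [z - learning_rate * g for z, g in zip(logits, grad)]


logits = [0.0, 0.0, 0.0]  # three hypothetical classes, no preference yet
loss_before = cross_entropy(softmax(logits), true_index=2)
updated = gradient_step(logits, true_index=2)
loss_after = cross_entropy(softmax(updated), true_index=2)
print(loss_before, loss_after)
```

Each step moves in the direction that lowers the loss locally. In a deep network the same rule is applied to millions of weights, where the loss surface has many local optima.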

Testing, training, and validation

The processed dataset was split in the standard ratio of 60:20:20 into training, validation, and testing sets and was fed into the different deep learning models for result generation and classification of the multi-class dataset.
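
A 60:20:20 split can be sketched as follows, assuming the ratio refers to training, validation, and testing in that order. The split is done class by class (stratified) so that small classes appear in every subset; stratification, the function name `stratified_split`, and the fixed seed are assumptions for illustration, since the text does not say how the split was drawn.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) lists of sample indices, split per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical toy labels: one large class and one small class.
labels = ["eczema"] * 50 + ["alopecia"] * 10
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Every sample lands in exactly one subset, and each class keeps approximately the same 60:20:20 proportions.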

Comparison on performance metrics

The efficiency and effectiveness of the machine learning algorithms were tested on the basis of certain performance metrics. Some important performance metrics were utilized in this research to obtain the most efficient algorithm for the particular purpose.

Confusion matrix

The confusion matrix summarizes how a classifier’s predictions compare with the true classes, using the following four counts.

True positive (TP): It refers to the number of predictions where the classifier correctly predicts the positive class as positive.

True negative (TN): It refers to the number of predictions where the classifier correctly predicts the negative class as negative.

False positive (FP): It refers to the number of predictions where the classifier incorrectly predicts the negative class as positive.

False negative (FN): It refers to the number of predictions where the classifier incorrectly predicts the positive class as negative.
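
For a multi-class problem, these four counts are obtained per class by treating that class as "positive" and all other classes as "negative". The sketch below builds a confusion matrix from label lists and derives the counts for one class; the labels are hypothetical toy data and the function names are illustrative.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in classes] for t in classes]


def one_vs_rest_counts(matrix, k):
    """TP, FP, FN, TN for class index k, with all other classes as negative."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # true class k, predicted otherwise
    fp = sum(row[k] for row in matrix) - tp  # predicted k, true class differs
    tn = total - tp - fn - fp
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical predictions for three of the inflammatory conditions.
classes = ["Pso", "Ecz", "AD"]
y_true = ["Pso", "Pso", "Pso", "Ecz", "Ecz", "AD", "AD", "AD"]
y_pred = ["Pso", "Pso", "Ecz", "Ecz", "AD", "AD", "AD", "Pso"]

matrix = confusion_matrix(y_true, y_pred, classes)
pso = one_vs_rest_counts(matrix, classes.index("Pso"))
print(matrix, pso)
```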

Accuracy (ACC) is the ratio of correct predictions to the total number of predictions made, where P and N denote the total numbers of positive and negative samples:

$$ACC=\frac{TP+TN}{TP+TN+FN+FP}=\frac{TP+TN}{P+N}$$
(1)

The number of correct positive predictions divided by the total number of positive predictions yields precision (PREC) or positive predictive value (PPV):

$$PREC=\frac{TP}{TP+FP}$$
(2)

In an imbalanced classification problem with more than two classes, recall is calculated as the sum of true positives across all classes divided by the sum of true positives and false negatives across all classes:

$$Recall=\frac{TP}{TP+FN}$$
(3)
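
Written out for the multi-class case described above, with a class index c added here as notation, the pooled form of Eq. (3) is:

$$Recall=\frac{\sum_{c}TP_{c}}{\sum_{c}\left(TP_{c}+FN_{c}\right)}$$

When every image has exactly one true class, each sample is counted once as either a true positive or a false negative of its own class, so the denominator equals the total number of samples and this pooled recall coincides with the accuracy of Eq. (1). The per-class recall, TP_c / (TP_c + FN_c), is the quantity that is averaged in the macro average.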

The F1 score is the harmonic mean of precision and recall and has a range of [0, 1]. It indicates how precise and how robust the classifier is: precision reflects how many of the instances it labels as positive are correct, and robustness reflects that it does not miss a significant number of instances:

$$F1\;score=2\ast\frac1{\frac1{precision}+\frac1{recall}}$$
(4)
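
Equations (2) to (4) can be checked numerically with a few lines of code. The counts below are hypothetical, and returning 0 when a denominator is zero is a common convention assumed here, not something specified in the text.

```python
def precision(tp, fp):
    """Eq. (2): share of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Eq. (3): share of actual positives that are found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Eq. (4): harmonic mean, written in the equivalent form 2pr / (p + r)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for one class: 8 TP, 2 FP, 8 FN.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=8)
f1 = f1_score(p, r)
print(p, r, f1)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier with high precision but low recall (as in this example) still receives a modest F1 score.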

The macro average is the arithmetic mean of the per-class precision, recall, and F1 scores. Macro average scores are used when all classes are to be treated equally, so that the overall performance of the classifier is not dominated by the most common class labels.
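
The difference between accuracy and the macro average F1 score on unbalanced data can be illustrated with a small sketch. The confusion matrix below is hypothetical: it describes a classifier that assigns every sample to the majority class. The function names are illustrative.

```python
def per_class_f1(matrix):
    """F1 per class from a confusion matrix (rows: true, columns: predicted)."""
    scores = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp
        fp = sum(row[k] for row in matrix) - tp
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(matrix)
    return sum(scores) / len(scores)


def accuracy(matrix):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Hypothetical 3-class case: 80/10/10 samples, everything predicted as class 0.
matrix = [
    [80, 0, 0],
    [10, 0, 0],
    [10, 0, 0],
]
print(accuracy(matrix), macro_f1(matrix))
```

Here the accuracy is 80% even though two of the three classes are never recognized, while the macro F1 score is below 30% because each class contributes equally to the mean.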

Results

The results for each model are reported in Tables 1, 2, 3, 4 and 5.

Result analysis

  1. The Xception model shows the best performance on all metrics, as can be seen in Tables 1, 2, 3, 4, and 5. It achieves a macro F1 score of 80% and an accuracy of 81%.
  2. Resnet50 shows the worst performance on all metrics and achieves a macro F1 score of 4% and an accuracy of 16%.
  3. Inception-v3 and Inception-ResNet-v2 show moderate performance and achieve macro F1 scores of 61% (Inception-v3) and 48% (Inception-ResNet-v2) and accuracies of 66% (Inception-v3) and 48% (Inception-ResNet-v2).
  4. DenseNet121 reaches an accuracy of 46% and a macro F1 score of 37%.
  5. The confusion matrices of all models are shown in Figs. 2, 3, 4, 5 and 6.

Table 1 Inception-v3 results
Table 2 Inception-ResNet-v2 results
Table 3 Xception results
Table 4 Resnet50 results
Table 5 DenseNet121 results
Fig. 2 Confusion matrix of Xception
Fig. 3 Confusion matrix of DenseNet121
Fig. 4 Confusion matrix of ResNet 50
Fig. 5 Confusion matrix of Inception-v3
Fig. 6 Confusion matrix of Inception-ResNet-v2

Discussion

As can be seen, the present research overcomes the disadvantages of previous research and produces greater accuracy on a larger and more diverse dataset. It meets the objectives and motivation of the study as described and proposes a low-cost, comprehensive solution for the detection and diagnosis of distinct types of dermatological skin abnormalities using deep learning algorithms. The present model is trained with the Xception architecture, and the training was efficient and accurate for 20 different skin diseases on a large multi-class dataset of more than 10,000 images. The algorithm deployed was free from any inherent or systemic bias and treated all classes of diseases equally. For feature extraction, training, and testing, different deep learning algorithms (Inception-v3, MobileNet, Resnet, Xception) were employed, and the most efficient one was finally chosen; using a state-of-the-art architecture boosts accuracy to as much as 81% (Table 6).

Table 6 Comparison with previous works

Conclusion

Dermatological diseases have become quite prevalent, and the cost of identification with advanced techniques is very high. The present paper provides a cost-effective and comprehensive solution using deep learning algorithms that is capable of differentiating between 20 different forms of dermatological skin disorders with a high degree of precision and accuracy over an enlarged dataset of more than 10,000 images.

In this study, deep learning techniques are used to create a model for predicting skin disorders. It was found that, by utilizing the assembling features, deep learning algorithms can attain a greater accuracy rate and can forecast more diseases than prior models. Prior models that apply deep learning to skin disease prediction have covered fewer skin diseases with lower levels of accuracy. The present model can forecast up to 20 illnesses with an accuracy of 81%. This demonstrates that deep learning algorithms have enormous potential for skin disease diagnosis in the real world.

By combining deep learning methods and convolutional neural networks, the proposed system is capable of detecting skin illness with promising results. It can be used to aid people all around the world by precisely predicting skin illness and referring them to a dermatologist for the best treatment options. Because the tools are free to use and available to the user, the system can be deployed for free. The developed application is lightweight and may be used on machines with modest system requirements. Convolutional neural networks and deep learning algorithms were effectively implemented.