
1 Introduction

Data on skin quality properties are often gathered and evaluated by a well-trained expert who assigns noticeable skin samples, examined either live or from photographs, to a recognised grade on a predefined grading scale. However, a machine vision approach to assessing skin quality properties is useful in providing an objective analysis [1, 2]. It avoids problems with repeatability and reproducibility, since a professional's experience and knowledge are subjective and can differ amongst graders, and it can also reduce cost and yield more effective analysis while providing a consistent assessment of skin quality [3]. Providing objectivity to the dermatologist's visual evaluation of skin is important for the efficient development of effective pharmaceutical treatments. Recently, several skin assessment methods have been established, for instance, analysis of the skin appearance around pores on the face [4], evaluation of facial wrinkle improvements over time [5], and automatic detection and quantification of facial wrinkles [6]. Most of these assessments were subjective and revolved around a clinical perspective and a professional's opinion rather than an objective assessment. Further research is required to understand the definition of skin quality based on human perception. In this work, we experimented with several conventional machine learning (CML) methods and Convolutional Neural Networks (CNNs) under different parameters and settings to classify spots, wrinkles and normal skin patches, followed by a comparison of Support Vector Machine (SVM) [7] and GoogLeNet [8] performance.

The rest of the paper is organised as follows: Sect. 2 reviews related work on skin classification; Sect. 3 describes the dataset and experimental settings; Sect. 4 evaluates which classifiers best fit the proposed purpose using measures such as Sensitivity and F-Measure; the final section concludes with the limitations of this work and prospects for future work.

2 Related Work

Standard machine learning methods have been widely used in several pattern recognition tasks, including the detection of skin conditions such as acne [9]. These traditional methods perform well in many classification tasks, but they come with drawbacks. For example, the performance of an Artificial Neural Network (ANN) depends on the number of hidden layers, the number of hidden nodes and the learning rate, and the network has to be extensively trained to achieve optimal performance, which is why SVM was chosen for this experiment as a more suitable option. SVM has been used widely over the last decade [10]; for instance, it has been applied to categorise skin texture for early melanoma detection and to classify skin colour [11, 12]. However, Convolutional Neural Networks (CNNs), a deep learning approach, have outperformed other methods in the image classification domain [10].

Recently, with the rapid growth of deep learning algorithms, they have become highly effective in classification tasks such as facial recognition and face tracking. The aim is to learn hierarchical representations of data using a deep architecture [13]. Krizhevsky et al. [14] used a deep convolutional neural network to classify high-resolution images in the ImageNet LSVRC-2010 challenge. The network was trained on 1.2 million images covering 1000 classes and achieved error rates of 39.7% for top-1 and 18.9% for top-5, which illustrates the advantage of this approach and suggests that including deep learning in the process can provide better performance and more reliable results. On the other hand, the data used in that work do not relate to skin attributes. Esteva et al. [15] applied deep learning to classify skin cancer at dermatologist level, comparing the network's performance against 21 dermatologists. Nevertheless, that research focused on clinical use. Therefore, this work examines the performance of CNNs in classifying non-clinical skin features such as spots and wrinkles.

3 Methodology

This section describes the available datasets, the two sets of experiments and their settings in detail.

3.1 Dataset

Currently, there are limited datasets available for the analysis of facial skin conditions. One available dataset, DermNet, consists of a total of 23,000 images of various skin diseases. However, this dataset has two limitations. First, the data were not collected under a controlled environment, which causes inconsistencies in the images and affects their integrity and accuracy. Second, the images cover not only facial skin conditions but also other diseased body parts, making them unsuitable for this experiment, which focuses on the classification of common facial skin conditions. To address these limitations, we proposed an ongoing collection of consistent, high-quality images of faces from a wide demographic and from participants with different social habits, including, but not limited to, smoking and alcohol consumption. The dataset currently consists of 164 images of participants with a mean age of 48.43 (standard deviation (SD): 21.44, ages between 18 and 92). There are 25 different self-reported ethnicities in the dataset, including African, Arabic, Chinese and Malaysian; the largest ethnic group is White British with 119 images. The majority of participants were female (107), with 56 male participants and 1 transgender participant. To understand how certain habits can affect a person's facial skin properties, participants were asked to complete a questionnaire on alcohol consumption and smoking. Overall, 68 participants had never drunk alcohol, 88 currently drink and 8 used to drink but had stopped. As for smoking, 85 had never smoked, 21 currently smoke tobacco in some form, 1 smokes electronic cigarettes only, 6 had smoked a few times in their lives and 51 used to smoke but had stopped. The images were taken with a Nikon D5300 at a resolution of 4496 \(\times\) 3000 to capture as much facial detail as possible.

Firstly, five expressionless images of each participant were captured at different angles to allow for a full view of the face and its profiles. Next, participants were asked to pose with six different facial expressions based on Ekman's [16] universal facial expressions: happiness, sadness, surprise, disgust, anger and fear. Replicating these expressions ensures that the dataset captures some of the variation in each participant's facial skin caused by natural expressions. Being able to differentiate between actual wrinkles and ridges caused by expression lines would be extremely useful when analysing facial conditions in the future, as would the ability to distinguish between changes caused by natural expressions and deformities caused by other factors such as ageing and social habits. Data collection is ongoing, so the dataset is likely to grow in the near future.

Skin patches of size 100 \(\times\) 100 were also collected, covering three categories: normal skin, skin with spots and skin with wrinkles, as illustrated in Fig. 1. The spotted skin class contains spots at different stages, both inflamed and non-inflamed. The wrinkled skin class contains two different types of wrinkles, deep and fine, taken from different parts of the face. The total number of patches is 325.

Fig. 1. Sample skin patches from each of the three classes.

3.2 Traditional Machine Learning

In this section, we used traditional supervised machine learning to classify the three classes (Normal, Spot and Wrinkle). Since these three classes of skin differ substantially in texture, we investigated popular feature extraction techniques, including texture descriptors such as Local Binary Patterns (LBP) [17] and the Histogram of Oriented Gradients (HOG) [14]. We did not include the multifractal texture descriptor because it is better suited to representing face features than skin regions [18]. In addition, we used colour descriptors such as normalised RGB, HSV and L*u*v* features. After extracting features from the images, we trained a Support Vector Machine (SVM) with Sequential Minimal Optimisation (SMO) for the classification task.
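As a rough illustration of this pipeline, the sketch below extracts LBP, HOG and colour features from a patch and evaluates an SVM with 10-fold cross validation using scikit-image and scikit-learn. The descriptor parameters, the linear kernel and the use of libsvm's SMO-style solver (rather than the exact SMO implementation used in our experiments) are assumptions made for illustration only.

```python
import numpy as np
from skimage.color import rgb2gray, rgb2hsv, rgb2luv
from skimage.feature import local_binary_pattern, hog
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def extract_features(patch_rgb):
    """Concatenate texture (LBP, HOG) and colour (HSV, L*u*v*) descriptors
    for a 100x100 RGB skin patch. Parameter values are illustrative only."""
    grey = rgb2gray(patch_rgb)

    # Uniform LBP histogram (texture): P=8 neighbours gives 10 uniform codes.
    lbp = local_binary_pattern(grey, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

    # HOG descriptor (gradient structure).
    hog_vec = hog(grey, orientations=9, pixels_per_cell=(16, 16),
                  cells_per_block=(2, 2))

    # Mean colour in HSV and L*u*v* spaces.
    hsv_mean = rgb2hsv(patch_rgb).reshape(-1, 3).mean(axis=0)
    luv_mean = rgb2luv(patch_rgb).reshape(-1, 3).mean(axis=0)

    return np.concatenate([lbp_hist, hog_vec, hsv_mean, luv_mean])


def classify(patches, labels):
    """patches: list of 100x100x3 arrays; labels: 'normal'/'spot'/'wrinkle'."""
    X = np.array([extract_features(p) for p in patches])
    svm = SVC(kernel="linear")  # libsvm trains this with an SMO-type solver
    return cross_val_score(svm, X, labels, cv=10)  # 10-fold cross validation
```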

3.3 Deep Learning

The Caffe framework [19] was chosen to implement the state-of-the-art GoogLeNet CNN architecture, which was designed to improve on the existing AlexNet model for ImageNet classification [8]. GoogLeNet contains 22 layers, considerably deeper than AlexNet and CaffeNet [20]. To investigate the best optimisation algorithm for skin patch classification, we tested a number of solvers: Stochastic Gradient Descent (SGD), Nesterov's Accelerated Gradient (NAG) and Adaptive Gradient (AdaGrad). SGD is one of the most commonly used approaches for large-scale machine learning tasks [21], while AdaGrad has shown strong experimental performance on real-world problems. Each solver was tested under the following settings: a default setting of 30 epochs with a 0.01 learning rate; a second setting in which the number of epochs was increased to 60 with the learning rate kept the same; and a final setting in which the learning rate was decreased to 0.001 with the number of epochs kept at 60 [22]. Since training converges by this point, there was no need to increase the number of epochs further.
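For reference, the sketch below generates a Caffe solver configuration for one of these settings (NAG, 60 epochs, learning rate 0.01). The batch size, momentum, weight decay, file paths and snapshot policy are not stated above and are assumptions made for illustration; Caffe counts iterations rather than epochs, so max_iter is derived from the assumed batch size.

```python
import math

# Assumed training setup (illustrative values, not reported in the paper).
TRAIN_PATCHES = 228   # training patches, as reported in Sect. 4
BATCH_SIZE = 32       # assumed batch size
EPOCHS = 60
ITERS_PER_EPOCH = math.ceil(TRAIN_PATCHES / BATCH_SIZE)

solver_txt = f"""
net: "googlenet_skin_train_val.prototxt"   # assumed network definition file
type: "Nesterov"                           # alternatives: "SGD", "AdaGrad"
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0002
lr_policy: "fixed"
max_iter: {EPOCHS * ITERS_PER_EPOCH}       # Caffe uses iterations, not epochs
display: 20
snapshot: {EPOCHS * ITERS_PER_EPOCH}
snapshot_prefix: "snapshots/googlenet_skin"
solver_mode: GPU
"""

with open("solver.prototxt", "w") as f:
    f.write(solver_txt.strip())
```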

4 Results and Discussion

In this section, we present the results of the classification experiments on our face dataset of 164 images. The high-resolution images were manually split into the three pre-defined classes of skin patches at 100 \(\times\) 100 resolution. For this 3-class classification, we divided the 325 skin patches into 70% for training and 30% for testing, drawing images equally from the three categories, which gave 228 patches for the training set and 97 for the test set. We adopted 10-fold cross validation to create 10 test cases. For evaluation, we chose a number of popular performance measures: Sensitivity, False Negative Rate (FNR), F-Measure, Recall, Precision, Matthews Correlation Coefficient (MCC) and Accuracy. We investigated both traditional machine learning and deep learning techniques to carry out the classification experiments.
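The measures listed above can be computed from the true and predicted patch labels roughly as follows; macro-averaging across the three classes is an assumption, since the paper does not state how per-class scores were aggregated.

```python
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)


def evaluate(y_true, y_pred):
    """Compute the measures reported in Tables 1 and 2 from true and
    predicted patch labels. Macro-averaging is assumed for illustration."""
    sensitivity = recall_score(y_true, y_pred, average="macro")  # = Recall
    return {
        "Sensitivity/Recall": sensitivity,
        "FNR": 1.0 - sensitivity,            # false negative rate
        "Precision": precision_score(y_true, y_pred, average="macro"),
        "F-Measure": f1_score(y_true, y_pred, average="macro"),
        "MCC": matthews_corrcoef(y_true, y_pred),
        "Accuracy": accuracy_score(y_true, y_pred),
    }
```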

Table 1. SVM results.
Table 2. GoogLeNet results.

Tables 1 and 2 show the classification results achieved with traditional machine learning and deep learning techniques respectively. With proper parameter tuning, the deep learning techniques clearly outperformed the traditional machine learning ones on the dataset used. Although deep learning techniques usually require a large dataset to train classification models, this limited dataset still achieved an accuracy of 85%, with Sensitivity, Recall, Precision and MCC of 0.854, 0.856, 0.856 and 0.779 respectively. This is promising, since the traditional machine learning techniques achieved a best accuracy of approximately 74%. NAG is a first-order method whose convergence rate, under certain conditions, is superior to that of standard gradient descent [21]. It evaluates the gradient at a look-ahead position determined by the accumulated momentum, so each update anticipates where the parameters are heading: a large look-ahead gradient produces a larger step for the current iteration, while a small one slows the update down. In this experiment, this solver achieved the highest accuracy with 60 epochs and the default learning rate.
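To make the look-ahead behaviour concrete, the following is a minimal NumPy sketch of a single Nesterov update; the learning rate and momentum values are illustrative, not those used in the experiments.

```python
import numpy as np


def nag_step(w, v, grad_fn, lr=0.01, mu=0.9):
    """One Nesterov accelerated gradient step.

    Unlike plain SGD with momentum, the gradient is evaluated at the
    look-ahead point w + mu * v, so the update anticipates where the
    current velocity is taking the parameters."""
    lookahead_grad = grad_fn(w + mu * v)
    v = mu * v - lr * lookahead_grad   # update velocity with look-ahead gradient
    return w + v, v                    # new parameters and velocity
```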

5 Conclusion

We presented a dataset that is suitable for facial skin analysis. Our experiments showed the potential of CNNs for classifying skin attributes: thus far, GoogLeNet with the NAG solver outperforms the other optimisers used in the experiments. Although the data were collected in a controlled environment with high-resolution images, the patch data are limited to three categories; therefore, an expansion of the dataset is needed.

To improve classification accuracy for non-clinical skin images, future research involves conducting experiments to understand human perception in classifying skin types and collecting more data. We are also interested in comparing the performance of experts and non-experts [23], in this case, dermatologists versus non-dermatologists.