1 Introduction

Segmentation of color image regions that contain skin pixels is a very important and challenging task of modern image processing. The general objective of the described problem is to return an output image in which every pixel is classified as either representing skin or not [1, 2]. Such information can then be used in various computer vision applications [3, 4]. Important and interesting tasks where skin segmentation is required are automatic face [5,6,7] or gesture detection [8,9,10]. Additionally, many of the most effective filters of adult-only content are based on information from the segmented skin regions [11, 12]. Image coding using regions of interest, as presented in [13], is another very important example of the application of skin segmentation algorithms.

The most common approach to the skin detection problem is pixel-wise, color-based classification [4]. In such methods each pixel is classified independently of its neighbours, only on the basis of its color features [14, 15]. Such discrimination between skin and non-skin pixels can be performed using a skin color model represented either as a set of rules or thresholds, or derived from a machine learning algorithm [4]. However, it must be noted that relying solely on color information may not be sufficient for the task. It is a well-known fact that the most popular color spaces, such as RGB and HSV or the perceptually uniform CIELab, suffer from many shortcomings (for more details see [16]). In order to improve the quality of pixel-based skin detection many approaches have been proposed; among these, texture-based methods, adaptation techniques and spatial analysis are worth mentioning. A very detailed description of these techniques can be found in [4].

In this research a direct, pixel-based approach was chosen. The main purpose was to evaluate the performance, in this specific task, of two very popular machine learning classifiers: Regularized Logistic Regression and an Artificial Neural Network with Regularization trained with Backpropagation [17]. The research focused mostly on the models' error evaluation and parameter tuning. Developing new algorithms [18,19,20,21] and computer simulations [22] are classical means of advancing the technology.

2 Data Description

The data used in this research was the 'Skin Segmentation Dataset' provided by the UCI Machine Learning Repository [23]. The dataset consists of 50859 examples marked as skin samples and 194198 non-skin samples. The available features are the pixel's values in the B, G and R channels, each coded with 8 bits. The skin dataset was collected by random sampling of these values from images of various gender, age and race groups obtained from the Color FERET Image Database and the PAL Face Database from the Productive Aging Laboratory.

All 245057 available examples were randomly divided into three subsets. The training set was created by randomly selecting \(60\%\) of all skin samples and \(60\%\) of samples from the other class. In this way the original proportion between the two classes was preserved. Analogously, the cross-validation and test sets were separated from the remaining examples; each of them consisted of \(20\%\) of all skin and non-skin examples, sampled without repetition. A minimal sketch of such a stratified 60/20/20 split is given below.
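
The following sketch illustrates one way to perform such a class-preserving split (NumPy-based; the function name, array layout and random seed are illustrative assumptions, not part of the original study):

```python
import numpy as np

def stratified_split(X, y, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split (X, y) into train / cross-validation / test subsets while
    preserving the original skin vs. non-skin class proportions."""
    rng = np.random.default_rng(seed)
    splits = [[], [], []]
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)                          # sampling without repetition
        n_train = int(fractions[0] * len(idx))
        n_cv = int(fractions[1] * len(idx))
        splits[0].append(idx[:n_train])
        splits[1].append(idx[n_train:n_train + n_cv])
        splits[2].append(idx[n_train + n_cv:])
    train, cv, test = (np.concatenate(s) for s in splits)
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])
```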

3 Description of Algorithms

3.1 Logistic Regression with Regularization

The goal of the regression is to find a set of parameters \(\varTheta \in \mathbb {R}^{n+1}\) (where n denotes the number of features and the additional dimension stems from the bias term) that minimizes the cost function presented in Eq. 1 [24]:

$$\begin{aligned} J(\varTheta ) = \frac{1}{M} \sum \limits _{i=1}^{M} \big ( h_{\varTheta }(x^{(i)})-y^{(i)}\big )^2 \end{aligned}$$
(1)

where M is the total number of training examples, defined by the input variables (features) \(x \in \mathbb {R}^{M \times (n+1)}\) and the output variables (labels) \(y \in \mathbb {R}^{M}\). Because of the dimensionality of a single example, \(x^{(i)} \in \mathbb {R}^{n+1}\), the vectorized notation of the hypothesis \(h_{\varTheta }(x^{(i)})\) can be written as follows (we assume that \(x_0=1\) for each example):

$$\begin{aligned} h_{\varTheta }(x^{(i)}) = \varTheta _0x_0 + \varTheta _1x_1 + \cdots + \varTheta _nx_n = \varTheta ^Tx^{(i)} \end{aligned}$$
(2)

For the binary classification task the preferred output of the hypothesis function would be either 0 or 1. To enable that, a slight modification of the hypothesis function is required; for this purpose the sigmoid (logistic) function is used. The improved hypothesis is presented in Eq. 3:

$$\begin{aligned} h_{\varTheta }(x) = \frac{1}{1+e^{-\varTheta ^Tx}} \end{aligned}$$
(3)

In order to assign a bigger penalization to predictions that differ greatly from the required output, the hypothesis \(h_{\varTheta }(x)\) should additionally be passed through a logarithm. To avoid the algorithm overfitting the training data, regularization of the \(\varTheta \) parameters (apart from the bias-related \(\varTheta _0\)) [24] was introduced in the form of a \(\lambda \) multiplier. The final form of the minimized cost function of the Regularized Logistic Regression method used for binary classification is presented in Eq. 4:

$$\begin{aligned} J(\varTheta ) = - \Big [ \frac{1}{M}\sum \limits _{i=1}^{M}\Big ( y^{(i)}\log \big (h_{\varTheta }(x^{(i)})\big ) + (1-y^{(i)})\log \big (1- h_{\varTheta }(x^{(i)})\big ) \Big ) \Big ] + \frac{\lambda }{2M}\sum \limits _{j=1}^{n}\varTheta _j^2 \end{aligned}$$
(4)
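
A direct NumPy translation of this cost can serve as a reference; the sketch below assumes a design matrix with a leading column of ones and uses illustrative names (`logreg_cost`, `lam`) that do not come from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logreg_cost(theta, X, y, lam):
    """Regularized logistic regression cost (Eq. 4); X is M x (n+1)
    with a leading column of ones, theta has shape (n+1,)."""
    M = X.shape[0]
    h = sigmoid(X @ theta)
    cross_entropy = -(y * np.log(h) + (1 - y) * np.log(1 - h)).mean()
    reg = lam / (2 * M) * np.sum(theta[1:] ** 2)   # bias term theta_0 is not regularized
    return cross_entropy + reg
```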

Finding the optimal parameters \(\varTheta \) can be performed iteratively with the use of gradient-based numerical optimization techniques such as Gradient Descent or Conjugate Gradient. For such methods to work, the derivative of the cost function with respect to each parameter must be calculated and provided. However, it must be noted that the \(\varTheta _0\) parameter should not be regularized. Therefore, the rule for updating this parameter in iteration \(p+1\), presented in Eq. 5, does not take the regularization term into account.

$$\begin{aligned} \varTheta _0^{(p+1)} = \varTheta _0^{(p)} - \alpha \frac{1}{M}\sum \limits _{i=1}^{M}\Big ( h_{\varTheta }(x^{(i)})-y^{(i)} \Big )x_0^{(i)} \end{aligned}$$
(5)

For the other parameters \(\varTheta _j\), where \(j=\{1,2,\ldots ,n\}\), the formula for finding their improved values in the new iteration \(p+1\) using a Gradient Descent based method with step size \(\alpha \) is presented in Eq. 6:

$$\begin{aligned} \varTheta _j^{(p+1)} = \varTheta _j^{(p)} - \alpha \Big [\frac{1}{M}\sum \limits _{i=1}^{M}\Big ( h_{\varTheta }(x^{(i)})-y^{(i)} \Big )x_j^{(i)} + \frac{\lambda }{M}\varTheta _j^{(p)} \Big ] \end{aligned}$$
(6)
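
A corresponding gradient computation, following Eqs. 5 and 6, could look as follows (a sketch reusing the `sigmoid` helper from the previous listing; the same conventions as above are assumed):

```python
def logreg_gradient(theta, X, y, lam):
    """Gradient of the regularized cost; the bias parameter theta_0
    is excluded from the regularization term (Eqs. 5 and 6)."""
    M = X.shape[0]
    h = sigmoid(X @ theta)
    grad = (X.T @ (h - y)) / M       # unregularized part, used for theta_0
    grad[1:] += lam / M * theta[1:]  # regularization for theta_1 .. theta_n
    return grad

# one Gradient Descent update with step size alpha:
# theta = theta - alpha * logreg_gradient(theta, X, y, lam)
```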

3.2 Artificial Neural Network Model with Regularization

A typical Artificial Neural Network consists of structures known as layers [24]. Among them, the input layer and the output layer are distinguished; the remaining ones are referred to as hidden layers. Each layer is constructed of basic calculation units called neurons. If layer j has \(s_j\) neurons and layer \(j+1\) has \(s_{j+1}\) units, then the matrix of connection weights between these layers is \(\varTheta ^{(j)} \in \mathbb {R}^{s_{j+1} \times (s_j+1)}\). The process of calculating the Neural Network's output is known as Forward Propagation, where the vector of neuron activations in layer \(j+1\) is calculated from the bias-augmented activations of layer j, \(a^{(j)} \in \mathbb {R}^{s_j+1}\), as presented in Eq. 7. In the first, input layer the inputs are treated as activations, i.e. \(a^{(1)}=x^{(i)}\).

$$\begin{aligned} a^{(j+1)}=g(\varTheta ^{(j)}a^{(j)}) \end{aligned}$$
(7)

In the proposed model a sigmoid activation function \(g(\varTheta , a)\) was used for each neuron (Eq. 8).

$$\begin{aligned} g(\varTheta , a) = \frac{1}{1+e^{-\varTheta ^{(j)T} a^{(j)}}} \end{aligned}$$
(8)
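
For the single-hidden-layer network used in this research, the forward pass of Eqs. 7 and 8 can be sketched as below (the weight matrix names `Theta1` and `Theta2` are illustrative; biases are handled by prepending a constant 1 to each activation vector):

```python
import numpy as np

def forward_propagation(x, Theta1, Theta2):
    """Forward pass for one hidden layer (Eq. 7 with sigmoid activations).
    x: input features (n,), Theta1: (s2, n+1), Theta2: (K, s2+1)."""
    a1 = np.concatenate(([1.0], x))            # a^(1) = x with added bias unit
    a2 = 1.0 / (1.0 + np.exp(-(Theta1 @ a1)))  # hidden layer activations
    a2 = np.concatenate(([1.0], a2))           # add bias unit to hidden layer
    a3 = 1.0 / (1.0 + np.exp(-(Theta2 @ a2)))  # output layer, h_Theta(x)
    return a3
```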

The cost function minimized by the Artificial Neural Network with Regularization of weights is presented in Eq. 9, where K stands for the number of classes and L is the total number of layers in the network [24]:

$$\begin{aligned} J(\varTheta ) = - \Big [ \frac{1}{M}\sum \limits _{i=1}^{M} \sum \limits _{k=1}^{K} \Big ( y_k^{(i)}\log \big (h_{\varTheta }(x^{(i)})\big )_k + (1-y_k^{(i)})\log \Big (1- \big (h_{\varTheta }(x^{(i)})\big )_k\Big ) \Big ) \Big ] + J_{reg}(\varTheta ) \end{aligned}$$
(9)

where \(J_{reg}(\varTheta )\) is the regularization term (Eq. 10) [24].

$$\begin{aligned} J_{reg}(\varTheta ) = \frac{\lambda }{2M}\sum \limits _{l=1}^{L-1} \sum \limits _{i=1}^{s_l} \sum \limits _{j=1}^{s_{l+1}} \big ( \varTheta _{ji}^{(l)}\big )^2 \end{aligned}$$
(10)
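
The regularization term of Eq. 10 simply sums the squared non-bias weights over all layers. A minimal sketch, assuming NumPy is imported as np and that each weight matrix stores its bias weights in the first column:

```python
def nn_regularization(thetas, lam, M):
    """J_reg from Eq. 10: sum of squared weights over all layers,
    skipping the bias column of each Theta matrix."""
    return lam / (2 * M) * sum(np.sum(T[:, 1:] ** 2) for T in thetas)
```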

The most popular procedure for training Artificial Neural Networks is the Backpropagation algorithm. A detailed description of the algorithm can be found in [25].

4 Error Model Evaluation

Before selecting the best parameters for each method it is important to properly evaluate its error on the training and cross-validation datasets. Such an effort can help determine whether the model is capable of explaining the variance in the data properly without overfitting to the training examples. In this research, the base input of the classification algorithms consisted of three features: each pixel's (example's) value in each of the RGB color space channels. All features were scaled to the range [0, 1]. The reason for that operation is the use of a Nonlinear Conjugate Gradient based method for the optimization of the algorithms' cost functions; ensuring that the features are on a similar scale improves the convergence of this method.
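
One straightforward way to perform this scaling for the 8-bit channel values is shown below (a sketch; the array name is illustrative and dividing by 255 is just one possible choice consistent with the 8-bit coding):

```python
# 8-bit B, G, R values (0-255) scaled to the [0, 1] range
X_scaled = X.astype(float) / 255.0
```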

4.1 Learning Curves

In order to determine whether any of the proposed models struggles with a high bias or high variance problem, the adequate learning curves were calculated, expressing the dependence of the classification error on the number of training samples. In the case of high bias, the errors on the training and cross-validation sets converge as the number of samples provided for training increases; however, at some point they both settle at a relatively high value. Such behaviour indicates that the examined model does not explain the classes sufficiently. Models with high variance are characterized by a low error achieved during classification of the training set and a higher number of misclassified samples in the cross-validation set. The reason for that is the overfitting of the model to the training dataset. The formula used for error calculation is presented in Eq. 11.

$$\begin{aligned} J_{err}(\varTheta ) = \frac{1}{m}\sum \limits _{i=1}^{m}\big (h_{\varTheta }(x^{(i)})-y^{(i)}\big )^2 \end{aligned}$$
(11)

Learning curves for both models are presented in Figs. 1 and 2. They were calculated on the basis of the performance of classifiers trained on an increasing number of \(m \le M\) samples. Because the training error is measured only on those m examples, it tends to increase with m. The size of the cross-validation dataset remained unchanged throughout the whole procedure.
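
A sketch of how such curves can be computed is given below; the `train` and `error` helpers are hypothetical placeholders standing for fitting either model and for the error of Eq. 11:

```python
import numpy as np

def learning_curves(X_train, y_train, X_cv, y_cv, train, error, steps=20):
    """Train on increasing subsets of the training data and record the
    error (Eq. 11) on the m examples used and on the full CV set."""
    M = X_train.shape[0]
    sizes = np.linspace(1, M, steps).astype(int)
    curves = []
    for m in sizes:
        model = train(X_train[:m], y_train[:m])
        curves.append((m,
                       error(model, X_train[:m], y_train[:m]),  # training error on m examples
                       error(model, X_cv, y_cv)))               # CV error on the full set
    return curves
```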

The learning curves calculated for the Regularized Logistic Regression classifier are presented in Fig. 1. It can be noted that increasing the number of training samples does not reduce the classification error on the cross-validation data. Additionally, both error functions converge to a relatively high value. These characteristics are clear indicators that the model suffers from a high bias problem, which means that it is not capable of properly describing the classes present in the data.

Fig. 1. Learning curve of the Regularized Logistic Regression model (\(\lambda = 0\))

Fig. 2. Learning curve of the Artificial Neural Network model (\(\lambda = 0\))

Analysis of the curves presented in Fig. 2 shows that the Artificial Neural Network model does not suffer from high bias or variance. After a sufficient number of training samples is provided, the errors calculated on the training and cross-validation sets converge to a common value. This shows that the proposed model generalizes well on the distinct features of the data. Additionally, the low error values obtained for higher numbers of samples indicate that it is able to properly explain the differences between the classes.

4.2 Tuning of Regularized Logistic Regression Model

The error analysis for the Logistic Regression model performed in the previous subsection revealed that it suffers from high bias. The best solution to this problem is to add more features that help describe the data better. In this research the addition of higher order polynomial features (up to the third degree) was proposed. Assuming that the feature vector of the i-th example (with n features and a unitary bias \(x_0\)) is of the form \(x^{(i)} = [x_0^{(i)}\ x_1^{(i)} \ldots x_n^{(i)}]^T\), adding higher orders means providing all combinations of features (except for the bias) up to the required order, e.g. \(x_1x_2\), \(x_2^2\), \(x_1x_2x_3\), \(x_3^2x_1\), etc.; a sketch of such an expansion is given below. The learning curves calculated for the extended set of features are presented in Fig. 3. The achieved results confirm that extending the feature vector solves the high bias problem in this case. For the final tuning of the model, the best weight of the regularization term was selected.
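
One way to generate all such polynomial combinations up to the third degree is sketched here (itertools-based; the bias column is assumed to be added separately and the function name is illustrative):

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_features(X, degree=3):
    """Expand features with all monomials x_i * x_j * ... up to 'degree'
    (e.g. x1*x2, x2**2, x1*x2*x3); the bias term is not included here."""
    columns = []
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)
```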

Fig. 3. Learning curve of the Regularized Logistic Regression model with polynomial features of up to third order (\(\lambda = 0\))

Fig. 4. Validation curve of the Regularized Logistic Regression model presenting the influence of the regularization parameter \(\lambda \) on the classification error

Finally, additional tuning of the regularization weight was applied. Based on the results presented in Fig. 4, \(\lambda = 3\) was chosen as the best regularization parameter for the classification of the test data.
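
Such a validation curve can be obtained with a simple loop of the following kind (a sketch; the candidate \(\lambda \) values are illustrative and the `train`/`error` helpers are the same hypothetical placeholders as in the learning curve sketch):

```python
lambdas = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
errors = []
for lam in lambdas:
    model = train(X_train, y_train, lam)        # fit with the given regularization weight
    errors.append(error(model, X_cv, y_cv))     # error on the cross-validation set
best_lambda = lambdas[int(np.argmin(errors))]   # lambda with the lowest CV error
```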

4.3 Tuning of Artificial Neural Network Model

The Artificial Neural Network model trained with Backpropagation did not exhibit any signs of high bias or variance, despite being very basic with only one hidden layer. In order to find the best number of neurons in the hidden layer, the model's classification error was calculated for the training and cross-validation datasets with the regularization parameter \(\lambda =1\). Thanks to such a high regularization weight, the achieved results were oriented mostly towards validating bias rather than variance. The resulting validation curve is presented in Fig. 5.

Fig. 5. Validation curve of the ANN presenting the influence of the number of hidden layer neurons on the classification error (\(\lambda = 1\))

According to the achieved results, the best number of hidden neurons is 15. For that value additional tuning was performed in order to select an appropriate regularization parameter \(\lambda \). Based on the results presented in Fig. 6, \(\lambda = 0.001\) was chosen as the best regularization parameter for the classification of the test data.

Fig. 6. Validation curve of the ANN (with 15 neurons in the hidden layer) presenting the influence of the regularization parameter \(\lambda \) on the classification error

5 Results

Both models, with the best settings selected in Sect. 4, were used for the classification of the test data. However, it must be noted that after a careful evaluation of both models' errors it was determined that the Artificial Neural Network classifier performs significantly better. For each model a confusion matrix was created on both the cross-validation and test datasets. Because in this case the proportion between classes is skewed, evaluating the algorithms only on the basis of their accuracy would not be meaningful. With that in mind, each classifier's precision, recall and specificity were also calculated. Precision refers to the number of True Positive cases over all cases classified as Positive (here, skin samples). To describe the ratio between True Positives and actual Positive examples, the Recall can be used. Specificity is the equivalent of the Recall for the Negative class. Table 1 presents the results achieved on the cross-validation and test sets for both proposed models.
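
These measures follow directly from the entries of the confusion matrix; a minimal sketch (the function name is illustrative, with skin encoded as 1 and treated as the Positive class):

```python
import numpy as np

def classification_measures(y_true, y_pred):
    """Accuracy, precision, recall and specificity with skin (1) as Positive."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {"accuracy": (tp + tn) / (tp + tn + fp + fn),
            "precision": tp / (tp + fp),
            "recall": tp / (tp + fn),
            "specificity": tn / (tn + fp)}
```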

Table 1 Performance of both proposed classifiers evaluated on cross-validation and test data

Examination of the results achieved by the Artificial Neural Network model shows that its performance on the test and cross-validation datasets is nearly perfect. Very high values of all used measures indicate that not only the accuracy of the model is high, but also the general quality of classification. On the other hand, the performance of the Logistic Regression is less stable. Although it achieved comparable accuracy and slightly higher recall, it is significantly less precise. Insufficient precision is often listed as one of the most important issues of pixel-based skin detectors.

To additionally test the performance of the Artificial Neural Network model, it was visually examined on real images that do not belong to the 'Skin Segmentation Dataset' from the UCI Machine Learning Repository. Figs. 7 and 8 present the results of segmentation of images from the PUT database. The PUT face database [26] contains high resolution color face images of 100 people acquired on a uniform background under controlled lighting conditions.

Fig. 7. Results of skin segmentation performed on a real image: a original image, b manually segmented ground truth image, c result of segmentation with Regularized Logistic Regression, d result of segmentation with Artificial Neural Network

Fig. 8. Results of skin segmentation performed on a real image: a original image, b manually segmented ground truth image, c result of segmentation with Regularized Logistic Regression, d result of segmentation with Artificial Neural Network

Analysis of the segmentation results confirms the high quality of the developed classifier. Visual examination of the output image reveals that all the most important information, as well as some fine details of the original image, were captured, thus providing a very satisfactory, precise and accurate skin segmentation result. It must be noted that the algorithm produced some False Positive classifications on the right side of the image. The most problematic regions of the image are the areas where hair connects with skin. Because the color features of these regions fall within the developed skin model, it is impossible to avoid classifying them as Positive cases. However, such inaccuracies can be easily removed with the use of the image postprocessing techniques mentioned in Sect. 1 of this paper. To further evaluate the quality of the segmentation, the Jaccard similarity coefficient [27] was calculated for both images. The obtained values were 0.897 for Regularized Logistic Regression and 0.920 for the Artificial Neural Network model for the first image, and respectively 0.943 and 0.948 for the second image. The Jaccard index is a statistic commonly used for evaluating the similarity and diversity of two sets; for full similarity it takes the value of 1. The high values achieved for the Artificial Neural Network classifier correspond with the visual evaluation of the results.
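
For binary skin masks the Jaccard index compares the set of pixels labelled as skin in the ground truth with the set produced by the classifier; a minimal sketch (the function name and mask representation are illustrative):

```python
import numpy as np

def jaccard_index(mask_a, mask_b):
    """|A intersect B| / |A union B| for two binary skin masks; 1 means identical masks."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return intersection / union
```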

6 Conclusions

In this paper the performance of two prominent classification algorithms was evaluated and compared in the task of skin pixel detection. It was shown that an exploratory data analysis approach can produce highly satisfying results without the need to refer to image-specific methods, especially during low-level processing.

The achieved results indicate that the Artificial Neural Network is capable of extracting hidden features and structures in the data. This is best observed when the used classifiers are compared. The Regularized Logistic Regression's performance was poor until additional features were created and provided as extra inputs to the classifier. At the same time the Artificial Neural Network (with only one hidden layer) performed significantly better using only the basic features. This is an important advantage over approaches where features must be designed by the data scientist during model creation, because such automatic feature extraction is not constrained by the inventiveness of the designer.

Further improvements can still be applied to the method in order to achieve even better results. The addition of features derived from other known color spaces could prove to be a valuable extension of the method. Additionally, advanced postprocessing such as texture-based models, adaptation techniques and spatial analysis could be introduced into the proposed algorithm.