Weighted Fuzzy KNN Optimized by Simulated Annealing for Classification of Large Data: A New Approach to Skin Detection

Aggarwal, Swati; Bhandari, Lehar; Kapoor, Karan; Kaur, Jaswin

doi:10.1007/978-981-10-8527-7_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 799)

Included in the following conference series:

International Conference on Recent Developments in Science, Engineering and Technology


Abstract

Machine learning is used in almost every technical and financial domain, for tasks ranging from predicting new outcomes to classifying data into multiple sets. This work builds on previously developed binary classifiers to create a classifier for skin detection that separates the input data into two sets: skin segments and non-skin segments. Skin detection means identifying the pixels or regions of an image or video that are skin-colored. The input to the classifier has three attributes, the values of the red, green and blue channels, whose combination gives the observed color of the object. The classifier assigns each input to one of the two classes on the basis of these attributes. More generally, the classifier can be extended to any binary-class data.


1 Introduction

Classification, a core problem in machine learning, aims to assign a test sample to one of several possible classes with the aid of training samples whose class membership is known beforehand. Based on these training samples and on the similarity between the attributes of the test and training samples, we can predict the class to which the test sample belongs. Classification is a form of pattern recognition and can be applied to skin detection [1].

Given a static image or a video, skin detection [2] focuses on locating the pixels that are skin-colored. This goes a long way toward detecting humans or human body parts in those images. Before an image or video is fed to a skin detector, it may have to be converted to a different color space for better results [3]. The need for skin detection arises from the fact that skin color is a distinctive cue for identifying humans. Skin detection is used in a wide range of fields, such as biometrics in security systems, medicine (for example, LASIK eye surgery), blocking adult images, and face detection on social networking sites.

This work aims to classify new data into skin and non-skin segments using a Weighted Fuzzy KNN classifier [4] optimized by Simulated Annealing [5]. The data set used is the Skin Segmentation Data Set from the UCI repository [6]. It has three attributes, the Blue, Green and Red values of a segment, which may or may not correspond to skin color. The three input attributes are initially given random weights, which are then optimized over repeated iterations of Simulated Annealing to achieve the best accuracy on the training data. The final weights are then applied to the input attributes of the test data, and the Fuzzy KNN classifier is run one last time on the test data. The predicted outputs are compared with the actual outputs of the test data to compute the accuracy. In addition, images can be uploaded through a file picker window so that the skin regions in them can be detected.

2 A Comparative Evaluation of Previous Models

One of the popular datasets for skin detection is the Compaq dataset used by Jones and Rehg [14] in 1999. It contains 13640 web images, of which 4675 contain skin pixels and the remaining 8965 do not. Table 1 summarizes the results reported on this dataset, giving the true positive and false positive rates of models proposed by various researchers. Although these models use different learning methods, their true positive and false positive rates give a fair and sufficient idea of their relative performance.

The Bayesian network (BN) method [15] gives the best results in Table 1, with a high TPR and a low FPR. Parameter-based methods such as the SOM-based detector [12] do not perform as well as the other models.

The Bayes SPM model uses Bayes' rule to compute the probability P(skin|c) that a pixel is skin given its color value c, and labels the pixel as skin-colored if this probability exceeds a threshold. The Single Gaussian model instead models the skin-color distribution with a single (multivariate) Gaussian probability density function. Both models achieve high true positive rates, 93.4% and 90% respectively, but the Single Gaussian model has a much higher false positive rate (33.3%) than the Bayes SPM (19.8%). The Elliptical Boundary Model, on the other hand, performs well on the Compaq database (TPR 90%, FPR 20.9%) and is a good substitute for the Gaussian model, since neither its training nor its computation is time-consuming. Overall, the Bayes SPM and the Maximum Entropy Model outperform the other techniques, as shown by their low FPRs for a given detection rate (19.8% and 8%, respectively).
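To make the Bayes SPM decision rule concrete, here is a minimal NumPy sketch, assuming colors are quantized into a 32 × 32 × 32 RGB histogram and a threshold of 0.5; the function names, bin count and threshold are illustrative choices, not values taken from the cited works. With empirical class priors, Bayes' rule P(skin|c) = P(c|skin)P(skin) / P(c) reduces to the fraction of training pixels in color bin c that are labeled skin.

```python
import numpy as np

# Hypothetical helper names; bins=32 and threshold=0.5 are illustrative defaults.
def build_bayes_spm(pixels, labels, bins=32):
    """Histogram-based skin probability map.

    pixels: (n, 3) uint8 array of colour values; labels: (n,) array, 1 = skin, 0 = non-skin.
    Returns a (bins, bins, bins) array holding the empirical P(skin | c) for each colour bin.
    """
    labels = np.asarray(labels)
    q = (pixels.astype(np.int64) * bins) // 256                  # quantise 0..255 to 0..bins-1
    idx = np.ravel_multi_index(tuple(q.T), (bins, bins, bins))   # one flat bin index per pixel
    skin = np.bincount(idx[labels == 1], minlength=bins ** 3)    # skin pixels per bin
    total = np.bincount(idx, minlength=bins ** 3)                # all pixels per bin
    # With empirical priors, Bayes' rule reduces to skin_count / total_count;
    # bins that never occur in training are left at probability 0.
    prob = np.divide(skin, total, out=np.zeros(bins ** 3), where=total > 0)
    return prob.reshape(bins, bins, bins)

def classify_bayes_spm(prob, pixels, threshold=0.5):
    """Label a pixel as skin (1) when P(skin | c) exceeds the threshold, else 0."""
    bins = prob.shape[0]
    q = (pixels.astype(np.int64) * bins) // 256
    return (prob[q[:, 0], q[:, 1], q[:, 2]] > threshold).astype(int)
```

Colors never seen during training fall into empty bins and are treated as non-skin in this sketch; that is a design choice of the example rather than part of the original model description.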

3 Proposed Framework

The proposed model differs from previous work in that it assigns separate weights to the three input features and optimizes these weights with Simulated Annealing. To the best of the authors' knowledge, Fuzzy KNN had not previously been combined with Simulated Annealing. Several classification models and algorithms were tried, including the Mamdani and Sugeno fuzzy models [16], neural networks [17] and the KNN algorithm. Of these, KNN gave the best accuracy. However, KNN has a drawback: it gives equal importance to all neighbors when classifying a test vector [18]. The Fuzzy KNN algorithm overcomes this limitation by giving more importance to closer neighbors than to farther ones, which leads to better classification.
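The difference between plain KNN and Fuzzy KNN can be sketched as follows. This is a minimal NumPy version of the inverse-distance membership rule of Keller et al. [4], assuming Euclidean distance, a fuzzifier m = 2 and k = 5 by default; these parameter values and the function name are illustrative assumptions, since the paper does not state them. Each neighbor's membership vector is weighted by 1 / ||x - x_j||^(2/(m-1)), so closer neighbors count for more than farther ones, and the predicted class is the one with the largest membership.

```python
import numpy as np

# Hypothetical function name and defaults (k=5, m=2); the paper does not report its settings.
def fuzzy_knn(train_X, train_U, test_X, k=5, m=2.0, eps=1e-12):
    """Fuzzy k-NN membership assignment in the style of Keller et al.

    train_X: (n, d) training features; train_U: (n, c) class memberships
    (e.g. [1, 0] for skin, [0, 1] for non-skin); test_X: (t, d) query points.
    Returns a (t, c) array of fuzzy memberships, one row per query.
    """
    # Pairwise squared Euclidean distances, shape (t, n).
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]                # indices of the k nearest neighbours
    nn_d2 = np.take_along_axis(d2, nn, axis=1)        # their squared distances, shape (t, k)
    # 1 / ||x - x_j||^(2/(m-1)) equals 1 / (d^2)^(1/(m-1)); eps avoids division by zero.
    w = 1.0 / (nn_d2 + eps) ** (1.0 / (m - 1.0))
    # Distance-weighted average of the neighbours' membership vectors.
    return (w[:, :, None] * train_U[nn]).sum(axis=1) / w.sum(axis=1, keepdims=True)
```

The brute-force distance matrix keeps the sketch short; for a training set with hundreds of thousands of rows, the distances would need to be computed in chunks or with a spatial index such as a k-d tree.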

Weights were then assigned to the three input features, since the features may not contribute equally to classifying a test sample. These weights had to be optimized to achieve the best possible classification accuracy, precision and recall. Simulated Annealing was chosen as the optimization algorithm because it accepts worse moves with a certain probability, which helps it escape local optima instead of getting stuck in them [19] and leads to good results.
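A minimal sketch of how Simulated Annealing could search for the three feature weights is shown below. The objective is passed in as a function (in the proposed model it would be the Fuzzy KNN accuracy on the validation split); the initial temperature, geometric cooling factor, step size and the clipping of weights to non-negative values are all hypothetical settings, and only the default of 1400 iterations follows the run length reported in the Results section. The key line is the Metropolis acceptance test, which occasionally accepts a worse candidate.

```python
import math
import random

# Hypothetical optimiser: t0, cooling and step are illustrative, not the paper's settings.
def anneal_weights(objective, n_weights=3, t0=1.0, cooling=0.995,
                   n_iter=1400, step=0.1, seed=0):
    """Maximise objective(weights) with Simulated Annealing; returns (best_weights, best_score)."""
    rng = random.Random(seed)
    current = [rng.random() for _ in range(n_weights)]     # random initial weights in [0, 1)
    current_score = objective(current)
    best, best_score = list(current), current_score
    t = t0
    for _ in range(n_iter):
        # Propose a neighbour by perturbing every weight, clipped to stay non-negative.
        candidate = [max(0.0, w + rng.uniform(-step, step)) for w in current]
        score = objective(candidate)
        delta = score - current_score                       # > 0 means the candidate is better
        # Metropolis rule: always accept improvements; accept a worse move with
        # probability exp(delta / t), which shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = list(candidate), score
        t *= cooling                                        # geometric cooling schedule
    return best, best_score
```

For example, maximizing the toy objective `lambda w: -sum((a - b) ** 2 for a, b in zip(w, [0.5, 0.2, 0.8]))` should drive the returned weights toward [0.5, 0.2, 0.8]. Early on, while the temperature is high, almost every move is accepted; as it cools, the search becomes nearly greedy.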

3.1 Preprocessing

The preprocessing step consists of randomly shuffling the data set, since the original data set lists all skin instances first, followed by all non-skin instances. No instances had missing values.

3.2 Dividing the Data Set into Training and Test Data

The original data set contains 245057 instances, of which 50859 are skin samples and the remaining 194198 are non-skin samples. After random shuffling, 75% of the instances were used as training data and the remaining 25% as test data. The training data therefore consists of 183792 instances and the test data of 61265 instances.
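The shuffle and the 75/25 split can be sketched with NumPy as follows. Only the labels are simulated here (the pixel values are omitted), the seed is arbitrary, and the variable names are illustrative; the point is that shuffling first is what puts both classes into the training part, and that truncating 0.75 × 245057 = 183792.75 gives the reported sizes.

```python
import numpy as np

N_SKIN, N_NON_SKIN = 50859, 194198      # class counts of the UCI Skin Segmentation data set

# Labels in the original file order: all skin rows first, then all non-skin rows.
labels = np.concatenate([np.ones(N_SKIN, dtype=int), np.zeros(N_NON_SKIN, dtype=int)])

rng = np.random.default_rng(42)           # arbitrary seed, for reproducibility only
perm = rng.permutation(labels.size)       # random shuffle of the row indices
n_train = int(0.75 * labels.size)         # 75% for training, truncated to an integer
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Without the shuffle, the first 75% would hold every skin row plus part of the
# non-skin rows, and the test part would contain no skin rows at all.
print(labels.size, train_idx.size, test_idx.size)   # 245057 183792 61265
```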

3.3 Proposed Model

The proposed model is illustrated in Fig. 1.

In the proposed model, Fuzzy KNN is used to predict the output class (Skin/Non-Skin) of each pixel. The model uses a 1 × 3 feature vector whose three features are the values of the R, G and B color channels of the pixel. If the model predicts a skin pixel, it outputs 1; otherwise it outputs 0. Since the input features may not be equally important in deciding the output class, weighted features were used. The appropriate weights were not known in advance, so all features were initially multiplied by random weights, giving a weight vector that is also of size 1 × 3. The training data was further divided in a 1:1 ratio into two sets, training 1 and training 2. Only training 1 was assigned training memberships, based on the class of each instance: instances of the Skin class were given the membership [1 0], and instances of the Non-Skin class were given [0 1]. These memberships were passed to the Fuzzy KNN module. The Fuzzy KNN algorithm then used training 1, its memberships and the current weights to classify training 2, and the resulting accuracy on training 2 was measured. Simulated Annealing optimized the weights to achieve the best possible classification accuracy on training 2.
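The bookkeeping described above (crisp memberships, the 1:1 split into training 1 and training 2, and the feature weights) might look like this in NumPy. The helper names are hypothetical, and the sketch stops where the Fuzzy KNN classifier would take over. Multiplying the channels by weights w is equivalent to classifying with a weighted Euclidean distance: the distance between weighted pixels x and z is sqrt(sum_j w_j^2 (x_j - z_j)^2), so a larger weight makes differences in that channel matter more.

```python
import numpy as np

# Hypothetical helpers illustrating Sect. 3.3; names are not from the paper.
def crisp_memberships(labels):
    """Skin (1) -> [1, 0], non-skin (0) -> [0, 1]."""
    labels = np.asarray(labels)
    return np.stack([labels == 1, labels == 0], axis=1).astype(float)

def split_half(X, y, rng):
    """Randomly split training data 1:1 into training 1 (with memberships) and training 2."""
    perm = rng.permutation(len(y))
    half = len(y) // 2
    a, b = perm[:half], perm[half:]
    # Training 1 gets memberships; training 2 keeps plain labels for measuring accuracy.
    return X[a], crisp_memberships(y[a]), X[b], y[b]

def apply_weights(X, weights):
    """Scale each colour channel by its weight.

    For weighted pixels, ||w*x - w*z|| = sqrt(sum_j w_j**2 * (x_j - z_j)**2).
    """
    return X * np.asarray(weights, dtype=float)
```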

The training set is large, and running the model on the entire set in every iteration would be very time-consuming. Therefore, in every iteration of Simulated Annealing, the training data was randomly shuffled and only the first 12000 instances of the shuffled data were used for training. Because a different random subset is drawn in each iteration, it is very likely that nearly the entire training set is used over the course of many iterations.
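How much of the training set the per-iteration subsampling reaches can be estimated with a short calculation. Assuming each iteration reshuffles the full 183792-row training set and keeps its first 12000 rows (the text does not say whether the pool is the whole training set or only training 1), a given row is drawn with probability p = 12000/183792 ≈ 0.065 in each iteration, independently across iterations, so the expected fraction of rows never drawn after n iterations is (1 - p)^n.

```python
def expected_uncovered_fraction(n_train, subset_size, n_iter):
    """Expected fraction of training rows never drawn when, in each of n_iter
    iterations, the data are reshuffled and the first subset_size rows are used.

    In one iteration a given row lands in the subset with probability
    subset_size / n_train, independently of the other iterations.
    """
    p = subset_size / n_train
    return (1.0 - p) ** n_iter

# Pool size 183792 is an assumption (the whole training set).
for n_iter in (10, 100, 1400):
    print(n_iter, expected_uncovered_fraction(183792, 12000, n_iter))
```

After 10 iterations roughly half of the rows are still unseen (about 0.51), while after the roughly 1400 iterations reported in the Results section the expected unseen fraction is below 10^-30.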

Finally, the model with the optimized set of weights was evaluated on the test data to measure how accurately it predicts the output class.

4 Results

4.1 Accuracy

Accuracy is defined as:

$$ \text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{1} $$

where tp, tn, fp and fn denote the numbers of true positives, true negatives, false positives and false negatives, respectively.

True positives are positive samples correctly classified as positive, and true negatives are negative samples correctly classified as negative. False positives are negative samples misclassified as positive, and false negatives are positive samples misclassified as negative.

4.2 Precision

Precision for the positive class is the ratio of the number of samples correctly classified as positive (true positives) to the total number of samples classified as positive (the sum of true positives and false positives).

$$ \text{Precision} = \frac{tp}{tp + fp} \tag{2} $$

4.3 Recall or True Positive Rate (TPR)

Recall is the ratio of the number of true positives to the total number of samples that actually belong to the positive class (the sum of true positives and false negatives). In classification tasks where the data is imbalanced, precision and recall are good measures for judging the system.

$$ \text{Recall (True Positive Rate)} = \frac{tp}{tp + fn} \tag{3} $$

4.4 False Positive Rate (FPR)

The FPR, also known as the false alarm rate, is the ratio of the number of negative samples incorrectly classified as positive (false positives) to the total number of actual negative samples, regardless of how the classifier labels them.

$$ \text{False Positive Rate} = \frac{fp}{fp + tn} \tag{4} $$
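Equations (1) to (4) translate directly into code. The counts in the usage note below are hypothetical and chosen only to show how the metrics diverge on imbalanced data; they are not the paper's results, which are reported in Tables 2 and 3.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall (TPR) and FPR from confusion-matrix counts, per Eqs. (1)-(4)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (1)
        "precision": tp / (tp + fp),                   # Eq. (2)
        "recall": tp / (tp + fn),                      # Eq. (3), also the TPR
        "fpr": fp / (fp + tn),                         # Eq. (4)
    }
```

With the hypothetical counts tp = 90, tn = 880, fp = 20, fn = 10, `classification_metrics(90, 880, 20, 10)` gives an accuracy of 0.97 and a recall of 0.9 but a precision of only about 0.82, because most of the samples are negative.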

Results. The Simulated Annealing algorithm was run for about 1400 iterations, which took about 11 h; the results are given in Tables 2 and 3.

5 A Comparative Evaluation of the Proposed Model with the Previous Best Model

Table 4 compares the proposed model with the best method in Table 1, the BN method of Sebe et al. [15], in terms of True Positive Rate and False Positive Rate. Note, however, that the BN method was evaluated on the Compaq dataset, whereas the proposed model was evaluated on the UCI Skin Segmentation dataset.

Table 1. Performance of different models for skin detection as reported on the Compaq dataset [7,8,9]

Table 2. Results for simulated annealing (accuracy, precision and recall)

Table 3. Results for Simulated Annealing (TP, TN, FP, FN)

Table 4. Comparison of the proposed model with the previous best model

A good classifier is characterized by a high TPR and a low FPR. The proposed model has a higher TPR than the previous best model (99.99% vs. 99.4%) and a lower FPR (0.13% vs. 8%), a substantial improvement over the models reported so far; because the two models were evaluated on different datasets, however, the comparison is indicative rather than direct. Assigning weights to the input attributes, so that each attribute contributes to the classification according to its importance, and optimizing these weights with Simulated Annealing reduced the classification error on the data, resulting in the promising skin classifier presented in this paper.

6 Limitations

The model also detects undesired skin-colored areas that may not actually be skin. In addition, since the dataset is large, training and testing take considerable time.

7 Real Life Applications of Skin Detection

Skin detection is valuable in alarm systems: detecting skin in an area that humans are not permitted to enter can trigger an alarm. Face recognition on social media is another important application, and gaming consoles such as the Xbox also make use of skin detection. Skin detection is also a preliminary step in gesture analysis, tracking and content-based image retrieval systems [20]. In addition, it plays an important part in enhancing photographs.

8 Conclusion

The aim of this research was to build a system that detects skin pixels and to measure its accuracy in doing so. The proposed approach achieved an accuracy of 99.8955%, a precision of 0.9951 and a recall of 0.9999. The classifier can therefore be used to detect skin pixels and as a preliminary step in other applications that rely on skin detection.

References

[1] Alpaydin, E.: Introduction to Machine Learning (2010)
[2] Elgammal, A., Muang, C., Hu, D.: Skin detection - a short tutorial. Encycl. Biom., 1–10 (2009)
[3] Alala, B., Mwangi, W., Okeyo, G.: Image representation using RGB color space. Int. J. Innov. Res. Dev. 3(8) (2014)
[4] Keller, J.M., Gray, M.R., Givens, J.A.: A fuzzy k-nearest neighbor algorithm. IEEE Trans. Syst. Man Cybern. (4), 580–585 (1985)
[5] Van Laarhoven, P.J., Aarts, E.H.: Simulated annealing. In: Simulated Annealing: Theory and Applications, pp. 7–15. Springer, Netherlands (1987)
[6] Bhatt, R., Dhall, A.: Skin segmentation dataset. UCI Machine Learning Repository (2010)
[7] Vezhnevets, V., Sazonov, V., Andreeva, A.: A survey on pixel-based skin color detection techniques. In: Proceedings of the Graphicon, vol. 3, pp. 85–92, September 2003
[8] Jones, M.J., Rehg, J.M.: Statistical color models with application to skin detection. Int. J. Comput. Vis. 46(1), 81–96 (2002)
[9] Kakumanu, P., Makrogiannis, S., Bourbakis, N.: A survey of skin-color modeling and detection methods. Pattern Recognit. 40(3), 1106–1122 (2007)
[10] Brand, J., Mason, J.S.: A comparative assessment of three approaches to pixel-level human skin-detection. In: Proceedings of the 15th International Conference on Pattern Recognition, vol. 1, pp. 1056–1059. IEEE (2000)
[11] Jedynak, B., Zheng, H., Daoudi, M.: Maximum entropy models for skin detection. In: International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 180–193. Springer, Heidelberg, July 2003
[12] Brown, D.A., Craw, I., Lewthwaite, J.: A SOM based approach to skin detection with application in real time systems. In: BMVC, vol. 1, pp. 491–500, July 2001
[13] Lee, J.Y., Yoo, S.I.: An elliptical boundary model for skin color detection. In: Proceedings of the 2002 International Conference on Imaging Science, Systems, and Technology, June 2002
[14] Jones, M., Rehg, J.: Statistical color models with application to skin detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 99, pp. 274–280 (1999)
[15] Sebe, N., Cohen, I., Huang, T.S., Gevers, T.: Skin detection: a Bayesian network approach. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, vol. 2, pp. 903–906. IEEE, August 2004
[16] Setnes, M., Roubos, H.: GA-fuzzy modeling and classification: complexity and performance. IEEE Trans. Fuzzy Syst. 8(5), 509–522 (2000)
[17] Saha, P., Mandal, R.: Detection of Dengue Disease Using Artificial Neural Networks (2017)
[18] Imandoust, S.B., Bolandraftar, M.: Application of k-nearest neighbor (KNN) approach for predicting economic events: theoretical background. Int. J. Eng. Res. Appl. 3(5), 605–610 (2013)
[19] Goffe, W.L., Ferrier, G.D., Rogers, J.: Global optimization of statistical functions with simulated annealing. J. Econom. 60(1–2), 65–99 (1994)
[20] Liensberger, C., Stöttinger, J., Kampel, M.: Color-based skin detection and its application in video annotation


Author information

Authors and Affiliations

Netaji Subhas Institute of Technology, New Delhi, 110078, India
Swati Aggarwal, Lehar Bhandari, Karan Kapoor & Jaswin Kaur


Corresponding author

Correspondence to Karan Kapoor.

Editor information

Editors and Affiliations

University of Arkansas, Fayetteville, AR, USA
Brajendra Panda
GD Goenka University, Gurugram, Haryana, India
Sudeep Sharma
GD Goenka University, Gurugram, Haryana, India
Nihar Ranjan Roy


About this paper

Cite this paper

Aggarwal, S., Bhandari, L., Kapoor, K., Kaur, J. (2018). Weighted Fuzzy KNN Optimized by Simulated Annealing for Classification of Large Data: A New Approach to Skin Detection. In: Panda, B., Sharma, S., Roy, N. (eds) Data Science and Analytics. REDSET 2017. Communications in Computer and Information Science, vol 799. Springer, Singapore. https://doi.org/10.1007/978-981-10-8527-7_14


DOI: https://doi.org/10.1007/978-981-10-8527-7_14
Published: 08 March 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8526-0
Online ISBN: 978-981-10-8527-7
eBook Packages: Computer Science, Computer Science (R0)
