Abstract
NAO humanoid robots are being used in many human-robot interaction applications. One important open challenge is developing an accurate real-time face recognition system that does not incur a high computational cost. In this research work, a real-time face recognition system based on block processing of the local binary patterns of face images captured by the NAO humanoid is proposed. Majority voting and best-score ensemble approaches are used in order to boost the recognition results obtained in the different colour channels of the YUV colour space, which is the default colour space provided by the camera of the NAO humanoid. The proposed method has been deployed on the NAO humanoid and tested under real-world conditions. The recognition results were further boosted in the real-time scenario by employing majority voting on the intra-sequence decisions with a window size of 5. The experimental results show that the proposed face recognition algorithm outperforms conventional and state-of-the-art techniques.
1 Introduction
Automatic face analysis plays an important role in human-robot interaction [1,2,3,4,5]. It includes face detection and localisation, face recognition, and facial expression and gender recognition. The main problem to be solved in face recognition is finding competent descriptors for the appearance of the face. Holistic methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) are widely known and studied but have their limitations in accuracy and computational complexity [6,7,8,9,10,11,12].
Robustness to illumination variation is among the important properties that a face recognition system should have [13, 14]. Local descriptors have earned the attention of researchers due to their robustness to challenges such as illumination and pose changes. The local binary pattern (LBP) is such a descriptor and is one of the best performing texture descriptors [15]. Its key advantages over other descriptors are its computational efficiency and its invariance to monotonic grey-level changes, which make it well suited to demanding image analysis tasks. When using texture methods it is not rational to attempt a holistic description of a face, because large-scale relations do not contain useful information. Moreover, experimental results have shown that facial images can effectively be represented as a configuration of micro-patterns. This is why the local feature-based perspective is chosen [16,17,18].
Compared to other local descriptors, LBP performed equal to or slightly better than the Texton histogram, and was clearly superior to the Difference histogram as well as the Homogeneous texture descriptor in recognition rates [19,20,21].
LBP methods have also been studied for facial expression recognition and were found to compete well with other state-of-the-art techniques [16]. For instance, LBP was found to be faster and less memory intensive than the Gabor filter while achieving recognition performance in the same range [22, 23].
As in other face analysis problems, acquiring a competent representation of the original face images is critical for adequate gender classification. If poor features are selected, even the finest classifier can fail to obtain sufficiently accurate recognition rates. Being an efficient method for summarising the local pattern of an image, LBP has been utilised for determining the gender of a person based on face analysis [24,25,26,27]. As not all regions of an image contain discriminative information for recognising faces, various methods such as AdaBoost, the support vector machine (SVM), and the nonlinear SVM are used in order to extract the important features [26, 28, 29].
Real-time face recognition has been implemented in many ways and has been adopted on various stationary and mobile devices [30,31,32,33]. Viola and Jones suggested a system that reduces the computation time for face detection while maintaining high accuracy [32]. However, while efficient in controlled conditions, it becomes inaccurate beyond \({\pm }15^{\circ }\) of in-plane rotation and \({\pm }45^{\circ }\) out of plane. Also, when lighting conditions are unfavourable, either the detection rate drops or the computational cost increases. In addition, a system based on the Compute Unified Device Architecture has been proposed that accelerates the recognition process through parallel computing, yet it still requires a considerable amount of processing power [34].
The main focus of this paper is implementing a real-time face recognition system on NAO humanoids [35], which have limited computational capacity; reducing computational complexity therefore plays an essential role. A way to compensate for the limited processing ability of individual devices is to use cloud computing servers to do the costly data processing and send the results back to the device. For this, the MOCHA mobile-cloudlet-cloud architecture has been used, demonstrating that the technique is feasible but not yet sufficiently fast [36]. Although that approach is certainly promising, we propose a stand-alone technique whose computational complexity is considerably lower than many state-of-the-art techniques and which can easily be adopted on the NAO humanoids.
Existing real-time face detection and recognition methods require abundant processing power, which the NAO humanoid does not have. Therefore, we need a method with lower computational complexity and added robustness, and the LBP approach fulfils these demands.
The remainder of this paper is organised as follows. The proposed face recognition algorithm is described in detail in Sect. 2. Section 3 presents the experimental results and the discussion. Lastly, Sect. 4 concludes the paper.
2 The proposed real-time face recognition algorithm
In this research work, the proposed real-time face recognition system has been adopted on the NAO humanoid platform. First, a face is detected within a frame using the Viola–Jones face detector [37]. It is important to note that the frame acquired by the NAO camera is in the YUV colour space. The local binary patterns (LBP) of the detected face image are then computed in each of the Y, U and V colour channels [15]. The LBP face is divided into blocks of \(16\times 16\) pixels and the probability distribution function (PDF) of each block is calculated. A PDF is calculated from the histogram of a given block, which records the occurrence of every pixel intensity: as intensity values range from 0 to 255, the histogram has 256 values on the X-axis, while the Y-axis gives the number of pixels that have a given intensity value. A PDF is obtained by dividing the occurrence of each intensity by the total number of pixels, as shown in Eq. 1:

\[ P(i) = \frac{H(i)}{\sum _{j=0}^{255} H(j)}, \qquad i = 0, 1, \ldots , 255, \qquad \qquad (1) \]
where H represents the histogram values and P is the set of PDF values. The PDFs of the blocks in a given colour channel are concatenated in order to form one PDF for the LBP face in that channel. The obtained PDF is compared to the PDFs of the faces in the database in the corresponding colour channel using the Kullback–Leibler divergence (KLD), and the most likely match is the entry with the smallest divergence, as shown in Eq. 2:

\[ \zeta = D_{\mathrm{KL}}(P \,\Vert \, Q) = \sum _{i} P(i) \log \frac{P(i)}{Q(i)}, \qquad \qquad (2) \]
where P is a set of PDFs of the training images and Q is the PDF of the query image to be matched to an existing one in the database.
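For concreteness, the following Python sketch illustrates these two steps under our stated assumptions: a basic \(3\times 3\) LBP operator, \(16\times 16\) blocks, 256-bin histograms, and a small smoothing constant in the KLD to avoid division by zero (the smoothing is our addition; the paper does not specify how empty bins are handled). Function names are illustrative, not taken from the original implementation.

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 LBP: threshold the 8 neighbours of each pixel against
    the centre pixel and pack the results into an 8-bit code."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                       # centre pixels
    code = np.zeros_like(c)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit
    return code.astype(np.uint8)

def block_pdf(lbp, block=16):
    """Concatenate the per-block PDFs (normalised 256-bin histograms)
    of an LBP image, following Eq. (1)."""
    h, w = lbp.shape
    pdfs = []
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            hist, _ = np.histogram(lbp[y:y + block, x:x + block],
                                   bins=256, range=(0, 256))
            pdfs.append(hist / hist.sum())
    return np.concatenate(pdfs)

def kld(p, q, eps=1e-10):
    """Kullback-Leibler divergence of Eq. (2), with smoothing."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))
```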
In order to realise the real-time, real-world scenario, the algorithm is divided into two parts: training and testing. During training, the robot takes several photographs per person in YUV, which is the default colour space for NAO, detects their faces [37], and produces the respective histograms for each colour channel; these are then used to generate the probability distributions, which are saved to the database, as illustrated in Fig. 1.
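A minimal sketch of this training stage, reusing the helpers above, could look as follows; the assumption that all face crops share a common size (so that the concatenated PDFs are of equal length) is ours, as the paper does not state the crop size.

```python
def enroll(database, name, yuv_faces):
    """Store the per-channel block PDFs of each face sample of a person.
    yuv_faces: iterable of (Y, U, V) same-sized face-crop triples."""
    for y_ch, u_ch, v_ch in yuv_faces:
        entry = tuple(block_pdf(lbp_image(ch)) for ch in (y_ch, u_ch, v_ch))
        database.setdefault(name, []).append(entry)
```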
For testing, the course of action is initially identical: images are taken and histograms composed in the aforementioned manner. The histograms of the individual colour channels are then compared to the database and the leading matches are found via KLD, as illustrated in Fig. 2.
The optimum matches for each channel are then fused by the majority voting principle [38]: when the majority of channels agree on a match, it is regarded as the correct one. There are three versions of majority voting (MV), where the choice is made by one of the following options: (1) the one that everyone agrees on (unanimous voting); (2) the one that receives more than 50% of the votes (simple majority); and (3) the one with the maximum number of votes, regardless of whether or not it exceeds 50% (plurality voting). In this work plurality voting is adopted.
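A plurality vote over the three channel decisions can be expressed compactly; this sketch returns None on a three-way disagreement, so that the best-score fallback described next can take over.

```python
from collections import Counter

def plurality_vote(labels):
    """labels: the three per-channel decisions, e.g. ('id3', 'id3', 'id7')."""
    (top, count), = Counter(labels).most_common(1)
    return top if count > 1 else None
```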
In the case when all channels give a different answer, the result with the best score (BS) is used. The BS is identified by finding the minimum among the KLD values of the classes chosen in each channel, as shown in Eq. 3:

\[ BS = \min \left( \zeta _{Y}, \zeta _{U}, \zeta _{V} \right), \qquad \qquad (3) \]
where \(\zeta _{Y}, \zeta _{U}\) and \(\zeta _{V}\) are the corresponding KLD values. MV, as expected, boosts the results, and in the worst-case scenario, in which each colour channel votes for a different class, the BS method is adopted. In the next section, the recognition rates obtained by BS and MV are shown.
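The complete channel-level fusion, combining plurality voting with the best-score fallback of Eq. 3, can then be sketched as:

```python
import numpy as np

def fuse(channel_labels, channel_klds):
    """channel_labels: winning class per channel; channel_klds: the
    corresponding KLD values (zeta_Y, zeta_U, zeta_V) of Eq. (3)."""
    winner = plurality_vote(channel_labels)
    if winner is not None:
        return winner
    # Three-way disagreement: pick the channel with the smallest KLD.
    return channel_labels[int(np.argmin(channel_klds))]
```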
Finally, the robot announces its conclusion as to who the person under consideration is.
Algorithm 1 summarises the proposed real-time face recognition system.
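Under the same assumptions as above, the whole per-frame pipeline can be sketched as follows. OpenCV's Haar cascade stands in here for the Viola–Jones detector, and the fixed \(64\times 64\) face size is our illustrative choice, not a value taken from the paper.

```python
import cv2

CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def recognise(frame_yuv, database):
    """Return the fused identity for the first detected face, or None."""
    y_ch, u_ch, v_ch = cv2.split(frame_yuv)
    for (x, y, w, h) in CASCADE.detectMultiScale(y_ch):
        labels, klds = [], []
        for i, ch in enumerate((y_ch, u_ch, v_ch)):
            crop = cv2.resize(ch[y:y + h, x:x + w], (64, 64))
            q = block_pdf(lbp_image(crop))
            # Best match in this channel: smallest KLD over all samples.
            score, name = min((kld(s[i], q), n)
                              for n, samples in database.items()
                              for s in samples)
            labels.append(name)
            klds.append(score)
        return fuse(labels, klds)
    return None
```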
3 Experimental results and discussion
In this work two different setups were used. First, the proposed method was applied to well-known databases in order to measure its recognition rate; in this part the effectiveness of the proposed method is investigated. Second, a real-time setup was prepared in which a new database was produced using the NAO humanoid's top camera. This real-time setup was then used to recognise the people in front of the robot. The details of the setups and the recognition rates are described in the subsections below.
3.1 Experimental results on well-known databases
For comparison purposes the proposed face recognition algorithm has been tested on the Essex University face database [39], the facial recognition technology (FERET) database [40], and the Head Pose (HP) face database [41]. As the NAO humanoid acquires images directly in YUV, and in order not to introduce additional computational complexity to the algorithm, the entire recognition process is conducted only in the Y, U and V channels.
The Essex face database includes 150 different classes with varying illumination and background, and each class has ten different samples. In each experiment the training face samples of a class were selected randomly, and the recognition results shown in Table 1 are the average of 500 iterations of the proposed algorithm. In order to fuse the decisions obtained in each of the aforementioned colour channels, the best score algorithm and majority voting were adopted. Table 1 also reports the recognition rates of the best score and majority voting.
The FERET database includes 50 different classes, each having ten samples, with varying facial expressions and poses, including frontal and profile poses. Unlike the Essex University database, which mostly includes images of only the facial area of a person (and, in fewer than ten classes, some shoulder area along with the face), the FERET images always include some shoulder area and therefore less facial area per image. Most importantly, the portraits in the FERET database have up to \(90^{\circ }\) of rotation to either side, making them extremely difficult to recognise or even detect [37]. As expected, the results, shown in Table 2, are accordingly lower.
The Head Pose database includes 15 different classes with up to \(90^{\circ }\) of head pose rotation to either side; each class has ten samples. The results were superior to those on the FERET database, due to the smaller number of classes, but not as good as on the Essex University database, because of the changes in head pose within each class. The results are shown in Table 3.
3.2 The real-time scenario’s setup
In order to handle real-world scenarios, which is the main goal of this research work, a new database was created. The database consists of two main parts: (1) 310 images of 31 people with ten different poses each, acquired by the NAO upper camera with the setup shown in Fig. 3; (2) a series of video streams in which randomly selected people from the database walk along a predefined marked path, starting 6 m away from the NAO humanoid until they reach the robot, with the setup shown in Fig. 4. Both the image database and the videos were created using NAO's camera.
The aim of the video streams is not only to validate the proposed real-time face recognition technique, but also to analyse the effect of distance on the recognition performance. The lighting conditions in the video streams are approximately the same as in any typical room. While making the image database the background of the faces was uniform, but for the video streams the background consisted of scenes in a laboratory. Figures 5 and 6 show some samples of faces in the prepared NAO-camera-based database and some frames of one of the video streams, respectively.
3.3 The experimental results for the real-time scenario
The correct recognition rate for the prepared face database is given in Table 4. The videos depicted people walking towards the robot while the robot was sitting on a table. Each person started approaching from a distance of 6 m and stopped right in front of NAO, at which point the person's face was usually no longer in the frame. Because the Viola–Jones detector was unable to detect the face when the person was further than 2 m away, the recognition rate was calculated only for the interval of 2–0.5 m from NAO. The recognition results for the six classes that existed in both databases are shown in Table 5.
In order to boost the recognition rate in the real-time scenario, we also employ majority voting on the intra-sequence decisions with a window size of 5; that is, we vote among the last five decisions made by the robot in order to make the final decision. With this MV on intra-sequence decisions, NAO declares the class as soon as the fifth decision has been made. This boosting increases the recognition rate, and the results are shown in Table 6.
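A minimal sketch of this intra-sequence vote, under the same illustrative naming as before: per-frame identities are buffered and, once five are available, the plurality winner is declared.

```python
from collections import Counter, deque

recent = deque(maxlen=5)   # sliding window of the last five decisions

def sequence_decision(frame_label):
    recent.append(frame_label)
    if len(recent) == 5:
        return Counter(recent).most_common(1)[0][0]
    return None  # fewer than five frame-level decisions so far
```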
4 Conclusion
In this work a real-time face recognition system for the NAO humanoid was introduced. The proposed method used block processing of the local binary patterns of the face images captured by the NAO humanoid. The correct recognition rate was boosted by using majority voting and best score ensemble approaches on the decisions obtained in the Y, U and V colour channels. A database acquired by the NAO humanoid's camera was also created. In addition, the effect of distance on the recognition of people by the NAO humanoid was studied, and it was shown that the robot cannot recognise people who are more than 2 m away from it. The recognition results were boosted in the real-time scenario by using MV on the intra-sequence decisions with a window size of 5.
References
Murphy, R. R., Nomura, T., Billard, A., & Burke, J. L. (2010). Human–robot interaction. IEEE Robotics & Automation Magazine, 17(2), 85–89.
Xue, Y. (2016). Recent development in analog computation: A brief overview. Analog Integrated Circuits and Signal Processing, 86(2), 181–187.
Anbarjafari, G., & Aabloo, A. (2014). Expression recognition by using facial and vocal expressions. In V&L Net 2014, p. 103.
Modares, H., Ranatunga, I., AlQaudi, B., Lewis, F. L., & Popa, D. O. (2017). Intelligent human–robot interaction systems using reinforcement learning and neural networks. In Y. Wang & F. Zhang (Eds.), Trends in control and decision-making for human–robot collaboration systems (pp. 153–176). Berlin: Springer.
Noroozi, F., Sapiński, T., Kamińska, D., & Anbarjafari, G. (2017). Vocal-based emotion recognition using random forests and decision tree. International Journal of Speech Technology, 20(2), 239–246.
Ding, C., & Tao, D. (2016). A comprehensive survey on pose-invariant face recognition. ACM Transactions on Intelligent Systems and Technology (TIST), 7(3), 37.
Anbarjafari, G. (2013). Face recognition using color local binary pattern from mutually independent color channels. EURASIP Journal on Image and Video Processing, 2013(1), 6.
Barreto, J., Menezes, P., Dias, J. (2004). Human–robot interaction based on haar-like features and eigenfaces. In Robotics and automation, 2004. Proceedings. ICRA’04. 2004 IEEE international conference on (vol. 2, pp. 1888–1893). IEEE.
Ahmed, M. T., Amin, S. H. M. (2015). Comparison of face recognition algorithms for human–robot interactions. Jurnal Teknologi, 72(2), 1–6.
Yan, H., Ang, M. H, Jr., & Poo, A. N. (2014). A survey on perception methods for human-robot interaction in social robots. International Journal of Social Robotics, 6(1), 85–119.
Lu, J., Plataniotis, K. N., & Venetsanopoulos, A. N. (2003). Face recognition using LDA-based algorithms. IEEE Transactions on Neural Networks, 14(1), 195–200.
Rasti, P., Uiboupin, T., Escalera, S., Anbarjafari, G. (2016). Convolutional neural network super resolution for face recognition in surveillance monitoring. In International conference on articulated motion and deformable objects (pp. 175–184). Springer.
Anbarjafari, G. (2013). Face recognition using color local binary pattern from mutually independent color channels. EURASIP Journal on Image and Video Processing, 2013(1), 1–11.
Zhuang, L., Chan, T.-H., Yang, A. Y., Sastry, S. S., & Ma, Y. (2015). Sparse illumination learning and transfer for single-sample face recognition with image corruption and misalignment. International Journal of Computer Vision, 114(2–3), 272–287.
Ahonen, T., Hadid, A., & Pietikäinen, M. (2004). Face recognition with local binary patterns. In Computer Vision—ECCV 2004 (pp. 469–481). Springer.
Ahonen, T., Hadid, A., & Pietikainen, M. (2006). Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12), 2037–2041.
Zhao, Y., Jia, W., Rong-Xiang, H., & Min, H. (2013). Completed robust local binary pattern for texture classification. Neurocomputing, 106, 68–76.
Liu, L., Lao, S., Fieguth, P. W., Guo, Y., Wang, X., & Pietikäinen, M. (2016). Median robust extended local binary pattern for texture classification. IEEE Transactions on Image Processing, 25(3), 1368–1381.
Liu, G.-H., Zhang, L., Hou, Y.-K., Li, Z.-Y., & Yang, J.-Y. (2010). Image retrieval based on multi-texton histogram. Pattern Recognition, 43(7), 2380–2389.
Höschl, C., IV, & Flusser, J. (2016). Robust histogram-based image retrieval. Pattern Recognition Letters, 69, 72–81.
Beheshti, I., Demirel, H., Farokhian, F., Yang, C., Matsuda, H., & the Alzheimer's Disease Neuroimaging Initiative (2016). Structural MRI-based detection of Alzheimer's disease using feature ranking and classification error. Computer Methods and Programs in Biomedicine, 137, 177–193.
Shan, C., Gong, S., & McOwan, P. W. (2009). Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing, 27(6), 803–816.
Cament, L. A., Galdames, F. J., Bowyer, K. W., & Perez, C. A. (2015). Face recognition under pose variation with active shape model to adjust gabor filter kernels and to correct feature extraction location. In Automatic face and gesture recognition (FG), 2015 11th IEEE international conference and workshops on (vol. 1, pp. 1–6). IEEE.
Sun, N., Zheng, W., Sun, C., Zou, C., & Zhao, L. (2006). Gender classification based on boosting local binary pattern. In Advances in Neural Networks—ISNN 2006 (pp. 194–201). Springer.
Lian, H.-C., Lu, B.-L. (2006). Multi-view gender classification using local binary patterns and support vector machines. In Advances in neural networks—ISNN 2006 (pp. 202–209). Springer.
Shan, C. (2012). Learning local binary patterns for gender classification on real-world face images. Pattern Recognition Letters, 33(4), 431–437.
Shyam, R., & Singh, Y. N. (2015). Face recognition using augmented local binary pattern and Bray Curtis dissimilarity metric. In Signal processing and integrated networks (SPIN), 2015 2nd international conference on (pp. 779–784). IEEE.
Huang, G. B., Mattar, M., Berg, T., & Learned-Miller, E. (2008). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on faces in ’real-life’ images: Detection, alignment, and recognition.
BenAbdelkader, C., & Griffin, P. (2005). A local region-based approach to gender classification from face images. In Computer vision and pattern recognition-workshops, 2005. CVPR workshops. IEEE computer society conference on (p. 52). IEEE.
Bartlett, M. S., Littlewort, G., Fasel, I., & Movellan, J. R. (2003). Real time face detection and facial expression recognition: Development and applications to human computer interaction. In Computer vision and pattern recognition workshop, 2003. CVPRW’03. Conference on (vol. 5, p. 53). IEEE.
Song, Y., Bao, L., Yang, Q., & Yang, M.H. (2014). Real-time exemplar-based face sketch synthesis. In European Conference on Computer Vision (pp. 800–813). Springer.
Viola, P., & Jones, M. J. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2), 137–154.
Dantone, M., Gall, J., Fanelli, G., Gool, L. V. (2012). Real-time facial feature detection using conditional regression forests. In Computer vision and pattern recognition (CVPR), 2012 IEEE conference on (pp. 2578–2585). IEEE.
Meng, R., Shengbing, Z., Yi, L., & Meng, Z. (2014). CUDA-based real-time face recognition system. In Digital information and communication technology and its applications (DICTAP), 2014 fourth international conference on (pp. 237–241). IEEE.
Tarvas, K., Bolotnikova, A., & Anbarjafari, G. (2016). Edge information based object classification for NAO robots. Cogent Engineering, 3(1), 1262571.
Soyata, T., Muraleedharan, R., Funai, C., Kwon, M., Heinzelman, W. (2012). Cloud-vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and communications (ISCC), 2012 IEEE symposium on (pp. 000059–000066). IEEE.
Viola, P., & Jones, M. (2001). Robust real-time object detection. International Journal of Computer Vision, 4, 51–52.
Polikar, R. (2006). Ensemble based systems in decision making. Circuits and Systems Magazine, IEEE, 6(3), 21–45.
Spacek, L. (2007). Collection of facial images: Faces94. Computer vision science and research projects, University of Essex, UK. http://cswww.essex.ac.uk/mv/allfaces/faces94.html.
Phillips, P. J., Moon, H., Rizvi, S. A., & Rauss, P. J. (2000). The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10), 1090–1104.
Gourier, N., Hall, D., & Crowley, J. L. (2004). Estimating face orientation from robust detection of salient facial structures. In FG Net workshop on visual observation of deictic gestures (pp. 1–9). FGnet (IST–2000–26434) Cambridge, UK.
Additional information
This work has been partially supported by Estonian Information Technology Foundation, Skype Technologies, Estonian Research Council Grant (PUT638), the Estonian Centre of Excellence in IT (EXCITE) funded by the European Regional Development Fund and the European Network on Integrating Vision and Language (iV&L Net) ICT COST Action IC1307. The authors would like to thank the RoboCup SPL Team of University of Tartu, Philosopher, for helping to conduct real-time experiments and also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU.