1 Introduction

Automatic video data analysis is a very challenging problem. To find a particular object in a video stream and automatically decide whether it belongs to a particular class, one has to combine a number of machine learning techniques and algorithms that solve object detection, tracking and recognition tasks [1–6]. Many algorithms have been proposed in computer vision and object recognition over recent years, using such popular techniques as principal component analysis, histogram analysis, artificial neural networks, Bayesian classification, adaptive boosting, various statistical methods, and many others. Some of these techniques are independent of the type of object analyzed; others, on the contrary, exploit a priori knowledge about a particular object type, such as its shape, typical color distribution, relative positioning of parts, etc. [7]. Although the real world contains a huge variety of objects, considerable interest is shown in algorithms that analyze one particular object type: human faces. Promising practical applications of face recognition algorithms include automatic visitor counting systems; access control at the entrances of office buildings, airports and subways; automatic accident prevention systems; intelligent human-computer interfaces, etc.

Gender recognition, for example, can be used to collect and estimate demographic indicators [8–10]. It can also be an important preprocessing step in person identification: if a database contains equal numbers of men and women, gender recognition halves the number of candidates for analysis and thus roughly doubles the speed of the identification process.

Human age estimation is another computer vision problem connected with face area analysis [11]. Its possible applications include electronic customer relationship management (such systems use interactive electronic tools to automatically collect age information about potential consumers in order to provide individual advertising and services to clients of various age groups); security control and surveillance monitoring (for example, an age estimation system can warn or stop underage drinkers from entering bars or wine shops, prevent minors from purchasing tobacco products from vending machines, etc.); and biometrics (where age estimation provides ancillary information about a user's identity and thus decreases the identification error rate of the whole system). Age estimation can also be applied in the field of entertainment, for example, to sort images into several age groups or to build an age-specific human-computer interaction system [11].

To build a completely automatic system, classification algorithms are used in combination with a face detection algorithm, which selects candidates for further analysis [12–17]. In this paper we propose a system that extracts all available information about the people depicted in the input video stream, then aggregates and analyzes it in order to measure different statistical parameters (Fig. 1).

Fig. 1. A block diagram of the proposed application for video analysis

The quality of the face detection step is critical to the final result of the whole system, as inaccuracies in face position determination can lead to wrong decisions at the recognition stage. Face detection is performed with the AdaBoost classifier described in [18]. Detected fragments are preprocessed to align their luminance characteristics and to bring them to a uniform scale. At the next stage the detected and preprocessed image fragments are passed to the gender recognition classifier, which assigns each of them to one of two classes ("Male" or "Female"). The same fragments are also analyzed by the age estimation algorithm. The proposed gender and age classifiers are based on a non-linear SVM (Support Vector Machine) classifier with an RBF kernel. LBP features are used to extract information from an image fragment and to move to a lower-dimensional feature space.

To estimate how long a person stays within the camera's field of view, a face tracking algorithm [19–22] is used. It is based on the Lucas-Kanade optical flow calculation procedure [23].

The rest of the paper briefly describes the main algorithmic techniques used at the gender and age recognition stages. Gender and age classification accuracy is estimated in real-life situations.

2 Gender Recognition

The new gender recognition algorithm proposed in this paper is based on a non-linear SVM classifier with an RBF kernel. Detected fragments are preprocessed to align their luminance characteristics and to bring them to a uniform scale. After that, the local binary patterns (LBP) operator [24] is used to extract information from an image fragment and to move to a lower-dimensional feature space. These simple local features have been shown to give good results in face recognition tasks. Their calculation procedure is shown in Fig. 2.

Fig. 2. LBP feature vector extraction procedure

At the first step each pixel is compared with its neighbors. The result of each comparison is a binary digit. The digits from a given neighborhood (for example, 3 × 3 pixels) form a binary number, which can be written in decimal format.

At the second step the image is divided into rectangular regions. For each region a histogram of the occurrence frequencies of the numbers obtained at the first step is calculated. The resulting feature vector is a concatenation of the histograms from all regions.
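The two steps above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' implementation: the function names `lbp_codes` and `lbp_feature_vector`, the clockwise bit ordering, the "neighbor greater than or equal to center" comparison rule and the 2 × 2 grid of regions are all assumptions made for the example.

```python
def lbp_codes(img):
    """Return 8-bit LBP codes for the interior pixels of a 2-D grayscale image."""
    # Neighbor offsets, clockwise from the top-left corner (ordering is an assumption).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            code = 0
            for dr, dc in offsets:
                # Each comparison contributes one bit: 1 if neighbor >= center.
                bit = 1 if img[r + dr][c + dc] >= img[r][c] else 0
                code = (code << 1) | bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_feature_vector(img, grid=(2, 2)):
    """Concatenate per-region 256-bin histograms of LBP codes."""
    codes = lbp_codes(img)
    h, w = len(codes), len(codes[0])
    rows, cols = grid
    features = []
    for i in range(rows):
        for j in range(cols):
            hist = [0] * 256
            # Integer arithmetic splits the code map into rows x cols rectangles.
            for r in range(i * h // rows, (i + 1) * h // rows):
                for c in range(j * w // cols, (j + 1) * w // cols):
                    hist[codes[r][c]] += 1
            features.extend(hist)
    return features
```

With a 2 × 2 grid the feature vector has 4 × 256 = 1024 elements regardless of the fragment size, which is why fragments are first brought to a uniform scale.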

The obtained feature vector is transformed using a Gaussian radial basis function kernel (Eq. 1):

$$ k\left( {z_{1} ,z_{2} } \right) = C\,\exp \left( {\frac{{ - \left\| {z_{1} - z_{2} } \right\|^{2} }}{{\sigma^{2} }}} \right) $$
(1)

The kernel function parameters \( C \) and \( \sigma \) are determined during training. The resulting feature vector serves as the input to a linear SVM classifier whose decision rule is given by Eq. 2:

$$ f(AF) = \text{sgn} \left( {\sum\limits_{i = 1}^{m} {y_{i} \alpha_{i} k(X_{i} ,AF) + b} } \right). $$
(2)

The set of support vectors \( \left\{ {X_{i} } \right\} \), the sets of coefficients \( \left\{ {y_{i} } \right\} \), \( \left\{ {\alpha_{i} } \right\} \) and the bias \( b \) are obtained during classifier training. This completes the construction of the proposed gender classifier based on LBP features and SVM (the LBP-SVM classifier).
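As a minimal sketch of how Eqs. 1 and 2 fit together, the following Python code evaluates the kernel and the sign of the weighted kernel sum. It assumes that the norm in Eq. 1 denotes the Euclidean distance between the two vectors; the function names and the toy support vectors in the usage note are hypothetical, and in practice the support vectors, coefficients and bias come from training.

```python
import math


def rbf_kernel(z1, z2, sigma, C=1.0):
    """Eq. 1: C * exp(-||z1 - z2||^2 / sigma^2), with a Euclidean norm assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return C * math.exp(-sq_dist / sigma ** 2)


def svm_decision(x, support_vectors, labels, alphas, bias, sigma, C=1.0):
    """Eq. 2: sign of the weighted kernel sum plus the bias (returns +1 or -1)."""
    total = sum(
        y * a * rbf_kernel(sv, x, sigma, C)
        for sv, y, a in zip(support_vectors, labels, alphas)
    )
    # A score of exactly zero is mapped to +1 here; this tie rule is an assumption.
    return 1 if total + bias >= 0 else -1
```

For example, with two toy support vectors `[0, 0]` (label +1) and `[4, 4]` (label -1), unit coefficients, zero bias and `sigma = 1`, a point near the first vector is assigned to class +1 and a point near the second to class -1.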

Both training and testing of the gender recognition algorithm require a sufficiently large color image database. The most commonly used image database for human face recognition tasks is the FERET database [25], but it contains an insufficient number of faces of different individuals. We therefore collected our own image database from different sources (Table 1 and Fig. 3).

Table 1. The proposed training and testing image database parameters.

Fig. 3. Detected fragments from the proposed image database

Faces in the images of the proposed database were detected automatically by the AdaBoost face detection algorithm. False detections were then removed manually, giving a dataset of 10 500 image fragments (5 250 for each class). This dataset was split into three independent image sets: training, validation and testing. The training set was used to construct the SVM classifier. The validation set was needed to avoid overtraining while selecting the optimal parameters of the kernel function.

To represent the classification results we used the Receiver Operating Characteristic (ROC) curve. As there are two classes, one of them is treated as the positive decision and the other as the negative one. The ROC curve is created by plotting the fraction of true positives out of all positives (TPR, true positive rate) against the fraction of false positives out of all negatives (FPR, false positive rate) at various discrimination threshold settings. The advantage of the ROC curve representation is that it does not depend on the ratio between the costs of errors of the first and the second type.
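The construction of a ROC curve can be illustrated with a short Python sketch. It assumes that the classifier produces a real-valued score for each fragment, that label 1 marks the positive class, and that a fragment is called positive when its score is at or above the threshold; the function name `roc_points` is hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    # Start at (0, 0): a threshold above every score rejects all samples.
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y != 1)
        points.append((fp / negatives, tp / positives))
    return points


# Toy scores and ground-truth labels, invented for illustration only.
curve = roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Each threshold yields one point; lowering the threshold moves the point toward (1, 1), where every sample is accepted as positive.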

The proposed classifier was compared to the AF-SVM algorithm described in [10]. AF-SVM was chosen as a reference because it combines a high recognition rate with low operational complexity compared to state-of-the-art classifiers [26].

Testing results of the proposed LBP-SVM classifier compared to AF-SVM performance are presented in Table 2 and Fig. 4.

Table 2. Recognition rate of LBP-SVM classifier compared to AF-SVM

Fig. 4. ROC-curves for LBP-SVM and AF-SVM classifiers

Experimental results show that using LBP features for gender recognition improves overall performance by 1.5 %, giving an accuracy of more than 92 %.

3 Age Estimation

A lot of research on age classification has been done over the last few years [27–32]. The proposed age estimation algorithm follows a multiclass classification approach (Fig. 5): for each age (from 1 to N) a binary classifier is constructed that decides whether the person in the input image looks older than the given age or not. Input fragments are preprocessed to align their luminance characteristics and to bring them to a uniform scale. Preprocessing includes color space transformation and scaling, both similar to those of the gender recognition algorithm. In addition, the images are normalized by histogram equalization. Each binary classifier is constructed by transforming the fragments to the LBP feature space and training an SVM. To predict the age itself, the outputs of the binary classifiers are analyzed statistically, and the most probable age becomes the output of the algorithm.

Fig. 5. LBP-SVM age estimation algorithm block diagram
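The paper does not specify how the binary outputs are analyzed statistically, so the following Python sketch shows only one plausible rule, chosen here for illustration. It assumes that classifier k outputs 1 when the face looks older than k years and 0 otherwise, and it picks the age whose ideal output pattern agrees with the largest number of classifiers. The function name `estimate_age` and the tie rule (the lowest age wins) are assumptions.

```python
def estimate_age(older_than):
    """Pick the age most consistent with the 'older than k' decisions.

    older_than[k - 1] is 1 if classifier k decided the face looks older
    than k years, and 0 otherwise (k = 1..N).
    """
    n = len(older_than)
    best_age, best_score = 1, -1
    for age in range(1, n + 2):
        # A person of this age should trigger classifiers 1..age-1 and no others.
        score = sum(
            1
            for k, out in enumerate(older_than, start=1)
            if out == (1 if k < age else 0)
        )
        # Strict comparison keeps the lowest age when several ages tie.
        if score > best_score:
            best_age, best_score = age, score
    return best_age
```

When all classifiers are mutually consistent, this rule reduces to one plus the number of positive decisions; when a few classifiers contradict the rest, it follows the majority pattern.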

Training and testing require a sufficiently large color image database. We used the state-of-the-art image databases MORPH [33] and FG-NET [34], as well as our own RUS-FD database of real-life test images with low resolution (60 × 60 pixels per face) (Table 3). Faces in the images were detected automatically by the AdaBoost face detection algorithm.

Table 3. Face databases for age estimation algorithms learning and testing

To evaluate the performance of the age estimation algorithm, the following standard metrics were calculated:

  • Mean Absolute Error (MAE) – mean absolute difference between estimated and real ages.

  • Cumulative Score (CS) – the probability that estimated age lies within an interval dx from real age.

  • Probability Density Function of age estimation error.
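The first two metrics are simple to compute, as the following Python sketch shows. The function names and the toy ages are invented for illustration, and treating the interval as inclusive (an error exactly equal to dx counts as a hit) is an assumption.

```python
def mean_absolute_error(estimated, real):
    """MAE: mean absolute difference between estimated and real ages."""
    return sum(abs(e - r) for e, r in zip(estimated, real)) / len(real)


def cumulative_score(estimated, real, dx):
    """CS: fraction of estimates whose absolute error does not exceed dx years."""
    hits = sum(1 for e, r in zip(estimated, real) if abs(e - r) <= dx)
    return hits / len(real)


# Toy data, not taken from the paper: absolute errors are 5, 0, 6 and 15 years.
estimated_ages = [25, 30, 41, 60]
real_ages = [20, 30, 35, 45]
mae = mean_absolute_error(estimated_ages, real_ages)   # 6.5
cs5 = cumulative_score(estimated_ages, real_ages, 5)   # 0.5
cs10 = cumulative_score(estimated_ages, real_ages, 10)  # 0.75
```

Plotting the cumulative score against dx gives a curve that rises from the share of exact estimates toward 1 as the tolerance grows.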

To evaluate the proposed algorithm in a real-life situation, testing was first performed on the FG-NET database. Ages in the FG-NET images were also estimated manually by a group of experts in order to compare subjective estimation with the algorithm's performance. The corresponding dependences for the LBP-SVM algorithm are presented in Figs. 6, 7, and 8.

Fig. 6. MAE on FG-NET database for the proposed age estimation classifier

Fig. 7. CS on FG-NET database for the proposed age estimation classifier

Fig. 8. Error probability density function on FG-NET database

The proposed algorithm shows results comparable to the subjective evaluation for ages from 20 to 35 years. The mean absolute error in this range is about 6 years. The accuracy of the LBP-SVM algorithm decreases for older ages, where the MAE grows. In the 45–60 year range the proposed algorithm falls behind the expert evaluation by approximately 10–15 years in terms of average error.

The cumulative score shows that around 40 % of the estimates deviate from the true age by less than 5 years, and 70 % by less than 10 years. The subjective evaluation curve in Fig. 7 indicates a possible limit for future improvement of age estimation algorithms.

Analysis of the error probability density function shows that the proposed algorithm has a close to symmetric error distribution. Unlike expert evaluation, which typically overestimates the true age, the objective results show no such tendency.

MAE and CS comparison for LBP-SVM algorithm on different test databases is presented in Fig. 9 and Fig. 10.

Fig. 9. MAE comparison on different databases for LBP-SVM algorithm

Fig. 10. CS comparison on different databases for LBP-SVM algorithm

The total MAE of the LBP-SVM algorithm is 6.94 on the RUS-FD database, 7.29 on the MORPH database and 7.47 on the FG-NET database. The MAE of subjective estimation is 4.2, indicating that the proposed algorithm still needs much improvement to show results comparable to a human. Possible ways to improve the accuracy of the age classifier are expanding the feature set (using a combination of different feature transforms), applying a cost-sensitive SVM learning procedure, and making the pre-processing and post-processing steps more efficient.

4 Overall Performance Comparison

The proposed audience analysis system was compared to its commercial counterpart, Intel Audience Impression Metrics Suite (Intel AIM Suite). The experimental setup was as follows: an input video stream from an IP camera (Axis M1014) was split into two and analyzed simultaneously by Intel AIM Suite and by the proposed system.

During the experiment a group of people including men and women walked in front of the camera, imitating difficult movement situations such as partial occlusion and temporary disappearance. The following metric was proposed to compare the performance of the algorithms (Eq. 3):

$$ K = \frac{D}{N}, $$
(3)

where D is the total number of misclassified objects in the test video sequence, and N is the total number of frames. Testing results are presented in Table 4. Experimental results show that Intel AIM Suite seriously overestimates the number of people during people counting, while the proposed system has higher classification accuracy.
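As a small illustration of this metric, the Python sketch below accumulates the misclassified objects over all frames and divides by the number of frames. The function name and the per-frame counts are hypothetical and are not measurements from the experiment.

```python
def misclassification_rate(misclassified_per_frame):
    """K = D / N: total misclassified objects divided by the number of frames."""
    d = sum(misclassified_per_frame)  # D, summed over the whole sequence
    n = len(misclassified_per_frame)  # N, the number of frames
    return d / n


# Hypothetical per-frame counts of misclassified objects for an 8-frame clip.
k = misclassification_rate([0, 1, 0, 2, 0, 0, 1, 0])  # 4 / 8 = 0.5
```

A lower K means fewer misclassified objects per frame on average, so systems can be ranked on the same video sequence by comparing their K values.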

Table 4. Audience analysis system comparison results

5 Conclusion

The system described in this paper collects and processes information about the audience in real time. It is fully automatic and does not require a human operator. No personal information is saved during operation. A modern, efficient classification algorithm recognizes the viewer's gender with more than 92 % accuracy.

These features allow the proposed system to be applied in various areas: crowded public places (stadiums, theaters and shopping centers), transport hubs (airports, railway and bus stations), digital signage network optimization, etc.