1 Introduction

Classical biometry offers a natural and reliable solution for establishing the identity of an individual. The use of human physical and behavioral characteristics has been increasingly adopted in security applications due to various advantages, such as universality, robustness, permanence and accessibility. State-of-the-art intrusion detection and security systems now include at least one biometric trait by default. Soft biometry, the latest addition, inherits a main part of the advantages of classical biometry and adds assets of its own.

The foundations of soft biometrics were laid by Alphonse Bertillon in the nineteenth century, who first introduced the idea of a personal identification system based on biometric, morphological and anthropometric determinations [57]. He used traits such as the color of eyes, hair, beard and skin; the shape and size of the head; general discriminators such as height or weight; and descriptions of indelible marks such as birth marks, scars or tattoos. The great majority of those descriptors fall today into the category of soft biometrics. Jain et al. first introduced the term soft biometrics to denote a set of characteristics that provide some information about the individual, but cannot individually authenticate the person, mainly due to their lack of distinctiveness and permanence [31]. Later, the work in [32] additionally noted that soft biometrics are inexpensive to compute, can be sensed at a distance, do not require the cooperation of the surveillance subjects, and aim to narrow down the search within a group of candidate individuals. We note here that human compliance is a further main factor that differentiates soft biometrics from classical biometrics and opens new application fields.

New definition

Soft biometric traits are physical, behavioral or adhered human characteristics, classifiable in pre-defined human compliant categories. These categories are, unlike in the classical biometric case, established and time-proven by humans with the aim of differentiating individuals. In other words, the instances of soft biometric traits are created in a natural way and are used by humans to distinguish their peers. We note that human compliant labeling is referred to as semantic annotation in [59].

Based on this new definition, in this paper we proceed to ascertain candidate soft biometric traits and to elaborate on the advantages, relations and limitations among traits. We then identify two novel traits, weight and clothes color, which cover anthropometric measures as well as accessory attributes. We present associated estimation methods and provide results on their accuracy.

The plethora of benefits related to soft biometrics motivates examining an application that employs solely soft biometric traits for human identification. This approach is new and has several advantages over human identification based on classical biometry, such as non-obtrusiveness and computational and time efficiency, to name a few. Towards this application we create a Bag of Soft Biometrics (BoSB) from a previously presented set of facial soft biometrics and the newly introduced traits, weight and clothes color. We analyze the identification potential of the facial, body and accessory system and proceed to show very promising results on the full Bag of Soft Biometrics.

The paper is organized as follows. Section 2 introduces a candidate list of soft biometric traits and portrays pertinent advantages and limitations. The same section also identifies former related work on soft biometric traits. In Section 3 three application scenarios are provided. Along with the already known and established efforts on fusion and filtering for/with hard biometrics, a novel application, namely human identification, is motivated. In Section 4 a set of six facial soft biometrics is elaborated, for which estimation algorithms are featured, along with the related experimental results. Section 5 establishes two novel soft biometric traits, weight and color of clothes, providing related experimental results on reliability and performance. Finally, towards a statistical analysis of a holistic body and face soft biometric system, Section 6 presents a framework for human identification, together with the relevant parameters and design aspects. An example of a specific soft biometric system gives an intuition of the potential of such systems and of the relation between complexity and performance.

2 Soft biometrics

Soft biometrics have attracted growing interest from the biometry community and others for various reasons, such as the need for higher reliability in biometric systems and the many advantages that come with integrating soft biometric traits into systems. In this section we provide an overview of soft biometric traits, their classification, and the related advantages and limitations. Furthermore, we review work already performed on soft biometric traits or on systems integrating them.

2.1 Soft biometric traits

We here proceed to illustrate (see Table 1) a range of characteristics that satisfy the above stated definition. The presented list of soft biometric traits is not exhaustive and will grow with technological progress. As a first step in differentiating between soft biometric traits, we identify their affiliation to the face, body or accessory categories. We note that even though accessories do not classically belong to biometry, the above stated definition clearly includes them in the category of soft biometrics. A further argument is the intuitive human use of obvious accessory items as a means of description and discrimination, for example “the person in the red shirt”. Further significant factors for classifying soft biometric traits are distinctiveness and permanence. Distinctiveness is the strength with which a trait is able to distinguish between individuals. Beard, for example, has low distinctiveness, since it applies only to the male part of the population and furthermore has binary categories. The latter points to a certain correlation between distinctiveness and nature of value: continuous traits are in general more distinctive than discrete ones, and binary traits are the least distinctive. In this context we want to mention the difference between the nature of value and the human labeling of traits. While hair color has many nuances and is thus of continuous character, humans tend to label it as discrete for convenience. Soft biometric estimation algorithms follow the same approach, detecting for example hair color in categories (black, blond, brown, etc.) rather than as RGB values.
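The step from a continuous measurement to a human compliant category can be sketched as a nearest-prototype assignment. This is only an illustration, not the estimation method used in this paper: the prototype RGB values, the category set and the function name `label_hair_color` are hypothetical assumptions.

```python
import math

# Hypothetical RGB prototypes for each human compliant hair color category.
# The values are illustrative assumptions, not taken from the paper.
HAIR_PROTOTYPES = {
    "black": (20, 20, 20),
    "brown": (100, 60, 30),
    "blond": (220, 190, 120),
    "red": (160, 60, 30),
    "grey": (150, 150, 150),
}


def label_hair_color(rgb):
    """Return the category whose prototype is nearest in Euclidean RGB distance."""
    return min(HAIR_PROTOTYPES, key=lambda name: math.dist(rgb, HAIR_PROTOTYPES[name]))


dark_sample = label_hair_color((30, 25, 22))
light_sample = label_hair_color((210, 180, 130))
print(dark_sample, light_sample)
```

A real estimator would have to cope with illumination changes, which plain RGB distance does not; the sketch only shows how a continuous value collapses into a discrete label.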

Table 1 Table of soft biometric traits

The permanence of a trait plays a major role in determining the applications in which it can be employed. As an example, an application that requires identification within a single day can accept traits of low permanence, such as age, weight or clothes color.

The final subdivision, subjective perception, stands for the ability of humans to unambiguously identify specific soft biometric traits. Again the nature of value plays an important role, since characteristics with binary categories are generally more straightforward to sense than continuous ones. The increased subjectivity in perceiving discrete or continuous traits is further due to categories that are not well defined and to differences in perception between beholders. In fact, the notion of soft biometrics bears subjectivity even in the decision on the nature of value. In other words, colors can be argued to be continuous, due to the huge variance in nuances blending into each other, or discrete, due to the fact that colors can be described by discrete RGB values.

We note that the classification of soft biometric traits can be expanded and aspects like accuracy and importance can be evaluated or deduced respectively, depending on the cause for specification (e.g. suitability for a specific application).

2.2 Characteristics, advantages and limitations

Soft biometrics has to some extent carried over the attributes of classical biometrics, as the idea of identification management based on who you are is still pursued. The traits provide weak biometric information about the individual and have correspondingly inherited the requirements of being universal, measurable and acceptable; in addition, the performance of a trait’s detection algorithm(s) should meet the application’s requirements. To a certain degree the aspects of uniqueness, permanence and circumvention also play a role for soft biometrics, but they are treated with greater flexibility.

Recently, soft biometric traits have been employed to preliminarily narrow down the search of a database, in order to decrease the computational time for the classical biometric trait. Another application approach is to fuse soft biometric and classical biometric traits to increase the system reliability. Soft biometrics bring systems substantial advantages: they can be partly derived from the main detected classical biometric identifier, their acquisition is non-obtrusive and does not require enrolment, and training can be performed in advance on individuals outside the specific identification group. In summary, soft biometric traits are typically:

  • Human compliant: Traits conform to natural human description labels.

  • Computationally efficient: Sensor and computational requirements are marginal.

  • Enrolment free: Training of the system is performed off-line and without prior knowledge of the inspected individuals.

  • Deducible from classical biometrics: Traits can be partly derived from images captured for primary (classical) biometric identifier (e.g. eye color from iris images).

  • Non intrusive: Data acquisition is user friendly or can be fully imperceptible.

  • Identifiable from a distance: Data acquisition is achievable at long range.

  • Not requiring the individual’s cooperation: Consent and contribution from the subject are not needed.

  • Preserving human privacy: The stored signatures describe only what is visually available to everyone, and in this sense preserve privacy.

The plethora of utilities related to soft biometrics comes along with limitations, namely the lack of distinctiveness and permanence. A system consisting of fused soft biometric traits can overcome the lack of distinctiveness associated with a single trait. The lack of permanence affects and designates feasible applications.

2.3 Former work on soft biometric traits

Here we outline work pertinent to soft biometrics. This overview does not claim to be an exhaustive state of the art, but rather a selection of highlights from the scientific studies performed.

2.3.1 Facial soft biometrics

Former work on soft biometrics has been performed predominantly with the aim of preprocessing. In face recognition for person identification, for instance, beard detection and removal serve to improve recognition results, while the information carried by the presence of a beard is disregarded.

Color based facial soft biometrics

The color based facial soft biometric traits (eye, skin, and hair color) are the most obvious facial identifiers and are the first mentioned by humans when portraying unknown individuals. The challenges for skin classification are, on the one hand, the low spread of different skin colors in color space and, on the other hand and as a consequence, the high illumination dependence of the classification. The latter is described in various skin locus papers, for example in [25].

Hair color is detected with techniques similar to those for skin color and is often researched alongside it, but has more broadly scattered color categories. In [60] a method for human head detection based on hair color is proposed, using Gaussian mixture density models to describe the distribution of hair color. In [24] fuzzy theory is used to detect faces in color images, where two fuzzy models describe the skin color and hair color, respectively.

Eye color detection, unlike the other color based facial soft biometrics, is a relatively new research topic. Few publications offer insight (e.g. [14]), probably due to the fact that 90% of humans possess brown eyes. An advantage of eye color detection is that all necessary information is available in the images used for iris pattern analysis; in other words, iris color comes as a free side effect. Work on fusion between iris texture and color can be found in [71], where the authors fuse iris texture and iris color with fingerprint and report a performance improvement with respect to the unimodal systems. In [56] iris color is used to successfully support an iris indexing method.

Beard and moustache detection

The presence of beard and moustache does not appear in the literature as an identification trait, but rather as an obstacle for face recognition, which is why their removal is performed as a preprocessing step. As an example, in [38] an algorithm for removing beards from images of bearded faces is presented, using the concept of structural similarity and coordinate transformations.

Age

Age plays an important role for systems based on face or body that are employed over a long time, and is a challenging and relatively new field. An interesting study on face changes over time can be found in [54], which spans a biometric, forensic, and anthropologic review, and further discusses work on synthesizing images of aged faces. In [67] the authors distinguish children from adults based on the face/iris size ratio. The Viola–Jones face detection technique [64] is used, followed by iterative Canny edge detection and a modified circular Hough transform for measuring the iris, with good results. In [52] the authors observe facial skin regions of Caucasian women and build partial least squares regression models to predict the chronological and the perceived age. They find that the eye area and the skin color uniformity are the main attributes related to perceived age.

Gender

Gender perception and recognition have been researched extensively in social and cognitive psychology in the context of face recognition. From an image processing point of view, the topic likewise offers a myriad of approaches. The latest efforts employ a selection of fused biometric traits to deduce gender information. For example, in [6] gait energy images and facial features are fused and classified by support vector machines. Another approach in [58] proposes a combined gender and expression recognition system that models the face using an Active Appearance Model, followed by feature extraction and finally support vector machines with linear, polynomial and radial basis function kernels for classification. The work in [3] proposes using AdaBoost on several weak classifiers, applied to low resolution grey scale images, with good results. Matta et al. [46] present a novel multimodal gender recognition system, based on facial appearance, head and mouth motion, employing a unified probabilistic framework.

Ethnicity

Ethnicity is an ethically and sociologically hotly debated trait, once again relevant for face recognition. In the context of ethnicity, a uniquely defined classification is a difficult and important task. For the recognition of Asian and non-Asian faces, the machine learning framework in [44] applies linear discriminant analysis (LDA) and multi-scale analysis. A further framework, integrating the LDA analysis of input face images at different scales, further improves the classification performance. In [28] an ethnicity recognition approach is based on the Gabor wavelet transform, combined with retina sampling for the extraction of key facial features. Finally, support vector machines are used for ethnicity classification, providing very good results even under various lighting conditions.

Facial measurements

Facial measurements were found early on to be very distinctive and helpful in the context of facial recognition [51]. Later studies continue to employ facial measurements and apply them to 3D data [7].

Recent work on facial soft biometrics is performed on scars, marks and tattoos by the authors in [42].

2.3.2 Body soft biometrics

Height, gait, body weight and color of clothes concern the body and are the main traits that can be extracted from a distance. The best distinctiveness is provided by gait detection, which is why gait occasionally is referred to as a classical biometric.

Gait

Gait is a complex pattern that involves not only some anthropometric parameters but also behavioral information. It is one of the few traits that can be gathered at a distance. A preliminary experiment on gait analysis is presented in [36], where the author uses lights attached to the joints of the human body to record subjects’ gait models. The author demonstrates how observers can recognize walking people familiar to them just by the light traces. Since the 1970s many other authors have been interested in automatic gait recognition: in [65] a spatio-temporal signature is extracted from the moving silhouette, principal component analysis is then employed to discard irrelevant information, and finally supervised pattern classification techniques are applied in the lower-dimensional eigenspace. This analysis captures both the structural and the behavioral characteristics of gait for recognition. Another interesting work is proposed in [59], where gait is chosen as the primary biometric trait to be coupled with “semantic biometrics”, which seems to be a concept very similar to soft biometrics. The system merges the results of the signature generated by gait with the one generated by the semantic information in order to identify users of the biometric system. A recent approach based on soft biometrics is provided in [49].

Height

For automatic height estimation, foreground and background recognition is necessary, which can be achieved with the diverse silhouette extraction techniques used for gait recognition. Height is a trait employed for human tracking or as an aid for other algorithms, such as gait recognition. Important publications in this context are [11, 33] and [45], where single and multiple calibrated camera systems are used for height estimation, respectively. The estimation is performed by computing height in real world coordinates, which are estimated from the camera images.

Body measures

Work on anthropometric measures was done in [4] and [10] and involves height estimation plus shoulder breadth [4] or height estimation and stride information [10] for building up a multimodal identification system.

Weight

To the best of our knowledge, the only paper on soft biometrics that involves weight is [2], which uses a scale to weigh the users of a fingerprint recognition system. By exploiting weight and body fat measurements the authors reduce the total error rate of the system by 2.4 percentage points. Weight thus represents a novel soft biometric trait that still has to be explored, especially as far as its measurement is concerned.

2.3.3 Accessory soft biometrics

The new soft biometrics definition allows the inclusion of accessories among these traits. Accessories can indeed be related to personal characteristics (such as sight problems in the case of glasses) or personal choices (such as adornment in the case of jewelry).

Eye glasses detection

The forerunners in glasses detection are Jiang et al. [34], who perform classical edge detection on a preprocessed gray level image. Certain face areas are observed and an indicator for glasses is searched for. The most successful identifier region for glasses is found to be the nose part of the glasses, between the eyes. A different approach for glasses extraction is employed in [70], where a face model is established based on the Delaunay triangulation. A 3D method to detect glasses frames is presented in [69], where 3D features are obtained by a trinocular stereo vision system. The best results on glasses detection up to now are achieved on thermal images [26].

2.3.4 Combined soft biometrics

Since soft biometric traits are individually neither distinctive nor permanent, a combination of traits could overcome those limits. In this context, many recent papers deal with the fusion of classical biometry and soft biometry, or exclusively with the fusion of soft biometric traits. An example of the latter is the work in [53], where the authors propose algorithms for gender, body size, height, cadence, and stride using a novel gait analysis tool. In [18] height and appearance are extracted from videos and exploited in a multiple camera video surveillance scenario in order to track the subjects that cross the surveillance network. In [43] a novel approach for recognizing gender, ethnicity and age from facial images is proposed. The approach is a novel combination of Gabor filters, AdaBoost learning and a support vector machine classifier. The experimental results of the system based on this approach are reported to show good performance. A further hybrid classification based on gender and ethnicity is considered in [23] and [22]. The hybrid approach consists of an ensemble of radial basis function networks and inductive decision trees. The authors show robustness and good performance. A different approach for analysis in hybrid soft biometric systems is provided in [59], where semantic information (which corresponds to soft biometric classifiers) is manually extracted from a series of videos. Using analysis of variance, the authors select a pool of traits that are considered the most representative. Those traits are then used together with gait information. The authors demonstrate that the additional information provided by the semantic traits increases the performance of the gait based people recognition system. The authors in [1] go one step further and study the relation between human body measures, which allows, for certain applications, the prediction of missing body measures.
In [62] the authors propose an approach for people search in surveillance data, characterized by three main elements: sensors, body parts, and their attributes. The body parts and attributes are hereby closely related to soft biometrics. In [13] a theoretical analysis of reliability performance of soft biometrics employed for identification is presented. In this work identification errors due to collision are considered. The statistical behavior of soft biometric systems is analyzed in the asymptotic setting of a large number of facial and body feature categories.

3 Domains of application

As already mentioned, most biometric systems in the scientific literature use a single trait for recognition, which is why they are sometimes called unimodal biometric systems. It is well known that these kinds of systems are affected by problems like noisy sensor data, low permanence and/or lack of distinctiveness of the chosen biometric trait, unacceptable error rates, and spoof attacks. Some of the problems associated with unimodal biometric systems can be overcome by the use of multimodal biometric systems that combine the evidence obtained from multiple sources [27]. A multimodal biometric system based on different biometric identifiers like fingerprint, iris, face, and hand-geometry can be expected to be more robust to noise, to address the issue of non-universality, to improve the matching accuracy, and to provide reasonable protection against spoof attacks. However, such a system will require more time for the matching step, thereby causing inconvenience to the users. A possible solution to the problem of designing a reliable and user-friendly biometric system is to use soft biometric signatures to improve the performance of the primary biometric system.

3.1 Fusion with classical biometric traits

A general framework for integrating the information provided by soft biometric signatures with that of a primary biometric system is described in Fig. 1. This implementation serves to increase detection reliability and was suggested in [31].

Fig. 1 Framework of integration of soft biometrics to improve the accuracy of classical biometric systems

The suggested framework can be described as follows. Let \(W=\left\{{w_1, w_2, ..., w_n}\right\}\) be the set of the \(n\) users enrolled in the database, and let \(\textbf{x}\) be the feature vector corresponding to the primary biometric system. Without loss of generality, we can assume that the output of the primary biometric system is of the form \(P(w_i|\textbf{x})\), \(i = 1,2,...,n\), where \(P(w_i|\textbf{x})\) is the probability that the test user is \(w_i\) given the feature vector \(\textbf{x}\). Let \(\textbf{y}=\left\{ y_1,y_2,...,y_m \right\}\) be the soft biometric feature vector. The updated probability of user \(w_i\), given the primary biometric feature vector \(\textbf{x}\) and the soft biometric feature vector \(\textbf{y}\), \(P(w_i|\textbf{x}, \textbf{y})\), can then be calculated using the Bayes rule as:

$$ P(w_i|\textbf{x}, \textbf{y})= \frac{p(\textbf{y}|w_i)P(w_i|\textbf{x}) } { \sum_{j=1}^n p(\textbf{y}|w_{\!j})P(w_{j}|\textbf{x}) } $$
(1)
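As a minimal numerical sketch of Eq. (1), the following code updates the primary posteriors \(P(w_i|\textbf{x})\) with the soft biometric likelihoods \(p(\textbf{y}|w_i)\). The function name `fuse_posteriors` and all numbers are hypothetical and serve only as an illustration.

```python
def fuse_posteriors(primary_posteriors, soft_likelihoods):
    """Apply Eq. (1): P(w_i|x,y) = p(y|w_i) P(w_i|x) / sum_j p(y|w_j) P(w_j|x)."""
    joint = [lik * post for lik, post in zip(soft_likelihoods, primary_posteriors)]
    total = sum(joint)
    if total == 0:
        raise ValueError("soft biometric evidence excludes every enrolled user")
    return [value / total for value in joint]


# Hypothetical example with n = 3 enrolled users.
primary = [0.5, 0.3, 0.2]   # P(w_i | x) from the primary biometric system
soft = [0.1, 0.8, 0.1]      # p(y | w_i) from the soft biometric traits
updated = fuse_posteriors(primary, soft)
print(updated)
```

In this example the soft biometric evidence moves the most probable identity from the first user to the second, although the primary system alone favored the first.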

In the above formulation it is necessary to take into account that not all m soft biometric variables should be weighted equally. In practice, some soft biometric variables may contain more information than others. Therefore, it is necessary to introduce a weighting scheme for soft biometric traits based on an index of distinctiveness and permanence, i.e., traits that have smaller variability and larger distinguishing capability will be given more weight in the computation of the final matching probabilities. In the same way, smaller weights should be assigned to the soft biometric traits than to the primary biometric traits. This differential weighting also has another implicit advantage: even if a soft biometric trait of a user is measured incorrectly (e.g., a male user is identified as a female), there is only a small reduction in the posterior probability and the user is not immediately rejected. In this case, if the primary biometric produces a good match, the user may still be accepted. Only if several soft biometric traits do not match is there a significant reduction in the posterior probability, and the user could possibly be rejected. If the devices that measure the soft biometric traits are reasonably accurate, such a situation has a very low probability of occurrence.
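The effect of differential weighting can be illustrated with a small sketch. The exact weighting formula is not specified at this point, so the code assumes one simple possibility: the soft traits are treated as conditionally independent, and each trait likelihood is raised to a weight between 0 and 1 before the posterior is normalized. The function name `weighted_fusion`, the weights and all probabilities are hypothetical.

```python
import math


def weighted_fusion(primary_posteriors, trait_likelihoods, weights):
    """Fuse primary posteriors with weighted soft biometric likelihoods.

    trait_likelihoods[k][i] is p(y_k | w_i); weights[k] lies in [0, 1].
    Assumed scheme (not taken from the paper): each likelihood is raised
    to its weight. All probabilities must be strictly positive.
    """
    scores = []
    for i, posterior in enumerate(primary_posteriors):
        log_score = math.log(posterior)
        for likelihoods, weight in zip(trait_likelihoods, weights):
            log_score += weight * math.log(likelihoods[i])
        scores.append(math.exp(log_score))
    total = sum(scores)
    return [score / total for score in scores]


# The primary system strongly favors user 1, but the gender of user 1
# has been measured incorrectly, so the gender likelihood favors user 2.
primary = [0.9, 0.1]
gender = [0.05, 0.95]

unweighted = weighted_fusion(primary, [gender], [1.0])
damped = weighted_fusion(primary, [gender], [0.2])
print(unweighted[0], damped[0])
```

With full weight the single measurement error overturns the primary match, whereas with the small weight user 1 remains the most probable identity, in line with the argument above.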

The described approach has been followed, for example, in [31]. In this paper the authors show the benefits of using gender, ethnicity and height information of the user in addition to fingerprint. The use of these soft biometric signatures leads to an improvement of approximately 5% over the primary biometric system.

Similar results were obtained in [2], in which the authors combine body weight and fat measurements with fingerprint. The experimental results show that the total error rate improves from 3.9% to 1.5% when body weight score is fused with fingerprint score.

3.2 Pruning the search

Beyond improving the performance of a classical biometric system, soft biometric signatures have been used in the scientific literature to prune large biometric databases in order to improve the search efficiency of the biometric system, as shown in Fig. 2.

Fig. 2 Framework of integration of soft biometrics to improve the search efficiency of classical biometric systems

In this scenario the soft biometric signature is used as side information to filter the original dataset W and to find a subset of the dataset \(Z=\left\{ w_j \in W | P(w_j|\textbf{y}) \right\}\). The selected dataset \(Z=\left\{ z_1,z_2,...,z_p \right\}\) will contain the p (with \(p \le n\)) elements of W that satisfy the soft biometric information \(\textbf{y}\). The new filtered dataset will then be used in the classical biometric system to find the user identity based on the probability \(P(z_i|\textbf{x})\).

As shown in the figure, filtering refers to limiting the number of entries in a database to be searched, based on characteristics of the interacting user. For example, if the user can somehow be identified as a middle-aged male, the search can be restricted only to subjects enrolled in the database with this profile. This greatly improves the speed or the search efficiency of the biometric system. While filtering reduces the time required for identification, errors in filtering can degrade the recognition performance. This is a delicate aspect to take into account when designing the biometric system.
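The filtering step and its sensitivity to errors can be sketched as follows. The database contents, the trait labels, the matching scores and the function names `prune` and `identify` are hypothetical and serve only as an illustration; exact label agreement is an assumed filtering criterion.

```python
# Hypothetical enrolled users with stored soft biometric labels.
enrolled = {
    "u1": {"gender": "male", "age": "middle-aged"},
    "u2": {"gender": "female", "age": "young"},
    "u3": {"gender": "male", "age": "middle-aged"},
    "u4": {"gender": "male", "age": "young"},
}


def prune(database, soft_signature):
    """Keep the users whose stored traits agree with every observed soft trait."""
    return [
        user
        for user, traits in database.items()
        if all(traits.get(name) == value for name, value in soft_signature.items())
    ]


def identify(candidates, primary_score):
    """Run the (expensive) primary matcher only on the pruned candidates."""
    return max(candidates, key=primary_score, default=None)


# Hypothetical primary matching scores, standing in for P(z_i | x).
scores = {"u1": 0.2, "u2": 0.1, "u3": 0.9, "u4": 0.4}

subset = prune(enrolled, {"gender": "male", "age": "middle-aged"})
best = identify(subset, scores.get)

# A filtering error (age wrongly estimated as "young") removes the true user u3.
wrong_subset = prune(enrolled, {"gender": "male", "age": "young"})
wrong_best = identify(wrong_subset, scores.get)
print(subset, best, wrong_subset, wrong_best)
```

With correct soft labels the primary matcher has to inspect only two of the four entries and returns the true identity; with one wrong label the true user is no longer among the candidates, which is the degradation described above.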

Wayman [66] proposed as an example the use of gender and age for filtering a large biometric database.

The framework introduced in Fig. 2 can also be useful for tuning the parameters of the classical biometric system. Some studies [20, 50] have shown that factors such as age, gender, race, and occupation can affect the performance of a biometric system. For example, a young female Asian mine-worker is seen as the most difficult subject for a fingerprint system. This provides the motivation for tuning the biometric system parameters (dotted line arrow in the figure), such as the threshold on the matching score in a unimodal biometric system or the weights of the different modalities in a multimodal biometric system, to obtain the optimum performance for a particular user or a class of users. Filtering and system parameter tuning require an accurate classification of a user into a particular class or bin (e.g., male or female, blue or brown eyes, Caucasian or Asian or African). This requires a filtering module that can accurately perform this classification.

In this paper we introduce a new application of soft biometry, namely person identification. While the accuracy achievable with soft traits is clearly limited with respect to classical biometric systems, in some scenarios they can provide useful information to identify and recognize people, as explained in the next section.

3.3 Human identification

As mentioned above, distinctiveness is the power of a biometric trait to distinguish different subjects. Given this definition, it is straightforward to understand why distinctiveness is one of the leading characteristics of a biometric identifier. The higher its value, the lower the probability that the system confuses two or more identities.

Intuitively, it is easy to achieve good distinctiveness by using complex natural patterns, like iris and fingerprint. On the other hand, the poor distinctiveness and low permanence of soft biometric traits do not allow their individual use as features for the identification process. Hair color, like body weight or height, is a feature not discriminative enough to unambiguously identify a subject. To extend the comparison, a single minutia, a single ridge, or a small section of the iris would likewise not be representative and discriminative enough for the recognition process. Only by gathering many small features are we able to build a model of the biometric trait that represents the identity of the subject we want to recognise.

Considering this idea and extending the concept of Bag of Features, we introduce in this section the notion of Bag of Soft Biometrics (BoSB) for human identification. A BoSB computes a signature, constructed by an ensemble of extracted soft biometric features.

The concept of Bag of Soft Biometrics is directly inspired by the idea of Bag of Words [35, 68] and Bag of Features [41], developed in the context of text mining and content based image retrieval, respectively. In those cases the “items” of the bag are word occurrences in the first case and features extracted from the image in the second. In our case the components of the bag are the soft biometric signatures we extract from the visual appearance of the subject.

Analogous to the fusion of small feature components in classical biometry, we will proceed to show how increasing the number of soft biometric traits involved in the computation raises the system’s distinctiveness. The identification approach we present is based on a signature composed of soft biometric traits, which can be extracted from images or videos. The general framework of the system is presented in Fig. 3. Here the camera network is connected to the system that extracts the features that compose the signature. These features become part of the Bag of Soft Biometrics, which can be stored in a database for later use. The database is accessible to the matching module, which compares the signature of the person we want to identify with the ones stored in the database. One noteworthy aspect is that the signature is human compliant: as an input we can provide a human description, as well as images/videos from a camera to automatically extract an identification signature (see the bottom part of Fig. 3).
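How distinctiveness rises with the number of traits in the bag can be sketched on a toy population. The subjects, trait names, categories and function names below are hypothetical, and exact signature equality is an assumed matching rule used only for illustration.

```python
# Hypothetical database of Bags of Soft Biometrics.
subjects = {
    "A": {"hair": "brown", "glasses": "yes", "shirt": "red"},
    "B": {"hair": "brown", "glasses": "no", "shirt": "red"},
    "C": {"hair": "brown", "glasses": "no", "shirt": "blue"},
    "D": {"hair": "blond", "glasses": "no", "shirt": "blue"},
}


def bag(subject, traits):
    """Build the signature of a subject restricted to the chosen traits."""
    return tuple(subject[name] for name in traits)


def candidates(query, database, traits):
    """Return the ids whose signature equals the query signature."""
    wanted = bag(query, traits)
    return sorted(uid for uid, stored in database.items() if bag(stored, traits) == wanted)


# The query may come from a camera or from a human description.
query = {"hair": "brown", "glasses": "no", "shirt": "blue"}

one_trait = candidates(query, subjects, ["hair"])
two_traits = candidates(query, subjects, ["hair", "glasses"])
three_traits = candidates(query, subjects, ["hair", "glasses", "shirt"])
print(one_trait, two_traits, three_traits)
```

Each added trait shrinks the candidate set in this example, from three subjects to two and finally to a single one. Because the query is a set of human compliant labels, it can equally be typed in by an operator.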

Fig. 3 The scheme presents the design of an identification system based on soft biometric traits. The bag of soft biometrics can be extracted directly from the images/videos of the subject to identify, or from a description of the physical appearance provided by a human operator

In order to better clarify the possible uses of such a system, two different scenarios are presented hereafter. The first one concerns mainly the use of images or videos to extract the signature. In this case a security agent can exploit the information of the last location of a theft victim to detect the thief in surveillance videos. Once the criminal has been detected at the moment of the theft, his/her Bag of Soft Biometrics can be extracted from the images and used to track him/her over the entire network.

In the second scenario the signature is not extracted from multimedia content but is obtained from a description of the person to be identified. As an example, let us consider a mother who has lost her child in a supermarket. She can provide a description to the security officer, who will enter this information into the soft biometric identification system. The algorithm will then find the current location of the missing child by exploiting the Bag of Soft Biometrics associated with the child's description.

As one can observe, the Bag of Soft Biometrics bridges the gap between the human and the machine, since such a signature is human readable by definition. Additionally, the signature extraction does not need the cooperation of the subject. This makes the Bag of Soft Biometrics suitable for all surveillance applications where such cooperation is impossible or very difficult to achieve, as in the aforementioned scenarios.
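To illustrate how a human description and an automatically extracted signature can share one vocabulary, the following minimal sketch ranks stored signatures against a query. The trait names, the category labels and the agreement-count score are assumptions made for illustration only; they are not the matching rule used in this work.

```python
from typing import Dict, List, Tuple

# A Bag of Soft Biometrics is modelled here as a mapping from trait name to a
# human-readable category. Trait names and labels are hypothetical.
Signature = Dict[str, str]


def match_score(query: Signature, candidate: Signature) -> float:
    """Fraction of the query traits on which the candidate agrees."""
    if not query:
        return 0.0
    agreed = sum(1 for trait, value in query.items() if candidate.get(trait) == value)
    return agreed / len(query)


def rank_candidates(query: Signature, database: Dict[str, Signature]) -> List[Tuple[str, float]]:
    """Return (subject id, score) pairs sorted from best to worst match."""
    scored = [(sid, match_score(query, sig)) for sid, sig in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


database = {
    "subject_a": {"hair": "brown", "glasses": "yes", "torso": "red", "legs": "blue"},
    "subject_b": {"hair": "blond", "glasses": "no", "torso": "white", "legs": "black"},
}
# A description given by a human operator uses the same vocabulary and may
# cover only some of the traits.
description = {"hair": "blond", "torso": "white"}
ranking = rank_candidates(description, database)
```

Because the query may contain only a subset of the traits, the same function serves both scenarios: a full signature extracted from video and a partial verbal description.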

In the following sections we will introduce the soft biometric traits for our Bag of Soft Biometrics which include both facial and body related characteristics. We will see that the use of multiple soft biometrics can guarantee enough distinctiveness to differentiate individuals and then to identify them.

4 Set of facial soft biometrics

As elaborated in the previous section higher and more satisfactory distinctiveness can be achieved by using more than one trait, rather than a single trait. Thus we here propose a set of facial soft biometrics that later in Section 6 will be exploited for human identification. In an effort to find a good balance between authentication-reliability and complexity, we here propose a soft-biometric system that focuses on simple and robust detection from a bounded set of traits and their trait-instances. In what follows, we will describe these basic elements, as well as the employed detection algorithms.

In the presented set of facial soft biometric traits, we allocate 6 traits, which we choose and label as shown in Table 2.

Table 2 Table of facial soft biometric traits

We proceed now to specify basic aspects of the detection algorithms that were used for trait-instance identification.

4.1 Detection algorithms

The basic detector consisted of an automatic frontal face and facial features detector, which was partially drawn and modified from the algorithms in [64]. Implementation of the different detection algorithms (see Table 2 for an overview) was performed using OpenCV (Footnote 1).

Before describing some basic aspects of the implemented trait detection algorithms, we note a few pertinent issues that accompany detection. Regarding coordinate determination, we note that typical eye, skin and hair color detectors require knowledge of the eye coordinates, and similarly hair color detection requires knowledge of the coordinates for the upper head region. The precise computation and extraction of the characteristic regions of interest (ROI) (see Fig. 4) for the eyes, mouth, nose and upper face coordinates are essential for the subsequent detection. For higher accuracy, all coordinates were manually annotated in the training step only. The considered ROIs for the selected soft biometric traits are illustrated in Fig. 4. Identification of the ROI was generally followed by acquisition of the Hue, Saturation and Value (HSV) values. We note that the HSV color space was chosen for being robust to illumination changes, as well as for the fact that it allows for a high degree of independence between the H, S, and V parameters, which renders the system better able to handle light changes or shadows. Regarding outlier filtering, we used a simple threshold on the HSV values, based on the color standard deviation σ. This was followed by HSV normalization. Regarding the statistical modelling, the probability density functions of skin, eye and hair color were computed using 3-component Gaussian mixture models whose parameters were estimated using the EM algorithm. Posterior probabilities over the observed HSV vectors for all trained trait instances were computed, followed by a majority vote decision on the detected trait instance.
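The outlier filtering and the final majority vote can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold factor `t` is hypothetical, hue is treated as a linear quantity (its circular nature is ignored), and the per-pixel labels are assumed to come from the Gaussian mixture posteriors, which are not reproduced here.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (H, S, V)


def filter_outliers(pixels: Sequence[Pixel], t: float = 2.0) -> List[Pixel]:
    """Keep pixels whose H, S and V all lie within t standard deviations of the mean.

    The factor t is a hypothetical choice, not a value taken from this work.
    """
    channels = list(zip(*pixels))
    mu = [mean(ch) for ch in channels]
    sigma = [pstdev(ch) for ch in channels]
    return [
        p for p in pixels
        if all(abs(v - m) <= t * s for v, m, s in zip(p, mu, sigma))
    ]


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent per-pixel label as the detected trait instance."""
    return Counter(labels).most_common(1)[0][0]
```

In this sketch a ROI dominated by one color keeps its consistent pixels, while isolated reflections or shadow pixels far from the mean are discarded before classification.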

  1. Eye color detection: In this setting, careful and precise consideration of the ROI was particularly important, due to the region’s inherently small size. The specific ROIs were retrieved using the circular Hough transformation, followed by pupil and reflection extraction, and then by acquisition of the HSV vectors. Regarding the training step, each eye color group was trained using images from the Ubiris2 (Footnote 2) database.

  2. Hair color detection: The hair color ROI was chosen as a thin bar in the upper head region, as indicated in Fig. 4. Training utilized 30 Feret (Footnote 3) images for each of the hair colors, where the annotation was done manually.

  3. Skin color: Detection of skin color was done according to the eye coordinates, which defined the ROI for the skin color detection to be the area underneath the ocular region. Training utilized 33 Feret images per skin color group, which were again annotated manually.

  4. Eye glasses detection: For glasses detection, we considered that the areas around the eyes can be searched both for hints of glasses and for glass reflections. Challenges arose from the fact that glasses frames are occasionally absent, or that they often resemble wrinkles, brows, shades and hair. A further challenge came from the fact that illumination variances hindered the appearance of reflections. These challenges were handled by placing emphasis on a ROI corresponding to the nose part of the glasses. The specific algorithm consisted of eye position detection, grey-level conversion, histogram equalization, extraction of the region between the eyes, Laplacian edge detection and finally line detection.

  5. Beard and moustache detection: In this case, face detection and feature localization were followed by identification of the ROIs. These ROIs include the chin for the beard, and the area between the mouth and nose for the moustache. The color estimation was followed by outlier extraction and HSV normalization. The presence of beard and/or moustache was based on the Euclidean distance between the processed observation and skin and hair-color information respectively. The presence of moustache was determined independently.
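The distance-based decision for beard and moustache can be sketched as below. The exact decision rule is not spelled out above, so the comparison used here (the ROI counts as hair-covered when its color is closer to the hair color than to the skin color) is an assumption, and the function name is hypothetical.

```python
import math
from typing import Sequence


def has_facial_hair(roi_color: Sequence[float],
                    skin_color: Sequence[float],
                    hair_color: Sequence[float]) -> bool:
    """True if the normalized ROI color is closer to the hair color than to the skin color.

    All three arguments are assumed to be normalized (H, S, V) vectors.
    """
    to_hair = math.dist(roi_color, hair_color)
    to_skin = math.dist(roi_color, skin_color)
    return to_hair < to_skin
```

The same function would be called once with the chin ROI and once with the ROI between mouth and nose, so that beard and moustache are decided independently.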

Fig. 4 ROI for the set of facial soft biometrics. Outlier filtering was a function of the standard deviation σ and the mean μ for each of the H, S and V parameters

Algorithmic dependencies

As is the case with general optimization problems, identifying algorithmic dependencies endows the system with increased reliability and computational efficiency. A notable example is the dependency between skin color and glasses where, due to ROI overlap, the presence of glasses has an impact on the perceived skin color. This information can be exploited by modifying the ROI for skin color detection. Additionally, we recall that skin color is employed in the detection of hair, beard and moustache, where the latter two traits are also contingent upon hair color. Figure 5 sketches further dependencies of the mentioned facial soft biometric traits. Some of these dependencies were partly exploited in the process of detection.

Fig. 5 Facial soft biometric traits algorithmic dependencies

4.2 Experimental results

The algorithms introduced above for detecting the chosen facial soft biometric traits are examined and evaluated here. Note that the traits glasses, beard and moustache are binary, whereas the color based facial traits take discrete trait instances.

Glasses

Tests for eye glasses detection were performed on a testing set of images from the Feret (Footnote 4) database. The threshold based algorithm provided a correct detection rate (comprising the true positive and true negative rates) of 87.17% (see Table 3), comparable to the results in [34].

Table 3 Glasses, beard, and moustache detection results. The experiments are conducted on the well known Feret database

Color based facial soft biometric traits: eye, skin and hair color

In the context of the color based facial soft biometrics it is to be noted that the number of established classification groups was adjusted to both the performance and the limitations of human perception and estimation capabilities. Results are presented as true positive rates and as confusion matrices in Fig. 6. For the latter the values range from white (no confusion) to black (maximum confusion). The diagonal fields correspond to the true positive rates. Eye color detection was evaluated on a testing set containing 5 eye color groups, namely black, brown, blue, gray and green. The images were retrieved from the Ubiris2 (Footnote 5) database and results are presented in Table 4 and in Fig. 6a. We here briefly note the peak confusion rate between blue and gray eye color, which is mostly responsible for the overall drop in the true positive rate. Hair color is classified in 5 groups: black, brown, red, blond and grey. A testing set of Feret images provided the results presented in Table 4 and Fig. 6b. Skin color exhibits low variation in color spaces, and thus slight illumination changes result in wrong classifications. Due to this challenge a limit of 3 skin color groups was adopted, with related results presented in Table 4 and Fig. 6c. The confusions were mostly due to illumination variances and detected shadows, which result in a shift in the skin color shades.

Fig. 6 Confusion matrices: (a) eye color, (b) hair color and (c) skin color

Table 4 Eye, skin and hair color true positive rates

Beard and moustache detection

Once more a set of Feret images was employed for the validation of beard and moustache detection. The binary character of the traits (present or not present) is ambiguous in real images, due to the various lengths and shapes of beard and moustache. This factor made a unique annotation, and in turn the estimation, difficult and led to the results shown in Table 3. A small fraction of the wrong detections is due to a mismatch between hair color and beard/moustache color, whose correspondence we assumed in the detection algorithm.

This section presented six facial soft biometric traits, which will form part of a Bag of Soft Biometrics in the following (see Section 6). The bag will furthermore include the novel soft biometric traits introduced in the next section, and will constitute a tool for human identification.

5 Novel soft biometric traits

As already explained in the introduction, a contribution of this work is the description of two new soft biometric traits: weight and clothes color.

Although weight was marginally introduced as a possible soft biometric trait in [31] and has already been exploited in [2], to the best of our knowledge no other work relating weight to soft biometrics exists. As already mentioned in Section 2, the main limitation for its use has probably been the need for specialised sensors and the required cooperation of the user, limitations that contrast with the general use of soft biometrics. In [2] the authors use a scale to weigh clients of a fingerprint system. The identification score given by the fingerprint matching is coupled with that of the weight, and it has been shown that the latter improves performance in terms of a smaller EER. Although the contribution of weight as a soft biometric trait has been shown to be positive, alternative ways of measuring it have never been explored; that is to say, we are bound to the scale as the only tool for weighing people. This limitation makes the use of this trait at a distance almost impossible, impairing many applications that could exploit it. In Section 5.1 we present a preliminary analysis to overcome this limitation and to automatically estimate body weight at a distance. To the best of our knowledge this is the first attempt in this direction.

The second trait we introduce in this section is clothes color. It has never been considered as a soft biometric trait, since it is not a feature belonging to the human itself and thus cannot be considered bio-inspired. Moreover, its variability over time is high, even within the same session (a session could be the time during which a given person appears in the fields of view of a camera network). However, clothes color is probably, together with weight and height, the main signature that the human visual system perceives about a person's appearance. For this reason this trait can be used to describe and identify human beings, especially in complex scenarios like video surveillance in which it is more difficult to access the face signatures, as introduced in the previous section. In this regard it is clear that clothes color could belong to the list of human compliant soft biometrics. In Section 5.2 we will introduce a new color descriptor designed for this aim and we will analyze its performance for human identification.

5.1 Weight estimation

In order to study the feasibility of weight estimation, a model has to be found that could perform the assessment from anthropometric measurements.

Anthropometric and forensic studies have long investigated the relations among body parts and human body traits. A clear example is the relation between bone length and height found by [17, 19]. The basic assumption is that a relation can be inferred between height and other anthropometric measurements which are highly correlated with the stature.

Our hypothesis is derived directly from this idea: given a set of measures comprehensive of the whole body (upper and lower part), it is possible to infer a correspondence between such measures and the weight. Following our assumptions we defined a set of anthropometric features related to each part of the body (Fig. 7); altogether they describe the full body (height), the torso (waist), and the limbs (arms and legs measures).

Fig. 7 Measures taken into account in weight estimation

Our purpose is to understand the underlying relation between the weight and each of these features, or each of their possible combinations. We propose to create a model that, starting from these measurement values, can provide a good estimate of a subject's weight.

To the best of our knowledge no work exists exploring the relation among these body measurements and weight. Since a determined relation could not be found, we explored the assumption of a linear dependency. In order to generate our model, we based our approach on multiple linear regression analysis, as this kind of analysis provides a powerful tool for problem fitting.

The equation will then be of the form \(y = \beta_{1} x_{1} + \cdots + \beta_{p} x_{p}\), where \(y\) represents the estimate, \(x_{i}\) the measures of the features, and \(\beta_{i}\) the corresponding coefficients. To find the solution, the ordinary least squares method was adopted, which minimizes the sum of squared residuals and provides the optimal \(\beta\) coefficients, i.e. those that minimize the estimation error.
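As a minimal sketch of the ordinary least squares fit, the coefficients can be obtained by solving the normal equations \((X^{T}X)\beta = X^{T}y\). The function names below are hypothetical, the solver is a plain Gaussian elimination written for self-containment, and singular systems (e.g. perfectly collinear measures) are not handled.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return the coefficients that minimize the sum of squared residuals.

    Each row holds the feature values of one subject. A constant column of
    ones can be included in the rows to model an intercept term.
    """
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)
```

For example, fitting the rows `[1, x]` for x = 1, 2, 3, 4 against y = 3, 5, 7, 9 recovers an intercept of 1 and a slope of 2.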

The rest of this section will be devoted to the introduction of the dataset used in the experiments; subsequently the results of our weight estimation system will be presented.

5.1.1 Experimental results

As already introduced in [8, 29, 47], weight is considered a significant body trait by the medical community in many applications (anesthesia, quality of life index, nutritional situation); for this reason many medical databases include weight information. Together with height it is used to compute the Body Mass Index (BMI) and to identify possible weight disorders. NHANES [9] is one of these datasets. It is unique because of its characteristics: the size of the population (more than 28,000 people) and the time span of the analysis (from 1999 to 2005). The purpose of this database was to monitor the American population and to assess health and nutritional conditions. Data was collected by trained personnel, which guarantees the same measurement conditions and reasonably excludes the possibility of errors.

To test our weight estimation framework we conducted two different experiments. The first one is related to ideal conditions (i.e. the raw data coming from NHANES dataset), the second one utilises anthropometric measurements estimated directly from the images of a standard resolution camera commonly employed in video surveillance (for this a self-made dataset was recorded).

The first part of our work is the model creation, for which we used the raw data of the NHANES dataset. We consider the use of such data as ideal conditions since the database is not expected to have high measurements error (the data was gathered by trained personnel).

In particular, an analysis was conducted in which the fitting was performed while varying the number of involved features from 1 to 7, so that all possible combinations of features were explored (i.e. \(2^{7} - 1 = 127\)). In a scenario that involves weight estimation at a distance, not all the measures we defined may be available. For this reason, having an estimator for each combination of traits always allows the weight of the subject to be assessed within a given range of accuracy.
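The count of 127 corresponds to all non-empty subsets of the seven measures. A short sketch of the enumeration, with hypothetical feature names `f1` to `f7`:

```python
from itertools import combinations

features = [f"f{i}" for i in range(1, 8)]  # f1 ... f7

# All non-empty subsets, generated by increasing size from 1 to 7.
subsets = [
    combo
    for size in range(1, len(features) + 1)
    for combo in combinations(features, size)
]
```

One estimator would be fitted per entry of `subsets`, so that at run time the estimator matching the set of available measures can be selected.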

To perform the experiments we divided the database into a training and a testing set, containing respectively 70% and 30% of the available data. Training means finding the β coefficients by minimizing the sum of squared residuals. Hereafter we report as an example the estimator that considers all the features at once:

$$\begin{array}{rll}\label{eq:weigthBest} \mbox{weight} &=&-122.27 + 0.48 f_{1} - 0.17 f_{2} + 0.52 f_{3}\\ &&+\, 0.16 f_{4} + 0.77 f_{5} +0.49 f_{6} + 0.58 f_{7}, \end{array}$$
(2)

where \(f_{1}, \ldots, f_{7}\) refer to the list of measurements already described in Fig. 7. All 127 combinations of parameters were evaluated, and the best result (shown in Table 5) was obtained by the combination that involves all features (Eq. 2). This combination provides a good estimate of the subjects’ weight (considered by the medical community to be within ±10%) for 93% of the testing set.
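As a sketch of how the estimator of Eq. 2 would be applied, the following function evaluates the linear model and checks the ±10% criterion. The coefficients are copied from Eq. 2; the function names are hypothetical, and the measurements are assumed to be expressed in the same units as the data the model was fitted on.

```python
from typing import Sequence

INTERCEPT = -122.27
COEFFICIENTS = (0.48, -0.17, 0.52, 0.16, 0.77, 0.49, 0.58)  # f1 ... f7, from Eq. 2


def estimate_weight(f: Sequence[float]) -> float:
    """Evaluate the linear estimator of Eq. 2 on the seven measurements f1..f7."""
    if len(f) != len(COEFFICIENTS):
        raise ValueError("expected seven measurements f1..f7")
    return INTERCEPT + sum(c * v for c, v in zip(COEFFICIENTS, f))


def within_tolerance(estimate: float, true_weight: float, tol: float = 0.10) -> bool:
    """True if the estimate lies within +/- tol (10% by default) of the true weight."""
    return abs(estimate - true_weight) <= tol * true_weight
```

Under this criterion, an estimate of 72 for a true weight of 75 counts as good, whereas an estimate of 60 does not.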

Table 5 Performance comparison between our estimation system and the human visual estimation in [47]

The result of our analysis is hereafter compared with experimental results from human estimates. To do so, we report the results of an experiment conducted at the Western Hospital of Melbourne and documented in [47]. The subjects of the evaluation were 1,137 patients. Three human estimations were collected during this test: a first one made by the patients themselves, a second one by the nurses, and a last one by the physicians. The estimation was performed visually, which makes this experiment comparable with the one we propose. For comparison we report in Table 5 the results of the study. Our system’s results clearly outperform the human estimations of nurses and physicians and furthermore are comparable with the patients’ estimations, which we assume to be closest to reality. Thus our approach and the related results can be considered promising.

In order to measure the estimation performance of our system in a real case scenario a second experiment was conducted on real images. To the best of our knowledge, databases presenting weight as ground truth and fitting our needs are not available to the community. For this reason we created a test set of images and ground truths (using a scale) of 20 persons (15 male and 5 female), captured by a video surveillance camera to recreate a possible application scenario. The images were taken at a fixed distance from the camera. A total of 40 (profile/frontal) pictures are available in the database. Example images are shown in Fig. 8.

Fig. 8 An example of the two poses recorded during our database collection: a frontal and a profile pose. Note how the clothes of the second subject hide the real shape of his body

Since important information about the 3D shape of the body is not available, and the body part measures are of paramount importance in our study, an estimate of the anthropometric measurements was performed. As for the circumferences involved in the computation, we considered the width of the body part (upper arm, leg, waist, and calf) as the diameter of the cylinder that approximates that particular body part. This is obviously a crude approximation of the real measure and introduces errors in the process; nevertheless, the results of our estimation look promising. Several techniques for body part detection are present in the literature, such as [48], and each of them could serve the purpose of this technique. However, since this is a preliminary analysis, the markers used for length information were tagged manually on the images.

Our experiment can be divided in the following steps:

  1. Height estimation: we trivially compared the height of the subject with an object of known height within the scene.

  2. Approximation of the other measures: to exploit the estimated height, a proportional coefficient was used to estimate each measure as a function of the height.

  3. Weight estimation: application of our model for weight estimation.
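The first two steps, together with the cylinder approximation used for the circumferences, can be sketched as follows. This is an illustration under stated assumptions: the subject and the reference object are assumed to be at the same distance from the camera, and the function names and the `ratio` parameter are hypothetical (the proportional coefficients actually used are not reproduced here).

```python
import math


def estimate_height(subject_px: float, reference_px: float, reference_height: float) -> float:
    """Scale the subject's height in pixels by a reference object of known height."""
    return subject_px / reference_px * reference_height


def measure_from_height(height: float, ratio: float) -> float:
    """Estimate a body measure as a fixed proportion of the height.

    The ratio is a hypothetical per-measure coefficient.
    """
    return ratio * height


def circumference_from_width(width: float) -> float:
    """Approximate a body part as a cylinder whose diameter is the observed width."""
    return math.pi * width
```

The resulting measures would then be passed to the weight model in the third step.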

The dataset we are considering is composed of images of dressed persons, for which (in contrast with the NHANES dataset) we can expect errors in the approximation of the measurements. Because of this we do not expect the “best model” of the previous analysis to remain the best. To assess this hypothesis all 127 combinations were tested once again. The experimental results confirmed our hypothesis. Among all the estimators tested, the one that reports the best results does not take into account the calf information (\(f_{3}\)), since the trousers of the subjects are commonly large enough to make the body shape underneath the fabric hard to estimate; this negatively affects the contribution of the calf measure.

A summary of our results can be found in Table 6. The average estimate error for this experiment is 4.3%, which confirms that our system is able to estimate the weight from visual clues with an approximation of ±5% of error w.r.t. the real weight of the subject.

Table 6 The estimation results obtained from our database. The error is shown in percentage w.r.t. the real weight of the person

A further important consideration must be made. People usually refer to the build of someone’s body by saying it is normoweight or, respectively, under/overweight. The weight measure alone is not enough to establish such a correspondence, as another important piece of information is missing. Indeed, we need to know the relation between height and weight to understand how weight is distributed over the whole body. Leveraging again ideas from the medical community, we found it opportune to use the Body Mass Index (BMI), which was introduced by [39] as \(\text{BMI}=\frac{W}{H^{2}}\) and is still used for assigning people to the aforementioned classes (for the exact values refer to Table 7). For this reason the BMI is the component that will describe the corresponding body characteristic in our Bag of Soft Biometrics.

Table 7 The values that define the BMI classes
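A minimal sketch of the BMI computation and its mapping to a human compliant class follows. The weight is assumed to be in kilograms and the height in metres. The default thresholds (18.5 and 25) are the commonly cited cut-offs and are an assumption here; the values of Table 7 should be substituted where they differ.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index, BMI = W / H^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_class(value: float, under: float = 18.5, over: float = 25.0) -> str:
    """Map a BMI value to a class label.

    The thresholds are assumed defaults, not values taken from Table 7.
    """
    if value < under:
        return "underweight"
    if value < over:
        return "normoweight"
    return "overweight"
```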

5.2 Clothes color

In this section we introduce the idea of using color, and specifically the clothes color, as a soft biometric information which could be used to identify and recognize people at a distance. Subjects walking across the FOVs of a camera network, indeed, can be identified using the color of the clothes they wear. Generally, in a subject appearance, we can distinguish two main colors (or set of colors) describing the upper and lower parts of the body. Thus the proposed soft biometrics information consists of 2 traits, the torso and the legs color.

In this regard we need to define a color descriptor that resembles the way the human visual system describes and perceives color. In the following section we will introduce a new approach to building a robust color descriptor that is invariant to illumination and shadows.

5.2.1 Probabilistic color histogram

Color perception and understanding is an extremely complicated and nonlinear science, and defining a robust color descriptor is not an easy task. This is particularly true in the case of video-surveillance systems, in which we must account for spatially distributed cameras operating under different lighting conditions and with varying color sensitivity. In order to achieve robustness in such a complex scenario, the first step of the proposed approach consists of performing a color quantization based on eleven colors: black, white, red, yellow, green, blue, brown, purple, pink, orange, and grey. These colors are usually referred to as culture colors [5] and describe the set of color terms that can be considered as universal constants among languages and cultures. Culture colors represent the human way of perceiving and describing colors, thus they seem to be particularly suitable for describing human compliant soft biometrics. One might argue that a finer quantization may better discern different objects. Berlin et al. [5] showed that finer quantization leads to less reliable color prediction, and can be counter-productive in improving prediction accuracy.

Color quantization in the eleven culture colors is performed in the Lab color space using a fuzzy k-nearest neighbor (KNN) clustering algorithm [40]. In fuzzy clustering, data elements can belong to more than one cluster, and associated with each element n is a membership vector \(\textbf{u}(n)=\left\{u_1(n), u_2(n), ..., u_C(n)\right\}\) describing the strength of the association between that element and all the possible C clusters. The possible clusters in our case are the eleven previously defined culture colors, thus C = 11.

The KNN algorithm is trained on a dataset of samples describing the eleven culture colors in the Lab color space. The design of the training set is crucial in the proposed approach. To this aim, we follow the methodology described by D’Angelo et al. [15], which consists of collecting samples describing the culture colors under various lighting conditions and with different camera sensors. In order to obtain such a diversified dataset of colors, the authors collected pixel samples from video clips of sports teams whose uniform colors correspond to the culture colors. The video clips of the selected teams were randomly chosen from the web. This procedure yields a large number of samples in real illumination conditions and thus a quantization process that is as robust as possible to variations in illumination. Following the described approach, we collected about 1,200 samples that we used as the training set for the discussed classifier.

Based on the previous considerations, the fuzzy KNN classifier is applied to each pixel of the Lab version of the selected segments (the human torso and legs) using the training set designed as above. The classifier assigns to each pixel a label corresponding to one of the classes describing the culture colors, as we can observe in the example shown in Fig. 9c, where it is applied to the torso of a subject extracted from the CAVIAR (Footnote 6) database.
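The per-pixel membership computation can be sketched as follows. This is a minimal illustration of a fuzzy k-nearest-neighbor rule with crisp training labels and inverse-distance weighting; the values of `k` and of the fuzzifier `m`, the function name, and the four approximate Lab training colors are assumptions for illustration and not the configuration or training set used in this work.

```python
import math
from typing import Dict, Sequence, Tuple

Lab = Tuple[float, float, float]
Sample = Tuple[Lab, str]  # (Lab color, culture color label)


def fuzzy_knn_memberships(pixel: Lab,
                          training: Sequence[Sample],
                          classes: Sequence[str],
                          k: int = 3,
                          m: float = 2.0) -> Dict[str, float]:
    """Return the membership of a pixel in each class; memberships sum to 1.

    Each of the k nearest training samples votes for its own class with a
    weight of 1 / d^(2 / (m - 1)), where d is the Euclidean distance in Lab.
    """
    nearest = sorted(training, key=lambda s: math.dist(pixel, s[0]))[:k]
    exponent = 2.0 / (m - 1.0)
    # A small floor on the distance avoids division by zero for exact matches.
    weights = [1.0 / max(math.dist(pixel, color), 1e-9) ** exponent
               for color, _ in nearest]
    total = sum(weights)
    memberships = {c: 0.0 for c in classes}
    for (_, label), w in zip(nearest, weights):
        memberships[label] += w / total
    return memberships


# Tiny illustrative training set with approximate Lab values.
classes = ["white", "black", "red", "blue"]
training = [
    ((100.0, 0.0, 0.0), "white"),
    ((0.0, 0.0, 0.0), "black"),
    ((53.0, 80.0, 67.0), "red"),
    ((32.0, 79.0, -108.0), "blue"),
]
u = fuzzy_knn_memberships((95.0, 0.0, 0.0), training, classes)
```

In this example the pixel is a slightly darkened white, so almost all of its membership falls on the white class while a small residual is shared among the other neighbors.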

Fig. 9 Example of fuzzy color quantization: (a) bounding box of the detected people; (b) extracted torso; (c) quantized torso through fuzzy KNN classifier; (d) corresponding probabilistic color histogram

As already explained, the advantage of using a soft classifier (like the fuzzy classifier) with respect to the hard classifier, is that to each pixel n a membership vector \(\textbf{u}(n)\) is associated, describing the probabilities of the membership of the pixel to the 11 possible classes. This information is exploited in the definition of the new color descriptor.

A color descriptor \(\textbf{H}(X)=\{ H_1(X), H_2(X),.., H_{11}(X) \}\) is assigned to each quantized segment X, based on the definition of a probabilistic color histogram (PCH) described as follows:

$$\label{eq:eq_pch3} H_c(X)= \frac{1}{M} \sum\limits_{n=1}^M u_c(X_n) \ \ \ \ c=1,...,11 $$
(3)

and representing, for each class c, the average over the M pixels of the segment of the probability that a pixel belongs to that class. The PCH corresponding to the quantized torso in Fig. 9c is shown in Fig. 9d. As expected, the dominant color is white.
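Equation 3 amounts to averaging the membership vectors of all pixels in the segment. A minimal sketch, with a hypothetical function name and membership vectors given as plain sequences:

```python
from typing import List, Sequence


def probabilistic_color_histogram(memberships: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-pixel membership vectors u(X_n) over the M pixels (Eq. 3)."""
    m = len(memberships)
    if m == 0:
        raise ValueError("segment has no pixels")
    n_classes = len(memberships[0])
    return [sum(u[c] for u in memberships) / m for c in range(n_classes)]
```

Since every membership vector sums to one, the resulting histogram also sums to one and can be read as a distribution over the color classes.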

Since the dominant color is usually not enough to provide a description of the clothes color, we can sort the PCH in descending order and consider the set of the most probable colors as the feature vector describing the clothes color. The idea of using the most probable colors to describe the clothes is well suited to the proposed approach thanks to the probabilistic description provided by the fuzzy classifier. Moreover, it is closer to real scenarios, in which clothes with several colors are more likely than monochromatic ones.

Let \(\textbf{Hs}(X)=\{ Hs_1(X), Hs_2(X),.., Hs_{11}(X) \}\) be the PCH \(\textbf{H}(X)\) sorted in descending order. In the proposed approach, each segment will be described by the set of the N most probable colors \(\{ Hs_1(X), Hs_2(X),.., Hs_N(X) \}\), with \(N \leq 11\).

N can be defined through the kth percentile of the sorted distribution \(\textbf{Hs}(X)\) describing the color of the selected segment: N is the minimum number of color bins whose cumulative probability reaches the threshold k. It follows that:

$$\label{eq:eq_pch} N=\min\left\{ y \in \left\{1,2,...,11\right\} \mid \sum\limits_{i=1}^y Hs_i(X) \geq k\right\} $$
(4)

The number of color bins necessary to describe the selected segment clearly depends on the value of the threshold k. The higher the value of k, the higher N will be, with a consequent increase of both the true positive rate (tpr) and the false positive rate (fpr). The choice of k should be a good trade-off between these aspects.
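The selection of the N most probable colors for a given threshold k can be sketched as follows. The function name and the example histogram are hypothetical; the histogram is given as a mapping from color name to probability.

```python
from typing import Dict, List


def select_dominant_colors(histogram: Dict[str, float], k: float) -> List[str]:
    """Return the fewest most-probable colors whose cumulative probability reaches k."""
    ranked = sorted(histogram.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for color, prob in ranked:
        selected.append(color)
        cumulative += prob
        if cumulative >= k:
            break
    return selected


example = {"white": 0.60, "grey": 0.25, "blue": 0.10, "black": 0.05}
```

With this example histogram, a threshold of 0.5 keeps only white, whereas a threshold of 0.8 keeps white and grey; raising k therefore enlarges the descriptor, which is the trade-off discussed above.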

5.2.2 Experimental results

In this section we summarize the experimental results that validate the described approach. The goal is to show that the proposed color descriptor is able to predict the way the human being perceives and defines colors, so that it can be used as a soft biometric signature towards human identification.

The first requirement for testing the efficiency of the proposed descriptor is to select a statistically meaningful dataset, big enough to cover as many different clothes colors as possible, analyzed in real illumination conditions. To the best of our knowledge, the most challenging dataset is VIPeR (Viewpoint Invariant Pedestrian Recognition) [21], introduced in the research field of pedestrian tracking, which can be considered a subset of the people tracking topic. The authors have collected two views of 632 individuals seen from widely differing viewpoints. The presented method is evaluated using their public dataset (Footnote 7). Some examples of images belonging to this dataset can be found in Fig. 10.

Fig. 10 Some examples from the VIPeR dataset

We applied the proposed framework on both the upper (torso) and lower (legs) parts of the bodies, which were automatically extracted from the images, and the sets of detected colors were compared to the manually annotated dataset (the ground truth). For each segment, if one of the N colors \(\{ Hs_1(X), Hs_2(X),.., Hs_N(X) \}\) matches the ground truth, the algorithm is considered successful in identifying the color of the selected segment. We tested the proposed approach for different values of the threshold k and the obtained results are shown in Fig. 11.

Fig. 11 True positive (a) and false positive (b) rate of the proposed color descriptor

As expected, increasing the value of the threshold increases both the true positive and the false positive rate at almost the same pace. We can observe that the color descriptor of the torso is more efficient than the one associated with the legs of the subjects. This result is due to the inaccurate segmentation of the bounding boxes: both segments were automatically extracted at a fixed position in the images. While the size of the bounding boxes is constant in the dataset, the position of torsos and legs is not, owing to variations of the viewpoint across images. As a consequence, most of the segments contain many background pixels, which add undesirable variability to the evaluation of the PCHs. Moreover, extracting the legs is more challenging because of the subjects' movement. This issue can be overcome in a real application, where the segments are extracted directly from the human silhouettes provided by people tracking algorithms, with a consequent improvement in accuracy.

If we use both the torso and the legs clothes colors to describe subjects, the tpr drastically decreases (the green line in Fig. 11) with respect to using only one trait, since both predicted colors must match the ground truth. In the same way, the fpr decreases, reaching a maximum value of around 0.35% for k = 1.
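The matching rule used in this evaluation can be sketched as follows. This is a minimal illustration rather than the evaluation code used for the experiments: the function names, the dictionary keys and the four annotated samples are hypothetical, and the samples are made up rather than taken from VIPeR. A segment counts as a hit when any of its predicted colors equals the annotated color, and the joint rate requires a hit on both torso and legs.

```python
def segment_hit(predicted_colors, true_color):
    """A segment is a hit if any of its N predicted colors matches the annotation."""
    return true_color in predicted_colors


def hit_rates(samples):
    """Return (torso rate, legs rate, joint rate) over a list of annotated samples."""
    n = len(samples)
    torso_hits = [segment_hit(s["torso_pred"], s["torso_true"]) for s in samples]
    legs_hits = [segment_hit(s["legs_pred"], s["legs_true"]) for s in samples]
    # The joint description is correct only if both segments are hits.
    joint_hits = [t and l for t, l in zip(torso_hits, legs_hits)]
    return sum(torso_hits) / n, sum(legs_hits) / n, sum(joint_hits) / n


# Made-up annotations for illustration only (hypothetical data).
samples = [
    {"torso_pred": {"red", "orange"}, "torso_true": "red",
     "legs_pred": {"blue"}, "legs_true": "blue"},
    {"torso_pred": {"black", "grey"}, "torso_true": "grey",
     "legs_pred": {"black"}, "legs_true": "blue"},
    {"torso_pred": {"white"}, "torso_true": "yellow",
     "legs_pred": {"brown", "black"}, "legs_true": "brown"},
    {"torso_pred": {"green"}, "torso_true": "green",
     "legs_pred": {"grey", "black"}, "legs_true": "black"},
]

torso_rate, legs_rate, joint_rate = hit_rates(samples)
print(torso_rate, legs_rate, joint_rate)
```

Because the joint hit is a logical AND of the two single-segment hits, the joint rate can never exceed either single-trait rate, which is consistent with the drop observed when both traits are used.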

Table 8 reports the true positive rate, the false positive rate and the average number of color bins N for different values of the threshold, taken from the plot in Fig. 11. Beyond the considerations already made, it is interesting to note that the system reaches quite good tpr and fpr values with a limited number of color bins, e.g. the correct detection rate for the torso is around 85% with only 2.6 bins on average. That also means that the proposed fuzzy classifier is able to quantize around 85% of the pixels of the image using only 2 to 3 bins.

Table 8 True positive rate, false positive rate and number of average color bins for different thresholds

The optimum choice of the threshold depends strictly on the application scenario. Generally speaking, for the proposed application we can assume that a high tpr is preferable, even if this introduces many false alarms into the system, which will hopefully be rejected through the integration of several soft biometric signatures.

6 Bag of Soft Biometrics for human identification

In this section we describe, with the help of an example, the application for human identification introduced in Section 3.3. Specifically, we analyze the possibility of using the set of facial soft biometrics presented in Section 4 and the new traits from Section 5, weight and color of clothes, for human identification. We combine those traits in a Bag of Soft Biometrics (BoSB) and give insight into design aspects, pertinent factors and related limitations. We finally provide results on the application of the combined system for human identification. We note here that estimation errors are not considered in this section. Regarding the accuracy and performance of the estimation algorithms, please refer to Sections 4.2, 5.1.1 and 5.2.2.

6.1 General setting and design aspects

The setting of interest corresponds to the general scenario where, out of a large population, an authentication group of N people is randomly extracted, out of which one person is picked for identification (and differentiation from all the other members of the authentication group). A general soft-biometric system employs detection that relates to λ soft-biometric traits (hair color, skin color, etc.), where each trait i (i = 1, 2, ..., λ) is subdivided into \(\mu_i\) trait-instances, i.e., each trait i can take one of \(\mu_i\) values. We henceforth define a category to be any λ-tuple of different trait-instances, and we let \(\Phi=\{\phi_i\}_{i=1}^\rho\) denote the set of all ρ categories, i.e., the set of all ρ combinations of soft-biometric trait-instances. The number of categories ρ that the system is endowed with is given by

$$ \rho=\Pi_{i=1}^\lambda\mu_i. $$
(5)

We slightly abuse notation and henceforth say that a subject belongs in category ϕ if his or her trait-instances are the λ-tuple corresponding to category ϕ. We here note that to have conclusive authentication of a subject, and subsequent differentiation from the other subjects of the authentication group, it must be the case that the subject does not belong in the same category as other members of the authentication group. Given a specific authentication group, the maximum-likelihood (ML) optimizing rule for detecting the most probable category in which a chosen subject belongs, is given by:

$$ \label{eq:phi-hat} \hat{\phi}=\mbox {argmax}_{\phi \in \Phi}P(\phi)\cdot P(y/\phi), $$
(6)

where y is the observation vector, P(ϕ) is the pdf of the set of categories over the given population (note \(\sum^{\rho}_{i=1}P(\phi_i)=1\)), and P(y/ϕ) is the probability that y is observed, given that the subject belongs in category ϕ.
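As a minimal sketch of the decision rule in (6), the snippet below picks the category that maximizes P(ϕ)·P(y/ϕ) over a small, made-up category set. The prior values, the two-trait categories and the function `toy_likelihood` are assumptions introduced here for illustration only; in particular the likelihood is a simple unnormalized per-trait score, not a model proposed in this work.

```python
def toy_likelihood(y, phi, p_correct=0.9):
    """Hypothetical P(y/phi): each trait is observed correctly with p_correct."""
    prob = 1.0
    for observed, actual in zip(y, phi):
        prob *= p_correct if observed == actual else 1.0 - p_correct
    return prob


def most_probable_category(prior, likelihood, y):
    """Return the category phi maximizing prior[phi] * likelihood(y, phi)."""
    return max(prior, key=lambda phi: prior[phi] * likelihood(y, phi))


# Made-up prior over three (hair color, eye color) categories; sums to 1.
prior = {
    ("brown", "blue"): 0.2,
    ("brown", "brown"): 0.5,
    ("black", "brown"): 0.3,
}

exact_match = most_probable_category(prior, toy_likelihood, ("brown", "blue"))
no_exact_match = most_probable_category(prior, toy_likelihood, ("black", "blue"))
print(exact_match, no_exact_match)
```

In the second call the observation matches no category exactly, so the decision is driven jointly by the prior and by how many traits agree with each candidate category.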

In designing a soft biometric system, the overall choice of the traits and trait-instances must take into consideration traditional limitations on estimation reliability, which is commonly a function of the sensor resolution and of the capabilities of the image-processing part of detection. In addition to this traditional aspect, new concerns come into the picture, such as the size and statistics of the authentication group (for example the possible similarities that might exist between different subjects), as well as the statistical relationship between the authentication group and Φ. The interrelated nature of the above aspects brings to the fore different tradeoffs. For example, an increasing \(\mu_i\), and thus also an increasing ρ, generally reduces the reliability of detection, but can potentially result in a welcome increase in the maximum authentication group size N that the system can accommodate.

In the proposed BoSB we allocate 9 traits:

1. Skin color

2. Hair color

3. Eye color

4. Beard presence

5. Moustache presence

6. Glasses presence

7. Weight

8. Torso clothes color

9. Legs clothes color

In this setting we clearly assign \(\mu_4 = \mu_5 = \mu_6 = 2\), corresponding to the binary nature of traits i = 4, 5, 6. The other traits, on the other hand, are of continuous character (see Table 1) and had to be categorized taking into account the tradeoff between reliability of detection and trait importance.

To this end we chose to subdivide trait 1 (skin color) into \(\mu_1 = 3\) instances and label them as skin color 1, 2 and 3 (following a recommendation provided by the ethical partner of an ongoing EU project, ActiBio (see footnote 8), to avoid any assumptions about race or ethnicity based on skin color). Trait 2 (hair color) was subdivided into \(\mu_2 = 8\) instances, trait 3 (eye color) into \(\mu_3 = 6\) instances (see Table 6.1), trait 7 (weight) into \(\mu_7 = 4\) instances, and finally traits 8 and 9 (torso and legs clothes color) into \(\mu_8 = \mu_9 = 11\) instances each.

Bag of soft biometrics: traits and traits instances

| SB trait | Number of instances | Trait instances |
| --- | --- | --- |
| Skin color | 3 | Skin color type 1, skin color type 2, skin color 3 |
| Hair color | 8 | Light-blond, dark-blond, brown, black, red, grey, black, mixed |
| Eye color | 6 | Blue, green, brown, grey, black, mixed |
| Weight | 4 | Underweight, normoweight, overweight, severe overweight |
| Torso clothes color | 11 | Black, white, red, yellow, green, blue, brown, purple, pink, orange, grey |
| Legs clothes color | 11 | Black, white, red, yellow, green, blue, brown, purple, pink, orange, grey |

As a result, the proposed system is endowed with the ability to detect

$$\label{allrho} \rho=\Pi_{i=1}^9\mu_i=557568 $$
(7)

distinct categories. For the sake of clarification, we note two simple examples of such categories in Φ:

  • “skin type 1, brown hair, blue eyes, no beard, no moustache, no glasses, normoweight, red clothes torso, black clothes legs” ∈ Φ

  • “skin type 3, black hair, black eyes, beard present, moustache present, glasses present, overweight, black clothes torso, white clothes legs” ∈ Φ

We here note that an increase in the number of traits λ can yield an impressive number of categories ρ, beyond \(10^{12}\), which may be suitable for several applications.
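The count in (7) can be checked with a few lines of arithmetic. The sketch below uses the instance counts listed in the table above (4 weight classes, 11 colors per clothes segment) together with the three binary facial traits; the dictionary layout and variable names are illustrative only.

```python
from math import prod

# Number of trait-instances mu_i for the nine traits of the proposed BoSB.
mu = {
    "skin color": 3,
    "hair color": 8,
    "eye color": 6,
    "beard presence": 2,
    "moustache presence": 2,
    "glasses presence": 2,
    "weight": 4,
    "torso clothes color": 11,
    "legs clothes color": 11,
}

facial_traits = ["skin color", "hair color", "eye color",
                 "beard presence", "moustache presence", "glasses presence"]

rho = prod(mu.values())                          # all nine traits
facial_rho = prod(mu[t] for t in facial_traits)  # facial traits only

print(rho, facial_rho)
```

The facial-only product gives the 1152 categories used in the Feret analysis of Section 6.2, and multiplying by the weight and clothes color instances gives the 557568 categories of (7).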

We already analyzed, in the previous sections, the efficiency of the proposed algorithms in estimating the introduced signatures from the visual appearance of humans. To fairly assess the performance of a Bag of Soft Biometrics for human identification it is necessary to analyze some statistical characteristics of the proposed framework which reflect on the distinctiveness capability of the described traits.

Relevant parameters, in addition to λ, μ, and ρ, also include the size and statistics of the authentication group (revealing possible similarities between different subjects), as well as the statistical relationship between the authentication group and Φ. In what follows we aim to gain insight on the behavior of the above, in the specific setting of the proposed soft-biometric design. The following analysis, which is by no means conclusive, focuses on providing insight on parameters such as:

  • The spread of the effective categories for a given authentication group, where this spread is used as a measure of the suitability of Φ in authenticating subjects from a certain authentication group.

  • The relationship between N, and the corresponding probability of interference as a function of Φ (the probability that two users share the same category and will thus be indistinguishable).

6.2 Spread of the category set Φ

We here consider the case where a soft-biometric system is designed to distinguish among ρ distinct categories, but where the randomly introduced authentication group only occupies a smaller fraction of such categories, and where these categories are themselves substantially correlated. Leaving correlation issues aside for now, we first define the set of effective categories \(\Phi_e\) to be the set of categories that are present (are non-empty) in the specific authentication group. A pertinent measure of system diversity and performance then becomes the cardinality \(\rho_e = |\Phi_e|\). We note that clearly both \(\Phi_e\) and \(\rho_e\) are random variables, whose realizations may change with each realization of the authentication group.
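To make the notion of effective categories concrete, the following sketch counts the non-empty categories of a given group and estimates the average of this count over randomly drawn groups. It is an illustration under stated assumptions: each subject is represented by a hashable tuple of trait-instances, groups are drawn without replacement from a fixed population list, and the function names and toy populations are hypothetical.

```python
import random


def rho_e(group):
    """Number of effective (non-empty) categories in an authentication group."""
    return len(set(group))


def mean_rho_e(population, n, trials=2000, seed=0):
    """Monte Carlo estimate of E[rho_e] for groups of size n drawn without replacement."""
    rng = random.Random(seed)
    total = sum(rho_e(rng.sample(population, n)) for _ in range(trials))
    return total / trials


# Toy populations (made up): one with all subjects distinct, one with all identical.
distinct_population = [("hair", i) for i in range(50)]
identical_population = [("brown", "blue")] * 50

print(rho_e([("brown", "blue"), ("brown", "blue"), ("black", "brown")]))
print(mean_rho_e(distinct_population, 10), mean_rho_e(identical_population, 10))
```

The two toy populations mark the extremes: when every subject falls in its own category the estimate equals the group size, and when all subjects share one category it equals 1.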

To analyze the spread of the category set in the case of facial soft biometrics, we consider the case where the authentication groups are each time drawn from a general population consisting of a fixed set of K = 646 subjects taken from the Feret database, with ρ = 1152 categories, corresponding to the pdf shown in Fig. 12a, where this pdf itself corresponds to the traits and trait-instances of the proposed system.

Fig. 12 P(ϕ) corresponding to (a) the facial traits distribution in the Feret dataset, (b) the BMI distribution in the NHANES dataset, and (c) the clothes color distribution in the VIPeR dataset

The same analysis can be performed for weight (Fig. 12b) and clothes color (Fig. 12c), based on the already introduced datasets (NHANES and VIPeR, respectively).

We can observe from the figures that the number of effective categories is much lower than expected, which decreases the distinctiveness of the system. A natural remedy for increasing the expected cardinality of effective categories \(E[\rho_e]\) is to increase the overall ρ, which brings to the fore the natural question as to whether this increase in ρ should result more from an increase in the number of traits, or rather from an increase in the number of trait-instances. We address this resource allocation problem under the simplifying assumption of symmetry, where \(\mu_i = \mu\) for all i = 1, ..., λ. In this symmetric setting, where clearly

$$ \rho=\mu^\lambda $$
(8)

and where ρ increases polynomially with μ and exponentially with λ, a simple comparison of the two derivatives \(\frac{d\rho}{d\mu}\), \(\frac{d\rho}{d\lambda}\) identifies the trait-limited region of a soft-biometric system to be the region:

$$ \lambda<\mu\cdot \mbox{ln} \mu $$
(9)

in which ρ increases faster with λ than with μ, and where emphasis should be placed on increasing λ rather than μ.
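For completeness, the condition in (9) can be recovered as follows, treating μ and λ as continuous variables (an assumption made here only for the purpose of differentiation). Starting from (8),

$$ \frac{\partial\rho}{\partial\mu}=\lambda\mu^{\lambda-1}, \qquad \frac{\partial\rho}{\partial\lambda}=\mu^{\lambda}\ln\mu, $$

so that

$$ \frac{\partial\rho}{\partial\lambda}>\frac{\partial\rho}{\partial\mu} \iff \mu^{\lambda}\ln\mu>\lambda\mu^{\lambda-1} \iff \lambda<\mu\ln\mu, $$

where the last step divides both sides by \(\mu^{\lambda-1}>0\). As an illustration, for μ = 3 the bound is \(3\ln 3\approx 3.30\), so a symmetric system with λ ≤ 3 traits lies in the trait-limited region.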

This approach, in turn, brings to the fore the issue that increasing ρ may indeed result in an increased \(E[\rho_e]\), but might affect the correlation between the different categories. This would subsequently result in a reduced spread of Φ, which would imply a reduced distinctiveness in authentication.

In regards to this, we give some intuition on the distinctiveness of some non-empty categories of the proposed system, by computing the correlation between these categories using Pearson’s product-moment coefficient

$$ r_{X,Y}=\frac{cov(X,Y)}{\sigma_{X}\sigma_{Y}}=\frac{E[(X-\mu_{X})(Y-\mu_{Y})]}{\sigma_{X}\sigma_{Y}}. $$
(10)
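A direct implementation of (10) is sketched below, using population (1/n) moments. How categorical traits such as hair or eye color were encoded numerically before computing the coefficient is not specified here, so the binary presence indicators in the example are an assumption, and the eight toy subjects are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment coefficient cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sigma_x * sigma_y)


# Made-up presence indicators (1 = present, 0 = absent) for eight subjects.
moustache = [1, 1, 1, 0, 0, 0, 1, 0]
beard = [1, 1, 0, 0, 0, 0, 1, 0]

print(pearson_r(moustache, beard))
```

The function assumes both inputs have non-zero variance; a trait that takes a single value across the whole sample would lead to a division by zero.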

In the case of facial traits, the resulting correlation parameters evaluated on the Feret dataset and shown below revealed, as expected, the highest correlation to be that between moustache and beard mirroring the fact that among the studied population the presence of moustache, given the presence of beard, is at 97.8%.

$$ \begin{array}{rll} r_{\rm Eye\,color,\,Hair\,color} &=& -0.1964\\ r_{\rm Hair\,color,\,Skin\,color} &=& -0.1375\\ r_{\rm Eye\,color,\,Skin\,color} &=& 0.3700\\ r_{\rm Moustache,\,Beard} &=& 0.6359 \end{array} $$
(11)

In the same way we can investigate the correlation among the body traits. As an example, we report the correlation measures among the traits commonly used to describe the body: weight, height, and BMI (the numerical data were taken from the NHANES dataset):

$$ \begin{array}{rll} r_{\rm Height,\,Weight} &=& 0.5485\\ r_{\rm Weight,\,BMI} &=& 0.8676\\ r_{\rm Height,\,BMI} &=& 0.0742 \end{array} $$
(12)

As expected, height and weight are correlated, as body weight typically increases with body height. Even though the relation between BMI, weight and height is clear from the formula \(\text{BMI}=\frac{W}{H^{2}}\), for the sake of completeness the correlation coefficients of the pairs Weight–BMI and Height–BMI are also provided. We notice that BMI is highly correlated with weight and only weakly correlated with height. The main contribution to BMI indeed comes from weight since, in the analyzed dataset, its variance is higher than that of height.

Regarding clothes color, one may argue a possible correlation between the torso and the legs clothes colors. However, the correlation coefficient obtained using the VIPeR dataset and shown below

$$ r_{\rm Torso\,clothes\,color, Legs\,clothes\,color}=0.0075 $$
(13)

reveals that there is no correlation between the two traits. This result is affected by the use of a color quantization step for which all the possible colors are described by the limited set of colors corresponding to the eleven culture colors.

6.3 Bounding N for a given interference probability

We are here interested in describing the relationship between N, and the corresponding probability of interference, as a function of Φ. We proceed to properly define the event of collision or interference.

The event of collision, or equivalently of interference, describes the event where any two or more subjects belong in the same category ϕ. Focusing on a specific subject, we say that this subject experiences interference if he/she belongs in a category which also includes other subjects from the authentication group. In regards to this, we are interested in gaining insight on two probability measures. The first measure is the probability that the authentication group of size N, chosen randomly from a large population of subjects, is such that there exist any two subjects within the group that collide. We denote this probability as p(N) and briefly note its relationship to the famous birthday paradox. For the other measure of system reliability, we consider the case where an authentication group of size N is chosen randomly from a large population of subjects, and where a randomly chosen subject from within this authentication group collides with another member of the same group. We denote this probability as q(N), and note that clearly q(N) < p(N). To clarify, p(N) describes the probability that interference exists, even though it might not cause error, whereas q(N) describes the probability of an interference-induced error.

In the following we provide a simulation of the probability of identification error, in the setting of interest, under the assumption that errors are due to interference only, i.e., that an error happens if and only if the chosen subject shares the same category with another person from the randomly chosen authentication group. This corresponds to the setting where the soft-biometric approach cannot provide conclusive authentication.
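The two probability measures can be illustrated with the sketch below. It is not the simulation used for the reported results: it assumes that subjects fall into categories independently according to a fixed distribution (whereas the simulations here draw groups from finite databases), and the function names are hypothetical. Under that assumption q(N) has a closed form, p(N) has the classical birthday-paradox form when all categories are equally likely, and p(N) for a general distribution is estimated by Monte Carlo.

```python
import random


def q_exact(probs, n):
    """q(N): a chosen subject shares its category with at least one of the other n-1."""
    return 1.0 - sum(p * (1.0 - p) ** (n - 1) for p in probs)


def p_uniform(rho, n):
    """p(N) for rho equally likely categories (birthday-paradox formula)."""
    no_collision = 1.0
    for k in range(n):
        no_collision *= max(0.0, 1.0 - k / rho)
    return 1.0 - no_collision


def p_monte_carlo(probs, n, trials=20000, seed=0):
    """Estimate p(N) for an arbitrary category distribution by simulation."""
    rng = random.Random(seed)
    categories = range(len(probs))
    hits = 0
    for _ in range(trials):
        group = rng.choices(categories, weights=probs, k=n)
        hits += len(set(group)) < n  # True if at least two subjects collide
    return hits / trials


uniform_4 = [0.25] * 4
print(p_uniform(4, 3), q_exact(uniform_4, 3), p_monte_carlo(uniform_4, 3))
```

With four equally likely categories and three subjects, the closed forms give p = 0.625 and q = 0.4375, which illustrates the ordering q(N) < p(N). Non-uniform category distributions, such as those observed in the datasets, generally lead to higher collision probabilities than the uniform case.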

In the first simulation, regarding facial soft biometrics, the larger population consisted of 646 people from the Feret database, and the simulation was run for different sizes N of the authentication group. The probability of authentication error is described in the following figure.

As a measure of the importance of each trait, Fig. 13a describes the collision probability when different traits are removed. The presence of glasses seems to have a lower influence on the detection results, whereas hair and eye color have the highest impact on distinctiveness.

Fig. 13 Collision probability in an N sized authentication group: (a) facial traits; (b) BMI; (c) torso clothes color; (d) legs clothes color

Figure 13b depicts the analysis conducted over the 4 categories of BMI (underweight, normoweight, overweight, and severe overweight). As the number of categories is much smaller than in the previous facial case, the collision probability is already significant for a small number of persons; specifically, the probability of having two or more subjects that collide is already 50% with 3 persons.

Figure 13c and d describe the same results evaluated for the clothes color, considering torso and legs either together or separately. The probability of collision follows almost the same trend for torso and legs, which is a consequence of the similar distributions of the corresponding effective categories. As expected, considering torso and legs together drastically decreases the probability of collision, since the number of categories increases from 11 to 121. Specifically, in a group of at least 7 randomly chosen people, there is a more than 50% probability that some pair of them will have the same color for both torso and legs.

6.4 Virtual multimodal database

We are finally interested in assessing the reliability of the proposed BoSB framework exploiting all the previously analyzed signatures.

As mentioned in Section 3, a system using more than one biometric characteristic to verify whether a person is who he/she claims to be is called a multimodal biometric authentication system. Many papers in the multimodal fusion literature test system performance on limited datasets (e.g. 100 subjects), obtaining very biased results.

Due to lack of large multimodal biometric datasets, a popular approach is to create virtual databases in which the biometric trait of a user from a database is combined with another biometric trait of yet another database, thus creating a so-called chimeric user.

Few works have studied the validity of using virtual subjects for multimodal system evaluation. In [30, 55] the authors motivate the possibility of using databases of virtual subjects under the condition that the underlying biometric traits to be combined are a priori mutually independent. Following their approach, and based on the assumption that the proposed bag of facial traits, the weight and the color of clothes are temporally uncorrelated, we designed a virtual database to evaluate the collision probability in an N sized authentication group. The traits of each subject are obtained by randomly drawing the facial traits from the Feret dataset and the weight and the clothes color from the NHANES and VIPeR databases, respectively. The obtained results are shown in Fig. 14.

Fig. 14 Collision probability of the proposed BoSB in an N sized authentication group with N ranging from 0 to 1,000 (a), and a magnified version for N in [0, 100] (b)

As expected, the collision probability drastically decreases with respect to the unimodal cases analyzed above. A group of at least 49 randomly chosen subjects is needed for a 50% probability of collision. Moreover, the probability q(N) that a particular person drawn from a group of N subjects has the same soft biometric characteristics as one or more other subjects from the same group is extremely low, even for high values of N.
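The construction of chimeric users described in this section can be sketched as follows. This is an illustration only: the three source lists are tiny made-up stand-ins for the per-subject annotations of the Feret, NHANES and VIPeR datasets, and the function names are hypothetical. Each virtual subject is obtained by independently drawing one facial record, one weight class and one clothes color pair, which is valid only under the independence assumption stated above.

```python
import random


def build_chimeric_users(facial, weight, clothes, n_users, seed=0):
    """Combine independently drawn records from three sources into 9-trait tuples."""
    rng = random.Random(seed)
    return [
        rng.choice(facial) + (rng.choice(weight),) + rng.choice(clothes)
        for _ in range(n_users)
    ]


def has_collision(group):
    """True if at least two subjects of the group share the same category."""
    return len(set(group)) < len(group)


# Made-up stand-ins for the three annotated sources (hypothetical records).
facial = [
    ("skin 1", "brown", "blue", "no beard", "no moustache", "no glasses"),
    ("skin 3", "black", "black", "beard", "moustache", "glasses"),
    ("skin 2", "red", "green", "no beard", "moustache", "no glasses"),
]
weight = ["underweight", "normoweight", "overweight", "severe overweight"]
clothes = [("red", "black"), ("black", "white"), ("blue", "blue")]

users = build_chimeric_users(facial, weight, clothes, n_users=5)
print(users[0], has_collision(users))
```

Repeating the draw many times for each group size N and averaging `has_collision` gives an estimate of the collision probability as a function of N.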

7 Concluding remarks

In this paper we proposed the use of soft biometrics for person identification. The limitations of single soft biometric traits, namely the lack of distinctiveness and permanence, are overcome by the use of multiple traits. The main contributions of the paper are:

  • The introduction of a new definition of soft biometrics. We proposed a definition, which emphasizes the aspect of human compliance and broadens the concept of soft biometrics introduced in scientific literature. We identified and classified soft biometric traits, which accept the introduced definition, and further elaborated the related advantages, limitations, scientific work and applications.

  • The definition of two novel traits: weight and clothes color. Although weight, estimated by a scale, has already been introduced as a soft biometric trait, the attempt for an automatic body weight detector at a distance is novel and was provided in this work. Clothes color, on the other hand, has never been analyzed in biometric literature, since it is not a human feature itself. However, with the new definition it belongs to the human compliant soft biometric traits. For both new traits, we provided efficient estimation methods and a corresponding empirical analysis on the accuracy and performance.

  • The description of a new application, namely human identification based solely on soft biometric traits. In this context we created a framework for this specific application by constructing a Bag of Soft Biometrics, including facial and novel body and accessory traits. We have also provided insight on statistical properties of parameters concerning soft biometrics systems, as well as on design and resource allocation aspects. The performed experiments show very promising results and confirm the efficiency of the proposed BoSB for human identification.

The above contributions suggest a substantial potential for soft biometric systems, which offer a plethora of advantages over a large range of applications, such as monitoring or video surveillance. The utility and range of soft biometric systems are expected to increase as the number of traits increases and as the underlying image capture and processing technology improves.