1 Introduction

Down syndrome is the most common chromosomal abnormality, characterized by a typical facial appearance, short fingers and toes, and moderate to severe intellectual disability. These features are frequently associated with cardiovascular and gastrointestinal defects. The most characteristic facial traits include: a round, flat face, upslanting palpebral fissures, epicanthal folds (skin folds at the inner corners of the eyelids), Brushfield spots (spotted iris), a small nose with a flattened root, a small mouth (usually open, with protruding tongue) and small, low-set ears [1, 2].

Down syndrome identification in digital images of faces can be useful in computer-aided diagnosis, human–computer interaction systems or demographic studies.

Persons with Down syndrome have very distinctive facial traits, which is why it is easy for humans to distinguish between a person with Down syndrome and a healthy person just by looking at their faces. In this paper we investigate how effective some computer vision techniques for face recognition are at making the same decision: Down/non-Down face.

The subject of face recognition is one of the major research directions in image processing and computer vision. Some of the main lines of research in this field are described in [3, 4]. The problem of Down syndrome detection has also been addressed. In [5] the authors studied the automatic identification of patients with genetic disorders other than Down syndrome. In [6, 7] a Hierarchical Constrained Local Model with Independent Component Analysis is first used to detect facial landmarks, and Local Binary Patterns (LBP) are then employed to extract features around these points. The authors used SVMs (both linear and with a Radial Basis Function kernel), kNN, random forests and Linear Discriminant Analysis as classifiers. The tests were made using images of young children (50 Down/50 healthy) of different ethnicities, taken under variable illumination but in a controlled environment. The best result was an accuracy of 95.6% with the SVM-RBF. In [8] the authors also extract fiducial face points and then use contourlets and LBP to extract textural features around the landmarks. The tests were made using images of children, with two types of evaluation: one with 24 normal patients and 24 with mixed genetic syndromes (81.3% accuracy in detecting Down syndrome), and the second with 24 normal faces and 24 Down faces (97.9% accuracy). In [9] the authors worked with images of children aged 1–12 years (15 healthy, 15 with Down syndrome), performing a standardization step before extracting features using Gabor wavelets, and employed kNN and linear SVMs for classification, the best result being an accuracy of 97.34%.

In our approach we used “real-life” images of persons with Down syndrome, of different ages, and tested them against normal faces of mixed ethnicity or against Chinese faces only. We tested LBP, a combination of the Discrete Wavelet Transform (DWT) with LBP, and eigenfaces with principal component projection features, using kNN and polynomial and RBF SVMs. For Down syndrome we selected 50 images from the Internet, and for normal faces we used classical face recognition datasets (FERET [10], CAS-PEAL [11], LFW [12]). The influence of noise (inherent to real-life images) was also studied.

2 Face Characterization Features

There are many ways to extract features for face recognition. Among them are the Eigenfaces method [13] and Local Binary Patterns (LBP) [14, 15]. In the Eigenfaces algorithm, first the face images are vectorized and the mean value of these images is subtracted. A matrix is formed with these vectors as columns. Then the eigenvalues and eigenvectors of the covariance matrix are computed. The eigenvectors are arranged by sorting the corresponding eigenvalues in descending order. The eigenvectors are then employed to project new images on the linear subspace generated by them, thus forming a feature vector.
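As an illustration, the eigenfaces computation and projection above can be sketched in a few lines of NumPy. This is only a minimal sketch (our experiments used MATLAB and OpenCV in Java), using the usual “snapshot” trick so that only a small matrix is eigendecomposed instead of the full covariance matrix:

```python
import numpy as np

def eigenfaces(images, n_components):
    """Compute eigenfaces from equally sized grayscale images.

    images: array of shape (n_images, h, w). Returns (mean, U) where the
    columns of U (shape (h*w, n_components)) are eigenvectors of the
    covariance matrix, sorted by decreasing eigenvalue.
    """
    X = images.reshape(len(images), -1).astype(float)   # vectorize the images
    mean = X.mean(axis=0)                               # mean face
    A = (X - mean).T                                    # columns = centered images
    # Snapshot trick: eigendecompose the small (n x n) matrix A^T A rather
    # than the huge (h*w x h*w) covariance matrix A A^T.
    vals, vecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(vals)[::-1]                      # descending eigenvalues
    U = A @ vecs[:, order[:n_components]]               # map back to image space
    U /= np.linalg.norm(U, axis=0)                      # unit-norm eigenfaces
    return mean, U

def project(image, mean, U):
    """Feature vector: weights of the centered image on the eigenfaces."""
    return U.T @ (image.ravel().astype(float) - mean)
```

The projection weights returned by `project` form the feature vector used for classification.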

Local Binary Patterns [14] is a technique initially developed for texture characterization, but it can also be applied to face recognition [15]. A binary pattern for each pixel in a grayscale image is computed by comparing the intensities of its 8 neighbors with the intensity of the considered pixel. If the neighbor’s intensity is greater, a 1 is put in the binary pattern associated with the pixel, otherwise a 0. Thus, for each pixel an 8-bit pattern is computed, usually converted to a decimal value in 0–255. One obtains a matrix of binary patterns associated with the original image. In order to extract LBP features, the image is divided into blocks and histograms of the binary patterns are computed for each subimage. The feature vector is formed by concatenating these histograms. It was observed that, among the 256 binary patterns that can be associated with a pixel, those that have at most two 0–1 or 1–0 transitions are more important. These patterns are called uniform, and there are 58 such patterns. If one computes histograms counting only the uniform patterns, with all non-uniform ones counted in a single bin, the size of the histograms is reduced from 256 to 59 without losing precision, and thus the feature vectors are about four times shorter.
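A minimal sketch of this computation (illustrative NumPy code, not our MATLAB/OpenCV implementation; the neighbor comparison follows the convention stated above):

```python
import numpy as np

# Offsets of the 8 neighbors, in a fixed circular order (bit 0 .. bit 7).
OFFSETS = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]

def transitions(code):
    """Number of 0-1 or 1-0 transitions in the circular 8-bit pattern."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

# A pattern is 'uniform' if it has at most two transitions; there are 58 such
# patterns, mapped to bins 0..57, while all other patterns share bin 58.
UNIFORM = sorted(c for c in range(256) if transitions(c) <= 2)
BIN = {c: i for i, c in enumerate(UNIFORM)}

def lbp_codes(img):
    """8-neighbor LBP code for each interior pixel of a grayscale image."""
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=int)
    center = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(OFFSETS):
        neigh = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neigh > center).astype(int) << bit  # 1 where neighbor is greater
    return codes

def uniform_histogram(codes):
    """59-bin histogram: one bin per uniform pattern plus one for the rest."""
    hist = np.zeros(59, dtype=int)
    for c in codes.ravel():
        hist[BIN.get(int(c), 58)] += 1
    return hist
```

For block-based features, `uniform_histogram` is applied to each block of the code matrix and the resulting histograms are concatenated.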

The Wavelet Transform [16] addresses a shortcoming of the Fourier Transform, whose coefficients provide no local information about frequencies. In 1988 Mallat and Meyer introduced the wavelet multiresolution concept, then Daubechies gave an algorithm for computing orthogonal wavelets with compact support, and Mallat derived the fast wavelet transform; all these steps made the wavelet transform a very useful tool for signal processing. At one level of Discrete Wavelet (DW) decomposition, 4 filtered images are computed, each half the size of the processed image: HH (highpass filtering of both rows and columns of the image), HL (a highpass filter applied to rows and a lowpass one to columns), LH (lowpass filtering of rows and highpass of columns) and LL (lowpass filtering of both rows and columns).
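For illustration, one level of 2-D DW decomposition can be sketched as follows, using the simple Haar wavelet for brevity (the experiments in Sect. 3 use longer biorthogonal filters, but the subband layout is the same):

```python
import numpy as np

def haar_step(x):
    """One 1-D Haar analysis step along the last axis: (lowpass, highpass)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def dwt2(img):
    """One level of 2-D DWT: filter and downsample the rows, then the
    columns, yielding the four half-size subbands LL, LH, HL, HH."""
    img = np.asarray(img, dtype=float)
    lo, hi = haar_step(img)                   # lowpass/highpass along the rows
    ll, lh = haar_step(lo.swapaxes(0, 1))     # then along the columns
    hl, hh = haar_step(hi.swapaxes(0, 1))
    return (ll.swapaxes(0, 1), lh.swapaxes(0, 1),
            hl.swapaxes(0, 1), hh.swapaxes(0, 1))
```

Applying `dwt2` twice and keeping the LL subband gives the second-level approximation image used in method 2 below.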

We use these methods to build feature vectors for Down syndrome identification in the following ways:

  1.

    The uniform binary patterns are computed for an image, then the image is divided into 4 × 4 or 8 × 8 blocks, and for each block the histogram of the uniform patterns is computed. The feature vector is formed by concatenating these 16 or 64 histograms. Fig. 1 shows some example images and their corresponding uniform binary pattern images.

    Fig. 1.

    Examples of face images from FERET, CAS-PEAL, LFW, and with Down syndrome and their corresponding uniform LBP images

  2.

    A 2-level DW decomposition is applied first; for the LL subimage of the second decomposition level, the feature vector is built as described in 1 (using 8 × 8 blocks).

  3.

    We first computed the eigenfaces for the AT&T ([17], Fig. 2) face database, then we projected the image on the most significant of these eigenfaces. The weights obtained from the projection step form the feature vector.

    Fig. 2.

    Eigenfaces from AT&T dataset

  4.

    For the AT&T faces, the 8 × 8 uniform LBP feature vectors are computed; for these feature vectors the principal components are calculated; for an image to be classified the final feature vector is obtained by projecting the 8 × 8 uniform LBP feature vector of the image on some of the most important principal components.

All the feature vectors were normalized: for the LBP ones, the histogram for each block was divided by the city-block norm, and for the projection weights we used the Euclidean norm for normalization.
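The two normalization schemes can be sketched as follows (illustrative code; `hists` is assumed to hold one 59-bin histogram per block, and `w` the projection weights):

```python
import numpy as np

def normalize_lbp(hists):
    """Divide each block histogram by its city-block (L1) norm, then
    concatenate the blocks into one feature vector. hists: (n_blocks, 59)."""
    hists = np.asarray(hists, dtype=float)
    norms = np.abs(hists).sum(axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # guard against empty blocks
    return (hists / norms).ravel()

def normalize_weights(w):
    """Divide the projection weights by their Euclidean (L2) norm."""
    w = np.asarray(w, dtype=float)
    n = np.linalg.norm(w)
    return w / n if n > 0 else w
```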

The LBP method is known to be very sensitive to noise, which is why we considered applying a DWT before extracting features with the LBP method, or projecting the face images onto a set of eigenfaces or principal components.

As classifiers, we employed Support Vector Machines (SVMs) with a 3rd-degree polynomial kernel or a Radial Basis Function (RBF) kernel, and kNN with k = 3 and the Euclidean norm.
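The kNN classifier with k = 3 and the Euclidean norm can be sketched as follows (an illustrative NumPy sketch, not the implementation used in our experiments):

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Label of x by majority vote among its k Euclidean-nearest training
    feature vectors (k = 3 in our experiments)."""
    d = np.linalg.norm(train_X - x, axis=1)       # Euclidean distances
    nearest = np.argsort(d)[:k]                   # indices of the k nearest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```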

For testing the feature vectors, face images from FERET [10, 18], CAS-PEAL [11], and LFW [12] were employed. We randomly selected from the Internet 50 images of faces of persons with Down syndrome.

Fig. 3 shows some examples of the images used in the tests.

Fig. 3.

Samples of face images from the FERET, AT&T, CAS-PEAL, LFW, and Down syndrome datasets

We evaluated the classification results using the following measures:

$$ Accuracy = \frac{TP + TN}{TP + FP + TN + FN} $$
(1)
$$ Precision = \frac{TP}{TP + FP} $$
(2)
$$ Recall = \frac{TP}{TP + FN} $$
(3)
$$ Specificity = \frac{TN}{TN + FP} $$
(4)

where TP, TN, FP, FN stand for “true positive”, “true negative”, “false positive” and “false negative”. For example, “true positive” is the number of Down faces identified as having Down syndrome, and “false positive” is the number of non-Down faces identified by the classifier as having Down syndrome. The accuracy is a measure of the overall success of the classification process: it is the proportion of correctly identified images among all images. The recall (also called sensitivity) and specificity measure the proportion of correctly identified Down/non-Down images among all Down/non-Down images, respectively. The precision is the ratio of correctly identified Down images to all images identified as Down.
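Measures (1)–(4) can be computed directly from the confusion-matrix counts, e.g.:

```python
def classification_measures(tp, tn, fp, fn):
    """Measures (1)-(4) from the four confusion-matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "precision":   tp / (tp + fp),
        "recall":      tp / (tp + fn),   # also called sensitivity
        "specificity": tn / (tn + fp),
    }
```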

3 Results

Our numerical tests were performed using MATLAB and OpenCV in Java. The face databases we worked with have the following characteristics. The AT&T [17] database has 400 images of size 92 × 112 for 40 distinct persons, with 10 images per person showing different facial expressions, under different lighting conditions and with or without glasses. FERET (Face REcognition Technology, [10, 18]) has 14126 images of 1199 persons, with different facial expressions and poses and mixed ethnicity, taken in a controlled environment; the size of the images is 256 × 384. The CAS-PEAL database is a dataset of Chinese persons consisting of 30900 images of 1040 subjects, taken under varying Pose, Expression, Accessory and Lighting (PEAL) conditions; the size of the images is 360 × 480. LFW (Labeled Faces in the Wild) is a collection of around 13000 face images of 5749 individuals collected from the Internet (such that the Viola–Jones face detector [19] worked on them); the size of the images is 250 × 250. Our criterion for selecting the images (Down or non-Down) was likewise that the Viola–Jones face detector worked on them.

All the images from the AT&T dataset were employed for building 10304 (92 × 112) eigenfaces or 3776 principal components. In the projection computations we used only the most relevant 1500 of these eigenfaces and 125 principal components. From the FERET database 900 frontal images were selected, from CAS-PEAL 300 (frontal and normal) and from LFW 70. The Viola–Jones face detector was applied to all the selected images and then the feature vectors were computed. Because of the eigenfaces projection method (method 3 of the previous section), all the faces were resized to 92 × 112, the common size of the AT&T images.

We performed several tests on these databases. The tests were carried out using 60 (for FERET or CAS-PEAL) or 70 (for LFW) non-Down images and the 50 Down faces. Thus, 15 tests were made for FERET, 5 for CAS-PEAL, and 1 for LFW. For cross-validation we used “leave-one-out”. In the tables with results, LBP (4 × 4 or 8 × 8) means that the uniform binary patterns were used to extract features for blocks of size 4 × 4 or 8 × 8, respectively. EP stands for the Eigenfaces Projection method. LBP(PCA) is the 8 × 8 uniform LBP feature extractor followed by projection on the AT&T principal components, as described in Sect. 2, reducing the dimension of the feature vectors to 125. For the DWT we used 22-tap biorthogonal Daubechies filters.

The feature vectors have the following dimensions: 944 for LBP (4 × 4), 3776 for LBP (8 × 8) and DWT + LBP, 1500 for EP, 125 for PCA (Principal Component Analysis) reduced dimension LBP (8 × 8).

Tables 1, 2 and 3 show the (mean) results for the classification measures (1)–(4). We expected low values for the Recall parameter due to the fact that the selected Down face images are of low quality and are not standardized in any way. The good classification results we obtain for FERET and CAS-PEAL are due to this difference in image quality: the images in FERET and CAS-PEAL were taken in controlled conditions, with the same camera. The 50 selected Down images are rather noisy and had different sizes, and the resizing process enhanced the ‘defects’ in these images. We suspect that in the case of the CAS-PEAL classification, a Chinese/non-Chinese identification was in fact performed rather than a Down/non-Down one. The results we obtain for the LFW database give better information on the capacity of the considered features to distinguish between Down and non-Down face images, because these images were selected in the same way.

Table 1. Mean values for Accuracy (A), Precision (P), Recall (R), Specificity (S) for 15 tests with 60 FERET faces and 50 Down faces
Table 2. Mean values for Accuracy (A), Precision (P), Recall (R), Specificity (S) for 5 tests with 60 CAS-PEAL faces and 50 Down faces
Table 3. Accuracy (A), Precision (P), Recall (R), Specificity (S) for 70 LFW faces and 50 Down faces

We studied the effect of noise on the considered features by performing the following test: we trained the classifier using the original images together with the same images to which Gaussian noise with zero mean and 0.0005 variance was added. We tested using images with added Gaussian noise with zero mean and 0.003 variance. The tests were made in a “leave-one-out” manner, training the classifier without the image being tested. We tested only the FERET and CAS-PEAL images, using the SVM (for 3NN the results are unusable). The results are in Tables 4 and 5.
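The noise-addition step can be sketched as follows (an illustrative sketch; intensities are assumed scaled to [0, 1], the convention under which variances such as 0.0005 and 0.003 are usually stated, e.g. by MATLAB's imnoise):

```python
import numpy as np

def add_gaussian_noise(img, var, mean=0.0, rng=None):
    """Add Gaussian noise of the given mean and variance to an image whose
    intensities are scaled to [0, 1], clipping the result back to [0, 1]."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = np.asarray(img, dtype=float) + rng.normal(mean, np.sqrt(var), np.shape(img))
    return np.clip(noisy, 0.0, 1.0)
```

In this protocol, the training set would contain each original image and its copy with `var=0.0005`, while the test images receive noise with `var=0.003`.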

Table 4. Mean values for Accuracy (A), Precision (P), Recall (R), Specificity (S) for 15 tests with 60 FERET faces and 50 Down faces—Gaussian noise with zero mean and 0.0005 variance for training and 0.003 variance for test
Table 5. Mean values for Accuracy (A), Precision (P), Recall (R), Specificity (S) for 5 tests with 60 CAS-PEAL faces and 50 Down faces—Gaussian noise with zero mean and 0.0005 variance for training and 0.003 variance for test

We can observe from these tables that LBP does not perform well in noisy situations (as is well known) and the DWT manages to deal with the noise better, but by far the best results are obtained by using the projection methods on the principal components or eigenfaces of the AT&T images (the results are almost unchanged from the normal, noise-free case). The recall becomes lower for the FERET database, the noise enhancing the ‘defects’ of the Down images and decreasing their identification rate. For CAS-PEAL, the specificity decreases, showing that the features are in fact classifying Chinese/non-Chinese.

We used the DWT or eigenfaces projection in order to deal with noise. We also made some tests by first applying median filters of different sizes (3, 5, 7, 9) to reduce the noise and then computing the LBP (4 × 4) feature vectors. Tests were made for the Down and LFW images. The results can be seen in Fig. 4. As expected, the results tend to worsen as the size of the filter increases, with the exception of the size-7 median filter, and all the results are worse than those obtained using the DWT and eigenfaces projection methods.

Fig. 4.

Accuracy and Recall for LBP features with median filtering, on 70 LFW and 50 Down images
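The median-filtering step can be sketched as follows (an illustrative sketch using edge replication at the image borders; the exact border handling in our tests may differ):

```python
import numpy as np

def median_filter(img, size=3):
    """Median filter with a size x size window (sizes 3, 5, 7 and 9 were
    tried), replicating the edge pixels to handle the borders."""
    pad = size // 2
    padded = np.pad(np.asarray(img, dtype=float), pad, mode="edge")
    out = np.empty((np.shape(img)[0], np.shape(img)[1]))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out
```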

We used a 3rd-degree polynomial kernel for the SVM because, unlike the RBF kernel, it needs no parameter tuning, and the results are good. Of course, in some cases we obtain better results with RBF SVMs. For example, for the LFW test without noise, in the LBP case we get a better accuracy of 91.66%, with lower precision (93.47%) and higher recall (86%). For the DWT + LBP and SVM-RBF case we obtain overall improved results, with an accuracy of 95.83%, precision of 97.87%, recall of 92% and specificity of 98.57%. For the EP case we did not manage to find an RBF kernel parameter that improves on the polynomial SVM result.

4 Conclusions

In the present paper we tested four feature extraction methods for Down syndrome identification in digital images. The first is the classical Local Binary Patterns method, the second is a combination of the Discrete Wavelet Transform and LBP, the third is an eigenfaces projection method using AT&T as the eigenfaces provider, and the fourth is a projection method on the principal components of the LBP features of AT&T. All these methods give good results in distinguishing between faces with Down syndrome and healthy ones, but in the presence of noise the projection methods are the best (especially the projection on the principal components of the LBP features of the AT&T faces), followed by the combination of DWT and LBP. The feature vectors were tested using only frontal face images from the well-known face recognition datasets FERET, CAS-PEAL and LFW, together with Down images collected from the Internet, with SVM and 3NN as classifiers.

For the projection methods, we still need to test the influence of the dataset from which the eigenfaces or principal components are built. Other visually identifiable genetic disorders will be addressed in future work.