Introduction

Down syndrome, also known as trisomy 21 (and formerly referred to as mongolism), is one of the most common dysmorphic disorders today. It is named after John Langdon Down, the British physician who described the syndrome in 1866; the condition was identified as a trisomy of chromosome 21 by Jérôme Lejeune in 1959 [1]. It affects about 1 in every 800 babies, and its incidence varies with maternal age. Despite this variability, individuals with Down syndrome have a widely recognized characteristic appearance. Typical facial features include a flattened nose, small mouth, protruding tongue, small ears, and upward-slanting eyes; the inner corner of the eyes may have a rounded fold of skin. The hands are short and broad with short fingers and may have a single palmar crease. White spots on the colored part of the eye, called Brushfield spots, may be present. Babies with Down syndrome often have decreased muscle tone at birth. Growth and development are usually delayed, and individuals with Down syndrome often do not reach the average height or developmental milestones of unaffected individuals.

Today, such diseases are diagnosed using disease descriptions defined in the literature or photograph databases. When an exact diagnosis cannot be established, various hormonal and cytogenetic tests are performed; these tests are costly and time-consuming. To speed up diagnosis, it is important to have a high-accuracy classification process that compares a new image with the patient images in an existing database. In this context, computer-based medical image processing techniques are used.

Two-dimensional (2D) and three-dimensional (3D) face recognition methods have been used successfully for the classification of dysmorphic disorders. Wieczorek and her team classified two-dimensional face images of patients (taken from the front and the sides) in groups and in pairs, in a series of studies that began with 55 patients from 5 syndromes and grew to 200 patients from 14 syndromes, with varying accuracy [2–4]. Hammond and his team achieved classification with high accuracy using the dense surface model (DSM) method on a database consisting of three-dimensional face images of individuals with syndromes and healthy individuals [5, 6]. To distinguish individuals with Down syndrome from healthy individuals, a classification accuracy of 68.7% was achieved with the elastic bunch graph matching (EBGM) method on a database of 36 images [7], and a classification accuracy of up to 95.3% was achieved with the local binary pattern (LBP) method on another database of 107 images [8].

In this study, the necessary image preprocessing procedures are completed first. Features are then extracted with the Gabor wavelet transform (GWT) from a database of 30 images collected by the authors with the required permissions. In contrast to previous studies on Down syndrome, selection of the most valuable features is performed for the first time. The dimensionality of the most valuable features is reduced with principal component analysis (PCA), and a new space for the reduced features is derived with linear discriminant analysis (LDA), in which the most significant information is retained. After this dimensionality reduction, classification accuracies of 96% and 97.34% are achieved with the k-nearest neighbors (kNN) and support vector machine (SVM) methods, respectively.

Material and method

Image acquisition

The database required for feature extraction was collected by the authors from the Down Syndrome Association of Turkey and the Department of Medical Genetics, Istanbul University Faculty of Medicine. At the annual meetings of the parent support groups, the patients and their parents were informed about the study design, and photographs of the patients were taken after their consent. Several photographs of each patient, especially of uncooperative ones, were taken to ensure that a sharp photograph in an optimal pose was obtained. At least two independent clinical geneticists established the diagnoses. The images in the database belong to healthy children and children with Down syndrome aged 1–12 years. As can be seen in Fig. 1, there are 15 images in each group: 6 girls and 9 boys in the Down syndrome group, and 7 girls and 8 boys in the healthy group.

Fig. 1

Images of down’s syndrome and healthy individuals

Standardization and selection of images

In the preprocessing stage, rotation, cropping, histogram equalization, and scaling are applied to grayscale images converted from the original color images. Rotation turns an object or coordinate system by an angle about a fixed point; it is used here so that all images have the same frontal viewing direction. Cropping removes the outer parts of an image to improve framing, accentuate the subject, or change the aspect ratio; unneeded parts of the images are removed so that processing focuses on the facial region. Histogram equalization usually increases the global contrast of an image, especially when the usable data of the image are represented by closely spaced intensity values; by spreading out the most frequent intensity values, it distributes the intensities better over the histogram and raises the contrast in areas of low local contrast. Contrast differences between images are corrected with this step. Scaling is a linear transformation that enlarges or shrinks objects; the resolution of the images is adjusted to 320 × 240 by scaling before the GWT. Sample preprocessing steps are shown in Fig. 2. MATLAB is used to implement all processes in this study.

Fig. 2

Preprocessing of images. a Crop b Scale (from 486 × 360 to 320 × 240) c Convert RGB image to grayscale d Histogram Equalization
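A minimal MATLAB sketch of the preprocessing chain described above is given below. The input file name, rotation angle, and crop rectangle are illustrative assumptions (they are not reported in the text), and the image functions require the Image Processing Toolbox.

```matlab
% Preprocessing sketch; file name, rotation angle and crop rectangle are
% illustrative assumptions only.
rgb  = imread('subject01.jpg');           % hypothetical input photograph
rgb  = imrotate(rgb, -3, 'bilinear');     % align the face to a frontal pose
rgb  = imcrop(rgb, [60 40 485 359]);      % keep the facial region (486 x 360)
gray = rgb2gray(rgb);                     % convert color image to grayscale
gray = histeq(gray);                      % histogram equalization
gray = imresize(gray, [240 320]);         % scale to 320 x 240 (width x height)
imwrite(gray, 'subject01_preprocessed.png');
```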

Feature extraction of images

Gabor wavelets

Because of their robustness against local distortions caused by variations in expression, illumination, and pose, Gabor features have been successfully applied to face recognition [9]. Among the various wavelet bases, Gabor functions provide the optimal resolution in both the spatial and frequency domains [10, 11]. For the following reasons, Gabor wavelets appear to be an optimal basis for extracting local features for pattern recognition:

  • Biological reason: The shapes of Gabor wavelets are similar to the receptive fields of simple cells in the primary visual cortex [11].

  • Mathematical reason: The Gabor wavelets are optimal for measuring local spatial frequencies [12, 13].

  • Experimental reason: Gabor wavelets have been found to yield distortion tolerant feature spaces for other pattern recognition tasks, including texture segmentation [14, 15], handwritten numeral recognition [16] and fingerprint recognition [17].

The normalized elementary Gabor function was defined by Gabor [10] as in Eq. 1; it is a Gaussian modulated by a sinusoidal signal.

$$ \varphi (t) = \frac{{\left| {{f_0}} \right|}}{{\gamma \sqrt {\pi } }}\exp ( - \frac{{{f_0}^2}}{{{\gamma^2}}}{t^2})\exp ( - j2\pi {f_0}t) $$
(1)

where α is the sharpness of the Gaussian and \( f_0 \) is the center frequency of the sinusoidal signal. A constant ratio \( \gamma = f_0/\alpha \) is defined so that the functions at different frequencies behave as scaled versions of each other.

The two-dimensional counterpart of the Gabor elementary function was introduced by Granlund [18]. It can be derived directly from Eq. 1 by replacing t with the spatial coordinates (x, y), denoting the sharpness of the Gaussian along the y axis by β, and defining the ratio to the central frequency as η = f/β. The 2D Gabor wavelet can then be defined as in Eq. 2.

$$ \varphi (x,y) = \frac{{{f^2}}}{{\pi \gamma \eta }}\exp ( - (\frac{{{f^2}}}{{{\gamma^2}}}{x_r}^2 + \frac{{{f^2}}}{{{\eta^2}}}{y_r}^2))\exp (j2\pi f{x_r}) $$
(2)

where \( {x_r} = x\cos \theta + y\sin \theta, \quad {y_r} = - x\sin \theta + y\cos \theta \)

The 2D Gabor wavelet defined in Eq. 2 has the Fourier transform given in Eq. 3.

$$ \Phi (u,v) = \exp ( - {\pi^2}(\frac{{{\gamma^2}}}{{{f^2}}}{({u_r} - f)^2} + \frac{{{\eta^2}}}{{{f^2}}}{v_r}^2)) $$
(3)

where \( {u_r} = u\cos \theta + v\sin \theta, \quad {v_r} = - u\sin \theta + v\cos \theta \)
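As a concrete illustration, the following MATLAB sketch samples the spatial-domain wavelet of Eq. 2 on a discrete grid; the function and its example parameter values are our assumptions, not part of the original implementation.

```matlab
% gabor_kernel.m -- sketch of the 2D Gabor wavelet in Eq. 2.
% f: center frequency (cycles/pixel), theta: orientation (rad),
% gamma, eta: sharpness ratios, ksize: kernel size in pixels.
function g = gabor_kernel(f, theta, gamma, eta, ksize)
    c      = (1:ksize) - (ksize + 1)/2;        % centered sample positions
    [x, y] = meshgrid(c, c);
    xr     =  x*cos(theta) + y*sin(theta);     % rotated coordinate x_r
    yr     = -x*sin(theta) + y*cos(theta);     % rotated coordinate y_r
    g = (f^2/(pi*gamma*eta)) ...
        .* exp(-((f^2/gamma^2).*xr.^2 + (f^2/eta^2).*yr.^2)) ...
        .* exp(1j*2*pi*f.*xr);                 % complex sinusoidal carrier
end
```

For example, gabor_kernel(0.25, pi/4, sqrt(2), sqrt(2), 10) returns a 10 × 10 complex kernel oriented at 45 degrees.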

A remarkable equivalence between 2D Gabor functions and the organization and characteristics of the mammalian visual system was presented by Daugman [11], who generalized the time-frequency resolution uncertainty to the 2D domain. Okajima showed that Gabor-type receptive fields can extract the maximum information from local image regions [19]. To relate the different definitions of Gabor wavelets, a wave vector \( \overrightarrow k = 2\pi f{e^{j\theta }} \) is defined to represent the central frequency components in the frequency domain. Setting \( \gamma = \eta = \frac{\sigma }{{\sqrt {2} \pi }} \), i.e., \( \alpha = \beta = \frac{{\sqrt {2} \pi f}}{\sigma } \), the Gabor wavelet located at position \( \overrightarrow z = (x,y) \) can be defined with Eq. 4.

$$ \varphi (\overrightarrow z ) = \frac{1}{{2\pi }}\frac{{{{\left\| {\overrightarrow k } \right\|}^2}}}{{{\sigma^2}}}\exp ( - \frac{{{{\left\| {\overrightarrow k } \right\|}^2}{{\left\| {\overrightarrow z } \right\|}^2}}}{{2{\sigma^2}}})\exp (i\overrightarrow k \cdot \overrightarrow z ) $$
(4)

where \( \left\| \cdot \right\| \) denotes the norm. A family of U × V Gabor wavelets is usually required to extract features from images, since the local frequency and orientation are unknown [20–22]. The filter family is obtained with Eq. 5.

$$ \left\{ {{\varphi_{{({f_u},{\theta_v},\gamma, \eta )}}}(x,y)} \right\},{f_u} = \frac{{{f_{{\max }}}}}{{{{\sqrt {2} }^u}}} $$
(5)

where \( {\theta_v} = \frac{v}{8}\pi \), \( u = 0,...,U - 1 \), and \( v = 0,...,V - 1 \). Here \( f_u \) and \( \theta_v \) define the scale and orientation of the Gabor wavelets, \( f_{\max } \) is the maximum frequency, and \( \sqrt {2} \) (half an octave) is the spacing factor between adjacent center frequencies. Gabor wavelets with five scales (U = 5) and eight orientations (V = 8) are used in most face applications. The whole set of 40 Gabor wavelets is shown in Fig. 3: Fig. 3a shows the magnitudes at five scales, while Fig. 3b shows the real parts. The wavelets exhibit the desirable characteristics of orientation selectivity, spatial frequency selectivity, and spatial locality [9].

Fig. 3

The set of 40 Gabor wavelets. a The magnitude at five scales b The real parts at five scales and eight orientations
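A sketch of the 5 × 8 filter bank of Eq. 5, reusing the gabor_kernel sketch given earlier, is shown below; f_max, γ, and η are assumed values, since they are not reported in the text.

```matlab
% Build the U = 5 by V = 8 Gabor bank of Eq. 5 (assumed parameter values).
U = 5;  V = 8;
f_max = 0.25;                     % assumed maximum center frequency
gamma = sqrt(2);  eta = sqrt(2);  % assumed sharpness ratios
ksize = 10;                       % 10 x 10 wavelets, as stated in the text
bank  = cell(U, V);
for u = 0:U-1
    for v = 0:V-1
        f_u     = f_max / sqrt(2)^u;    % half-octave spacing between scales
        theta_v = v*pi/8;               % eight orientations
        bank{u+1, v+1} = gabor_kernel(f_u, theta_v, gamma, eta, ksize);
    end
end
```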

Gabor wavelet representation

The Gabor wavelet representation of an image is the convolution of the image with the family of Gabor wavelets defined by Eq. 5. The result of the convolution of an image with a Gabor wavelet is given by Eq. 6.

$$ {G_{{u,v}}}(\overrightarrow z ) = I * {\varphi_{{({f_u},{\theta_v},\gamma, \eta )}}}(\overrightarrow z ) $$
(6)

where \( \overrightarrow z = (x,y) \) and \( {\varphi_{{({f_u},{\theta_v},\gamma, \eta )}}}(\overrightarrow z ) \) denotes the Gabor wavelet with orientation \( \theta_v \) and central frequency \( f_u \). When a set of 40 complex Gabor wavelets is used (five scales and eight orientations), the local features at a convolution point \( \overrightarrow z \) can be represented by the set of convolution results at that point, which contains important information at different orientations and scales. Such a feature \( J(\overrightarrow z ) \) is usually called a Gabor jet [20, 23] and is defined by Eq. 7.

$$ J(\overrightarrow z ) = \left\{ {{J_j}(\overrightarrow z ),j = 0,1, \ldots, 39} \right\} $$
(7)

where \( {J_j}(\overrightarrow z ) = {G_{{u,v}}}(\overrightarrow z ) \) is complex, with \( j = v + 8u \), \( u = 0,...,4 \), \( v = 0,...,7 \). \( J_j \) can also be written as \( {J_j} = {a_j}\exp (i{\phi_j}) \) with magnitude \( a_j \) and phase \( \phi_j \), both of which contain very important information. In face recognition, a face image can be represented either by the Gabor jets extracted at some pre-defined feature points or by the full convolution with all of the Gabor wavelets. The convolution of a Down syndrome face image with the 40 Gabor wavelets is shown in Fig. 4; the magnitude and real parts are shown in Fig. 4a and b, respectively.

Fig. 4

Gabor wavelet representation of a face image. a magnitude b the real part

In this study, the resolution of the Gabor wavelets is set to 10 × 10 in order to decrease the overlap. All images in the database are convolved with the 40 Gabor wavelets, and the magnitudes of the convolution results for each image are collected in a feature vector. The number of components per feature vector is reduced from 3,072,000 to 122,880 by downsampling the rows and columns by a factor of 5.
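A sketch of this feature extraction step (Eqs. 6 and 7 followed by the 1/5 downsampling) is given below; it assumes the bank built in the previous sketch and a hypothetical file name for one preprocessed image.

```matlab
% Feature extraction sketch for one 240 x 320 preprocessed image.
I    = im2double(imread('subject01_preprocessed.png'));  % hypothetical file
feat = [];
for u = 1:5
    for v = 1:8
        G    = conv2(I, bank{u, v}, 'same');  % complex response G_{u,v}(z), Eq. 6
        mag  = abs(G);                        % jet magnitudes a_j, Eq. 7
        mag  = mag(1:5:end, 1:5:end);         % 1/5 row and column downsampling
        feat = [feat; mag(:)];                %#ok<AGROW> 40 x 48 x 64 = 122,880
    end
end
```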

Preparation of training and test samples

In K-fold cross-validation, the original sample of Down syndrome and healthy individuals is randomly partitioned into K subsamples. A single subsample is retained as the validation data for testing the model, and the remaining K−1 subsamples are used as the training data. The cross-validation process is repeated K times, with each of the K subsamples used exactly once as the validation data. The advantage of this method over repeated random subsampling is that all observations are used for both training and validation, and each observation is used for validation exactly once [24].

We repeat the classification process five times in order to vary the contents of the subsamples in each repetition. We tried values of K between 2 and 15. As K increases, the size of the subsamples decreases; as a result, the number of validation samples decreases and the classification performance is affected negatively. We therefore decided to use about 10–12 images as training data and to retain 3–5 images as validation data. To achieve this, 3 and 4 were chosen as the values of K in this study.
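A sketch of this K-fold partitioning (shown for K = 3) is given below; X is assumed to hold one 122,880-component feature vector per row and y the class labels (1 = Down syndrome, 0 = healthy).

```matlab
% Stratified K-fold partitioning sketch (assumed variable names X and y).
K  = 3;
cv = cvpartition(y, 'KFold', K);            % K-fold split stratified by label
for k = 1:K
    Xtrain = X(training(cv, k), :);  ytrain = y(training(cv, k));
    Xtest  = X(test(cv, k), :);      ytest  = y(test(cv, k));
    % feature selection, PCA/LDA and classification are applied per fold
end
```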

Feature selection of images

The 2000 most valuable of the 122,880 components are selected for the training set according to the pooled variance of the feature vectors, and the reduced feature vectors are used in PCA. Pooled variance is a method for estimating the variance of several samples taken under different circumstances [25]: the mean may vary between samples, but the true variance (equivalently, precision) is assumed to remain the same. The pooled variance \( s_p^2 \) is calculated with Eq. 8.

$$ s_p^2 = \frac{{\sum\limits_{{i = 1}}^k {\left( {({n_i} - 1)s_i^2} \right)} }}{{\sum\limits_{{i = 1}}^k {\left( {{n_i} - 1} \right)} }} $$
(8)

where \( n_i \) is the sample size of the ith sample, \( s_i^2 \) is the variance of the ith sample, and k is the number of samples being combined. The weights \( n_i - 1 \) rather than \( n_i \) are used because the variances are estimated from samples.
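Since the exact ranking rule is not spelled out in the text, the sketch below is only one plausible reading: each of the 122,880 components is scored by the absolute between-class mean difference divided by its pooled standard deviation (Eq. 8 with k = 2), and the 2000 best-scoring components are kept. The variable names carry over from the cross-validation sketch.

```matlab
% Hedged sketch of the pooled-variance feature selection (Eq. 8, k = 2).
Xd = Xtrain(ytrain == 1, :);    Xh = Xtrain(ytrain == 0, :);
n1 = size(Xd, 1);               n2 = size(Xh, 1);
s2p   = ((n1-1)*var(Xd) + (n2-1)*var(Xh)) ./ (n1 + n2 - 2);  % pooled variance
score = abs(mean(Xd) - mean(Xh)) ./ sqrt(s2p + eps);         % assumed criterion
[~, order] = sort(score, 'descend');
sel        = order(1:2000);                % keep the 2000 most valuable components
Xtrain_sel = Xtrain(:, sel);    Xtest_sel = Xtest(:, sel);
```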

Statistical analysis of features

While analytic approaches compare salient facial features detected from the face, holistic approaches make use of information derived from the whole face. PCA is a typical holistic method; it is a statistical technique based on the Karhunen–Loève transform. Turk and Pentland [26] developed the well-known eigenface method for face representation and recognition using PCA. PCA achieves the optimal representation in the mean-square-error sense, but the differences between faces of different people are more significant for face recognition [27]. Based on this observation, LDA [28] is applied in the Fisherface [29] method. LDA defines a projection that makes the within-class scatter small and the between-class scatter large; PCA is normally adopted to reduce the feature dimension before LDA is applied [30]. In this study, PCA is applied to the 2000 most valuable components to reduce the dimension to 9. Since the ultimate objective is to achieve classification with as few features as possible, the data are then projected into the LDA space and the dimension is reduced to 2. The classification process is thus performed with two features.
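A sketch of this PCA-then-LDA reduction is given below, continuing from the selected features of the previous sketch. With two classes the between-class scatter has rank one, so retaining two LDA features, as stated above, simply keeps the two leading generalized eigenvectors; this is our reading, not a detail reported in the text.

```matlab
% PCA to 9 components, then a Fisher (LDA) projection to 2 features.
[coeff, ~] = pca(Xtrain_sel);                      % PCA fitted on training data
P9  = coeff(:, 1:9);
mu  = mean(Xtrain_sel);
Ztr = (Xtrain_sel - mu) * P9;                      % 9-D training representation
Zte = (Xtest_sel  - mu) * P9;                      % 9-D test representation

m1 = mean(Ztr(ytrain == 1, :));  m0 = mean(Ztr(ytrain == 0, :));
Sw = cov(Ztr(ytrain == 1, :)) + cov(Ztr(ytrain == 0, :));  % within-class scatter
Sb = (m1 - m0)' * (m1 - m0);                               % between-class scatter
[W, D]   = eig(Sb, Sw);                            % generalized eigenproblem
[~, idx] = sort(real(diag(D)), 'descend');
W2  = W(:, idx(1:2));                              % two LDA features
Ftr = Ztr * W2;    Fte = Zte * W2;
```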

The statistical evaluation of classification

Classification methods

The kNN and SVM methods, which are commonly used in pattern recognition applications [31, 32], are used in the classification process. The Euclidean distance metric is chosen for the kNN method. The value of k is tried as 1, 3, 5, 7, and 9; similar results are obtained because of the low variance between classes, and the results in the tables are reported for k = 3. A linear kernel is used for the SVM method.
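A sketch of the two classifiers on the two LDA features from the previous sketch is given below (Statistics and Machine Learning Toolbox); the variable names carry over from the earlier sketches.

```matlab
% 3-NN with Euclidean distance and a linear-kernel SVM on the LDA features.
knnModel = fitcknn(Ftr, ytrain, 'NumNeighbors', 3, 'Distance', 'euclidean');
svmModel = fitcsvm(Ftr, ytrain, 'KernelFunction', 'linear');
yhat_knn = predict(knnModel, Fte);
yhat_svm = predict(svmModel, Fte);
```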

Performance and correctness measures

The classification methods are compared using different performance and correctness measures [33]. First, the confusion matrix is defined. The confusion matrix has two rows and two columns and reports the number of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) in predictive analysis as:

$$ {\text{Confusion}}\;{\text{Matrix}} = \left[ {\begin{array}{*{20}{c}} {TP} & {FN} \\ {FP} & {TN} \\ \end{array} } \right] $$

Definitions are given below for our study.

  • True positive (TP): people with Down syndrome correctly diagnosed as having Down syndrome

  • False positive (FP): healthy people incorrectly identified as having Down syndrome

  • True negative (TN): healthy people correctly identified as healthy

  • False negative (FN): people with Down syndrome incorrectly identified as healthy.

The accuracy is the rate of true results (both true positives and true negatives) in the population. The classification accuracy is defined by Eq. 9.

$$ {\text{Accuracy}} = \frac{{TP + TN}}{{TP + FP + FN + TN}} $$
(9)

The precision, or positive predictive value (PPV), is defined as the ratio of the true positives to all positive results (both true positives and false positives). It is calculated with Eq. 10.

$$ {\text{Precision}} = \frac{{TP}}{{TP + FP}} = {\text{Positive}}\;{\text{Predictive}}\;{\text{Value}}\;\left( {\text{PPV}} \right) $$
(10)

The negative predictive value (NPV) is a summary statistic used to describe the performance of a diagnostic testing procedure. It is defined as the proportion of subjects with a negative test result who are correctly diagnosed. It is given in Eq. 11.

$$ {\text{Negative}}\;{\text{Predictive}}\;{\text{Value}}\;\left( {\text{NPV}} \right) = \frac{{TN}}{{TN + FN}} $$
(11)

Sensitivity and specificity are statistical measures of the performance of a binary classification test. Test sensitivity is the ability of a test to correctly identify those with the disease, whereas test specificity is the ability of the test to correctly identify those without the disease.

Sensitivity (also called recall in some fields) measures the proportion of actual positives that are correctly identified (e.g., the percentage of people with Down syndrome who are correctly identified). Sensitivity is defined by Eq. 12.

$$ {\text{Sensitivity}} = {\text{Recall}} = {\text{True}}\;{\text{Positive}}\;{\text{Rate}}\;\left( {\text{TPR}} \right) = \frac{{TP}}{{TP + FN}} $$
(12)

Specificity measures the proportion of negatives which are correctly identified (e.g. the percentage of healthy people who are correctly identified). Specificity is obtained with Eq. 13.

$$ {\text{Specificity}} = {\text{True}}\;{\text{Negative}}\;{\text{Rate}}\;\left( {\text{TNR}} \right) = \frac{{TN}}{{TN + FP}} $$
(13)
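The measures of Eqs. 9–13 can be computed directly from the predicted and true labels; the short sketch below uses the kNN predictions from the classifier sketch above (1 = Down syndrome, 0 = healthy).

```matlab
% Performance measures (Eqs. 9-13) from predicted and true labels.
TP = sum(yhat_knn == 1 & ytest == 1);
FP = sum(yhat_knn == 1 & ytest == 0);
TN = sum(yhat_knn == 0 & ytest == 0);
FN = sum(yhat_knn == 0 & ytest == 1);
accuracy    = (TP + TN) / (TP + FP + FN + TN);   % Eq. 9
precision   = TP / (TP + FP);                    % Eq. 10 (PPV)
npv         = TN / (TN + FN);                    % Eq. 11
sensitivity = TP / (TP + FN);                    % Eq. 12 (recall, TPR)
specificity = TN / (TN + FP);                    % Eq. 13 (TNR)
```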

Results

The classification process is repeated five times in order to vary the contents of the subsamples in the K-fold cross-validation. As the number of folds increases, classification performance is affected negatively; therefore, 3 and 4 were chosen as the values of K in this study. As can easily be seen from Tables 2 and 3, the number of folds is important for both the kNN and SVM classifiers.

The test results are presented in confusion-matrix format in Table 1; the true positive, true negative, false positive, and false negative values can be read from this table.

Table 1 Confusion matrices: \( \left[ {\begin{array}{*{20}{c}} {TP} & {FN} \\ {FP} & {TN} \\ \end{array} } \right] \)

The performance and correctness measures obtained in this study are presented in Tables 2 and 3. The averages of the results are taken as the overall performance of the classification methods. Because of the low variance between classes, similar results are obtained for the different k values of kNN; the tabulated results correspond to k = 3. A linear kernel is chosen for the SVM method.

Table 2 Results about performance and correctness measures (for K = 3)
Table 3 Results about performance and correctness measures (for K = 4)

A comparison of this method with other methods with respect to the performance and correctness measures is given in Table 4. Our database contains 30 face images: 15 of individuals with Down syndrome and 15 of healthy individuals. The classification accuracy is 96% and 97.34%, while the sensitivity and specificity of the system are 0.96 and 0.973, with the kNN and SVM methods, respectively. The PPV and NPV values are 0.961 and 0.974. As can be seen from the results in Table 4, the accuracy of the developed method is better than that of the other studies. Furthermore, while the NPV and sensitivity of the developed method are close to those of the LBP method, the PPV and specificity were found to be better on our custom face database.

Table 4 Comparison table of methods according to the performance and correctness measures

Discussion and conclusion

Down syndrome is one of the dysmorphic disorders with the most pronounced signs on the face. In this study, the feature vectors are obtained with Gabor wavelets, which can simulate the human visual system and are robust against local distortions caused by variations in expression, illumination, and pose. The feature vectors are classified correctly by the kNN and SVM algorithms; thus, images of patients with Down syndrome can be separated from images of healthy people.

In this study, selection of the most valuable features is performed for the first time; this distinguishes it from previous studies on discriminating between individuals with Down syndrome and healthy individuals.

Classification of facial dysmorphic disorders will provide a preliminary diagnosis for medical doctors with less experience in dysmorphology. Our studies on the classification of dysmorphic syndromes with 2D and 3D face recognition algorithms are ongoing. In future work, we plan to expand the database with images of different syndromes, which should allow more accurate classification. We also expect that more successful results will be achieved with the addition of other feature selection and classification methods.