1 Introduction

The constitution in Traditional Chinese medicine (TCM) refers to the relatively stable traits with which an individual's body responds to its internal and external environment. It is a comprehensive state of morphological structure, psychological state and physiological function formed on the basis of congenital inheritance. As a concept, it combines the TCM discourse on human physique phenomena with the understanding of physique in many other disciplines and the goals of medical research [10]. Constitution is an important manifestation of human life phenomena and is characterized by individual differences, relative stability and dynamic variability [53, 54].

Constitution classification is the basis and core of constitution research in TCM. Its purpose is to standardize human constitution categories so that personalized conditioning plans can be given for different constitution types. It is therefore especially important to identify each person's constitution type accurately. The existing identification method is a questionnaire survey: the individual fills in the scale of the national standard “Classification and Judgment of TCM Constitution” (ZYYXH/T157.2009) [5], the completed scale is scored according to the rules provided by the standard, and the individual's constitution type is determined from the scores. This scale-based approach has several shortcomings in clinical practice:

1) The results are strongly affected by the individual's subjective factors. First, individuals are unfamiliar with some questions and find it difficult to choose an accurate answer. Second, individuals have concerns about some questions and are reluctant to give truthful answers.

2) The scale contains a relatively large number of questions. Answering them all takes a long time, so individuals easily lose patience while filling in the constitution scale, and the later questions are often answered at random, which inevitably affects the correct judgment of constitution.

3) The scoring formula is complicated, and for many individuals the constitution type cannot be calculated accurately [33, 60]. The raw score and the converted score must first be calculated from the scale, and the converted score is then compared with the score interval of each constitution.
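To make shortcoming 3) concrete, the sketch below implements the scale arithmetic in Python. It assumes the widely cited form of the rule: converted score = (raw score - number of items) / (number of items × 4) × 100 for answers scored 1 to 5, with cut-offs of 40 and 30 for biased constitutions, and 60 (with every other constitution below 30, or below 40) for gentleness. These thresholds and the function names are assumptions to be checked against ZYYXH/T157, and reverse-scored items are assumed to have been flipped already.

```python
# Sketch of questionnaire scoring. ASSUMPTION: formula and cut-offs follow the
# commonly cited CCMQ rules; verify against the standard before any real use.

def converted_score(item_scores):
    """Convert raw 1-5 answers for one constitution into a 0-100 score."""
    n = len(item_scores)
    raw = sum(item_scores)  # raw score = sum of the item scores
    return (raw - n) / (n * 4) * 100


def judge_biased(score):
    """Judgment for a biased (non-gentleness) constitution."""
    if score >= 40:
        return "yes"
    if score >= 30:
        return "tendency"
    return "no"


def judge_gentleness(gentle_score, other_scores):
    """Gentleness needs a high own score AND low scores for all other types."""
    if gentle_score >= 60 and all(s < 30 for s in other_scores):
        return "yes"
    if gentle_score >= 60 and all(s < 40 for s in other_scores):
        return "basically yes"
    return "no"
```

For example, eight hypothetical answers `[3, 4, 3, 3, 2, 4, 3, 3]` give a raw score of 25 and a converted score of (25 - 8) / 32 × 100 = 53.125, which `judge_biased` reports as "yes". Every person must be scored this way for all constitutions before any type can be assigned.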

To address these problems, machine learning algorithms, including convolutional neural networks (CNNs), have been applied to constitution recognition [18, 29, 35]. In particular, CNNs have led a series of breakthroughs in image classification [12, 17, 19, 59]. However, training a CNN from scratch is not easy: it takes a long time, requires patience and expertise in training neural networks [46], and above all requires a large amount of labeled training data. The hospital currently holds only medical records and prescription data in TCM and no image data, so image data must be collected anew. Moreover, having TCM experts label large amounts of data is very expensive and cannot be done in a short time. Finally, if a CNN is trained directly on a small clinical dataset, its accuracy is poor and it may even over-fit. We therefore propose a constitution recognition method based on transfer learning, called transfer constitution recognition (TCR). The method first takes the DenseNet-169 [19] model trained on ImageNet [9] and modifies its network, then retrains the modified network on a small clinical face dataset, and finally combines multiple models to determine the individual's constitution type. The main contributions of this paper are as follows:

1) This paper constructs a constitution identification network called ConstitutionNet. First, the DenseNet-169 model pre-trained on ImageNet is taken as the starting point. Second, the DenseNet-169 structure is modified according to the characteristics of constitution, and the modified model is trained on the clinical dataset to recognize the constitution type. ConstitutionNet achieves better constitution recognition accuracy than the other single models compared in this paper.

2) To further improve classification accuracy, this paper applies ensemble (integrated) learning, combining ConstitutionNet with VGG-16 [40], Inception v3 [44] and DenseNet-121 [19] to determine the constitution type of the input image, which further improves the accuracy of constitution recognition.

The rest of this paper is organized as follows. In Section 2, we briefly present the related work. Section 3 details the proposed method. Experimental results as well as the discussion are given in Section 4, and Section 5 concludes this paper.

2 Related work

The commonly used criteria for constitution type are determined by constitution questionnaires developed by Wang [52] in mainland China, Su [30,31,32, 41, 42] in Taiwan and Wang [57] in Hong Kong. Wang et al. [52] divided constitution into nine types: gentleness, qi-deficiency, qi-depression, dampness-heat, phlegm-dampness, blood-stasis, special-diathesis, yang-deficiency and yin-deficiency. Su et al. [43] studied the acoustic characteristics of eight different constitutions and applied them to constitution recognition. Wang et al. [55] classified constitution from the pulse with a BP neural network and demonstrated the rationality and superiority of this method.

A convolutional neural network (CNN) is a type of feed-forward neural network consisting of convolutional, pooling and fully connected layers. Owing to its outstanding performance, the CNN is widely used in many fields, such as image classification [56, 62], object detection [27, 63], image segmentation [13, 34] and visual tracking [24,25,26]. Hu et al. [16] applied a CNN to pulse diagnosis; when features were ambiguous, their method outperformed other well-known methods. Li et al. [28] used a CNN to extract pulse features and then classified the body constitution type, obtaining high accuracy. Huan et al. [18] proposed a CNN-based constitution recognition algorithm that trained a CNN model for constitution recognition on face data. Li et al. [29] proposed a deep neural network-based constitution recognition algorithm that first detects the tongue image and then determines the body constitution type. Ma et al. [35] proposed a complex perception-based constitution recognition algorithm using tongue images.

Transfer learning addresses the difficulty of collecting enough training data: it transfers knowledge learned from a source domain with abundant data to a target domain with less data. CNN-based transfer learning has been used in many fields [14, 20, 49]. Burdick et al. [3] applied transfer learning to skin lesion segmentation and obtained good classification results. Kermany et al. [22] used transfer learning to build a diagnostic tool for screening patients with common treatable blinding retinal diseases. Rajpurkar et al. [37] proposed CheXNet for pneumonia detection from chest X-ray images; it was built with transfer learning on the DenseNet-121 model.

3 Method

The proposed algorithm consists of four main parts: (1) data acquisition, (2) data preprocessing, (3) data augmentation and (4) constitution recognition through transfer learning. Fig. 1 shows the flow chart of the whole algorithm. First, the clinical face dataset is collected and preprocessed. Then, data augmentation is applied to the preprocessed images to obtain the training data. Finally, the training data are used for constitution recognition with transfer learning. The following sections describe each module of the architecture in detail.

Fig. 1

The flow chart of the whole algorithm

3.1 Data acquisition and preprocessing

The clinical face training dataset used in this paper has 12,730 images, each labeled with a constitution type judged by clinical TCM experts according to Professor Wang's judgment criteria [52]. Before data collection, the standard was discussed by nearly ten medical experts: some agreed with it, some partially agreed, and some disagreed. We chose three professors who agreed with the standard, so that they shared a consensus on how to determine the body constitution type. Each of them then judged patients' constitutions in a different hospital according to the standard, so that all patients in the same hospital were judged by the same professor; this reduces the influence of differing experience as much as possible. In addition, these professors are well known and close in age, so their personal experience does not differ greatly. The entire dataset was thus labeled by three TCM professors from three different hospitals according to the above standard.

All face images were taken with the same type of digital device, and each patient's constitution type was specified by the doctor. The images were taken indoors without sunlight, under ordinary fluorescent lighting. The face database contains 8 constitution types: gentleness, qi-deficiency, qi-depression, dampness-heat, phlegm-dampness, blood-stasis, yang-deficiency and yin-deficiency. The number of samples of each type is shown in Table 1, and an example of each constitution is shown in Fig. 2. In preprocessing, a face detection algorithm is first applied to each acquired picture to obtain the face bounding box. Considering both time complexity and precision, this paper uses the OpenCV tool for face detection.

Table 1 Number of samples of each constitution type
Fig. 2

Example of each constitution type

3.2 Data augmentation

The dataset collected in this paper is limited in size. Data augmentation can both enlarge the dataset and reduce over-fitting, so it is applied to the collected dataset. The original images are preprocessed in the training phase, and each image is 224 × 224 in size. During augmentation, each image is randomly shifted horizontally and vertically by a fraction of its width and height, and randomly zoomed. This paper implements data augmentation with the ImageDataGenerator class of Keras [7], setting only its width_shift_range, height_shift_range and zoom_range parameters. The augmented images are then used for training with transfer learning.
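The augmentation step can be sketched with the Keras `ImageDataGenerator` class named above, using the 0.2 ranges and the batch size of 30 reported in Section 4.1. The helper `augmented_batches` is hypothetical, and all other generator arguments (rescaling, flips and so on) are left at their defaults because the paper does not mention them. In recent TensorFlow releases this class is deprecated in favor of preprocessing layers such as `RandomTranslation` and `RandomZoom`.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Ranges of 0.2 as reported in Section 4.1; other arguments left at defaults.
augmenter = ImageDataGenerator(
    width_shift_range=0.2,   # random horizontal shift, up to 20% of the width
    height_shift_range=0.2,  # random vertical shift, up to 20% of the height
    zoom_range=0.2,          # random zoom factor drawn from [0.8, 1.2]
)


def augmented_batches(images: np.ndarray, labels: np.ndarray, batch_size: int = 30):
    """Yield augmented (x, y) batches from 224x224 RGB face crops (hypothetical helper).

    images: array of shape (n, 224, 224, 3); labels: one-hot array of shape (n, 8).
    """
    return augmenter.flow(images, labels, batch_size=batch_size, shuffle=True)
```

Because each batch is re-sampled on the fly, every epoch sees slightly different versions of the same faces, which is what lets a small clinical dataset behave like a larger one.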

3.3 Classifier architecture

The available clinical data are limited. Rather than training an entire CNN on this small dataset, we use transfer learning, which exploits features previously learned from a larger dataset. We propose a new constitution identification network (ConstitutionNet), shown in Fig. 3. It starts from the DenseNet-169 model pre-trained on ImageNet and modifies it according to the characteristics of constitution as follows:

1) The final fully connected output layer performs eight-way classification (gentleness, qi-deficiency, qi-depression, dampness-heat, phlegm-dampness, blood-stasis, yang-deficiency and yin-deficiency) instead of the 1000 classes originally designed for the ImageNet dataset.

2) Google's Inception block [21] and ResNet's residual block [12] are added to DenseNet-169 before the fully connected layer. The Inception block increases the width of the network; because its branches have different receptive fields, it captures multi-scale information while keeping the number of parameters low. The residual block allows the network to grow deeper without gradient degradation.

Fig. 3

The network structure of ConstitutionNet
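The two modifications can be sketched in Keras as follows. The exact layout of the added blocks is given only in Fig. 3, so the branch structure, the filter counts (128 per Inception branch, 512 in the residual block), the global average pooling and the function names are assumptions; only the DenseNet-169 backbone, the 8-way softmax output and the SGD settings of Section 4.1 come from the paper. Whether any backbone layers were frozen during retraining is not stated, so all layers are left trainable here.

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

NUM_CLASSES = 8  # the eight constitution types of Section 3.1


def inception_block(x, filters=128):
    """Inception-style block (ASSUMED layout): parallel branches, different receptive fields."""
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)  # 1x1 reduction
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b3)
    b5 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)  # 1x1 reduction
    b5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(b5)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(filters, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])  # 4 * filters channels


def residual_block(x, filters=512):
    """Residual block (ASSUMED layout): two 3x3 convolutions plus a shortcut."""
    # 1x1 projection so the shortcut always has `filters` channels.
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    return layers.Activation("relu")(layers.Add()([shortcut, y]))


def build_constitution_net(weights="imagenet"):
    """Hypothetical ConstitutionNet sketch on a DenseNet-169 backbone."""
    base = tf.keras.applications.DenseNet169(
        include_top=False, weights=weights, input_shape=(224, 224, 3))
    x = inception_block(base.output)  # added before the classifier
    x = residual_block(x, filters=512)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)  # 8 classes, not 1000
    model = Model(base.input, outputs, name="constitution_net_sketch")
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=2e-4, momentum=0.9),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

`build_constitution_net()` downloads the ImageNet weights; passing `weights=None` builds the same graph without them, which is convenient for checking shapes before training.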

3.4 Integrated constitution identification

To further improve classification accuracy, we adopt ensemble (integrated) learning. Different classification models perform differently on different categories, so combining complementary models can improve classification. Based on experimental comparison, this paper integrates ConstitutionNet with VGG-16, Inception v3 and DenseNet-121. First, VGG-16, Inception v3 and DenseNet-121 are each trained through transfer learning; all models are implemented and trained with Keras. Second, at test time, each face image is fed into the VGG-16, Inception v3, DenseNet-121 and ConstitutionNet models, which separately compute the probability of each constitution. Third, for each constitution, the four probabilities from the four models are averaged to obtain its final probability. Finally, the constitution type with the maximum final probability is taken as the recognized type of the input image. Fig. 4 shows the structure of this integrated recognition, which improves the accuracy of constitution recognition.

Fig. 4

The structure diagram for integrated constitution recognition
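The averaging rule described above (soft voting) takes only a few lines of NumPy. The sketch assumes each model has already produced an `(n_images, 8)` array of softmax probabilities and that classes follow the order listed in Section 3.1; `ensemble_predict` and `CONSTITUTIONS` are hypothetical names.

```python
import numpy as np

# ASSUMED class order, taken from the list in Section 3.1.
CONSTITUTIONS = ["gentleness", "qi-deficiency", "qi-depression", "dampness-heat",
                 "phlegm-dampness", "blood-stasis", "yang-deficiency", "yin-deficiency"]


def ensemble_predict(prob_list):
    """Average per-model softmax outputs and pick the most probable class.

    prob_list: one array of shape (n_images, n_classes) per model,
    e.g. VGG-16, Inception v3, DenseNet-121 and ConstitutionNet.
    Returns (mean probabilities, predicted class indices).
    """
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs, np.argmax(mean_probs, axis=1)
```

In practice the inputs would be, for example, `[m.predict(x) for m in models]`, with each model's own input preprocessing applied first. Averaging can change a decision: if two models give [0.5, 0.3, 0.2] and [0.1, 0.6, 0.3] for one image, the mean is [0.3, 0.45, 0.25], so class 1 wins even though the first model alone would choose class 0.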

4 Experiments

In this section, we conduct a series of experiments to measure the effectiveness of applying transfer learning to body constitution recognition. The details are described below.

4.1 Experiment settings

The experiments are implemented with Keras [7], TensorFlow [48], Scikit-learn [39] and Scikit-image [51]. The GPU is an NVIDIA GTX Titan X with 12 GB of memory, and the operating system is Ubuntu 14.04. The face training dataset contains 12,370 images in total, and the test dataset contains 533 images. The whole network is trained with stochastic gradient descent (SGD) with a learning rate of 0.0002, momentum of 0.9 and a batch size of 30. For data augmentation, width_shift_range, height_shift_range and zoom_range are all set to 0.2.

4.2 Experiment results

These experiments verify the effectiveness of the proposed method. First, we compare deep features with traditional feature extraction methods to show the advantage of deep learning. Second, we compare the proposed constitution recognition network with several representative deep learning models. Finally, we compare the proposed integrated constitution recognition method with our earlier method to show the progress made.

4.2.1 Comparison of different feature extraction methods

There are many traditional image feature extraction methods, and ConstitutionNet can also be used as a feature extractor. We compare it with traditional methods to demonstrate the advantage of deep learning. The facial image features compared are histogram of oriented gradients (HOG) [8], local binary patterns (LBP) [1], Haar-like features [50] and ConstitutionNet features. The same classifier can perform differently on different features, and the same features can perform differently with different classifiers. We therefore pair each feature type with classifiers based on different principles: Logistic Regression (LR) [61], Naive Bayes (NB) [2], Support Vector Machine (SVM) [11], Random Forest (RF) [36], K-Nearest Neighbors (KNN) [38] and Decision Tree (DC) [4]. The SVM uses the RBF kernel.

Table 2 shows that, with the same classifier, ConstitutionNet features outperform single LBP, HOG and Haar-like features. Among the traditional features, LBP performs significantly better than single HOG and Haar-like features with the SVM, KNN, Softmax (i.e., logistic regression) and Naïve Bayes classifiers, whereas HOG performs significantly better than single LBP and Haar-like features with the Random Forest and Decision Tree classifiers. Comparing classifiers on the same features, SVM achieves the best accuracy on Haar-like and LBP features, Random Forest works best on HOG features, and logistic regression works best on ConstitutionNet features. Overall, ConstitutionNet features are far better than the other feature extraction methods.

Table 2 Classification results of different classifiers under different feature extraction methods
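The comparison protocol behind Table 2 can be sketched with scikit-image and scikit-learn, which the paper lists among its tools. The HOG and LBP parameters, the 75/25 split, every classifier hyperparameter except the RBF kernel, and the helper names are assumptions. Haar-like features are omitted for brevity, and ConstitutionNet features would enter the same way, for example as the pooled activations before the softmax layer.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def hog_features(gray):
    """HOG descriptor of a grayscale face crop (ASSUMED parameters)."""
    return hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))


def lbp_features(gray, p=8, r=1):
    """Normalized histogram of uniform LBP codes (p + 2 possible codes)."""
    codes = local_binary_pattern(gray, P=p, R=r, method="uniform")
    hist, _ = np.histogram(codes, bins=np.arange(p + 3), density=True)
    return hist


# Classifiers based on different principles; hyperparameters are ASSUMED defaults.
CLASSIFIERS = {
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DC": DecisionTreeClassifier(random_state=0),
}


def compare(images, labels, extractor):
    """Return the test accuracy of every classifier on one feature type."""
    X = np.array([extractor(img) for img in images])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.25, random_state=0, stratify=labels)
    return {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
            for name, clf in CLASSIFIERS.items()}
```

Running `compare` once per extractor yields one row of a table like Table 2; keeping the split fixed across extractors makes the rows comparable.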

We further use confusion matrices to analyze how each method performs on different constitution types. Table 3 shows the confusion matrix of the SVM on LBP features: this combination classifies qi-deficiency best and gentleness worst, with gentleness misclassified as qi-deficiency. Table 4 shows the confusion matrix of the random forest on HOG features: it classifies phlegm-dampness best and gentleness poorly. Table 5 shows the confusion matrix of softmax on ConstitutionNet features: it classifies yin-deficiency best and gentleness worst. These confusion matrices show that different combinations of feature extraction methods and classifiers have different strengths, which provides a basis for combining classifiers.

Table 3 The confusion matrix of the SVM algorithm based on LBP features
Table 4 The confusion matrix of the random forest algorithm based on HOG features
Table 5 The confusion matrix of softmax based on ConstitutionNet
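The per-class reading of Tables 3, 4 and 5 (best class, worst class, and the class a given type is mistaken for) can be computed from a confusion matrix as follows. This is a sketch with hypothetical helper names; rows are true classes and columns are predicted classes, as in scikit-learn's `confusion_matrix`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def per_class_recall(y_true, y_pred, class_names):
    """Return the confusion matrix, per-class recall, and best/worst class names."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))
    # Diagonal / row sum = recall; clip avoids division by zero for empty rows.
    recall = cm.diagonal() / cm.sum(axis=1).clip(min=1)
    best = class_names[int(np.argmax(recall))]
    worst = class_names[int(np.argmin(recall))]
    return cm, dict(zip(class_names, recall)), best, worst


def most_confused_with(cm, class_names, name):
    """Return the class that samples of `name` are most often predicted as."""
    i = class_names.index(name)
    row = cm[i].copy()
    row[i] = 0  # ignore correct predictions
    return class_names[int(np.argmax(row))]
```

With the eight class names of Section 3.1, `most_confused_with(cm, names, "gentleness")` reports the class that gentleness is most often mistaken for, which is how a finding such as "gentleness is misclassified as qi-deficiency" can be read off a table.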

To analyze ConstitutionNet features from another perspective, we evaluate them with multiple classifiers using receiver operating characteristic (ROC) curves. Fig. 5 shows the ROC curves of different classifiers on ConstitutionNet features. The Logistic Regression classifier has the largest area under the curve (AUC), 0.66, indicating that it performs best, while the Decision Tree classifier has the smallest AUC, 0.53, indicating the worst performance.

Fig. 5

The ROC curves of different classifiers based on ConstitutionNet

Since ConstitutionNet features combined with the softmax classifier work best, we plot the ROC curve of each class, analogous to the per-class view of the confusion matrix. As shown in Fig. 6, the AUC of yin-deficiency is 0.96 and that of gentleness is 0.66, so yin-deficiency is classified best and gentleness worst. Besides per-class ROC curves, we also evaluate the classifiers with macro-averaged ROC curves and precision-recall curves; the results are shown in Figs. 7 and 8. Macro-averaging weights every class equally, so it measures how well a classifier discriminates small classes. Fig. 7 shows that the macro-averaged AUC is 0.86. Since gentleness, yang-deficiency and blood-stasis have few samples in our training dataset, this indicates that the classifier performs reasonably well on these three small classes. Fig. 8 shows that the precision-recall curves of the support vector machine, random forest and decision tree fluctuate significantly, whereas that of the Logistic Regression classifier is relatively smooth and encloses a relatively large area, indicating that Logistic Regression classification works well.

Fig. 6

The ROC curve of each class for softmax based on ConstitutionNet

Fig. 7

The micro- and macro-averaged ROC curves of softmax based on ConstitutionNet

Fig. 8

The precision-recall curves of different classifiers based on ConstitutionNet
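The per-class, macro- and micro-averaged ROC results and the precision-recall comparison can be computed with scikit-learn by binarizing the labels one-vs-rest. In this sketch the macro average is the mean of the per-class AUCs and each precision-recall curve is summarized by its average precision; the paper does not say how its averaged curves were computed, so these choices, and the `roc_summary` helper, are assumptions. The function needs at least three classes, because `label_binarize` returns a single column for two.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, roc_auc_score, roc_curve
from sklearn.preprocessing import label_binarize


def roc_summary(y_true, probs, n_classes=8):
    """Per-class, macro- and micro-averaged one-vs-rest ROC AUC, plus macro AP.

    y_true: integer labels, shape (n,); probs: class probabilities, shape (n, n_classes).
    """
    y_bin = label_binarize(y_true, classes=list(range(n_classes)))
    per_class = []
    for k in range(n_classes):
        fpr, tpr, _ = roc_curve(y_bin[:, k], probs[:, k])  # class k vs. the rest
        per_class.append(auc(fpr, tpr))
    macro = float(np.mean(per_class))  # every class weighted equally
    micro = roc_auc_score(y_bin, probs, average="micro")  # pools all decisions
    macro_ap = average_precision_score(y_bin, probs, average="macro")
    return per_class, macro, micro, macro_ap
```

Because the macro average gives a rare class such as gentleness the same weight as a frequent one, a low per-class AUC (0.66 in Fig. 6) pulls it down more than it pulls down the micro average.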

4.2.2 Comparison with other deep network models

Several representative deep networks exist. Krizhevsky et al. [23] proposed AlexNet, which achieved the best results on ImageNet classification at the time. Later, other researchers proposed networks for ImageNet classification such as VGG-16 [40], ResNet [12], Inception V4 [45], Xception [6], MobileNet v1 [15], SENet [17], CBAM [58] and EfficientNets [47]. These networks can also be used for constitution identification, with different results. To evaluate the proposed constitution recognition network, we also train these networks on the same training dataset using transfer learning. The results are shown in Table 6: ConstitutionNet performs best, with an accuracy of 65.67%.

Table 6 Classification results of different models
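The baseline comparison in Table 6 amounts to attaching the same 8-way head to different ImageNet backbones. The sketch below uses `tf.keras.applications` constructors; the global-average-pooling head, the `BACKBONES` table and `build_transfer_model` are assumptions, and models not shipped with `tf.keras.applications` (such as SENet or CBAM) would need separate implementations.

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

# Backbones shipped with tf.keras.applications; the selection is illustrative.
BACKBONES = {
    "VGG-16": tf.keras.applications.VGG16,
    "Inception v3": tf.keras.applications.InceptionV3,
    "DenseNet-121": tf.keras.applications.DenseNet121,
    "Xception": tf.keras.applications.Xception,
    "MobileNet v1": tf.keras.applications.MobileNet,
}


def build_transfer_model(name, num_classes=8, weights="imagenet",
                         input_shape=(224, 224, 3)):
    """Hypothetical baseline: ImageNet backbone + ASSUMED pooling/softmax head."""
    base = BACKBONES[name](include_top=False, weights=weights, input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)
    outputs = layers.Dense(num_classes, activation="softmax")(x)  # 8 constitutions
    safe_name = name.replace(" ", "_").replace("-", "_")
    return Model(base.input, outputs, name=safe_name)
```

Each backbone expects its own input scaling (for example `tf.keras.applications.vgg16.preprocess_input`), which should be applied before training and prediction; the optimizer can mirror the SGD settings of Section 4.1 so that only the architecture differs between rows of the table.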

4.2.3 Comparison of integrated constitution recognition

To further improve the accuracy of constitution recognition, this paper applies ensemble learning and integrates four deep learning models from Table 6: VGG-16, Inception v3, DenseNet-121 and ConstitutionNet. All four models are trained on the same training dataset using transfer learning. The test dataset is the one used in [18], so that the proposed method can be compared directly with that earlier method. Table 7 shows the results: in [18], the CNN achieves 64.54% and the fusion of CNN and color features achieves 65.29%. ConstitutionNet achieves 65.67%, slightly higher than the results in [18], and the integrated recognition achieves 66.79%, higher than all compared models. Table 8 shows the confusion matrix of the integrated constitution recognition. The integrated model classifies yin-deficiency best and gentleness worst, indicating that some of the ensemble members classify gentleness poorly. Adding a model that classifies gentleness well could further improve the ensemble.

Table 7 Comparison with methods from the literature
Table 8 Confusion matrix of the integrated model on the test set

5 Conclusion

Facial inspection is an important diagnostic method in Traditional Chinese medicine, and this paper applies face diagnosis to constitution recognition. Because clinical datasets labeled with constitution type are very limited and we want to exploit the strengths of deep networks, this paper proposes ConstitutionNet, a constitution recognition network obtained through transfer learning: the DenseNet-169 model trained on ImageNet is transferred to constitution recognition, and its structure is modified to better suit TCM constitution recognition. To further improve recognition accuracy, an integrated constitution recognition method is proposed; its base classifiers are ConstitutionNet and three other representative deep networks, and it achieves an accuracy of 66.79%. Experiments show that transfer learning and ensemble learning are effective for constitution recognition with limited clinical data. Future work will explore further transfer learning mechanisms for constitution recognition.