Comparison of Artificial Neural Networks with Logistic Regression for Detection of Obesity

Heydari, Seyed Taghi; Ayatollahi, Seyed Mohammad Taghi; Zare, Najaf

doi:10.1007/s10916-011-9711-4

ORIGINAL PAPER
Published: 10 May 2011

Volume 36, pages 2449–2454, (2012)

Journal of Medical Systems

Seyed Taghi Heydari¹,
Seyed Mohammad Taghi Ayatollahi¹ &
Najaf Zare¹

Abstract

Obesity is a common nutritional problem in both developed and developing countries. The aim of this study was to classify obesity using artificial neural networks and logistic regression. This cross-sectional study comprised 414 healthy military personnel in southern Iran. All subjects completed questionnaires on their socio-economic status, and their anthropometric measures were taken by a trained nurse. Obesity was classified by artificial neural networks and logistic regression. The mean age ± SD of participants was 34.4 ± 7.5 years. A total of 187 (45.2%) were obese. Logistic regression and neural networks correctly classified 80.2% and 81.2% of subjects, respectively, with sensitivities of 80.2% and 79.7% and specificities of 81.9% and 83.7%. The area under the receiver operating characteristic (ROC) curve was 0.888 and 0.884, and the Kappa statistic was 0.600 and 0.629, for the logistic regression and neural network models, respectively. We conclude that neural networks and logistic regression were both good classifiers for obesity detection and did not differ significantly in classification.

Introduction

The prevalence of obesity has been increasing globally, making it an important health problem in both developed and developing countries, where it is reaching epidemic proportions [1–6]. According to the World Health Organization, in 2005 approximately 1.6 billion adults (age > 15) were overweight, at least 400 million adults were obese, and at least 20 million children under the age of 5 years were overweight globally. The numbers of overweight and obese individuals are estimated to reach 2.3 billion and over 700 million, respectively, by 2015 [7].

Several methods are available for classifying persons as obese, overweight or normal weight, including dual energy x-ray absorptiometry (DXA), hydrostatic weighing (HW), body mass index (BMI) and bioelectrical impedance analysis (BIA). Among them, DXA and HW are expensive, time consuming and invasive, and are not used in epidemiological studies, whereas BMI and BIA are simple, quick and noninvasive, and are used in epidemiological studies.

Artificial neural networks, which were inspired by neurobiology and by the architecture of the human brain, are non-parametric pattern recognition techniques that find “hidden patterns” between dependent and independent variables [8].

In recent years, neural networks have gained wide popularity in many disciplines of science and medicine. Neural network models can learn from examples, incorporate a large number of variables, and respond adequately and quickly to new information not previously presented to them.

Neural networks have rarely been used to classify obesity. In previous studies, the factors used for obesity classification with neural networks were clinical factors that are difficult to obtain [9]. In the current study, obesity was classified from anthropometric measures by neural networks and logistic regression, and the results of the two methods were compared using ROC curves.

Materials and methods

Study design and subjects

This cross-sectional investigation was conducted from February to March 2010 to determine the prevalence of obesity among 465 military personnel in southern Iran; 51 subjects were excluded because some of their measurements were missing. The study was approved by the research ethics committees of the Military Service and the University of Medical Sciences.

Measurements

Demographic and lifestyle information, including age, marital status, level of education, duration of physical activity per week, and history and frequency of smoking, was obtained from each individual. Anthropometric measures, including weight, height, mid-upper arm circumference (MUAC), waist circumference (WC), hip circumference (HC), and triceps skinfold and abdominal thicknesses, were taken by trained auxologists using the techniques presented by Cameron [10]. WC was measured to the nearest millimeter, as the average of three measurements, at the midpoint between the iliac crest and the lower border of the tenth rib. The triceps skinfold and abdominal thicknesses were measured to the nearest millimeter in triplicate with a Harpenden caliper, and the average of the right and left sides of each subject was used. Body weight (measured to the nearest 10 g) and height (measured to the nearest centimeter) were measured using a Seca digital scale (Germany) and a non-stretchable wall meter, respectively. Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in meters $ \left( \frac{\mathrm{kg}}{\mathrm{m}^2} \right) $. Body fat mass (BFM) and body fat percent (%BF) were obtained by the BIA method using a hand-to-hand Omron BF-500 device (Japan). The cutoff point for obesity, according to the American College of Sports Medicine, is a body fat percent > 25% [11]. Participants were asked to sit in a quiet room for 5 min, after which arterial blood pressure was measured twice on the right arm with a calibrated digital instrument.
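The two derived quantities in this paragraph, BMI and the obesity label, can be illustrated with a minimal Python sketch. The function names and the example subject are hypothetical and are not taken from the study data; only the formula and the 25% cutoff come from the text.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) divided by the square of height (m)."""
    height_m = height_cm / 100.0  # heights were recorded in centimeters
    return weight_kg / height_m ** 2


def is_obese(body_fat_percent: float, cutoff: float = 25.0) -> bool:
    """Apply the body-fat cutoff quoted in the text (%BF > 25%)."""
    return body_fat_percent > cutoff


# Hypothetical subject, for illustration only.
bmi = body_mass_index(weight_kg=80.0, height_cm=175.0)
print(round(bmi, 1))   # 26.1
print(is_obese(27.3))  # True
```

Note that the outcome label depends on %BF from the BIA device, while BMI enters the models only as one of the explanatory variables.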

Logistic regression

Logistic regression is most frequently used to assess the relationship between disease and exposure [12]. Several studies have used logistic regression for classification and prediction of obesity [13–16], where the outcome is a binary variable: obese or normal. In this case, the probability p of obesity is modeled as a function of the following linear combination of explanatory variables:

$$ \log\left[ \frac{p}{1 - p} \right] = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k $$

where p is the probability of a binary response and $ \beta_0, \beta_1, \ldots, \beta_k $ are the coefficients associated with the explanatory variables $ x_1, x_2, \ldots, x_k $. A nonzero regression coefficient $ \beta_i $ indicates an association between the variable $ x_i $ and the outcome. The regression coefficients and their standard errors are estimated by maximum likelihood, which is typically computed through iteratively reweighted least squares, and their statistical significance is determined by the P value of the Wald test. A coefficient $ \beta_i $ was considered significantly different from zero when its P value was less than 0.05.
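To make the logit equation concrete, the following minimal Python sketch inverts it to obtain the predicted probability and converts a coefficient into an odds ratio. The coefficient values are hypothetical placeholders, not the estimates reported in Table 2.

```python
import math


def obesity_probability(x, beta0, betas):
    """Invert the logit: p = 1 / (1 + exp(-(beta0 + sum(beta_i * x_i))))."""
    linear = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-linear))


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in the corresponding variable."""
    return math.exp(beta)


# Hypothetical one-variable model: log-odds = -2.0 + 0.5 * x
p = obesity_probability([4.0], beta0=-2.0, betas=[0.5])
print(p)  # 0.5, because the linear predictor is exactly zero
```

A subject would then be labeled obese when p exceeds a chosen cutoff, which is what the ROC analysis later varies.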

Neural networks

An artificial neural network is a computing system consisting of simple inter-connected processing elements called neurons. The input signals (input data) pass through the network of neurons to generate the network response(s). Each neuron (except the input ones) receives information from several neurons through weighted connections, sums the weighted inputs and modifies the sum through a non-linear transfer function before passing the signal to other neurons [17].

A multi-layer perceptron with an input layer, a hidden layer and an output layer was used. The back-propagation algorithm was used to train the models. A hyperbolic-tangent transfer function was used in all cases.
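As a rough illustration of such a network, the sketch below runs one forward pass through a multi-layer perceptron with hyperbolic-tangent units. It assumes 11 inputs, ten hidden neurons and one output (the architecture reported in the Results), random initial weights, and a tanh output unit; the helper names are hypothetical and training by back-propagation is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights; each row holds one neuron's weights plus a bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs):
    """Weighted sum of the inputs plus bias, passed through tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
            for row in weights]


def mlp_forward(hidden_w, output_w, inputs):
    """Input layer -> hidden layer -> single output neuron."""
    hidden = layer_forward(hidden_w, inputs)
    return layer_forward(output_w, hidden)[0]


rng = random.Random(0)
hidden_w = init_layer(11, 10, rng)   # 11 inputs -> 10 hidden neurons
output_w = init_layer(10, 1, rng)    # 10 hidden neurons -> 1 output
score = mlp_forward(hidden_w, output_w, [0.0] * 11)
print(-1.0 < score < 1.0)  # True: tanh output is bounded
```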

It is important to adjust the learning rate and momentum term during the learning process of a neural network. Large weights may disrupt the learning behavior of the network, so the learning rate is set to a small value to prevent large weights from being selected; however, a small learning rate slows down the learning process. The learning rate and momentum were set to 0.1 and 0.7, respectively [8].
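The role of the two settings can be sketched with a classical gradient-descent-with-momentum update. This is an assumption for illustration: the text gives only the values 0.1 and 0.7, not the exact update rule used by the software, and the quadratic objective below is a toy stand-in for the network error.

```python
def momentum_step(weight, gradient, velocity,
                  learning_rate=0.1, momentum=0.7):
    """One update: v <- momentum * v - learning_rate * grad; w <- w + v."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(200):
    grad = 2.0 * (w - 3.0)
    w, v = momentum_step(w, grad, v)

print(abs(w - 3.0) < 1e-6)  # True: the iterate settles at the minimum
```

The momentum term carries part of the previous step forward, which helps offset the slow progress caused by a small learning rate.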

The input layer consisted of 11 neurons corresponding to the independent measures (age, systole, diastole, weight, height, BMI, MUAC, WC, HC, triceps skinfold and abdominal thicknesses). These measures were used to classify participants into two groups (normal and obese). In the hidden layer, different numbers of neurons were tried in order to select the optimal network architecture and prevent overtraining (Fig. 1). The data were divided into three sets: training (60% of the total data), cross-validation (15%) and test (25%). Before training, the input data were normalized. The network was trained for 5000 epochs for each number of neurons in the hidden layer. In each epoch, the training data were presented in random order to prevent the network from learning a particular order of the data.
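The data partition and normalization step can be sketched as follows. The random seed, the rounding of the split sizes and the choice of min-max scaling to [-1, 1] are assumptions; the text states only the 60/15/25 proportions and that inputs were normalized.

```python
import random


def split_indices(n, train_frac=0.60, val_frac=0.15, seed=0):
    """Shuffle subject indices and cut them into train / validation / test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def min_max_scale(column, lo, hi):
    """Map values to [-1, 1] using lo/hi taken from the training set."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]


train, val, test = split_indices(414)
print(len(train), len(val), len(test))    # 248 62 104
print(min_max_scale([1.0, 2.0, 3.0], 1.0, 3.0))  # [-1.0, 0.0, 1.0]
```

Taking the scaling limits from the training set only keeps information from the validation and test subjects out of the fitted model.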

To select the best number of neurons in the hidden layer, the mean squared error (MSE) on the training and cross-validation data was computed for different numbers of neurons. The number of hidden neurons giving the minimum cross-validation MSE was taken as the best. Performance was evaluated on the test data.
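The selection rule described here amounts to an arg-min over candidate hidden-layer sizes. In the sketch below the MSE values are hypothetical numbers chosen for illustration, not the values plotted in Fig. 2.

```python
def mean_squared_error(targets, outputs):
    """Average squared difference between targets and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def best_hidden_size(val_mse_by_size):
    """Return the hidden-layer size with the smallest cross-validation MSE."""
    return min(val_mse_by_size, key=val_mse_by_size.get)


# Hypothetical cross-validation errors for four candidate sizes.
val_mse = {2: 0.62, 5: 0.55, 10: 0.48, 15: 0.51}
print(best_hidden_size(val_mse))  # 10
```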

The neural network and logistic regression analyses were compared using the receiver operating characteristic (ROC) curve and the Kappa measure of agreement. ROC analysis shows the relationship between sensitivity and specificity when the predictive output for two groups is continuous. The area under the ROC curve was calculated as a measure of model performance. The ROC curve was obtained by plotting sensitivity against 1 − specificity for all possible cut-off points. Based on the obesity input variables, a cutoff value was selected to classify normal and obese persons correctly [18].
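A minimal sketch of these quantities is given below. It computes the area under the ROC curve through its rank interpretation (the probability that a randomly chosen obese subject receives a higher score than a randomly chosen normal one) and the sensitivity and specificity at one cutoff. The labels and scores are hypothetical, and MedCalc's own procedure may differ in detail.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, cutoff):
    """Classify as obese when score >= cutoff, then compare with the truth."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: 1 = obese, 0 = normal.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))  # 0.778
print(sensitivity_specificity(labels, scores, cutoff=0.5))
```

Sweeping the cutoff over all observed scores and plotting sensitivity against 1 − specificity would trace out the ROC curve itself.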

Descriptive statistics, the Kappa measure of agreement and logistic regression were computed with the Statistical Package for the Social Sciences (SPSS) version 15. Neural networks were implemented in MATLAB version 7.8. ROC curves were analyzed with MedCalc statistical software version 8.

Results

Among the 465 subjects, 414 (89.0%) had complete data. The mean ± SD age of participants was 34.4 ± 7.5 years (ranging from 20 to 54 years). A total of 187 (45.2%) were obese. In the obese group, the means of diastole, weight, body mass index, mid-upper arm circumference, waist circumference, triceps skinfold thickness, abdominal thickness and hip circumference were significantly higher than in the normal group, whereas age, systole and height were similar in the two groups (Table 1).

Table 1 Mean ± SD anthropometric characteristics based on subject’s obesity

The explanatory variables in the logistic regression and neural network models were age, systole, diastole, weight, height, BMI, MUAC, WC, HC, triceps skinfold and abdominal thicknesses. In binary logistic regression, abdominal thickness, weight, BMI and HC were significantly associated with obesity, while the other explanatory variables were removed from the model by the backward method (Table 2). The true and predicted status of participants, as classified by logistic regression for all subjects, is shown in Table 3. Logistic regression correctly classified 80.2% of subjects, with sensitivity and specificity of 80.2% and 81.9%, respectively (Table 4).

Table 2 Odds ratios and coefficients of binary logistic regression analysis of factors associated with obesity

Table 3 Number of correct predictive values of obese using logistic regression and neural networks

Table 4 Sensitivity, specificity and area under ROC curve using logistic regression and neural networks

The other method used to classify participants into two groups was neural networks. The cross-validation MSE used to select the best hidden-layer architecture is shown in Fig. 2. The minimum MSE occurred at ten neurons in the hidden layer. The selected network therefore had 11 neurons in the input layer, ten neurons in the hidden layer and one neuron in the output layer.

Neural networks correctly classified 80.6% of cases in the training set, 82.7% in the test set, 82.3% in the cross-validation set, and 81.2% of all subjects (Table 3). The sensitivity and specificity were 79.7% and 83.7%, respectively (Table 4).

The area under the ROC curve was 0.888 and 0.884 (Fig. 3) and the Kappa statistic was 0.600 and 0.629 for the logistic regression and neural network models, respectively, showing that the classification was not due to chance (P < 0.001).

Discussion

The popularity of logistic regression may be due to the interpretability of its model parameters and its ease of use. For neural networks, it may be because these models can be viewed as nonlinear generalizations of logistic regression, and thus at least as powerful as logistic regression [19]. Neural networks are a relatively new method in medicine and have rarely been used in obesity research [20, 21]. Two studies have used them for obesity classification [9, 22]. In our study, obesity was classified by logistic regression and neural networks. In the neural network, all 11 measures were entered into the input layer, while in logistic regression four influential variables were retained in the model by the backward method. BMI had the highest odds ratio.

The area under the ROC curve and the Kappa statistic were used to compare neural networks and logistic regression. The area under the ROC curve was 0.888 for logistic regression and 0.884 for neural networks. The neural network had a slightly lower value, but the difference was not significant (P = 0.365). In a study on the classification of obesity, the areas under the ROC curve for logistic regression and neural networks were 0.902 and 0.923 [9]. Another study, on the classification of diabetes, found areas under the ROC curve of 0.76 and 0.77 for logistic regression and neural networks, respectively [23]. These studies showed that the area did not differ significantly between the two models; logistic regression and neural networks were therefore similar in classifying subjects.

The Kappa values for logistic regression and neural networks were 0.600 and 0.629, respectively. As with the area under the ROC curve, the difference was small: the Kappa statistic for neural networks was only slightly higher than that for logistic regression.
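For reference, the Kappa statistic can be computed from a 2 × 2 table of true versus predicted status as in the sketch below. The counts are hypothetical and are not the entries of Table 3.

```python
def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa: agreement beyond chance between truth and prediction."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the row (true) and column (predicted) totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical table: 45 true positives, 5 false negatives,
# 5 false positives, 45 true negatives.
print(round(cohen_kappa(tp=45, fn=5, fp=5, tn=45), 3))  # 0.8
```

A value of 0 means agreement no better than chance and 1 means perfect agreement.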

Ten ordered groups were created based on the probability estimated by logistic regression and by neural networks: subjects with an estimated probability below 0.1 formed one group, and so on, up to those with a probability of 0.9 to 1.0. Each of these categories was further divided into two groups based on actual obesity status.
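The grouping just described can be sketched as a simple binning of the estimated probabilities. The function name and the four example subjects are hypothetical; the bin edges follow the text.

```python
def risk_deciles(labels, probabilities):
    """Count obese/normal subjects in ten bins [0, 0.1), ..., [0.9, 1.0]."""
    table = [{"obese": 0, "normal": 0} for _ in range(10)]
    for y, p in zip(labels, probabilities):
        group = min(int(p * 10), 9)  # p = 1.0 falls in the last bin
        table[group]["obese" if y == 1 else "normal"] += 1
    return table


# Hypothetical subjects: 1 = obese, 0 = normal.
table = risk_deciles([1, 0, 1, 0], [0.95, 0.05, 1.0, 0.45])
print(table[9])  # {'obese': 2, 'normal': 0}
print(table[0])  # {'obese': 0, 'normal': 1}
```

Bins near 0 and 1 should contain mostly one class if the model separates the groups well, while the middle bins show where the two classes are mixed.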

Logistic regression and neural networks classified participants as obese or normal almost equally well in the first, second, 9th and 10th deciles, but neither performed well in the 3rd, 4th, 5th, 6th and 7th deciles (Table 5). This indicates that the problem lay mainly in the boundary region, where a crisp decision must be made.

Table 5 Deciles of risk based on logistic regression and neural networks

In neural networks, the relationship between input and output factors is not clear because of the complexity of the network; neural networks are black-box models. Hart and Wyatt argued that this “black box” aspect is the major barrier to the acceptance of neural networks for medical decision systems [24]. Several studies in medical diagnosis have shown that neural networks give almost the same classification results as logistic regression [9, 23, 25]. An analysis of 72 papers comparing neural networks and logistic regression in medicine indicated that in half of the studies neural networks did not outperform logistic regression [19].

Neural networks and logistic regression have been compared for childhood obesity prediction; the results showed that neural networks performed better than logistic regression [26]. Bourdès et al. showed that neural networks predicted breast cancer better than logistic regression [27]. Shadabi et al. also compared artificial neural networks with logistic regression in the prediction of kidney transplant outcomes and concluded that neural networks were the better predictor [28].

We conclude that neural networks and logistic regression were both good classifiers for obesity detection and did not differ significantly in classification. Further research on neural network training is suggested to tailor it to obesity classification.

References

1. Ayatollahi, S. M. T., and Heydari, S. T., Obesity indices among infants and their parents, Shiraz, Iran. Iranian Journal of Medical Sciences 29:161–167, 2004.
2. Li, Z., Bowerman, S., and Heber, D., Health ramifications of the obesity epidemic. The Surgical Clinics of North America 85:681–701, 2005.
3. Rashidy-Pour, A., Malek, M., Eskandarian, R., and Ghorbani, R., Obesity in the Iranian population. Obesity Reviews 10:2–6, 2009.
4. Schokker, D. F., Visscher, T. L., Nooyens, A. C., van Baak, M. A., and Seidell, J. C., Prevalence of overweight and obesity in the Netherlands. Obesity Reviews 8:101–108, 2007.
5. Wang, Y., and Lobstein, T., Worldwide trends in childhood overweight and obesity. International Journal of Pediatric Obesity 1:11–25, 2006.
6. Heydari, S. T., Vakili, M. A., and Emamghorashi, F., Longitudinal pattern and reference values of obesity indices of infants in Jahrom (Southern Region of), Iran. Iranian Journal of Pediatrics 18:38–46, 2008.
7. World Health Organization, Obesity and overweight. 2006.
8. Haykin, S., Neural networks: a comprehensive foundation. Upper Saddle River, New Jersey, 1999.
9. Ergün, U., The classification of obesity disease in logistic regression and neural network methods. Journal of Medical Systems 33:67–72, 2009.
10. Cameron, N., The measurement of human growth. Routledge Kegan & Paul, 1984.
11. American College of Sports Medicine. ACSM’s Guidelines for Exercise Testing and Prescription. Lippincott Williams and Wilkins, 2005.
12. Hosmer, D., and Lemeshow, S., Applied logistic regression. Wiley-Interscience, 2000.
13. Banerji, A., Rudders, S. A., Corel, B., Garth, A. P., Clark, S., and Camargo, C. A., Jr., Predictors of hospital admission for food-related allergic reactions that present to the emergency department. Annals of Allergy, Asthma & Immunology 106:42–48, 2011.
14. Dong, X., Zhang, B., Kang, M. X., Chen, Y., Guo, Q. Q., Wu, Y. L., Analysis of Pancreatic Fistula According to the International Study Group on Pancreatic Fistula Classification Scheme for 294 Patients Who Underwent Pancreaticoduodenectomy in a Single Center. Pancreas, 2010.
15. Goycochea-Robles, M. V., Sanin, L. H., Moreno-Montoya, J., et al., Validity of the COPCORD core questionnaire as a classification tool for rheumatic diseases. The Journal of Rheumatology. Supplement 86:31–35, 2011.
16. Hasenbring, M. I., Kreddig, N., Deges G. et al., Psychological impact of genetic counseling for hereditary nonpolyposis colorectal cancer: The role of cancer history, gender, age, and psychological distress. Genet. Test Mol. Biomarkers, 2010.
17. Dreyfus, G., Neural networks: methodology and applications. Springer Verlag, 2005.
18. Mobley, B., Schechter, E., Moore, W., McKee, P., and Eichner, J., Predictions of coronary artery stenosis by artificial neural network. Artificial Intelligence in Medicine 18:187–203, 2000.
19. Dreiseitl, S., and Ohno-Machado, L., Logistic regression and artificial neural network classification models: a methodology review. Journal of Biomedical Informatics 35:352–359, 2002.
20. Adnan, M., Husain, W., Damanhoori, F., A survey on utilization of data mining for childhood obesity prediction. In IEEE, 2010, 1–6.
21. Valavanis, I., Mougiakakou, S., Grimaldi, K., and Nikita, K., A multifactorial analysis of obesity as CVD risk factor: use of neural network based methods in a nutrigenetics context. BMC Bioinformatics 11:453, 2010.
22. Zhang, S., Tjortjis, C., Zeng, X., Qiao, H., Buchan, I., and Keane, J., Comparing data mining methods with logistic regression in childhood obesity prediction. Information Systems Frontiers 11:449–460, 2009.
23. Kazemnejad, A., Batvandi, Z., and Faradmal, J., Comparison of artificial neural network and binary logistic regression for determination of impaired glucose tolerance/diabetes. Eastern Mediterranean Health Journal 16:615–620, 2010.
24. Hart, A., and Wyatt, J., Evaluating black-boxes as medical decision aids: issues arising from a study of neural networks. Informatics for Health & Social Care 15:229–236, 1990.
25. Llorca, J., Dierssen-Sotos, T., Gomez-Acebo, I., Gonzalez-Castro, A., and Minambres, E., Artificial neural networks predict mortality after lung transplantation better than logistic regression. The Journal of Heart and Lung Transplantation 28:1237–1238, 2009.
26. Bioelectrical impedance analysis in body composition measurement: National Institutes of Health Technology Assessment Conference Statement. Am. J. Clin. Nutr. 64:524S–532S, 1996.
27. Bourdès, V. S., Bonnevay, S., Lisboa, P. J. et al., Breast cancer predictions by neural networks analysis: a comparison with logistic regression. Conference proceedings : … Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference 2007, 2007, 5424–5427.
28. Shadabi, F., and Sharma, D., Comparison of artificial neural networks with logistic regression in prediction of kidney transplant outcomes. In IEEE, 2009, 543–547.

Author information

Authors and Affiliations

Department of Biostatistics, School of Medicine, Shiraz University of Medical Sciences, Shiraz, IR, Iran
Seyed Taghi Heydari, Seyed Mohammad Taghi Ayatollahi & Najaf Zare

Corresponding author

Correspondence to Seyed Mohammad Taghi Ayatollahi.

About this article

Cite this article

Heydari, S.T., Ayatollahi, S.M.T. & Zare, N. Comparison of Artificial Neural Networks with Logistic Regression for Detection of Obesity. J Med Syst 36, 2449–2454 (2012). https://doi.org/10.1007/s10916-011-9711-4

Received: 13 January 2011
Accepted: 12 April 2011
Published: 10 May 2011
Issue Date: August 2012
DOI: https://doi.org/10.1007/s10916-011-9711-4
