Abstract
The ASD dataset used in this study shows a severe imbalance between the number of samples in its two classes. If this imbalance is not addressed, binary classification algorithms applied to the data produce heavily biased results, and the imbalance also affects the relationships between features. The resulting misclassifications could influence decisions about medical treatment and cause protracted delays for those who urgently require medical intervention. In the current study, we use a variety of resampling strategies to address class imbalance. Precision, recall, and F1-score are used as evaluation measures for all models. We also examine the AUC score, which shows encouraging outcomes for the use of resampling methods on imbalanced datasets.
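As a rough illustration of the idea summarized above, the following minimal sketch shows random oversampling, one simple resampling strategy, together with the precision, recall, and F1 measures. It is not the authors' pipeline: the abstract does not name the specific strategies or classifiers used, so the toy data and the helper names `random_oversample` and `precision_recall_f1` are assumptions made purely for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class; each value is
    defined as 0.0 when its denominator is zero (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy imbalanced data (assumed for illustration): 8 negatives, 2 positives.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
balanced_counts = Counter(y_res)

# A model that always predicts the majority class is 80% accurate here,
# yet it never detects a positive case.
baseline = precision_recall_f1(y, [0] * len(y))
```

The always-negative baseline shows why accuracy alone is misleading on imbalanced data and why per-class measures are reported instead. In practice, resampling is normally applied to the training split only, so that the evaluation data keeps its original class distribution.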
Acknowledgements
The work carried out by the first author is supported by the GATE scholarship from the Ministry of Education, India.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gupta, R.K., Dutta, K. (2023). Resampling Strategies for Mitigating Class Imbalance of ASD Dataset on the Performance of Machine Learning Classifiers. In: Borah, S., Gandhi, T.K., Piuri, V. (eds) Advanced Computational and Communication Paradigms. ICACCP 2023. Lecture Notes in Networks and Systems, vol 535. Springer, Singapore. https://doi.org/10.1007/978-981-99-4284-8_18
DOI: https://doi.org/10.1007/978-981-99-4284-8_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4283-1
Online ISBN: 978-981-99-4284-8
eBook Packages: Intelligent Technologies and Robotics (R0)