A Hybrid Approach for Missing Data Imputation in Gene Expression Dataset Using Extra Tree Regressor and a Genetic Algorithm

Yadav, Amarjeet; Rasool, Akhtar; Dubey, Aditya; Khare, Nilay

doi:10.1007/978-981-99-0047-3_12

Amarjeet Yadav,
Akhtar Rasool (ORCID: orcid.org/0000-0001-9964-2414),
Aditya Dubey (ORCID: orcid.org/0000-0002-4885-0632) &
Nilay Khare

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 998)

Included in the following conference series:

International Conference on Machine Intelligence and Signal Processing

Abstract

Missing data carries a significant risk of inaccurate conclusions because critical attribute values are absent. In gene expression data, missing values are common because of apparatus errors, inefficient measurement techniques, abraded slides, and similar causes. These missing values hinder the visualization of gene features and other biological studies. Efficient prediction of missing values is therefore crucial for studying the structural information of gene expression, and accurate imputation has attracted considerable interest from researchers. To address this challenge, this paper presents a hybrid model for imputing missing values in gene expression datasets. The proposed model uses Extra Tree regression, a machine learning-based ensemble technique, together with a genetic algorithm to optimize the parameters of the K-Means clustering algorithm. The optimized K-Means algorithm is then used to estimate the missing values in the dataset. The paper discusses the impact of different missing ratios on the performance of the proposed model and compares its accuracy with that of baseline models.
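
The final step described in the abstract, estimating missing entries from K-Means clusters and checking accuracy at several missing ratios, can be illustrated with a minimal sketch. This is not the authors' implementation: the Extra Tree regressor and the genetic algorithm are omitted, the number of clusters is fixed by hand where the paper would tune it, the data is synthetic, the error measure (RMSE) and the column-mean baseline are assumptions, and every function name (`mask_values`, `kmeans_impute`, and so on) is hypothetical.

```python
import math
import random


def mask_values(data, ratio, rng):
    """Hide about `ratio` of all cells (set to None); return the copy and the hidden cells."""
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    hidden = rng.sample(cells, round(ratio * len(cells)))
    masked = [row[:] for row in data]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def column_means(data):
    """Mean of the observed entries in each column (0.0 if a column has none)."""
    means = []
    for j in range(len(data[0])):
        seen = [row[j] for row in data if row[j] is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return means


def fill(row, values):
    """Replace the missing entries of `row` with the matching entries of `values`."""
    return [values[j] if v is None else v for j, v in enumerate(row)]


def partial_distance(row, centroid):
    """Mean squared difference computed over the observed entries only."""
    diffs = [(v - c) ** 2 for v, c in zip(row, centroid) if v is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def kmeans_impute(data, k, iters=20):
    """Cluster incomplete rows, then fill each gap with its cluster centroid value."""
    means = column_means(data)
    # Deterministic start (an assumption): sort mean-filled rows by their
    # average and take k evenly spaced rows as the initial centroids.
    ordered = sorted((fill(row, means) for row in data), key=lambda r: sum(r) / len(r))
    centroids = [ordered[len(data) * (2 * c + 1) // (2 * k)][:] for c in range(k)]
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: partial_distance(row, centroids[c]))
                  for row in data]
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            for j in range(len(means)):
                seen = [row[j] for row in members if row[j] is not None]
                if seen:  # keep the old value when nothing is observed
                    centroids[c][j] = sum(seen) / len(seen)
    return [fill(row, centroids[lab]) for row, lab in zip(data, labels)]


def rmse(truth, estimate, hidden):
    """Root mean squared error over the cells that were hidden."""
    return math.sqrt(sum((truth[i][j] - estimate[i][j]) ** 2 for i, j in hidden) / len(hidden))


rng = random.Random(0)
# Synthetic stand-in for an expression matrix: 40 rows in two groups, 6 columns.
truth = [[rng.gauss(center, 0.5) for _ in range(6)]
         for center in (0.0, 10.0) for _ in range(20)]

results = {}
for ratio in (0.05, 0.10, 0.20):
    masked, hidden = mask_values(truth, ratio, rng)
    by_cluster = kmeans_impute(masked, k=2)
    by_mean = [fill(row, column_means(masked)) for row in masked]
    results[ratio] = (rmse(truth, by_cluster, hidden), rmse(truth, by_mean, hidden))
    print(f"missing {ratio:.0%}: k-means RMSE {results[ratio][0]:.3f}, "
          f"column-mean RMSE {results[ratio][1]:.3f}")
```

Hiding known values and scoring the estimates against them is one common way to compare imputation methods at different missing ratios. In the hybrid model the abstract describes, the hand-picked `k=2` would instead be among the K-Means parameters that the genetic algorithm optimizes.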

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Maulana Azad National Institute of Technology, Bhopal, 462003, India
Amarjeet Yadav, Akhtar Rasool, Aditya Dubey & Nilay Khare

Corresponding author

Correspondence to Aditya Dubey.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh, India
Pradeep Singh
Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh, India
Deepak Singh
Department of Computer Science and Engineering, International Institute of Information Technology, Naya Raipur, Chhattisgarh, India
Vivek Tiwari
Østfold University College, Halden, Norway
Sanjay Misra

About this paper

Cite this paper

Yadav, A., Rasool, A., Dubey, A., Khare, N. (2023). A Hybrid Approach for Missing Data Imputation in Gene Expression Dataset Using Extra Tree Regressor and a Genetic Algorithm. In: Singh, P., Singh, D., Tiwari, V., Misra, S. (eds) Machine Learning and Computational Intelligence Techniques for Data Engineering. MISP 2022. Lecture Notes in Electrical Engineering, vol 998. Springer, Singapore. https://doi.org/10.1007/978-981-99-0047-3_12

DOI: https://doi.org/10.1007/978-981-99-0047-3_12
Published: 16 May 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0046-6
Online ISBN: 978-981-99-0047-3
eBook Packages: Intelligent Technologies and Robotics (R0)
