Abstract
Missing values in a microarray dataset are imputed using the gene expression sample values. The method first computes the mean of the gene expression sample values and then discretizes the sample values. The discretized values are used to measure the similarity between genes that contain missing values and genes that have no missing values. For each gene with a missing value, the most similar gene among those without missing values is selected, and Pearson's correlation coefficient between this selected gene and every other gene without missing values is calculated. The genes whose correlation coefficient exceeds a threshold value are then identified. Finally, the missing position of the gene is imputed with the mean expression value of the complete genes selected on the basis of their correlation coefficients.
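The steps described in the abstract can be sketched in Python with NumPy. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the discretization rule, the similarity measure, or the threshold, so the code assumes a binary discretization against each gene's mean, a simple matching fraction over the observed samples as the similarity, and a default threshold of 0.8. The function name `impute_by_correlation` and its parameters are hypothetical.

```python
import numpy as np


def impute_by_correlation(data, threshold=0.8):
    """Impute NaN entries of a genes x samples matrix (illustrative sketch).

    Assumptions not taken from the source: binary discretization against the
    gene mean, matching-fraction similarity, and the default threshold.
    """
    data = np.asarray(data, dtype=float)
    has_missing = np.isnan(data).any(axis=1)
    complete = data[~has_missing]
    if complete.shape[0] == 0:
        raise ValueError("at least one gene without missing values is required")

    # Step 1: mean of each gene over its observed samples, then discretize
    # (assumed rule: True where the value is at or above the gene mean).
    with np.errstate(invalid="ignore"):
        gene_means = np.nanmean(data, axis=1, keepdims=True)
        bits = data >= gene_means
    complete_bits = bits[~has_missing]

    # Pearson correlation between every pair of complete genes.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.atleast_2d(np.corrcoef(complete))

    result = data.copy()
    for i in np.flatnonzero(has_missing):
        observed = ~np.isnan(data[i])

        # Step 2: similarity on discretized values, observed samples only.
        matches = complete_bits[:, observed] == bits[i, observed]
        similarity = matches.mean(axis=1)

        # Step 3: most similar complete gene serves as the reference gene.
        ref = int(np.argmax(similarity))

        # Step 4: complete genes correlated with the reference above threshold.
        selected = corr[ref] >= threshold
        selected[ref] = True  # always keep the reference gene itself

        # Step 5: fill each missing position with the mean of the selected
        # genes' expression values at that sample.
        result[i, ~observed] = complete[selected][:, ~observed].mean(axis=0)
    return result
```

Including the reference gene in the selected set regardless of the threshold is a design choice of this sketch; it guarantees that at least one gene contributes to every imputed value.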
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Manna, S., Pati, S.K. (2020). Missing Value Imputation Using Correlation Coefficient. In: Das, A., Nayak, J., Naik, B., Dutta, S., Pelusi, D. (eds) Computational Intelligence in Pattern Recognition. Advances in Intelligent Systems and Computing, vol 1120. Springer, Singapore. https://doi.org/10.1007/978-981-15-2449-3_47
DOI: https://doi.org/10.1007/978-981-15-2449-3_47
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2448-6
Online ISBN: 978-981-15-2449-3
eBook Packages: Intelligent Technologies and Robotics (R0)