Missing Value Imputation Using Correlation Coefficient

  • Conference paper
Computational Intelligence in Pattern Recognition

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1120)

Abstract

Missing values in a microarray dataset are imputed using the gene expression values of the samples. First, the mean of the gene expression sample values is computed, and the sample values are then discretized. The discretized values are used to measure the similarity between genes that contain missing values and genes that have none. For each gene with missing values, the most similar gene without missing values is selected, and Pearson's correlation coefficient between that gene and every gene without missing values is calculated. The genes whose correlation coefficient exceeds a threshold value are then identified. Finally, the missing position of the gene is imputed with the mean expression value of the genes without missing values that were selected on the basis of their correlation coefficients.
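The steps in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the discretization rule, the similarity measure, or the threshold, so the sketch assumes binarization against each gene's mean, similarity as the fraction of matching discretized values over the observed samples, and a hypothetical default threshold of 0.8. The function name `impute_by_correlation` is also illustrative.

```python
import numpy as np

def impute_by_correlation(X, threshold=0.8):
    """Impute NaN entries of a genes x samples matrix (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    missing_rows = np.isnan(X).any(axis=1)
    complete = X[~missing_rows]  # genes with no missing values
    if complete.shape[0] == 0:
        raise ValueError("need at least one gene without missing values")

    # ASSUMPTION: discretize to True where a value exceeds its gene's mean.
    def discretize(rows):
        means = np.nanmean(rows, axis=1, keepdims=True)
        return rows > means  # NaN positions compare as False

    complete_bits = discretize(complete)
    # Pearson correlation between every pair of complete genes.
    corr = np.atleast_2d(np.corrcoef(complete))

    result = X.copy()
    for g in np.flatnonzero(missing_rows):
        observed = ~np.isnan(X[g])
        gene_bits = discretize(X[g:g + 1])[0]
        # ASSUMPTION: similarity = share of observed samples whose
        # discretized values agree with the complete gene's.
        matches = (complete_bits[:, observed] == gene_bits[observed]).mean(axis=1)
        best = int(np.argmax(matches))  # most similar complete gene
        # Complete genes correlated with the best match above the threshold.
        selected = corr[best] >= threshold
        selected[best] = True  # always keep the best match itself
        for s in np.flatnonzero(~observed):
            # Impute with the mean of the selected genes at that sample.
            result[g, s] = complete[selected, s].mean()
    return result

demo = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.5, np.nan, 3.5, 4.5],
])
print(impute_by_correlation(demo)[3, 1])  # 3.0
```

In the demo, the incomplete gene matches the first complete gene best, the second complete gene correlates perfectly with it, and the third is anti-correlated and therefore excluded, so the missing entry becomes the mean of 2.0 and 4.0. A different discretization or similarity measure would change which genes are selected.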

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Manna, S., Pati, S.K. (2020). Missing Value Imputation Using Correlation Coefficient. In: Das, A., Nayak, J., Naik, B., Dutta, S., Pelusi, D. (eds) Computational Intelligence in Pattern Recognition. Advances in Intelligent Systems and Computing, vol 1120. Springer, Singapore. https://doi.org/10.1007/978-981-15-2449-3_47
