Big Data in Bioinformatics and Computational Biology: Basic Insights

  • Protocol
Reverse Engineering of Regulatory Networks

Part of the book series: Methods in Molecular Biology (MIMB, volume 2719)

Abstract

A working draft of the human genome was published in 2001. It took roughly 10 years of cooperation among numerous international research organizations to reveal this preliminary human DNA sequence. Genomics labs can now sequence an entire genome in only a few days. Here, we discuss how the advent of high-throughput sequencing platforms has paved the way for Big Data in biology and contributed to the development of modern bioinformatics, which in turn has helped to expand the scope of biology and allied sciences. New technologies and methodologies for the storage, management, analysis, and visualization of big data have become necessary. Modern bioinformatics must not only process massive amounts of heterogeneous data, but also cope with different ways of interpreting and presenting the results, and with a variety of software programs and file formats. This chapter attempts to present solutions to these problems. Storing massive amounts of data while completing search queries in a reasonable time will require database management systems other than relational ones. Emerging advanced programming approaches, such as machine learning, Hadoop, and MapReduce, aim to make it easy to construct one's own data-processing scripts and to address the diversity of genomic and proteomic data formats in bioinformatics.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Gupta, A., Kumar, S., Kumar, A. (2024). Big Data in Bioinformatics and Computational Biology: Basic Insights. In: Mandal, S. (eds) Reverse Engineering of Regulatory Networks. Methods in Molecular Biology, vol 2719. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3461-5_9

  • DOI: https://doi.org/10.1007/978-1-0716-3461-5_9

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-3460-8

  • Online ISBN: 978-1-0716-3461-5

  • eBook Packages: Springer Protocols
