Big Data in Bioinformatics and Computational Biology: Basic Insights

Gupta, Aanchal; Kumar, Shubham; Kumar, Ashwani

doi:10.1007/978-1-0716-3461-5_9

Part of the book series: Methods in Molecular Biology (MIMB, volume 2719)

Abstract

The first draft of the human genome was published in 2001, after roughly 10 years of cooperation among numerous international research organizations. Genomics labs can now sequence an entire genome in only a few days. Here, we discuss how the advent of high-throughput sequencing platforms has paved the way for Big Data in biology and contributed to the development of modern bioinformatics, which in turn has helped to expand the scope of biology and allied sciences. New technologies and methodologies for the storage, management, analysis, and visualization of big data have proven necessary. Modern bioinformatics must not only process massive amounts of heterogeneous data, but also cope with different ways of interpreting and presenting results, and with a variety of software programs and file formats. This chapter attempts to present solutions to these problems. To store massive amounts of data and complete search queries in a reasonable time, database management systems other than relational ones will be necessary. Emerging advanced programming approaches, such as machine learning, Hadoop, and MapReduce, aim to make it easy to construct one's own data-processing scripts and to address the diversity of genomic and proteomic data formats in bioinformatics.

Author information

Authors and Affiliations

University Institute of Biotechnology, Chandigarh University, Mohali, Punjab, India
Aanchal Gupta, Shubham Kumar & Ashwani Kumar

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Jalpaiguri Govt. Engineering College, Jalpaiguri, West Bengal, India
Sudip Mandal

Cite this protocol

Gupta, A., Kumar, S., Kumar, A. (2024). Big Data in Bioinformatics and Computational Biology: Basic Insights. In: Mandal, S. (eds) Reverse Engineering of Regulatory Networks. Methods in Molecular Biology, vol 2719. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3461-5_9

DOI: https://doi.org/10.1007/978-1-0716-3461-5_9
Published: 07 October 2023
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-3460-8
Online ISBN: 978-1-0716-3461-5
eBook Packages: Springer Protocols
