Sharing genetic variants with the NGS pipeline is essential for effective genomic data sharing and reproducibility in health information exchange

Lee, Jeong Hoon; Kweon, Solbi; Park, Yu Rang

doi:10.1038/s41598-021-82006-9

Article
Open access
Published: 26 January 2021

Volume 11, article number 2268 (2021)

Scientific Reports

Abstract

Genetic variants underlying pharmacogenetic and disease phenotypes have been used as the basis for clinical decision-making. However, because there are no standards for next-generation sequencing (NGS) pipelines, reproducing genetic variants across institutions remains difficult. The aim of this study is to show how many variants that are important for clinical decisions are detected differently depending on the pipeline used. Genetic variants were derived from the targeted DNA sequences of 105 breast cancer patients via three different variant-calling pipelines. HaplotypeCaller and MuTect2 tumor-only mode in the Genome Analysis Toolkit (GATK), and VarScan were used to call variants from sequence reads processed with the same NGS preprocessing tools, and the called variants were annotated with the Variant Effect Predictor. GATK HaplotypeCaller, VarScan, and MuTect2 found 25,130, 16,972, and 4232 variants, comprising 1491, 1400, and 321 annotated variants with ClinVar significance, respectively. The average number of ClinVar significant variants per patient was 769.43, and 16.50% of these variants were detected by only one variant caller. Even though these variants have a significant impact on clinical decision-making, the detected variants differ for each algorithm. To use genetic variants in the clinical field, a strict standard for NGS pipelines is essential.

Introduction

Genome or exome sequencing using next-generation sequencing (NGS) technologies has now entered medical practice¹. Genetic variant databases for clinical applications were built on numerous studies of human genetic variants associated with drug response, diseases, and phenotypes^2,3,4. As guidelines for the interpretation of sequence variants have been established, clinical laboratories now perform genetic testing for therapeutic decision-making and disease prediction. Nonetheless, constructing uniform standards for NGS pipelines is difficult because of the variety of genetic testing techniques, differing experimental goals, and the large number of available algorithms⁵. As a result, clinical laboratories and medical institutions have generated patients’ genetic variants through different sequencing protocols and NGS pipelines, producing genetic variants that are not interoperable.

The current gold standard for variant-calling pipelines is the Genome Analysis Toolkit (GATK) Best Practices Workflow using HaplotypeCaller, which is considered to have the highest accuracy for single nucleotide polymorphisms (SNPs) and small insertions and deletions^6,7. However, the development of numerous sequencing technologies, such as Illumina and BGI, has introduced data-specific effects, making it difficult to build a uniform pipeline^8,9. These data-specific effects can cause false-positive calls through unexpected systematic error patterns when HaplotypeCaller is run according to GATK Best Practices¹⁰. Therefore, it is difficult to build NGS pipeline guidelines and make genetic variants interoperable in clinical practice.

Reliable communication of genetic data between hospitals and the sharing of clinical genomic data are widely recognized as important for improving genetic health care, and both practices have been encouraged by professional societies and funding agencies¹¹. Before genetic variant data derived from raw sequencing data are shared, the validity of the variant-calling result must be verifiable. However, different NGS pipelines among institutions produce different variant-calling results from the same raw sequencing data, causing serious problems in clinical decision-making and genetic variant sharing. Hence, diagnostic genetic tests used as a basis for clinical decision-making should be reproducible or replicable¹².

This study suggests that the pipeline used throughout the variant-calling process, together with the raw sequencing data, should be shared so that genetic variants are reproducible in the way a laboratory test must be. Among the genetic variants called by different NGS pipelines, we quantified how many clinically important variants were missed, which could consequently affect clinical decision-making.

Results

Raw sequencing data were preprocessed using an NGS pipeline based on GATK Best Practices. Variant calling was performed using three different variant callers: GATK HaplotypeCaller (HC), VarScan, and MuTect2 in tumor-only mode. Figure 1 summarizes the NGS pipeline workflow for preprocessing the raw sequencing data, including the purpose of each process and the program name, version, options, and any additional input it needs. The command lines for all data processing are available in the Supplementary Data.

Consequences of the called variants

Table 1 shows the counts of variants called by the three variant callers, HC, VarScan, and MuTect2 in tumor-only mode, aggregated across all patients. GATK HC called the most variants, followed by VarScan and MuTect2. The average number of variants per patient was 4152.362, 2925.257, and 159.219 for GATK HC, VarScan, and MuTect2, respectively. Truncation mutations, also called loss-of-function mutations, were defined as variants with the consequence splice_acceptor_variant, splice_donor_variant, splice_region_variant, or stop_gained. The numbers of truncation mutations among GATK HC, VarScan, and MuTect2 variants were 5792 (1.33%), 4676 (1.52%), and 287 (1.72%), respectively. Relative to GATK HC, the odds ratios of truncation mutations among all VarScan and MuTect2 variants were 1.15 and 1.29, respectively.
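The truncation percentages and the two ratios can be re-derived from the counts reported above. This sketch assumes that each caller's total number of calls equals the reported per-patient mean multiplied by the 105 patients; the text does not state the denominators explicitly. Under that assumption, the reported 1.15 and 1.29 coincide with the ratio of truncation proportions relative to GATK HC, whereas a conventional odds ratio computed from the same counts gives approximately 1.15 and 1.30.

```python
# Counts reported in the Results; per-caller totals are reconstructed (assumption).
N_PATIENTS = 105
MEAN_CALLS_PER_PATIENT = {"GATK HC": 4152.362, "VarScan": 2925.257, "MuTect2": 159.219}
TRUNCATING_CALLS = {"GATK HC": 5792, "VarScan": 4676, "MuTect2": 287}


def total_calls(caller: str) -> int:
    # Assumption: total calls = mean per patient x number of patients, rounded.
    return round(MEAN_CALLS_PER_PATIENT[caller] * N_PATIENTS)


def truncation_proportion(caller: str) -> float:
    # Fraction of all calls from this caller that are truncation mutations.
    return TRUNCATING_CALLS[caller] / total_calls(caller)


def truncation_odds(caller: str) -> float:
    # Odds = truncating calls / non-truncating calls.
    t = TRUNCATING_CALLS[caller]
    return t / (total_calls(caller) - t)


def compare_to(reference: str, caller: str) -> tuple:
    """Return (ratio of proportions, odds ratio) of `caller` relative to `reference`."""
    ratio_of_props = truncation_proportion(caller) / truncation_proportion(reference)
    odds_ratio = truncation_odds(caller) / truncation_odds(reference)
    return ratio_of_props, odds_ratio


if __name__ == "__main__":
    for name in MEAN_CALLS_PER_PATIENT:
        print(f"{name}: {100 * truncation_proportion(name):.2f}% truncating "
              f"of {total_calls(name)} calls")
    for name in ("VarScan", "MuTect2"):
        rp, orr = compare_to("GATK HC", name)
        print(f"{name} vs GATK HC: ratio of proportions {rp:.2f}, odds ratio {orr:.2f}")
```

The distinction matters little here because truncation mutations are rare (under 2% of calls), so odds and proportions are nearly equal.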

Table 1 Distribution of consequences of genetic variants using three different variant callers.

Deleteriousness of the called variants

To infer the importance of genetic variants, we annotated deleteriousness scores from SIFT, PolyPhen, and CADD, algorithms that predict how poorly a variant is tolerated using features such as conservation across species. For variants called using GATK HC, VarScan, and MuTect2, 2224, 1960, and 40 variants were annotated with SIFT; 2345, 2078, and 41 with PolyPhen; and 435,999, 307,152, and 16,719 with CADD, respectively (Fig. 2). Among the variants annotated with SIFT, 363 (16.32%), 342 (17.45%), and 9 (22.50%) were deleterious (score < 0.05) for GATK HC, VarScan, and MuTect2, respectively. Among the variants with PolyPhen scores, 120 (5.12%), 109 (5.25%), and 1 (2.44%) were deleterious (score > 0.95) for GATK HC, VarScan, and MuTect2, respectively. Among the variants annotated with CADD, 16,364 (3.75%), 13,391 (4.36%), and 419 (2.51%) had scores > 15 for GATK HC, VarScan, and MuTect2, and 7355 (1.69%), 6281 (2.04%), and 199 (1.19%) had scores > 20, respectively.

ClinVar for clinical significance

Table 2 shows the ClinVar clinical significance annotations for each variant-calling algorithm. The numbers of clinically important drug_response, likely_pathogenic, pathogenic, protective, and risk_factor variants were 1504 (3.07%), 134 (0.27%), 405 (0.83%), 306 (0.62%), and 753 (1.54%) for GATK HC; 1354 (3.21%), 129 (0.31%), 364 (0.86%), 285 (0.68%), and 674 (1.60%) for VarScan; and 19 (1.08%), 16 (0.91%), 21 (1.19%), 7 (0.40%), and 10 (0.57%) for MuTect2, respectively. The average number of ClinVar significant variants per patient was 769.43; 16.5% of these variants were detected by only one caller and 82.18% by two callers.
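Selecting the clinically important ClinVar categories from annotated variants can be sketched as follows. The sketch assumes a VEP-style CLIN_SIG value in which several significance terms may be joined by "&", "," or "/", with "-" or an empty string marking a missing annotation; check the delimiter against the VEP version and output format actually used. The five category names are the ones listed above.

```python
import re
from collections import Counter
from typing import Iterable, List

# ClinVar categories treated as clinically important in the text.
IMPORTANT_CLIN_SIG = frozenset(
    {"drug_response", "likely_pathogenic", "pathogenic", "protective", "risk_factor"}
)


def important_terms(clin_sig: str) -> List[str]:
    """Return the clinically important terms found in one CLIN_SIG value (sorted)."""
    if not clin_sig or clin_sig == "-":
        return []  # no ClinVar annotation for this variant
    # Assumed delimiters between multiple significance terms.
    terms = {t.strip().lower() for t in re.split(r"[&,/]", clin_sig)}
    return sorted(terms & IMPORTANT_CLIN_SIG)


def tally_categories(clin_sig_values: Iterable[str]) -> Counter:
    """Count variants per important category; one variant may count in several."""
    counts: Counter = Counter()
    for value in clin_sig_values:
        counts.update(important_terms(value))
    return counts


if __name__ == "__main__":
    example = ["benign", "pathogenic&risk_factor", "-", "drug_response",
               "likely_benign,uncertain_significance"]
    print(tally_categories(example))
```

Because a single variant can carry more than one significance term, per-category counts produced this way can sum to more than the number of distinct variants.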

Table 2 Distribution of genetic variants by ClinVar category for the three variant callers.

To visualize how differently clinically significant variants were detected, the per-patient distributions of these variants are presented as Venn diagrams (Fig. 3). In Fig. 3, the ClinVar category comprises variants annotated as drug_response, likely_pathogenic, pathogenic, protective, or risk_factor; the truncation category comprises variants whose consequence is loss of function; and the deleteriousness categories comprise variants with a SIFT score of 0.05 or less, a PolyPhen score of 0.85 or more, or a CADD score of 15 or more.
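The Venn-style comparison, and the share of variants detected by only one caller, can be computed with plain set operations once each caller's calls for a patient are reduced to comparable keys. This is a sketch that assumes exact key matching; the text does not specify how variants from different callers were matched. The caller names and toy keys below are illustrative.

```python
from collections import Counter
from typing import Dict, Hashable, Set


def support_histogram(calls: Dict[str, Set[Hashable]]) -> Dict[int, int]:
    """Map k -> number of distinct variants reported by exactly k callers."""
    all_variants = set().union(*calls.values())
    hist = Counter(sum(v in found for found in calls.values()) for v in all_variants)
    return dict(hist)


def only_found_by(calls: Dict[str, Set[Hashable]], caller: str) -> Set[Hashable]:
    """Variants reported by `caller` and by none of the other callers."""
    others = set().union(*(s for name, s in calls.items() if name != caller))
    return calls[caller] - others


def share_by_support(calls: Dict[str, Set[Hashable]]) -> Dict[int, float]:
    """Percentage of the union of all calls detected by exactly k callers."""
    hist = support_histogram(calls)
    total = sum(hist.values())
    return {k: 100.0 * n / total for k, n in sorted(hist.items())}


if __name__ == "__main__":
    patient = {
        "GATK HC": {"v1", "v2", "v3", "v4"},
        "VarScan": {"v1", "v2", "v5"},
        "MuTect2": {"v1", "v6"},
    }
    print(support_histogram(patient))             # counts by number of callers
    print(sorted(only_found_by(patient, "GATK HC")))
    print(share_by_support(patient))
```

Applied per patient to the ClinVar-significant variants, the entry for k = 1 corresponds to the kind of "detected by only one caller" percentage reported in the Results, and `only_found_by(calls, "GATK HC")` mirrors the selection of variants that GATK HC found but the other callers did not.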

To characterize the differently called variants, we reviewed variants with significant consequences, deleteriousness scores, and ClinVar annotations that GATK HC found but VarScan did not (Table 3). ABCA4 encodes an ATP-binding cassette (ABC) transporter (OMIM 601691; GenBank U88667). Diseases associated with ABCA4 include age-related macular degeneration and Stargardt disease^13,14. Diseases associated with DHCR7 include Smith-Lemli-Opitz syndrome and holoprosencephaly. There is considerable evidence associating the variant rs11555217 with disease^15,16. Diseases associated with CYP4V2 include Bietti crystalline corneoretinal dystrophy and telangiectatic osteogenic sarcoma¹⁷. Diseases associated with CFTR include cystic fibrosis and congenital bilateral aplasia of the vas deferens¹⁸. CFTR is a target of FDA-approved drugs and is known to be associated with ivacaftor, glyburide, bumetanide, crofelemer, and lumacaftor^19,20,21,22,23.

Table 3 Annotation information for clinically important variants that GATK HC found but neither VarScan nor MuTect2 found.

Discussion

With advances in NGS technologies over the past several years, genome and exome sequencing are now practiced in medicine. However, different NGS pipelines among institutions produce different variant-calling results from the same raw sequencing data, causing serious problems in clinical decision-making and genetic variant sharing. Variant calls, as the results of diagnostic genetic tests, should be reproducible or replicable if they are to serve as a basis for clinical decision-making¹². In breast cancer, various genomic factors, such as EGFR, BRCA1/2, ESR1, PIK3CA, and TP53, greatly influence clinical decisions²⁴. If this information is not reproducible and replicable among medical institutions, it can cause confusion when clinical decisions are made. The development of numerous sequencing technologies, such as Illumina and BGI, has introduced data-specific effects, making it difficult to build a uniform pipeline^8,9. Therefore, we suggest that the entire pipeline throughout the variant-calling process, including raw sequencing data, should be shared to enhance the reproducibility of genetic variants. Every processing step in the NGS pipeline, including the program versions, options, and additional files used with each version, should be shared so that the same genetic variants can be reproduced or replicated from the raw sequence. Among the genetic variants called by different NGS pipelines, we quantified how many important variants were missed, which could affect clinical decision-making. We found that important variants affecting clinical decisions are detected quite differently depending on the variant-calling algorithm.
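One lightweight way to share the program versions, options, and additional files alongside a variant file is a machine-readable provenance manifest. The sketch below is hypothetical: the field names and layout are illustrative assumptions, not an established standard or the authors' format. It records each step's purpose, program, version, and options, plus a SHA-256 checksum of each additional file so that a receiving institution can confirm it is using identical reference resources. The versions in the example are those given in the Methods; the options are placeholders.

```python
import hashlib
import json
import os
import tempfile
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large references fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(steps: List[Dict[str, str]], resource_paths: List[str]) -> Dict:
    """Hypothetical provenance record intended to travel with a VCF."""
    return {
        "steps": steps,  # each step: purpose, program, version, options
        "resources": [
            {"file": os.path.basename(p), "sha256": sha256_of(p)} for p in resource_paths
        ],
    }


if __name__ == "__main__":
    steps = [
        {"purpose": "alignment", "program": "BWA", "version": "0.7.12", "options": "mem"},
        {"purpose": "indel realignment", "program": "GATK", "version": "3.8",
         "options": "-T IndelRealigner"},
        {"purpose": "duplicate marking", "program": "Picard MarkDuplicates",
         "version": "1.93", "options": "REMOVE_DUPLICATES=true"},
    ]
    # Temporary stand-in for a real resource such as a target BED file.
    with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as tmp:
        tmp.write("chr1\t100\t200\n")
    print(json.dumps(build_manifest(steps, [tmp.name]), indent=2))
    os.remove(tmp.name)
```

Checksums rather than file names are what make the manifest verifiable: two files called `dbsnp_138.vcf` from different bundles would otherwise look identical.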

Several studies suggest that variant-calling results differ with the NGS preprocessing and variant-calling pipeline^25,26. Moreover, variant-calling results differ between sequencers even when the same samples and the same NGS pipeline are used. Nevertheless, establishing a guideline built on a single, uniform best-practice NGS pipeline is difficult because pipeline performance differs by sequencer, sequencing purpose, and sample characteristics²⁷. Therefore, an institution that did not run the NGS pipeline itself risks making clinical decisions based on genetic variants whose calls it cannot reproduce. Hence, details of the NGS pipeline for the entire variant-calling process are essential.

To evaluate the significance of the variants called by the three variant-calling algorithms, GATK HaplotypeCaller, MuTect2, and VarScan, we used the consequence, deleteriousness scores, and ClinVar classification. Variant consequences can be divided into truncation mutations, which cause loss of function, and non-truncation mutations. Truncation mutations typically have a profound impact on gene function. SIFT, PolyPhen, and CADD are algorithms that score the deleteriousness of variants based on features such as conservation and protein structure. ClinVar annotates clinically significant genetic variants with categories such as pathogenic, drug response, and risk factor, which are important information for making clinical decisions. Although truncation mutations, deleterious variants, and clinically significant variants have large effects on gene function, whether they are detected depends on the variant-calling algorithm (Fig. 3). Thus, NGS pipelines that produce different variant-calling results can have a significant impact on clinical decisions based on genetic variants.

Our study has some limitations. We only measured variant differences attributable to the variant callers. Many factors throughout the NGS pipeline, from the read alignment algorithm to the final variant-calling step, can affect variant calling. Because testing every combination of these factors is combinatorially infeasible, we focused on variant calling. A replication study of the genetic testing pipelines used in hospitals is needed: using the NGS pipeline information from hospitals, future work should test whether their variant-calling results can be reproduced from the same raw sequence data.

In conclusion, our results show that clinically important variants are called differently by different variant callers, which can affect clinical decisions. This means that variant-calling outcomes are not reproducible without detailed NGS pipeline information. Therefore, we suggest that the pipeline used throughout the variant-calling process, together with the raw sequencing data, should be shared for effective genetic variant sharing and clinical decision-making.

Methods

Raw sequencing samples

Raw sequence files from massively parallel sequencing of blood DNA from 105 breast cancer patients were downloaded from the NCBI Sequence Read Archive (SRA) database (SRP174001). These targeted data cover the coding and regulatory regions of 509 genes selected from PharmGKB and Phenopedia, which harbor many variants important for clinical decisions^2,28. SRA files were downloaded using prefetch from the SRA Toolkit version 2.9.4 (https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/) and converted to paired FASTQ files using fastq-dump from the same toolkit (https://ncbi.github.io/sra-tools/fastq-dump.html). The quality of the paired reads was assessed using FastQC version 0.11.8 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)²⁹, followed by adaptor removal and read trimming.
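The download and quality-control steps can be written out as command lines; the sketch below only builds them and does not execute anything. It assumes one run accession per sample (the study accession SRP174001 groups the per-sample runs, and the accession used below is a hypothetical placeholder), that `fastq-dump --split-files` writes `<accession>_1.fastq` and `<accession>_2.fastq` for paired reads, and that the FastQC output directory already exists. The exact commands used in the study are in its Supplementary Data.

```python
import shlex
from typing import List


def download_and_qc_commands(run_accession: str, qc_dir: str = "fastqc_out") -> List[List[str]]:
    """Build argv lists for SRA download, FASTQ conversion and read QC (not executed)."""
    read1 = f"{run_accession}_1.fastq"
    read2 = f"{run_accession}_2.fastq"
    return [
        # SRA Toolkit: fetch the .sra file for this run.
        ["prefetch", run_accession],
        # SRA Toolkit: write mate 1 and mate 2 to separate FASTQ files.
        ["fastq-dump", "--split-files", run_accession],
        # FastQC: one report per FASTQ file, written to qc_dir.
        ["fastqc", "-o", qc_dir, read1, read2],
    ]


if __name__ == "__main__":
    # "SRR_PLACEHOLDER" is hypothetical; substitute a run accession from SRP174001.
    for argv in download_and_qc_commands("SRR_PLACEHOLDER"):
        print(shlex.join(argv))
```

Once the tools are installed, each list can be passed to `subprocess.run(argv, check=True)` so that a failing step stops the pipeline.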

Pre-processing of DNA resequencing data

The paired-end FASTQ reads were aligned to the human genome hg38 assembly using the Burrows-Wheeler Aligner (BWA) version 0.7.12, producing Sequence Alignment/Map (SAM) files³⁰. Using SAMtools version 1.9, SAM files were compressed into Binary Alignment Map (BAM) format with the view command, and the aligned reads were sorted by leftmost coordinate with the sort command. Read groups were added to the aligned sequence files using the AddOrReplaceReadGroups module in Picard. Next, SAMtools was used to index the reference genome and the BAM files³¹. GATK version 3.8 was then used to run RealignerTargetCreator and IndelRealigner, which locally realign regions containing insertions and deletions to correct misaligned reads⁶. Base quality scores were recalibrated using GATK BaseRecalibrator with dbSNP build 138 and the 1000 Genomes gold standard indels provided in the GATK resource bundle for human resequencing data (https://software.broadinstitute.org/gatk/download/bundle)^32,33. Reads in the targeted regions were extracted using bedtools intersect version 2.26 with indexing³⁴. Finally, Picard MarkDuplicates v1.93 was used to identify duplicate reads, with the option to flag and remove them.
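The preprocessing order described above can be sketched as a list of command lines. This uses typical BWA, SAMtools, Picard (legacy `KEY=value` syntax), GATK 3.x (`-T` tool selection), and bedtools usage; it is not the authors' exact command set. File names, read-group values, and the `PrintReads -BQSR` step (the usual way GATK 3.x applies a recalibration table, which the text does not mention) are assumptions. `bwa mem` and `bedtools intersect` write to standard output, so their output must be redirected to the file named in the comment. Reference preparation beyond `samtools faidx` (for example `bwa index` and a sequence dictionary) is assumed to have been done already.

```python
import shlex
from typing import List


def preprocessing_commands(sample: str, ref: str, known_indels: str, dbsnp: str,
                           targets_bed: str, picard: str = "picard.jar",
                           gatk: str = "GenomeAnalysisTK.jar") -> List[List[str]]:
    """Argv lists for the preprocessing steps, in the order given in the text."""
    s = sample
    gatk3 = ["java", "-jar", gatk]       # GATK 3.x selects a tool with -T
    picard_jar = ["java", "-jar", picard]
    return [
        # Align paired reads with BWA-MEM; stdout is SAM -> redirect to {s}.sam.
        ["bwa", "mem", ref, f"{s}_1.fastq", f"{s}_2.fastq"],
        # Compress SAM to BAM, then sort by leftmost coordinate.
        ["samtools", "view", "-b", "-o", f"{s}.bam", f"{s}.sam"],
        ["samtools", "sort", "-o", f"{s}.sorted.bam", f"{s}.bam"],
        # Add read groups (values here are placeholders).
        picard_jar + ["AddOrReplaceReadGroups", f"I={s}.sorted.bam", f"O={s}.rg.bam",
                      f"RGID={s}", "RGLB=lib1", "RGPL=ILLUMINA", "RGPU=unit1", f"RGSM={s}"],
        # Index the reference FASTA and the BAM.
        ["samtools", "faidx", ref],
        ["samtools", "index", f"{s}.rg.bam"],
        # Local realignment around known and candidate indels.
        gatk3 + ["-T", "RealignerTargetCreator", "-R", ref, "-I", f"{s}.rg.bam",
                 "-known", known_indels, "-o", f"{s}.intervals"],
        gatk3 + ["-T", "IndelRealigner", "-R", ref, "-I", f"{s}.rg.bam",
                 "-targetIntervals", f"{s}.intervals", "-known", known_indels,
                 "-o", f"{s}.realigned.bam"],
        # Base quality score recalibration, then apply the table (assumed PrintReads step).
        gatk3 + ["-T", "BaseRecalibrator", "-R", ref, "-I", f"{s}.realigned.bam",
                 "-knownSites", dbsnp, "-knownSites", known_indels,
                 "-o", f"{s}.recal.table"],
        gatk3 + ["-T", "PrintReads", "-R", ref, "-I", f"{s}.realigned.bam",
                 "-BQSR", f"{s}.recal.table", "-o", f"{s}.recal.bam"],
        # Keep reads overlapping the targets; BAM on stdout -> redirect to {s}.target.bam.
        ["bedtools", "intersect", "-a", f"{s}.recal.bam", "-b", targets_bed],
        # Flag and remove duplicate reads.
        picard_jar + ["MarkDuplicates", f"I={s}.target.bam", f"O={s}.dedup.bam",
                      f"M={s}.dup_metrics.txt", "REMOVE_DUPLICATES=true"],
    ]


if __name__ == "__main__":
    for argv in preprocessing_commands("S1", "hg38.fa", "known_indels.vcf",
                                       "dbsnp_138.vcf", "targets.bed"):
        print(shlex.join(argv))
```

Keeping each step as an explicit argument list, rather than a shell string, makes it straightforward to record the exact commands in a provenance record alongside the resulting variants.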

Small variant detection

After preprocessing the DNA sequencing data, we detected single nucleotide variants (SNVs) using three algorithms. VarScan and GATK HaplotypeCaller (HC) were used to detect genetic variants in the sample DNA sequence relative to the reference sequence³⁵. Somatic variants were called using GATK MuTect2³⁶. Variants called by this mixture of germline and somatic variant-calling tools were compared on the assumption that NGS pipeline information was not properly shared when the patient's genetic variants were communicated. The reference genome, dbSNP build 138, and COSMIC, a source of commonly mutated genes, were supplied as variant-calling arguments.
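Comparing calls from different callers first requires reducing each VCF record to a comparable key. The sketch below uses (chromosome, position, reference allele, alternate allele), splits multi-allelic records, strips a "chr" prefix, and by default keeps only records whose FILTER is PASS or ".". It does not left-align or otherwise normalize indels, so equivalent indels represented differently by two callers would not match; a normalization step (for example, bcftools norm) is assumed to have been applied beforehand where needed. The text does not state how calls were matched, so this is an illustrative choice.

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def vcf_variant_keys(lines: Iterable[str], pass_only: bool = True) -> Set[VariantKey]:
    """Collect one (CHROM, POS, REF, ALT) key per ALT allele from VCF text lines."""
    keys: Set[VariantKey] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information (##) and header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, filt = fields[0], fields[1], fields[3], fields[4], fields[6]
        if pass_only and filt not in ("PASS", "."):
            continue  # drop records that a caller's own filters rejected
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # treat "chr1" and "1" as the same contig
        for alt in alts.split(","):
            if alt not in (".", "*"):  # skip missing ALT and spanning-deletion alleles
                keys.add((chrom, int(pos), ref.upper(), alt.upper()))
    return keys


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t1000\t.\tA\tG,T\t50\tPASS\t.",
        "1\t2000\t.\tC\tCT\t12\tLowQual\t.",
    ]
    print(sorted(vcf_variant_keys(demo)))
```

With files on disk, calling `vcf_variant_keys(fh)` inside a `with open(path) as fh:` block yields one key set per caller, and those sets can then be intersected or differenced directly.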

Genetic variant annotation

The Ensembl Variant Effect Predictor (VEP) was used to determine the effects of the genetic variants derived from the three variant callers, HC, MuTect2, and VarScan³⁷. Mutation consequences, SIFT scores, PolyPhen scores, CADD scores, and ClinVar annotations were determined to examine the impact of variants that were called differently by the variant callers^3,38,39,40. Consequences were divided into truncating and non-truncating mutations. Truncating mutations included nonsense mutations, frameshift deletions, frameshift insertions, and splice-site mutations, whereas non-truncating mutations included missense mutations, in-frame deletions, in-frame insertions, and nonstop mutations. To evaluate the significance of genetic variant effects, the SIFT, PolyPhen, and CADD algorithms for predicting variant deleteriousness were used. Variants with a SIFT score < 0.05, a PolyPhen score > 0.95, or a CADD score > 15 were defined as deleterious. The clinical significance of genetic variants was determined by comparison with ClinVar (http://www.ncbi.nlm.nih.gov/ClinVar/).
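The consequence grouping and deleteriousness thresholds in this paragraph can be written as small predicates. The sketch assumes VEP-style fields: a Consequence value whose Sequence Ontology terms are joined by "&", SIFT and PolyPhen values formatted as `prediction(score)` (for example `deleterious(0.01)`), and a PHRED-scaled CADD score. The SO terms standing in for nonsense, frameshift, splice-site, missense, in-frame, and nonstop mutations are an assumed mapping of the categories named in the text. The Results section additionally counts splice_region_variant as a truncation; adding it to `TRUNCATING` reproduces that definition.

```python
import re
from typing import Dict, Optional

# Assumed Sequence Ontology mapping of the categories named in the Methods.
TRUNCATING = frozenset({"stop_gained", "frameshift_variant",
                        "splice_acceptor_variant", "splice_donor_variant"})
NON_TRUNCATING = frozenset({"missense_variant", "inframe_deletion",
                            "inframe_insertion", "stop_lost"})


def consequence_class(consequence: str) -> str:
    """Classify a '&'-joined VEP Consequence value; truncating terms take precedence."""
    terms = set(consequence.split("&"))
    if terms & TRUNCATING:
        return "truncating"
    if terms & NON_TRUNCATING:
        return "non-truncating"
    return "other"


def vep_score(value: str) -> Optional[float]:
    """Extract the number from 'prediction(score)' or parse a bare number; None if absent."""
    if not value or value == "-":
        return None
    match = re.search(r"\(([0-9.eE+-]+)\)", value)
    return float(match.group(1) if match else value)


def deleterious_flags(sift: Optional[float], polyphen: Optional[float],
                      cadd_phred: Optional[float]) -> Dict[str, bool]:
    """Thresholds from the Methods: SIFT < 0.05, PolyPhen > 0.95, CADD > 15."""
    return {
        "SIFT": sift is not None and sift < 0.05,
        "PolyPhen": polyphen is not None and polyphen > 0.95,
        "CADD": cadd_phred is not None and cadd_phred > 15,
    }


if __name__ == "__main__":
    print(consequence_class("missense_variant&splice_region_variant"))
    print(deleterious_flags(vep_score("deleterious(0.01)"),
                            vep_score("probably_damaging(0.998)"),
                            vep_score("12.3")))
```

Missing scores are treated as "not deleterious" rather than as errors, which matches how only a subset of variants receive SIFT or PolyPhen annotations in the Results.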

References

1. Biesecker, L. G. & Green, R. C. Diagnostic clinical genome and exome sequencing. N. Engl. J. Med. 370, 2418–2425 (2014).
2. Hewett, M. et al. PharmGKB: the pharmacogenetics knowledge base. Nucleic Acids Res. 30, 163–165 (2002).
3. Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2015).
4. Hirschhorn, J. N. & Daly, M. J. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6, 95 (2005).
5. Aziz, N. et al. College of American Pathologists’ laboratory standards for next-generation sequencing clinical tests. Arch. Pathol. Lab. Med. 139, 481–493 (2014).
6. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
7. Van der Auwera, G. A. et al. From FastQ data to high-confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinform. 43, 11.10.1–11.10.33 (2013).
8. Huang, J. et al. A reference human genome dataset of the BGISEQ-500 sequencer. Gigascience 6, gix024 (2017).
9. Fehlmann, T. et al. cPAS-based sequencing on the BGISEQ-500 to explore small non-coding RNAs. Clin. Epigenetics 8, 123 (2016).
10. Seo, H., Park, Y., Min, B. J., Seo, M. E. & Kim, J. H. Evaluation of exome variants using the Ion Proton platform to sequence error-prone regions. PLoS ONE 12, e0181304 (2017).
11. Azzariti, D. R. et al. Points to consider for sharing variant-level information from clinical genetic testing with ClinVar. Mol. Case Stud. 4, a002345 (2018).
12. Stupple, A., Singerman, D. & Celi, L. A. The reproducibility crisis in the age of digital medicine. NPJ Digit. Med. 2, 2 (2019).
13. Shroyer, N. F. et al. The rod photoreceptor ATP-binding cassette transporter gene, ABCR, and retinal disease: from monogenic to multifactorial. Vis. Res. 39, 2537–2544 (1999).
14. Fingert, J. H. et al. Case of Stargardt disease caused by uniparental isodisomy. Arch. Ophthalmol. 124, 744–745 (2006).
15. Balogh, I. et al. Mutational spectrum of Smith–Lemli–Opitz syndrome patients in Hungary. Mol. Syndromol. 3, 215–222 (2012).
16. Adam, M. P. et al. Smith-Lemli-Opitz Syndrome. GeneReviews®.
17. Li, A. et al. Bietti crystalline corneoretinal dystrophy is caused by mutations in the novel gene CYP4V2. Am. J. Hum. Genet. 74, 817–826 (2004).
18. Dumur, V. et al. Congenital bilateral absence of the vas deferens (CBAVD) and cystic fibrosis transmembrane regulator (CFTR): correlation between genotype and phenotype. Hum. Genet. 97, 7–10 (1996).
19. Yu, H. et al. Ivacaftor potentiation of multiple CFTR channels with gating mutations. J. Cyst. Fibros. 11, 237–245 (2012).
20. Zhou, Z., Hu, S. & Hwang, T.-C. Probing an open CFTR pore with organic anion blockers. J. Gen. Physiol. 120, 647–662 (2002).
21. Reddy, M. M. & Quinton, P. M. Bumetanide blocks CFTR GCl in the native sweat duct. Am. J. Physiol. Cell Physiol. 276, C231–C237 (1999).
22. Tradtrantip, L., Namkung, W. & Verkman, A. S. Crofelemer, an antisecretory antidiarrheal proanthocyanidin oligomer extracted from Croton lechleri, targets two distinct intestinal chloride channels. Mol. Pharmacol. 77, 69–78 (2010).
23. Kuk, K. & Taylor-Cousar, J. L. Lumacaftor and ivacaftor in the management of patients with cystic fibrosis: current evidence and future prospects. Ther. Adv. Respir. Dis. 9, 313–326 (2015).
24. Stearns, V. & Park, B. H. Gene mutation profiling of breast cancers for clinical decision making: drivers and passengers in the cart before the horse. JAMA Oncol. 1, 569–570 (2015).
25. Hwang, S., Kim, E., Lee, I. & Marcotte, E. M. Systematic comparison of variant calling pipelines using gold standard personal exome variants. Sci. Rep. 5, 17875 (2015).
26. Cornish, A. & Guda, C. A comparison of variant calling pipelines using genome in a bottle as a reference. Biomed Res. Int. https://doi.org/10.1155/2015/456479 (2015).
27. Chen, J., Li, X., Zhong, H., Meng, Y. & Du, H. Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers. Sci. Rep. 9, 9345 (2019).
28. Yu, W., Clyne, M., Khoury, M. J. & Gwinn, M. Phenopedia and Genopedia: disease-centered and gene-centered views of the evolving knowledge of human genetic associations. Bioinformatics 26, 145–146 (2009).
29. Andrews, S. et al. FastQC: a quality control tool for high throughput sequence data (2010).
30. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997 (2013).
31. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
32. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
33. Siva, N. 1000 Genomes project (2008).
34. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
35. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
36. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213 (2013).
37. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
38. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073 (2009).
39. Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. 76, 7.20.1–7.20.41 (2013).
40. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310 (2014).

Acknowledgements

This work was supported by the Technology Innovation Program (20002289), funded by the Ministry of Trade, Industry & Energy, Republic of Korea and by the Foundational Technology Development Program (NRF-2019M3E5D4064682) of the Ministry of Science and ICT, Republic of Korea.

Author information

Authors and Affiliations

Lunit Inc., 175 Yeoksamro, Gangnam-gu, Seoul, Republic of Korea
Jeong Hoon Lee
Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
Jeong Hoon Lee, Solbi Kweon & Yu Rang Park

Contributions

Y.R.P. and J.H.L. designed the study and acquired data for training. J.H.L. performed analysis. J.H.L., S.K. and Y.R.P. drafted and wrote the manuscript.

Corresponding author

Correspondence to Yu Rang Park.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Lee, J.H., Kweon, S. & Park, Y.R. Sharing genetic variants with the NGS pipeline is essential for effective genomic data sharing and reproducibility in health information exchange. Sci Rep 11, 2268 (2021). https://doi.org/10.1038/s41598-021-82006-9

Received: 31 January 2020
Accepted: 07 January 2021
Published: 26 January 2021
DOI: https://doi.org/10.1038/s41598-021-82006-9
Springer Nature Limited
