Abstract
Short tandem repeats (STRs), also known as microsatellites, have a wide range of applications, including medical genetics, forensics, and population genetics. High-throughput sequencing has the potential to profile large numbers of STRs, but cumbersome gapped alignment and STR-specific noise patterns hamper this task. We recently developed an algorithm, called lobSTR, to overcome these challenges and to accurately profile STRs from short reads. Here we describe how to use lobSTR to call STR variations from high-throughput sequencing datasets and to diagnose the quality of the calls.
Acknowledgements
Y.E. is an Andria and Paul Heafy Family Fellow. This publication was supported by the National Defense Science and Engineering Graduate Fellowship (M.G.). We thank Dina Esposito for useful comments.
Copyright information
© 2013 Springer Science+Business Media New York
About this protocol
Cite this protocol
Gymrek, M., Erlich, Y. (2013). Profiling Short Tandem Repeats from Short Reads. In: Shomron, N. (ed) Deep Sequencing Data Analysis. Methods in Molecular Biology, vol 1038. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-514-9_7
DOI: https://doi.org/10.1007/978-1-62703-514-9_7
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-513-2
Online ISBN: 978-1-62703-514-9
eBook Packages: Springer Protocols