Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

Iantorno, Stefano; Gori, Kevin; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

doi:10.1007/978-1-62703-646-7_4

Part of the book series: Methods in Molecular Biology (MIMB, volume 1079)

Abstract

Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Alignment accuracy is therefore crucial to a vast range of analyses, often in ways that are difficult to assess within those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark on the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy.

Notes

1. Stefano Iantorno and Kevin Gori contributed equally to this work.

Acknowledgments

The authors thank Julie Thompson for helpful feedback on the manuscript. CD is supported by SNSF advanced researcher fellowship #136461. This article started as an assignment for the graduate course “Reviews in Computational Biology” at the Cambridge Computational Biology Institute, University of Cambridge.

Author information

Authors and Affiliations

Wellcome Trust Sanger Institute, Cambridge, UK
Stefano Iantorno
National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Stefano Iantorno
EMBL-European Bioinformatics Institute, Cambridge, UK
Kevin Gori, Nick Goldman & Christophe Dessimoz
Max F. Perutz Laboratories, Center for Integrative Bioinformatics Vienna, Medical University Vienna, University of Vienna, Vienna, Austria
Manuel Gil

Editor information

Editors and Affiliations

Dept. of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
David J Russell

About this protocol

Cite this protocol

Iantorno, S., Gori, K., Goldman, N., Gil, M., Dessimoz, C. (2014). Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment. In: Russell, D. (ed.) Multiple Sequence Alignment Methods. Methods in Molecular Biology, vol 1079. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-646-7_4

DOI: https://doi.org/10.1007/978-1-62703-646-7_4
Published: 23 August 2013
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-645-0
Online ISBN: 978-1-62703-646-7
eBook Packages: Springer Protocols
