Automated Removal of Non-homologous Sequence Stretches with PREQUAL

  • Protocol
Multiple Sequence Alignment

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2231))

Abstract

Large-scale multigene datasets used in phylogenomics and comparative genomics often contain sequence errors inherited from source genomes and transcriptomes. These errors typically manifest as stretches of non-homologous characters and arise from mistakes in sequencing, assembly, and/or annotation. In the absence of automatic tools to detect and remove them, these errors propagate through large-scale datasets. PREQUAL is a command-line tool that identifies and masks regions with non-homologous adjacent characters in sets of unaligned homologous sequences, using a fully probabilistic approach based on pair hidden Markov models. PREQUAL is user-friendly and simple to use, while also allowing full customization of filtering sensitivity. It is aimed primarily at amino acid sequences but can also handle protein-coding nucleotide sequences. PREQUAL is computationally efficient and shows high sensitivity and accuracy. In this chapter, we briefly introduce the motivation for PREQUAL and its underlying methodology, describe basic and advanced usage, and conclude with some notes and recommendations. PREQUAL fills an important gap in the current bioinformatics toolkit for phylogenomics, contributing toward increased accuracy and reproducibility in future studies.
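
As a rough illustration of what "masking" a stretch of characters means in practice, the minimal Python sketch below replaces residues whose support score falls under a cutoff with a placeholder character. It is not PREQUAL's implementation and does not use its interface: the function name `mask_low_support`, the per-residue scores, the 0.5 threshold, and the `X` mask character are all assumptions made for illustration, and the scores are invented rather than derived from a pair hidden Markov model.

```python
from typing import Sequence


def mask_low_support(seq: str, support: Sequence[float],
                     threshold: float = 0.5, mask_char: str = "X") -> str:
    """Replace residues whose support score is below `threshold`.

    Hypothetical helper: the scores are assumed to be supplied by some
    upstream probabilistic model; this function only applies the cutoff.
    """
    if len(seq) != len(support):
        raise ValueError("sequence and support scores must have equal length")
    return "".join(
        residue if score >= threshold else mask_char
        for residue, score in zip(seq, support)
    )


# Toy input: the scores below are invented for illustration only.
sequence = "MKTAYIAKQR"
scores = [0.99, 0.98, 0.97, 0.20, 0.10, 0.15, 0.95, 0.96, 0.99, 0.97]
masked = mask_low_support(sequence, scores)
print(masked)
```

Under these assumptions, the three adjacent low-scoring residues are masked as a block while the well-supported flanking residues are left untouched, which is the general behavior the abstract describes for stretches of non-homologous characters.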

Acknowledgments

We thank Kazutaka Katoh for the opportunity to contribute this chapter. Max E. Schön provided comments on an earlier version. II acknowledges support from a Juan de la Cierva-Incorporación postdoctoral fellowship (IJCI-2016-29566) from the Spanish Ministry of Science and Competitiveness (MINECO). Work in the lab of FB is supported by a fellowship from Science for Life Laboratory. SW thanks the Carl Tryggers Stiftelse and Uppsala University for support.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Irisarri, I., Burki, F., Whelan, S. (2021). Automated Removal of Non-homologous Sequence Stretches with PREQUAL. In: Katoh, K. (ed.) Multiple Sequence Alignment. Methods in Molecular Biology, vol 2231. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1036-7_10

  • DOI: https://doi.org/10.1007/978-1-0716-1036-7_10

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1035-0

  • Online ISBN: 978-1-0716-1036-7

  • eBook Packages: Springer Protocols
