Multiple Alignment of DNA Sequences with MAFFT

Katoh, Kazutaka; Asimenos, George; Toh, Hiroyuki

doi:10.1007/978-1-59745-251-9_3

Part of the book series: Methods in Molecular Biology (MIMB, volume 537)

Abstract

Multiple alignment of DNA sequences is an important step in various molecular biological analyses. As a large amount of sequence data is becoming available through genome and other large-scale sequencing projects, scalability, as well as accuracy, is currently required for a multiple sequence alignment (MSA) program. In this chapter, we outline the algorithms of an MSA program MAFFT and provide practical advice, focusing on several typical situations a biologist sometimes faces. For genome alignment, which is beyond the scope of MAFFT, we introduce two tools: TBA and MAUVE.

Acknowledgments

We thank Chuong B. Do for critical reading of the manuscript.

Author information

Authors and Affiliations

Digital Medicine Initiative, Kyushu University, 812-8582, Fukuoka, Japan
Kazutaka Katoh
Department of Computer Science, Stanford University, Stanford, CA, USA
George Asimenos
Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
Hiroyuki Toh

Editor information

Editors and Affiliations

Dept. Bioquímica, Genética e Inmunología, Universidad de Vigo, Vigo, 36310, Spain
David Posada

About this protocol

Cite this protocol

Katoh, K., Asimenos, G., Toh, H. (2009). Multiple Alignment of DNA Sequences with MAFFT. In: Posada, D. (ed.) Bioinformatics for DNA Sequence Analysis. Methods in Molecular Biology, vol 537. Humana Press. https://doi.org/10.1007/978-1-59745-251-9_3

DOI: https://doi.org/10.1007/978-1-59745-251-9_3
Published: 28 February 2009
Publisher Name: Humana Press
Print ISBN: 978-1-58829-910-9
Online ISBN: 978-1-59745-251-9
eBook Packages: Springer Protocols
