The Practical Evaluation of DNA Barcode Efficacy

Spouge, John L.; Mariño-Ramírez, Leonardo

doi:10.1007/978-1-61779-591-6_17

Part of the book series: Methods in Molecular Biology (MIMB, volume 858)

Abstract

This chapter describes a workflow for measuring the efficacy of a barcode in identifying species. First, assemble individual sequence databases corresponding to each barcode marker. A controlled collection of taxonomic data is preferable to GenBank data, because GenBank data can be problematic, particularly when comparing barcodes based on more than one marker. To ensure proper controls when evaluating species identification, specimens not having a sequence in every marker database should be discarded. Second, select a computer algorithm for assigning species to barcode sequences. No algorithm has yet improved notably on assigning a specimen to the species of its nearest neighbor within a barcode database. Because global sequence alignments (e.g., with the Needleman–Wunsch algorithm, or some related algorithm) examine entire barcode sequences, they generally produce better species assignments than local sequence alignments (e.g., with BLAST). No neighboring method (e.g., global sequence similarity, global sequence distance, or evolutionary distance based on a global alignment) has yet shown a notable superiority in identifying species. Finally, “the probability of correct identification” (PCI) provides an appropriate measurement of barcode efficacy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. This chapter states explicitly how to calculate PCI, how to estimate its statistical sampling error, and how to use data on PCR failure to set limits on how much improvements in PCR technology can improve species identification.
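The nearest-neighbor assignment and the averaging of species PCIs described above can be sketched as follows. This is only an illustration under stated assumptions, not the chapter's own procedure: the function names are hypothetical, the sequences are assumed to be globally aligned already (equal length), the distance is a simple mismatch fraction, and a specimen counts as correctly identified only if every one of its nearest neighbors (ties included) belongs to its own species. The chapter discusses several definitions of species PCI, and this sketch picks just one plausible reading.

```python
from collections import defaultdict


def p_distance(a, b):
    """Fraction of mismatched positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def overall_pci(specimens):
    """Return (overall PCI, {species: species PCI}).

    `specimens` is a list of (species, aligned_sequence) pairs and must
    contain at least two entries. Assumed definition (hypothetical): a
    specimen is correctly identified when all of its nearest neighbors,
    excluding itself, belong to its own species.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for i, (species, seq) in enumerate(specimens):
        # Distances from this specimen to every other specimen in the database.
        others = [
            (p_distance(seq, other_seq), other_species)
            for j, (other_species, other_seq) in enumerate(specimens)
            if j != i
        ]
        best = min(d for d, _ in others)
        nearest_species = {sp for d, sp in others if d == best}
        counts[species] += 1
        if nearest_species == {species}:
            hits[species] += 1
    # Species PCI: fraction of a species' specimens identified correctly.
    species_pci = {sp: hits[sp] / counts[sp] for sp in counts}
    # Overall PCI: unweighted average of the species PCIs over all species.
    overall = sum(species_pci.values()) / len(species_pci)
    return overall, species_pci
```

Averaging over species rather than over specimens keeps a heavily sampled species from dominating the overall figure, which matches the definition of overall PCI given in the abstract.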

Acknowledgment

This research was supported in part by the Intramural Research Program of the NIH, NLM, NCBI.

Author information

Authors and Affiliations

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
John L. Spouge & Leonardo Mariño-Ramírez

Corresponding author

Correspondence to John L. Spouge.

Editor information

Editors and Affiliations

National Museum of Natural History, Dept. Botany, Smithsonian Institution, 10th St. & Constitution Ave. NW, Washington, 20560-0166, District of Columbia, USA
W. John Kress & David L. Erickson

Appendix

For a barcode with several markers, each of which can have a failed PCR, specimen identification ultimately relies on the markers with a successful PCR. To quantify the identification process, number the markers $\{1,2,\ldots,m\}$, and consider any subset $M$ of $\{1,2,\ldots,m\}$. For a particular specimen, let $s_M$ denote the probability that $M$ is exactly the subset of markers with PCR success, and let $p_M$ denote the PCI for the barcode based on the marker subset $M$. A species PCI $p$ can then be calculated from the values of $s_M$ and $p_M$, although the calculation depends on the definition of species PCI (see Section 2.3 for various definitions).

One very reasonable definition of the PCR-adjusted species PCI is the average $p=\sum_{M} p_M s_M$, where the sum runs over all subsets $M$ of $\{1,2,\ldots,m\}$. For a barcode based on a single marker, for example, $M$ is a subset of $\{1\}$, i.e., either the empty set $\{\}$ or $\{1\}$. Because the empty set $\{\}$ corresponds to a complete absence of information about a specimen, the corresponding PCI is $p_{\{\}}=0$, so $p=p_{\{\}}s_{\{\}}+p_{\{1\}}s_{\{1\}}=p_{\{1\}}s_{\{1\}}$, which agrees with the formula in the main text for the PCR-adjusted PCI of a barcode based on a single marker.
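The sum over marker subsets can be sketched in a few lines of Python. The function names below are hypothetical, and the helper that builds the subset probabilities assumes that PCR outcomes are independent across markers, which the appendix does not state; if the subset probabilities are measured directly, they can be passed in as a dictionary instead.

```python
from itertools import combinations


def all_subsets(markers):
    """Yield every subset of `markers`, including the empty set, as a frozenset."""
    markers = list(markers)
    for size in range(len(markers) + 1):
        for combo in combinations(markers, size):
            yield frozenset(combo)


def subset_success_probs(marker_success):
    """Build {M: s_M} from per-marker PCR success probabilities.

    ASSUMPTION (not from the source): PCR outcomes are independent
    across markers, so s_M is a product over markers.
    """
    probs = {}
    for subset in all_subsets(marker_success):
        prob = 1.0
        for marker, success in marker_success.items():
            prob *= success if marker in subset else 1.0 - success
        probs[subset] = prob
    return probs


def pcr_adjusted_pci(pci_by_subset, success_by_subset):
    """Compute p = sum over M of p_M * s_M, with p for the empty set fixed at 0."""
    total = 0.0
    for subset, s_m in success_by_subset.items():
        if not subset:
            continue  # empty set: no information about the specimen, so PCI is 0
        total += pci_by_subset[subset] * s_m
    return total


# Single-marker case: p reduces to p_{1} * s_{1}. The numbers are illustrative only.
s = subset_success_probs({1: 0.9})
p = pcr_adjusted_pci({frozenset({1}): 0.8}, s)
```

Keying both dictionaries by `frozenset` mirrors the notation of the appendix, where each term of the sum is indexed by a marker subset rather than by an individual marker.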

About this protocol

Cite this protocol

Spouge, J.L., Mariño-Ramírez, L. (2012). The Practical Evaluation of DNA Barcode Efficacy. In: Kress, W., Erickson, D. (eds) DNA Barcodes. Methods in Molecular Biology, vol 858. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-591-6_17

DOI: https://doi.org/10.1007/978-1-61779-591-6_17
Published: 29 March 2012
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-590-9
Online ISBN: 978-1-61779-591-6
eBook Packages: Springer Protocols
