
4.1 Introduction

Medicinal herbs have been used in Indian systems of medicine since ancient times, and authentication of medicinal plants is currently a major issue. Worldwide trade in medicinal plants and their products is estimated at around US$60 billion, and the annual turnover of Ayurvedic medicine in the international market is about Rs. 3500 crores (US$813 million) (Biswas and Biswas 2014). According to the World Health Organization’s (WHO) guidelines, authenticity, purity, and safety are important aspects of the standardization and evaluation of traditional medicines. With the commercialization of, and increased demand for, Ayurvedic herbs, safety and quality assurance have become major concerns (Chan 2003). Meanwhile, taxonomists are engaged in naming and annotating the huge number of organisms that constitute biodiversity. A large number of species are described every year; however, an enormous diversity still remains to be explored. Current rates of extinction and the state of biodiversity conservation are also a serious concern (Costello et al. 2013).

Over the last decade, DNA sequences have been used extensively in biological analyses such as phylogenetics and organism identification. Among the various approaches, DNA barcoding was proposed to overcome the problems faced in traditional taxonomy (Hebert et al. 2003a, b). The approach has succeeded in identifying both known and previously unknown species. In this technique, a standard region of DNA, known as the “DNA barcode,” is used for biodiversity analyses, and different DNA regions serve as barcode markers. The two main characteristics of a good marker are universality and high resolution (Hollingsworth et al. 2011). Universality refers to the applicability of the chosen barcode to a large number of organisms, whereas high resolution means that the marker discriminates between closely related species. For efficient discrimination, a marker must show high interspecific and low intraspecific divergence; this separation between inter- and intraspecific distances is known as the “DNA barcoding gap.” DNA barcoding is widely used for quick and accurate species identification (Bhargava and Sharma 2013). COI (cytochrome c oxidase subunit I) is the universal barcode marker in animals (Hebert et al. 2003a), but a universal marker for plants has remained elusive (Li et al. 2015; Kress and Erickson 2008). COI has shown a good success rate in animals, but it cannot be used in plants because of its limited divergence there. There has been much debate about which regions should be used as plant barcodes (Hollingsworth et al. 2011). At present, plant DNA barcoding uses standard loci as well as whole chloroplast genomes. Although various genome-based identification strategies have been developed, DNA barcoding remains the most powerful tool. There are approximately 300,000 plant species worldwide.
Identifying and classifying such a vast range of plants is a difficult task for taxonomists, and DNA barcoding helps in the rapid and accurate identification of plant species (Costion et al. 2011). DNA-based methods are more suitable than protein- or RNA-based ones because DNA is present in all tissues of an organism, is more stable, and is largely unaffected by external factors. Species discrimination in plants is difficult because of a higher level of gene tree paraphyly (Fazekas et al. 2008). matK, rbcL, trnH-psbA, ITS, trnL-F, 5S-rRNA, and 18S-rRNA are the markers most commonly used for plants, owing to their discriminatory capacity (Table 4.1). Cowan and Fay (2012) have described the major challenges associated with DNA barcoding of plants; nevertheless, the number of plant barcoding studies is increasing steadily because of the technique’s capability to identify unknown samples. The general concept of DNA barcode formation is illustrated in Fig. 4.1.
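The “barcoding gap” defined above can be checked numerically. The following is a minimal sketch, assuming the barcode sequences are already aligned to equal length; it uses uncorrected p-distances (many barcoding studies instead use model-corrected distances such as the Kimura two-parameter distance), and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: proportion of differing sites, ignoring gap positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip alignment gaps
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def barcoding_gap(samples):
    """samples: list of (species_name, aligned_sequence) pairs.

    Returns (max intraspecific distance, min interspecific distance, gap).
    A positive gap means every between-species pair is more divergent
    than every within-species pair.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        d = p_distance(s1, s2)
        (intra if sp1 == sp2 else inter).append(d)
    if not inter:
        raise ValueError("need sequences from at least two species")
    max_intra = max(intra, default=0.0)
    min_inter = min(inter)
    return max_intra, min_inter, min_inter - max_intra


# Hypothetical toy alignment: two individuals each of species A and B.
samples = [
    ("A", "ACGTACGTACGTACGTACGT"),
    ("A", "ACGTACGTACGTACGTACGA"),
    ("B", "ACGTTCGTACCTACGAACGT"),
    ("B", "ACGTTCGTACCTACGAACGA"),
]
max_intra, min_inter, gap = barcoding_gap(samples)
print(f"max intra = {max_intra:.2f}, min inter = {min_inter:.2f}, gap = {gap:.2f}")
```

For these toy sequences the largest within-species distance is smaller than the smallest between-species distance, so the gap is positive; with real data the two distributions often overlap, which is exactly the difficulty discussed in the Limitations section.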

Table 4.1 Major candidate regions used for DNA barcoding of plants
Fig. 4.1 Schematic representation of the formation of DNA barcodes

4.2 History of DNA Barcoding and Success Stories

Two international initiatives carry out DNA barcoding analyses, viz., the International Barcode of Life project (iBOL) and the Consortium for the Barcode of Life (CBOL). iBOL is the largest biodiversity genomics initiative ever undertaken. Its mission is to maintain and update the barcode reference library, the Barcode of Life Data System (BOLD), and to establish it further as a robust resource for animal, plant, and fungal DNA barcodes (Ratnasingham and Hebert 2007). The work of iBOL is carried out by its constituent nodes, which comprise many countries grouped into separate operating teams. CBOL, established in 2004, promotes DNA barcoding as a global method for identifying the plants and animals that make up Earth’s biodiversity. As for applications, the usefulness of the technique has recently been explored in forensic botany to resolve legal questions. Plant identification at crime scenes is important in criminal investigations, because every environment has a unique combination of pollen that suggests the type of place where a crime took place. To overcome the problems of traditional forensic botany, DNA barcoding could be a promising technique in several cases (Ward et al. 2005, 2009; Tsai et al. 2006, 2008; Ferri et al. 2009).

4.3 Status of DNA Barcoding of Medicinal Plants

Adulteration is a major problem in the herbal plant material market; therefore, authentication and standardization are prerequisites for minimizing unfair trade. According to the World Health Organization (WHO), the total international herbal drug market is estimated at US$62 billion and is anticipated to grow to US$5 trillion by the year 2050. The available barcodes represent 363,584 sequences from 50,039 species, and the formal barcode criteria, i.e., a minimum sequence length of 500 bp and more than three individuals per species, have been met by 13,761 species (Sarwat and Yamdagni 2015). However, most of these (98 %) are animal species. In January 2009, iBOL started with the target of collecting barcodes from 5 million specimens in its first 5 years, and scientists from 25 countries have contributed to this initiative (Hajibabaei et al. 2006). The goal of the DNA barcoding project is to develop a reference library that can provide data even at very low taxonomic levels using short, specific DNA fragments. Major efforts are under way worldwide to barcode medicinal and aromatic plants (Cowan and Fay 2012; Elliott and Jonathan 2014), but very little work has been reported on barcoding of Indian medicinal plants. India harbors 7–8 % of the world’s biodiversity, with rich medicinal plant resources among its approximately 45,500 plant species. Of these, 8,000 species are of medicinal value and 960 species are traded, of which 178 species have an annual consumption of more than 100 metric tons (Aneesh et al. 2009; Efferth and Greten 2012). Industrial demand for medicinal plants is rising with the global growth of the herbal industry, and India is thus a center for herbs, with an estimated trade of US$140 million annually.
Worldwide exports of botanical and natural ingredients were approximately US$33 billion in 2010 and were expected to reach US$93 billion by 2015, according to the December 2011 bulletin of the Market News Service. Exports of Indian medicinal plants and their products are estimated at about US$0.2 billion. In addition to international trade, there is a considerable volume of domestic trade in medicinal plants within India, with a turnover of US$1.6–1.8 billion (Mishra et al. 2015). The total world herbal market is on the order of US$60 billion per year, of which India contributes 2.5 %. Thus, despite an extensive heritage of Ayurvedic literature and a wide variety of medicinal plant species, India still struggles to meet the potential market demand (Mishra et al. 2015). To increase India’s share of the global herbal market, improvements in quality control, standardization, scientific methods of production, and analysis of commercial products are necessary. Standardized mass production of scientifically tested herbal products would not only maintain the efficacy of the herbals but also give them a competitive edge over other medicines. China is currently leading efforts on DNA barcoding of medicinal plants and has developed a database of DNA barcodes (Lou et al. 2010), and some reports are also available on DNA barcoding of Indian medicinal plants (Parveen et al. 2012; Ghosh et al. 2013). Because of the importance of and demand for herbal raw materials and products, the herbal industry suffers from substitution and adulteration of medicinal plants with closely related species. Adulteration and admixture cause major changes in formulations and are considered illegal practices. The efficacy of any drug or herbal product decreases when the herb is adulterated, and substitution with toxic adulterants can even be lethal. The correct formulation is essential for a medicinal herb to be effective.
The trade in medicinal plants is the main source of income for herbalists, and economic constraints may give them an incentive to substitute rare ingredients with cheaper and more readily available species. Excessive illegal trade has made many medicinal plant species endangered in India. Reliable identification markers are therefore required to authenticate plant materials, and DNA barcoding is a useful tool for discriminating among the raw materials of medicinal plants.

4.4 Different Approaches of DNA Barcoding in Plants

Because DNA barcoding in plants is complicated at several stages, whether amplification, sequencing, or establishing a clear “barcoding gap,” considerable attention has been directed toward improving the identification methodology. Approaches such as combining multiple barcodes across different taxonomic groups, or applying combinations of barcodes in a tiered fashion (one combination for a broad taxonomic group followed by a more robust combination at the next level), have therefore recently gained importance. These approaches are described below to introduce readers to current methods of DNA barcoding in plants.

4.4.1 Single-Locus Approach

Because barcoding markers differ in their efficiency in discriminating plants of different families, individual markers have been comparatively evaluated in a number of families (Gao et al. 2010a, b; Hollingsworth et al. 2009; Li et al. 2012; Muellner et al. 2011; Pettengill and Neel 2010). matK is the closest plant analogue to COI, the animal DNA barcode. It typically provides high resolution and good species identification because it is one of the most rapidly evolving coding regions of the plastid genome (Lahaye et al. 2008). Its disadvantage, however, is the lack of universal primer sets for all taxa, which leads to low PCR amplification success, particularly in non-angiosperms (Kress and Erickson 2007; CBOL Plant Working Group 2009). Compared with matK, the rbcL marker (ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit) is easy to amplify and sequence. It is an important candidate for plant DNA barcoding even though its discriminatory power is lower than that of matK. matK and rbcL have been suggested as the core DNA barcodes for plants (Hollingsworth et al. 2009). In addition, the plastid intergenic spacer trnH-psbA is used as a supplementary DNA barcode; it is highly variable and shows high species discrimination success in plants (CBOL Plant Working Group 2009; Liu et al. 2012a, b). The main concern with this locus is its high frequency of mononucleotide repeats, which cause unidirectional reads (Devey et al. 2009) and thus hamper the recovery of bidirectional sequences. Another common phenomenon in this region is microinversion, which has been studied in various angiosperms (Whitlock et al. 2010). Thus, the complex structure of trnH-psbA makes it difficult to use as a barcode (Storchova and Olson 2007; Hao et al. 2010).
However, uncorrected microinversions can provide additional characters for species discrimination (Jeanson et al. 2011), although a positive effect on species discrimination of manually correcting inverted sequences has also been demonstrated (Whitlock et al. 2010). A comprehensive analysis of the utility of trnH-psbA and its combinations with other loci has been carried out (Pang et al. 2012). Many researchers have examined the use of the nuclear internal transcribed spacer (ITS) region as a standard barcode, and it has been recommended as one of the core barcodes for seed plants (Li et al. 2015). Part of this region, ITS2, has also been suggested as a novel barcode for both plants and animals (China Plant BOL Group 2011; Yao et al. 2010; Chen et al. 2010). Given its performance in phylogenetic studies (Baldwin 1992), the resolving power of ITS is not in doubt; however, the following three major problems have been encountered in its use.

4.4.1.1 Sequencing

One of the main limitations of nuclear ITS is its recovery, since this region is difficult to amplify and sequence (Kress et al. 2005). An alternative is ITS2, which is easier to work with because its shorter target fragment simplifies amplification and sequencing.

4.4.1.2 Paralogous Gene Copies

The nuclear ITS region occurs in multiple copies within a cell. When concerted evolution of these copies is incomplete, divergent copies can co-occur within an individual (Alvarez and Wendel 2003; Bailey et al. 2003), which can render direct sequence reads unreadable. This paralogy may also lead to misidentification of samples, because the species identified depends on which variant is sequenced. However, compared with other markers, identification success with this region is not compromised by the presence of paralogous copies (Hollingsworth et al. 2011).

4.4.1.3 Fungal Contamination

Fungal ITS regions are similar to their plant counterparts, and the primers used for amplification and sequencing are very similar. Fungal DNA is therefore accidentally amplified in several cases, particularly from plants harboring fungal endophytes, which can lead to misidentification of samples. Consequently, regardless of the number of primer sets available for this particular barcode region, amplification and sequencing have been difficult for numerous samples (Gonzalez et al. 2009).

4.4.2 Alternate Regions as DNA Barcodes

Besides the core plastid markers (matK and rbcL) and the supplementary trnH-psbA and ITS regions, other plastid protein-coding genes (rpoB, rpoC1), plastid intergenic spacers (atpF-H, psbK-I), and low-copy-number genes are being tested for identification in several families (Pillon et al. 2013).

4.4.3 Multilocus Approach

The multilocus approach is an effective method for DNA barcoding of plants, with good discrimination success (Kress and Erickson 2007; Fazekas et al. 2008; Newmaster et al. 2008; CBOL Plant Working Group 2009). The practice of using multiple barcodes emerged because individual loci performed unsatisfactorily. High discrimination success can be obtained by combining the universality, discriminatory power, and amplification success of each locus. Multilocus combinations also yield high clade support values in monophyly-based identification of samples, as in the case of Nyssaceae (Wang et al. 2012). A plant barcode can combine two or more loci, for example a conserved coding region such as rbcL and a rapidly evolving noncoding region. The noncoding trnL intron and trnL-F intergenic spacer (IGS) are recommended for extremely degraded tissue (Taberlet et al. 2007). The power of these regions has been tested in bryophytes (Quandt et al. 2004; Stech et al. 2013), and trnL-F and trnL were also used successfully to identify an enigmatic aquatic fern gametophyte (Li et al. 2009). A two-locus DNA barcode for plants (matK + rbcL) was proposed by the CBOL Plant Working Group (2009). In some cases, three-locus combinations have failed to improve discrimination over two-locus barcodes (Wang et al. 2012), and to avoid the expense of using three loci for large data sets, the two-locus barcode was accepted as the standard barcode for land plants (CBOL 2009). In a two-locus barcode, the conserved coding locus aligns well across the taxa of a community sample and resolves deep phylogenetic branches, whereas the hypervariable region aligns easily within subclades of closely related species (Kress et al. 2009).
The complementarity of the rbcL gene and the noncoding trnH-psbA spacer has been demonstrated (Fazekas et al. 2008). In contrast to CBOL, these authors suggested using more than two regions, although the gain in discrimination was found to diminish in barcoding analyses with three or more regions. Using additional regions would also be beneficial when some of the recovered loci are of poor quality. The combination of rbcL with trnL-F has also shown great potential for species discrimination in ferns (de Groot et al. 2011). The composition vector (CV) approach (Qi et al. 2004) has been described as an efficient method for analyzing rRNA data sets. The modified CV method incorporates an adjustable weighted algorithm for the vector distance, based on the ratio of sequence lengths between a pair of taxa in the candidate genes. Recently, the modified CV approach has been applied to large multigene data sets for plant DNA barcoding (Li et al. 2012).
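As a minimal sketch of the multilocus idea, the code below concatenates per-locus alignments (keyed here as rbcL and matK, following the two-locus CBOL proposal) and assigns a query to the nearest reference by p-distance. The data, function names, and nearest-neighbor criterion are illustrative assumptions rather than the procedure of any cited study; note also that concatenation implicitly weights each locus by its aligned length.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def concatenate(record: dict, loci: list) -> str:
    """Join the aligned sequences of the chosen loci into one combined sequence."""
    return "".join(record[locus] for locus in loci)


def nearest_species(query: dict, references: dict, loci: list) -> list:
    """Return the reference species tied for the smallest distance over `loci`."""
    q = concatenate(query, loci)
    dists = {sp: p_distance(q, concatenate(rec, loci)) for sp, rec in references.items()}
    best = min(dists.values())
    return [sp for sp, d in dists.items() if d == best]


# Hypothetical aligned fragments: rbcL is identical in both species (conserved),
# while matK differs, so only the two-locus combination resolves the query.
references = {
    "Species_X": {"rbcL": "ATGTCACCACAAACAGAG", "matK": "ATGGAGGAATTACAAGGA"},
    "Species_Y": {"rbcL": "ATGTCACCACAAACAGAG", "matK": "ATGGAGAAATTCCAAGGA"},
}
query = {"rbcL": "ATGTCACCACAAACAGAG", "matK": "ATGGAGGAATTACAAGGT"}

print(nearest_species(query, references, ["rbcL"]))          # tie between X and Y
print(nearest_species(query, references, ["rbcL", "matK"]))  # resolves to X
```

The conserved locus alone returns a tie, while adding the more variable locus separates the two species, mirroring the complementary roles of coding and faster-evolving regions described above.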

4.4.4 Tiered Approach

Combining barcode markers to discriminate species is robust and yields high support values, and a newer way of combining barcodes, known as the tiered approach, is also evolving. Although vigorous efforts are under way to find a suitable universal locus for plants, and one may be found in the future, relying on a single locus for plants will remain a poor choice because of the hybridization and introgression observed in some plant groups. Therefore, rather than depending only on maternally inherited genes, using a combination of coding and noncoding regions in a stepwise manner is the more favorable and logical approach. It allows an unknown sample to be assigned first to a taxon level at which a successful primer pair can be targeted; the sample is then aligned within a small group of taxa before final assignment. Only a few studies have tested this methodology in specific taxonomic groups (Newmaster et al. 2006; Xiang et al. 2011). A first-tier coding region common to plants is used for discrimination at a given taxonomic level, followed by a more variable second-tier coding or noncoding region at the species level. Alignment at the first tier (coding regions) reduces the problem of aligning divergent genera using noncoding regions at the second tier, so that under a common first-tier sequence the data set is properly organized to perform well at the next level of resolution. The method also preserves the efficiency of the multilocus approach, since complementary regions for a group can still be used. rbcL has been proposed as the first-tier barcode (Newmaster et al. 2006): it is the best-characterized plastid coding region in GenBank, covers a majority of groups, and can therefore serve as a platform for comparing different plastid genes. rbcL was analyzed to see how well it resolves congeneric species.
This marker resolved congeneric species in 85 % of cases, so it was proposed as the core first-tier locus, followed by a choice of secondary locus at the second tier. The method therefore provides flexibility in the choice of the next locus once a standard common region is used at a particular level. Xiang et al. (2011) similarly supported this approach and recommended the use of matK at the generic level, noting that further resolution at the second tier needs to be explored with a suitable second-tier locus.
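The two-tier logic can be outlined in code. This is an illustrative sketch only, assuming pre-aligned fragments of equal length for each locus: tier 1 uses a conserved locus (rbcL, as proposed by Newmaster et al. 2006) to place the query in a genus, and tier 2 compares a more variable locus only among species of that genus. The use of matK as the second-tier key, the function names, the genus names, and the sequences are all hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def tiered_assign(query, references, tier1="rbcL", tier2="matK"):
    """Assign a query in two tiers; returns (genus, [best-matching species]).

    Tier 1: the nearest reference(s) on the conserved locus fix the genus.
    Tier 2: the variable locus is compared only within that genus.
    If tier 1 cannot place the query in a single genus, (None, []) is returned.
    """
    d1 = [(p_distance(query[tier1], r[tier1]), r["genus"]) for r in references]
    best1 = min(d for d, _ in d1)
    genera = {g for d, g in d1 if d == best1}
    if len(genera) != 1:
        return None, []
    genus = genera.pop()
    d2 = [
        (p_distance(query[tier2], r[tier2]), r["species"])
        for r in references
        if r["genus"] == genus
    ]
    best2 = min(d for d, _ in d2)
    return genus, [sp for d, sp in d2 if d == best2]


# Hypothetical reference set: two congeners share an identical tier-1 sequence.
references = [
    {"genus": "GenusA", "species": "GenusA sp1", "rbcL": "ATGTCACCAC", "matK": "TTAGCAAGTA"},
    {"genus": "GenusA", "species": "GenusA sp2", "rbcL": "ATGTCACCAC", "matK": "TTAGGAAGCA"},
    {"genus": "GenusB", "species": "GenusB sp1", "rbcL": "ATGTCTCCGC", "matK": "TCAGCAAGTA"},
]
query = {"rbcL": "ATGTCACCAC", "matK": "TTAGCAAGTT"}
print(tiered_assign(query, references))
```

Restricting the second comparison to one genus is what keeps the variable locus alignable, which is the main design motivation for the tiered scheme.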

4.5 Bioinformatics Approaches

Bioinformatics plays an important role in DNA barcoding analyses. DNA barcoding depends on the availability of sequence data; once data are available, bioinformatics tools can be used to analyze them. After the query sequences have been collected, sequence analysis and phylogenetic reconstruction are performed. Sequence analysis basically involves aligning the query sequences with a reference data set, using multiple sequence alignment (MSA) programs such as ClustalW, T-Coffee, and MUSCLE. In silico approaches for DNA barcoding have been developed on the basis of compensatory base changes (CBCs) (Wolf et al. 2005), operational taxonomic units (OTUs) (Slabbinck et al. 2008), DNA metabarcoding (Riaz et al. 2011), locus-specific tools (Liu et al. 2011), a tool for representing barcode symbology (Liu et al. 2012a, b), neural network techniques (Zhang et al. 2008), machine learning (Zhang et al. 2012a, b), data mining (Bertolazzi et al. 2009), and composition vectors (Kuksa and Pavlovic 2009), among others. Available software and tools for analyzing barcode data are listed in Table 4.2.

Table 4.2 Different software and tools used for DNA barcoding
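For the alignment-free composition vector (CV) approach cited in this chapter (Qi et al. 2004), a minimal sketch follows. It reflects our reading of the original CV formulation: the observed frequency p(w) of each k-mer w is compared with the frequency p0(w) predicted from its overlapping (k-1)- and (k-2)-mers, giving a(w) = (p(w) - p0(w)) / p0(w), and the distance between two sequences is (1 - C)/2, where C is the cosine correlation of their vectors. It is restricted to unambiguous DNA (A, C, G, T), uses an arbitrary default k, does not implement the modified, length-weighted CV of Li et al. (2012), and the sequences are toy data.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_probs(seq: str, k: int) -> dict:
    """Relative frequency of each k-mer occurring in seq (overlapping windows)."""
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    return {w: c / n for w, c in counts.items()}


def composition_vector(seq: str, k: int) -> dict:
    """Markov-corrected composition vector over all 4**k DNA k-mers."""
    if k < 3 or len(seq) < k:
        raise ValueError("need k >= 3 and a sequence at least k long")
    seq = seq.upper()
    pk, pk1, pk2 = kmer_probs(seq, k), kmer_probs(seq, k - 1), kmer_probs(seq, k - 2)
    vec = {}
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        mid = pk2.get(w[1:-1], 0.0)
        # Expected frequency of w predicted from its two overlapping (k-1)-mers.
        p0 = pk1.get(w[:-1], 0.0) * pk1.get(w[1:], 0.0) / mid if mid else 0.0
        vec[w] = (pk.get(w, 0.0) - p0) / p0 if p0 else 0.0
    return vec


def cv_distance(s1: str, s2: str, k: int = 4) -> float:
    """Distance (1 - C) / 2, where C is the cosine correlation of the two vectors."""
    a, b = composition_vector(s1, k), composition_vector(s2, k)
    dot = sum(a[w] * b[w] for w in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("composition vector is all zeros")
    return (1 - dot / (norm_a * norm_b)) / 2


# Toy sequences (hypothetical, not real barcodes).
s1 = "ATGTCACCACAAACAGAGACTAAAGCAAGTGTTGGATTCAAAGCTGGTGTTAAAGATTAC"
s2 = "ATGGAGGAATTACAAGGATATTTAGAAAAAGATAGATCTCGGCAACAACACTTCCTATAC"
print(round(cv_distance(s1, s2), 3))
```

Because no alignment is needed, such distances can be computed for loci or taxa that are hard to align, which is the main attraction of CV methods for large multigene data sets.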

4.6 Limitations

4.6.1 The Absence of Universal Barcode and Selection of Appropriate Barcode Region

In DNA barcoding, the universality of the barcode is still a major problem. Universality is difficult to attain because of insufficient information on genetic variation in less-studied taxonomic groups, and the problem is greater in plants than in animals. The ability of a barcode to differentiate and identify species on the basis of interspecific sequence variation depends on its resolving power. Defining a good barcode, consisting of a short, variable DNA sequence flanked by conserved regions, is therefore a challenge (Hebert et al. 2003b; Moritz and Cicero 2004; Rubinoff et al. 2006a; Ficetola et al. 2010). The most important task in DNA barcoding is identifying universal primers that amplify fragments with high resolution. However, it has been questioned whether a single short fragment is sufficient to discriminate organisms at the species level (Ficetola et al. 2010; Rubinoff et al. 2006a; Moritz and Cicero 2004). Single-locus DNA barcodes lack adequate variation among closely related taxa, so no single locus is available for the identification of all plants (Li et al. 2015).

4.6.2 Errors Found in DNA Barcoding when Mitochondrial Sequences Are Used

DNA barcoding with mitochondrial sequences is limited by duplicated copies of the gene of interest, heteroplasmy in mtDNA, biases caused by bacterial infection, nuclear integration, and introgression of mtDNA. If a portion of cytochrome oxidase I (COI) is duplicated in a given species, typical PCR may amplify these duplicated fragments, and it will then be unclear whether a sequence is the true COI or a paralogous copy that has diverged since the duplication (Campbell and Barker 1999; Song et al. 2008). Heteroplasmy is the presence of more than one type of mitochondrial genome within an individual. Co-amplification of divergent heteroplasmic copies of mtDNA can cause barcoding to overestimate the number of distinct species (Rubinoff et al. 2006b; Song et al. 2008; Fišer Pečnikar and Buzan 2014; Magnacca and Brown 2010; Moulton et al. 2010; Valentini et al. 2009; Acs et al. 2010; Hurst and Jiggins 2005). Maternally inherited bacterial symbionts are in linkage disequilibrium with mtDNA, because both are transmitted through the mother, and such symbionts can spread until every individual of a population is infected. When these symbionts cross the species barrier between closely related species, a subsequent selective sweep can lead to identical mtDNA sequences in different species (Song et al. 2008; Whitworth et al. 2007). Nuclear integration of mtDNA also introduces errors. Nuclear mitochondrial pseudogenes (numts) are nonfunctional copies of mtDNA in the nucleus and occur in the major clades of eukaryotes. Their presence in the nuclear genome creates problems in constructing DNA barcode reference libraries and in identifying species, and the potential existence of COI numts poses a major problem for DNA barcoding (Bensasson et al. 2001; Richly and Leister 2004; Song et al. 2008; Zhang and Hewitt 1996). Introgression of mtDNA also creates a problem for barcoding.
Introgression is the transfer of genes from one species into the gene pool of another through repeated backcrossing of an interspecific hybrid with one of its parents. It blurs species boundaries between evolutionary lineages that would otherwise be distinct (Rubinoff 2006). A meta-analysis of phylogenetic studies found that over 20 % of the studied lineages presented problems due to mtDNA introgression (Ballard and Whitlock 2004; Fišer Pečnikar and Buzan 2014; Rubinoff 2006; Valentini et al. 2009; Vences et al. 2005; Acs et al. 2010; Hurst and Jiggins 2005; Machado and Hey 2003). Inferring species boundaries from mtDNA is further limited by the retention of ancestral polymorphism, male-biased gene flow, and selection on any mtDNA nucleotide (because the whole mitochondrial genome is a single linkage group). Introgression, hybridization, and paralogy arising from the transfer of mtDNA gene copies to the nucleus (Hebert et al. 2004; Ballard and Whitlock 2004; Bensasson et al. 2001) therefore create problems for both animal and plant DNA barcoding.

4.6.3 Lack of Comprehensive Reference Database

DNA barcoding is affected by incomplete a priori identification of specimens in the reference database. Conflicts arise in data assessment when different laboratories working on the same taxa assign different names to the same species on the basis of morphological identification (Becker et al. 2011; Collins and Cruickshank 2013). If the reference database is not comprehensive, taxa will be misidentified (Meyer and Paulay 2005; Valentini et al. 2009). DNA barcoding therefore faces limitations unless every taxon is represented by selected individuals in the reference database, and unknown specimens from undescribed biodiversity cause problems in identification (Fišer Pečnikar and Buzan 2014; Rubinoff 2006). The validity of DNA barcoding depends on reference sequences from taxonomically verified specimens; in the absence of reference data, DNA barcoding will face limitations and challenges (Ajmal et al. 2014). Difficulties also arise when the query sequence has no counterpart in the reference database, in which case barcoding-based identification of the query at the species level fails (Nielsen and Matz 2006; Virgilio et al. 2010). Reference sequences are verified against voucher specimens documented by experienced taxonomists. Without such a reference database, there is no authentic library for newly obtained query sequences, and the large quantity of legacy sequence data available in GenBank cannot be used as barcodes. Thus, DNA barcoding does not improve the speed of cataloging life on Earth (Taylor and Harris 2012; Peterson et al. 2007).

4.6.4 Lack of Statistical Solution

DNA barcoding is a useful tool for the identification of unknown species, but the method requires threshold values that distinguish intraspecific from interspecific variation. If an unknown sequence differs from its nearest reference sequence by more than the threshold, the organism is flagged as potentially belonging to a species not represented in the reference set, and its classification needs additional investigation. The wide overlap between intra- and interspecific divergence values creates major problems. These overlaps can appear comparatively restricted because they lie far from the respective average values, so comparisons are often based only on the mean intra- and interspecific values of closely related sibling species (Desalle 2006; Hebert et al. 2003a; Prendini 2005; Rubinoff 2006; Taylor and Harris 2012; Valentini et al. 2009; Vences et al. 2005; Casiraghi et al. 2010; Frézal and Leblois 2008). A threshold based on the tenfold rule (a species-level threshold set at ten times the mean intraspecific divergence) has also been proposed, but this rule has been extensively criticized (Meyer and Paulay 2005; Moritz and Cicero 2004; Matz and Nielsen 2005; Nielsen and Matz 2006; Valentini et al. 2009). To overcome this difficulty, it is currently assumed that interspecific sequence divergence should exceed a threshold of 2 or 3 % dissimilarity, a value set on the basis of empirical observations of sequence variation among congeneric species (Hebert et al. 2003b). This approach is simple but can yield inconclusive or inaccurate results. Statistical methods are therefore required to test whether a sampled query sequence belongs to the same species as a given database sequence, and thus to support the species assignment of the query (Nielsen and Matz 2006).
Assessing statistical uncertainty in DNA barcoding requires strong assumptions about the population genetics of the analyzed species (Nielsen and Matz 2006). Once the unrealistic assumption of perfect sequence identity within species is abandoned, DNA barcoding is not possible without making population genetic assumptions (Acs et al. 2010; Hickerson et al. 2006). With strong population subdivision within species, species assignment may fail because the underlying demography has not been modeled adequately. Another case is a sequence sampled from a subpopulation with no gene flow to any of the populations represented in the database: the statistical methods used in DNA barcoding do not classify such a query as a member of its parental species, even though taxonomists would. DNA barcoding may therefore fail because it does not recognize taxonomic units corresponding to reproductively isolated populations, particularly when the number of nucleotide differences (K) is used as the statistic in a hypothesis-based approach. Nielsen and Matz (2006) argued that a procedure examining only the number of nucleotide differences between a query sequence and its best match in the database is not optimal. In this context, two methods for DNA barcoding, the K-test and a Bayesian test, were developed from a population genetics perspective. The K-test performs poorly if some species are missing from the database, which can lead to incorrect assignment of queries derived from these “unrecorded species.” The Bayesian method, on the other hand, has advantages over the K-test in accuracy and in its ability to use more than one sequence per species in the database.
However, this method also relies on strong phylogenetic and species-level assumptions that are not always correct. Therefore, the availability of a full Bayesian method may not eliminate the need for inference procedures with controlled frequentist (hypothesis-based) properties (Nielsen and Matz 2006). DNA barcoding still faces the problem of testing clearly stated hypotheses, and confusion about the objectives of a study can lead to the choice of inappropriate or suboptimal analytical techniques.
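The fixed-threshold logic discussed above can be made explicit. The sketch below assigns a query to its nearest reference only if the p-distance is within a cut-off and otherwise reports it as unassigned (a possible unrecorded species); it also computes a tenfold-rule threshold as ten times the mean intraspecific distance. The 2 % default, the function names, and the toy data are assumptions for illustration, and, as the text notes, such thresholds ignore the overlap between intra- and interspecific variation and give no statistical measure of confidence.

```python
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_by_threshold(query, references, threshold=0.02):
    """Return (species or None, distance to the nearest reference).

    None means the nearest match exceeds the threshold, so the query may belong
    to a species absent from the reference set and needs further study.
    """
    species, dist = min(
        ((sp, p_distance(query, seq)) for sp, seq in references),
        key=lambda pair: pair[1],
    )
    return (species if dist <= threshold else None), dist


def tenfold_threshold(intraspecific_distances):
    """Tenfold rule: threshold = 10 x mean intraspecific divergence."""
    return 10 * mean(intraspecific_distances)


ref_a = "ACGT" * 25             # 100 bp toy reference for "Species A"
ref_b = "ACGA" * 25             # differs from ref_a at 25 sites
references = [("Species A", ref_a), ("Species B", ref_b)]

q_close = "T" + ref_a[1:]       # 1 substitution from ref_a (p = 0.01)
q_far = "TTTTT" + ref_a[5:]     # 4 substitutions from ref_a (p = 0.04)

print(assign_by_threshold(q_close, references))
print(assign_by_threshold(q_far, references))
print(tenfold_threshold([0.001, 0.003]))
```

The second query is closest to Species A but still exceeds the 2 % cut-off, so it is left unassigned; whether it is a divergent member of Species A or an unrecorded species is precisely the question a fixed threshold cannot answer statistically.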

4.6.5 Limitations of Distance-Based and Tree-Building Methods Used in DNA Barcoding

Some reports note that DNA barcoding fails as a taxonomic approach because it does not recover the correct species tree (Hebert and Gregory 2005; Will and Rubinoff 2004; Rubinoff et al. 2006a). Criticism has also been directed at both distance-based and character-based methods. Some authors argue that distance-based methods should not be used for DNA barcoding because distance is a phenetic measure and is not appropriate for species identification (Casiraghi et al. 2010; DeSalle 2006; Meyer and Paulay 2005). In distance-based analyses, the neighbor-joining (NJ) tree has become a standard part of the DNA barcoding procedure (Casiraghi et al. 2010; Collins and Cruickshank 2013). However, the poor performance of NJ trees for identification is well documented on both empirical and theoretical grounds (Collins and Cruickshank 2013; Little 2011; Meier et al. 2006; Virgilio et al. 2010; Zhang et al. 2012b). The inappropriate use of NJ trees for identification can reduce the effectiveness of DNA barcoding, independently of problems such as mtDNA paraphyly or species misidentification. The problems with NJ trees do not appear to be resolved by using other tree inference methods (DeSalle 2006; DeSalle 2007; DeSalle et al. 2005; Lowenstein et al. 2009; Rubinoff et al. 2006a; Taylor and Harris 2012; Austerlitz et al. 2009; Collins and Cruickshank 2013; Kerr et al. 2009; Little 2011; Little and Stevenson 2007; Srivathsan and Meier 2012; Virgilio et al. 2010; Zhang et al. 2012b; Collins et al. 2012; Will et al. 2005). Character-based methods, by contrast, identify diagnostic combinations of nucleotides (Collins and Cruickshank 2013), but they have failed to break into the mainstream of DNA barcoding (Savolainen et al. 2005). The debate over tree-based DNA barcoding, however, is not limited to the choice between distance-based and character-based methods.
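The idea behind character-based identification can be sketched in Python as follows: for each species, find alignment positions where all of its reference sequences share one nucleotide that no sequence of any other species carries, and accept a species only if the query carries all of that species' diagnostic nucleotides. This is a simplified illustration of the general approach, not the algorithm implemented in BLOG or any other published tool. The species names and sequences are hypothetical, and the sketch uses only single positions, whereas character-based methods, as noted above, also look for combinations of nucleotides.

```python
def diagnostic_sites(target: str, alignment: dict[str, list[str]]) -> dict[int, str]:
    """Positions where every sequence of `target` has the same nucleotide and
    no sequence of any other species has that nucleotide (single-site characters)."""
    target_seqs = alignment[target]
    others = [s for sp, seqs in alignment.items() if sp != target for s in seqs]
    sites = {}
    for i in range(len(target_seqs[0])):
        states = {s[i] for s in target_seqs}
        if len(states) != 1:
            continue  # variable within the target species
        (base,) = states
        if base in "-N":
            continue  # gaps and ambiguous bases are not used as characters
        if all(s[i] != base for s in others):
            sites[i] = base
    return sites


def identify(query: str, alignment: dict[str, list[str]]) -> list[str]:
    """Species whose non-empty set of diagnostic sites is fully matched by the query."""
    hits = []
    for sp in alignment:
        sites = diagnostic_sites(sp, alignment)
        if sites and all(query[i] == base for i, base in sites.items()):
            hits.append(sp)
    return hits


# Hypothetical aligned sequences (not real barcode data).
alignment = {
    "species_A": ["AAGTC", "AAGTT"],
    "species_B": ["ACGAC", "ACGAA"],
    "species_C": ["TCCTC"],
}
for sp in alignment:
    print(sp, diagnostic_sites(sp, alignment))
print(identify("AAGTA", alignment))  # ['species_A']
```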
Some authors recommend avoiding tree-building analyses altogether, because they give the impression of inferring phylogenies and relationships, and the limitations of single-gene trees are a well-known problem in phylogenetics (DeSalle et al. 2005; Taylor and Harris 2012). Phylogenetic techniques have generally been proposed for data analysis to overcome the limitations of threshold-based methods in DNA barcoding. However, applying such methods blurs the relationship between DNA barcoding and molecular phylogeny, because DNA barcoding is not phylogenetic reconstruction. Nevertheless, these methods are still used, and the debate over phylogeny versus identification continues in DNA barcoding (Casiraghi et al. 2010; Moritz and Cicero 2004; Vogler and Monaghan 2007). Bootstrap resampling can further decrease the already low identification success rates associated with NJ trees (Brown et al. 2012; Collins and Cruickshank 2013; Fujita et al. 2012; Meyer and Paulay 2005; Monaghan et al. 2009; Puillandre et al. 2012; Virgilio et al. 2012; Zhang et al. 2012a). The use of bootstrap resampling in DNA barcoding studies also creates confusion between species discovery and specimen identification. Bootstrapping has been used in this context to address problems with NJ trees such as taxon-order bias and tied trees (Lowenstein et al. 2009; Meier et al. 2008). However, using bootstrap values as a cutoff for correct identification severely compromises the efficacy of a reference library and exacerbates the weaknesses of tree-based methods outlined above (Collins et al. 2012; Collins and Cruickshank 2013; Zhang et al. 2012b).
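For comparison with tree-based analysis, a threshold-based identification can be sketched in a few lines of Python. The sketch is loosely modelled on the “best close match” criterion of Meier et al. (2006): the query takes the species name of its closest reference only if that distance is within a threshold, ties between different species are reported as ambiguous, and queries beyond the threshold are left unidentified. The threshold of 0.1 is arbitrary and suited only to the 10-bp toy sequences; in a real study it would have to be justified from the data. Function names and sequences are hypothetical.

```python
def count_differences(a: str, b: str) -> int:
    """Number of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def threshold_identify(query: str, references: list[tuple[str, str]],
                       threshold: float) -> tuple[str, tuple[str, ...]]:
    """Assign a query using a p-distance threshold.

    references: (species, aligned sequence) pairs.
    threshold: largest p-distance (proportion of differing sites) accepted.
    Returns (status, candidate species), status being "identified",
    "ambiguous", or "no identification".
    """
    diffs = [(count_differences(query, seq), sp) for sp, seq in references]
    best_k = min(k for k, _ in diffs)
    if best_k / len(query) > threshold:
        return "no identification", ()
    # Species of all reference sequences tied for the best match
    tied = tuple(sorted({sp for k, sp in diffs if k == best_k}))
    if len(tied) == 1:
        return "identified", tied
    return "ambiguous", tied


# Hypothetical aligned reference fragments (not real barcode data).
refs = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGTTCGTAA"),
    ("species_C", "TCGAACGGAC"),
]
print(threshold_identify("ACGTACGTAT", refs, 0.1))  # ('identified', ('species_A',))
print(threshold_identify("ACGTACGTAA", refs, 0.1))  # ('ambiguous', ('species_A', 'species_B'))
print(threshold_identify("TTTTTTTTTT", refs, 0.1))  # ('no identification', ())
```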

4.6.6 Limitations of Available Bioinformatics Tools and Algorithms

Biases in the methods used by the first generation of DNA barcoding studies are being replicated by later studies, aided by the analytical tools available from BOLD. A character-based tool, BLOG, has been developed alongside BOLD, but it is currently available only through the Barcode of Life Data Portal (BDP) rather than through the various BOLD websites. The popularity of current methods may therefore reflect routine rather than deliberate choice, which suggests that the barcoding movement has not taken advantage of a systematic appraisal of taxa. DNA barcoding requires easy-to-use tools for species discrimination and identification. Such tools use pairwise global alignment or alignment-free comparison and automate the selection of data partitions comprising alignable groups of samples (CBOL Plant Working Group 2009; Chu et al. 2009; Kress et al. 2009; Kuksa and Pavlovic 2009; Hollingsworth et al. 2011). Microinversions are common in noncoding regions and can lead to multiple groupings of samples.
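As an illustration of alignment-free comparison, the Python sketch below represents each sequence by its set of k-mers (substrings of length k) and measures dissimilarity as the Jaccard distance between these sets, so the sequences need not be aligned or of equal length. The choice of k = 3, the Jaccard measure, the function names, and the example sequences are assumptions for illustration; published alignment-free barcoding tools may use different representations and distances.

```python
def kmer_set(seq: str, k: int = 3) -> set[str]:
    """All distinct substrings of length k in the sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Jaccard distance between the k-mer sets of two unaligned sequences
    (0 = identical k-mer content, 1 = no shared k-mers)."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    if not union:
        return 0.0  # both sequences shorter than k
    return 1 - len(sa & sb) / len(union)


def nearest_reference(query: str, references: dict[str, str], k: int = 3) -> str:
    """Species whose reference sequence has the smallest k-mer distance to the query."""
    return min(references, key=lambda sp: kmer_distance(query, references[sp], k))


# Hypothetical unaligned references of different lengths (not real barcode data).
unaligned = {
    "species_A": "ACGTACGTAC",
    "species_B": "TTGACCAATTG",
}
print(kmer_distance("ACGTAC", "ACGTAA"))
print(nearest_reference("ACGTACGTACG", unaligned))  # species_A
```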

4.6.7 Absence of an Effective Bioinformatics Pipeline for DNA Barcoding

DNA barcoding is used to recognize and identify unknown species. Despite its present limitations, the barcoding method therefore provides a pipeline for surveying biodiversity, a crucial task for prioritizing conservation efforts given the present extinction crisis (Taylor and Harris 2012; Valentini et al. 2009). As mentioned earlier, GenBank stores a huge amount of sequence information for which no voucher specimens exist, which excludes these sequences from use as barcodes. While a DNA barcoding reference library is being established, some taxa will remain unsampled and the depth of sample coverage will vary among markers. It is therefore necessary to develop a bioinformatics framework that can select sets of samples that are directly comparable for a given set of markers. Integrating the analytical approaches into a single easy-to-use workflow is required to provide comparable bioinformatics support for multi-marker barcoding in animals and plants (Hollingsworth et al. 2011; Bhargava and Sharma 2013).
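The sample-selection step described above can be sketched in Python: given which markers have been sequenced for each sample, keep only the samples that have every marker in a chosen set, and, for a fixed number of markers, pick the combination that retains the most samples. The function names, sample names, and coverage table are hypothetical; a real framework would read this information from the reference library.

```python
from itertools import combinations


def comparable_samples(coverage: dict[str, set[str]], markers) -> list[str]:
    """Samples that have sequences for every marker in `markers`."""
    required = set(markers)
    return sorted(sample for sample, have in coverage.items() if required <= have)


def best_marker_set(coverage: dict[str, set[str]], candidates, size: int) -> tuple[str, ...]:
    """Combination of `size` markers that keeps the most comparable samples.

    Ties are resolved in favour of the combination that comes first in sorted order.
    """
    return max(combinations(sorted(candidates), size),
               key=lambda combo: len(comparable_samples(coverage, combo)))


# Hypothetical coverage table: sample -> markers sequenced for it.
coverage = {
    "sample1": {"rbcL", "matK", "ITS2"},
    "sample2": {"rbcL", "matK"},
    "sample3": {"rbcL", "ITS2"},
    "sample4": {"rbcL", "matK", "trnH-psbA"},
}
print(comparable_samples(coverage, ["rbcL", "matK"]))  # ['sample1', 'sample2', 'sample4']
print(best_marker_set(coverage, {"rbcL", "matK", "ITS2", "trnH-psbA"}, 2))  # ('matK', 'rbcL')
```

The trade-off is visible even in this toy table: rbcL plus ITS2 keeps only two of the four samples, whereas rbcL plus matK keeps three.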

4.7 Successful Uses of DNA Barcoding in Medicinal Plants

Many studies published between 2003 and 2016 have shown that DNA barcoding can accurately identify medicinal plant materials that cannot be distinguished morphologically. DNA barcoding has found applications in several areas, including forensic science, biosecurity (Armstrong and Ball 2005), tracing of illegal trade in organisms (Galimberti et al. 2014), and the pharmaceutical and herbal industries (Gantait et al. 2014), among others. When morphological or anatomical data are insufficient to identify a sample, a short stretch of DNA sequence can help to identify the species. Samples containing multiple plant fragments can now yield identifications of several species, giving a clear picture of the habitat and offering critical clues to investigators (Ferri et al. 2015). DNA barcoding can also be combined with other identification methods to identify medicinal plants more accurately, for example with chemical analysis (Palhares et al. 2015) and next-generation sequencing (Shokralla et al. 2014). Plant materials are frequently encountered in criminal investigations but are often overlooked as potential evidence. A forensic investigation that seeks to match evidence to a particular plant requires an up-to-date database of samples, which in turn requires the collection and genotyping of many samples from or near the crime scene. Because law enforcement officers and attorneys are often unfamiliar with botany, important plant-based evidence is frequently overlooked. Development of a robust DNA barcode database with highly authenticated sequence information will greatly contribute to the future of forensic botany (Ferri et al. 2009). Hallucinogenic compounds are pharmacological agents that are banned in most states and countries; they cause changes in perception, thought, emotion, and consciousness.
Plants producing such hallucinogens have been detected using DNA barcoding in some forensic studies (Murphy and Bola 2013; Ogata et al. 2013). The DNA barcodes available to date for medicinal plants are listed in Table 4.3.

Table 4.3 List of available DNA barcodes of medicinal plants

4.8 Conclusions

Over the past 12 years, DNA barcoding has attracted considerable interest worldwide. Researchers in this field are searching for a better universal DNA barcode to support efficient conservation of biodiversity. Because the major problems of barcoding lie in plants, this chapter has reviewed the research carried out so far in this area, including future approaches, and has summarized the candidate markers used in plants and a number of barcoding reports. Although CBOL proposed seven candidate barcodes from the plastid genome, the proposed supplementary loci, the nuclear internal transcribed spacer regions ITS1 and ITS2, have the largest number of GenBank submissions, owing to their easy amplification resulting from their high copy number. rbcL and matK (both plastid genes) come next, followed by 18S rRNA (a nuclear ribosomal RNA gene), trnL-F (intron + IGS), and trnH-psbA (IGS). Because plant nuclear regions show higher substitution rates than plastid genes, ITS is widely used and also serves as a supplementary marker. Once the loci are chosen, one must also decide between a single-locus, multilocus, or tiered approach. The literature reviewed here suggests that multilocus and tiered approaches give higher success rates than the single-locus approach, provided that loci are combined appropriately and the loci for each tier are selected carefully.
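A tiered approach can be sketched in Python as a sequence of filters: the first locus narrows the candidate species to those whose reference sequence best matches the query, and each later locus is applied only to the remaining candidates. The locus names rbcL and matK serve only as labels here; the sequences, the best-match rule at each tier, the function names, and the order of tiers are assumptions for illustration, not a recommended protocol.

```python
def count_differences(a: str, b: str) -> int:
    """Number of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def narrow_at_locus(query_seq: str, locus_refs: dict[str, str], pool: set[str]) -> set[str]:
    """Keep the species in `pool` whose reference at this locus best matches the query.

    Species lacking a reference for this locus are dropped (a simplifying assumption).
    """
    dists = {sp: count_differences(query_seq, locus_refs[sp])
             for sp in pool if sp in locus_refs}
    if not dists:
        return pool  # no candidate has a reference here; leave the pool unchanged
    best = min(dists.values())
    return {sp for sp, d in dists.items() if d == best}


def tiered_identify(query: dict[str, str], reference_db: dict[str, dict[str, str]],
                    tiers: list[str]) -> set[str]:
    """Apply loci in order, stopping once a single candidate species remains."""
    pool = set().union(*(reference_db[locus] for locus in tiers))
    for locus in tiers:
        if len(pool) <= 1:
            break
        if locus in query:
            pool = narrow_at_locus(query[locus], reference_db[locus], pool)
    return pool


# Hypothetical aligned reference fragments per locus (not real barcode data).
reference_db = {
    "rbcL": {"species_A": "ACGTACGT", "species_B": "ACGTACGT", "species_C": "ACGAACTT"},
    "matK": {"species_A": "TTGACCAA", "species_B": "TTGTCCGA", "species_C": "ATGACCAA"},
}
query = {"rbcL": "ACGTACGT", "matK": "TTGTCCGA"}
print(tiered_identify(query, reference_db, ["rbcL"]))          # two candidates remain
print(tiered_identify(query, reference_db, ["rbcL", "matK"]))  # {'species_B'}
```

In this toy example rbcL alone cannot separate species_A from species_B, so a single-locus approach using only rbcL leaves the query ambiguous, while the second tier resolves it.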