Introduction

Identification of the species from which a biological sample is derived is often necessary in forensic science. Examples include poaching and illegal trade in endangered species, animal cruelty, and the detection of species misrepresentation and fraud in the food industry [1]. Species identification is also crucial for determining whether a crime has occurred when outdoor bloodstains of unknown origin, a piece of tissue from a vehicle suspected in a hit-and-run, or incomplete bones that cannot be identified as human by morphology are discovered. Molecular species identification methods target barcode regions of mitochondrial DNA, which has a high mutation rate and is widely used in evolutionary phylogenetic analysis [2], and the forensic scientist in charge must choose among several such methods. These methods fall into two major categories: PCR with species-specific primers [3, 4] and sequencing of the barcode region amplified with universal primers [5, 6]. The former is user-friendly and can identify samples containing two or more target species, but it detects only the species targeted by the primers in the reaction. The latter can identify a species from among the large number registered in DNA sequence databases, searched with tools such as the Basic Local Alignment Search Tool (BLAST), but sequencing is difficult to incorporate into routine work and is not suitable for complex mixed samples. If the species of origin of an unknown sample can be predicted, it can be confirmed by multiplex PCR with primers specific to the predicted species, but the biological material encountered in forensic science is so diverse that such prediction is often difficult. To identify species quickly and easily in criminal investigations, we simplified sequencing with a technical trick and developed a practical workflow.

Materials and methods

DNA sample

This study used control genomic DNA from animals including cattle, chicken, dog, pig, rabbit, and rat, purchased from BioChain Institute Incorporated. As human control DNA, DNA from one male and one female donor in Standard Reference Material 2372a Human DNA Quantitation Standard, purchased from ATCC (American Type Culture Collection), was used. DNA was also extracted from the blood or tissue of a chimpanzee, a Japanese macaque, a gorilla, an orangutan, a cat, a crocodile, a goose, a frog, a puffer fish, a tuna, a shrimp, and a squid. DNA extraction was carried out on an EZ1 Advanced XL (Qiagen, Hilden, Germany) using the EZ1 DNA Investigator kit (Qiagen), and DNA concentration was measured with a NanoDrop-1000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). Each DNA sample was adjusted to approximately 1 ng/µl.
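Adjusting each extract to about 1 ng/µl is a simple C1V1 = C2V2 calculation from the spectrophotometer reading. The sketch below is illustrative only: the function name, the 50 µl final volume, and the 25 ng/µl reading are hypothetical, and the text does not state which diluent was used.

```python
def dilution_volumes(stock_ng_per_ul: float, final_volume_ul: float,
                     target_ng_per_ul: float = 1.0) -> tuple[float, float]:
    """Return (DNA stock volume, diluent volume) in µl, using C1 * V1 = C2 * V2.

    Hypothetical helper for illustration; the diluent itself is not specified here.
    """
    if stock_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentration and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul  # V1 = C2 * V2 / C1
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical reading: 25 ng/µl stock, 50 µl of working solution at 1 ng/µl
stock_ul, diluent_ul = dilution_volumes(25.0, 50.0)
print(f"{stock_ul:.1f} µl DNA + {diluent_ul:.1f} µl diluent")  # 2.0 µl DNA + 48.0 µl diluent
```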

Real-time PCR

The DNA barcode regions used in forensic and systematic studies include the 12S and 16S ribosomal RNA (rRNA) genes, cytochrome b (cyt b), and cytochrome c oxidase subunit 1 (COI), all of which are located in the mitochondrial genome. For this workflow, we selected four primer sets, one for each locus, that have been reported as vertebrate-universal primers [5, 7–9] (Table 1).

Table 1 Universal primers used in this study

Real-time PCR was carried out in 20 µL reactions containing 10 µL of 2× TB Green Premix Ex Taq™ (RR420) (TaKaRa Bio Inc., Shiga, Japan), 0.4 µL of 50× ROX Reference Dye II, 0.4 µM each of the forward and reverse universal primers, and 2 µL of DNA (one singleplex reaction per mtDNA locus). Reactions were run on a QuantStudio 5 (Thermo Fisher Scientific Inc., Waltham, MA, USA) controlled by QuantStudio™ Design & Analysis software v1.5.1 under the following conditions: 95 °C for 30 s; 40 cycles of 95 °C for 3 s, 57 °C (12S and 16S rRNA) or 50 °C (cyt b and COI) for 20 s, and 72 °C for 20 s; followed by melting curve analysis (95 °C for 15 s, 60 °C for 1 min, and then heating to 95 °C at 0.15 °C/s).
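The melting temperature from a melting curve analysis is conventionally read at the peak of the negative first derivative of fluorescence with respect to temperature (−dF/dT). The sketch below illustrates that idea on a synthetic, noise-free melt curve sampled every 0.15 °C (an arbitrary grid); it is not the QuantStudio software's algorithm, which also smooths the data, and the 82.0 °C midpoint is invented.

```python
import math


def melting_temperature(temps, fluor):
    """Estimate Tm as the temperature at the maximum of -dF/dT.

    Uses central differences on an increasing temperature grid. No smoothing
    is applied, unlike typical instrument software.
    """
    if len(temps) != len(fluor) or len(temps) < 3:
        raise ValueError("need matched temperature/fluorescence series of length >= 3")
    best_t, best_d = None, -math.inf
    for i in range(1, len(temps) - 1):
        # Negative central difference approximates -dF/dT at temps[i]
        d = -(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        if d > best_d:
            best_t, best_d = temps[i], d
    return best_t


# Synthetic melt curve: fluorescence falls sigmoidally around a made-up 82.0 °C
temps = [60 + 0.15 * k for k in range(241)]  # 60.0 to 96.0 °C
fluor = [1 / (1 + math.exp((t - 82.0) / 0.5)) for t in temps]
print(melting_temperature(temps, fluor))  # within one 0.15 °C grid step of 82.0
```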

Real-time PCR of the control DNAs was performed on three separate occasions, and the mean and standard deviation of the melting temperatures were calculated (technical triplicate). Because these triplicate runs confirmed the reproducibility of the real-time PCR, the additional samples and the casework samples were each analyzed only once.
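The triplicate summary described above can be computed with Python's standard `statistics` module. We assume the sample standard deviation (n − 1 denominator), since the text does not say which form was used, and the three Tm values below are hypothetical.

```python
from statistics import mean, stdev


def summarize_tm(replicates):
    """Return (mean, sample standard deviation) of replicate Tm values in °C."""
    if len(replicates) < 2:
        raise ValueError("at least two replicates are needed for a standard deviation")
    return mean(replicates), stdev(replicates)  # stdev uses the n - 1 denominator


# Hypothetical Tm values (°C) from three separate runs of one control DNA at one locus
m, sd = summarize_tm([81.9, 82.1, 82.0])
print(f"Tm = {m:.2f} ± {sd:.2f} °C")  # Tm = 82.00 ± 0.10 °C
```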

Direct sequencing

The real-time PCR products were diluted 20-fold in TE buffer (Thermo Fisher Scientific, Inc.) without purification. Direct sequencing was performed in a total volume of 20 µl containing 2 µl of the diluted real-time PCR product, 4 µl of BigDye Terminator v1.1 Ready Reaction Mix (Thermo Fisher Scientific, Inc.), 2 µl of 5× Sequencing Buffer, and either the forward or the reverse primer (3.2 µM). The cycle-sequencing reaction was carried out on a ProFlex PCR System (Thermo Fisher Scientific, Inc.) under the following conditions: 96 °C for 1 min, followed by 25 cycles of 96 °C for 10 s, 50 °C for 5 s, and 60 °C for 2 min. The reaction products were purified with a BigDye XTerminator Purification Kit (Thermo Fisher Scientific, Inc.) according to the manufacturer's instructions and run on a 3500xL Genetic Analyzer (Thermo Fisher Scientific, Inc.) with 36-cm capillary arrays and POP-4 Polymer (both Thermo Fisher Scientific, Inc.). Because electrophoresis occasionally yields low-resolution data, each sample was run twice.
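The dilution scheme above fixes how much of the original real-time PCR product enters each sequencing reaction. The short calculation below, using exact fractions, reproduces the 1-in-200 proportion used in this study and estimates an upper bound on the universal primer carried over from the 0.4 µM real-time PCR; it ignores primer consumed during amplification, and the function name is our own.

```python
from fractions import Fraction


def carried_fraction(dilution_factor: int, template_ul: int, reaction_ul: int) -> Fraction:
    """Fraction of the undiluted real-time PCR product present in the sequencing reaction."""
    return Fraction(1, dilution_factor) * Fraction(template_ul, reaction_ul)


frac = carried_fraction(20, 2, 20)  # 20-fold dilution, 2 µl added to a 20 µl reaction
print(frac)                         # 1/200

# Upper bound on carried-over universal primer: 0.4 µM at PCR setup, converted to nM.
# Primer consumed during amplification is ignored, so the true value is lower.
primer_nm = Fraction("0.4") * 1000 * frac
print(primer_nm, "nM")              # 2 nM
```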

Data analysis

Sequence data that passed the quality checks in 3500 Series Data Collection Software 4 (Thermo Fisher Scientific Inc.) were used for sequence analysis. The sequence of each target region was assembled by checking and merging the data obtained from both strands in MEGA X [10], and the primer sequences were removed. Except for COI, the assembled sequences were searched against public databases such as GenBank using BLAST [11]. COI sequences were compared with the registered, publicly available COI sequences in the Barcode of Life Data (BOLD) system (http://www.boldsystems.org/) [12]. The genus of the database sequences showing 99–100% identity to the query sequence was taken as the identification result.
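The identification rule above (the genus of hits at 99–100% identity) can be expressed as a small filter over search results. This is a sketch under assumptions: hits are taken to be (scientific name, percent identity) pairs already exported from a BLAST or BOLD search, the genus is taken as the first word of the name, and returning several genera when qualifying hits disagree is our choice, since the text does not say how such conflicts were resolved. The example hits are invented.

```python
def genus_call(hits, min_identity=99.0):
    """Return the sorted genera of hits whose percent identity is >= min_identity.

    hits: iterable of (scientific_name, percent_identity) pairs (assumed format).
    One genus means an unambiguous call; several mean the hits conflict; an
    empty list means no hit reached the threshold.
    """
    genera = {name.split()[0] for name, identity in hits if identity >= min_identity}
    return sorted(genera)


# Invented search results for illustration
hits = [("Felis catus", 100.0), ("Felis silvestris", 99.4), ("Prionailurus bengalensis", 93.1)]
print(genus_call(hits))  # ['Felis']
```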

Case study

Casework Sample 1 (CS1): Bloodstains (drip pattern) found over a large area extending from a parking lot to the roadway.

Casework Sample 2 (CS2): Bloodstains (swipe pattern) left on the floor at the front door of a house.

Casework Sample 3 (CS3): Tissue fragments found on the vehicle suspected of hitting a person.

Casework Sample 4 (CS4): Bone fragments found on a mountain trail.

Although the casework samples described here proved to be of nonhuman origin and unrelated to the cases, species identification was performed for confirmation. DNA was extracted from each casework sample by a method appropriate to its sample type, and human DNA was quantified with the Quantifiler HP DNA Quantification Kit (Thermo Fisher Scientific, Inc.) according to the manufacturer's instructions, but none was detected. Because the extracts were expected to contain bacterial DNA of environmental origin, which spectrophotometry cannot distinguish from the DNA of interest, spectrophotometric measurement was not performed, and 2 µl of DNA solution of unknown concentration was used for real-time PCR. Products that reached a plateau were then diluted 20-fold in TE buffer, and direct sequencing and sequence analysis were carried out as described above.

Results and discussion

We developed a practical species identification workflow for unknown biological samples (Fig. 1). If the DNA extracted from an unknown sample is most likely of human origin, a commercially available kit is first used to quantify human DNA. If human DNA is not detected, or if the sample is assumed to be of animal origin, real-time PCR is carried out with vertebrate universal primers. It is then confirmed that amplification has reached a plateau (Check 1 in Fig. 1) and that the melting temperature of the product differs from that of human DNA (Check 2 in Fig. 1). The real-time PCR product is diluted and sequenced directly (real-time PCR–direct sequencing), and the resulting sequence is used to identify the animal species by homology search against public DNA databases (DDB).
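The branching of the workflow can be summarized as a small decision function. This is only a sketch: the text specifies the two checks and the “Investigative Necessity” branch, but the outcomes assigned here when human DNA is detected or when Check 1 or Check 2 fails are our assumptions, and all names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    HUMAN_DNA = "human DNA detected: proceed with human DNA analysis"
    INCONCLUSIVE = "no plateau: DNA absent, degraded, or extraction failed"
    HUMAN_LIKE_TM = "Tm not distinguishable from human: further testing needed"
    NONHUMAN_STOP = "nonhuman origin: report and stop"
    NONHUMAN_SEQUENCE = "nonhuman origin: proceed to real-time PCR-direct sequencing"


def triage(human_dna_detected: bool, reached_plateau: bool,
           tm_differs_from_human: bool, species_needed: bool) -> Outcome:
    """Hypothetical encoding of the workflow in Fig. 1.

    human_dna_detected is False both when the kit found no human DNA and when
    the kit was skipped because the sample was presumed to be of animal origin.
    """
    if human_dna_detected:
        return Outcome.HUMAN_DNA
    if not reached_plateau:        # Check 1 failed (assumed outcome)
        return Outcome.INCONCLUSIVE
    if not tm_differs_from_human:  # Check 2 failed (assumed outcome)
        return Outcome.HUMAN_LIKE_TM
    # Both checks passed: "Investigative Necessity" decides whether to sequence
    return Outcome.NONHUMAN_SEQUENCE if species_needed else Outcome.NONHUMAN_STOP


print(triage(False, True, True, species_needed=False).value)  # nonhuman origin: report and stop
```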

Fig. 1

Practical species identification workflow. Check 1: Amplification reaching a plateau phase. Check 2: Melting temperatures different from those of humans

In forensic practice it is often important to determine whether an unknown sample is of human origin. Examples include bones discovered in the mountains and blood or tissue recovered from a knife suspected of having been used in a murder or from a car suspected of having run over a person. Once such samples are shown to be of nonhuman origin, they are excluded from evidence. However, failure to detect human DNA with a commercially available qPCR-based human DNA quantification kit does not establish that a suspected human sample is of nonhuman origin. Forensic samples are exposed to a wide variety of conditions, so DNA extraction failure or severe degradation cannot be ruled out. Thus, when the quantification kit detects no human DNA, it remains unclear whether the sample is of nonhuman origin or simply contains no detectable DNA. In this study, amplification reaching a plateau was obtained at two or more loci for every animal control DNA, and the melting temperatures differed from those of human DNA (Tm in Table 2). Even higher primates closely related to humans, such as the chimpanzee and the orangutan, had melting temperatures different from those of humans, particularly at cyt b (Tm in Table 3). Therefore, for a suspected human sample, if real-time PCR with vertebrate universal primers shows amplification and melting temperatures that differ from those of humans, the sample has been confirmed as not being of human origin and testing can end at this step (No arrow under “Investigative Necessity” in Fig. 1).
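Deciding whether a sample Tm differs from that of humans requires a tolerance, which the text does not state. The sketch below therefore assumes a hypothetical rule: a sample differs at a locus if its Tm lies outside the human mean ± max(3 SD, 0.5 °C). The reference values are placeholders, not the values in Table 2, and any real threshold would need validation with the same reagents.

```python
def differs_from_human(sample_tm: float, human_mean: float, human_sd: float,
                       k: float = 3.0, min_margin: float = 0.5) -> bool:
    """True if sample_tm lies outside human_mean ± max(k * human_sd, min_margin) °C.

    k and min_margin are hypothetical thresholds, not validated values.
    """
    margin = max(k * human_sd, min_margin)
    return abs(sample_tm - human_mean) > margin


# Placeholder human reference (mean, SD) per locus; NOT the values in Table 2
HUMAN_REF = {"cyt b": (80.0, 0.1), "COI": (79.0, 0.1)}
sample = {"cyt b": 78.6, "COI": 79.1}  # hypothetical sample Tm values (°C)

differing = [locus for locus, tm in sample.items() if differs_from_human(tm, *HUMAN_REF[locus])]
print(differing)  # ['cyt b']
```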

On the other hand, if identifying the animal species would provide information relevant to the case, testing proceeds to direct sequencing (Yes arrow under “Investigative Necessity” in Fig. 1). Examples include meat counterfeiting, violations of animal protection laws, and hairs found at a crime scene that may derive from animals associated with (kept by) the suspect. If melting temperatures have previously been measured at each target locus for the species suspected from the case details and investigative information, they can serve as a preliminary check before sequencing. Note that melting temperatures depend on the reagents used (reaction composition and salt concentration); comparisons are therefore valid only with the same reagents and reaction composition (i.e., TB Green Premix Ex Taq must be used when referring to the Tm values reported in this paper).
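As a preliminary aid, a sample's Tm profile could be compared with Tm values previously measured for candidate species at the same loci. The sketch below ranks candidates by root-mean-square Tm difference over shared loci; this scoring rule, the function name, and the reference values are all hypothetical, and a close Tm match only prioritizes candidates for sequencing rather than identifying a species.

```python
import math


def rank_candidates(sample_tm, reference_tm):
    """Rank candidate species by root-mean-square Tm difference over shared loci.

    sample_tm:    {locus: Tm in °C} for the unknown sample
    reference_tm: {species: {locus: Tm in °C}}, measured earlier with the same
                  reagents and reaction composition (assumed input format)
    Returns [(species, rms difference)] sorted from closest to farthest.
    """
    scores = []
    for species, ref in reference_tm.items():
        shared = [locus for locus in sample_tm if locus in ref]
        if not shared:
            continue  # no common locus, so no basis for comparison
        rms = math.sqrt(sum((sample_tm[locus] - ref[locus]) ** 2 for locus in shared) / len(shared))
        scores.append((rms, species))
    return [(species, round(rms, 2)) for rms, species in sorted(scores)]


# Hypothetical reference values, for illustration only
reference = {"dog": {"12S": 80.2, "cyt b": 81.5}, "cat": {"12S": 79.4, "cyt b": 82.3}}
sample = {"12S": 80.1, "cyt b": 81.6}
print(rank_candidates(sample, reference))  # [('dog', 0.1), ('cat', 0.7)]
```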

A characteristic of real-time PCR is the plateau phase, in which the amount of amplified product no longer increases appreciably, possibly because of primer depletion [13]. A real-time PCR product whose amplification curve has reached a plateau therefore contains an amount of amplicon that does not greatly exceed a certain level, and it can be added to the sequencing reaction in an appropriate amount without measuring its concentration. In preliminary experiments, we determined the amount of real-time PCR product to use in the sequencing reaction (1 part in 200 of the sequencing reaction solution in this study). This amount is likely critical: it must supply enough template DNA for the sequencing reaction while keeping carried-over unreacted primers and dNTPs low enough not to interfere with it. As expected, good sequencing results were obtained except when the amplicon was not a single sequence. One possible cause of mixed sequences is nonspecific amplification of nuclear mitochondrial DNA segments (NUMTs), which are mtDNA sequences that have been transferred into the nuclear genome and may co-amplify with true mtDNA [14]. Heteroplasmy, in which multiple haplotypes are mixed, also makes sequencing difficult, especially length heteroplasmy. It is therefore important to sequence barcode regions at multiple loci. Naturally, samples that contain two or more species are difficult to sequence by this approach. Although outside the scope of this study, which aims to simplify species identification, techniques such as metabarcoding with next-generation sequencing are recommended for complex samples containing multiple unknown species (e.g., traditional Chinese medicine products) [1].
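Check 1 could be automated with a simple heuristic on the amplification curve. In the sketch below, which is our assumption rather than the method used in this study, a curve counts as plateaued if it shows a minimum overall signal gain and the last few cycles add less than 2% of that gain; the window, thresholds, and synthetic logistic curves are illustrative and would need validation against real ΔRn data.

```python
import math


def reached_plateau(fluorescence, window=5, max_rel_increase=0.02, min_total_gain=0.5):
    """Heuristic Check 1 on per-cycle fluorescence (e.g., ΔRn) values.

    Returns True if the curve rose by at least min_total_gain overall and the
    last `window` cycles added less than max_rel_increase of that rise.
    All thresholds are illustrative, not validated.
    """
    if len(fluorescence) <= window:
        raise ValueError("need more cycles than the window length")
    total_gain = max(fluorescence) - min(fluorescence)
    if total_gain < min_total_gain:  # no meaningful amplification
        return False
    late_gain = fluorescence[-1] - fluorescence[-1 - window]
    return late_gain < max_rel_increase * total_gain


def logistic_curve(midpoint, cycles=40, height=10.0, slope=1.5):
    """Synthetic amplification curve, used only to exercise the heuristic."""
    return [height / (1 + math.exp(-(c - midpoint) / slope)) for c in range(1, cycles + 1)]


print(reached_plateau(logistic_curve(midpoint=20)))  # True: plateau well before cycle 40
print(reached_plateau(logistic_curve(midpoint=38)))  # False: still rising at cycle 40
print(reached_plateau([0.01] * 40))                  # False: no amplification
```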

Sequences from two or more loci were obtained from the vertebrate DNA samples used in this study, including those of mammals, amphibians, reptiles, birds, and fish, and the database search results were consistent with the origin of each sample, allowing identification to the genus level (Tables 2 and 3). Furthermore, although a sequence was obtained for only one locus (COI), the genus could be inferred even for the squid (a mollusk) and the shrimp (an arthropod) (Table 3). However, the COI amplicons of some animals could not be sequenced with the forward primer, most likely because of polymerase slippage at a poly-C stretch adjacent to the primer binding site. Nevertheless, species identification with the BOLD system based only on the sequence obtained with the reverse primer was consistent with the origin of these samples (superscript b in Tables 2 and 3).
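When only the reverse-primer read is available, as for some COI amplicons above, the read represents the opposite strand and is typically reverse-complemented to the forward orientation before primer sequences are trimmed. The sketch below illustrates both operations; in this study strand merging and primer removal were done in MEGA X, the primer sequences here are made up, and exact string matching will miss degenerate (IUPAC) positions that real universal primers may contain.

```python
# IUPAC nucleotide codes and their complements (ambiguity codes included)
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, e.g. to orient a reverse-primer read."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def trim_primers(seq: str, fwd_primer: str, rev_primer: str) -> str:
    """Remove the forward primer at the 5' end and the reverse primer's reverse
    complement at the 3' end, if either is present as an exact match."""
    seq = seq.upper()
    fwd = fwd_primer.upper()
    rev_rc = reverse_complement(rev_primer)
    if fwd and seq.startswith(fwd):
        seq = seq[len(fwd):]
    if rev_rc and seq.endswith(rev_rc):
        seq = seq[:-len(rev_rc)]
    return seq


# Made-up primers and insert, for illustration only
fwd, rev = "ACGGTA", "TTGCCA"
amplicon = fwd + "GATTACA" + reverse_complement(rev)  # forward-strand amplicon
reverse_read = reverse_complement(amplicon)           # same region read from the opposite strand
oriented = reverse_complement(reverse_read)           # back to forward orientation
print(oriented == amplicon)                           # True
print(trim_primers(oriented, fwd, rev))               # GATTACA
```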

Table 2 Melting temperature (Tm) and sequence search results for real-time PCR products (Control DNA)
Table 3 Melting temperature (Tm) and sequence search results for real-time PCR products (Additional samples including casework)

We tested four casework samples using this workflow. All of them showed melting temperatures different from those of humans (Tm in Table 3), which allowed us to complete testing quickly and inform the investigating agency that they were not of human origin. Determining the presence or absence of criminality at an early stage is valuable because it reduces labor and cost. Although no criminality was involved, we performed real-time PCR–direct sequencing out of interest, and homology searches allowed us to estimate the animal species at the genus level (Table 3). The amount of DNA recoverable from these forensic samples would vary considerably with sample type (bloodstain, tissue, bone), age, and environmental exposure. Despite this, real-time PCR–direct sequencing allowed us to identify the animal species easily without measuring DNA concentrations.

Although vertebrate universal primers were used in this study, plant universal primers and class-specific universal primers, for example for reptiles, birds, and fish, have also been reported [1]. Such primers may make it possible to identify plant species and to identify animals to the species level. Note, however, that in species-level identification a hybrid animal will be assigned to its maternal species, because mitochondrial DNA is maternally inherited [15]. For this reason, we recommend that reports to the investigating agency state that the test analyzed mitochondrial DNA.

Sequencing-based species identification methods in forensic science comprise DNA extraction, DNA quantification, PCR amplification, confirmation of the amplified products, purification of the PCR products, and direct sequencing [5, 6, 16]. In the real-time PCR–direct sequencing incorporated in this workflow, as long as real-time PCR confirms that amplification has reached a plateau, neither purification of the amplified product before the sequencing reaction nor confirmation of the product by agarose gel electrophoresis is required, so sequencing results can be obtained quickly and easily. In supplementary experiments, real-time PCR products obtained with another intercalator-based reagent (PowerUp SYBR Green Master Mix, Thermo Fisher Scientific, Inc.) and with a TaqMan reagent (TaqPath qPCR Master Mix, CG, Thermo Fisher Scientific, Inc.) were also successfully sequenced directly (data not shown). This technical trick might therefore be applicable to Sanger-based direct sequencing in a variety of life science fields, and species identification workflows that include it could be integrated into routine forensic practice.

Conclusion

We developed a species identification workflow that incorporates real-time PCR–direct sequencing as a technical trick so that species identification can be performed routinely in the forensic community. Four casework samples of different types were tested with this workflow, which promptly indicated their nonhuman origin and then readily identified the animal species of origin to the genus level. This workflow is therefore effective for routine species identification in forensic practice.