Introduction

The person on the street when asked about DNA testing will generally first envision a crime laboratory rather than a hospital laboratory. Still today, most criminal perpetrators are caught and prosecuted based upon eye witnesses or confessions, although increasingly scientific evidence comes into play. Forensic DNA typing has become the queen of the forensic sciences and is looked to as the most scientifically grounded of the forensic sciences [1]. The bulk of forensic evidence links an evidential item to the crime scene, but does not identify the perpetrator. Other than videocapture, only fingerprint friction ridge analysis and DNA typing identify the perpetrator per se. Fingerprints came into widespread use for forensic purposes in the late nineteenth century and were admitted into US courts as evidence in the 1930s. The Federal Bureau of Investigation (FBI) owes its origins to the need for a centralized database of fingerprints. Today, millions of fingerprints are filed for criminal and civil purposes. Likewise, millions of convicted offenders’ DNA profiles, and in many states, arrestees’, are also databased. Thus, DNA tests not only confirm a detective’s hunch but, due to the DNA databases, also have become a powerful investigatory tool to identify otherwise unsuspected perpetrators [2].

Serology tests (blood group and type testing and then serum protein isoenzyme electrophoresis) were the forerunners to forensic DNA identity testing of biological materials. Unfortunately, these tests required specimens with a significant amount of high-quality blood serum and they did not have a very strong discriminatory power. Advantages of DNA typing over serologic methods include greater discriminatory power, species specificity, tissue independence, greater sensitivity, and less susceptibility to degradation [3].

The Bureau of Justice Statistics (BJS) first surveyed US crime laboratories in 1998, focusing exclusively on agencies that performed DNA analysis, and found that there were 120 public forensic DNA laboratories, which had a median staff of 5 and faced substantial backlogs [4]. According to the 2005 BJS Census of Publicly Funded Forensic Crime Laboratories, about half the public crime laboratories were performing DNA testing [5]. Although there is no corresponding contemporary study, it seems clear that the size and number of forensic DNA laboratories have substantially increased, with most of the 450–500 public crime laboratories now performing DNA testing. The National Institute of Justice (NIJ), beginning with a 2003 Advancing Justice Through DNA Technology initiative, has spent hundreds of millions of dollars on forensic DNA capacity building [6].

Forensic Testing and Sample Considerations

Forensic v. Clinical Specimens and Tests

Forensic tests differ from clinical tests in several respects. First, whereas clinical samples can be standardized, forensic samples vary substantially. An analyst may routinely encounter cigarette butts as evidence, but then must be prepared to face a completely new challenge for the first time, such as a partially eaten piece of food. Second, clinical samples are relatively substantial, whereas forensic laboratories routinely receive trace evidence, permitting testing only once (although routine practice is to attempt to save a portion, if possible, for potential testing by the defense). Third, unlike clinical specimens, evidentiary materials are usually neither fresh nor pristine. For example, semen samples are generally admixed with vaginal cells and microbial flora in a rape swab (Fig. 54.1), spit on a sidewalk has been exposed to the sun and rain, and blood on the floor may have been there for months. Furthermore, forensic testing is performed with an eye to court challenges. Thus, the forensic scientist uses only well-validated protocols, documents all aspects of laboratory processing, and must be ready to defend the science, the procedures, and the testing against legal attack. Chain-of-custody must always be maintained in forensic laboratories. Moreover, clinical laboratory staff are generally not familiar with the regulations, standards, and quality assurance practices of forensic laboratories. Thus, clinical laboratories, though technically capable, are normally not prepared to conduct forensic identity testing. Nevertheless, since clinical laboratories use identity testing for other applications, an understanding of forensic identity testing is useful as an introduction to the methodology and for the historical background.

Figure 54.1

The most common DNA evidence in US crime laboratories is a vaginal swab from a rape kit. This photomicrograph is a stained vaginal smear from a rape kit. The arrows indicate spermatozoa. In addition to the DNA from the male contributor, there is DNA from the female epithelial squamous and white blood cells, as well as that of the microbial flora

Sexual Assaults (Swabs)

In the USA, rape kits have dominated the evidential submissions to forensic DNA laboratories (Fig. 54.1). Often, the demand for DNA testing on rape kits outstrips crime laboratories’ testing capacity, and large backlogs may exist despite substantial NIJ grant programs to reduce them [7]. Typical rape kits include vaginal, anal, and oral evidentiary swabs, buccal reference swabs, pubic combings, and exemplars of pubic and scalp hairs. If a condom was used by a rapist and later found, it may yield semen from the male perpetrator on the inside and vaginal epithelial cells from the female victim on the outside. Vaginal swab specimens are inherently mixed samples. Most commonly, the DNA of the spermatozoa is partially purified by a differential extraction procedure in which the female fraction is released using a gentle lysis medium, after which male DNA is released from the sperm using a solution containing a strong reducing agent (dithiothreitol) to break the disulfide bonds in the capsules of the spermatozoan heads [8]. Laser capture of spermatozoa from microscopic slides has also been successfully used, while immunologic affinity methods have thus far been disappointing. Y-chromosome DNA markers (described below) are an alternative method of capturing male identity information.

Other Violent Crimes (Blood, Other Evidence)

Blood and similar specimens from homicides and other violent crimes are the next-most-common evidentiary materials submitted to forensic laboratories [9]. In an early study of biological evidence at crime scenes, blood was found to be present in 60 % of murders, assaults, and batteries [10]. The DNA can come from myriad items and materials. Saliva may be deposited on beverage containers, envelope seals, gum, cigarettes, or food (Fig. 54.2). Investigators have followed suspects to obtain “abandoned” specimens, such as facial tissues, cigarette butts, gum, or drinking glasses. Cords used as a murder weapon for strangulation can yield both victim and perpetrator DNA (Fig. 54.3). Shed hairs, which contain little or no nuclear DNA (nDNA), still harbor mitochondrial DNA (mtDNA), which also can be used for identification purposes. Fingernail swabbings or scrapings occasionally yield foreign DNA if a victim struggled and scratched the perpetrator, and similarly bite marks can be swabbed for DNA. Reference samples may come from toothbrushes, razors, combs, clothing, and medical specimens. One of the authors (DF) has shown DNA to be useful to identify the bombmaker of deflagrated improvised explosive devices [11].

Figure 54.2

DNA testing identified a masked bandit when his peach strudel that was left at the scene of an armed robbery was used for DNA testing

Figure 54.3

This vacuum cleaner cord was used as a ligature for a strangulation murder. Swabbings of the cord along its length revealed the victim’s DNA in the center and a mixture of victim and accused DNA on outer areas of the cord

Property Crimes (Touch DNA)

Increasingly, jurisdictions are performing DNA testing in property crimes, including theft, burglary, robbery, and arson, among others. The vast majority of US crimes are property crimes: 9.3 million property crimes compared to 1.3 million violent crimes in 2009 [12]. The case closure (clearance) rate for property crimes is < 20 % [13] and DNA testing for such crimes has been found to be cost-effective [14]. Furthermore, it is generally thought that some individuals progress from nonviolent to violent crimes; often from petty theft to burglary to rape, and thus interdiction of a criminal career progression may break the cycle and prevent major crimes [15, 16]. In general, DNA testing for property crimes involves “touch DNA” from handled objects. The possibility of testing such trace or “low copy number” (LCN) DNA was introduced in 1997 when Dr. van Oorschot reported that minute quantities of DNA can be recovered from fingerprints [17]. Conventional laboratory testing will successfully type DNA from approximately 100 cells (0.5–1 ng at 6.5 pg/diploid cell), although many forensic laboratories may be successful down to as few as 15–20 cells (approximately 100 pg). LCN DNA is generally defined as <100 pg, but 35 pg is often considered an analytical threshold. LCN DNA testing for property crimes was pioneered by Drs. Peter Gill and Dave Werrett at the Forensic Science Service (now disbanded) in the UK [18] and later in the USA by Dr. Mechthild Prinz and Theresa Caragine in the New York City Office of the Chief Medical Examiner’s Department of Forensic Biology [19]. Such testing involves minimizing reaction volumes and increasing polymerase chain reaction (PCR) cycling (see below). However, only a portion of the specimens yield a useful profile (perhaps 10–20 %). 
LCN DNA testing is problematic due to detection of contamination from prior handling, in-laboratory contamination, and inconsistent results that stem from random sampling of one or both alleles when both exist at very small levels (so-called stochastic sampling effects). These difficulties are compounded by the destructive nature of DNA testing, which may negate the possibility of retesting. For these reasons, some have suggested that LCN analysis should only be used for investigative purposes, and not as probative evidence in court. No national standards are yet in place and the FBI has generally recommended against such testing [20]. Nonetheless, LCN testing is increasingly used. The object is swabbed and resultant DNA extracts may be amplified two or three times, wherein analysts hope to obtain pure (single) profile results that are assumed to be from the last person who pulled the trigger of a gun or handled a knife. Forensic laboratories performing this testing will generally simply disregard any results other than clear single profiles.
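
The cell counts and DNA masses quoted above follow from simple arithmetic, which the following minimal sketch makes explicit. It assumes only the figures given in the text (6.5 pg of DNA per diploid cell and a working definition of LCN as less than 100 pg); the function and constant names are hypothetical and chosen for illustration.

```python
# Figures taken from the text; names are illustrative only.
PG_PER_DIPLOID_CELL = 6.5   # picograms of DNA per diploid human cell
LCN_THRESHOLD_PG = 100.0    # LCN is "generally defined as <100 pg"


def dna_mass_pg(cell_count):
    """Approximate DNA mass (pg) recoverable from a number of diploid cells."""
    return cell_count * PG_PER_DIPLOID_CELL


def is_low_copy_number(mass_pg):
    """True when the DNA mass falls under the LCN definition used in the text."""
    return mass_pg < LCN_THRESHOLD_PG


if __name__ == "__main__":
    for cells in (100, 20, 15):
        mass = dna_mass_pg(cells)
        label = "LCN" if is_low_copy_number(mass) else "conventional range"
        print(f"{cells} cells -> {mass:.1f} pg ({label})")
```

By this arithmetic, 100 cells correspond to 650 pg (within the 0.5–1 ng range cited), whereas 15 cells fall just under the 100 pg LCN boundary.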

Other Forensic DNA Testing Applications

Forensic DNA identity testing also can be used in other forensic and non-forensic contexts. For example, urine samples from drug testing may be analyzed to confirm that the sample is truly from the person who allegedly generated it [21]. DNA testing is used for disaster victim identification [22]. In cases involving nonhuman DNA (discussed below), individual, group (clade), or species may be determined, linking items such as a plant leaf or animal hair to a criminal case, or proving illegal poaching activity [23]. “Microbial forensics” has been developed for source attribution of terrorist pathogens, such as the anthrax letter attacks [24, 25]. Genetic analyses can identify the type of body fluid or tissue (e.g., urine, semen, saliva) based on the RNAs expressed. DNA testing can be used for investigatory purposes by supplying information about the perpetrator using phenotypic markers (described below), as well as through partial (“low stringency”) matches that may detect relatives who represent investigatory leads (described below). In non-criminalistic applications, the same tests used in forensic identity testing can be used for determinations of parentage and sample switch disputes [26].

Genetic Systems and Methods for DNA Typing

Genetic Variation

In everyday life, we easily recognize individuals through obvious biological variation among individuals. Positive identification or individualization is a statement of uniqueness, which is theoretically impossible to prove. However, forensic identity testing harnesses the extraordinary statistical discriminatory power of genetic variation to support a policy-based, administrative, or judicial determination of identity [27]. Indeed, forensic DNA testing often is thought of as tantamount to positive identification. Genetic variation occurs in a continuum of biological classification, from kingdom to genus, clades, and individuals. Specifically, forensic DNA identity testing is based on the detection and comparison of polymorphisms (poly—many; morphs—types) in the DNA among individuals. Statistically, there is variation at approximately one in every thousand base pairs (bp) between every two unrelated humans. However, this variation is not random; many protein-coding regions are highly conserved, as mutations in genes succumb to natural selection. Most polymorphisms occur in the noncoding DNA, which predominates in the human genome (>98 %) and is more tolerant of mutation than the protein-coding DNA regions. Differences between individuals can be due to single nucleotide polymorphisms (SNPs) or variations in length of a specific region or locus in the genome; that is, length polymorphisms. Such polymorphisms result in different forms, or alleles, of genetic markers. All individuals have two copies of each autosomal chromosome: one inherited maternally and the other paternally. Routine forensic DNA testing, using short tandem repeat (STR) typing, (described below) involves length polymorphisms in repetitive DNA from noncoding regions of the chromosomes, although it employs only a small fraction of the differences in the human genome among individuals.

Restriction Fragment Length Polymorphisms

Historical Context

In the mid-1980s, most DNA-based forensic analysis involved restriction fragment length polymorphism (RFLP) testing, first described by Dr. Edwin Southern in 1975 [28]. Such testing merely gave a binary result, yielding too little information for too much work. Drs. Wyman and White detailed a polymorphic RFLP marker in 1980, in which variation between human individuals was observed [29]. However, the forensic DNA typing revolution began with the 1985 publication of a landmark article by Dr. Alec Jeffreys of Leicester, England, in which he coined the term “DNA fingerprint” and suggested the potential application of DNA fingerprinting in forensic investigations [30, 31]. His technique used “minisatellites” in a multilocus probe RFLP system that yielded a bar code pattern that seemed to be different for every person (Fig. 54.4). Jeffreys conducted the first DNA identity tests in 1986 in a disputed immigration case and a double rape-homicide, which resulted in the 1987 exoneration of Richard Buckland and then the 1988 conviction of Colin Pitchfork [32, 33]. In the USA, single-locus probe RFLP analysis, which was more robust and permitted statistical evaluation, was pioneered by Dr. Arthur Eisenberg (then at Lifecodes Corporation) (Fig. 54.5). In 1986–1987, commercial laboratories, particularly involving Dr. Edward Blake of the Serologic Research Institute, Drs. Michael Baird and Arthur Eisenberg of Lifecodes Corporation, and Dr. Robin Cotton of Cellmark Diagnostics, undertook forensic DNA testing in the USA, and in 1987 Tommy Lee Andrews became the first American to be convicted of a crime (rape) using DNA data [34]. The FBI, led by Dr. Bruce Budowle, began performing DNA typing casework in December 1988. A few months later, in March 1989, Virginia established the first state crime laboratory with an operational DNA unit, directed by Dr. Paul Ferrara. RFLP testing was the mainstay of most criminalistic DNA typing for a decade.
At the same time, Dr. Henry Erlich and coworkers of Cetus Corporation developed a faster PCR-based (see below) HLA DQ-alpha dot-blot system (and later the Polymarker system, Fig. 54.6), but it did not have sufficient discriminatory power for widespread adoption by the forensic community. Nevertheless, the first use of DNA tests in litigation in the USA was in 1986, in the case of Commonwealth v. Pestinikas, using HLA DQ-alpha to show that organs had not been switched in an autopsy [35]. In the early 1990s, PCR-based STR systems (described below) were developed and eventually became the standard forensic DNA test worldwide. STR methods replaced RFLP systems due to robustness, sensitivity, statistically discrete systems, ease of automation, and economy. Other systems, such as Y-chromosome markers, mtDNA sequencing, and phenotypic markers, also are sometimes used (described below) (see Table 54.1).

Figure 54.4

In 1985, Alec Jeffreys first described a DNA fingerprint. He used a multilocus minisatellite probe that resulted in a band pattern similar to a bar code, such as the one shown on the can to the right. The various lanes of the autoradiograph are from different individuals, demonstrating that each shows a unique pattern of bands. This multilocus probe method of DNA typing is no longer used in forensic identification

Figure 54.5

RFLP autoradiograph with five analytical lanes and three control lanes. The DNA profile of the reference sample from a female rape victim matches the DNA profile of blood found at the scene and that of the female fraction of a vaginal swab. The DNA profile of the suspect reference specimen matches the male fraction of a vaginal swab but does not match the DNA profile of the female victim

Figure 54.6

Polymarker strips from different individuals using five genetic systems detected by PCR amplification and reverse dot-blot hybridization probes. GC group-specific component, GYPA glycophorin A, HBGG hemoglobin gamma-globin chain, LDLR low-density lipoprotein receptor,

Table 54.1 Summary of DNA typing system usage in crime laboratories

Early Cases Using DNA Testing

Queen v Pitchfork

The first criminal investigation using DNA typing was in a double rape-homicide (of Linda Mann in 1983 and of Dawn Ashworth in 1986) on a deserted footpath in the English countryside, known as the “Black Pad Murders.” Richard Buckland, a person of low intelligence and sexual fetishes, became the focus of early suspicion and was charged but then exonerated by the new Jeffreys DNA tests. Males in the community between 13 and 30 years of age were asked to volunteer blood samples for DNA testing. There were no matches despite 4,500 “bloodings.” However, police discovered that a man named Ian Kelly had substituted his blood for Colin Pitchfork’s sample. Pitchfork was subsequently DNA matched and then convicted of both homicides.

Pennsylvania v Pestinikas

The first use of DNA typing in the USA was in a 1986 nursing home negligent homicide case. Forensic Science Associates performed DNA tests to prove that organs in the autopsy had not been switched as was alleged by one expert. The DNA in this case had become highly degraded, averaging fragments of approximately 100 bp.

Florida v Andrews

The first US criminal conviction based on DNA typing was of a serial rapist, Tommy Lee Andrews (1987). A series of home break-ins and rapes of women began in 1986 in Orlando, Florida. A stakeout resulted in an arrest, and Lifecodes Corporation matched the suspect’s DNA to vaginal swabs of two of the rape victims.

PCR Amplification as Sample Preparation

Today, all major methods for routine forensic DNA testing begin with amplification of the DNA target by PCR. Dr. Kary Mullis shared the 1993 Nobel Prize in Chemistry for his development of PCR in 1983. Forensically valuable human leukocyte antigen (HLA) polymorphisms were among the earliest targets to be amplified by PCR in the laboratory [36]. PCR amplification is relatively easy to perform, inexpensive, quick, and amenable to automation. It also permits chemical labeling of the amplified fragments, as well as simultaneous amplification of several loci in a single reaction (multiplex). PCR amplification allows the routine testing of nanogram quantities of DNA, and can be optimized for testing of even picogram quantities, enabling the use of new classes of evidentiary specimens. However, such sensitivity requires extreme care to prevent contamination, including laboratory facilities with separate pre- and post-amplification areas, unidirectional handling of evidence from intake through final analysis, limited laboratory access by untrained personnel, and knowledge of each analyst’s DNA profile to identify any contamination. Lastly, PCR can be successful on evidentiary material in which the DNA has become degraded and only a few fragments with the intact target sequence remain. Although amplification methods other than PCR exist, the conservative forensic community is unlikely to adopt an alternative to PCR quickly unless there is a very good reason.

Short Tandem Repeats

STRs are repeat length polymorphisms that have become the mainstay of current forensic identity profiling around the world. Core repeat units in STR systems are tetranucleotide or pentanucleotide elements (i.e., have four or five nucleotides in each core repeat, respectively), with resulting amplicon sizes of approximately 100–450 bp (see Fig. 54.7). STR analysis is robust, amenable to automation, highly sensitive, relatively tolerant of degraded DNA, and yields discrete alleles. Multiplexed amplification of multiple STR loci achieves extraordinary discriminatory power, with random match probabilities typically below 10^-12 (see Fig. 54.8). As a result, PCR-based STR testing has become dominant in forensic DNA laboratories (Table 54.1).
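
The discriminatory power of a multiplex comes from multiplying genotype frequencies across genetically independent loci. The following is a minimal sketch of that calculation, assuming Hardy-Weinberg proportions (p² for a homozygote, 2pq for a heterozygote) and independence between loci. The allele frequencies are hypothetical illustration values, not population data, and the function names are not taken from any forensic software.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Expected genotype frequency under Hardy-Weinberg proportions.

    Pass one allele frequency for a homozygote (p*p) or two for a
    heterozygote (2*p*q).
    """
    if q is None:
        return p * p
    return 2 * p * q


def random_match_probability(genotypes):
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    return prod(genotype_frequency(*alleles) for alleles in genotypes)


if __name__ == "__main__":
    # Hypothetical three-locus profile: heterozygote, homozygote, heterozygote.
    profile = [(0.10, 0.20), (0.25,), (0.05, 0.30)]
    print(f"Random match probability: {random_match_probability(profile):.2e}")
```

Each added locus multiplies the result by another fraction, which is why a panel of a dozen or more loci reaches such small match probabilities.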

Figure 54.7

Diagram of short tandem repeat DNA segments composed of varying numbers of core repeats (C.R.) and accompanying electropherograms showing the corresponding allele peaks: (a) heterozygous pattern with alleles of 3 and 5 repeats, (b) homozygous pattern with allele of 4 repeats. The shoulder region is the flanking constant region to which PCR primers hybridize

Figure 54.8

Electropherogram of multiplexed fluorescently labeled PCR amplicons of STR loci demonstrating the allelic determinations (boxes). The X-axis reflects time and the Y-axis reflects fluorescence intensity. Four fluorophore colors permit separate analysis of genetic loci with overlapping sizes. A fifth dye channel is used for a size standard that is not shown. This person is a 15,16 genotype at the vWA locus, a 7,7 genotype (7 phenotype) at the TH01 locus, has a 32.2 variant allele in the D21S11 locus, and is a male according to the amelogenin locus

In the late 1980s, Dr. C. Thomas Caskey working with Holly Hammond, then at Baylor College of Medicine in Houston, Texas, was funded by NIJ to develop STR systems for forensic applications [37]. Subsequently, in 1991, STRs were first used in casework by one of the authors (VW) at the US Armed Forces DNA Identification Laboratory (AFDIL), through a subcontract with Cellmark Diagnostics, to identify service members who died in the first Persian Gulf War. However, it was Drs. Peter Gill and David Werrett at UK’s Forensic Science Service who, in the mid-1990s, began applying STR analysis (using in-house systems) to routine criminal casework [38, 39].

Recognizing the importance of cross-jurisdictional matches, the FBI convened a panel of forensic scientists in 1998 to select a set of STR loci for use in its National DNA Index System (NDIS). Thirteen loci, all containing tetranucleotide repeats, were chosen: D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, CSF1PO, FGA, TH01, TPOX, and vWA (Table 54.2) [40, 41]. These 13 core loci have become standard for forensic casework in much of the world and are referred to as the “CODIS” loci, after the Combined DNA Index System (CODIS) software into which DNA profiles are entered [42]. Databases are maintained of the STR alleles of convicted felons, casework profiles, and missing persons, although exactly which profiles can or must be uploaded varies based on state requirements. The commonality of genetic systems (i.e., STR loci) used in forensic casework enables computer searches for matches across jurisdictions.

Table 54.2 Nationally indexed “13 CODIS STR Core Loci”*

All CODIS STR loci are tetranucleotide repeats. In general, smaller fragments are preferred for amplification of potentially degraded samples. Additionally, preferential amplification, where smaller target DNA fragments are amplified preferentially over larger targets, becomes an issue with large core repeats as the size discrepancy between overall allele lengths increases substantially. However, “stutter” can become problematic if the core repeat size (for example, dinucleotide and trinucleotide repeats) is too small. Stutter peaks are produced when the DNA polymerase slips during amplification, resulting in PCR products that have fewer or more repeat units than the starting template, with the major stutter product generally one repeat unit less than the template. Dinucleotide and trinucleotide repeats have substantial stutter, while pentanucleotide or larger repeats have almost none.
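
The stutter pattern described above (a minor peak one repeat unit shorter than a true allele) can be screened for with a simple height-ratio check. The sketch below is illustrative only: the 15 % ratio is an assumed placeholder rather than a validated laboratory threshold, the peak heights are hypothetical, and only whole-repeat alleles are handled (microvariants such as 32.2 are outside its scope).

```python
def flag_stutter(peaks, max_ratio=0.15):
    """Return alleles that look like n-1 stutter of a neighboring peak.

    peaks maps an integer repeat count to a peak height. A peak is flagged
    when an allele one repeat longer is present and the candidate's height
    is at most max_ratio times that parent's height. max_ratio is an
    assumed illustration value, not a validated threshold.
    """
    flagged = []
    for allele, height in peaks.items():
        parent_height = peaks.get(allele + 1)
        if parent_height is not None and height <= max_ratio * parent_height:
            flagged.append(allele)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical locus: true alleles 15 and 16, small peak at 14.
    peaks = {14: 90, 15: 1000, 16: 950}
    print("Possible stutter peaks:", flag_stutter(peaks))
```

In a mixture, a minor contributor's true allele can sit in the stutter position of a major contributor's allele, so a ratio check of this kind can only flag candidates for analyst review.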

Commercial kits for amplification of the CODIS core STR loci are available in various combinations of multiplex primer sets from two companies: Promega Corporation and Applied Biosystems Inc. (ABI, a subsidiary of Life Technologies, Carlsbad, CA). ABI sells the Identifiler series, which includes all 13 core loci as well as amelogenin (below), D2S1338, and D19S433 in a single reaction. Promega offers the PowerPlex series, such as PowerPlex 16, which includes the 13 core loci, plus amelogenin and two pentanucleotide repeat loci, Penta D and Penta E. These two companies regularly produce new products that add loci to the multiplex, increasing discriminatory power. Mini-STRs, such as Applied Biosystems’ MiniFiler, are traditional STRs with primers designed to reduce the flanking regions and thus the amplicon size, so that typing results can be obtained from more substantially degraded specimens.

The 13 CODIS core STR loci are being expanded to a likely set of 24 loci. The impetus for this expansion is to reduce the likelihood of adventitious matches in large databases, to increase the compatibility with international databases, and to increase the discrimination power for missing person cases [43]. The putative additional loci include loci from the European Standard Set (ESS). ABI has responded with the GlobalFiler kit and Promega with the PowerPlex Fusion kit. These new kits have been engineered to be more sensitive as well as capable of more rapid amplification (see discussion below).

Amelogenin and Other Sex Markers

An amelogenin assay is included in current commercial STR amplification kits as a sex marker [44]. The amelogenin gene is present as homologs on both the X and Y chromosomes, but there are a number of sequence differences in the noncoding regions. The locus used in forensic testing involves a 6 bp deletion on the X chromosome; thus, the X marker is shorter than the Y marker and males manifest two peaks whereas females manifest a single peak of twice the intensity (Fig. 54.9). The amelogenin sex marker system is robust and reliable, although in very rare instances, sex-typing discrepancies have been noted [45]. Other sex markers have been described, including ones that exist at higher copy number and are thus more sensitive than the single copy amelogenin.
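
The sex call described above reduces to checking whether one or two amelogenin peaks are present, with the Y-derived fragment 6 bp longer than the X-derived fragment. The following minimal sketch encodes that logic; the sizing tolerance and the example fragment sizes are assumptions for illustration, since actual amplicon sizes depend on the primer set used.

```python
X_Y_SIZE_DIFFERENCE_BP = 6  # from the text: the Y fragment is 6 bp longer


def call_sex(peak_sizes_bp, tolerance_bp=0.5):
    """Infer sex from amelogenin fragment sizes (in bp).

    One peak is read as X only (female); two peaks about 6 bp apart are
    read as X and Y (male). Anything else is reported as inconclusive.
    tolerance_bp is an assumed sizing tolerance, not a validated value.
    """
    sizes = sorted(peak_sizes_bp)
    if len(sizes) == 1:
        return "female"
    if len(sizes) == 2:
        difference = sizes[1] - sizes[0]
        if abs(difference - X_Y_SIZE_DIFFERENCE_BP) <= tolerance_bp:
            return "male"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical fragment sizes; real values depend on the kit.
    print(call_sex([106.1, 112.0]))
    print(call_sex([106.0]))
```

A result of this kind is still subject to the rare discrepancies noted above, so the call is best treated as presumptive.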

Figure 54.9

The amelogenin locus is 6 bp longer on the Y chromosome than on the X chromosome. Thus, a male will have two peaks and a female will have only one peak in the electropherogram

Male-Specific DNA Typing with Y-Chromosome Markers

Y-chromosome markers are not sex markers, but rather are male-specific identity systems that permit typing of the male DNA in mixed male/female specimens (e.g., vaginal swabs following rape or fingernail scrapings after assault). Y-chromosome markers are useful due to their strict paternal inheritance and can be helpful in lineage studies. The absence of recombination means that the exact same Y chromosome DNA alleles are present in distant paternal relatives of an individual. For example, Y-chromosome markers were used in determining the paternity of US President Thomas Jefferson among his distant descendants [46]. Y-chromosome markers are inherited together, and are reported together as a haplotype. Since they are not genetically independent, the population frequencies of each allele cannot be multiplied together; instead the counting method is used, where the number of times the haplotype is seen in a population database generates the frequency statistic. This means that the discriminatory power is much less than autosomal STRs and that a large number of loci is necessary to achieve substantial discrimination. Current commercial Y-chromosome markers are STRs that can be analyzed on the same equipment as standard STRs. More information on Y-STR haplotypes is available from various websites [47].
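
The counting method mentioned above can be sketched in a few lines: the haplotype is treated as a single unit and its frequency is the number of times it appears in a reference database divided by the database size. The haplotypes and database below are hypothetical, and the sketch omits the confidence-interval corrections that laboratories typically apply to such estimates.

```python
from collections import Counter


def haplotype_frequency(haplotype, database):
    """Counting-method estimate: observations of the haplotype / database size.

    The whole haplotype (a tuple of repeat counts across linked loci) is
    counted as one unit; per-locus allele frequencies are NOT multiplied.
    """
    if not database:
        raise ValueError("database must contain at least one haplotype")
    return Counter(database)[haplotype] / len(database)


if __name__ == "__main__":
    # Hypothetical three-locus Y-STR haplotypes in a database of 10 men.
    database = [(14, 13, 29)] * 3 + [(15, 13, 30)] + [(14, 12, 28)] * 6
    print(haplotype_frequency((14, 13, 29), database))
```

Because the estimate can never be smaller than one over the database size, the achievable statistic is bounded by how many haplotypes have been collected, which is one reason Y-STR results are less discriminating than autosomal profiles.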

Single Nucleotide Polymorphisms

By far the most common polymorphisms in the human genome are SNPs [48–50]. It is estimated that there are approximately 15 million SNP sites among the more than six billion base pairs of the diploid human genome. Very small DNA fragments can be interrogated for SNP alleles; thus, SNP genotyping can be applied for forensic identification despite extreme DNA degradation. The identification of human remains recovered from the World Trade Center disaster is one scenario in which SNPs showed an advantage over other DNA typing (Orchid-GeneScreen, now LabCorp, Burlington, NC), since the DNA was severely degraded in a large percentage of the specimens. Most SNPs are biallelic; that is, there are only two alleles, despite the fact that there are four possible nucleotide bases. Each SNP therefore carries little discriminatory information, and a large set of SNPs must be used to obtain decisive discriminatory values. Most SNPs are base substitutions; however, a smaller number are insertions or deletions (“indels”), which are compatible with more standard fragment length technologies. In forensics, SNP analysis is not currently commercially available, but it is particularly amenable to automation and analysis with chip technologies. SNPs may prove to be particularly valuable with Y chromosome or phenotypic markers (described below).
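
A rough calculation shows why many biallelic SNPs are needed. The sketch below assumes idealized conditions that real panels do not meet exactly: every SNP is independent, follows Hardy-Weinberg proportions, and has the most informative allele frequency of 0.5. The target of 10^-12 is chosen to mirror the order of magnitude quoted for multiplexed STRs; all names are illustrative.

```python
from math import ceil, log


def snp_match_probability(p):
    """Chance that two unrelated people share a genotype at a biallelic SNP.

    With allele frequencies p and q = 1 - p, the genotype frequencies are
    p*p, 2*p*q and q*q; the match probability is the sum of their squares.
    """
    q = 1.0 - p
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def snps_needed(target, p=0.5):
    """Smallest number of independent SNPs whose combined match
    probability is at or below the target (idealized assumptions)."""
    return ceil(log(target) / log(snp_match_probability(p)))


if __name__ == "__main__":
    print("Per-SNP match probability at p = 0.5:", snp_match_probability(0.5))
    print("SNPs needed for 1e-12:", snps_needed(1e-12))
```

Under these best-case assumptions roughly thirty SNPs are required; with less balanced allele frequencies the number rises, which is consistent with the need for a large SNP set.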

Mitochondrial DNA Sequencing

MtDNA is useful for forensic typing in tissues lacking a nucleus (e.g., shed [telogen] hairs), when specimens are greatly degraded (e.g., old skeletal remains), and in some cases of distant maternal relatives or to clarify kindred relations. Human mtDNA is a histone-free, circular, double-stranded DNA of approximately 16,569 bp (Fig. 54.10). MtDNA is useful for testing highly degraded specimens because it is present at a high copy number in each cell. Mitochondria are thought to be derived from ancient cellular symbionts, explaining the presence of their own DNA and modified genetic code. Each cell contains tens to thousands of mitochondria and each mitochondrion contains 1–10 copies of mtDNA; as a result there are a total of 500–2,000 copies per cell, compared to the single set of diploid nuclear chromosomes [51]. Furthermore, each mtDNA particle appears to be more resistant to degradation than nDNA [52], possibly because it is circular or because it is enveloped within the mitochondrion. The non-recombinant maternal inheritance pattern of mtDNA (Fig. 54.11) also can be of use in certain cases. Any paternal mtDNA that may pass into the fertilized egg from the sperm is thought to be destroyed by the ubiquitin pathway, leaving only the maternal egg-derived mtDNA intact. Thus, all mitochondria are derived from the mother’s egg. MtDNA, unlike the paired nuclear DNA, does not undergo meiosis and does not participate in genetic recombination events, remaining unchanged through generations, until a mutational event occurs. In this regard, mtDNA analysis can be important when only a distant maternal relative is available as a reference specimen.

Figure 54.10

Mitochondrial DNA (mtDNA) is a circular DNA with 16,569 bp. The “control” region is a segment that encompasses the site used for the arbitrary numbering system and that contains two hypervariable regions (HVI and HVII) that are used for forensic purposes

Figure 54.11

Mitochondrial DNA (mtDNA) is maternally inherited without recombination. The mtDNA sequence is exactly the same in all children of the grandmother and all children of her daughters. In contrast, grandchildren inherit only approximately 25 % of the nuclear DNA of their grandmother. The mtDNA sequence of the maternal grandmother (top left) represented by a tan color is transmitted to her male and female children (middle row) and her daughter’s male and female children (bottom row on right)

For identity testing, only nucleotide polymorphisms in mtDNA are of practical utility, since no STR-like repetitive DNA is present. The mtDNA sequence obtained from a sample is compared to the first complete human mtDNA sequence generated by Anderson et al. [53], or its Revised Cambridge Reference Sequence [54]. Using standard nomenclature, only the differences between the aligned sequence and the reference sequence are noted by the position and base (e.g., 16311C), with insertions designated by a period after the preceding base and the number of bases inserted (e.g., 16192.2T), deletions designated by D or a minus sign (e.g., 249−), and ambiguous bases coded using an N. Human mtDNA is densely coding, specifically coding for 37 genes, and is thus generally highly conserved. The sequence polymorphisms are concentrated in two hypervariable regions that are located in the noncoding control region [55]. The control region is a 1.2 kb segment, which includes a cloning site that Anderson et al. arbitrarily set as base pair 1. Hypervariable region I (HVI) spans positions 16,024–16,365 and hypervariable region II (HVII) spans positions 73–340 (Fig. 54.10). Homopolymeric C-stretch regions around positions 16,189 of HVI and 310 of HVII may complicate sequencing. MtDNA does not provide definitive identification, both because maternal relatives share the same sequence and because of its relatively low discriminatory power. Common haplotypes exist (e.g., the 263G, 315.1C haplotype occurs in 7 % of Caucasians), but most haplotypes are rare. Polymorphisms in the rest of the molecule exist, but are too infrequent to be practically interrogated by traditional sequencing [56].
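The nomenclature described above can be read mechanically. The following is a minimal sketch, not a validated forensic tool, that parses notations such as 16311C, 16192.2T, and 249− and reports whether the position falls in HVI or HVII, using the position ranges given in the text. The names `Variant`, `parse_variant`, and `region` are hypothetical, and the digit after the period is read, as in the text, as the number of bases inserted.

```python
import re
from dataclasses import dataclass

# Hypervariable region boundaries as given in the text (inclusive positions).
HVI = (16024, 16365)
HVII = (73, 340)

MINUS = "\u2212"  # typographic minus sign, as printed in "249−"

# Position, optional ".n" insertion count, then a base, N, D, or a minus sign.
_PATTERN = re.compile(rf"^(\d+)(?:\.(\d+))?([ACGTND\-{MINUS}])$")


@dataclass(frozen=True)
class Variant:
    position: int
    kind: str      # "substitution", "insertion", "deletion", or "ambiguous"
    base: str      # observed base; empty string for a deletion
    inserted: int  # digit after the period; 0 unless kind == "insertion"


def parse_variant(text: str) -> Variant:
    """Parse one difference from the reference sequence (hypothetical helper)."""
    match = _PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized variant notation: {text!r}")
    position = int(match.group(1))
    count, symbol = match.group(2), match.group(3)
    if count is not None:
        if symbol not in "ACGT":
            raise ValueError(f"insertion must name a base: {text!r}")
        return Variant(position, "insertion", symbol, int(count))
    if symbol in ("D", "-", MINUS):
        return Variant(position, "deletion", "", 0)
    if symbol == "N":
        return Variant(position, "ambiguous", "N", 0)
    return Variant(position, "substitution", symbol, 0)


def region(position: int) -> str:
    """Return "HVI", "HVII", or "other" for a numbered mtDNA position."""
    if HVI[0] <= position <= HVI[1]:
        return "HVI"
    if HVII[0] <= position <= HVII[1]:
        return "HVII"
    return "other"


if __name__ == "__main__":
    for note in ("16311C", "16192.2T", "249\u2212", "263G", "315.1C"):
        variant = parse_variant(note)
        print(note, variant.kind, region(variant.position))
```

Keeping the parsed record separate from the region lookup means the same parser could be reused if positions outside the control region were reported.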

In the late 1970s, Dr. Wesley Brown brought mtDNA analysis techniques to Dr. Allan Wilson’s molecular evolution laboratory at UC Berkeley, which led to the beginnings of forensic mtDNA testing through Drs. Mary-Claire King, Mark Stoneking, and Svante Pääbo. In 1984, Dr. King began to use mtDNA for the “disappeared” in Argentina, which allowed lost children to be reunited with their grandparents [57]. Dr. Peter Gill at the Forensic Science Service (now disbanded), working with Erika Hagelberg of Cambridge University in the UK, took the lead in mtDNA casework in 1992, and at about the same time AFDIL in the USA began to identify skeletal remains from the Vietnam War using this method [58, 59].

In crime laboratories, mtDNA is most commonly used to analyze shed hairs from pubic combings or those found at scenes, because such hairs lack roots and the hair shaft contains little or no nDNA. On average, an individual loses 200 hairs per day, and thus it is not surprising that shed hairs constitute an important trace evidential specimen. MtDNA may also be used on fingernails and keratotic skin, which also lack nDNA. The first recorded forensic mtDNA case was the successfully prosecuted 1996 case, Tennessee v Ware, which involved a single hair found in the throat of a victim [60].

Occasionally, more than one mtDNA sequence exists in the same organism or tissue, a condition termed heteroplasmy. Although heteroplasmy was well known in plants and nonhuman animals, it was first seen in human mtDNA by Dr. Peter Gill of the Forensic Science Service [61] and then confirmed by one of the authors (VW) [62] during the identification of Czar Nicholas Romanov II (Fig. 54.12). Paternal leakage, recombination, and high mutation rates may contribute to heteroplasmy. The rate of mutations in noncoding mtDNA is 10–20 times greater than that in nuclear DNA, possibly due to the exposure of mtDNA to oxygen free radicals or to a DNA polymerase with a higher error rate [63]. Thus, single base differences between presumed maternal relatives must be viewed with caution. A low level of heteroplasmy may, in fact, be present in all individuals. To be detected using standard DNA sequencing, the minor sequence must make up more than approximately 30 % of the mtDNA; otherwise, it is not distinguishable from background noise. Heteroplasmy is not uniform throughout the body and appears to be somewhat tissue specific. In addition, heteroplasmy may be rapidly lost (reversion to a homoplasmic state) in family lineages because of the bottleneck phenomenon that occurs during reproduction from a single egg. Heteroplasmy can complicate forensic analysis. For example, two hairs cannot be assumed to be from different individuals if they differ by a single nucleotide.

Figure 54.12

The first description of mtDNA heteroplasmy in humans was in the case of Czar Nicholas Romanov II, the last imperial Russian monarch. DNA sequence analysis shows that the czar (a) shares the heteroplasmy (C/T marked with an asterisk) at position 16,169 with his brother Georgij (b), but not with his distant relative Xenia Cheremeteff-Sfiri (c) (five generations removed) who has only the T nucleotide

MtDNA testing is not performed by most forensic laboratories because the standard analytical method is DNA sequencing, which is expensive, labor-intensive, relatively slow, and, owing to how ubiquitous and prevalent the molecule is, may be susceptible to contamination. The exquisite sensitivity of the testing mandates special laboratory facilities and procedures. Also, interpretation is less straightforward than for routine STR results [64]. In 2006, the FBI created four regional state mtDNA laboratories (Arizona, Connecticut, Minnesota, and New Jersey), expanding forensic mtDNA sequencing capacity beyond private and federal laboratories. The number of laboratories performing mtDNA sequencing has since increased.

Phenotypic Markers and Ancestry-Informative Markers

Most forensic DNA systems involve noncoding DNA loci and are not associated with phenotypic traits [65]. The amelogenin marker is a major exception in that it directly assesses the sex of an individual. Some of the loci, like vWA of the von Willebrand locus, have very weak associations with disease states or other phenotypic information; of course older serologic testing was phenotype-based.

In some instances use of descriptive traits of an individual may be desirable, particularly if no eye-witnesses exist. Phenotypic markers have been and continue to be developed for use in forensic investigations. Markers have been established for eye color and weaker ones exist for hair and skin pigmentation [66, 67]. DNA Print Genomics (which ceased operations in 2009) had claimed their RETINOME system could predict eye color with 96 % accuracy. Generally, such tests are SNP assays for a set of informative but widely disparate loci. A genetic version of “driver’s license” data would be useful for investigations, even if it were not to be used as probative evidence in court. A danger of misdirection from an incorrect prediction would have to be considered, since the accuracy is less than perfect [68].

Ancestry-informative markers are used to suggest a geo-ethnic origin [69, 70]. DNA Print Genomics’ “DNA Witness” test appeared to have used them successfully in some investigations, but the technique was controversial [71]. Other groups, most notably the National Geographic Genographic Project [72], using various sets of markers and analyses, claim to be able to estimate the proportions of different ancestral backgrounds in an individual. Of course, admixture and modern travel greatly limit the value of such efforts. Some would derisively characterize these trials as genetic profiling [73].

Species Identification

Animal, plant, and microbial identification is sometimes important in a forensic investigation and can be accomplished using DNA analysis [23]. Typical forensic specimens include animal hair and fly larvae. Animals are generally examined by forensic scientists using the cytochrome b [74], 12S ribosomal RNA [75], or other mtDNA loci [76], and plants through their chloroplast DNA [77, 78]. The interrogated sequence is then entered into the online bioinformatics program BLAST (Basic Local Alignment Search Tool), hosted by the US National Institutes of Health. The utility of a BLAST search is that today almost all DNA sequences produced by scientists are entered into the database (often required for journal publication), meaning that virtually every species ever studied at the molecular level is represented. BLAST undertakes a query of the questioned sequence and in a few seconds produces a list of the most similar sequences in the database (often 100 % matches), complete with sequence alignment and appropriate references. For instance, a questioned hair may have a 100 % match to dozens or hundreds of dog sequences, followed by 99 % matches to more dog sequences, and the list will then begin to be interspersed with wolf sequences, coyote sequences, etc. Except for extremely closely related, or highly exotic and rare species, BLAST queries typically result in an exact match, and the species in question is identified. However, there is often a need for further strain (clade) or individualizing analysis. For instance, strain testing is used to trace marijuana plant sources [79]. Source attribution at the specific individual level is accomplished by DNA methodologies similar to human identity testing.
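The ranked hit list described above can be illustrated with a toy example. The sketch below is not BLAST and contacts no database: it simply ranks a handful of made-up reference sequences by percent identity to a query of equal length. The sequences, the species labels, and the function names `percent_identity` and `rank_hits` are all invented for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def rank_hits(query: str, references: dict[str, str]) -> list[tuple[str, float]]:
    """Return (label, percent identity) pairs, most similar first."""
    scored = [(label, percent_identity(query, seq))
              for label, seq in references.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


# Invented 20-base sequences; real reference entries are far longer.
REFERENCES = {
    "dog (hypothetical)":    "ACGTTGCAACGTTGCAACGT",
    "wolf (hypothetical)":   "ACGTTGCAACGTTGCAACGA",
    "coyote (hypothetical)": "ACGTTGCATCGTTGCAACGA",
    "cat (hypothetical)":    "TCGATGCATCGTAGCTACGA",
}
QUERY = "ACGTTGCAACGTTGCAACGT"

if __name__ == "__main__":
    for label, identity in rank_hits(QUERY, REFERENCES):
        print(f"{identity:5.1f} %  {label}")
```

A real search uses local alignment heuristics rather than position-by-position comparison, so this sketch only conveys how a similarity-ordered list separates close relatives from distant ones.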

Tissue Identification

Occasionally, determination of the tissue origin of a specimen is required. Since the DNA from an individual is the same in all tissues, forensic scientists either assay messenger RNA (mRNA), as certain genes are expressed in some tissues and not others, or use immunoassays of the protein products [80, 81]. Commercial human gene expression microarrays have been used to determine tissue origin, but this is not a capability of crime laboratories.

Instruments and Technologies for DNA Typing

Since STRs replaced manual RFLP slab gel methods in the 1990s, capillary electrophoresis (CE) of amplified DNA, pioneered by Drs. John Butler and Bruce McCord, then at the FBI, has become the mainstay of forensic DNA laboratory operations around the world. CE instruments have replaced slab gel electrophoresis systems because of automation, faster run times, smaller sample volumes, and greater resolution. Typical casework calls for on-demand instrumentation that can handle relatively few specimens but with fast run times. High-throughput CE instruments are used as batch instruments for DNA data-banking operations. The ABI series of CE instruments (310, 3100, 3130 “Genetic Analyzers”) is predominant and used virtually exclusively within the forensic community; the new ABI 3500 instrument is intended to supplant older models and can detect 6 dyes.

The forensic community is investigating other technologies for DNA analysis and NIJ has funded development [82]. Commercial “Rapid ID” DNA (rDNA) instruments (GE Healthcare Life Sciences [Pittsburgh, PA] and NetBio [Waltham, MA] ANDE; IntegenX [Pleasanton, CA] RapidHIT; Lockheed Martin Corp [Bethesda, MD] and Zygem [Solana Beach, CA] RapiD; and LGC [London, UK] RapiDNA) were introduced in 2012 to perform sample preparation, amplification, and electrophoresis to produce typing results within 2 h [83]. Their technology involves integrated microfluidic systems. Such systems are designed to be used at police booking stations to permit searches while persons of interest are in custody. These systems may also make crime-scene field-testing practical. Next-generation sequencers vary in their technologies, but also have microfluidics in common. They are designed for genomic/exomic applications, for which they are extremely rapid and relatively inexpensive for the amount of DNA sequenced. Analysis requires a specimen with a large quantity of DNA, relative to forensic evidentiary specimens, and is relatively expensive on a per-run basis compared to current STR kits. However, if the next-generation sequencing assay is focused on targeted areas of interest, then less specimen DNA would be required and possibly multiple specimens could be processed in batch mode. In fact, the massively parallel sequence reads resulting from next-generation sequencing may be a benefit in analysis of degraded and low copy number DNA specimens.

Interpretation of Results

US crime laboratories will typically use the CODIS software (“popstats”) to generate their statistics based on the FBI’s allelic frequency data for Caucasian, African-American, and Hispanic racial groupings. STR systems are powerfully discriminating with an average random match probability of less than one in a trillion using the 13 core loci. “Discriminatory power” should not be confused with “accuracy” (e.g., ABO blood group typing is accurate but has low discriminatory power). The high discriminatory power of STRs is achieved because the statistics from the individual STR loci are multiplied together, the so-called “product rule” [84–88]. Current STR systems utilize genetically unlinked loci (STR loci are on different chromosomes, except CSF1PO and D5S818, which are sufficiently distant as to be genetically independent). Hardy-Weinberg disproportion, which may occur from population or racial grouping substructure (subgrouping), selection (non-random mating), inbreeding (mating within kindred), or linkage disequilibrium (from incomplete mixing of different ancestral populations), was cause for early court challenges to statistical interpretations of DNA results. Some of the early purported large deviations turned out to be an artifact of lower resolution RFLP tests where a single band was interpreted as a homozygote rather than two overlapping heterozygous bands [89, 90]. A National Research Council (NRC) report, NRC I [91], was issued in part to address these statistical concerns. The NRC I report itself proved controversial, which led to the NRC II report [92], which has, in fact, largely settled most statistical forensic identification issues.

In the USA, a Random Match Probability (RMP) is usually calculated as the chance of a random match in the population or racial grouping. Under conditions of Hardy-Weinberg proportions and linkage equilibria, the probability of occurrence of a given genotype would be p² in the case of a homozygote or 2pq in the case of a heterozygote, where p and q are the observed frequencies of the two alleles. Instead, to account for population substructure, the NRC II recommended calculating homozygote frequencies from population allelic data as p² + p(1 − p)θ, where θ is 0.01 for the US population as a whole and for US racial groupings (empirically determined), although a more conservative value of 0.03 may be used in cases of smaller, isolated, and more inbred groups. Since heterozygote frequencies are overestimated in cases of disequilibria, 2pq can still be used to calculate them (see Table 54.3 for an example RMP calculation).
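The calculation just described can be sketched in a few lines. The following is an illustrative example only: it applies the NRC II homozygote correction with θ, uses 2pq for heterozygotes, and multiplies across loci by the product rule. The allele frequencies are invented and are not taken from the FBI data or from Table 54.3; the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

THETA_DEFAULT = 0.01  # value cited in the text for US populations


def genotype_frequency(p: float, q: Optional[float] = None,
                       theta: float = THETA_DEFAULT) -> float:
    """Expected genotype frequency at one locus.

    A homozygote is given as q=None and uses p^2 + p(1 - p)*theta;
    a heterozygote uses 2pq.
    """
    if q is None:
        return p * p + p * (1.0 - p) * theta
    return 2.0 * p * q


def random_match_probability(
        genotypes: Sequence[Tuple[float, Optional[float]]],
        theta: float = THETA_DEFAULT) -> float:
    """Multiply per-locus genotype frequencies (the product rule)."""
    rmp = 1.0
    for p, q in genotypes:
        rmp *= genotype_frequency(p, q, theta)
    return rmp


# Invented three-locus profile: heterozygote, homozygote, heterozygote.
PROFILE = [(0.10, 0.20), (0.25, None), (0.05, 0.30)]

if __name__ == "__main__":
    rmp = random_match_probability(PROFILE)
    print(f"RMP = {rmp:.3e} (about 1 in {1.0 / rmp:,.0f})")
```

With only three loci the figure is modest; multiplying across the 13 core loci is what drives the result to the very small values quoted for full profiles.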

Table 54.3 Example of random match probability and likelihood ratio calculations by ethnic group

In Europe, a Likelihood Ratio (LR) is usually used, wherein the probability of the evidence under the hypothesis of the prosecution, that the defendant was the source of the DNA (assumed to be 100 % or 1), is divided by the probability of the evidence under the hypothesis of the defense, that the specimen is from someone else (some random individual), which is p² or 2pq; that is, LR = 1/p² or 1/(2pq). A result greater than one would support the prosecution and a result smaller than one would support the defense (see Table 54.3 for an example LR calculation).
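As a minimal sketch of the ratio described above, the example below divides the probability assigned under the prosecution hypothesis (taken as 1, as in the text) by the genotype frequency under the defense hypothesis. The function names and the allele frequencies are hypothetical.

```python
def likelihood_ratio(p_given_defense: float,
                     p_given_prosecution: float = 1.0) -> float:
    """Ratio of the probability under the prosecution hypothesis to that
    under the defense hypothesis (the genotype frequency, p^2 or 2pq)."""
    if not 0.0 < p_given_defense <= 1.0:
        raise ValueError("defense probability must be in (0, 1]")
    if not 0.0 <= p_given_prosecution <= 1.0:
        raise ValueError("prosecution probability must be in [0, 1]")
    return p_given_prosecution / p_given_defense


def supports(lr: float) -> str:
    """Which hypothesis a likelihood ratio favors, per the text."""
    if lr > 1.0:
        return "prosecution"
    if lr < 1.0:
        return "defense"
    return "neither"


if __name__ == "__main__":
    # Invented heterozygote with allele frequencies p = 0.10 and q = 0.20.
    frequency = 2.0 * 0.10 * 0.20
    lr = likelihood_ratio(frequency)
    print(f"LR = {lr:.1f}, supports the {supports(lr)}")
```

When the numerator is fixed at 1, the LR is simply the reciprocal of the genotype frequency, which is why it conveys the same information as the RMP for a single-source profile.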

MtDNA and Y-chromosome markers yield haplotype population frequency data that do not involve the product rule; instead, the frequency of the observed haplotype in a database is considered (the counting method). There has been some discussion as to whether these haplotype frequencies can be multiplied by each other and by the STR statistic to achieve a summary discriminatory figure. Interpretation may be problematic in the presence of mixtures, or of significant apparent imbalances or allelic drop-in and drop-out (Fig. 54.13), when testing highly degraded or trace DNA specimens.
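The counting method amounts to a simple tally, sketched below under stated assumptions: the database is an invented list of 200 haplotypes, each written as a tuple of differences from the reference sequence, and `haplotype_frequency` is a hypothetical helper. Real casework reporting may apply further statistical adjustments that are not shown here.

```python
from typing import Sequence, Tuple

Haplotype = Tuple[str, ...]


def haplotype_frequency(observed: Haplotype,
                        database: Sequence[Haplotype]) -> float:
    """Fraction of database entries identical to the observed haplotype."""
    if not database:
        raise ValueError("database must not be empty")
    count = sum(1 for entry in database if entry == observed)
    return count / len(database)


# Invented database: 14 copies of one haplotype among 200 entries.
COMMON = ("263G", "315.1C")
DATABASE = [COMMON] * 14 + [("16311C", "263G")] * 6 + [("73G",)] * 180

if __name__ == "__main__":
    freq = haplotype_frequency(COMMON, DATABASE)
    print(f"observed in {freq:.1%} of database entries")
```

Because the whole haplotype is inherited as a unit, the tally is made on the complete tuple rather than by multiplying frequencies of individual positions.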

Figure 54.13

Allelic drop-out from sample degradation or primer site polymorphism is one of the few interpretative pitfalls in the analysis of STRs. Allelic drop-in can occur from contamination. This figure demonstrates a drop-out of allele 16 and a drop-in of allele 13 in the upper tracing compared to the lower tracing

Convicted Offender Databases

The DNA Identification Act of 1994 (US Public Law 103–322) authorized the creation of the FBI’s National DNA Index System (NDIS). DNA profiles are uploaded using CODIS software, which may vary from state to state due to variations in state policy or statute [93]. Searches can be performed locally through a Local DNA Index System (LDIS) or State DNA Index System (SDIS), and across state lines through NDIS. Identifying information other than the DNA profile is not entered into the system. A match from an NDIS search results in the local crime laboratory of one state being put into contact with the local crime laboratory in another state to discuss case details. Uploading of DNA profiles triggers federal regulatory requirements on the use of the DNA specimens and profiles. The federal government has its own database for federal crimes as well. The number of profiles in the DNA databases has increased dramatically as state laws have expanded the convicted offender requirements from selected offenses to all felons, a broad array of misdemeanor crimes, and even arrestees. In 2013, the US Supreme Court, in Maryland v King, upheld the routine search of DNA databases when DNA samples are collected upon arrest [94]. Today, approximately ten million convicted offender profiles exist in the database. In recent years, “low stringency matches” have enabled searches for family members to assist investigations when no DNA profile of the perpetrator is in the database [95, 96].

Quality Assurance and Laboratory Issues

The FBI formed the Technical Working Group on DNA Analysis Methods (TWGDAM) to allow analysts from different laboratories to share information on the new DNA technology. The DNA Identification Act of 1994 gave the FBI regulatory oversight of DNA profiles entered into the national database [97]. The legislation called for a DNA Advisory Board (DAB) that produced recommended standards, based largely on guidelines of the TWGDAM, which were adopted with little change by the FBI director [98]. DAB requirements include minimal educational credits and experience of the testing personnel, proficiency testing twice a year per analyst, annual audits, and technical and administrative reviews of all tests. TWGDAM has since been renamed the Scientific Working Group on DNA Analysis Methods (SWGDAM) [99] and continues to recommend new standards to the FBI Director. The FBI conducts audits of laboratories to verify and enforce compliance with the standards, at least with respect to profiles that are generated and uploaded into NDIS.

The FBI/DAB standards require accreditation. The American Society of Crime Laboratory Directors/Laboratory Accreditation Board (ASCLD/LAB) and, more recently, the Forensic Quality Services (FQS) accredit laboratories. The accreditation requirements and audits are rigorous and are based on the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Standard 17025. Standard reference materials from the National Institute of Standards and Technology (NIST) are available for autosomal STR analysis (SRM 2391b), Y-STRs (SRM 2395), mtDNA testing (SRM 2392), and DNA quantification (SRM 2372). Standards require annual comparisons with NIST-traceable standard materials [100]. In addition to these forensic science laboratory measures, judicial scrutiny provides further review of DNA findings in those cases that go to court.

Legal Issues

Shortly after forensic DNA tests were first introduced, defense attorneys attempted to directly attack this new scientific evidence. These early challenges are sometimes referred to as the “DNA wars” [101–104]. The first serious challenge to forensic DNA identity testing came in the 1989 case of New York v Castro [105], but legal admissibility of RFLP analysis was generally established in the 1991 case of US v Yee [106], and for PCR-based STR analysis in a series of cases in 2001 and again in 2005 [107–110]. Most prominent among the players were defense attorneys Peter Neufeld and Barry Scheck (subsequently part of the O.J. Simpson “dream team” and later founders of the Innocence Project), FBI lead scientist Dr. Bruce Budowle, and prosecutors Rockne Harmon and Woody Clark. The attacks were centered primarily on the issue of statistical interpretation. The early forensic DNA tests suffered from an inability to resolve discrete alleles. Moreover, the genetic independence of the loci was questioned based upon early Hardy–Weinberg disequilibrium calculations. Today, the “DNA wars” are largely over. The scientific basis of forensic DNA typing was never seriously questioned; rather, vitriolic challenges were launched at laboratory procedures and statistical interpretation. The admissibility of DNA evidence was not challenged in the 1995 O.J. Simpson trial despite the presence of a well-funded and experienced defense team; instead, the “weight” of the evidence was challenged, on the theory that police investigators had intentionally planted Mr. Simpson’s blood. The most common challenges today are to sample collection, preservation of the evidence, chain of custody, documentation, and validation studies [111]. New genetic testing systems and technologies will undergo renewed judicial scrutiny and, in particular, LCN DNA testing is expected to generate challenges. While the challenges subside, the uses of DNA continue to grow. Police departments and prosecutors’ offices have created “cold case” units to try to close old cases with DNA evidence. Identification of the unidentified in medical examiner and coroner offices is also being pursued and is expected to close some old open cases as well. Indeed, the defense is now using forensic DNA identity tests after conviction to exonerate the previously convicted through the Innocence Project [112–114]. At the time of this writing there have been nearly 300 postconviction DNA exonerations.