1.1 Monogenic Disorders Simulation

Join a team of doctors and learn about the basics of monogenic disorders and how to perform a genetic risk assessment based on family history and genetic tests. As the name suggests, monogenic disorders arise due to mutations or changes in a single gene. Cystic fibrosis (CF) is the most common life‐limiting genetic disorder in people of Northern European descent, estimated to affect 70,000 people worldwide. CF is a monogenic disorder caused by one of over 2000 disease‐causing mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene. Importantly, as CF is a recessive disorder, both copies of the gene must carry mutations for the disease to manifest.

Learn About Cystic Fibrosis and How It Is Detected Genetically

Predominantly present within the cell membrane of epithelial cells, the CFTR protein is a chloride channel that regulates epithelial surface fluid secretion in the respiratory and gastrointestinal systems as well as numerous other epithelial tissues. In CF, mutations in the CFTR gene lead to a reduction, or complete loss, of CFTR function. These mutations can be detected by several techniques, including polymerase chain reaction (PCR) followed by gel electrophoresis to check for changes in the length of amplified CFTR gene fragments (Fig. 1.1).

Fig. 1.1

A lab assistant explains how to use PCR and gel electrophoresis to detect mutations in CFTR in the Monogenic Disorders simulation

Design Custom Primers to Investigate CFTR

Although CF is a monogenic disorder, there are many different mutations in the CFTR gene that can give rise to the disease, and the particular mutation often influences disease severity. By using PCR and gel electrophoresis with specially designed primers, it is often possible to identify which mutation is present in the CFTR gene (Fig. 1.2).

Fig. 1.2

Learn how to design forward and reverse primers to study the CFTR gene in the Monogenic Disorders simulation

Generate a Pedigree and Predict the Risk of Inheritance

As CF is a genetic disorder, it is possible to track the mutation through generations by generating a pedigree, based on genetic analysis of related individuals and the family’s medical history (Fig. 1.3). With sufficient knowledge you can even predict the potential risk of future generations inheriting the disease, or carrying a single copy of the mutated gene!

Fig. 1.3

Construct a pedigree tree and calculate the risk that the unborn infant will inherit cystic fibrosis from its parents in the Monogenic Disorders simulation

Will you be able to pull together all your knowledge of CF and various key molecular techniques to inform a young couple about the risk of their future children inheriting the disease?

1.2 Monogenic Disorders Theory Content

In this lab, you will learn about the monogenic disorder cystic fibrosis (CF) and how it is inherited from one generation to the next. To understand this you will cover the basics of genetics, key facets of CF and important molecular techniques used to study the disease. The theory content below covers everything you’ll need to be successful in the Monogenic Disorders simulation.

1.2.1 Monogenic Disorders

Monogenic disorders are classified as genetic disorders, and their distinctive characteristic is that only one gene is involved in the development of the disease. Examples of monogenic disorders include CF and hemophilia.

1.2.2 Cystic Fibrosis

CF is a type of monogenic disorder caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene, which encodes an ion channel found mainly in epithelial cells. These mutations lead to a reduction, or complete loss, of CFTR protein function. Due to impaired ion transport, the surface liquid which lines many epithelial tissues becomes more viscous, often with an associated build‐up of mucus.

Mucus secretion is a normal part of airway physiology, as it helps protect the cells of the airway from mechanical damage and infection. In healthy individuals this mucus is swept out of the airways by the action of the cilia which line the airway. However, in CF an excess of this mucus can build up because it is not cleared correctly: changes in the viscosity of the surface liquid cause ciliary action to fail.

Although CF is a monogenic disorder, a major feature of the disease is its heterogeneity: the CFTR genes of different patients with CF contain different types of mutations, with over 2000 currently recorded. However, the most common mutation in the CFTR gene (accounting for ~60–70% of mutated CFTR alleles in people with CF) is a deletion of three nucleotides, resulting in a loss of the amino acid phenylalanine (F) at position 508 in the protein. It is therefore commonly called F508del (also ∆F508).

Historically, the major cause of mortality associated with CF was the patient’s inability to absorb nutrients from food, due to a lack of digestive enzymes in the intestine. This is caused by the build‐up of mucus, which prevents the enzymes from passing into the intestine from the pancreas. As a result, affected individuals would usually die early in childhood. With a better understanding of the disease and nutritional supplementation, digestive issues associated with the disease are now typically well managed in people with CF.

Therefore, lung disease associated with CF is now the major issue facing individuals today. The airways become clogged with a thick, viscous mucus limiting the individual’s ability to breathe. This mucus also represents an attractive breeding ground for bacteria leading to chronic bacterial infections and associated inflammation. This in turn leads to remodeling of the airways resulting in the loss of lung function. This loss and the ongoing bacterial infections are often the major cause of mortality in people with CF these days.

In the Monogenic Disorders simulation we focus on CF, therefore all of the content below is discussed in relation to CF. For a more general description of genetics and associated simulations please see our other publication “Labster Virtual Lab Experiments: Basic Genetics” (Stauffer S et al. [2018] Labster Virtual Lab Experiments: Basic Genetics, 1st edn. Springer, Heidelberg, ISBN 978‐3‐662‐57999‑2).

1.2.3 Gene

A gene is a section of DNA and is the basic unit of heredity in living organisms. In CF we are interested in the CFTR gene.

Genes encode the information needed to construct functional proteins, e.g. the CFTR gene encodes the CFTR protein. This code is “read” through the process of transcription, which creates a complementary mRNA strand of the gene. The mRNA strand is then “translated” by the cell’s ribosomes, where this code is used to build a chain of amino acids, ultimately constructing a protein.

Humans are thought to have between 19,000 and 20,000 protein‐coding genes in total. Together with the rest of the organism’s DNA, these genes make up the genome.

1.2.4 Locus

We use the term locus to refer to the location or position of a gene (or significant sequence) on a chromosome. CFTR is located on the long arm of chromosome seven, specifically at position q31.2.

1.2.5 Allele

An allele is one of several alternative forms of the same gene, where the DNA sequence differs slightly. Sometimes, different alleles can result in different observable phenotypic traits, such as changes in pigmentation. However, most genetic variation results in little or no observable variation. Rarely, changes in the DNA sequence will result in alleles which impact on an individual’s health, for example CF is caused by changes in the DNA sequence of the CFTR gene leading to a reduction or loss of CFTR protein function.

Most multicellular organisms are diploid. This means that they possess two alleles for each gene, one inherited from each parent. If the two alleles are the same, the organism is said to be homozygous with respect to that gene. If they are different we call the organism heterozygous with respect to that gene (see zygosity).

1.2.6 Phenotype

Phenotype refers to an organism’s set of observable traits. The phenotype results from the expression of an organism’s genotype, inherited epigenetic factors, and environmental conditions.

Traits can be outwardly visible (e.g. eye color) or physiological (e.g. lactose intolerance). They can also be more severe and lead to diseases such as CF.

1.2.7 Genotype

The genotype consists of an organism’s DNA segments or genes, which help determine its phenotype. In the case of CF, the CFTR alleles an individual carries will determine whether they develop the disease or not.

1.2.8 Mutation

From a genetic point of view the term mutation can refer to any permanent change in a DNA sequence regardless of its impact on the genome or the individual. However, in clinical terms a disease‐causing mutation refers to a permanent change in a DNA sequence which leads to the development of a disease or disorder. Mutations can vary from a single base change, or point mutation, up to chromosomal mutations, which include the duplication or deletion of entire chromosomes or chromosome regions and are covered in Chap. 2 of this book.

1.2.9 Central Dogma of Molecular Biology

During gene expression DNA is transcribed into mRNA, which is then translated into an amino acid chain that goes on to form a mature protein. During translation the nucleotides which make up the mRNA strand are ‘read’ in batches of three, known as codons, which encode specific amino acids. Some specific codons are termed stop codons; rather than coding for an amino acid, these stop codons terminate translation.
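The two steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than a full model of gene expression: it assumes the input is the DNA coding strand (so the mRNA is the same sequence with T replaced by U), it starts reading at the first base, and the example sequence is hypothetical. The codon table is the standard genetic code, with `*` marking a stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation terminates here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding-strand fragment: Met-Cys-Trp followed by a stop codon.
dna = "ATGTGTTGGTGAAAA"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)  # AUGUGUUGGUGAAAA MCW
```

Note that the final `AAA` codon is never translated, because the `UGA` stop codon before it ends the amino acid chain.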

1.2.10 Point Mutations

Point mutations describe a genetic mutation where a single nucleotide base is changed, inserted or deleted in a DNA sequence. This change in DNA leads to a change in the mRNA, which can in turn lead to changes in the amino acid chain produced. There are three common outcomes of point mutations: frameshifts, synonymous mutations and non‐synonymous mutations.

  • Frameshift – Insertions and deletions lead to frameshift mutations, where the normal reading frame for the mRNA is shifted so that the affected codon and all subsequent codons are ‘read’ differently. For example the sequence:

    $$ {\text{`}}\text{A}\underline{\text{GU}}\ \text{CAG}\ \text{AGA}{\text{'}} $$

    encodes for the amino acids serine‐glutamine‐arginine. An insertion of an ‘A’ nucleotide between the marked ‘G’ and ‘U’ nucleotides of the first codon would therefore result in the following sequence:

    $$ {\text{`}}\text{A}\underline{\text{G}A}\ \underline{\text{U}}\text{CA\ GAG\ A..}{\text{'}} $$

    which encodes for arginine‐serine‐glutamic acid. Frameshift mutations commonly result in significantly altered proteins often with dramatically different lengths due to the change in reading‐frame altering stop codon location.

  • Synonymous – The point mutation does not change the amino acid sequence of the protein encoded by the gene, so the length of the protein is typically unaffected. For example, the codons ‘UGU’ and ‘UGC’ both encode the amino acid cysteine, making the U > C point mutation synonymous. Importantly, diseases can still arise from these mutations, as changes in the DNA sequence can have other impacts outside of direct protein coding.

  • Non‐synonymous – Conversely, in this instance the point mutation does result in a change to the amino acid sequence of the protein. Non‐synonymous mutations can be further divided into missense and nonsense mutations.

    • Missense – Missense mutations arise when the point mutation results in a different amino acid being present, but not a stop codon. This amino acid change can dramatically alter the function of the protein but does not usually alter its length. For example the codon ‘UGU’ encodes for a cysteine amino acid, whereas the codon ‘UGG’ encodes for tryptophan. In this case the U > G point mutation produces a non‐synonymous missense mutation.

    • Nonsense – Nonsense mutations arise when the point mutation results in a stop codon being present, which results in a truncated and usually non‐functional protein. For example, a ‘UGU’ to ‘UGA’ point mutation results in the cysteine codon changing into a stop codon. In this case the U > A point mutation produces a non‐synonymous nonsense mutation.
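The categories in the list above can be checked with a short Python sketch. It uses the standard genetic code and the same example codons as the text. The function names are illustrative, and the classifier assumes that the reference codon encodes an amino acid (it does not handle a stop codon being lost).

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate complete codons; incomplete trailing bases are ignored."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base substitution within one codon."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == "*":
        raise ValueError("reference codon must encode an amino acid")
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("UGU", "UGC"))  # synonymous (Cys -> Cys)
print(classify_substitution("UGU", "UGG"))  # missense   (Cys -> Trp)
print(classify_substitution("UGU", "UGA"))  # nonsense   (Cys -> stop)

# Frameshift: insert an 'A' between the G and U of the first codon.
original = "AGUCAGAGA"
inserted = original[:2] + "A" + original[2:]
print(translate(original))  # SQR (serine-glutamine-arginine)
print(translate(inserted))  # RSE (arginine-serine-glutamic acid)
```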

1.2.11 Zygosity

So‐called diploid organisms such as mammals carry two alleles of each gene, one inherited from each parent. Zygosity indicates the similarity of the two alleles for one particular trait, i.e. whether the DNA sequences at a particular locus are the same or different.

Three different genotypes at a single locus are found in diploid organisms:

  • Heterozygous – refers to the state of having two different alleles for a particular trait.

  • Homozygous – refers to the state of having two identical alleles for a particular trait.

  • Hemizygous – refers to the state of having only one allele for a particular trait. Males are said to be hemizygous, in that they only have one allele for any X‐linked characteristic (see X‐linked mode of inheritance).
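The three definitions above translate directly into a small helper. This is only a sketch: the function name and the allele labels are illustrative and not part of the simulation.

```python
def zygosity(alleles):
    """Classify a genotype given as a sequence of one or two allele labels."""
    if len(alleles) == 1:
        return "hemizygous"  # only one allele present, e.g. an X-linked gene in a male
    if len(alleles) != 2:
        raise ValueError("expected one or two alleles")
    first, second = alleles
    return "homozygous" if first == second else "heterozygous"


print(zygosity(("F508del", "F508del")))    # homozygous
print(zygosity(("F508del", "wild-type")))  # heterozygous
print(zygosity(("X-linked allele",)))      # hemizygous
```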

1.2.12 Dominant and Recessive Alleles

When an allele is dominant, one copy of the allele is enough to produce the respective phenotype. In order to exhibit a recessive phenotype, however, an individual must have two copies of the recessive allele.

1.2.12.1 Dominant Trait

A dominant trait is defined as a trait which confers the same phenotype regardless of whether a diploid individual has one or two copies of the respective allele.

1.2.12.2 Recessive Trait

A recessive trait is a trait that is only expressed if the diploid organism carries two identical alleles. A recessive trait is “latent” or non‐expressed when the individual is heterozygous, that is, if it carries one recessive allele and one dominant allele. CF is a recessive disease – both of an individual’s CFTR alleles must carry a disease‐causing mutation for the disease to manifest. Those who carry a single copy are not affected, but are often referred to as carriers.

1.2.13 Pedigree Analysis

A pedigree tree is a diagram depicting members of a family, their inter‐relations, and their disease/phenotypic status. It provides an overview of the inheritance pattern and frequency for a specific trait or disease.

Each family member is represented by a symbol: circles for females and squares for males. Filled symbols represent affected individuals, and empty ones represent un‐affected individuals, although there are many other types of symbols used as well (Fig. 1.4).

Fig. 1.4

Explanation of pedigree symbols. Numerous symbols are used when constructing pedigrees. For diseases such as CF, carrier or affected status can be used to predict risk in future generations

When constructing a pedigree, reproduction is illustrated by connecting horizontal lines between individuals, with the resulting offspring depicted below. Each generation is assigned a Roman numeral (e.g. I, II, III) while individuals of a generation are sometimes assigned Arabic numerals (e.g. 1, 2, 3) as seen in Fig. 1.5.

Fig. 1.5

A pedigree showing a phenotype that is passed on over three generations. In this example image we can see that individuals 2 of generation I, 4 and 8 of generation II and 2 of generation III are all affected by the phenotype being investigated. We can also see that individual 2 of generation I is now deceased. The arrow pointing at individual 10 of generation III identifies that they are the proband, or the individual for whom the genetic workup was initiated

1.2.14 Autosomal Mode of Inheritance

The autosomal mode of inheritance refers to the inheritance of genes that are present on autosomes. An autosome is any chromosome not considered as a sex chromosome, or not involved in sex determination.

For instance, a human somatic cell will normally contain 23 pairs of chromosomes (total = 46 chromosomes). 22 of these pairs will be autosomes, and only one of them will be a pair of allosomes (the X and Y chromosomes), also known as the sex chromosomes.

1.2.14.1 Autosomal Recessive

An autosomal recessive disease, such as CF, is a recessive disease linked to an autosome. The main characteristics observed for these kinds of diseases are:

  • The disease may jump generations

  • Affected individuals are more often born to related (consanguineous) parents

  • Males and females are equally affected

1.2.14.2 Autosomal Dominant

An autosomal dominant disease is a dominant disease linked to an autosome. The main characteristics observed for these kinds of diseases are:

  • All generations have affected individuals

  • An affected individual has at least one affected parent

  • Males and females are equally affected

1.2.15 X‐linked Mode of Inheritance

The X‐linked mode of inheritance refers to the inheritance of genes that are present on X‑, but not on Y‐chromosomes. Some examples of X‐linked traits/conditions include color blindness, hemophilia and muscular dystrophy.

1.2.15.1 X‐linked Recessive

An X‐linked recessive disease is a recessive disease linked to the X chromosome. The main characteristics observed for these kinds of diseases are:

  • The disease may jump generations

  • Males are more frequently affected

1.2.15.2 X‐linked Dominant

An X‐linked dominant disease is a dominant disease linked to the X chromosome. The main characteristics observed for these kinds of diseases are:

  • All generations have affected individuals

  • Affected males pass the disease on to all their daughters

  • Males who inherit the disease allele are always affected, as they have no second X chromosome

1.2.16 Punnett Square

A Punnett square is a visual representation of a cross. The genotypes of the parents are denoted along the top and the side of the grid. The possible genotypes of the offspring are obtained by combining the different alleles in the grid.

The Punnett square in Fig. 1.6 shows an example of a cross between a heterozygous father and a homozygous dominant mother.

Fig. 1.6

An example Punnett square. The Punnett square shows an example of a cross between a heterozygous father (R and r) and a homozygous dominant mother (R and R). Typically an uppercase letter is used to denote a dominant allele, and a lowercase letter a recessive allele. As offspring will receive one allele from the mother and one from the father (shown around the edge of the square) it is possible to construct a diagram showing all possible variants (shown in the center of the square). In this case, there will be a 1:1 ratio of homozygous and heterozygous offspring
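The same cross can be enumerated programmatically. The sketch below is one possible way to build a Punnett square in Python. The function name `punnett` is illustrative, and each parent is assumed to be written as a two-letter genotype for a single gene.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(mother: str, father: str) -> dict:
    """Return genotype -> probability for a single-gene cross.

    Each parent is a two-character genotype such as "Rr".
    """
    counts = Counter(
        "".join(sorted(m + f))  # sorted() puts the uppercase (dominant) letter first
        for m, f in product(mother, father)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Homozygous dominant mother (RR) crossed with a heterozygous father (Rr).
print(punnett("RR", "Rr"))  # RR and Rr, each with probability 1/2
```

Crossing two heterozygous parents with `punnett("Rr", "Rr")` gives the familiar 1/4 : 1/2 : 1/4 split between RR, Rr and rr.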

1.2.17 Calculating the Risk of Inheriting a Genetic Disease

In order to assess the risk of a newborn suffering a certain genetic disease, first we need to know the mode of inheritance of the disease. Once we know that, we can assess the carrier risk of the parents. To do that, we will start with the affected individual(s) in the family, and we will calculate the risk of the intermediate family members until we reach the parents. We can use a Punnett square to calculate the probabilities of carrying the different alleles, using the mode of inheritance to rule out the options that are not compatible with the disease under study. To calculate the risk, you need to know a bit of probability!

1.2.17.1 Risk Calculation of Autosomal Dominant Inheritance

In diseases following autosomal dominant inheritance, the disease manifests in individuals carrying only one allele with the disease‐causing mutation. These patients are usually heterozygous for the mutation. When one of the parents is affected by the disease, each of their offspring has a 50% risk of inheriting the mutation and the disease (Fig. 1.7). An exception to this rule arises when the disease displays reduced penetrance: this means that individuals carrying the disease‐causing mutation do not necessarily develop the disease symptoms. However, they may still pass on the mutation to their offspring, who may in turn become affected.

Fig. 1.7

A pedigree showing autosomal dominant inheritance. In this example the mother carries both a dominant and a recessive allele and is therefore affected, whereas the father carries two recessive alleles and is unaffected. The child will receive one recessive allele from the father, but could receive either a recessive or dominant allele from the mother. The child therefore has a 50% risk of being affected
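As a worked illustration of how reduced penetrance changes the 50% figure, the risk for a child of one affected heterozygous parent can be written as a product. The penetrance value of 0.8 used here is an assumed number chosen purely for illustration; with full penetrance (1.0) the result is the familiar 1/2.

$$ P(\text{child affected}) = P(\text{child inherits the mutation}) \times \text{penetrance} = \frac{1}{2} \times 0.8 = 0.4 $$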

1.2.17.2 Risk Calculation of Autosomal Recessive Inheritance

In recessive disorders the symptoms are only seen in a person if both alleles of the gene carry mutations; or, in other words, if the person carries no normal, functional allele of the gene. This means that heterozygous individuals are healthy, but if both parents are heterozygous carriers of the disease there is a risk that their offspring receive mutated alleles from both and thus become homozygous and affected. When assessing the risk that a child or fetus will be affected by an autosomal recessive genetic disease we therefore have to start by calculating the risk that each of the parents is a healthy carrier. In order to do so we look at the pedigree (Fig. 1.8): do we know of any ancestors who were affected with the disease? If not, we need to know the general carrier frequency in the population, but if any ancestors were affected the risk calculation will start with them. If the parent of a person is affected the carrier risk of all offspring is 100%, as the homozygous parent will in all cases pass on a mutated allele of the gene. This is, of course, provided that the other parent carries two normal alleles.

Fig. 1.8

A pedigree showing autosomal recessive inheritance. Working out carrier status for recessive disorders can be difficult. In this example only the son in generation II is affected. We therefore know that his unaffected parents in generation I must both be carriers in order for the son to receive two recessive disease alleles. As the daughter and her partner are unaffected we know that each must carry a non‐disease allele, but the status of the other allele remains unknown. In the case of the daughter we can calculate her exact risk of carrying a disease allele. As she could inherit either type of allele from either parent there are four different outcomes: affected, carrier from mother, carrier from father, not a carrier. We can discount the first option, and of the remaining three outcomes two involve her carrying a risk allele, so she has a 2/3 chance of being a carrier. For her partner, as we don’t know his pedigree, we would assign what is known as the population risk. Therefore if the daughter and her partner wish to have a child its risk of being affected could be calculated as (1/2 · 2/3) · (1/2 · population risk)

If a sibling is affected, for example the son in generation II of Fig. 1.8 above, the carrier risk of each unaffected sibling is calculated as follows. If one sibling is affected and the parents are healthy, both parents are most likely heterozygous carriers, meaning that either may pass on the mutation. The combinations of alleles that can be found in their offspring are: heterozygous with a disease allele from the mother, heterozygous with a disease allele from the father, homozygous with two normal alleles, or homozygous with two mutated alleles. However, if the sibling in question is healthy we can rule out the fourth combination, homozygous with two mutated alleles, as this would cause the individual to be affected (which he/she is not). This leaves three possible combinations of alleles for this person, and in two of these the individual carries one mutated allele, i.e. is a heterozygous carrier. In other words, if a sibling is affected the carrier risk of the healthy siblings is 2/3.

Once you have assessed the carrier risk of a person, you calculate the carrier risk of that person’s offspring by dividing the risk by two, reflecting the fact that each time a heterozygous person passes on one of their alleles to a child there is a 50% chance that it is the mutated allele that is passed on. As a rule of thumb, when we know the disease status of an individual we assess the risk that a healthy individual is a heterozygous carrier; when we don’t know whether the child or fetus is affected, we calculate the risk that this individual is homozygous and affected. Once you have assessed the carrier risk of the parents of the child or fetus, you calculate the disease risk by multiplying the risk that the mother passes on a disease‐causing allele by the risk that the father does. If both parents are heterozygous this risk is 1/2 · 1/2 = 1/4.
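The reasoning above can be reproduced with exact fractions in Python. The sketch first derives the 2/3 carrier risk by enumerating the offspring of two carriers, then applies the formula from the caption of Fig. 1.8. The population carrier frequency of 1/25 is an assumed, illustrative value, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def carrier_risk_of_unaffected_sibling() -> Fraction:
    """Enumerate offspring of two carriers (Aa x Aa) and condition on being unaffected."""
    offspring = [m + f for m, f in product("Aa", "Aa")]  # AA, Aa, aA, aa
    unaffected = [g for g in offspring if g != "aa"]     # rule out the affected genotype
    carriers = [g for g in unaffected if "a" in g]
    return Fraction(len(carriers), len(unaffected))


def risk_child_affected(carrier_risk_mother: Fraction, carrier_risk_father: Fraction) -> Fraction:
    """Each carrier parent passes on the disease allele with probability 1/2."""
    half = Fraction(1, 2)
    return (carrier_risk_mother * half) * (carrier_risk_father * half)


sibling_risk = carrier_risk_of_unaffected_sibling()
population_risk = Fraction(1, 25)  # assumed carrier frequency, for illustration only

print(sibling_risk)                                        # 2/3
print(risk_child_affected(sibling_risk, population_risk))  # 1/150
print(risk_child_affected(Fraction(1), Fraction(1)))       # 1/4 when both parents are known carriers
```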

1.2.17.3 Risk Calculation for X‐linked Dominant Inheritance

In this rare mode of inheritance the disease gene is located on the X chromosome, and both men and women carrying the disease‐causing mutation will develop the disease. The risk of the offspring will depend on the sex of the affected parent as well as the sex of the child. An affected woman will have a 50% risk of passing on the mutation and the disease to both daughters and sons (Fig. 1.9).

Fig. 1.9

A pedigree showing X‐linked dominant inheritance with the mother carrying the disease‐causing mutation. In this example the mother carries an X‐linked dominant disease allele, and a non‐disease allele, and is therefore affected. The father carries a single non‐disease allele, so we can discount him from any risk analysis relating to the child. Any children, regardless of sex, will therefore be at a 50% risk of inheriting the X‐linked disease allele and being affected

However, an affected man will pass on the X chromosome with the mutation to all of his daughters but none of his sons, meaning that disease risk in daughters is 100% and in sons 0% (Fig. 1.10).

Fig. 1.10

A pedigree showing X‐linked dominant inheritance with the father carrying the disease‐causing mutation. In this example the father carries an X‐linked dominant disease allele and is therefore affected. The mother carries two non‐disease alleles. As the father will always pass on his Y‐chromosome to any sons, and they will receive their X‐chromosome from their mother, male children will have no risk of being affected. However, any daughters will receive the father’s X‐linked dominant disease allele, along with a non‐disease allele from their mother, so all female children will be affected

1.2.17.4 Risk Calculation of X‐linked Recessive Inheritance

In diseases caused by X‐linked recessive inheritance we often see affected sons of unaffected mothers. This is because a woman carrying a disease‐causing mutation in the heterozygous state is healthy (since she has one normal allele), whereas her son, if he inherits the mutation from her, will be hemizygous for the mutation and affected (as he only has one X chromosome and this carries the mutation). The risk that the son of a heterozygous woman becomes affected is thus 50% (Fig. 1.11).

Fig. 1.11

A pedigree showing X‐linked recessive inheritance with the sibling carrying the disease‐causing mutation. If one sibling is affected, the unaffected mother must be a carrier, thus 50% of sisters will be carriers too and 50% of brothers affected

The daughter of a woman heterozygous for an X‐linked recessive disorder will not develop the disorder (unless we have one of those rare cases where her father has the same disorder), but her carrier risk can be calculated by dividing her mother’s carrier risk by 2. If a man is affected with an X‐linked recessive disorder he will pass on the mutation to all of his daughters, but not to any of his sons, as they receive a Y chromosome from him (Fig. 1.12).

Fig. 1.12

A pedigree showing X‐linked recessive inheritance with the father carrying the disease‐causing mutation. In this example the father carries an X‐linked recessive disease allele, and is therefore affected. The mother carries two non‐disease alleles and is therefore unaffected. As the daughter receives an X chromosome from each parent she will be a carrier, whereas the son will receive a Y‐chromosome from his father and a non‐disease X‐chromosome from his mother and will be unaffected
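The two X‐linked recessive situations described above can be enumerated with a short Python sketch. The notation is an assumption made for this example: `A` is the non‐disease allele, `a` is the disease allele, and `Y` stands for the father's Y chromosome. The function name is illustrative.

```python
from fractions import Fraction


def x_linked_cross(mother: str, father_x: str):
    """Enumerate offspring genotypes for an X-linked gene.

    mother:   two-character string of her X alleles, e.g. "Aa"
    father_x: the single allele on the father's X chromosome
    Returns (daughters, sons), each mapping genotype -> probability.
    """
    daughters, sons = {}, {}
    share = Fraction(1, len(mother))
    for maternal in mother:
        # Daughters receive the father's X plus one maternal X.
        d = "".join(sorted(maternal + father_x))
        daughters[d] = daughters.get(d, 0) + share
        # Sons receive the father's Y plus one maternal X.
        s = maternal + "Y"
        sons[s] = sons.get(s, 0) + share
    return daughters, sons


# Carrier mother, unaffected father: half of sons affected (aY), half of daughters carriers (Aa).
print(x_linked_cross("Aa", "A"))

# Non-carrier mother, affected father: all daughters carriers, all sons unaffected.
print(x_linked_cross("AA", "a"))
```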

1.2.18 Diagnosis of Monogenic Disorders

To diagnose a monogenic disorder, the first step is to draw the family pedigree. This will indicate the most likely mode of inheritance and the risk of being affected. Then, the most feasible (easiest, cheapest, most reliable) way to investigate if a patient carries a certain mutation is to analyze the sequence of the gene involved in the suspected disease (genotyping).

1.2.19 Genotyping

Genotyping is the process of identifying differences in the genotype between one organism and another, or compared to a reference gene sequence. It provides information about the alleles the parental generation has passed on to its offspring.

Typical methods of genotyping include polymerase chain reaction (PCR) followed by gel electrophoresis, or DNA sequencing. Genotyping is particularly important when studying genes and gene variants that are associated with a disease.

1.2.20 DNA Sequencing

Sequencing is a technique used for “reading” the precise order of nucleotides in a DNA fragment. Small DNA fragments, whole genes, or even genomes can be sequenced.

The first DNA sequencing technologies were based on the chain‐termination technique developed by Fred Sanger and colleagues in 1977. This method is very similar to PCR, but it involves only one primer, which anneals close to the region of interest at the 3′ end of the DNA template (Fig. 1.13). During the sequencing reaction, a mixture of normal nucleotides (deoxynucleotides, e.g. dATP, dTTP, dCTP, and dGTP) and “stop”‐nucleotides (dideoxynucleotides, e.g. ddATP, ddTTP, ddCTP, and ddGTP) is added to the DNA template. The dideoxynucleotides lack the 3′ hydroxyl group found on normal deoxynucleotides, which means no further nucleotides can be added to the chain once one has been incorporated. Each type of stop‐nucleotide is labeled with a specific color, allowing them to be identified.

Fig. 1.13

Schematic overview of Sanger (chain‐termination) method for DNA sequencing. DNA fragments of all possible lengths are produced during the elongation cycles, each terminated with a colored stop nucleotide. When the fragments are separated according to their size on a gel, the order of colors detected will indicate the precise DNA sequence

In every cycle the target DNA is replicated by a DNA polymerase until a stop‐nucleotide is added, which stops further elongation (chain‐termination). After 35 cycles, a large number of fragments of all possible lengths are produced. These fragments are run in a specialized acrylamide gel where their length and the color of the “end‐bases” are detected. Because the fragments are separated based on their size in the gel, the labeled nucleotides are detected one by one and thus the precise DNA sequence in the fragment can be reconstructed.
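The read‐out principle can be illustrated with a toy model in Python. This is a deliberate simplification: it assumes exactly one fragment of every possible length is produced, treats each fragment's last base as its colored stop‐nucleotide, and uses a hypothetical sequence. It does not model the chemistry or the detector.

```python
import random


def sanger_read(new_strand: str, seed: int = 0) -> str:
    """Reconstruct a sequence from idealized chain-terminated fragments.

    Every prefix of the newly synthesized strand stands for one fragment
    that ended with a labeled stop-nucleotide.
    """
    fragments = [new_strand[:n] for n in range(1, len(new_strand) + 1)]
    random.Random(seed).shuffle(fragments)  # fragments are mixed together in the tube
    by_size = sorted(fragments, key=len)    # the gel separates them by length
    return "".join(fragment[-1] for fragment in by_size)  # read each terminal label in order


print(sanger_read("ATGCCGTTAGC"))  # ATGCCGTTAGC
```

Sorting by length is what restores the order: the shortest fragment reports the first base, the next shortest the second, and so on.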

The purpose of sequencing is, for example, to predict the protein sequence of a gene, to compare species on a sequence level (genes or genome), or to search for a mutation.

Importantly, while DNA sequencing of this type is still routinely performed it has in many cases been supplanted by newer technologies which are collectively known as Next Generation Sequencing. While differing in exact methodology these techniques broadly rely on sequencing many shorter DNA fragments simultaneously (sometimes in the range of multiple billions of fragments), and then using complex algorithms to computationally stitch all the fragments back together. These techniques have revolutionized life sciences by allowing multiple entire genomes to be sequenced in days, rather than months or even years.

1.2.21 Polymerase Chain Reaction

PCR is a method used to prepare billions of copies of specific DNA sequences, or in other words, to amplify a DNA sample. Further DNA analysis (for example, DNA fingerprinting or genotyping) often requires more copies of a specific DNA sequence than are found in a typical sample.

The PCR is highly specific, meaning that it will only produce copies of the desired sequence from the template (sample) DNA. This specificity is ensured by the primers, which are designed to be complementary to the template strand and anneal to specific regions on each side of the DNA region of interest (target region).

All cycles are performed without intervention in a PCR machine, also called a thermocycler, which can be programmed to change the temperature automatically after each step. By the end of one cycle, the number of copies of the target region will ideally have doubled (Fig. 1.14). Therefore, after 30 cycles, roughly one billion copies of the target sequence can be present in the tube for every starting template molecule, assuming each cycle is fully efficient.
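The arithmetic behind this estimate is simple exponential growth. The sketch below assumes an idealized reaction in which every cycle multiplies the number of copies by (1 + efficiency); the function name and the `efficiency` parameter are illustrative assumptions, not part of the simulation.

```python
def ideal_copies(cycles, starting_templates=1, efficiency=1.0):
    """Copies after `cycles` rounds, assuming each cycle multiplies by (1 + efficiency)."""
    return starting_templates * (1 + efficiency) ** cycles


for n in (10, 20, 30):
    print(n, f"{ideal_copies(n):.3g}")

# A reaction that is only 90% efficient per cycle yields noticeably fewer copies.
print(f"{ideal_copies(30, efficiency=0.9):.3g}")
```

With perfect doubling, 30 cycles give 2^30 = 1,073,741,824 copies per starting molecule. Real reactions fall short of this because reagents are depleted in later cycles.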

Fig. 1.14
figure 14

PCR experiment: PCR consists of three steps: 1. Denaturation, 2. Annealing, and 3. Extension. The steps are repeated many times (often up to 30 cycles), producing billions of DNA copies of specific regions

For a more thorough review of PCR, please see the publication “Labster Virtual Lab Experiments: Basic Genetics” (Stauffer S et al. (2018) Labster Virtual Lab Experiments: Basic Genetics, 1st edn. Springer, Heidelberg, ISBN 978‐3‐662‐57999‐2).

1.2.22 PCR Experiment

To prepare billions of DNA copies, many repeated cycles of DNA synthesis are performed in one PCR tube. Each cycle includes three distinct steps defined by the temperature (Fig. 1.14):

  1. Denaturation step (95 °C): At this high temperature, the hydrogen bonds holding the two DNA strands together are broken and the strands separate. The single‐stranded DNA template is now available for copying.

  2. Annealing step (5–10 °C below the lowest primer melting temperature [Tm], generally 50–65 °C): At the annealing temperature, short DNA pieces called primers bind to complementary sites on the template DNA. The primers define the target sequence, which is the specific region of DNA that will be copied. The melting temperature of each primer is calculated from its base composition, and the annealing temperature is then set relative to the lower of the two values.

  3. Extension step (72 °C): At 72 °C, an enzyme called DNA polymerase copies the DNA. It recognizes the 3′ end of a primer bound to a template strand and extends the new strand from there in the 5′ → 3′ direction, reading the template in the 3′ → 5′ direction.
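As a rough illustration of how an annealing temperature can be derived from primer composition, the sketch below uses a common rule of thumb (often called the Wallace rule), which assigns 2 °C to each A or T and 4 °C to each G or C. The text does not state which formula the simulation uses, so this choice, the 5 °C offset, and the primer sequences are all assumptions; the rule is only a coarse approximation for short primers.

```python
def wallace_tm(primer):
    """Approximate melting temperature in °C: 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def annealing_temperature(forward, reverse, offset=5):
    """Annealing temperature set `offset` °C below the lower of the two primer Tm values."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


# Hypothetical 20-nucleotide primers, not taken from the CFTR gene.
forward_primer = "ATGCGTACCTGAAGTCCATG"
reverse_primer = "TTGACCGGTAACGTCAGGTA"
print(wallace_tm(forward_primer), wallace_tm(reverse_primer))
print(annealing_temperature(forward_primer, reverse_primer))
```

In practice, primer design tools use more detailed thermodynamic models, and the annealing temperature is often fine-tuned experimentally.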

1.2.23 Primers

Primers are short fragments of DNA or RNA used to start DNA synthesis by a DNA polymerase. They are typically 18–25 nucleotides in length and will bind (anneal) to a complementary region of single‐stranded DNA, called the template strand. They mark the point where DNA synthesis will begin. When a primer is bound, the polymerase can also bind to the DNA at the 3′ end of the primer and then extends the primer in the 5′ → 3′ direction, reading the template strand in the 3′ → 5′ direction.

By using specific primers PCR can be used to localize mutations within genes. For example, we can use primers which will amplify regions of the CFTR gene in order to detect which CFTR mutation is present.

1.2.24 PCR Mutation Analysis

The result of a PCR experiment is billions of copies of the DNA region flanked by the primers. The size of the amplified fragments is determined by the primers (Fig. 1.15). When certain CFTR mutations are present the products formed will be of different sizes, allowing researchers to determine which mutations are present. To visualize these fragments other techniques, such as gel electrophoresis, need to be utilized.

Fig. 1.15
figure 15

Examples of different PCR product lengths. Following a PCR experiment, PCR product A will be 200 bp long, B will be 900 bp, C will be 700 bp, and D will be 400 bp. The length of the PCR product is defined by the positions at which the forward and reverse primers bind to the target DNA sequence
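The relationship between primer positions and product length can be sketched as a small "in silico PCR". The example assumes exact primer matches on a single template and uses short, made-up sequences (real primers are typically 18–25 nucleotides); it shows how a small deletion between the primers shortens the product. All names and sequences are hypothetical.

```python
def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(sequence))


def product_length(template, forward_primer, reverse_primer):
    """Length of the region spanned by both primers, or None if either fails to bind."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer is complementary to the strand as written, so its
    # reverse complement is the sequence that appears in `template`.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(reverse_site) - start


# Hypothetical template: flanking bases, primer sites, and a 12-base middle region.
normal = "TT" + "GACCATGC" + "AAGTCTTTGGTG" + "TT" + "CCGAATTC" + "AA"
mutant = normal.replace("CTT", "", 1)  # hypothetical 3-base deletion between the primers

forward_primer = "GACCATGC"
reverse_primer = "GAATTCGG"
print(product_length(normal, forward_primer, reverse_primer))
print(product_length(mutant, forward_primer, reverse_primer))
```

The mutant product is three bases shorter than the normal one, which is the kind of size difference that a sufficiently high-resolution gel or capillary run can reveal.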

1.2.25 Gel Electrophoresis

Gel electrophoresis is a widely used method to separate charged macromolecules (DNA, RNA, or proteins) of different sizes and to estimate their molecular size. It is based on the principle that once an electric field is applied, negatively charged macromolecules are attracted to the positive pole and separate according to their length in a gel matrix, such as agarose or polyacrylamide. Shorter macromolecules pass more easily through the pores of the matrix and so travel further than longer macromolecules. This technique is often used to separate DNA or RNA molecules, for example, in DNA profiling or to study RNA integrity.
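Size estimation on a gel is usually done by comparing a band with a ladder of fragments of known size. The sketch below assumes the common approximation that migration distance is roughly linear in the logarithm of fragment size within the range of the ladder, and interpolates between neighboring ladder bands. The ladder values and the function name are hypothetical.

```python
import math


def estimate_size(distance_mm, ladder):
    """Estimate fragment size (bp) by interpolating log10(size) against migration distance."""
    points = sorted(ladder, key=lambda point: point[0])  # (distance_mm, size_bp)
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            fraction = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + fraction * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("distance lies outside the ladder range")


# Hypothetical ladder: larger fragments migrate a shorter distance.
hypothetical_ladder = [(10.0, 1000), (20.0, 500), (30.0, 250), (40.0, 125)]
print(round(estimate_size(25.0, hypothetical_ladder)))
```

A band that migrates halfway between the 500 bp and 250 bp ladder bands is estimated at the geometric mean of the two sizes rather than the arithmetic mean, reflecting the logarithmic relationship.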

Gel electrophoresis is also commonly used to separate PCR‐amplified DNA fragments or to isolate and extract DNA fragments of a specific size.

1.2.25.1 Capillary Electrophoresis

Capillary electrophoresis (CE) is a type of electrophoresis that is performed in submillimeter diameter capillaries and in micro‑ and nanofluidic channels.

1.2.26 Protein Truncation Test

A protein truncation test is a powerful in vitro method to evaluate DNA mutations that result in protein truncation. Such mutations are often the most severe in CF as they result in a complete lack of functional protein.

So‐called nonsense mutations result in a premature stop codon. This means that translation terminates earlier than it should, producing a shortened protein that can be detected using a protein truncation test. With a protein truncation test, we do not need to use an animal as a model system to synthesize the protein; instead, we have an in vitro system where the resulting protein can be synthesized without the need for living cells.

1.2.27 Protein Truncation Experiment

The protein truncation test is composed of four major steps (Fig. 1.16):

  • Nucleic acid isolation; either genomic DNA, total RNA, or poly‐A RNA.

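  • Amplification of a specific region of the gene of interest using PCR. In this step, a start codon (ATG) is added to the 5′ end of the amplified DNA to allow for translation; the forward primer typically also carries an RNA polymerase promoter so that the product can be transcribed in vitro.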

  • In vitro transcription and translation of the product. If a nonsense mutation is present the protein produced will be truncated compared to the protein produced from the “normal” gene.

  • Proteins are detected using gel electrophoresis. A truncated protein will be smaller and will therefore run further into the gel.

Fig. 1.16
figure 16

A schematic diagram showing the different steps of the protein truncation test. The region of interest is amplified from the isolated nucleic acids with an added ATG start codon and then transcribed and translated in vitro. The resulting protein products are then separated by a form of gel electrophoresis known as SDS‐PAGE (sodium dodecyl sulfate polyacrylamide gel electrophoresis). If a nonsense mutation is present, translation will terminate early, resulting in a truncated protein. This will be detected on the gel, as the truncated protein will run further
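The effect of a nonsense mutation on protein length can be mimicked computationally. The sketch below translates a coding sequence from its first ATG using the standard genetic code and stops at the first stop codon; the two short sequences are hypothetical and differ by a single base that turns a glutamine codon (CAG) into a stop codon (TAG).

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna):
    """Translate from the first ATG until a stop codon or the end of the sequence."""
    start = coding_dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation terminates here
            break
        protein.append(amino_acid)
    return "".join(protein)


normal = "ATGGCTCAGAAAGGTTGGTAA"  # hypothetical coding sequence
mutant = "ATGGCTTAGAAAGGTTGGTAA"  # same sequence with CAG changed to TAG

print(translate(normal), len(translate(normal)))
print(translate(mutant), len(translate(mutant)))
```

The mutant sequence yields a much shorter product, which corresponds to the faster-migrating band seen for a truncated protein on the gel.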

1.2.28 Protein Truncation Analysis

Protein truncation tests are often used to analyze disease‐related mutations, for example, in CF, where certain mutations result in a nonfunctional protein. From a protein truncation test, we can conclude whether a particular part of the gene carries a mutation that truncates the protein. However, the test does not reveal exactly where the mutation is located or what the underlying DNA change is: it could be, for example, a substitution creating a stop codon or a deletion causing a frameshift followed by a stop codon.

1.3 Let’s Get Started

Phew, that was a lot of content. But now you should know all about CF and the wide variety of tools that scientists use to detect the disease and its associated mutations, and then predict the risk for future generations. Will you be able to use all this knowledge to inform a young couple about the risk of their unborn child having CF?

Techniques Used in the Lab

  • PCR

  • Gel electrophoresis

  • Capillary electrophoresis

Learning Objectives

At the end of this simulation, you will be able to …

  • Describe the basic concepts of inheritance

  • Build and interpret a pedigree based on family data

  • Understand genetic risk assessment and counseling

  • Gain insight into the vital work a genetics laboratory performs

ACCESS THE VIRTUAL LAB SIMULATION HERE www.labster.com/springer BY USING THE UNIQUE CODE AT THE END OF THE PRINTED BOOK. IF YOU USE THE E‐BOOK YOU CAN PURCHASE ACCESS TO THE SIMULATIONS THROUGH THE SAME LINK.