1 Introduction: The notion of causality in molecular disease goes back to Pauling

Genetic diseases, which are caused by changes in DNA sequence, affect millions of people worldwide. Of these diseases, ten thousand are estimated to be monogenic, meaning that the DNA sequence of only a single gene is modified. These monogenic diseases include Huntington’s disease, Tay-Sachs disease, cystic fibrosis, and sickle cell anemia (World Health Organization 2016). With these diseases, a change in the genotype leads to a manifest change in the organism. The cause and effect can readily be seen. However, we want to delve deeper into how a specific cause can lead to a particular effect. With many genetic diseases, the answer to this question is not yet known. Not only is it human nature to be curious about underlying mechanisms of disease, but comprehending these mechanisms will also provide ideas for prevention and treatment. In this review, we discuss the causal links between molecular disease and developing phenotypes on the level of the whole organism.

To illustrate the connections between causality and mechanisms, consider sickle cell anemia, a monogenic disease that is detected in approximately 300,000 newborns every year (Piel 2016). This disease causes patients to experience painful swelling in the hands and feet, acute pain in the chest, and fatigue. Reports detailing these symptoms in African children have been found in the medical literature starting in the 1800s (Ebrahim et al. 2010). The first patient with these symptoms was observed in the United States in 1910 (Herrick 1910). James B. Herrick, a physician from Chicago, and his intern, Ernest Irons, were examining blood samples from a man from Grenada (Savitt and Goldberg 1989). The blood samples revealed that the blood cells of the patient were elongated and had a shape similar to that of a sickle, preventing them from carrying oxygen to the rest of the body. Deeper understanding of the disease occurred several decades later. In 1949 Ernest Beet and James V. Neel, a military doctor and geneticist respectively, each published articles describing the genetic nature of the disease (Beet 1949; Neel 1949). During these years, many other laboratory findings contributed to understanding the asymptomatic form of the disease and found the disease is Mendelian (Strasser 2002). Indeed, most individuals with this disease were found to be of African descent (Strasser 2012).

It was also in 1949 that Linus Pauling was able to show that sickle cell disease is molecular in nature (Pauling et al. 1949). Pauling had studied hemoglobin in previous years, interested in how oxygen binds to the iron in the heme group of a hemoglobin molecule (Eaton 2002). He was fascinated by sickle cell disease and was determined to find the cause. After discerning that normal and sickled hemoglobin have different electrophoretic mobilities, he thought that there may be a change in the types of charged amino acids in the sickled hemoglobin, but proof of this could not be found at the time (Pauling et al. 1949). Instead, Pauling and several other scientists believed that it must be the conformation of the folded protein that resulted in different function (Strasser 2002, 2006). Pauling was a proponent of the hypothesis that specificity between two molecules was the result of complementarity of the shapes of the molecule rather than the chemical makeup (Strasser 2006). In 1956, Vernon Ingram finally proved that sickled hemoglobin had a different amino acid sequence than normal hemoglobin. He used a new technique to separate and analyze small peptides of digested proteins. One negatively charged glutamate residue in each of the two beta chains of hemoglobin is switched to an uncharged valine residue in sickled hemoglobin (Ingram 1956, 1957). In addition to predicting a difference in the amino acid content of the two hemoglobin molecules, Pauling also asserted in his publications that aggregation of sickled hemoglobin is what causes the disease. Although he did not know the exact structure of hemoglobin at the time, he surmised that the shape of the sickled hemoglobin was different such that it could not properly bind oxygen (Pauling 1952). This was a brilliant insight.

Figure 1 summarizes the path from cause to effect that was revealed over 50 years of research. Hemoglobin is a tetramer, made up of two alpha-chains and two beta-chains. We now know that sickle cell anemia is caused by a mutation in the β-globin gene, known as Hb S. This mutation results in a variant form of the β-chain of hemoglobin that polymerizes in red blood cells after deoxygenation (Ashley-Koch et al. 2000; Stuart and Nagel 2004). The resulting rope-like fibers of hemoglobin cause the red blood cells to become distorted into a crescent or sickled shape. This results in anemia, because the misshapen blood cells are unable to carry oxygen, and causes obstruction of the vasculature and poor circulation (Ashley-Koch et al. 2000; Stuart and Nagel 2004). Lack of oxygen delivery to the rest of the body causes fatigue, and rupturing of blood cells due to the irregular shape of hemoglobin leads to swelling in the limbs. Individuals with this disease have a high mortality rate, usually dying from infections of the upper respiratory tract or from gastroenteritis (Manci et al. 2003). Other causes of death include stroke and pulmonary embolism (Manci et al. 2003).

Fig. 1

Etiology of sickle cell anemia. Hemoglobin is a tetramer, made up of two alpha-chains and two beta-chains. A point mutation in the hemoglobin gene causes a change in the amino acid sequence of hemoglobin. Blue beta-chains are found in normal hemoglobin. Green beta-chains represent mutated chains found in sickled hemoglobin. Mutated hemoglobin molecules aggregate and form long, rope-like structures within cells. This causes cells to deform into a sickle-like shape. Sickled red blood cells cannot easily flow through blood vessels, leading to blood flow obstruction and symptoms such as fatigue, chest pain, and swelling in the hands and feet. (Color figure online)

Scientific work from the early twentieth century was essential for diagnosing and treating sickle cell anemia. It paved the way for prenatal screenings for the disease in the 1970s by analyzing samples from the umbilical vein. By the 1980s, bone marrow transplants were being performed and were successful in treating sickle cell disease by replacing blood progenitors that could lead to sickled hemoglobin (National Institute of Health 2010). Scientists also found that increased production of fetal hemoglobin in the blood could reduce pain and the frequent need for blood transfusions. Fetal hemoglobin contains two gamma-globin chains instead of two beta-globin chains, the chains known to carry the mutation. The only existing cure for the disease is based on transplantation of hematopoietic progenitor cells from a healthy donor (Hsieh et al. 2009). These advances in treatment could only be created by knowing the underlying mechanisms of sickle cell disease.

2 Most diseases are different; they are caused by changes in network states

Research on sickle cell disease has laid the groundwork for solving the mystery of other genetic diseases; however, understanding certain classes of monogenic diseases is not always so simple. Sickle cell disease involves a mutation in a single protein that interacts only with oxygen or itself; it lacks interactions with any other proteins. Moreover, hemoglobin is only found in blood cells, which are repeatedly renewed in the bone marrow. Challenges arise when mutations occur in genes that encode proteins that have multiple interactions in larger networks of proteins, such as in signaling networks. Furthermore, these mutated proteins are typically expressed in several tissues and organs of the body, instead of just one cell type. In this section, we discuss a class of monogenic diseases that lead to changes in such protein networks and the ramifications of this in terms of studying and treating disease.

Protein signaling networks underlie how cells communicate with each other. Signaling is responsible for differentiation of cell fates, apoptosis, and cell proliferation, which are all important processes for sustaining life. Signaling also plays a central role in development and in cancer (Lim et al. 2015b). Signals can be of different types, including chemical, mechanical, or electrical. We will focus on biochemical signals between cells. In order for cells to communicate, there must be an input and an output. Inputs usually arise from changes in the chemical composition of the surrounding environment. The output can be a change in gene expression, morphology, metabolic state, etc. Input and output are perceived by interactions of proteins (Fig. 2a). In the case of biochemical signals, one cell releases a molecule known as a ligand, that can bind to a receptor on the surface of a neighboring cell. This causes a sort of domino effect within the cells. Ligand binding is similar to pushing the first domino, and the following dominos are the series of proteins that can interact one after the other inside the cell. The final domino is what ultimately informs the cell to divide or grow or die. It is usually a protein that controls gene expression or morphology (Lim et al. 2015b). Of course, this analogy has some shortcomings. The cascade of protein interactions is not usually linear. Most signaling pathways contain many feed-forward or feedback loops and cross-talk between proteins of other signaling pathways (Cirit et al. 2010; Kolch 2000; Mendoza et al. 2011; Shin et al. 2009). These interactions add yet another layer of complexity to understanding diseases in signaling networks. For simplicity, we will not focus on these other mechanisms of pathway regulation.

Fig. 2

a Simplified representation of a signaling cascade. This panel shows a generic sequence of proteins and their activating interactions within a cell. b A specific signaling cascade, known as the Ras/ERK pathway. Individual proteins and their interactions are shown. In both panels, arrows indicate activation. (Color figure online)

One of the major principles of chemical signal transmission is protein activation and inhibition. In other words, proteins can act like switches that turn on and off. Proteins are usually switched on by post-translational modifications that cause a conformational change. A common modification is protein phosphorylation, the addition of a phosphate group to an amino acid residue on the protein. Proteins that can phosphorylate other proteins are known as kinases. Proteins can be turned off by being dephosphorylated by proteins called phosphatases. Most of the time, a protein that is turned on by a kinase can then act as a kinase and turn on a following protein. In this way, a network of proteins is merely a network of on/off switches. This also means that the series of switches can be reset and turned on and off again.
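The idea of a resettable switch can be made concrete with a minimal kinetic sketch. The model below is our own illustration, not taken from the cited studies: it assumes a single substrate that is phosphorylated by a kinase and dephosphorylated by a constantly present phosphatase, both with simple mass-action kinetics. The function name `simulate_switch` and all rate constants are hypothetical values chosen only to make the behavior visible.

```python
def simulate_switch(kinase_schedule, k_phos=5.0, k_dephos=5.0, dt=0.01):
    """Euler integration of one phosphorylation/dephosphorylation cycle.

    kinase_schedule: sequence of kinase activities (0.0 = absent, 1.0 = on),
    one value per time step. Returns the phosphorylated ("on") fraction of
    the substrate after each step. Rate constants are illustrative
    assumptions, not measured values.
    """
    active = 0.0
    trace = []
    for kinase_activity in kinase_schedule:
        # The kinase acts on the unphosphorylated pool.
        phosphorylation = k_phos * kinase_activity * (1.0 - active)
        # The phosphatase acts on the phosphorylated pool at all times.
        dephosphorylation = k_dephos * active
        active += (phosphorylation - dephosphorylation) * dt
        trace.append(active)
    return trace


# Kinase on, then off, then on again (1000 steps each).
schedule = [1.0] * 1000 + [0.0] * 1000 + [1.0] * 1000
trace = simulate_switch(schedule)

print(f"after first ON phase:  {trace[999]:.3f}")
print(f"after OFF phase:       {trace[1999]:.3f}")
print(f"after second ON phase: {trace[2999]:.3f}")
```

Under these assumed parameters, the phosphorylated fraction rises while the kinase is present, returns to essentially zero once the kinase is removed, and rises again on the next round of kinase activity, which is the reset behavior described above.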

Figure 2b illustrates a concrete example of these signaling networks. The representative example shown here is the Ras/ERK pathway, which is known as a mitogen-activated protein kinase signaling network. A growth factor or other activating ligand binds to receptor tyrosine kinases (RTKs) on the surface of the cell. Binding of ligand to receptor leads to biochemical modification in the tails of the receptor. Like the domino effect described above, this leads to a series of protein interactions: Grb2 recruits a protein called Son of Sevenless (SOS), which switches on Ras. When Ras is turned on, it begins a three-tiered cascade of kinases: Raf switches on MEK, which switches on ERK by phosphorylation. Raf and MEK are highly specific for their substrates, unlike ERK, which has many substrates in the cytoplasm and in the nucleus of the cell. After being phosphorylated, ERK can enter the nucleus and interact with targets like transcription factors in order to regulate the expression of particular genes (Lemmon and Schlessinger 2010; Lim et al. 2015b; McCubrey et al. 2007; Mendoza et al. 2011).

Networks of proteins with many interactions are necessary to ensure that the cell has a correct output for a given input. Sometimes, information provided to a cell needs to be amplified to get the correct output, which is more efficiently done with networks. Furthermore, a response needs to be robust such that small perturbations to the system will still result in a signal that is interpreted properly. All of these requirements are met through interactions of different proteins. Specifically, interactions between kinases, phosphatases, and substrates are important in turning the pathway on and off. When any of these proteins are mutated, regulation falls apart. Pathways can become stuck in the on or off position, leading to the formation of cancer or developmental defects.
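As a rough illustration of a pathway that becomes stuck on or off, the Ras/ERK cascade can be sketched as a linear chain of Boolean switches. This is a deliberate simplification that ignores feedback, cross-talk, and graded activity; the function `propagate` and its `disabled` and `always_on` parameters are hypothetical names introduced here for illustration only.

```python
# Order of activation in the simplified Ras/ERK cascade.
CASCADE = ["RTK", "Grb2", "SOS", "Ras", "Raf", "MEK", "ERK"]


def propagate(ligand_present, disabled=(), always_on=()):
    """Return the on/off state of each component in the linear cascade.

    disabled:  components that can never switch on (loss of function).
    always_on: components that are on regardless of upstream input
               (gain of function).
    """
    states = {}
    signal = ligand_present  # the receptor is on only if ligand is bound
    for component in CASCADE:
        if component in disabled:
            signal = False
        elif component in always_on:
            signal = True
        states[component] = signal
    return states


print(propagate(True)["ERK"])                      # normal pathway, with ligand
print(propagate(False)["ERK"])                     # normal pathway, no ligand
print(propagate(True, disabled={"MEK"})["ERK"])    # pathway stuck off
print(propagate(False, always_on={"Ras"})["ERK"])  # pathway stuck on
```

In this toy picture, disabling any single component silences everything downstream of it, while a component that is always on activates its downstream targets even when no ligand is present and the upstream components remain off.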

One set of diseases that results from mutations in the Ras/ERK signaling network is known as the RASopathies. The RASopathies are developmental diseases caused by germline mutations in the Ras/ERK pathway, such as Noonan syndrome and Legius syndrome (Jindal et al. 2015; Rauen 2013; Tidyman and Rauen 2009). Each of these syndromes involves mutations in different core components of the Ras/ERK pathway, but individuals with these syndromes display many of the same developmental defects, such as cardiac defects, craniofacial dysmorphology, and skin and hair abnormalities (Jindal et al. 2015). For instance, Costello syndrome is a RASopathy that results from a gain-of-function mutation in Ras. This single mutation leads to musculoskeletal abnormalities, neurocognitive delay, and formation of papilloma tumors on the skin (Aoki et al. 2005). Unlike in sickle cell anemia, the causal chain of events that leads from mutation in a gene (cause) to craniofacial deformities and skin abnormalities (effect) is still unknown. For instance, it is uncertain which cells of the organism are affected in such a disease. Cells that ordinarily receive signal may receive a larger than normal input. On the other hand, it is possible that signals are being relayed to cells that do not usually receive them. It is unclear how atypical signaling affects individual cells, and how this effect translates into altered function on the scale of the whole organism. Without this information, it is difficult to develop appropriate treatments.

Efforts have been made to develop small-molecule drugs that modulate or inhibit the pathway in some way. For instance, simvastatin is a 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase inhibitor that interferes with Ras activity. Although researchers had hoped this drug would improve the cognitive function of children with the RASopathy known as neurofibromatosis type 1 (NF1), clinical trials did not show a significant change (Rauen 2013). Other treatments have been somewhat more successful. MEK inhibitors typically used in the treatment of cancer have partially ameliorated developmental defects when administered early enough in development by delaying the formation of neurofibromas in NF1 patients (Jindal et al. 2015; Jousma et al. 2015; Rauen 2013). Nonetheless, these treatments merely improve some of the symptoms of the RASopathies. By understanding the mechanisms that drive genotype to phenotype in these diseases, better treatments may become available. It remains to be seen whether we will be able to understand diseases like the RASopathies on the same level that we understand sickle cell anemia.

3 Methods of studying perturbations in a signaling network

While it is unrealistic to study the RASopathies and other germline mutations within human embryos, model organisms provide insight into how mutations may affect signaling networks. This is possible because most signaling pathways are conserved across species. The Ras/ERK pathway and pathways similar to it are conserved, meaning homologues of proteins within these pathways exist in most model organisms (Kolch 2000; Widmann et al. 1999). Drosophila melanogaster has many advantages for studying signaling pathways. Fruit flies have short life cycles, which support high-throughput experiments, and large numbers of embryos can be collected at a given time, providing data for quantification. Signaling events can easily be tracked across space and time in the embryo. It is also relatively easy to genetically manipulate flies and to create stable mutant lines (Jindal et al. 2015; Widmann et al. 1999). Lastly, Drosophila can be used to connect mutant genotypes with mutant phenotypes.

To illustrate the usefulness of model organisms in understanding the links between mutations in signaling enzymes and emerging phenotypes, consider sevenmaker, a mutation found in the Drosophila homologue of mammalian ERK. ERK is the terminal kinase in the Ras/ERK pathway. It is unique in that it can only be activated by a specific molecule (MEK), but is capable of binding to hundreds of substrates (Lemmon and Schlessinger 2010). Brunner et al. discovered the sevenmaker mutation in 1994 when trying to determine if activation of ERK is both necessary and sufficient to activate signaling pathways. This was done by looking at mutant phenotypes in Drosophila eye development, which involves Sevenless. Sevenless is a receptor (RTK) upstream of ERK and is involved in development of the eye of the fruit fly. Signaling through the Sevenless pathway is responsible for specification and differentiation of photoreceptor cells. In the absence of signaling, the fly eye lacks the seventh photoreceptor cell (R7) (Biggs et al. 1994). If the pathway is over-active, multiple R7 cells develop (Fig. 3a).

Fig. 3

Signaling events in Drosophila. a Taken from Biggs et al., The EMBO Journal, 1994. The left panels show wild-type Drosophila eyes. The right panels show Drosophila with a constitutively active signaling pathway. Bottom panels show eye cross sections, with arrows pointing to photoreceptor cells. Reproduced with kind permission of John Wiley and Sons. b The ERK common docking motif (cyan) binds to kinases, phosphatases, and substrates. (Color figure online)

This sevenmaker mutation, found to be ERKD319N, replaces a negatively charged aspartic acid residue with a polar, uncharged asparagine residue. The mutation occurs in the common docking (CD) motif of ERK, on one of the docking sites known as the D-site (Fig. 3b). This docking site is responsible for binding to MEK, phosphatases, and downstream substrates of ERK (Brunner et al. 1994; Futran et al. 2015; Tanoue et al. 2000). This implies that mutant phenotypes may rely on a very specific balance of activators, deactivators, and substrates that bind to this region. In the presence of a mutation, an activator may bind in preference to a deactivator. The mutation thereby changes the balance of the major enzymatic reactions that ultimately determine phenotype.

Brunner et al. also found that this mutation is involved in the development of multiple tissues at different times, not just in the Drosophila eye. More specifically, it affects the Torso pathway in Drosophila development. Torso is an RTK that signals through ERK and is responsible for correct formation of the head, tail, and abdominal structures of the embryo. It is found throughout the embryo, but is only activated at the anterior and posterior poles, where ligand is preferentially expressed. Developing embryos with the ERKD319N mutation had head and tail structures that developed normally, but lacked all abdominal segments. This phenotype is similar to those that result from mutations in the Torso receptor that make it constitutively active. This suggests that ERKD319N also results in constitutive activation of the pathway, even in the absence of ligand. A separate study of the sevenmaker mutation, also published in 1994, proposed a different mechanism. Bott et al. studied ERKD319N in mammalian cells and found that in the absence of ligand, this mutant ERK did not activate downstream substrates (Bott et al. 1994). Therefore, it is not likely that this mutation causes ERK to be inherently active, as was previously proposed. Instead, Bott et al. proposed a mechanism in which ERKD319N is less sensitive to being inactivated by phosphatases. In this case, the mutation would be considered a loss-of-function mutation with regard to binding of phosphatases, MEK, and other substrates.
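The difference between the two proposed mechanisms can be illustrated with a small steady-state calculation. The sketch below is our own toy model and is not drawn from either study: it assumes that ERK is activated at a rate proportional to MEK activity and inactivated by phosphatases at a constant rate. The variant definitions and all parameter values (`k_deact=0.1`, `basal=1.0`, and the MEK activities) are hypothetical.

```python
def steady_state_active_erk(mek_activity, k_act=1.0, k_deact=1.0, basal=0.0):
    """Steady-state active fraction a of ERK in a one-step activation cycle.

    Solves 0 = (k_act * mek_activity + basal) * (1 - a) - k_deact * a.
    basal models activation that does not depend on MEK at all.
    """
    on_rate = k_act * mek_activity + basal
    return on_rate / (on_rate + k_deact)


# Hypothetical parameter sets for the two proposed mechanisms.
VARIANTS = {
    "wild type": {},
    "phosphatase-resistant": {"k_deact": 0.1},  # slower inactivation
    "constitutively active": {"basal": 1.0},    # active without input
}


def compare(mek_activity):
    """Active ERK fraction for every variant at a given MEK activity."""
    return {
        name: steady_state_active_erk(mek_activity, **params)
        for name, params in VARIANTS.items()
    }


no_ligand = compare(0.0)    # MEK inactive, as in the absence of ligand
weak_ligand = compare(0.2)  # weak upstream input

for name in VARIANTS:
    print(f"{name:>22}: no ligand {no_ligand[name]:.2f}, "
          f"weak ligand {weak_ligand[name]:.2f}")
```

In this sketch, a phosphatase-resistant ERK remains inactive when there is no upstream input, consistent with the observation in mammalian cells, yet it responds more strongly than wild-type ERK to a weak input. A constitutively active ERK, by contrast, would be partly active even without ligand.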

These studies from the 1990s have shown that Drosophila embryos can help connect mutant genotypes to structural phenotypes. Furthermore, in vitro studies from the same decade have shown how particular molecules like ERK may become activated or inactivated to elicit a certain response. On their own, these results are informative. However, our current gap in knowledge lies in combining these two types of studies. Can we use fly embryos to look at morphogenetic effects, while also looking on a molecular level at when and where ERK is being activated? Today’s technology allows us to fix and stain embryos at different times during development to see exactly when and where particular mutant proteins are active (Lim et al. 2015a). Furthermore, other techniques can be used to determine when particular genes are being expressed. Using many embryos to perform these experiments can provide quantitative data about signaling at high spatial and temporal resolution. Very recently, optogenetics, or the use of light to activate proteins, has also been utilized to determine how activation of the Ras/ERK pathway within embryos affects development and morphogenesis (Johnson et al. 2017). Compared with fixing and staining, optogenetics offers greater control over protein activation and can push activation to limits not seen with mutant proteins. While many studies have shown how mutant proteins behave in a test tube, it is non-trivial to predict how these same mutations will behave in the context of a living organism. Only recently have studies shown that activating mutations may not necessarily be “active” everywhere within the embryo (Goyal et al. 2017; Jindal et al. 2017). The consequences that this may have on development are unknown and not easily predictable.

The example of the sevenmaker mutation highlights the complexity of signaling networks and the importance of protein interactions. In order to understand the phenotypes that result from mutant genotypes, interactions between kinases, phosphatases, substrates, and scaffold proteins must be comprehended. These interactions can be better understood by observing them in vivo using quantitative assays, as in the early Drosophila embryo. These experiments can describe when and where interactions become important and can better inform mathematical models that try to reconstruct the pathway. This data is generally informative, and helps provide a better understanding of signaling networks as a whole. It also holds significance in learning how to treat diseases with these mutations. In 2005, a mutation in ERK known as ERKE320K was discovered in a head and neck squamous cell carcinoma cell line (Arvind et al. 2005). This mutation occurs in the amino acid residue directly adjacent to the sevenmaker mutation. Learning more about the sevenmaker mutation may help inform treatments for this particular carcinoma and for other diseases that may involve mutations in the common docking domain of ERK.

4 Concluding remarks

Advances are continually being made in the realm of biotechnology and engineering, providing new and improved tools to treat genetic diseases. In fact, CRISPR/Cas9, a recently developed genome-editing tool, has been proposed as a way to cure sickle-cell anemia (Tasan et al. 2016). The key is in editing the genome such that fetal hemoglobin is produced in addition to adult hemoglobin. Fetal hemoglobin replaces the two beta-globin chains (the chains containing the mutation) with two gamma-globin chains. Therefore, fetal hemoglobin is unaffected and blood cell sickling is reduced (Tasan et al. 2016). The enhancer region of the BCL11A gene is known for repressing the production of fetal hemoglobin after birth (Canver et al. 2015). Using CRISPR to permanently disrupt this enhancer region in the DNA promotes the production of fetal hemoglobin and replacement of the mutated beta-chains with healthy gamma-chains. The last century was essential in understanding the causality of sickle cell anemia in order to use these new tools to solve the problem. This provides hope that one day we will further understand complex signaling networks and the genes that control them. Perhaps similar candidate genes will be discovered that can be used in conjunction with gene-editing tools to regulate these systems when disturbed. For now, model organisms will continue to be useful instruments for decoding the signaling networks that play such a large part in development and disease.