Introduction

Stem cells are characterized by their ability to self-renew and differentiate along multiple distinct lineages. Due to these remarkable properties, there is much hope for stem cell-based therapies in regenerative medicine. However, the development of such therapies will require a thorough understanding of the molecular mechanisms by which stem cells balance self-renewal and differentiation. Since stem cells are often rare (as in the adult) or exist only transiently (as in development), recent years have seen a growing focus on using single-cell profiling technologies to understand stem cell dynamics. These studies have indicated that apparently functionally homogeneous stem cell populations can vary widely in their expression of important regulators of self-renewal and multipotency. In some cases, this variability is driven by dynamic fluctuations of master transcription factors, suggesting that stem cell heterogeneity has an important functional role [1, 2, 3••]. However, the relationship between molecular heterogeneity and stem cell function is still not well understood.

Recent years have seen remarkable advances in single-cell sequencing techniques, and it is now possible to profile large portions of the genome, or the entire transcriptome, in hundreds to thousands of individual cells in a single experiment [4••, 5, 6]. Advances in single-cell epigenetics and proteomics are not far behind [7,8,9,10]. These advances promise to transform our understanding of cellular identities, yet they also produce vast amounts of complex data, making it a significant challenge to distinguish meaningful biology from experimental noise. In the context of stem cell dynamics, numerous reports have indicated that functionally homogeneous stem cell populations, both from the adult and the embryo, are highly heterogeneous with respect to their patterns of gene and protein expression [11,12,13,14,15]. However, the extent to which this variability plays a functional role, and the extent to which it reflects inherent, but non-functional, expression noise, are not clear. Therefore, in order to understand stem cell function at the individual cell level, it has become increasingly necessary to use high-throughput profiling techniques to explore co-expression dynamics at the single-cell level, to identify rare (yet potentially functionally important) cells, and to determine how co-expression patterns change over time.

The data provided by these experiments are fundamentally different from those obtained from measurements on cellular aggregates. Bulk methods typically provide estimates of the mean expression of each variable (e.g., gene) profiled over all cells in the aggregated sample (perhaps along with an estimate of variance when multiple replicates are taken), but they are not generally well suited to exploring dependencies between variables because they can only examine expression patterns on average, not within individual cells. By contrast, because single-cell methods profile co-expression patterns within individual cells, they provide a sample from the joint distribution of all the variables being profiled and are therefore much better suited to exploring functional relationships between variables. Importantly, recent years have seen significant improvements in the efficiency of single-cell RNA-sequencing methods, which now allow profiling of many tens of thousands of individual cells, thereby improving estimates of joint expression distributions [6, 16•, 17•].

The experimental progress made in capturing multivariate single-cell data has also stimulated research into new analysis techniques that are specifically designed to handle high-dimensional single-cell data [18, 19]. These new analysis methods often make use of classical multivariate statistics, and such statistical approaches have provided insight into many stem cell systems, including the identification and characterization of mixtures of cellular states [20], comparison of different stem cell lines [21], rare cell identification [22], and cell lineage decision-making [23]. However, methods from information theory are also increasingly being used to better understand how cellular expression patterns determine cellular identities.

Information Theory

Information theory has its roots in Shannon’s work on communication, and his famous 1948 paper laid out the mathematical theory of information [24, 25]. Shannon realized that in order to quantify the information content of a message, it is necessary to consider the message’s context, or how probable it is. An intuitive understanding of this can be seen in the following example. Consider a search for this article using only the last name of one of the authors. Which one is it best to choose? The knowledge that “Smith” is a very common last name and “MacArthur” is less common means that searching for “MacArthur” is more likely to narrow the search and therefore likely to provide more information. The fact that “MacArthur” is a more complex word than “Smith” is irrelevant: it is the rarity of each name that dictates which to choose, not the name itself. In the context of gene expression, the fact that a cell has seven transcripts of a particular messenger RNA (mRNA) does not in itself carry any information: this observation requires context in order to understand how much information is gained from the measurement. Without the context of how likely an observation of seven transcripts is, the information gained from the measurement is unknown (colloquially this is known as Shannon’s zeroth law). So how do we calculate information gain? Shannon argued that any measure of information should satisfy three basic requirements: monotonicity, independence, and branching. Monotonicity ensures that the information gained from the answer to a question with many possible answers is greater than that gained from the answer to a question with only a few possible answers. For example, to identify a specific person, an answer to the question “where do they come from?” provides more information than an answer to the question “are they female?” Independence ensures that the total information gained from two independent questions is the sum of the information gained from each question separately. So, for example, the order in which the questions are asked should not matter. Lastly, branching ensures that when a series of questions is composed in a tree-like structure, the overall information gained by passing along a path through the tree is a weighted sum of the information gained from each branch point [25].

Shannon proved that the following function, which he called the entropy by analogy to the closely related thermodynamic entropy, uniquely satisfies these conditions. The Shannon entropy H is the expected amount of information gained from answering a question for which the probability of answer x is given by p(x),

$$ H(X) = -\sum_x p(x) \log p(x). $$

The entropy is a property of the probability distribution p(x), in the same way that the mean and variance are properties of p(x). Informally, the entropy is simply a measure of how “flat” or close to uniform p(x) is, and the “flatter” a distribution is, the greater the entropy and information gained. The units of entropy depend on the base of the logarithm: when the logarithm is taken to the base 2, as is common in information theory, entropy is measured in bits (one bit is the amount of information provided when observing one of two equally likely outcomes, e.g., the flip of a fair coin). Alternatively, entropy is measured in nats when using the natural logarithm (as is typically the case in statistical mechanics), and in hartleys when using base 10 (one hartley is the amount of information provided when observing one of 10 equally likely outcomes, e.g., a uniformly randomly chosen decimal digit). The equation for the entropy given above assumes that the random variable X is discrete. In practice, many measures of interest, such as molecular concentrations, are continuous and the continuous analogue to the entropy above is known as the differential entropy [26]. In the discrete case, the entropy has some useful properties (for example, H(X) ≥ 0) that are not inherited by the differential entropy. To account for these differences, several closely related variations such as the Kullback-Leibler divergence (also known as the relative entropy) and its generalizations are commonly used to assess similarity between continuous expression distributions [27, 28]. For example, the widely used t-SNE dimensionality reduction algorithm [29] (which has been used in several recent stem cell studies to explore heterogeneity in stem cell identities and cluster cell states [16, 17•, 22, 23]) uses the Kullback-Leibler divergence to assess the similarity between the observed co-expression distribution and that obtained by projecting the data to a lower-dimensional space.
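
As a concrete illustration, the short Python sketch below (our own, using only NumPy; the function names are ours) computes the entropy of a discrete distribution in bits or nats, together with the Kullback-Leibler divergence discussed above.

```python
import numpy as np

def entropy(p, base=2.0):
    """Shannon entropy of a discrete distribution p (in bits by default)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # terms with p(x) = 0 contribute nothing to the sum
    return -np.sum(p * np.log(p)) / np.log(base)

def kl_divergence(p, q, base=2.0):
    """Kullback-Leibler divergence D(p || q); assumes q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) / np.log(base)

print(entropy([0.5, 0.5]))                      # fair coin: 1.0 bit
print(entropy([0.25, 0.75]))                    # biased coin: ~0.811 bits
print(entropy([0.5, 0.5], base=np.e))           # same fair coin in nats: ~0.693
print(kl_divergence([0.25, 0.75], [0.5, 0.5]))  # ~0.189 bits
```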

Information Theory and Stem Cell Biology

The utility of the entropy in understanding cell identities may be illustrated by returning to our example of the measurement of seven mRNA transcripts in a cell. To give this reading context, we need to better understand the natural variability of mRNA expression in the cell population of interest and thereby determine how unusual this reading is. Consider the following two hypothetical scenarios for mRNA expression in a population of stem cells, as shown in Fig. 1a. (Scenario 1) All cells in the population have seven mRNA transcripts (i.e., 7 is the only answer to the question “how many transcripts are in the cell?” and occurs with probability 1). In this case, since all cells are the same with respect to their transcript counts, the observation of seven transcripts cannot be used to discriminate one cell from another, and therefore does not impart any information. Accordingly, the entropy is H =  − 1 log(1) = 0 bits. (Scenario 2) Two stem cell subtypes are present in the population (types A and B). Cells of type A occur with probability 0 < p < 1 and have seven transcripts, while cells of type B occur with probability (1 − p) and have zero transcripts. In this case, the observation of seven transcripts allows us to positively discriminate cells of type A from those of type B and so imparts useful information. Furthermore, the amount of information we gain is related to the relative rarity of types A and B. In particular, the entropy is given by H =  − p log(p) − (1 − p) log(1 − p). Thus, when p is small, the observation of seven transcripts in a cell is a rare event, but the observation of zero transcripts is a common event, and so the entropy is low. Conversely, when p is large, the observation of seven transcripts is a common event while the observation of zero transcripts is a rare event, and again the entropy is low. However, when cells of both types are common in the population (i.e., if p ~ 0.5), the entropy reaches its maximum. It is worth noting that the fact that cells of type B express no transcripts is not relevant to the calculation of the entropy; they could have expressed any number of transcripts not equal to seven. All that is important is that cells of type B can be distinguished from those of type A by their mRNA transcript count.
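
To make the two scenarios concrete, the following sketch (ours, not taken from the original study) evaluates the binary entropy H =  − p log(p) − (1 − p) log(1 − p), using base-2 logarithms, at the values discussed above and plotted in Fig. 1a.

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in bits) of a population in which type A cells occur with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # no uncertainty about the identity of a drawn cell
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

print(binary_entropy(1.0))   # Scenario 1: all cells have seven transcripts -> 0 bits
print(binary_entropy(0.5))   # Scenario 2, p = 0.5: maximum uncertainty -> 1 bit
print(binary_entropy(0.25))  # Scenario 2, p = 0.25: ~0.811 bits
print(binary_entropy(0.01))  # type A cells very rare -> ~0.081 bits
```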

Fig. 1

Entropy and mutual information. a Entropy of hypothetical binary cell types: Scenario 1: all cells have 7 mRNA transcripts and the entropy is zero (there is no uncertainty). Scenario 2: cells are either type A (7 transcripts), which occurs with probability p, or type B (zero transcripts), which occurs with probability 1 − p. When there is an equal probability of observing either cell type (p = 0.5), we are maximally uncertain about the identity of a randomly drawn cell and the entropy H = 1 bit, the same as for tossing a fair coin. When the probabilities are unequal, for example when p = 0.25, uncertainty is reduced and the entropy is less than 1 bit. The final panel gives the relationship between entropy and p, from which it can be seen that maximum entropy occurs when p = 0.5. b Entropy of distributions: distributions of transcript abundance are typically not binary, but rather exhibit a spread of possible outcomes. Examples of a unimodal and a bimodal distribution with the same mean and variance, but different entropies, are shown. In the unimodal case, summary measures such as the mean and variance make good sense. However, in the bimodal case the population mean is not characteristic of either of the two subpopulations (it is rare to find a cell with the mean level of expression) and the variance, as a measure of the spread about this mean, is also misleading. By contrast, the entropy, which measures the amount of uncertainty we have concerning the identity of a randomly drawn cell from the population, provides useful information about cell–cell variability. c Mutual information as a measure of association: association between two random variables can be assessed by Pearson’s correlation coefficient (PCC), which considers the strength of linear association; Spearman’s correlation coefficient (SCC), which is based on rankings; and mutual information (MI), which assesses how much information one variable provides about the other. All three measures assess linear associations well (left panel), and SCC is a good measure of non-linear, monotonic associations (middle panel), but neither PCC nor SCC is a good measure of association for non-linear, non-monotonic associations (right panel). However, the MI may be used to determine that the two variables are related.

In practice, we would not expect all cells to express a given mRNA at one of two fixed levels; rather, intrinsic noise in gene expression naturally gives rise to variations in expression levels over time within each individual cell, and across the cell population at any fixed time (see Fig. 1b). While it often cannot be calculated explicitly as above, the entropy can nevertheless be estimated from experimental data to better understand this natural variation (it should be noted that entropy estimation is subject to some technical issues, including the effects of data binning and estimation bias [30,31,32]). For example, it has been suggested that a high degree of cell–cell variability in gene expression patterns within a functionally pure population, as quantified by the entropy of the joint expression distribution, is characteristic of undifferentiated pluripotent cells [33, 34•, 35]. Similarly, by considering patterns of gene expression in light of known signaling networks, Teschendorff and colleagues have argued that both pluripotent cells and cancer cells are associated with a state of high network entropy, characterized by the promiscuous co-expression of important hub proteins [36,37,38]. Relatedly, it has been observed that the entropy of gene expression in developing tissues increases with time in a manner that is closely related to differentiation dynamics [39,40,41].
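
A minimal plug-in estimator, sketched below with simulated Poisson transcript counts (our own illustration, not a bias-corrected estimator of the kind discussed in [30,31,32]), shows how the entropy can be estimated from single-cell data and why the choice of binning matters.

```python
import numpy as np

def plugin_entropy(samples, bins=30):
    """Naive ('plug-in') entropy estimate in bits from a sample of expression values.

    The estimate depends on the binning and is biased for small samples; these are
    the technical issues noted in the text.
    """
    counts, _ = np.histogram(samples, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
# Hypothetical transcript counts: a unimodal and a bimodal population with the same mean.
unimodal = rng.poisson(lam=10, size=1000)
bimodal = np.concatenate([rng.poisson(2, 500), rng.poisson(18, 500)])

print(plugin_entropy(unimodal))  # lower entropy
print(plugin_entropy(bimodal))   # higher entropy: two distinct subpopulations (cf. Fig. 1b)
```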

While the entropy is good at assessing how likely it is that a particular expression value will occur, it is not well suited to assessing relationships between co-expression patterns. For this purpose, a related measure, the mutual information (MI), is widely used. Consider two discrete random variables, X and Y, which may be related in some unknown way. The entropy of the joint probability distribution p(x, y) is:

$$ H(X, Y) = -\sum_x \sum_y p(x, y) \log p(x, y). $$

Informally, this is a measure of the information content of the joint distribution, but it is not a direct measure of association between the two random variables. In order to assess whether one variable provides information about the other, the mutual information I(X; Y) may be used [26]. The mutual information compares the observed joint distribution with that which would be observed if the two random variables were independent. In particular,

$$ I(X; Y) = \sum_x \sum_y p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} = H(X) + H(Y) - H(X, Y), $$

where H(X) and H(Y) are the marginal entropies. If X and Y are independent then p(x, y) = p(x) p(y), so \( \log \frac{p(x, y)}{p(x)\, p(y)} = \log 1 = 0 \) for all x and y, and therefore I(X; Y) = 0. In this case, knowledge of one variable does not provide any information about the other variable. More generally, since I(X; Y) = I(Y; X) ≥ 0, the magnitude of the MI is a measure of the extent to which the observed joint distribution deviates from independence: larger values of MI indicate a stronger dependency between X and Y. The advantage of MI as a measure of association is that it does not specify in advance the nature of the relationship between X and Y, so it can capture non-linear, non-monotonic dependencies between variables in a general way that traditional correlation measures cannot (see Fig. 1c for some examples).
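
This point is easily verified numerically. The sketch below (our own; the same binning and bias caveats as for entropy estimation apply) estimates MI from a two-dimensional histogram and compares it with Pearson’s and Spearman’s coefficients for a non-linear, non-monotonic relationship of the kind shown in Fig. 1c.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def mutual_information(x, y, bins=20):
    """Histogram-based estimate of I(X; Y) in bits, using I = H(X) + H(Y) - H(X, Y)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def h(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    return h(px) + h(py) - h(pxy.ravel())

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 5000)
y = x**2 + rng.normal(0, 0.05, 5000)  # non-linear, non-monotonic dependence

print(pearsonr(x, y)[0])         # ~0: no linear association detected
print(spearmanr(x, y)[0])        # ~0: no monotonic association detected
print(mutual_information(x, y))  # clearly > 0: the dependence is detected
```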

Since the mutual information assesses the extent to which two random variables are independent of one another, it can be used to identify putative functional relationships between experimentally observed variables (e.g., genes or proteins) [42, 43]. For this reason, there has been much interest in using information-theoretic methods to infer genetic regulatory networks (GRNs) from gene expression data, in order to better understand cellular dynamics. Inferring a GRN first involves constructing a matrix of “similarities” between genes based upon pairwise comparisons of their expression profiles. Typically, similarity is assessed using either correlation-based or information-theoretic measures. Comparisons of different measures and their implementations can be found in a number of recent reviews [44, 45, 46••]. In order to infer a network of regulatory links from the similarity matrix, some form of filtering is required to exclude spurious links or links with little statistical support. The simplest form of filtering is thresholding. However, since thresholding does not take into account the local context of the interactions, it is a crude approach that does not necessarily retain important interactions or exclude unimportant ones. To address this shortfall, information theory can be used to refine thresholded networks and thereby infer regulatory interactions with greater accuracy. These methods are typically based upon more advanced information-theoretic notions such as: the data processing inequality (DPI), which allows indirect regulatory links to be removed from a network (the widely used ARACNE algorithm [47•, 48] makes use of the DPI); conditional mutual information, which assesses the dependency between two random variables conditioned on a third and can be used to uncover genes that share upstream co-regulation [49]; and the maximum relevance/minimum redundancy (MRMR) strategy, which ranks potential interactions for a given gene based on the difference between the MI between the gene and a putative target (maximum relevance) and the average MI of the putative target with all previously ranked targets (minimum redundancy). This last approach aims to rank the most informative interactions highly and is implemented in, for example, the MRNET and MRNETB algorithms [50]. Other GRN inference methods make use of closely related information-theoretic measures such as the transfer entropy (as implemented in the MIDER algorithm, for example [51]) and multivariate mutual information measures [52]. In the context of stem cell biology, information-based network reconstruction methods have been used with some success to identify novel regulators of pluripotency and lineage specifiers [53, 54], as well as to track changes in network structure during cellular differentiation [52, 55].
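
To give a flavour of how the DPI is used in practice, the sketch below (a bare-bones illustration of the pruning idea, not the published ARACNE implementation) removes the weakest edge of each fully connected triangle in a pairwise MI matrix.

```python
import numpy as np

def dpi_prune(mi, eps=0.0):
    """Prune putatively indirect edges from a pairwise MI matrix using the DPI.

    If the i-j interaction is mediated by k, the DPI implies
    MI(i, j) <= min(MI(i, k), MI(k, j)), so the weakest edge of each
    triangle is removed (eps is a tolerance on this comparison).
    """
    mi = np.array(mi, dtype=float)
    n = mi.shape[0]
    keep = mi > 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k in (i, j):
                    continue
                if mi[i, j] < min(mi[i, k], mi[k, j]) * (1 - eps):
                    keep[i, j] = keep[j, i] = False
    return np.where(keep, mi, 0.0)

# Toy example: gene 0 regulates gene 1, and gene 1 regulates gene 2; the apparent
# 0-2 interaction is indirect and is removed by the DPI step.
mi = np.array([[0.0, 0.9, 0.4],
               [0.9, 0.0, 0.8],
               [0.4, 0.8, 0.0]])
print(dpi_prune(mi))
```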

Conclusions

Here, we have summarized some of the ways that information theory can be used in combination with multivariate statistics to investigate stem cell identities. Although information-theoretic measures are not always intuitive and their practical application needs careful consideration, information theory provides a suite of tools that can help make the most of experimentally hard-earned data. As well as providing improved measures of variability and association, information theory also has a natural relationship with statistical mechanics [56, 57], and thereby provides a principled approach to the investigation of cellular dynamics. Statistical mechanics addresses the question of how observable “macroscopic” properties of a system arise from unobserved “microscopic” dynamics. For example, the pressure of a gas in a confined container (a macrostate) depends upon the average kinetic energy of the molecules in the gas and can therefore be predicted without detailed knowledge of the instantaneous position and velocity of all the individual gas molecules involved (a microstate). In the 1950s, Jaynes showed that statistical mechanics could be derived directly from information-theoretic principles [56, 57]. For example, he observed that the Boltzmann distribution, which is ubiquitous in statistical mechanics, arises naturally as the maximum entropy probability distribution subject to appropriate physical constraints. It would be interesting to see if similar approaches can be used to better understand cell–cell variability in stem cell systems: do observed patterns of variability in stem cell populations reflect natural biological constraints? If so, what are they? To what extent does cell–cell variability relate to stem cell function? Can a general theory of regulated cellular variability be derived using physical and information-theoretic principles? Some progress has been made towards these aims [33, 58,59,60], and this is an exciting area of current research, yet there is still much to be done. Although the relationships between cell–cell variability, entropy, and cell function have yet to be fully deciphered, ongoing research indicates that information-theoretic measures can provide insights into cellular identities that are not apparent from more traditional multivariate statistical methods. We anticipate that improvements in the accuracy and reductions in the cost of single-cell methods will stimulate increased interest in the development and use of these approaches in the near future.
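
For the interested reader, the maximum entropy argument mentioned above can be sketched in a few lines (a standard derivation, included here for completeness): maximizing the entropy subject to normalization and a fixed mean energy ⟨E⟩, via Lagrange multipliers α and β, yields the Boltzmann distribution,

$$ \mathcal{L} = -\sum_i p_i \log p_i - \alpha \Big(\sum_i p_i - 1\Big) - \beta \Big(\sum_i p_i E_i - \langle E \rangle\Big), \qquad \frac{\partial \mathcal{L}}{\partial p_i} = 0 \;\Rightarrow\; p_i = \frac{e^{-\beta E_i}}{\sum_j e^{-\beta E_j}}, $$

where β is fixed by the energy constraint. Whether analogous constraints can be identified for gene expression distributions in stem cell populations is, as noted above, an open question.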