Abstract
Purpose of Review
The purpose of this study is to outline how ideas from information theory may be used to analyze single-cell data and better understand stem cell behavior.
Recent Findings
Recent technological breakthroughs in single-cell profiling have made it possible to interrogate cell–cell variability in a multitude of contexts, including the role it plays in stem cell dynamics. Here we review how measures from information theory are being used to extract biological meaning from the complex, high-dimensional, and noisy datasets that arise from single-cell profiling experiments. We also discuss how concepts linking information theory and statistical mechanics are being used to provide insight into cellular identity, variability, and dynamics.
Summary
We provide a brief introduction to some basic notions from information theory and how they may be used to understand stem cell identities at the single-cell level. We also discuss how work in this area might develop in the near future.
Introduction
Stem cells are characterized by their ability to self-renew and differentiate along multiple distinct lineages. Due to these remarkable properties, there is much hope for stem cell-based therapies in regenerative medicine. However, the development of such therapies will require a thorough understanding of the molecular mechanisms by which stem cells balance self-renewal and differentiation. Since stem cells are often rare (as in the adult) or exist only transiently (as in development), recent years have seen a growing focus on using single-cell profiling technologies to understand stem cell dynamics. These studies have indicated that apparently functionally homogeneous stem cell populations can vary widely in their expression of important regulators of self-renewal and multipotency. In some cases, this variability is driven by dynamic fluctuations of important master transcription factors, suggesting that stem cell heterogeneity has an important functional role [1, 2, 3••]. However, the relationship between molecular heterogeneity and stem cell function is still not well understood.
Recent years have seen remarkable advances in single-cell sequencing techniques, and it is now possible to profile large portions of the genome, or the entire transcriptome, in hundreds to thousands of individual cells in a single experiment [4••, 5, 6]. Advances in single-cell epigenetics and proteomics are not far behind [7,8,9,10]. These advances promise to transform our understanding of cellular identities, yet they also produce vast amounts of complex data, making it a significant challenge to distinguish meaningful biology from experimental noise. In the context of stem cell dynamics, numerous reports have indicated that functionally homogeneous stem cell populations, both from the adult and the embryo, are highly heterogeneous with respect to their patterns of gene and protein expression [11,12,13,14,15]. However, the extent to which this variability plays a functional role, and the extent to which it represents variability due to inherent, but non-functional, expression noise, are not clear. Therefore, in order to understand stem cell function at the individual cell level, it has become increasingly necessary to use high-throughput profiling techniques to explore co-expression dynamics at the single-cell level, to identify rare (yet potentially functionally important) cells, and to determine how co-expression patterns change over time. The data provided by these experiments are fundamentally different from those obtained from measurements on cellular aggregates. While bulk methods typically provide estimates of the mean expression of each variable (e.g., gene) profiled over all cells in the aggregated sample (perhaps along with an estimate of the variance when sample means from multiple replicates are taken), they are not generally well suited to exploring dependencies between variables because they are only capable of examining expression patterns on average, not within individual cells.
By contrast, since single-cell methods profile co-expression patterns within individual cells, they provide a sample from the joint distribution of all the variables being profiled and so are much better suited to exploring functional relationships between variables. Importantly, recent years have seen significant improvements in the efficiency of single-cell RNA-sequencing methods, which now allow profiling of many tens of thousands of individual cells, thereby improving estimates of joint expression distributions [6, 16•, 17•]. The experimental progress made in capturing multivariate single-cell data has also stimulated research into new analysis techniques that are specifically designed to handle high-dimensional single-cell data [18, 19]. These new analysis methods often make use of classical multivariate statistics, and such statistical approaches have provided insight into many stem cell systems, including the identification and characterization of mixtures of cellular states [20], comparison of different stem cell lines [21], rare cell identification [22], and cell lineage decision-making [23]. However, methods from information theory are also increasingly being used to better understand how cellular expression patterns determine cellular identities.
Information Theory
Information theory has its roots in Shannon’s work on communication and his famous 1948 paper laid out the mathematical theory of information [24, 25]. Shannon realized that in order to quantify the information content of a message, it is necessary to consider the message’s context, or how probable it is. An intuitive understanding of this can be seen in the following example. Consider a search for this article using only the last name of one of the authors. Which one is it best to choose? The knowledge that “Smith” is a very common last name and “MacArthur” is less common means that searching for “MacArthur” is more likely to narrow the search and therefore likely to provide more information. The fact that “MacArthur” is a more complex word than “Smith” is irrelevant: it is the rarity of each name that dictates which to choose, not the name itself. In the context of gene expression, the fact that a cell has seven transcripts of a particular messenger RNA (mRNA) does not in itself carry any information: this observation requires context in order to understand how much information is gained from the measurement. Without the context of how likely a read of seven transcripts is, the information gained from the measurement is unknown (colloquially this is known as Shannon’s zeroth law). So how do we calculate information gain? Shannon argued that any measure of information should satisfy three basic requirements: monotonicity, independence, and branching. Monotonicity ensures that the information gained from a question with a wide variety of answers is greater than the information gained from the answer to a question with only a few possible answers. For example, to identify a specific person an answer to the question “where do they come from?” provides more information than an answer to the question “are they female?” Independence ensures that the total information gained from two independent questions is a sum of the information gained from the questions separately. 
So, for example, the order in which the questions are asked should not matter. Lastly, branching ensures that when a series of questions is composed in a tree-like structure, the overall information gained by passing along a path through the tree is a weighted sum of the information gained from each branch point [25].
Shannon proved that the following function, which he called the entropy by analogy to the closely related thermodynamic entropy, uniquely satisfies these conditions. The Shannon entropy H is the expected amount of information gained from answering a question for which the probability of answer x is given by p(x):

\( H(X)=-\sum_x p(x)\log p(x). \)
The entropy is a property of the probability distribution p(x), in the same way that the mean and variance are properties of p(x). Informally, the entropy is simply a measure of how “flat” or close to uniform p(x) is, and the “flatter” a distribution is, the greater the entropy and information gained. The units of entropy depend on the base of the logarithm: when the logarithm is taken to the base 2, as is common in information theory, entropy is measured in bits (one bit is the amount of information provided when observing one of two equally likely outcomes, e.g., the flip of a fair coin). Alternatively, entropy is measured in nats when using the natural logarithm (as is typically the case in statistical mechanics), and in hartleys when using base 10 (one hartley is the amount of information provided when observing one of 10 equally likely outcomes, e.g., a uniformly randomly chosen decimal digit). The equation for the entropy given above assumes that the random variable X is discrete. In practice, many measures of interest, such as molecular concentrations, are continuous and the continuous analogue to the entropy above is known as the differential entropy [26]. In the discrete case, the entropy has some useful properties (for example, H(X) ≥ 0) that are not inherited by the differential entropy. To account for these differences, several closely related variations such as the Kullback-Leibler divergence (also known as the relative entropy) and its generalizations are commonly used to assess similarity between continuous expression distributions [27, 28]. For example, the widely used t-SNE dimensionality reduction algorithm [29] (which has been used in several recent stem cell studies to explore heterogeneity in stem cell identities and cluster cell states [16, 17•, 22, 23]) uses the Kullback-Leibler divergence to assess the similarity between the observed co-expression distribution and that obtained by projecting the data to a lower-dimensional space.
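The entropy calculation and its units can be made concrete with a short Python sketch (the distributions below are illustrative values, not data from any study):

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy of a discrete distribution p, in bits (log base 2).

    Zero-probability outcomes contribute nothing (by convention 0 * log 0 = 0).
    """
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A fair coin flip carries exactly one bit of information...
print(entropy_bits([0.5, 0.5]))     # 1.0

# ...a heavily biased coin carries less (its outcome is more predictable)...
print(entropy_bits([0.9, 0.1]))     # ~0.469

# ...and the "flattest" (uniform) distribution is the most entropic:
# four equally likely outcomes give log2(4) = 2 bits.
print(entropy_bits([0.25] * 4))     # 2.0
```

Dividing by `np.log(2)` instead of using `np.log2` would give the same quantity in nats, matching the statistical-mechanics convention mentioned above.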
Information Theory and Stem Cell Biology
The utility of the entropy in understanding cell identities may be illustrated by returning to our example of the measurement of seven mRNA transcripts in a cell. To give this reading context, we need to better understand the natural variability of mRNA expression in the cell population of interest, to determine how unusual this reading is. Consider the following two hypothetical scenarios for mRNA expression in a population of stem cells, as shown in Fig. 1a: (Scenario 1) all cells in the population have seven mRNA transcripts (i.e., 7 is the only answer to the question “how many transcripts are in the cell?” and occurs with probability 1). In this case, since all cells are the same with respect to their transcript counts, the observation of seven transcripts cannot be used to discriminate one cell from another, and therefore does not impart any information. Accordingly, the entropy is H = −1 log(1) = 0 bits. (Scenario 2) Two stem cell subtypes are present in the population (types A and B). Cells of type A occur with probability 0 < p < 1 and have seven transcripts, while cells of type B occur with probability (1 − p) and have zero transcripts. In this case, the observation of seven transcripts allows us to positively discriminate cells of type A from those of type B and so imparts useful information. Furthermore, the amount of information we gain is related to the relative rarity of types A and B. In particular, the entropy is given by H = −p log(p) − (1 − p) log(1 − p). Thus, when p is small, the observation of seven transcripts in a cell is a rare event, but the observation of zero transcripts is a common event, and so the entropy is low. Conversely, when p is large, the observation of seven transcripts is a common event while the observation of zero transcripts is a rare event, and again the entropy is low. However, when cells of both types are common in the population (i.e., when p ≈ 0.5), the entropy reaches its maximum.
In this example, it is worth noting that the fact that cells of type B express no transcripts is not relevant to the calculation of the entropy; they could have expressed any number of transcripts not equal to seven. All that matters is that cells of type B can be distinguished from those of type A by their mRNA transcript count.
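The two-subtype example above can be checked numerically. The following sketch evaluates the binary entropy H = −p log₂(p) − (1 − p) log₂(1 − p) for a range of hypothetical mixing fractions p, confirming that the entropy vanishes at the extremes and peaks at one bit when both subtypes are equally common:

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in bits) of a two-subtype population with mixing fraction p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure population imparts no information
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# Entropy is low when one subtype dominates and maximal (1 bit) at p = 0.5.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"p = {p}: H = {binary_entropy(p):.3f} bits")
```

Note the symmetry: a population that is 10% type A carries exactly as much entropy as one that is 90% type A, since only the relative rarity of the subtypes matters.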
In practice, we would not expect all cells to express a given mRNA at one of two fixed levels; rather, intrinsic noise in gene expression naturally gives rise to variations in expression levels over time within each individual cell, and within the cell population at any fixed time (see Fig. 1b). While it cannot often be calculated explicitly as above, the entropy can nevertheless be estimated from experimental data to better understand this natural variation (it should be noted that entropy estimation is subject to some technical issues, including the effects of data binning and estimator bias [30,31,32]). For example, it has been suggested that a high degree of cell–cell variability in gene expression patterns within a functionally pure population, as quantified by the entropy of the joint expression distribution, is characteristic of undifferentiated pluripotent cells [33, 34•, 35]. Similarly, by considering patterns of gene expression in light of known signaling networks, Teschendorff and colleagues have argued that both pluripotent cells and cancer cells are associated with a state of high network entropy, characterized by the promiscuous co-expression of important hub proteins [36,37,38]. Relatedly, it has been observed that the entropy of gene expression in developing tissues increases with time in a manner that is closely related to differentiation dynamics [39,40,41].
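The estimation bias mentioned above is easy to demonstrate. In the sketch below (a toy distribution, not real expression data), the naive "plug-in" estimator, which substitutes observed frequencies for probabilities, systematically underestimates the true entropy at small sample sizes, with the bias shrinking as the number of sampled cells grows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "true" distribution of a transcript count over 8 expression bins
p = np.array([0.3, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.05])
true_H = -np.sum(p * np.log2(p))

def plugin_entropy(samples, k):
    """Naive plug-in entropy estimate (bits) from observed category counts."""
    counts = np.bincount(samples, minlength=k)
    freqs = counts[counts > 0] / counts.sum()
    return -np.sum(freqs * np.log2(freqs))

# Average the estimator over many simulated experiments of n cells each.
# The plug-in estimate is biased downward for small n; corrections such as
# Miller-Madow add back roughly (k - 1) / (2 n ln 2) bits to compensate.
for n in (20, 200, 2000):
    est = np.mean([plugin_entropy(rng.choice(8, size=n, p=p), 8)
                   for _ in range(500)])
    print(f"n = {n:5d} cells: plug-in estimate = {est:.3f} (true H = {true_H:.3f})")
```

Binning continuous expression values before estimation introduces a further, separate dependence on the bin width, which is one reason references [30,31,32] treat estimator choice with care.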
While the entropy is good at assessing how likely it is that a particular expression value will occur, it is not well suited to assessing relationships between co-expression patterns. To do so, a related measure, the mutual information (MI), is also widely used. Consider two discrete random variables, X and Y, which may be related in some unknown way. The entropy of the joint probability density p(x, y) is:

\( H(X,Y)=-\sum_{x,y} p(x,y)\log p(x,y). \)
Informally, this is a measure of the information content of the joint distribution, but it is not a direct measure of association between the two random variables. In order to assess whether one variable provides information about the other, the mutual information I(X; Y) may be used [26]. The mutual information compares the observed joint probability density with that which would be observed if the two random variables were independent. In particular,

\( I(X;Y)=\sum_{x,y} p(x,y)\log \frac{p(x,y)}{p(x)\,p(y)}=H(X)+H(Y)-H(X,Y), \)
where H(X) and H(Y) are the marginal entropies. If X and Y are independent, then p(x, y) = p(x) p(y), so \( \log \frac{p\left( x, y\right)}{p(x)\,p(y)}=\log 1=0 \) for all x and y, and therefore I(X; Y) = 0. In this case, knowledge of one variable does not provide any information about the other. More generally, since I(X; Y) = I(Y; X) ≥ 0, the magnitude of the MI is a measure of the extent to which the observed joint distribution deviates from independence: larger values of MI indicate a stronger dependency between X and Y. The advantage of MI as a measure of association is that it does not specify in advance the nature of the relationship between X and Y, so it can capture non-linear, non-monotonic dependencies between variables in a general way that traditional correlation measures cannot (see Fig. 1c for some examples).
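The definition above translates directly into code. This sketch computes I(X; Y) from a joint probability table for two binary variables (toy values chosen for illustration), recovering the two limiting cases discussed in the text:

```python
import numpy as np

def mutual_information_bits(joint):
    """Mutual information I(X;Y) in bits from a joint probability table p(x, y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = joint > 0                        # skip zero-probability cells
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px * py)[mask])))

# Independent variables: the joint factorizes as p(x)p(y), so I(X;Y) = 0.
indep = np.outer([0.5, 0.5], [0.5, 0.5])
print(mutual_information_bits(indep))   # 0.0

# Perfectly dependent variables: observing X determines Y exactly,
# so I(X;Y) = H(X) = 1 bit.
dep = np.array([[0.5, 0.0],
                [0.0, 0.5]])
print(mutual_information_bits(dep))     # 1.0
```

Because MI depends only on the joint distribution, the same function applies unchanged whether the dependency between X and Y is linear, non-linear, or non-monotonic.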
Since the mutual information assesses the extent to which two random variables are independent of one another, it can be used to identify putative functional relationships between experimentally observed variables (e.g., genes or proteins) [42, 43]. For this reason, there has been much interest in using information-theoretic methods to infer genetic regulatory networks (GRNs) from gene expression data, in order to better understand cellular dynamics. Inferring a GRN first involves constructing a matrix of “similarities” between genes based upon pairwise comparisons of their expression profiles. Typically, similarity is assessed using either correlation-based or information-theoretic measures. Comparisons of different measures and their implementations can be found in a number of recent reviews [44, 45, 46••]. In order to infer a network of regulatory links from the similarity matrix, some form of filtering is required to exclude spurious links or links with little statistical support. The simplest form of filtering is thresholding. However, since thresholding does not take into account the local context of the interactions, it is a crude approach that does not necessarily retain important interactions or exclude unimportant ones. To address this shortfall, information theory can be used to refine thresholded networks and thereby infer regulatory interactions with greater accuracy.
These methods are typically based upon more advanced information-theoretic notions, such as the data processing inequality (DPI), which allows indirect regulatory links to be removed from a network (the widely used ARACNE algorithm [47•, 48] makes use of the DPI); conditional mutual information, which assesses the dependency between two random variables conditioned on a third and can be used to uncover genes that have shared upstream co-regulation [49]; and the maximum relevance/minimum redundancy (MRMR) strategy, which ranks potential interactions for a given gene based on the difference between the MI of the gene with a putative target (maximum relevance) and the average MI of the putative target with all previously ranked targets (minimum redundancy). The aim of this last approach is to rank the most informative interactions highly, and it is implemented in the MRNET and MRNETB algorithms, for example [50]. Other GRN inference methods make use of closely related information-theoretic measures such as the transfer entropy (as implemented in the MIDER algorithm, for example [51]) and multivariate mutual information measures [52]. In the context of stem cell biology, information-based network reconstruction methods have been used with some success to identify novel regulators of pluripotency and lineage specifiers [53, 54], as well as to track changes in network structures during cellular differentiation [52, 55].
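The DPI-based pruning step can be sketched in a few lines. The following is a simplified, ARACNE-style illustration (not the published algorithm itself, which also includes MI estimation and statistical thresholding): for a chain X → Z → Y, the DPI implies I(X; Y) ≤ min(I(X; Z), I(Z; Y)), so within each gene triplet the weakest edge is treated as indirect and removed. The MI matrix below is a made-up toy example:

```python
import numpy as np
from itertools import combinations

def dpi_prune(mi, eps=0.0):
    """Simplified DPI pruning: within each triplet of genes, drop the weakest
    of the three edges, since an indirect interaction X -> Z -> Y must satisfy
    I(X;Y) <= min(I(X;Z), I(Z;Y)). Returns a boolean adjacency matrix."""
    keep = mi > 0
    n = mi.shape[0]
    for i, j, k in combinations(range(n), 3):
        edges = {(i, j): mi[i, j], (j, k): mi[j, k], (i, k): mi[i, k]}
        weakest = min(edges, key=edges.get)
        # remove only if strictly weakest (eps adds a tolerance, as in ARACNE)
        if all(edges[e] > edges[weakest] + eps for e in edges if e != weakest):
            keep[weakest] = keep[weakest[::-1]] = False
    return keep

# Toy MI matrix: genes 0 -> 1 -> 2 form a regulatory chain, so the 0-2
# similarity is nonzero but indirect.
mi = np.array([[0.0, 0.9, 0.3],
               [0.9, 0.0, 0.8],
               [0.3, 0.8, 0.0]])
print(dpi_prune(mi).astype(int))
# The weak indirect 0-2 edge is pruned; the direct 0-1 and 1-2 edges survive.
```

Simple thresholding at, say, MI > 0.25 would have kept the spurious 0–2 link, which is precisely the shortfall the DPI step addresses.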
Conclusions
Here, we have summarized some of the ways that information theory can be used in combination with multivariate statistics to investigate stem cell identities. Although information-theoretic measures are not always intuitive and their practical application needs careful consideration, information theory provides a suite of tools that can help make the most of experimentally hard-earned data. As well as providing improved measures of variability and association, information theory also has a natural relationship with statistical mechanics [56, 57], and thereby provides a natural approach to the investigation of cellular dynamics. Statistical mechanics addresses the question of how observable “macroscopic” properties of a system arise from unobserved “microscopic” dynamics. For example, the pressure of a gas in a confined container (a macrostate) depends upon the average kinetic energy of the molecules in the gas and can therefore be predicted without detailed knowledge of the instantaneous position and velocity of all the individual gas molecules involved (a microstate). In the 1950s, Jaynes showed that statistical mechanics could be derived directly from information-theoretic principles [56, 57]. For example, he observed that the Boltzmann distribution, which is ubiquitous in statistical mechanics, arises naturally as the maximum entropy probability distribution subject to appropriate physical constraints. It would be interesting to see if similar approaches can be used to better understand cell–cell variability in stem cell systems: do observed patterns of variability in stem cell populations reflect natural biological constraints? If so, what are they? To what extent does cell–cell variability relate to stem cell function? Can a general theory of regulated cellular variability be derived using physical and information-theoretic principles?
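Jaynes' maximum entropy observation can be verified numerically. For a Boltzmann distribution p ∝ exp(−βE), any other distribution q satisfying the same constraints (normalization and mean energy) obeys the exact identity H(q) = H(p) − D(q‖p), and so cannot exceed H(p). The sketch below checks this for a hypothetical four-level system with arbitrarily chosen energies and inverse temperature:

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0, 3.0])   # energy levels (hypothetical units)
beta = 1.3                            # inverse temperature (assumed value)

p = np.exp(-beta * E)
p /= p.sum()                          # Boltzmann distribution for these levels

def H(q):
    """Entropy in nats (natural log, as in statistical mechanics)."""
    q = q[q > 0]
    return -np.sum(q * np.log(q))

def KL(q, p):
    """Relative entropy (Kullback-Leibler divergence) D(q || p)."""
    return np.sum(q * np.log(q / p))

# Perturb p along a direction that preserves both normalization and the
# mean energy, so q satisfies exactly the same constraints as p.
v = np.array([1.0, -1.0, -1.0, 1.0])  # sums to 0 and has v @ E == 0
q = p + 0.05 * v

# Boltzmann maximizes entropy among all constraint-satisfying distributions:
# H(q) = H(p) - D(q||p) <= H(p), with equality only when q = p.
print(H(q) <= H(p))                       # True
print(np.isclose(H(q), H(p) - KL(q, p)))  # True
```

The identity holds because log p is itself a linear function of the constrained quantities (energy and normalization), which is exactly what makes the exponential family the maximum entropy solution.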
Some minor progress has been made towards these aims [33, 58,59,60], and this is an exciting area of current research, yet there is still much to be done. Although the relationships between cell–cell variability, entropy, and cell function have yet to be fully deciphered, ongoing research indicates that information-theoretic measures can provide insights into cellular identities that are not apparent from more traditional multivariate statistical methods. We anticipate that improvements in the accuracy, and reductions in the cost, of single-cell methods will drive increased interest in the development and use of these methods in the near future.
References
Papers of particular interest, published recently, have been highlighted as: • Of importance •• Of major importance
Semrau S, van Oudenaarden A. Studying lineage decision-making in vitro: emerging concepts and novel tools. Annu Rev Cell Dev Biol. 2015;31:317–45.
Moignard V, Göttgens B. Dissecting stem cell differentiation using single cell expression profiling. Curr Opin Cell Biol. 2016;43:78–86.
•• Kumar P, Tan Y, Cahan P. Understanding development and stem cells using single cell-based analyses of gene expression. Development. 2017;144:17–32. Review of single cell transcriptome analysis methods and their application in stem cell and developmental biology.
•• Grün D, Van Oudenaarden A. Design and analysis of single-cell sequencing experiments. Cell. 2015;163:799–810. A thorough review of the recent advances in single cell transcriptome sequencing; comparison of preparation and sequencing methods and analysis techniques.
Kolodziejczyk AA, Kim JK, Svensson V, Marioni JC, Teichmann SA. The technology and biology of single-cell RNA sequencing. Mol Cell. 2015;58:610–20.
Ziegenhain C, Vieth B, Parekh S, Reinius B, Guillaumet-Adkins A, Smets M, et al. Comparative analysis of single-cell RNA sequencing methods. Mol Cell. 2017;65:631–43.
Bendall SC, Simonds EF, Qiu P, et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science. 2011;332:687–96.
Cusanovich DA, Daza R, Adey A, Pliner HA, Christiansen L, Gunderson KL, et al. Epigenetics. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science. 2015;348:910–4.
Spitzer MH, Nolan GP. Mass cytometry: single cells, many features. Cell. 2016;165:780–91.
Budnik B, Levy E, Slavov N. Mass-spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. bioRxiv. 2017:102681.
Hatano S-Y, Tada M, Kimura H, Yamaguchi S, Kono T, Nakano T, et al. Pluripotential competence of cells associated with Nanog activity. Mech Dev. 2005;122:67–79.
Chambers I, Silva J, Colby D, Nichols J, Nijmeijer B, Robertson M, et al. Nanog safeguards pluripotency and mediates germline development. Nature. 2007;450:1230–4.
Hayashi K, Chuva de Sousa Lopes SM, Tang F, Surani MA. Dynamic equilibrium and heterogeneity of mouse pluripotent stem cells with distinct functional and epigenetic states. Cell Stem Cell. 2008;3:391–401.
Toyooka Y, Shimosato D, Murakami K, Takahashi K, Niwa H. Identification and characterization of subpopulations in undifferentiated ES cell culture. Development. 2008;135:909–18.
Canham MA, Sharov AA, Ko MSH, Brickman JM. Functional heterogeneity of embryonic stem cells revealed through translational amplification of an early endodermal transcript. PLoS Biol. 2010;8:e1000379.
• Macosko EZ, Basu A, Satija R, et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015;161:1202–14. Together with the study by Klein et al., this study on droplet-based single cell RNA sequencing was a breakthrough method, making single cell transcriptome data more accessible.
• Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. 2015;161:1187–201. Together with the study by Macosko et al., this study on droplet-based single cell RNA sequencing was a breakthrough method, making single cell transcriptome data more accessible to researchers.
Kumar RM, Cahan P, Shalek AK, et al. Deconstructing transcriptional heterogeneity in pluripotent stem cells. Nature. 2014;516:56–61.
Bacher R, Kendziorski C. Design and computational analysis of single-cell RNA-sequencing experiments. Genome Biol. 2016;17:63.
Kolodziejczyk AA, Kim JK, Tsang JCH, et al. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell. 2015;17:471–85.
Yan L, Yang M, Guo H, et al. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat Struct Mol Biol. 2013;20:1131–9.
Grün D, Lyubimova A, Kester L, Wiebrands K, Basak O, Sasaki N, et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature. 2015;525:251–5.
Olsson A, Venkatasubramanian M, Chaudhri VK, Aronow BJ, Salomonis N, Singh H, et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature. 2016;537:698–702.
Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27:379–423.
Bialek W. Biophysics: searching for principles. Princeton: Princeton University Press; 2012.
Cover TM, Thomas JA. Elements of information theory. 2nd ed. Hoboken: Wiley; 2005. doi:10.1002/047174882X.
Kullback S, Leibler RA. On information and sufficiency. Ann Math Stat. 1951;22:79–86.
Tonge PD, Olariu V, Coca D, Kadirkamanathan V, Burrell KE, Billings SA, et al. Prepatterning in the stem cell compartment. PLoS One. 2010;5:e10901.
Van Der Maaten LJP, Hinton GE. Visualizing high-dimensional data using t-sne. J Mach Learn Res. 2008;9:2579–605.
Olsen C, Meyer PE, Bontempi G. On the impact of entropy estimation on transcriptional regulatory network inference based on mutual information. EURASIP J Bioinform Syst Biol. 2009;2009:308959.
Hausser J, Strimmer K. Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks. J Mach Learn Res. 2009;10:1469–84.
Mc Mahon SS, Lenive O, Filippi S, Stumpf MPH. Information processing by simple molecular motifs and susceptibility to noise. J R Soc Interface. 2015;12:597.
MacArthur BD, Lemischka IR. Statistical mechanics of pluripotency. Cell. 2013;154:484–9.
• Grün D, Muraro MJ, Boisset JC, et al. De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell. 2016;19:266–77. This study incorporates the use of transcriptome entropy in identification of stem cells in mixed populations and is biomedically relevant in many systems where stem cells are not currently well characterised.
Guo M, Bao EL, Wagner M, Whitsett JA, Xu Y. SLICE: determining cell differentiation and lineage based on single cell entropy. Nucleic Acids Res. 2017;45(7):e54.
Teschendorff AE, Severini S. Increased entropy of signal transduction in the cancer metastasis phenotype. BMC Syst Biol. 2010;4:104.
West J, Bianconi G, Severini S, Teschendorff AE. Differential network entropy reveals cancer system hallmarks. Sci Rep. 2012;2:802.
Banerji CRS, Miranda-Saavedra D, Severini S, Widschwendter M, Enver T, Zhou JX, et al. Cellular network entropy as the energy potential in Waddington’s differentiation landscape. Sci Rep. 2013;3:3039.
Anavy L, Levin M, Khair S, Nakanishi N, Fernandez-Valverde SL, Degnan BM, et al. BLIND ordering of large-scale transcriptomic developmental timecourses. Development. 2014;141:1161–6.
Piras V, Tomita M, Selvarajoo K. Transcriptome-wide variability in single embryonic development cells. Sci Rep. 2014;4:7137.
Richard A, Boullu L, Herbach U, et al. Single-cell-based analysis highlights a surge in cell-to-cell molecular variability preceding irreversible commitment in a differentiation process. PLoS Biol. 2016;14:e1002585.
Antebi YE, Reich-Zeliger S, Hart Y, Mayo A, Eizenberg I, Rimer J, et al. Mapping differentiation under mixed culture conditions reveals a tunable continuum of T cell fates. PLoS Biol. 2013;11:e1001616.
Smith RCG, Stumpf PS, Ridden SJ, Sim A, Filippi S, Harrington HA, MacArthur BD. Nanog fluctuations in embryonic stem cells highlight the problem of measurement in cell biology. Biophys J. 2017;112(12):2641-2652.
Allen JD, Xie Y, Chen M, Girard L, Xiao G. Comparing statistical methods for constructing large scale gene networks. PLoS One. 2012;7:e29348.
Villaverde AF, Ross J, Banga JR. Reverse engineering cellular networks with information theoretic methods. Cell. 2013;2:306–29.
•• Mc Mahon SS, Sim A, Filippi S, Johnson R, Liepe J, Smith D, et al. Information theory and signal transduction systems: from molecular information processing to network inference. Semin Cell Dev Biol. 2014;35:98–108. Review of information theory measures for inferring gene regulatory networks, including discussion of discretization and entropy estimator methods.
• Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006;7(Suppl 1):S7. One of the most widely used algorithms for determining gene regulatory networks.
Basso K, Margolin A, Stolovitzky G, Klein U, Riccardo D-F, Califano A. Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005;37:382–90.
Liang K-C, Wang X. Gene regulatory network reconstruction using conditional mutual information. EURASIP J Bioinform Syst Biol. 2008;2008:253894.
Meyer PE, Kontos K, Lafitte F, Bontempi G. Information-theoretic inference of large transcriptional regulatory networks. EURASIP J Bioinform Syst Biol. 2007;2007:79879.
Villaverde AF, Ross J, Morán F, Banga JR. MIDER: network inference with mutual information distance and entropy reduction. PLoS One. 2014;9:e96732.
Chan TE, Stumpf MPH, Babtie AC. Network inference from single-cell data using multivariate information measures. bioRxiv. 2016;082099. doi:10.1101/082099.
Kushwaha R, Jagadish N, Kustagi M, et al. Interrogation of a context-specific transcription factor network identifies novel regulators of pluripotency. Stem Cells. 2015;33:367–77.
Okawa S, Angarica VE, Lemischka I, Moore K, del Sol A. A differential network analysis approach for lineage specifier prediction in stem cell subpopulations. Syst Biol Appl. 2015;1:15012.
Stumpf PS, Smith RCG, Lenz M, et al. Stem cell differentiation is a stochastic process with memory. bioRxiv. 2017;101048. doi:10.1101/101048.
Jaynes ET. Information theory and statistical mechanics. Phys Rev. 1957;106:620–30.
Jaynes ET. Information theory and statistical mechanics. II. Phys Rev. 1957;108:171–90.
Garcia-Ojalvo J, Martinez Arias A. Towards a statistical mechanics of cell fate decisions. Curr Opin Genet Dev. 2012;22:619–26.
Ridden SJ, Chang HH, Zygalakis KC, MacArthur BD. Entropy, ergodicity, and stem cell multipotency. Phys Rev Lett. 2015;115:208103.
Moris N, Pina C, Arias AM. Transition states and cell fate decisions in epigenetic landscapes. Nat Rev Genet. 2016;17:693–703.
Acknowledgements
This work was funded by the Biotechnology and Biological Sciences Research Council (BBSRC) Grant No. BB/L000512/1.
Ethics declarations
Conflict of Interest
Rosanna C. G. Smith PhD and Ben D. MacArthur PhD declare that they have no conflict of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
Additional information
This article is part of the Topical Collection on Mathematical Models of Stem Cell Behavior
Smith, R.C., MacArthur, B.D. Information-Theoretic Approaches to Understanding Stem Cell Variability. Curr Stem Cell Rep 3, 225–231 (2017). https://doi.org/10.1007/s40778-017-0093-5