
3.1 Introduction

In this century, the rapid development of genomic biotechnologies, large-scale data storage, mobile networks, and portable medical devices has made it possible to measure, record, store, and track the genome, physiological dynamics, and living environment of a person, and thereby to reveal the differences and uniqueness of an individual. These technologies may allow us to reclassify diseases so that diagnostic and therapeutic strategies can be tailored more precisely to individual patients, leading to the birth of precision medicine and personalized medicine. Portable noninvasive medical devices are crucial for capturing individual characteristics of biological dynamics. In fact, the rapid development of wearable, mobile, automatic, continuous, high-throughput medical devices for measuring human biological parameters heralds a new era [5], the era of high-throughput phenotyping. In this new era, wearable noninvasive medical devices and the analysis and management of the resulting digital medical data will revolutionize the management and treatment of diseases, leading to a new healthcare system, including one for the treatment and care of respiratory patients.

One of the key features that can be extracted from the data obtained by high-throughput medical devices is the complexity of physiological signals. Complexity studies in major diseases, including respiratory diseases, are therefore emerging as an important approach to analyzing the continuous monitoring data measured by these noninvasive devices. The complexity of physiological signals can be quantified by the entropy of the biological dynamics contained in the signals recorded by continuous monitoring devices.

The concept of entropy originated in, and was first applied to, thermodynamics and statistical physics. Clausius introduced the concept in the 1850s [1] and was the first to enunciate the second law of thermodynamics in the form “entropy always increases.” Boltzmann was the first to state the logarithmic connection between entropy and probability, in 1886. In 1948, Shannon [12] proposed an entropy (later known as Shannon entropy) that has found a large number of applications in information science. The Kolmogorov entropy [8] and Renyi entropy [10], which were developed on the basis of Shannon’s entropy, are widely used in the nonlinear dynamics of physical systems. Entropy can also be applied to experimental data from biological dynamics, through measures such as approximate entropy [9], sample entropy [11], and multiscale entropy [2, 14], to quantify physiological signals such as heart rate, airflow, airway pressure, sound signals, and so on. Hence, in this chapter, I describe these major concepts of entropy, show their connections, and demonstrate an example of using entropy. The original concepts of entropy proposed by Clausius and Boltzmann are rarely applied directly to biological dynamics, so I start with Shannon entropy.

3.2 Shannon Entropy

For a discrete random variable Y with probability mass function p(Y), the entropy is defined as the expectation of a function of Y, I(Y), where I(Y) = −log(p(Y)). That is,

$$ H(Y)=E\left\{I(Y)\right\}=E\left\{-\log \left(p(Y)\right)\right\} $$
(3.1)

In information science, I(Y) is the information content of Y, which is also a random variable. Suppose that the random variable Y has possible values \( \{y_1, y_2, \ldots, y_n\} \) and a corresponding probability mass function \( p_i = \Pr(Y = y_i) \). The entropy can then be expanded as [12]

$$ H(Y)=E\left\{I(Y)\right\}=E\left\{-\log \left(p(Y)\right)\right\}=\sum_{i=1}^n\left({p}_i\times \left(-\log \left({p}_i\right)\right)\right)=-\sum_{i=1}^n\left({p}_i\log {p}_i\right) $$
(3.2)

This definition of entropy can be extended to a continuous random variable with a probability density function f(Y) as follows:

$$ H(Y)=E\left\{I(Y)\right\}=E\left\{-\log \left(f(Y)\right)\right\}=-\int f(Y)\times \log \left(f(Y)\right) dY $$
(3.3)

The entropy defined on a discrete variable is the most commonly used; hence, we focus on it here. The concept of entropy can be easily extended to multidimensional random variables.
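To make the discrete definition concrete, the following is a minimal Python sketch (not part of the original chapter) of Eq. 3.2; the function name shannon_entropy and the example distributions are illustrative choices only.

```python
import numpy as np

def shannon_entropy(p, base=np.e):
    """Shannon entropy of a discrete probability mass function p (Eq. 3.2)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # terms with p_i = 0 contribute 0 by convention
    return -np.sum(p * np.log(p)) / np.log(base)

# A fair six-sided die is maximally uncertain: H = log(6) ~ 1.7918 (natural log)
print(shannon_entropy(np.ones(6) / 6))
# A loaded die concentrates probability and lowers H
print(shannon_entropy([0.5, 0.3, 0.1, 0.05, 0.03, 0.02]))
```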

Suppose there are two events, X and Y, with I possibilities for the first and J for the second. Let \( p(x_i, y_j) \) be the probability of the joint occurrence of \( x_i \) for the first and \( y_j \) for the second. The marginal probability mass functions of X and Y are \( {p}_X\left({x}_i\right)={\sum}_{j=1}^Jp\left({x}_i,{y}_j\right) \) and \( {p}_Y\left({y}_j\right)={\sum}_{i=1}^Ip\left({x}_i,{y}_j\right) \), respectively. The entropy of the joint event is [12]

$$ H\left(X,Y\right)={E}_{X,Y}\left\{I\left(X,Y\right)\right\}={\mathrm{E}}_{X,Y}\left\{-\log \left(p\left(X,Y\right)\right)\right\}=-\sum_{i=1}^I\sum_{j=1}^J\left(p\left({x}_i,{y}_j\right)\log \left(p\left({x}_i,{y}_j\right)\right)\right) $$
(3.4)

The entropies of X and Y are, respectively,

$$ {\displaystyle \begin{array}{ll}H(X)& ={E}_X\left\{-\log \left(p(X)\right)\right\}=-\sum \limits_{i=1}^I\left({p}_X\left({x}_i\right)\log \left({p}_X\left({x}_i\right)\right)\right)\kern0.5em \hfill \\ {}& =-\sum \limits_{i=1}^I\sum \limits_{j=1}^J\left(p\left({x}_i,{y}_j\right)\log \left({p}_X\left({x}_i\right)\right)\right)=-\sum \limits_{i=1}^I\sum \limits_{j=1}^J\left(p\left({x}_i,{y}_j\right)\log \left(\sum \limits_{j=1}^Jp\left({x}_i,{y}_j\right)\right)\right)\hfill \end{array}} $$
$$ {\displaystyle \begin{array}{l}H(Y)={E}_Y\left\{-\log \left(p(Y)\right)\right\}=-\sum \limits_{j=1}^J\left({p}_Y\left({y}_j\right)\log \left({p}_Y\left({y}_j\right)\right)\right)\hfill \\ {}=-\sum \limits_{j=1}^J\sum \limits_{i=1}^I\left(p\left({x}_i,{y}_j\right)\log \left({p}_Y\left({y}_j\right)\right)\right)\kern0.5em =-\sum \limits_{i=1}^I\sum \limits_{j=1}^J\left(p\left({x}_i,{y}_j\right)\log \left({p}_Y\left({y}_j\right)\right)\right)\hfill \\ {}=-\sum \limits_{i=1}^I\sum \limits_{j=1}^J\left(p\left({x}_i,{y}_j\right)\log \left(\sum \limits_{i=1}^Ip\left({x}_i,{y}_j\right)\right)\right)\hfill \end{array}} $$

The conditional probability of \( Y = y_j \) given \( X = x_i \) is \( p\left(\left.{y}_j\right|{x}_i\right)=\frac{p\left({x}_i,{y}_j\right)}{\sum_{j=1}^Jp\left({x}_i,{y}_j\right)}=\frac{p\left({x}_i,{y}_j\right)}{p_X\left({x}_i\right)} \). The conditional entropy of two discrete random variables X and Y is defined as [12]

$$ {H}_X(Y)={E}_{X,Y}\left\{I\left(\left.Y\right|X\right)\right\}={E}_{X,Y}\left\{-\log \left(p\left(\left.Y\right|X\right)\right)\right\}=-\sum_{i=1}^I\sum_{j=1}^J\left(p\left({x}_i,{y}_j\right)\log \frac{p\left({x}_i,{y}_j\right)}{\sum_{j=1}^Jp\left({x}_i,{y}_j\right)}\right) $$
(3.5)

Based on Shannon [12], the entropy has the following major properties for serving as a measure of choice or information:

  1. (i)

    H = 0 if and only if all the \( p_i \) but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish. Otherwise H is positive.

  2. (ii)

    For a given n, H is a maximum and equal to log n when all the \( p_i \) are equal (i.e., each equals \( \frac{1}{n} \)). This is also intuitively the most uncertain situation.

  3. (iii)

    Any change toward equalization of the probabilities \( p_1, p_2, \ldots, p_n \) increases H. Thus if \( p_1 < p_2 \) and we increase \( p_1 \), decreasing \( p_2 \) by an equal amount so that \( p_1 \) and \( p_2 \) are more nearly equal, then H increases. More generally, if we perform any “averaging” operation on the \( p_i \) of the form \( {p}_i^{\prime }={\sum}_j^n{a}_{ij}{p}_j \), where \( {\sum}_j^n{a}_{ij}={\sum}_i^n{a}_{ij}=1 \) and all \( a_{ij} \ge 0 \), then H increases (except in the special case where this transformation amounts to no more than a permutation of the \( p_i \), with H of course remaining the same).

  4. (iv)

    It can be shown that H(X, Y) ≤ H(X) + H(Y), with equality holding only if the events are independent (i.e., \( p(x_i, y_j) = p_X(x_i)\,p_Y(y_j) \)). The uncertainty of a joint event is less than or equal to the sum of the individual uncertainties.

  5. (v)

    It is easily shown that \( H(X, Y) = H(X) + H_X(Y) \). The uncertainty (or entropy) of the joint event (X, Y) is the uncertainty of X plus the uncertainty of Y given that X is known.

  6. (vi)

    \( H(Y) \ge H_X(Y) \) because \( H(X) + H(Y) \ge H(X, Y) = H(X) + H_X(Y) \). The uncertainty of Y is never increased by knowledge of X. It will be decreased unless X and Y are independent events, in which case it is not changed.

3.3 Conditional Entropy

Shannon’s definition [12] leads to the relationship H(X, Y) = H(X) + H(Y| X), i.e.,

$$ {\displaystyle \begin{array}{ll}H\left(X,Y\right)& =-\sum \limits_{i=1}^m\sum \limits_{j=1}^n\left(p\left({x}_i,{y}_j\right)\log \left(p\left({x}_i,{y}_j\right)\right)\right)\hfill \\ {}& =-\sum \limits_{i=1}^m\sum \limits_{j=1}^n\left(p\left({x}_i,{y}_j\right)\log \left(\sum \limits_{j=1}^np\left({x}_i,{y}_j\right)\times \frac{p\left({x}_i,{y}_j\right)}{\sum_{j=1}^np\left({x}_i,{y}_j\right)}\right)\right)\hfill \\ {}& =-\sum \limits_{i=1}^m\sum \limits_{j=1}^n\left(p\left({x}_i,{y}_j\right)\log \left(\sum \limits_{j=1}^np\left({x}_i,{y}_j\right)\right)\right)\hfill \\ {}& -\sum \limits_{i=1}^m\sum \limits_{j=1}^n\left(p\left({x}_i,{y}_j\right)\log \left(\frac{p\left({x}_i,{y}_j\right)}{\sum_{j=1}^np\left({x}_i,{y}_j\right)}\right)\right)=H(X)+{H}_X(Y)\hfill \end{array}} $$

From this decomposition we see that, in the conditional entropy H(Y| X), X is not fixed at any particular value \( x_i \); the conditional entropy averages over all values of X.

It is interesting to see that, if \( Z_i = Y \mid (X = x_i) \) is treated as a random variable, the entropy defined on \( Z_i \) is

$$ {\displaystyle \begin{array}{ll}H\left({Z}_i\right)& =H\left(\left.Y\right|X={x}_i\right)={E}_{Z_i}\left\{I\left({Z}_i\right)\right\}={E}_{Z_i}\left\{-\log \left(\ p\left({Z}_i\right)\right)\right\}\kern0.5em \hfill \\ {}& =-\sum \limits_{j=1}^n\left(p\left(\left.{y}_j\right|{x}_i\right)\log \left(p\left(\left.{y}_j\right|{x}_i\right)\right)\right)\hfill \\ {}& =-\sum \limits_{j=1}^n\left(\frac{p\left({x}_i,{y}_j\right)}{\sum_{j=1}^np\left({x}_i,{y}_j\right)}\log \left(\frac{p\left({x}_i,{y}_j\right)}{\sum_{j=1}^np\left({x}_i,{y}_j\right)}\right)\right)\hfill \end{array}} $$

which differs from the conditional entropy of two discrete random variables. Note that here X is fixed at \( x_i \). In other words, the conditional entropy differs from the entropy of a conditional probability or event. The conditional entropy of Y given X measures the uncertainty of Y when X is known, regardless of which value X takes, whereas the entropy of Y given \( X = x_i \) measures the uncertainty of Y when X equals one specific value. Their relationship is shown below:

$$ {\displaystyle \begin{array}{l}{H}_X(Y)=-\sum \limits_{i=1}^m\sum \limits_{j=1}^n\left(p\left({x}_i,{y}_j\right)\log \left(p\left(\left.{y}_j\right|{x}_i\right)\right)\right)\hfill \\ {}=-\sum \limits_{i=1}^m\sum \limits_{j=1}^n\left(p\left({x}_i\right)p\left({y}_j|{x}_i\right)\log p\left({y}_j|{x}_i\right)\right)\kern0.5em \hfill \\ {}=-\sum \limits_{i=1}^m\left(p\left({x}_i\right)\sum \limits_{j=1}^np\left(\left.{y}_j\right|{x}_i\right)\log p\left(\left.{y}_j\right|{x}_i\right)\right)=-\sum \limits_{i=1}^m\left(p\left({x}_i\right)H\left({Z}_i\right)\right)\hfill \end{array}} $$

That is,

$$ {H}_X(Y)=-\sum_{i=1}^m\left(p\left({x}_i\right)H\left({Z}_i\right)\right) $$
(3.6)

Hence, the conditional entropy is a weighted or average entropy of conditional probabilities or events.
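As a numerical illustration of Eq. 3.6 and of property (v) in Sect. 3.2, the following Python sketch uses a small hypothetical joint distribution; the values in p_xy are made up purely for illustration.

```python
import numpy as np

def H(p):
    """Shannon entropy of a probability array (zero entries dropped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical joint probability mass function p(x_i, y_j): rows = x values, columns = y values
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])

p_x = p_xy.sum(axis=1)                     # marginal p_X(x_i)
H_joint = H(p_xy)                          # H(X, Y), Eq. 3.4
H_x = H(p_x)                               # H(X)

# Conditional entropy via Eq. 3.6: weighted average of H(Z_i) = H(Y | X = x_i)
H_Y_given_X = sum(p_x[i] * H(p_xy[i] / p_x[i]) for i in range(len(p_x)))

# Property (v): H(X, Y) = H(X) + H_X(Y); the two printed numbers agree
print(H_joint, H_x + H_Y_given_X)
```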

3.4 Renyi Entropy

In mathematics, Shannon entropy can be seen as a special case of the Renyi entropy [10], which is introduced here. In general, for a discrete random variable Y with probability mass function \( p_i = \Pr(Y = y_i) \), the Renyi entropy of order q, where q ≥ 0 and q ≠ 1, is

$$ {R}_q(Y)=-\frac{1}{q-1}\log \sum_{i=1}^n{p}_i^q $$
(3.7)

The limit of Renyi entropy for q→1 is the Shannon entropy, that is,

$$ \underset{q\to 1}{\lim }{R}_q(Y)=-\underset{q\to 1}{\lim}\frac{1}{q-1}\log \sum_{i=1}^n{p}_i^q=-\sum_{i=1}^n\left({p}_i\log {p}_i\right) $$
(3.8)

Consequently, we can define \( {R}_{q=1}(Y)=\underset{q\to 1}{\lim }{R}_q(Y) \). In this sense, the Shannon entropy is a special case of the Renyi entropy with q = 1. It can be demonstrated that

$$ {R}_{q_1}(Y)\le {R}_{q_2}(Y)\kern0.5em \mathrm{when}\kern0.5em {q}_1>{q}_2 $$
(3.9)
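A minimal Python sketch (not from the chapter) of Eq. 3.7, used here to check numerically that the Renyi entropy approaches the Shannon entropy as q → 1 (Eq. 3.8) and is non-increasing in q (Eq. 3.9); the example distribution is an arbitrary illustrative choice.

```python
import numpy as np

def renyi_entropy(p, q):
    """Renyi entropy of order q (Eq. 3.7), natural logarithm; requires q >= 0 and q != 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.log(np.sum(p ** q)) / (1.0 - q)

p = [0.5, 0.25, 0.15, 0.1]
shannon = -np.sum(np.asarray(p) * np.log(p))    # Shannon entropy of the same distribution

# Eq. 3.8: the Renyi entropy approaches the Shannon entropy as q -> 1
print(shannon, renyi_entropy(p, 0.999), renyi_entropy(p, 1.001))

# Eq. 3.9: the Renyi entropy is non-increasing in q
print(renyi_entropy(p, 0.5), renyi_entropy(p, 2.0))
```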

3.5 Kolmogorov Entropy

Kolmogorov entropy (also known as metric entropy) originated in the field of dynamical systems and is defined as follows [8]. Consider a dynamical system with a phase space. Divide the phase space into N(ϵ) disjoint D-dimensional hypercubes of content \( \epsilon^D \). Let \( p(i_0, i_1, i_2, \ldots, i_N) \) be the probability that a trajectory is in hypercube \( i_0 \) at t = 0, hypercube \( i_1 \) at t = τ, hypercube \( i_2 \) at t = 2τ, …, and hypercube \( i_N \) at t = Nτ.

Kolmogorov entropy is then defined as

$$ K=-\underset{\tau \to 0}{\lim}\underset{\epsilon \to {0}^{+}}{\lim}\underset{N\to \infty }{\lim}\frac{1}{\mathrm{N}\uptau}\left(\sum_{i_0,{i}_1,{i}_2,\dots, {i}_N}\left(p\left({i}_0,{i}_1,{i}_2,\dots, {i}_N\right)\log p\left({i}_0,{i}_1,{i}_2,\dots, {i}_N\right)\right)\right) $$
(3.10)

The above definition relies on physics jargon, such as phase space, hypercube, content, and trajectory, with which a statistician may be unfamiliar.

A dynamical system is a description of a physical system that evolves over time. The system has many states, and all states are represented in the state space of the system. The state space is also termed the phase space. A path in the state space describes the dynamics of the dynamical system; such a path is termed a trajectory.

Kolmogorov entropy is the theoretical basis for approximate entropy, sample entropy, and multiscale entropy, which are commonly used for time series; the link relies on the fact that Kolmogorov entropy is the rate of change of the Shannon entropy of a system. The Kolmogorov-Sinai entropy measures the unpredictability of a dynamical system: the higher the unpredictability, the higher the entropy. A higher Kolmogorov entropy value means a higher rate of change of the internal structure and of the information content, and thus faster development of complexity. The following paragraphs should help us understand why Kolmogorov entropy is the rate of change of the Shannon entropy of a system.

Based on the setting for Eq. 3.10, the Shannon entropy for a fixed n is

$$ {K}_n=-\sum_{i_0,{i}_1,{i}_2,\dots, {i}_n}\left(p\left({i}_0,{i}_1,{i}_2,\dots, {i}_n\right)\log p\left({i}_0,{i}_1,{i}_2,\dots, {i}_n\right)\right) $$
(3.11)

Then, \( K_{n+1} - K_n \) is the increment of the Shannon entropy from time nτ to (n + 1)τ, which can be seen as the information needed to predict the status at (n + 1)τ given that the status up to nτ is known. The overall rate of change H′ of the Shannon entropy is thus

$$ {H}^{\prime }=\underset{\tau \to 0}{\lim}\underset{\epsilon \to {0}^{+}}{\lim}\underset{N\to \infty }{\lim}\frac{1}{\mathrm{N}\uptau}\sum_{n=0}^{N-1}\left({K}_{n+1}-{K}_n\right) $$
(3.12)

Consider

$$ \sum_{n=0}^{N-1}\left({K}_{n+1}-{K}_n\right)=\left({K}_1-{K}_0\right)+\left({K}_2-{K}_1\right)+\left({K}_3-{K}_2\right)+\dots +\left({K}_N-{K}_{N-1}\right)={K}_N-{K}_0 $$

and \( {K}_0=-{\sum}_{i_0}\left(p\left({i}_0\right)\log\ p\left({i}_0\right)\right)=0 \) because the status at t = 0 is always known and \( p(i_0) = 1 \), we have

$$ {\displaystyle \begin{array}{l}{H}^{\prime }=\underset{\tau \to 0}{\lim \limits}\underset{\epsilon \to {0}^{+}}{\lim \limits}\underset{N\to \infty }{\lim \limits}\frac{1}{\mathrm{N}\uptau}\sum \limits_{n=0}^{N-1}\left({K}_{n+1}-{K}_n\right)=\underset{\tau \to 0}{\lim \limits}\underset{\epsilon \to {0}^{+}}{\lim \limits}\underset{N\to \infty }{\lim \limits}\frac{1}{N\tau}\left({K}_N-{K}_0\right)=\underset{\tau \to 0}{\lim \limits}\underset{\epsilon \to {0}^{+}}{\lim \limits}\underset{N\to \infty }{\lim \limits}\frac{1}{\mathrm{N}\uptau}{K}_N\hfill \\ {}\kern0.5em =-\underset{\tau \to 0}{\lim \limits}\underset{\epsilon \to {0}^{+}}{\lim \limits}\underset{N\to \infty }{\lim \limits}\frac{1}{\mathrm{N}\uptau}\left(\sum \limits_{i_0,{i}_1,{i}_2,\dots, {i}_N}\left(p\left({i}_0,{i}_1,{i}_2,\dots, {i}_N\right)\log p\left({i}_0,{i}_1,{i}_2,\dots, {i}_N\right)\right)\right)=K\hfill \end{array}} $$

That is, K = H′, which shows that Kolmogorov entropy is the rate of change of Shannon entropy of a system.

The application of Kolmogorov entropy to time series commonly goes through another quantity, \( R_2 \), which is a special case of the rate of change of the Renyi entropy \( R_q \).

Under the setting of Eq. 3.10, the rate of change of the Renyi entropy is

$$ {R}_q=-\underset{\tau \to 0}{\lim}\underset{\epsilon \to 0}{\lim}\underset{N\to \infty }{\lim}\frac{1}{\mathrm{N}\uptau}\frac{1}{q-1}\log \sum_{i_0,{i}_1,{i}_2,\dots, {i}_N}{p}^q\left({i}_0,{i}_1,{i}_2,\dots, {i}_N\right) $$
(3.13)

From Eqs. 3.8 and 3.9, \( R_1 \) is the Kolmogorov entropy and \( R_2 \) is a lower bound of the Kolmogorov entropy.

Grassberger and Procaccia [6] demonstrated that, for typical cases, \( R_2 \) is numerically close to K. More importantly, \( R_2 \) can be extracted fairly easily from an experimental signal, as follows.

Let \( Y = (y_1, y_2, \ldots, y_N) \) be a time series measured from the signal, and let \( X_i = (y_i, y_{i+1}, \ldots, y_{i+d-1}) \) (where i runs from 1 to N − d + 1) be a subsequence of Y starting at \( y_i \) with length d. That is, we have a sequence of N − d + 1 vectors, \( X_1, X_2, \ldots, X_{N-d+1} \).

Consider

$$ {C}_d\left(\epsilon \right)=\underset{N\to \infty }{\lim}\left\{\frac{1}{N^2}\times \mathrm{number}\ \mathrm{of}\ \mathrm{pairs}\ \left(n,m\right)\ \mathrm{with}\ {\left(\sum_{i=1}^d{\left({y}_{n+i-1}-{y}_{m+i-1}\right)}^2\right)}^{1/2}<\epsilon \right\} $$
$$ {K}_{2,d}\left(\epsilon \right)=\frac{1}{\tau}\log \frac{C_d\left(\epsilon \right)}{C_{d+1}\left(\epsilon \right)} $$

Grassberger and Procaccia [6] proved that

$$ \underset{\begin{array}{c}\hfill d\to \infty \hfill \\ {}\hfill \epsilon \to 0\hfill \end{array}}{\lim }{K}_{2,d}\left(\epsilon \right)={K}_2,\mathrm{i}.\mathrm{e}.,\underset{\begin{array}{c}\hfill d\to \infty \hfill \\ {}\hfill \epsilon \to 0\hfill \end{array}}{\lim}\frac{1}{\tau}\log \frac{C_d\left(\epsilon \right)}{C_{d+1}\left(\epsilon \right)}={K}_2 $$
(3.14)

The Euclidean distance in \( C_d(\epsilon) \) may be replaced by the maximum norm [13].
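The Grassberger-Procaccia procedure above can be sketched in Python as follows. This is a simplified illustration: the limits in Eq. 3.14 are replaced by a finite N, a fixed d, and a fixed ϵ, and the sampling interval tau, the test signals, and the parameter values are illustrative assumptions rather than recommendations.

```python
import numpy as np

def correlation_sum(y, d, eps):
    """C_d(eps): fraction of pairs of length-d delay vectors closer than eps (Euclidean norm).
    Self-pairs are included here for simplicity, consistent with the 1/N^2 normalization."""
    y = np.asarray(y, dtype=float)
    n_vec = len(y) - d + 1
    X = np.array([y[i:i + d] for i in range(n_vec)])                 # delay vectors X_i
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    return np.sum(dists < eps) / float(n_vec ** 2)

def k2_estimate(y, d, eps, tau=1.0):
    """K_{2,d}(eps) = (1/tau) * log( C_d(eps) / C_{d+1}(eps) ), cf. Eq. 3.14 at finite d, eps."""
    return np.log(correlation_sum(y, d, eps) / correlation_sum(y, d + 1, eps)) / tau

rng = np.random.default_rng(0)
noise = rng.normal(size=1000)            # white noise: highly unpredictable
wave = np.sin(0.05 * np.arange(1000))    # smooth deterministic signal
# The estimate is typically much larger for the noise than for the regular signal
print(k2_estimate(noise, d=2, eps=0.5), k2_estimate(wave, d=2, eps=0.5))
```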

3.6 Approximate Entropy

Consider a time series of length N, \( Y = (y_1, y_2, \ldots, y_N) \), from measurements equally spaced in time. Let \( X_i = (y_i, y_{i+1}, \ldots, y_{i+m-1}) \) (where i runs from 1 to N − m + 1) be a subsequence of Y starting at \( y_i \) with length m. That is, we have a sequence of N − m + 1 vectors, \( X_1, X_2, \ldots, X_{N-m+1} \). For each pair of sequences, \( X_i \) and \( X_j \), we may define a distance between them. One commonly used distance is the maximum absolute difference of their corresponding elements, namely,

$$ d\left({X}_i,{X}_j\right)=\max \left(\left|{X}_i-{X}_j\right|\right)=\underset{k=1,2,\dots, m}{\max}\left|{y}_{i+k-1}-{y}_{j+k-1}\right| $$

For a sequence \( X_i \), a sequence \( X_j \) whose distance from \( X_i \) is less than or equal to r (i.e., \( d(X_i, X_j) \le r \)) is said to be within r of \( X_i \) and is counted as a match with \( X_i \). We may count the number of \( X_j \) that match \( X_i \) with respect to (w.r.t.) r and denote it by \( {C}_i^m(r) \). The proportion of \( X_j \) matching \( X_i \) w.r.t. r is then

$$ {P}_i^m(r)={C}_i^m(r)/\left(N-m+1\right) $$

When N is large, \( {P}_i^m(r) \) represents the probability that any \( X_j \) matches \( X_i \) w.r.t. r.

The average proportion of matches over all sequences \( X_i \) (1 ≤ i ≤ N − m + 1) is thus

$$ {P}^m(r)=\frac{1}{N-m+1}\sum_{i=1}^{N-m+1}{P}_i^m(r) $$

Define \( \Phi^m(r) \) as

$$ {\Phi}^m(r)=\frac{1}{N-m+1}\sum_{i=1}^{N-m+1}\log {P}_i^m(r) $$

Then, Eckmann-Ruelle (ER) entropy [4] is

$$ {\displaystyle \begin{array}{ll}\mathrm{ER}\ \mathrm{entropy}& =\underset{r\to 0}{\lim \limits}\underset{m\to \infty }{\lim \limits}\underset{N\to \infty }{\lim \limits}\left({\Phi}^m(r)-{\Phi}^{m+1}(r)\right)\hfill \\ {}& =\underset{r\to 0}{\lim \limits}\underset{m\to \infty }{\lim \limits}\underset{N\to \infty }{\lim \limits}\left(\frac{1}{N-m+1}\sum \limits_{i=1}^{N-m+1}\log {P}_i^m(r)-\frac{1}{N-m}\sum \limits_{i=1}^{N-m}\log {P}_i^{m+1}(r)\right)\hfill \end{array}} $$

The approximate entropy for fixed m and r [9] is

$$ {\displaystyle \begin{array}{ll}\mathrm{ApEn}\ \left(m,r\right)& =\underset{N\to \infty }{\lim \limits}\left({\Phi}^m(r)-{\Phi}^{m+1}(r)\right)\kern0.5em \hfill \\ {}& =\underset{N\to \infty }{\lim \limits}\left(\frac{1}{N-m+1}\sum \limits_{i=1}^{N-m+1}\log {P}_i^m(r)-\frac{1}{N-m}\sum \limits_{i=1}^{N-m}\log {P}_i^{m+1}(r)\right)\hfill \end{array}} $$

Given N data points

$$ \mathrm{ApEn}\ \left(m,r,N\right)={\Phi}^m(r)-{\Phi}^{m+1}(r)=\frac{1}{N-m+1}\sum_{i=1}^{N-m+1}\log {P}_i^m(r)-\frac{1}{N-m}\sum_{i=1}^{N-m}\log {P}_i^{m+1}(r) $$
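A minimal Python sketch (not from the chapter) of ApEn(m, r, N) as defined above, with self-matches included as in the definition of \( {C}_i^m(r) \); the choice of r as a fraction of the series' standard deviation and the test signal are common illustrative conventions, not part of the definition.

```python
import numpy as np

def apen(y, m, r):
    """Approximate entropy ApEn(m, r, N) = Phi^m(r) - Phi^{m+1}(r), natural logarithm."""
    y = np.asarray(y, dtype=float)
    N = len(y)

    def phi(m_len):
        n_seq = N - m_len + 1
        X = np.array([y[i:i + m_len] for i in range(n_seq)])       # sequences X_i of length m_len
        # Maximum-absolute-difference (Chebyshev) distance between every pair; self-matches included
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        P = np.sum(d <= r, axis=1) / float(n_seq)                  # P_i^m(r)
        return np.mean(np.log(P))                                  # Phi^m(r)

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(1)
y = rng.normal(size=500)
print(apen(y, m=2, r=0.2 * np.std(y)))    # r is often taken as a fraction of the series SD
```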

The theoretical framework for approximate entropy is based on the following theorems [9].

Assume a stationary process u(i) with continuous state space. Let μ(x, y) be the joint stationary probability measure on a two-dimensional space for this process (assumed unique), and let π(x) be the equilibrium probability of x. Then, almost surely (a.s.):

Theorem 1

$$ \mathrm{ApEn}\ \left(1,r\right)=-\int \mu \left(x,y\right)\log \left({\int}_{z=y-r}^{y+r}{\int}_{w=x-r}^{x+r}\mu \left(w,z\right)\mathrm{d}w\ \mathrm{d}z/{\int}_{w=x-r}^{x+r}\pi (w)\mathrm{d}w\right)\mathrm{d}x\ \mathrm{d}y $$

Theorem 2

For an i.i.d. process with density function π, a.s. (for any m ≥ 1)

$$ \mathrm{ApEn}\ \left(m,r\right)=-\int \pi (y)\log \left({\int}_{z=y-r}^{y+r}\pi (z)\mathrm{d}z\right)\mathrm{d}y $$

Theorem 3

For a first-order stationary Markov chain with discrete state space X, with r < min{|x − y| : x ≠ y; x, y ∈ X}, a.s. for any m

$$ \mathrm{ApEn}\ \left(m,r\right)=-\sum_{x\in X}\sum_{y\in X}\pi (x){p}_{xy}\log {p}_{xy} $$

Pincus [9] considers ApEn(m, r) as a family of formulas and ApEn(m, r, N) as a family of statistics; comparisons between systems are intended with m and r fixed. This family of statistics is rooted in the work of Grassberger and Procaccia (1983) [6, 7] and of Eckmann and Ruelle [4]. The theory and method for a measure of regularity proposed by Pincus [9] are closely related to the Kolmogorov entropy, the rate of generation of new information, and can be applied to the typically short and noisy time series of clinical data [11].

3.7 Sample Entropy

Like approximate entropy, sample entropy is also defined through the count \( {C}_i^m(r) \) and proportion \( {P}_i^m(r) \) of sequences \( X_j \) matching another sequence \( X_i \) w.r.t. r, but with two alterations [11]. First, self-matches are counted in the calculation of \( {C}_i^m(r) \) for approximate entropy, whereas they are excluded for sample entropy. Second, for a fixed m, only the first N − m sequences \( X_j \) of length m are used, to ensure that the number of sequences used for \( {C}_i^m(r) \) is the same as the number of sequences available for \( {C}_i^{m+1}(r) \). To avoid confusion between the altered and unaltered settings, we use \( {C}_i^m{(r)}^{\ast } \) to denote the number of sequences \( X_j \) of length m (with \( j = 1, 2, \ldots, i-1, i+1, \ldots, N-m \)) that match the ith sequence \( X_i \) of length m, and \( {C}_i^m{(r)}^{\ast \ast } \) to denote the number of sequences of length m + 1 (over the same index range) that match the ith sequence \( X_i \) of length m + 1. Analogous to the proportion \( {P}_i^m(r) \) of matches for approximate entropy, we have the proportions \( {P}_i^m{(r)}^{\ast } \) and \( {P}_i^m{(r)}^{\ast \ast } \) of matches for sample entropy as follows:

$$ {P}_i^m{(r)}^{\ast }={C}_i^m{(r)}^{\ast }/\left(N-m\right) $$
$$ {P}_i^m{(r)}^{\ast \ast }={C}_i^m{(r)}^{\ast \ast }/\left(N-m-1\right) $$

Similar to the average proportion P m(r) of matches for all sequences X i for approximate entropy, we have the average proportions, P m(r)* and P m(r)**, of matches for sample entropy as follows:

$$ {P}^m{(r)}^{\ast }=\frac{1}{N-m}\sum_{i=1}^{N-m}{P}_i^m{(r)}^{\ast } $$
$$ {P}^m{(r)}^{\ast \ast }=\frac{1}{N-m-1}\sum_{i=1}^{N-m-1}{P}_i^m{(r)}^{\ast \ast } $$

Clearly, \( P^m{(r)}^{\ast } \) is an estimate of the probability that two different sequences will match for m points, whereas \( P^m{(r)}^{\ast \ast } \) is an estimate of the probability that two different sequences will match for m + 1 points.

The sample entropy for fixed m and r [11] is defined as

$$ \mathrm{SampEn}\ \left(m,r\right)=\underset{N\to \infty }{\lim}\left(-\log \frac{P^m{(r)}^{\ast \ast }}{P^m{(r)}^{\ast }}\right) $$

Given N data points

$$ \mathrm{SampEn}\ \left(m,r,N\right)=-\log \frac{P^m{(r)}^{\ast \ast }}{P^m{(r)}^{\ast }} $$

From the above definition, sample entropy is essentially the negative logarithm of the ratio of the probability that any two different sequences in a time series match for m + 1 points to the probability that they match for m points; equivalently, it is the negative logarithm of the conditional probability that two sequences matching for m points also match for m + 1 points.
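A minimal Python sketch of SampEn(m, r, N) under the widely used convention in which the same N − m templates are used for both lengths and self-matches are excluded; the exact normalizing constants differ slightly across references (including the formulas above), but those factors vanish for large N. The test signal and the choice of r are illustrative assumptions.

```python
import numpy as np

def sampen(y, m, r):
    """Sample entropy SampEn(m, r, N) = -log(P** / P*), self-matches excluded, natural logarithm."""
    y = np.asarray(y, dtype=float)
    N = len(y)

    def match_probability(length):
        # Only the first N - m templates are used for both lengths m and m + 1
        X = np.array([y[i:i + length] for i in range(N - m)])
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)   # Chebyshev distance
        # For each template, the fraction of *other* templates within r (self-match removed)
        return np.mean((np.sum(d <= r, axis=1) - 1) / (len(X) - 1.0))

    return -np.log(match_probability(m + 1) / match_probability(m))

rng = np.random.default_rng(2)
y = rng.normal(size=500)
print(sampen(y, m=2, r=0.2 * np.std(y)))
```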

The theoretical framework for sample entropy is based on Grassberger and Procaccia's (1983) work, as summarized in Eq. 3.14.

3.8 Multiscale Entropy Analysis

Kolmogorov entropy, approximate entropy, and sample entropy described above are based on a single scale, reflecting the uncertainty of the next point given the past history of the series, as their calculation depends on a one-step difference (e.g., \( K_{n+1} - K_n \)). Hence, they do not account for features related to structure on scales other than the shortest one [2]. To address this issue, Zhang [14] introduced multiscale entropy to the analysis of physical time series. Costa et al. [2] further extended the multiscale entropy technique to make it applicable to the analysis of biological time series.

The basic idea of multiscale entropy is to first generate a new time series consisting of the means of consecutive nonoverlapping segments, each of fixed length k, of the observed data points, and then to calculate Kolmogorov entropy, approximate entropy, or sample entropy on the newly generated time series. Specifically, we first divide the original signal (\( \{y_i\}, 1 \le i \le N \)) into nonoverlapping segments of equal length k and calculate the mean value of the data points in each segment [14]. This process is called coarse graining, and the newly generated time series is called the coarse-grained time series. The length k is called the scale factor (or simply the scale). The coarse-graining process is repeated for a range of the scale factor. As the scale factor k changes, we construct different coarse-grained time series and calculate the corresponding entropy values on them. We may then plot the entropy of the coarse-grained time series against the scale factor k. This procedure is called multiscale entropy (MSE) analysis [2].
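The following sketch illustrates mean-based coarse graining and an MSE curve computed with sample entropy; it reuses the sampen function from the sketch in Sect. 3.7, and the scale range, the convention of fixing r from the original series' standard deviation, and the white-noise test signal are illustrative assumptions.

```python
import numpy as np
# assumes the sampen() function from the sample-entropy sketch above is already defined

def coarse_grain(y, k):
    """Mean-based coarse graining: means of consecutive nonoverlapping segments of length k."""
    y = np.asarray(y, dtype=float)
    n_seg = len(y) // k
    return y[:n_seg * k].reshape(n_seg, k).mean(axis=1)

def mse_curve(y, scales, m=2, r_factor=0.2):
    """Sample entropy of the coarse-grained series at each scale factor k (the MSE curve)."""
    r = r_factor * np.std(y)      # r is commonly fixed from the original series' SD
    return [sampen(coarse_grain(y, k), m, r) for k in scales]

rng = np.random.default_rng(3)
white = rng.normal(size=1500)
# For white noise the MSE curve typically decreases as the scale factor increases
print(mse_curve(white, scales=range(1, 6)))
```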

The coarse-graining process can be applied not only to the mean of each segment but also to its variance or to any other moment in statistics [3]. Costa and Goldberger [3] termed MSE on mean-based coarse-grained time series \( \mathrm{MSE}_{\mu} \), MSE on variance-based coarse-grained time series \( \mathrm{MSE}_{\sigma^2} \), and MSE on moment-based coarse-grained time series in general \( \mathrm{MSE}_n \). Note that, in statistics, the mean is the first moment and the variance is the second central moment.

3.9 Demonstration of an Example

Diabetes is a major disease in human beings. It is a group of metabolic diseases in which the patient's blood sugar level is high over a prolonged period. The key index for monitoring diabetes is a person's blood glucose level. The left panel of Fig. 3.1 shows the glucose levels of a healthy person (black points) and a patient with type II diabetes (gray points), measured once every 3 min over 2 days. I calculate the multiscale sample entropy based on the data shown in the left panel of Fig. 3.1 and obtain the results shown in the right panel.

Fig. 3.1 Human glucose dynamics and its analysis using multiscale sample entropy. Left panel: glucose levels in a healthy person (black points) and in a patient with type II diabetes (gray points), measured once every 3 min over 2 days. Right panel: multiscale sample entropy values for the healthy person (black points and line) and for the patient with type II diabetes (gray points), calculated from the glucose levels shown in the left panel

The average value and standard deviation of the glucose levels are 5.16 and 1.00, respectively, in the healthy person, and 12.45 and 3.08, respectively, in the patient with type II diabetes. The diabetes patient has lower sample entropy values than the healthy person at every scale (right panel of Fig. 3.1). The direction of the difference in entropy is thus opposite to that in the average value (or variability), which may indicate that entropy contains information that the average value cannot disclose.

3.10 Discussion and Conclusion

Based on the descriptions of the various concepts of entropy for biological dynamics above, Shannon entropy is a measure of the disorder (or uncertainty) of a system. Shannon entropy is a special case (q = 1) of a broader family of entropies called Renyi entropy. Kolmogorov entropy is the rate of change of Shannon entropy. A lower bound of Kolmogorov entropy is the rate of change of Renyi entropy with q = 2. This lower bound can be approximated by an entropy that can be calculated from time series data of biological dynamics, called approximate entropy. Sample entropy is an improved version of approximate entropy that corrects the undesirable self-matching effect. Both sample entropy and approximate entropy can be calculated on the data at different scales, resulting in multiscale entropy (MSE). Sample entropy, approximate entropy, and their corresponding MSE have now been broadly used to assess the complexity of biological dynamics measured by noninvasive medical devices in various diseases.

As demonstrated in the diabetes example, entropy analysis of the dynamics of physiological signals can disclose information that is not contained in the average value or variability of the signals. Thus, entropy analysis could be used to differentiate major diseases such as diabetes into sub-diseases on top of existing approaches (such as using the average value, as in the diabetes case). It could also be used to reclassify major diseases so that more specific and more effective drugs or treatments can be developed. All of this would contribute to the development of precision medicine, in which the right drug at the right dosage is prescribed to the right person at the right time.