Definition

Cellular Automata (CA) are discrete, spatially explicit extended dynamical systems. A CA system is composed of adjacent cells characterized by an internal state whose value belongs to a finite set. These states are updated simultaneously according to a common local transition rule involving only the neighborhood of each cell. Thus, if \( \sigma^{(T)}_{i} \) denotes the value of cell i at time step T, the site values evolve by iteration of the mapping \( \sigma^{(T+1)}_{i}= \phi\big(\{\sigma^{(T)}_{j}\}\in \mathcal{N}_{i}\big) \), where ϕ is an arbitrary function specifying the cellular automaton rule operating on \( \mathcal{N}_{i} \), i.e. the set of cells in the neighborhood of cell i. Last but not least, standard CA are ahistoric (memoryless): the new state of a cell depends on the neighborhood configuration only at the preceding time step.

The standard framework of CA can be extended by taking into account all past states (history) in the application of the CA rules, i.e. by implementing memory capabilities in cells: \( \sigma^{(T+1)}_{i}=\phi\big(\{{s}^{(T)}_{j}\}\in\mathcal{N}_{i}\big) \), with \( s^{(T)}_j=s\big(\sigma^{(1)}_j,\ldots,\sigma^{(T-1)}_j,\sigma^{(T)}_j\big) \) being a state function of the series of states of cell j up to time-step T. Thus, in the CA with memory considered here, the transition functions ϕ remain unaltered, while historic memory of all past iterations is retained by featuring each cell with a summary of its past states.

The memory mechanism considered here is different from that of other CA with memory reported in the literature (e.g., p. 7 in [1], p. 43 in [35], p. 118 in [56]). Typically, these are higher-order-in-time rules that incorporate memory into the transition rule, determining the new configuration in terms of the configurations at previous time-steps. Thus, in second-order-in-time rules: \( \sigma^{(T+1)}_{i}= \Phi \big(\{\sigma^{(T)}_{j}\}\in \mathcal{N}_{i}, \{\sigma^{(T-1)}_{j}\}\in \mathcal{N}_{i}\big) \). CA with memory in cells are cited in [58], but only to state that “CA with memory in cells would result in a qualitatively different behavior”. Some authors [55] define rules with memory as those in which ϕ depends on the state of the cell to be updated, so that one-dimensional rules with no memory take the form \( \sigma^{(T+1)}_{i}=\phi\big(\sigma^{(T)}_{i-1},\sigma^{(T)}_{i+1}\big) \). Our use of the term memory corresponds to none of these.

Introduction

As a simple example, in the two-dimensional, two-state automaton labeled ahistoric in Fig. 1, a cell becomes (or remains) alive if any cell in its nearest neighborhood is alive, and becomes (or remains) dead otherwise. The initial single perturbation in Fig. 1 spreads as fast as possible, i.e. at the speed of light.

The lower series of patterns in Fig. 1 shows the effect on this speed-of-light spread of featuring cells by their most frequent state, i.e. mode memory: \( s^{(T)}_{i}=\text{mode}(\sigma^{(1)}_{i},\ldots,\sigma^{(T)}_{i}) \), with \( s^{(T)}_{i}=\sigma^{(T)}_{i} \) in case of a tie. Memory has a characteristic inertial effect.
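For \( k=2 \), mode memory reduces to a majority count over the cell's past states, with ties resolved by the last state. A minimal sketch of this featuring (in Python; the function name is illustrative, not from the original) is:

```python
def mode_featured_state(history):
    """Featured state under mode memory (k = 2): the most frequent past state,
    with a tie resolved in favour of the most recent state."""
    ones = sum(history)
    zeros = len(history) - ones
    if ones == zeros:
        return history[-1]              # tie -> keep the last state
    return 1 if ones > zeros else 0

# a cell that was alive once and dead twice is featured as dead
assert mode_featured_state([1, 0, 0]) == 0
assert mode_featured_state([0, 1]) == 1     # tie -> last state
```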

Figure 1

The speed of light starting from a single active cell

Average Memory

Cells can be featured by a weighted mean value of all their previous states through powers of some parameter \( { \alpha\in[0,1] } \) acting as a memory factor. Thus, at every time-step T, the weighted mean of the states up to T is computed for every cell i:

$$ m^{(T)}_i(\sigma ^{(1)}_i,\ldots ,\sigma ^{(T)}_i) = {\frac{\sigma^{(T)}_i + \sum^{T-1}_{t=1}\alpha ^{T-t}\sigma ^{(t)}_i}{1+ \sum^{T-1}_{t=1}\alpha ^{T-t}}} \equiv {\frac{{{\omega^{(T)}_i}}} {\Omega(T)}} $$

and then the featuring states s are obtained by rounding the m values to 1 when \( m > 0.5 \) and to 0 when \( m < 0.5 \). If m is exactly 0.5, the last state is assigned (\( s_i^{(T)}=\sigma_i^{(T)} \)). This memory mechanism is accumulative in its demand for knowledge of past history: to calculate the memory charge \( \omega^{(T)}_{i} \) it is not necessary to know the whole \( \big\{\sigma^{(t)}_{i}\big\} \) series, since it can be sequentially computed as \( \omega^{(T)}_{i}= \alpha \omega^{(T-1)}_{i}+\sigma^{(T)}_{i} \). Moreover, \( \Omega(T)={(\alpha^{T}-1)}/{(\alpha-1)} \).
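A minimal sketch of this accumulative bookkeeping for a single cell (Python; the function names are illustrative) is:

```python
def update_charge(omega_prev, sigma_now, alpha):
    """Accumulative update of the memory charge: omega(T) = alpha*omega(T-1) + sigma(T)."""
    return alpha * omega_prev + sigma_now

def featured_state(omega, Omega, sigma_now):
    """Round m = omega/Omega to 0 or 1; an exact tie keeps the last state."""
    m = omega / Omega
    return 1 if m > 0.5 else 0 if m < 0.5 else sigma_now

# cell history 1, 1, 0 with memory factor alpha = 0.6
alpha, omega, Omega = 0.6, 0.0, 0.0
for sigma in (1, 1, 0):
    omega = update_charge(omega, sigma, alpha)
    Omega = alpha * Omega + 1.0     # Omega(T) = (alpha**T - 1)/(alpha - 1)
print(featured_state(omega, Omega, 0))   # 0: m ~ 0.49 < 0.5, so memory is not operative here
```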

The choice of the memory factor α simulates the long-term or remnant memory effect: the limit case \( { \alpha =1 } \) corresponds to memory with equally weighted records (full memory, equivalent to mode if \( { k=2 } \)), whereas \( { \alpha \ll 1 } \) intensifies the contribution of the most recent states and diminishes the contribution of the past ones (short term working memory). The choice \( { \alpha = 0 } \) leads to the ahistoric model.

In the most unbalanced scenario up to T, i.e. \( \sigma^{(1)}_{i}= \dots =\sigma^{(T-1)}_{i} \neq \sigma^{(T)}_{i} \), we have:

$$ \begin{aligned} m(0,0,\ldots,0,1)&=\displaystyle{\frac{1}{2}} \Rightarrow {\alpha-1\over\alpha^{T}-1}={\frac{1}{2}} \\ m(1,1,\ldots,1,0)&=\displaystyle{\frac{1}{2}} \Rightarrow {{\alpha^{T} -\alpha \over\alpha^{T}-1}}={1\over 2}\:. \end{aligned} $$

Thus, memory is only operative if α is greater than a critical \( { \alpha_{T} } \) that verifies:

$$ { \alpha^{T}_{T}-2\alpha_{T}+1=0\:, } $$
(1)

in which case cells will be featured at T with state values different from the last one. The initial operative values are \( \alpha_{3}= 0.61803 \) and \( \alpha_{4}=0.5437 \). When \( T\rightarrow \infty \), Eq. (1) becomes \( -2\alpha_{\infty}+1=0 \); thus, in the \( k=2 \) scenario, α-memory is not effective if \( \alpha \le 0.5 \).
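For illustration only, the critical values \( \alpha_T \) of Eq. (1) can be approximated numerically, e.g. by bisection on the non-trivial root in (0.5, 1):

```python
def critical_alpha(T, tol=1e-12):
    """Root of alpha**T - 2*alpha + 1 = 0 lying in (0.5, 1), i.e. Eq. (1)."""
    f = lambda a: a**T - 2.0 * a + 1.0
    lo, hi = 0.5, 1.0 - 1e-9        # exclude the trivial root at alpha = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:              # still to the left of the root
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(critical_alpha(3), 5))   # 0.61803 (the golden-ratio conjugate)
print(round(critical_alpha(4), 5))   # 0.54369
```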

Figure 2

The 2D parity rule with memory up to \( { T=15 } \)

A Worked Example: The Parity Rule

The so-called parity rule states: a cell is alive if the number of its live neighbors is odd, and dead otherwise. Figure 2 shows the effect of memory on the parity rule starting from a single live cell in the Moore neighborhood. In accordance with the values of \( \alpha_3 \) and \( \alpha_4 \) given above: (i) the pattern at \( T=4 \) is the ahistoric one if \( \alpha\le0.6 \) and is altered when \( \alpha \ge 0.7 \), and (ii) the patterns at \( T=5 \) for \( \alpha=0.54 \) and \( \alpha=0.55 \) differ.
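A compact sketch of this scenario (Python with numpy; names are illustrative, and parity is taken over the eight Moore neighbors as stated above) could read:

```python
import numpy as np

def parity_with_memory(N=61, T_max=15, alpha=0.8):
    """2D parity rule (alive iff the number of live Moore neighbours is odd),
    applied to the featured states of an alpha-memory CA, single initial live cell."""
    sigma = np.zeros((N, N), dtype=int)
    sigma[N // 2, N // 2] = 1
    omega = sigma.astype(float)          # memory charge per cell
    Omega = 1.0
    for _ in range(T_max - 1):
        m = omega / Omega                # weighted mean of past states
        s = np.where(m > 0.5, 1, np.where(m < 0.5, 0, sigma))   # ties keep the last state
        nsum = sum(np.roll(np.roll(s, di, 0), dj, 1)
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0))
        sigma = nsum % 2                 # unaltered parity transition on featured states
        omega = alpha * omega + sigma    # accumulate the memory charge
        Omega = alpha * Omega + 1.0
    return sigma
```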

Memory levels that are not low tend to freeze the dynamics from the early time-steps, e.g. over 0.54 in Fig. 2. In the particular case of full memory, small oscillators of short range in time are frequently generated, such as the period-two oscillator that appears as early as \( T=2 \) in Fig. 2. The group of evolution patterns shown in the [0.503,0.54] interval of α variation in Fig. 2 is rather unexpected for the parity rule, as they look too sophisticated for such a simple rule. By contrast, the evolution patterns with very small memory, \( \alpha =0.501 \), resemble those of the ahistoric model in Fig. 2. But this similarity breaks down later on, as Fig. 3 reveals: from \( T=19 \), the parity rule with minimal memory evolves producing patterns notably different from the ahistoric ones. These patterns tend to be framed in squares of size not over \( T \times T \), whereas in the ahistoric case the patterns tend to be framed in \( 2T\times 2T \) square regions, so even minimal memory induces a very notable reduction of the affected cell area in the scenario of Fig. 2. The patterns of the featured cells tend not to be far from the actual ones, albeit examples of notable divergence can be traced in Fig. 2. In the particular minimal-memory scenario of Fig. 2, that of \( \alpha=0.501 \), memory has no effect up to \( T=9 \), when the pattern of featured live cells reduces to the initial one; afterwards both evolutions are fairly similar up to \( T=18 \), but at this time step the two kinds of patterns differ notably, and from then on the evolution patterns in Fig. 3 diverge notably from the ahistoric ones.

Figure 3

The 2D parity rule with \( { \alpha=0.501 } \) memory starting from a single site live cell up to \( { T=55 } \)

Giving consideration to previous states (historic memory) in two-dimensional CA tends to confine the disruption generated by a single live cell. As a rule, full memory tends to generate oscillators, and the less historic information is retained, i.e. the smaller the α value, the smoother the approach to the ahistoric model. But the transition obtained by decreasing the memory factor from \( \alpha =1.0 \) (full memory) to \( \alpha = 0.5 \) (effectively the ahistoric model) is not always regular, and some kind of erratic effect of memory can be traced.

The inertial (or conserving) effect of memory dramatically changes the dynamics of the semitotalistic LIFE rule. Thus, (i) the vividness that some small clusters exhibit in LIFE has not been detected in LIFE with memory; in particular, the glider in LIFE does not glide with memory, but stabilizes very close to its initial position as the tub, (ii) as the size of a configuration increases, live clusters often tend to persist with a higher number of live cells in LIFE with memory than in the ahistoric formulation, and (iii) a single mutant appearing in a stable agar can lead to its destruction in the ahistoric model, whereas with memory its effect tends to be restricted to its proximity [26].

One‐Dimensional CA

Elementary rules are one‐dimensional, two-state rules operating on nearest neighbors. Following Wolfram's notation, these rules are characterized by a sequence of binary values \( { (\beta) } \) associated with each of the eight possible triplets

$$ \begin{aligned} &\big(\sigma^{(T)}_{i-1},\sigma^{(T)}_{i},\sigma^{(T)}_{i+1}\big)\::\\ &\begin{array}{cccccccc} 111 & 110 & 101 & 100 & 011 & 010 & 001 & 000 \\ \beta_1 & \beta_2 & \beta_3 & \beta_4 & \beta_5 & \beta_6 & \beta_7 & \beta_8 \\ \end{array} \end{aligned} $$

The rules are conveniently specified by their rule number \( { {\mathcal R}= \sum^{8}_{i=1}\beta_{i}2^{8-i} } \). Legal rules are reflection symmetric (\( { \beta_{5}=\beta_{2}, \beta_{7}=\beta_{4} } \)), and quiescent (\( { \beta_{8}=0 } \)), restrictions that leave 32 possible legal rules.
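The same embedding for elementary rules, evolving from a single live cell with the (unaltered) rule applied to the featured states, can be sketched as follows (Python; illustrative names):

```python
def elementary_with_memory(rule, T_max, alpha, N=129):
    """Elementary CA `rule` (Wolfram numbering) with alpha-weighted memory,
    periodic boundaries, starting from a single live centre cell."""
    table = [(rule >> k) & 1 for k in range(8)]          # table[4*l + 2*c + r]
    sigma = [0] * N
    sigma[N // 2] = 1
    omega, Omega = [float(v) for v in sigma], 1.0
    history = [sigma[:]]
    for _ in range(T_max - 1):
        s = [1 if w / Omega > 0.5 else 0 if w / Omega < 0.5 else v
             for w, v in zip(omega, sigma)]              # featured states
        sigma = [table[4 * s[(i - 1) % N] + 2 * s[i] + s[(i + 1) % N]]
                 for i in range(N)]
        omega = [alpha * w + v for w, v in zip(omega, sigma)]
        Omega = alpha * Omega + 1.0
        history.append(sigma[:])
    return history

patterns = elementary_with_memory(90, 63, 0.8)   # rule 90, whose evolution memory soon extinguishes
```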

Figure 4

Elementary, legal rules with memory from a single site live cell

Figure 4

(continued)

Figure 4 shows the spatio-temporal patterns of legal rules affected by memory when starting from a single live cell [17]. Patterns are shown up to \( T= 63 \), with the memory factor varying from 0.6 to 1.0 in 0.1 intervals, and also adopting values close to the limit of its effectivity: 0.5. As a rule, the transition from the \( \alpha =1.0 \) (fully historic) to the ahistoric scenario is fairly gradual, so that the patterns become more expanded as less historic memory is retained (smaller \( \alpha \)). Rules 50, 122, 178, 250, 94, 222 and 254 are paradigmatic of this smooth evolution. Rules 222 and 254 are not included in Fig. 4 as they evolve like rule 94 but with the inside of the patterns full of active cells. Rules 126 and 182 also present a gradual evolution, although their patterns at high levels of memory hardly resemble the ahistoric ones. Examples without a smooth effect of memory are also present in Fig. 4: (i) rule 150 is sharply restrained at \( \alpha=0.6 \), (ii) the important rule 54 becomes extinct in [0.8,0.9], but not with full memory, (iii) the rules in the group {18,90,146,218} become extinct from \( \alpha=0.501 \); memory kills the evolution of these rules as early as \( T=4 \) for α values over \( \alpha_3 \) (thus over 0.6 in Fig. 4): after \( T=3 \) all the cells, even the two outer cells alive at \( T=3 \), are featured as dead, and (iv) rule 22 becomes extinct for \( \alpha=0.501 \), but not at 0.507, 0.6, or 0.7; it becomes extinct again at 0.8 and 0.9, and finally generates an oscillator with full memory. It has been argued that rules 18, 22, 122, 146 and 182 simulate rule 90 in that their behavior coincides when restricted to certain spatial subsequences. Starting with a single site live cell, the coincidence fully applies in the historic model for rules 90, 18 and 146. Rule 22 shares with these rules the extinction for high α values, with the notable exception of no extinction in the fully historic model. Rules 122 and 182 diverge in their behavior: there is a gradual decrease in the width of the evolving patterns as α increases, but they do not reach extinction.

Figure 5

Elementary, legal rules with memory starting at random

Figure 5

(continued)

Figure 5

(continued)

Figure 5 shows the effect of memory on legal rules when starting at random: the values of sites are initially uncorrelated and chosen at random to be 0 (blank) or 1 (gray) with probability 0.5. Differences in patterns resulting from reversing the center site value are shown as black pixels. Patterns are shown up to \( T=60 \), in a line of size 129 with periodic boundary conditions imposed on the edges. Only the nine legal rules which generate non-periodic patterns in the ahistoric scenario are significantly affected by memory. Inverted triangles dominate the ahistoric patterns of Fig. 5, a common appearance that memory tends to eliminate.

History has a dramatic effect on Rule 18. Even at the low value of \( \alpha = 0.6 \), the appearance of its spatio-temporal pattern fully changes: a number of isolated periodic structures are generated, far from the distinctive inverted-triangle world of the ahistoric pattern. For \( \alpha = 0.7 \), the live structures are fewer, anticipating the extinction found in [0.8,0.9]. In the fully historic model, simple periodic patterns survive.

Rule 146 is affected by memory in much the same way as Rule 18, because their binary codes differ only in their \( \beta_{1} \) value. The spatio-temporal patterns of rule 182 and its equivalent rule 146 are reminiscent of each other, though those of Rule 182 look like a photographic negative of those of Rule 146.

The effect of memory on rule 22 and on the complex rule 54 is similar. Their spatio-temporal patterns at \( \alpha =0.6 \) and \( \alpha =0.7 \) keep the essentials of the ahistoric ones, although the inverted triangles become enlarged and tend to have a more elaborate basis. A notable discontinuity is found for both rules when ascending in the value of the memory factor: at \( \alpha =0.8 \) and \( \alpha =0.9 \) only a few simple structures survive. Unexpectedly, the patterns of the fully historic scenario differ markedly from the others, showing a high degree of synchronization.

The four remaining chaotic legal rules (90, 122, 126 and 150) show a much smoother evolution from the ahistoric to the historic scenario: no pattern evolves either to full extinction or to the preservation of only a few isolated persistent propagating structures (solitons). Rules 122 and 126 evolve in a similar form, showing a high degree of synchronization in the fully historic model.

As a rule, the effect of memory on the differences in patterns (DP) resulting from reversing the value of the initial center site is reminiscent of that on the spatio-temporal patterns, albeit this very much depends on the actual simulation run. In the case of rule 18, for example, damage is not present in the simulation of Fig. 5. The group of rules 90, 122, 126 and 150 shows a, let us say canonical, fairly gradual evolution from the ahistoric to the historic scenario, so that the DP appear more constrained as more historic memory is retained, with no extinction for any α value. Figure 6 shows the evolution of the fraction \( \rho_{T} \) of sites with value 1, starting at random (\( \rho_{0}=0.5 \)). The simulation is implemented for the same rules as in Fig. 4, but with a notably wider lattice: \( N=500 \). A visual inspection of the plots in Fig. 6 ratifies the general features observed in the patterns of Fig. 5 regarding density. The same holds for damage spreading: as a rule, memory depletes the damaged region.

Figure 6

Evolution of the density starting at random in elementary legal rules. Color code: blue → full memory, black \( { \rightarrow \alpha=0.8 } \), red → ahistoric model

In one-dimensional \( r=2 \) CA, the value of a given site depends on the values of its nearest and next-nearest neighbors. Totalistic \( r=2 \) rules with memory have the form: \( \sigma^{(T+1)}_{i} = \phi\big(s^{(T)}_{i-2}+s^{(T)}_{i-1}+s^{(T)}_{i}+s^{(T)}_{i+1}+s^{(T)}_{i+2} \big) \). The effect of memory on these rules follows the way traced in the \( r=1 \) context, albeit with a rich casuistry studied in [14].

Probabilistic CA

So far the CA considered are deterministic. In order to study perturbations to deterministic CA as well as transitional changes from one deterministic CA to another, it is natural to generalize the deterministic CA framework to the probabilistic scenario. In the elementary scenario, the β are replaced by probabilities

$$ \begin{aligned} &p=P\big(\sigma^{(T+1)}_{i}=1\mid\sigma^{(T)}_{i-1},\sigma^{(T)}_{i},\sigma^{(T)}_{i+1}\big)\::\\ &\begin{array}{cccccccccc} 111 & 110 & 101 & 100 & 011 & 010 & 001 & 000 & &\\ \beta_1 & \beta_2 & \beta_3 & \beta_4 & \beta_5 & \beta_6 & \beta_7 & \beta_8 & \quad & \beta \in \{0,1\} \\ \hline p_1 & p_2 & p_3 & p_4 & p_5 & p_6 & p_7 & p_8 & \quad & p\in [0,1]\\ \end{array} \end{aligned} $$

As in the deterministic scenario, memory can be embedded in probabilistic CA (PCA) by featuring cells by a summary of past states s i instead of by their last state \( { \sigma_i } \): \( { p={P}\big(\sigma^{(T+1)}_{i}=1/{s}^{(T)}_{i-1},{s}^{(T)}_{i},{s}^{(T)}_{i+1}{\big)} } \). Again, memory is embedded into the characterization of cells but not in the construction of the stochastic transition rules, as done in the canonical approach to PCA. We have explored the effect of memory on three different subsets \( { \big(0,p_{2},0,p_{4},p_{2},0,p_{4},0\big) } \), \( { \big(0,0,p_{3},1,0,p_{6},1,0\big) } \), and \( { \big(p_{1},p_{2},p_{1},p_{2},p_{2},0,p_{2},0\big) } \) in [9].
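A sketch of one synchronous update of such a PCA with memory (Python; the probability values chosen here are arbitrary, for illustration only):

```python
import random

def pca_step_with_memory(s, p):
    """One update of an elementary probabilistic CA in which the probability of
    becoming 1 is conditioned on the featured states s of the neighbourhood
    (periodic boundaries). `p` maps each triplet to P(sigma = 1)."""
    N = len(s)
    return [1 if random.random() < p[(s[(i - 1) % N], s[i], s[(i + 1) % N])] else 0
            for i in range(N)]

# probability table for the subset (0, p2, 0, p4, p2, 0, p4, 0), with arbitrary p2, p4
p2, p4 = 0.6, 0.3
p = {(1, 1, 1): 0.0, (1, 1, 0): p2, (1, 0, 1): 0.0, (1, 0, 0): p4,
     (0, 1, 1): p2, (0, 1, 0): 0.0, (0, 0, 1): p4, (0, 0, 0): 0.0}
```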

Other Memories

A number of average-like memory mechanisms can readily be proposed by using weights different from that implemented in the α-memory mechanism: \( \delta(t)=\alpha^{T-t} \). Among the plausible choices of δ, we mention the weights \( \delta(t)=t^{c} \) and \( \delta(t)=c^{t} \), \( c \in \mathbb{N} \), in which the larger the value of c, the more heavily the recent past is taken into account, and consequently the closer the scenario is to the ahistoric model [19,21]. Both weights allow for integer-based arithmetic (à la CA), comparing \( 2\omega^{(T)} \) to \( \Omega(T) \) to get the featuring states s (a clear computational advantage over the α-based model), and are accumulative with respect to the charge: \( \omega^{(T)}_{i}=\omega^{(T-1)}_{i}+T^{c}\sigma_{i}^{(T)} \), \( \omega^{(T)}_{i}=\omega^{(T-1)}_{i}+c^{T}\sigma_{i}^{(T)} \). Nevertheless, they share the same drawback: the powers explode at high values of t, even for \( c=2 \).

Limited trailing memory would keep memory of only the last τ states. This is implemented in the context of average memory as \( \omega^{(T)}_i = \sum^{T}_{t=\top}\delta(t)\sigma^{(t)}_{i} \), with \( \top = \max(1,T-\tau+1) \). Limiting the trailing memory brings the model closer to the ahistoric model (reached with \( \tau = 1 \)). In the geometrically discounted method, such an effect is more appreciable when the value of α is high, whereas at low \( \alpha \) values (already close to the ahistoric model when memory is not limited) the effect of limiting the trailing memory is not so important. In the \( k=2 \) context, if \( \tau=3 \), provided that \( \alpha > \alpha_{3} = 0.61803 \), the memory mechanism turns out to be that of selecting the mode of the last three states: \( s^{(T)}_{i}=\text{mode}\big(\sigma^{(T-2)}_{i},\sigma^{(T)}_{i},\sigma^{(T-1)}_{i}\big) \), i.e. the elementary rule 232.

Figure 7

Legal (first row of patterns) and quiescent asymmetric elementary rules significantly affected by the mode of the three last states of memory

Figure 7 shows the effect of this kind of memory on legal rules. As is known, history has a dramatic effect on Rules 18, 90, 146 and 218, as their patterns die out as early as \( T=4 \). The case of Rule 22 is particular: two branches are generated at \( T=17 \) in the historic model; the patterns of the remaining rules in the historic model are very reminiscent of the ahistoric ones, but, so to say, compressed. Figure 7 also shows the effect of memory on some relevant quiescent asymmetric rules. Rule 2 shifts a single site live cell one space at every time-step in the ahistoric model; with memory, the pattern dies at \( T=4 \). This evolution is common to all rules that just shift a single site cell without increasing the number of living cells at \( T=2 \); this is the case of the important rules 184 and 226. The patterns generated by rules 6 and 14 are rectified by memory (in the sense that the lines in the spatio-temporal pattern have a gentler slope), in such a way that the total number of live cells in the historic and ahistoric spatio-temporal patterns is the same. Again, the historic patterns of the remaining rules in Fig. 7 seem, as a rule, like the ahistoric ones compressed [22].

Figure 8

The Rule 150 with elementary rules up to \( { R=125 } \) as memory

Elementary rules (ER, noted f) can in turn act as memory rules:

$$ s^{(T)}_{i}=f\big(\sigma^{(T-2)}_{i},\sigma^{(T)}_{i},\sigma^{(T-1)}_{i}\big) $$

Figure 8 shows the effect of ER memories up to \( R=125 \) on rule 150 starting from a single site live cell, up to \( T=13 \). The effect of ER memories with \( R > 125 \) on rule 150, as well as on rule 90, is shown in [23]. In the latter case, complementary memory rules (rules whose rule numbers add up to 255) have the same effect on rule 90 (regardless of the role played by the three last states in ϕ and of the initial configuration). In the ahistoric scenario, rules 90 and 150 are linear (or additive): i.e., any initial pattern can be decomposed into the superposition of patterns from a single site seed. Each of these configurations can be evolved independently and the results superposed (modulo two) to obtain the final complete pattern. The additivity of rules 90 and 150 remains in the historic model with linear memory rules.
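The mechanism can be sketched as follows (Python; illustrative names, with the argument order used in the text). Taking rule 232 (majority of three) as the memory rule f reproduces the τ = 3 mode memory of the preceding subsection:

```python
def er_memory_featured(history, f_rule):
    """Featured states s^(T) when the elementary rule `f_rule` (Wolfram number)
    acts as memory on each cell's three last states, fed to f in the order
    (sigma^(T-2), sigma^(T), sigma^(T-1))."""
    table = [(f_rule >> k) & 1 for k in range(8)]
    older, prev, last = history[-3], history[-2], history[-1]
    return [table[4 * a + 2 * c + b]          # triplet (a, c, b)
            for a, b, c in zip(older, prev, last)]

# rule 232 as memory: majority of the three last states of each cell
s = er_memory_featured([[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]], 232)
print(s)   # [0, 1, 0, 0]
```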

Figure 9 shows the effect of elementary rules as memory on the 2D parity rule with the von Neumann neighborhood, starting from a single site live cell. The figure shows the patterns from \( T=4 \) onward. The consideration of CA rules as memory induces a fairly unexplored explosion of new patterns.

Figure 9

The parity rule with elementary rules as memory. Evolution from \( T=4 \) to \( T=15 \) in the von Neumann neighborhood, starting from a single site live cell

CA with Three States (k = 3)

This section deals with CA with three possible values at each site \( (k=3) \), denoted \( \{0,1,2\} \). The rounding mechanism is now implemented by comparing the unrounded weighted mean m to the hallmarks 0.5 and 1.5, assigning the last state in case of equality to either of these values. Thus,

$$ s^{(T)}=0 \ \text{if}\ m^{(T)} < 0.5\:,\quad s^{(T)}=1 \ \text{if}\ 0.5 < m^{(T)} < 1.5\:,\quad s^{(T)}=2 \ \text{if}\ m^{(T)} > 1.5\:,\quad s^{(T)}= \sigma^{(T)} \ \text{if}\ m^{(T)}=0.5 \ \text{or}\ m^{(T)}=1.5 \;. $$

In the most unbalanced cell dynamics, historic memory takes effect after time step T only if \( \alpha > \alpha_{T} \), with \( 3\alpha^{T}_{T}-4\alpha_{T}+1= 0 \), which in the temporal limit becomes \( -4\alpha_{*}+1= 0 \Leftrightarrow \alpha_{*}=0.25 \). In general, in CA with k states (labeled 0 to \( k - 1 \)), the characteristic equation at T is \( (2k-3) \alpha^{T}_{T} - 2(k - 1) \alpha_{T} + 1 = 0 \), which becomes \( -2(k-1)\alpha_{*}+1=0 \) in the temporal limit. It is then concluded that memory does not affect the scenario if \( \alpha \le \alpha_{*}(k) = {1}/{(2(k-1))} \).
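These thresholds and the k = 3 rounding can be sketched as (Python; illustrative names):

```python
def alpha_limit(k):
    """Limit value below which memory is never operative for k states: 1/(2(k-1))."""
    return 1.0 / (2 * (k - 1))

def featured_state_k3(m, last_sigma):
    """Rounding of the weighted mean m to {0, 1, 2}; ties keep the last state."""
    if m == 0.5 or m == 1.5:
        return last_sigma
    if m < 0.5:
        return 0
    return 1 if m < 1.5 else 2

print(alpha_limit(2), alpha_limit(3))   # 0.5  0.25
```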

We study first totalistic rules: \( \smash{\sigma^{(T+1)}_{i} = \phi \big (\sigma^{(T)}_{i-1}} +\sigma^{(T)}_{i}+\sigma^{(T)}_{i+1}{\big)} \), characterized by a sequence of ternary values \( { (\beta_{\texttt{s}}) } \) associated with each of the seven possible values of the sum (s) of the neighbors: \( (\beta_{6}, \beta_{5}, \beta_{4}, \beta_{3}, \beta_{2}, \beta_{1}, \beta_{0}) \), with associated rule number \( \mathcal{R} = \sum^{6}_{\texttt{s}=0}\beta_{\texttt{s}}3^{\texttt{s}} \in [0,2186]. \)

Figure 10 shows the effect of memory on quiescent \( (\beta_{0}=0) \) parity rules, i.e. rules with \( \beta_{1}, \beta_{3} \) and \( \beta_{5} \) non-null and \( \beta_{2}=\beta_{4}=\beta_{6}=0 \). Patterns are shown up to \( T=26 \). The pattern for \( \alpha =0.3 \) is shown to test its proximity to the ahistoric one (recall that if \( \alpha \le 0.25 \) memory has no effect). Starting with a single site seed it can be concluded, regarding proper three-state rules such as those in Fig. 10, that: (i) as an overall rule the patterns become more expanded as less historic memory is retained (smaller \( \alpha \)); this characteristic growth-inhibiting effect of memory is traced on rules 300 and 543 in Fig. 10, (ii) the transition from the fully historic to the ahistoric scenario tends to be gradual in regard to the amplitude of the spatio-temporal patterns, although their composition can differ notably, even at close α values, and (iii) in contrast to the two-state scenario, memory fires up the patterns of some three-state rules that die out in the ahistoric model, and no rule with memory dies out. Thus, the effect of memory on rules 276, 519, 303 and 546 is somewhat unexpected: they die out at \( \alpha \le 0.3 \), but at \( \alpha =0.4 \) the pattern expands, the expansion being inhibited (in Fig. 10) only at \( \alpha \ge 0.8 \). This activation under memory of rules that die at \( T=3 \) in the ahistoric model is infeasible in the \( k=2 \) scenario.

Figure 10

Parity \( k=3 \) rules starting from a single \( \sigma=1 \) seed. The red cells are at state 1, the blue ones at state 2

The features in the evolving patterns starting from a single seed in Fig. 10 are qualitatively reflected starting at random as shown with rule 276 in Fig. 11, which is also activated (even at \( { \alpha =0.3) } \) when starting at random. The effect of average memory (α and integer‐based models, unlimited and limited trailing memory, even \( { \tau=2 } \)) and that of the mode of the last three states has been studied in [21].

Figure 11

The \( { k=3, {\mathcal R}=276 } \) rule starting at random

When working with more than two states, averaging has the inherent consequence of biasing the featured state toward the mean value, which is 1 in the \( k=3 \) case. That explains the redshift in the previous figures. This led us to focus on a fairer memory mechanism in what follows: the mode. Mode memory allows for the manipulation of pure symbols, avoiding any arithmetic.

In excitable CA, the three states are: resting (0), excited (1) and refractory (2). State transitions from excited to refractory and from refractory to resting are unconditional: they take place independently of a cell's neighborhood state: \( \sigma^{(T)}_{i}=1 \rightarrow \sigma^{(T+1)}_{i}=2 \), \( \sigma^{(T)}_{i}=2 \rightarrow \sigma^{(T+1)}_{i}=0 \). In [15] the excitation rule adopts a Pavlovian phenomenon of defensive inhibition: when the strength of the applied stimulus exceeds a certain limit the system ‘shuts down’; this can be naively interpreted as an inbuilt protection against energy loss and exhaustion. To simulate the phenomenon of defensive inhibition we adopt interval excitation rules [2], so that a resting cell becomes excited only if one or two of its neighbors are excited: \( \sigma^{(T)}_{i}=0 \rightarrow \sigma^{(T+1)}_{i}=1 \) if \( \sum_{j\in\mathcal{N}_{i}}\big(\sigma^{(T)}_{j}=1\big) \in \{1,2\} \) [3].

Figure 12 shows the effect of memory of the mode of the last three time-steps on the defensive-inhibition CA rule with the Moore neighborhood, starting from a simple configuration. At \( T=3 \) the outer excited cells in the actual pattern are featured not as excited but as resting cells (twice resting versus once excited), and the series of evolving patterns with memory diverges from the ahistoric evolution at \( T=4 \), becoming less expanded. Again, memory tends to restrain the evolution.

Figure 12

Effect of mode memory on the defensive inhibition CA rule

The effect of memory on the beehive rule, a totalistic two‐dimensional CA rule with three states implemented in the hexagonal tessellation [57] has been explored in [13].

Reversible CA

The second-order-in-time implementation based on subtraction modulo the number of states (noted ⊖): \( \sigma^{(T+1)}_{i}= \phi(\sigma^{(T)}_{j}\in \mathcal{N}_{i}) \ominus\sigma^{(T-1)}_{i} \), readily reverses as \( \sigma^{(T-1)}_{i}=\phi(\sigma^{(T)}_{j}\in \mathcal{N}_{i}) \ominus\sigma^{(T+1)}_{i} \). To preserve the reversible feature, memory has to be embedded only in the pivotal component of the rule transition, so: \( \sigma^{(T-1)}_{i}=\phi(s^{(T)}_{j}\in \mathcal{N}_{i}) \ominus\sigma^{(T+1)}_{i} \).

For reversing from T it is necessary to know not only \( { \sigma_{i} ^{(T)} } \) and \( { \sigma_{i}^{(T+1)} } \) but also \( { \omega_{i}^{(T)} } \) to be compared to \( { \Omega (T) } \), to obtain:

$$ s_{i}^{(T)}= \begin{cases} 0 & \text{if\quad} 2\omega_{i}^{(T)} < \Omega(T)\\[1mm] \sigma_{i}^{(T+1)} & \text{if\quad} 2\omega_{i}^{(T)} =\Omega(T)\\[1mm] 1 & \text{if\quad} 2\omega_{i}^{(T)} > \Omega(T)\:.\\ \end{cases} $$

Then to progress in the reversing, to obtain \( s_{i}^{(T-1)}= {\hbox{round}}\big(\displaystyle\omega_{i}^{(T-1)}/{\Omega(T-1)}\big) \), it is necessary to calculate \( \omega^{(T-1)}_i=\big(\omega^{(T)}_i-\sigma^{(T)}_{i}\big)/\alpha \). But in order to avoid dividing by the memory factor (recall that operations with real numbers are not exact in computer arithmetic), it is preferable to work with \( \gamma_{i}^{(T-1)} = \omega_{i}^{(T)}-\sigma_{i}^{(T)} \), and to compare these values to \( \Gamma(T-1) = \sum^{T-1}_{t=1}\alpha ^{T-t} \). This leads to:

$$ s_{i}^{(T-1)}= \begin{cases} 0 & \text{if\quad} 2\gamma_{i}^{(T-1)} < \Gamma(T-1)\\[1mm] \sigma_{i}^{(T)} & \text{if\quad} 2\gamma_{i}^{(T-1)} = \Gamma(T-1)\\[1mm] 1 & \text{if\quad} 2\gamma_{i}^{(T-1)} > \Gamma(T-1)\:.\\ \end{cases} $$

In general: \( \gamma_{i}^{(T-\tau)} = \gamma_{i}^{(T-\tau+1)} - \alpha ^{\tau-1}\sigma_{i}^{(T-\tau+1)}\:,\: {\Gamma(T-\tau)} = \Gamma(T-\tau+1)-\alpha ^{\tau-1} \) .
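The backward bookkeeping can be sketched as follows (Python; illustrative names, k = 2; the full reversal also applies the rule and the ⊖ subtraction at every step, which is omitted here):

```python
def backward_featured_states(sigma_seq, omega_T, Omega_T, alpha):
    """Recover s^(T-1), s^(T-2), ... during a reversal, given the stored
    configurations sigma^(1..T) and the forward charges omega^(T), Omega(T),
    via the gamma/Gamma recursion (no division by alpha)."""
    T = len(sigma_seq)
    gamma, Gamma = list(omega_T), Omega_T
    featured = []
    for tau in range(1, T):
        sigma_next = sigma_seq[T - tau]          # sigma^(T - tau + 1)
        w = alpha ** (tau - 1)
        gamma = [g - w * v for g, v in zip(gamma, sigma_next)]
        Gamma -= w
        featured.append([0 if 2 * g < Gamma else 1 if 2 * g > Gamma else v
                         for g, v in zip(gamma, sigma_next)])
    return featured
```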

Figure 13

The reversible parity rule with memory

Figure 13 shows the effect of memory on the reversible parity rule starting from a single site live cell, i.e. the scenario of Figs. 2 and 3 with the reversible qualification. As expected, the simulations corresponding to \( \alpha=0.6 \) or below show the ahistoric pattern at \( T=4 \), memory alters the pattern from \( \alpha=0.7 \) onward, and the patterns at \( T=5 \) for \( \alpha=0.54 \) and \( \alpha=0.55 \) differ. Again, in the reversible formulation with memory, (i) the configuration of the patterns is notably altered, (ii) the speed of diffusion of the affected area is notably reduced, even by minimal memory (\( \alpha=0.501 \)), and (iii) high levels of memory tend to freeze the dynamics from the early time-steps.

We have studied the effect of memory in the reversible formulation of CA in many scenarios, e. g., totalistic, \( { k=r=2 } \) rules [7], or rules with three states [21].

Reversible systems are of interest since they preserve information and energy and allow unambiguous backtracking. They are studied in computer science in order to design computers which would consume less energy [51]. Reversibility is also an important issue in fundamental physics [31,41,52,53]. Gerard 't Hooft, in a speculative paper [34], suggests that a suitably defined deterministic, local reversible CA might provide a viable formalism for constructing field theories on a Planck scale. Svozil [50] also asks for changes in the underlying assumptions of current field theories in order to make their discretization appear more CA-like. Applications of reversible CA with memory in cryptography are being scrutinized [30,42].

Heterogeneous CA

CA on networks have arbitrary connections but, as proper CA, the transition rule is identical for all cells. This generalization of the CA paradigm addresses the intermediate class between CA and Boolean networks (BN, considered in the following section), in which rules may be different at each site.

Figure 14

The parity rule with four inputs: effect of memory and random rewiring. Distance between two consecutive patterns in the ahistoric model (red) and memory models of α levels 0.6, 0.7, 0.8, 0.9 (dotted) and 1.0 (blue)

In networks, two topological extremes exist, random and regular networks, which display opposite geometric properties. Random networks have low clustering coefficients and a short average path length between nodes, commonly known as the small-world property. Regular graphs, on the other hand, have a large average path length between nodes and high clustering coefficients.

In an attempt to build networks with the characteristics observed in real networks, a large clustering coefficient together with the small-world property, Watts and Strogatz (WS, [54]) proposed a model built by randomly rewiring a regular lattice. Thus, the WS model interpolates between regular and random networks, taking a single new parameter, the random rewiring degree, i.e. the probability that any node redirects a connection, randomly, to any other node. The WS model displays the high clustering coefficient common to regular lattices as well as the small-world property (which has been related to faster information flow). The long-range links introduced by the randomization procedure dramatically reduce the diameter of the network, even when very few links are rewired.

Figure 14 shows the effect of memory and topology on the parity rule with four inputs in a lattice of size \( 65\times 65 \) with periodic boundary conditions, starting at random. As expected, memory depletes the Hamming distance between two consecutive patterns with respect to the ahistoric model, particularly when the degree of rewiring is high. With full memory, quasi-oscillators tend to appear. As a rule, the higher the curve the lower the memory factor α, but in the particular case of the regular lattice (and of the lattice with 10% rewiring), the evolution of the distance in the full memory model turns out to be rather atypical, as it remains above that of some memory models with lower α parameters.

Figure 15

Damage up to \( { T=100 } \) in the parity CA of Fig. 14

Figure 15 shows the evolution of the damage spread when reversing the initial state of the \( 3\times3 \) central cells in the initial scenario of Fig. 14. The fraction of cells with the state reversed is plotted in the regular and 10%-rewiring scenarios. The plots corresponding to higher rates of rewiring are very similar to that of the 10% case in Fig. 15. Damage spreads fast as soon as rewiring is present, even to a small extent.

Boolean Networks

In Boolean networks (BN, [38]), unlike in canonical CA, cells may have arbitrary connections and rules may be different at each site. Working with totalistic rules: \( \sigma^{(T+1)}_{i} =\phi_{i} \big(\sum_{j\in\mathcal{N}_{i}}s^{(T)}_{j}\big) \).

Figure 16

Relative Hamming distance between two consecutive patterns. Boolean network with totalistic, \( { K=4 } \) rules in the scenario of Fig. 14

The main features of the effect of memory in Fig. 14 are preserved in Fig. 16: (i) the ordering of the historic networks tends to be stronger with a high memory factor, (ii) with full memory, quasi-oscillators appear (it seems that full memory tends to induce oscillation), and (iii) in the particular case of the regular graph (and to a lesser extent in the networks with low rewiring), the evolution of the full memory model turns out to be rather atypical, as it remains above some of the memory models with lower α parameters. The relative Hamming distance between the ahistoric patterns and those of the historic rewired models tends to be fairly constant around 0.3, after a very short initial transition period.

Figure 17

Evolution of the damage when reversing the initial state of the \( { 3\times3 } \) central cells in the scenario of Fig. 16

Figure 17 shows the evolution of the damage when reversing the initial state of the \( 3\times3 \) central cells. As a rule, in every frame, corresponding to increasing rates of random rewiring, the higher the curve the lower the memory factor α. The damage-vanishing effect induced by memory is apparent in the regular scenario of Fig. 17, but only full memory controls the damage spreading when the rewiring degree is not high; the dynamics with the remaining α levels tend toward the damage propagation that characterizes the ahistoric model. Thus, with up to 10% of connections rewired, full memory notably controls the spreading, but this control capacity tends to disappear with a higher percentage of rewired connections. In fact, with rewiring of 50% or higher, not even full memory seems to be very effective in altering the final rate of damage, which tends to reach a plateau around 30% regardless of the scenario, a level notably coincident with the percolation threshold for site percolation in the simple cubic lattice and with the critical point of the nearest-neighbor Kauffman model on the square lattice [49]: 0.31.

Structurally Dynamic CA

Structurally dynamic cellular automata (SDCA) were suggested by Ilachinski and Halpern [36]. The essential new feature of this model is that the connections between cells are allowed to change according to rules similar in nature to the state transition rules associated with conventional CA. This means that, given certain conditions specified by the link transition rules, links between cells may be created and destroyed; the neighborhood of each cell is dynamic, so state and link configurations of an SDCA are both dynamic and continually interacting.

If cells are numbered 1 to N, their connectivity is specified by an \( N \times N \) connectivity matrix in which \( \lambda_{ij}=1 \) if cells i and j are connected, and 0 otherwise. So now \( \mathcal{N}^{(T)}_{i}=\{ j \,/\, \lambda^{(T)}_{ij}=1\} \) and \( \sigma^{(T+1)}_{i}=\phi\big(\sigma^{(T)}_{j}\in \mathcal{N}^{(T)}_{i}\big) \). The geodesic distance between two cells i and j, \( \delta_{ij} \), is defined as the number of links in the shortest path between i and j. Thus, i and j are direct neighbors if \( \delta_{ij}=1 \), and are next-nearest neighbors if \( \delta_{ij}=2 \), so \( \mathcal{NN}^{(T)}_{i}=\{ j \,/\, \delta^{(T)}_{ij}=2\} \). There are two types of link transition functions in an SDCA: couplers and decouplers; the former add new links, the latter remove links. The coupler and decoupler set determines the link transition rule: \( \lambda^{(T+1)}_{ij}= \psi\big(\lambda^{(T)}_{ij}, \sigma^{(T)}_{i}, \sigma^{(T)}_{j}\big) \).

Instead of introducing the formalism of the SDCA, we deal here with just one example in which the decoupler rule removes all links connecting cells whose two values are zero (\( \lambda^{(T)}_{ij}=1 \rightarrow \lambda^{(T+1)}_{ij}=0 \) iff \( \sigma^{(T)}_i+\sigma^{(T)}_j=0 \)) and the coupler rule adds links between all next-nearest neighbor sites whose two values are one (\( \lambda^{(T)}_{ij}=0 \rightarrow \lambda^{(T+1)}_{ij} = 1 \) iff \( \sigma^{(T)}_{i}+\sigma^{(T)}_{j} = 2 \) and \( j \in \mathcal{NN}^{(T)}_i \)). The SDCA with these transition rules for connections, together with the parity rule for cell states, is implemented in Fig. 18, in which the initial Euclidean lattice with four neighbors (so that the generic cell has eight next-nearest neighbors) is seeded with a \( 3 \times 3 \) block of ones. After the first iteration, most of the lattice structure has decayed as an effect of the decoupler rule, so that the active value cells and links are confined to a small region. After \( T = 6 \), the link and value structures become periodic, with a periodicity of two.
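A sketch of one synchronous step of this particular SDCA (Python with numpy; names are illustrative, and the parity is taken over the currently linked neighbors) might be:

```python
import numpy as np

def sdca_step(sigma, lam):
    """One step of the SDCA example above: parity rule for cell states, decoupler
    removes links between two dead cells, coupler links next-nearest neighbours
    that are both alive. sigma: 0/1 state vector; lam: symmetric 0/1 connectivity matrix."""
    sigma = np.asarray(sigma, dtype=int)
    lam = np.asarray(lam, dtype=int)
    sigma_new = (lam @ sigma) % 2                   # parity of the linked neighbours
    two_step = ((lam @ lam) > 0).astype(int)        # reachable through one intermediate cell
    np.fill_diagonal(two_step, 0)
    nnn = two_step * (1 - lam)                      # next-nearest neighbours
    dead = (sigma == 0).astype(int)
    alive = (sigma == 1).astype(int)
    lam_new = lam.copy()
    lam_new *= 1 - np.outer(dead, dead)             # decoupler: both cells dead
    lam_new |= nnn * np.outer(alive, alive)         # coupler: both cells alive
    return sigma_new, lam_new
```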

Figure 18

The SDCA described in text up to \( { T=6 } \)

Memory can be embedded in links in a similar manner to state values, so that the link between any two cells is featured by a mapping of its previous link values: \( l_{ij}^{(T)}= l\big(\lambda_{ij}^{(1)},\ldots, \lambda_{ij}^{(T)}\big) \). The distance between two cells in the historic model (\( d_{ij} \)) is defined in terms of the l values instead of the λ values, so that i and j are direct neighbors if \( d_{ij}=1 \), and are next-nearest neighbors if \( d_{ij} =2 \). Now \( N_{i}^{(T)}=\{j \,/\, d_{ij}^{(T)}=1\} \) and \( NN^{(T)}_{i}=\{ j \,/\, d^{(T)}_{ij}=2\} \). Generalizing the approach to embedded memory applied to states, the unchanged transition rules (ϕ and ψ) operate on the featured link and cell state values: \( \sigma^{(T+1)}_i=\phi\big(s^{(T)}_j\in N^{(T)}_{i}\big),\;\lambda^{(T+1)}_{ij} = \psi\big(l^{(T)}_{ij},s^{(T)}_{i},s^{(T)}_j\big) \).

Figure 19

The SD cellular automaton introduced in text with weighted memory of factor α. Evolution from \( { T=4 } \) up to \( { T=9 } \) starting as in Fig. 18

Figure 19 shows the effect of α-memory on the cellular automaton introduced above, starting as in Fig. 18. The effect of memory on SDCA in the hexagonal and triangular tessellations is scrutinized in [11].

A plausible wiring dynamics when dealing with excitable CA is that in which the decoupler rule removes all links connecting cells whose two values are refractory (\( \lambda^{(T)}_{ij}= 1 \rightarrow \lambda^{(T+1)}_{ij}=0 \) iff \( \sigma^{(T)}_i=\sigma^{(T)}_{j}=2 \)) and the coupler rule adds links between all next-nearest neighbor sites whose two values are excited (\( \lambda^{(T)}_{ij} = 0 \rightarrow \lambda^{(T+1)}_{ij}= 1 \) iff \( \sigma^{(T)}_i=\sigma^{(T)}_j=1 \) and \( j\in \mathcal{NN}^{(T)}_{i} \)).

Figure 20

The \( { k=3 } \) SD cellular automaton described in text, up to \( { T=4 } \)

Figure 21
figure 24_55

The SD cellular automaton starting as in Fig. 20 at \( { T=20 } \), with no memory (left) and mode memory in both cell states and links

In the SDCA of Fig. 20, the transition rule for cell states is that of the generalized defensive-inhibition rule: a resting cell becomes excited if the ratio of its excited connected neighbors to its total number of connected neighbors lies in the interval [1/8,2/8]. The initial scenario of Fig. 20 is that of Fig. 12 with the wiring network revealed: an Euclidean lattice with eight neighbors, in which the generic cell has 16 next-nearest neighbors. No decoupling occurs at the first iteration in Fig. 20, but the excited cells generate new connections, most of which are lost, together with some of the initial ones, at \( T=3 \). The excited cells at \( T=3 \) generate a crown of new connections at \( T=4 \). Figure 21 shows the ahistoric and mode-memory patterns at \( T=20 \). The figure makes apparent the preserving effect of memory.

Fredkin's reversible construction is feasible in the SDCA scenario by extending the ⊖ operation also to links: \( \lambda^{(T+1)}_{ij}= \psi\big(\lambda^{(T)}_{ij}, \sigma^{(T)}_{i}, \sigma^{(T)}_{j}\big)\ominus \lambda^{(T-1)}_{ij} \). These automata may be endowed with memory as: \( \sigma^{(T+1)}_{i}= \phi \big( s^{(T)}_{j} \in N^{(T)}_{i}\big)\ominus \sigma^{(T-1)}_{i},\; \lambda^{(T+1)}_{ij}= \psi \big(l^{(T)}_{ij}, s^{(T)}_{i}, s^{(T)}_{j} \big) \ominus \lambda^{(T-1)}_{ij} \) [12].

The SDCA seems to be particularly appropriate for modeling the human brain function—updating links between cells imitates variation of synaptic connections between neurons represented by the cells—in which the relevant role of memory is apparent. Models similar to SDCA have been adopted to build a dynamical network approach to quantum space-time physics [45,46]. Reversibility is an important issue at such a fundamental physics level. Technical applications of SDCA may also be traced [47]. Anyway, besides their potential applications, SDCA with memory have an aesthetic and mathematical interest on their own [1,35]. Nevertheless, it seems plausible that further study on SDCA (and Lattice Gas Automata with dynamical geometry [40]) with memory should turn out to be profitable.

Memory in Other Discrete Time Contexts

Continuous-Valued CA

The mechanism of implementation of memory adopted here, keeping the transition rule unaltered but applying it to a function of previous states, can be adopted in any spatialized dynamical system. Thus, historic memory can be embedded in:

  • Continuous-valued CA (or coupled map lattices, in which the state variable ranges in \( \mathbb{R} \) and the transition rule φ is a continuous function [37]), just by considering m instead of σ in the application of the updating rule: \( \sigma^{(T+1)}_{i}={\varphi} \big({m}^{(T)}_{j}\in\mathcal{N}^{(T)}_{i}\big) \). An elementary CA of this kind with memory would be [20]: \( \sigma^{(T+1)}_{i} = {\tfrac{1}{3}}\big(m^{(T)}_{i-1}+ m^{(T)}_{i}+ m^{(T)}_{i+1}\big) \).

  • Fuzzy CA, a sort of continuous CA with states ranging in the real [0,1] interval. An illustration of the effect of memory in fuzzy CA is given in [17]. The illustration operates on the elementary rule 90: \( \sigma^{(T+1)}_{i}= \big(\sigma^{(T)}_{i-1}\wedge (\neg \sigma^{(T)}_{i+1})\big)\vee \big((\neg \sigma^{(T)}_{i-1})\wedge \sigma^{(T)}_{i+1}\big) \), which after fuzzification (\( a\vee b \rightarrow \min(1,a+b),\, a\wedge b \rightarrow ab,\, \neg a \rightarrow 1-a \)) yields \( \sigma^{(T+1)}_{i} = \sigma^{(T)}_{i-1}+\sigma^{(T)}_{i+1}-2\sigma^{(T)}_{i-1}\sigma^{(T)}_{i+1} \); incorporating memory thus gives: \( \sigma^{(T+1)}_{i}= m^{(T)}_{i-1} +m^{(T)}_{i+1} - 2m^{(T)}_{i-1}m^{(T)}_{i+1} \).

  • Quantum CA , such, for example, as the simple 1D quantum CA models introduced in [32]:

    $$ \sigma ^{(T+1)}_{j} = { \frac{1}{N^{1/2}}} {\big (i\delta \sigma} ^{(T)}_{j-1}+ \sigma^{(T)}_{j}+ i\delta ^{*}\sigma ^{(T)}_{j+1}{\big)} \;, $$

    which would become with memory [20]:

    $$ \sigma ^{(T+1)}_{j} = {\frac{1}{N^{1/2}}} \big (i\delta m^{(T)}_{j-1}+ m^{(T)}_{j}+ i\delta ^{*}m^{(T)}_{j+1} \big)\;. $$

Spatial Prisoner's Dilemma

The Prisoner's Dilemma (PD) is a game played by two players (A and B), who may choose either to cooperate (C, or 1) or to defect (D, or 0). Mutual cooperators each score the reward R, mutual defectors score the punishment P; D scores the temptation \( \mathcal{T} \) against C, who scores S (sucker's payoff) in such an encounter. Provided that \( {\mathcal T} > R > P > S \), mutual defection is the only equilibrium strategy pair. Thus, in a single round both players are penalized instead of both being rewarded, but cooperation may be rewarded in an iterated (or spatial) formulation. The game is simplified (while preserving its essentials) if \( P=S=0 \). Choosing \( R=1 \), the model has only one parameter: the temptation \( {\mathcal T}=b \).

In the spatial version of the PD, each player occupies a site (i,j) in a 2D lattice. In each generation the payoff of a given individual (\( p^{(T)}_{i,j} \)) is the sum over all interactions with the eight nearest neighbors and with its own site. In the next generation, an individual cell is assigned the decision (\( d^{(T)}_{i,j} \)) that received the highest payoff among all the cells of its Moore neighborhood. In case of a tie, the cell retains its choice. The spatialized PD (SPD for short) has proved to be a promising tool to explain how cooperation can hold out against the ever-present threat of exploitation [43], a task that presents problems in the classic Darwinian struggle-for-survival framework.

When dealing with the SPD, memory can be embedded not only in choices but also in rewards. Thus, in the historic model dealt with here, at T: (i) the payoffs coming from previous rounds are accumulated (\( \pi^{(T)}_{i,j} \)), and (ii) players are featured by a summary of past decisions (\( \delta^{(T)}_{i,j} \)). Again, in each round or generation, a given cell plays with each of the eight neighbors and itself, and the decision δ in the cell of the neighborhood with the highest π is adopted. This approach to modeling memory has been rather neglected, the usual one being that of designing strategies that specify the choice for every possible outcome in the sequence of historic choices recalled [33,39].
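A sketch of one generation of this historic SPD (Python with numpy; unlimited trailing memory, illustrative names; the limited-memory variant below simply truncates the sums) could read as follows, with the arrays initialized at \( T=1 \) as pi = first-round payoffs, omega = d and Omega = 1:

```python
import numpy as np

def spd_memory_generation(d, pi, omega, Omega, b, alpha):
    """One generation of the historic spatial PD: payoffs are accumulated in pi,
    choices are summarized in delta, and each cell adopts the featured choice of
    the Moore-neighbourhood cell (or itself) with the highest accumulated payoff.
    d: current choices (1 = C, 0 = D) on a toroidal lattice."""
    cells = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    p = np.zeros_like(d, dtype=float)                 # payoff of this round
    for di, dj in cells:                              # eight neighbours plus own site
        nb = np.roll(np.roll(d, di, 0), dj, 1)
        p += np.where(d == 1, 1.0, b) * nb            # R = 1 (C vs C), b (D vs C), 0 otherwise
    pi = alpha * pi + p                               # accumulated payoffs
    omega = alpha * omega + d                         # weighted sum of past choices
    Omega = alpha * Omega + 1.0
    m = omega / Omega
    delta = np.where(m > 0.5, 1, np.where(m < 0.5, 0, d))   # featured choices
    best_pi, new_d = pi.copy(), delta.copy()          # ties favour the own cell
    for di, dj in cells:
        if (di, dj) == (0, 0):
            continue
        pi_nb = np.roll(np.roll(pi, di, 0), dj, 1)
        dl_nb = np.roll(np.roll(delta, di, 0), dj, 1)
        better = pi_nb > best_pi
        best_pi = np.where(better, pi_nb, best_pi)
        new_d = np.where(better, dl_nb, new_d)
    return new_d, pi, omega, Omega
```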

Table 1 shows the initial scenario starting from a single defector if \( { 8b > 9 \Leftrightarrow b > 1.125 } \), which means that neighbors of the initial defector become defectors at \( { T=2 } \).

Table 1 Choices at \( { T=1 } \) and \( { T=2 } \); accumulated payoffs after \( { T=1 } \) and \( { T=2 } \) starting from a single defector in the SPD. \( { b > 9/8 } \)

Nowak and May paid particular attention in their seminal papers to \( { b=1.85 } \), a high but not excessive temptation value which leads to complex dynamics. After \( { T=2 } \), defection can advance to a \( { 5 \times 5 } \) square or be restrained as a \( { 3 \times 3 } \) square, depending on the comparison of \( {8\alpha+ 5\times 1.85} \) (the maximum π value of the recent defectors) with \( { 9\alpha + 9 } \) (the π value of the non-affected players). As \( {8 \alpha+ 5\times 1.85 = 9 \alpha +9 \rightarrow \alpha = 0.25 } \), i. e., if \( { \alpha > 0.25 } \), defection remains confined to a \( { 3 \times 3 } \) square at \( { T=3 } \). Here we see the paradigmatic effect of memory: it tends to avoid the spread of defection.

Figure 22

Frequency of cooperators (f) with memory of the last three iterations. a starting from a single defector, b starting at random (\( { f(1)=0.5 } \)). The red curves correspond to the ahistoric model, the blue ones to the full memory model, the remaining curves to values of α from 0.1 to 0.9 by 0.1 intervals, in which, as a rule, the higher the α the higher the f for any given T

If memory is limited to the last three iterations: \( \pi^{(T)}_{i,j}= \alpha^{2}p^{(T-2)}_{i,j}+ \alpha p^{(T-1)}_{i,j}+p^{(T)}_{i,j},\: m^{(T)}_{i,j}= \big(\alpha^{2} d^{(T-2)}_{i,j}+\alpha d^{(T-1)}_{i,j}+ d^{(T)}_{i,j}\big)/(\alpha^{2}+\alpha+1),\: \Rightarrow \delta^{(T)}_{i,j} = \text{round}(m^{(T)}_{i,j}) \), with the assignments at \( T=2 \): \( \pi^{(2)}_{i,j} = \alpha p^{(1)}_{i,j}+p^{(2)}_{i,j},\; \delta^{(2)}_{i,j}=d^{(2)}_{i,j} \).

Memory has a dramatic restrictive effect on the advance of defection, as shown in Fig. 22. This figure shows the frequency of cooperators (f) starting from a single defector and from a random configuration of defectors in a lattice of size \( 400\times400 \) with periodic boundary conditions when \( b=1.85 \). When starting from a single defector, f at time step T is computed as the frequency of cooperators within the square of size \( (2(T-1)+1)^{2} \) centered on the initial D site. The ahistoric plot reveals the convergence of f to 0.318 (which seems to be the same value regardless of the initial conditions [43]). Starting from a single defector (a), the model with small memory (\( \alpha=0.1 \)) seems to reach a similar f value, but sooner and in a smoother way. The plot corresponding to \( \alpha=0.2 \) still shows an early decay in f that leads it to about 0.6, but higher memory factor values lead f close to or over 0.9 very soon. Starting at random (b), the curves corresponding to \( 0.1 \le \alpha \le 0.6 \) (thus with no memory of choices) do mimic the ahistoric curve but with higher f, whereas for \( \alpha \ge 0.7 \) (with memory also of choices) the frequency of cooperators grows monotonically to reach almost full cooperation: D persists as scattered, unconnected small oscillators (D-blinkers), as shown in Fig. 23. Similar results are found for any temptation value in the parameter region \( 1.8 < b < 2.0 \), in which spatial chaos is characteristic in the ahistoric model. It is then concluded that this short type of memory supports cooperation.

Figure 23

Patterns at \( { T=200 } \) starting at random in the scenario of Fig. 22b

Table 2 Weighted mean degrees of cooperation after \( { T=2 } \) and degree of cooperation at \( { T=3 } \) starting with a single defector in the continuous‐valued SPD with \( { b=1.85 } \)

As a natural extension of the binary model described, the 0-1 assumption underlying the model can be relaxed by allowing for degrees of cooperation in a continuous-valued scenario. Denoting by x the degree of cooperation of player A and by y the degree of cooperation of player B, a consistent way to specify the payoff for values of x and y other than zero or one is simply to interpolate between the extreme payoffs of the binary case. Thus, the payoff that player A receives is:

$$ G_{A}(x,y)=(x,1-x) \left(\begin{matrix}R&S\\{\mathcal T}&P\end{matrix}\right) \left(\begin{matrix}y\\1-y\end{matrix}\right)\:. $$

In the continuous-valued historic formulation, \( \delta \equiv m \), including \( \delta^{(2)}_{i,j} = \big(\alpha d^{(1)}_{i,j}+d^{(2)}_{i,j} \big)/(\alpha+1) \). Table 2 illustrates the initial scenario starting from a single (full) defector. Unlike in the binary model, in which the initial defector never becomes a cooperator, the initial defector cooperates with degree \( \alpha/(1+\alpha) \) at \( T=3 \): its neighbors which received the highest accumulated payoff (those in the corners, with \( \pi^{(2)}=8\alpha+5b > 8b\alpha \)) achieved this mean degree of cooperation after \( T=2 \). Memory dramatically constrains the advance of defection in a smooth way, even for the low level \( \alpha=0.1 \). The effect appears much more homogeneous compared to the binary model, with no special case for high values of α, as memory on decisions is always operative in the continuous-valued model [24]. The effect of unlimited trailing memory on the SPD has been studied in [5,6,7,8,9,10].

Discrete‐Time Dynamical Systems

Memory can be embedded in any model in which time plays a dynamical role. Thus, Markov chains \( \mathbf{p}^{\prime}_{T+1}=\mathbf{p}^{\prime}_{T}\mathbf{M} \) become, with memory, \( \mathbf{p}^{\prime}_{T+1}={\mathbf{\pi}}^{\prime}_{T}\mathbf{M} \), with \( {\mathbf{\pi}}_{T} \) being a weighted mean of the probability distributions up to T: \( {\mathbf{\pi}}_{T}=\pi(\mathbf{p}_{1},\ldots,\mathbf{p}_{T}) \). In such a scenario, even a minimal incorporation of memory notably alters the evolution of \( \mathbf{p} \) [23]. Last but not least, conventional, non-spatialized, discrete dynamical systems become, with memory, \( x_{T+1}=f(m_{T}) \), with \( m_T \) being an average of past values. As an overall rule, memory leads the dynamics to a fixed point of the map f [4].
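A sketch of such a Markov chain with memory (Python with numpy; the α-weighted averaging of the past distributions mirrors the scheme used above for CA, and the matrix is an arbitrary illustration):

```python
import numpy as np

def markov_with_memory(p0, M, T, alpha):
    """Markov chain in which the transition matrix acts on a weighted mean of all
    past distributions rather than on the last one: p'_{T+1} = pi'_T M."""
    p = np.asarray(p0, dtype=float)
    charge, total = p.copy(), 1.0    # alpha-weighted sum of distributions and of weights
    trajectory = [p]
    for _ in range(T):
        pi = charge / total          # featured (averaged) distribution
        p = pi @ M
        charge = alpha * charge + p
        total = alpha * total + 1.0
        trajectory.append(p)
    return trajectory

M = np.array([[0.9, 0.1],
              [0.4, 0.6]])
traj = markov_with_memory([1.0, 0.0], M, 20, alpha=0.8)
```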

Figure 24

Dynamics of the mean values of x (red) and y (blue) starting from any of the points of the \( { 1\times1 } \) square

We will introduce an example of this in the context of the PD game in which players follow the so-called Pavlov strategy: a Pavlov player cooperates if and only if both players opted for the same alternative in the previous move. The name Pavlov stems from the fact that this strategy embodies an almost reflex-like response to the payoff: it repeats its former move if it was rewarded by \( {\mathcal T} \) or R, but switches behavior if it was punished by receiving only P or S. Coding cooperation as 1 and defection as 0, this strategy can be formulated in terms of the choices x of player A (Pavlov) and y of player B as: \( x^{(T+1)}= 1-\mid x^{(T)}-y^{(T)}\mid \). The Pavlov strategy has proved to be very successful in its contests with other strategies [44]. Let us give a simple example of this: suppose that player B adopts an Anti-Pavlov strategy (which cooperates to the extent that Pavlov defects), with \( y^{(T+1)}=1-\mid 1-x^{(T)}-y^{(T)}\mid \). Thus, in an iterated Pavlov-Anti-Pavlov (PAP) contest, with \( T(x,y) =\big(1-\left|x-y\right|,1-\left|1-x-y\right|\big) \), it is \( T(0,0) = T(1,1) = (1,0) \), \( T(1,0) = (0,1) \), and \( T(0,1) = (0,1) \), so that (0,1) turns out to be immutable. Therefore, in an iterated PAP contest, Pavlov will always defect and Anti-Pavlov will always cooperate. Relaxing the 0-1 assumption in the standard formulation of the PAP contest, degrees of cooperation can be considered in a continuous-valued scenario. Now x and y will denote the degrees of cooperation of players A and B respectively, with both x and y lying in [0,1].

In this scenario, not only is (0,1) a fixed point, but also \( T(0.8,0.6) = (0.8,0.6) \). Computer implementation of the iterated PAP tournament turns out to fully disrupt the theoretical dynamics. The errors caused by the finite precision of computer floating-point arithmetic (a common problem in dynamical systems working modulo 1) make the final fate of every point (0,1), with no exceptions: even the theoretical fixed point (0.8,0.6) ends up at (0,1) in the computerized implementation.

A natural way to incorporate older choices into the decision strategies is to feature players by a summary (m) of their own choices farther back in time. The PAP contest becomes in this way: \( x^{(T+1)}= 1-\mid m^{(T)}_{x}-m^{(T)}_{y}\mid,\; y^{(T+1)}=1-\mid 1-m^{(T)}_{x}-m^{(T)}_{y}\mid \). The simplest historic extension results from considering only the two last choices: \( m(z^{(T-1)},z^{(T)})= (\alpha z^{(T-1)}+z^{(T)})/(\alpha +1) \) (z stands for both x and y) [10].
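A sketch of this two-step-memory PAP contest (Python; illustrative names) is:

```python
def pap_with_memory(x0, y0, alpha, T):
    """Iterated Pavlov / Anti-Pavlov contest in which each player is featured by
    the alpha-weighted mean of its two last choices."""
    x_prev, y_prev = x0, y0      # at T = 1 the featured choice is the choice itself
    x, y = x0, y0
    out = [(x, y)]
    for _ in range(T):
        mx = (alpha * x_prev + x) / (alpha + 1)
        my = (alpha * y_prev + y) / (alpha + 1)
        x_prev, y_prev = x, y
        x, y = 1 - abs(mx - my), 1 - abs(1 - mx - my)
        out.append((x, y))
    return out

# with memory the contest stays close to the theoretical fixed point (0.8, 0.6)
print(pap_with_memory(0.8, 0.6, alpha=0.5, T=50)[-1])
```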

Figure 24 shows the dynamics of the mean values of x and y starting from each of the \( 101\times101 \) lattice points of the \( 1\times1 \) square with sides divided into 0.01 intervals. The dynamics in the ahistoric context are rather striking: immediately, at \( T=2 \), both \( \overline{x} \) and \( \overline{y} \) increase from 0.5 up to approximately \( 0.66 (\simeq 2/3) \), a value which remains stable up to approximately \( T=100 \); soon after, Pavlov cooperation plummets, with the corresponding firing of cooperation of Anti-Pavlov: finite-precision arithmetic leads every point to (0,1). With memory, Pavlov not only keeps a permanent mean degree of cooperation but it is higher than that of Anti-Pavlov; memory tends to lead the overall dynamics to the (theoretical) ahistoric fixed point (0.8, 0.6).

Future Directions

Embedding memory in states (and in links, if the wiring is also dynamic) broadens the spectrum of CA as a tool for modeling in a fairly natural way that is easy to implement on a computer. It is likely that in some contexts a transition rule with memory could better match the behavior of a given complex system (physical, biological, social and so on). A major impediment in modeling with CA stems from the difficulty of harnessing their complex behavior to exhibit a particular behavior or perform a particular function. Average memory in CA tends to inhibit complexity, an inhibition that can be modulated by varying the depth of memory, but memory of non-average type opens a notable new perspective in CA. This could mean a potential advantage of CA with memory over standard CA as a tool for modeling. Anyway, besides their potential applications, CA with memory (CAM) have an aesthetic and mathematical interest of their own.

Thus, it seems plausible that further study of CA with memory should turn out to be profitable, and perhaps that, as a result of a more rigorous study of CAM, it will be possible to paraphrase T. Toffoli in presenting CAM as an alternative to (rather than an approximation of) integro-differential equations in modeling phenomena with memory.