People are embedded in social interaction that shapes their brains throughout lifetime. Instead of emerging from lower-level cognitive functions, social interaction could be the default mode via which humans communicate with their environment.

      — Hari et al. (2015)

The challenge for the study of brain-to-brain coupling is to develop detailed models of the dynamical interaction that can be applied at the behavioural levels and at the neural levels.

      — Hasson and Frith (2016)

...[A] deeper understanding of inter-brain dynamics may provide unique insight into the neural basis of collective behavior that gives rise to a broad range of economic, political, and sociocultural activities that shape society.

      — Kingsbury and Hong (2020)

Introduction

A recent elegant study of a bat population by Rose et al. (2021) finds that bidirectional interbrain activity patterns are a feature of their socially interactive behaviors, and that such shared interbrain activity patterns likely play an important role in social communication between group members. Sliwa (2021) summarizes that work, and parallel material by Baez-Mendoza et al. (2021) on macaques. Kingsbury et al. (2019), in a particularly deep analysis, studied correlations in brain activity between socially interacting mice, finding strong structuring by dominance relations.

Rose et al. are careful to cite the large and growing human literature on brain-to-brain coupling in social interaction, including Hasson et al. (2012), Barraza et al. (2020), Kuhlen et al. (2017), Perez et al. (2017), Stolk (2014), and Dikker (2017). As Rose et al. put it, a wide range of species naturally interact in groups and exhibit a diversity of social structures and forms of communication, involving similarities and differences in neural repertoires for social communication.

On far longer time scales, Abraham et al. (2020) found that concordance in parent and offspring cortico-basal ganglia white matter connectivity varies by parental history of major depressive disorder and early parental care. As they put it,

Social behavior is transmitted cross-generationally through coordinated behavior within attachment bonds. Parental depression and poor parental care are major risks for disruptions of such coordination and are associated with offspring’s psychopathology and interpersonal dysfunction... [Study] showed diminished neural concordance among dyads with a depressed parent and that better parental care predicted greater concordance, which also provided a protective buffer against attenuated concordance among dyads with a depressed parent... [Such] disruption may be a risk factor for intergenerational transmission of psychopathology. Findings emphasize the long-term role of early caregiving in shaping neural concordance among at-risk and affected dyads.

Indeed, a broad spectrum of Holocaust studies (Dashorst et al. 2019) has followed intergenerational transmission of psycho- and other pathologies, including, but not limited to, patterns of brain function.

Here, following Wallace (2022a, b), we examine ‘shared interbrain activity patterns’ from the perspectives of recent developments in control and information theories, using the asymptotic limit theorems of those disciplines to develop probability models that might be converted to statistical tools of value in future observational and empirical studies of the phenomena at various time scales. There is, after all, a very long tradition of using control theory ideas in psychological research. See the review by Henry et al. (2021) for a deep and cogent summary.

Shared interbrain activity patterns are concrete representations – indeed, instantiations – of information transmission within a group, and, as Dretske (1994) indicates, the properties of any transfer of information are strongly constrained by the asymptotic limit theorems of information theory, in the same sense that the Central Limit Theorem imposes constraints leading to useful statistical models of supposedly ‘random’ phenomena.

We begin with ‘simple’ correlation of brain activity between individuals in social interaction, and then move on to more complex models of joint cognition across individuals and/or ‘workgroups’, in a large sense.

Correlation

The elegant paper by Kingsbury et al. (2019) explores correlated neural activity and the encoding of behavior across the brains of socially interacting mice. Two central findings of that work are shown in Fig. 1 (adapted from their Figs. 2 and 8). The top row of Fig. 1 shows time series of brain activity in two mice, first with, and then without, direct contact. Correlation between the signals is much higher during social interaction. The lower part indicates that correlations rise with the difference in status between animals.

Fig. 1 Adapted from Kingsbury et al. (2019), Figs. 1 and 8. Top: Correlation between brain activity is much higher during social interaction. Bottom: Correlation is much higher between discordantly dominant animals

Here, we derive these results using perhaps the simplest possible dynamic model, one based on a principal asymptotic limit theorem of information theory, the Rate Distortion Theorem.

We adapt the introductory model of Wallace (2020b, Sec. 1.3), which focuses on interacting institutions under conditions of conflict analogous to status disjunction.

Suppose we have developed – or been given – a robust scalar measure of social dominance between pairs of individuals, Z. How does Z affect ‘correlation’, in a large sense, during interactions?

A ‘dominant’ partner transmits signals to a ‘subordinate’ in the presence of noise, sending a sequence of signals \(U^{i}=\{u^{i}_{1}, u^{i}_{2}, ... \}\) that – again, in the presence of noise – is received as \(\hat{U}^{i} =\{\hat{u}^{i}_{1}, \hat{u}^{i}_{2}, ... \}\). We take the \(U^{i}\) as sent with probabilities \(P(U^{i})\), and define a scalar distortion measure between \(U^{i}\) and \(\hat{U}^{i}\) as \(d(U^{i},\hat{U}^{i})\), defining an average distortion D as

$$\begin{aligned} D \equiv \sum _{i}P(U^{i})d(U^{i},\hat{U}^{i}) \end{aligned}$$
(1)

Following Wallace (2020b, Sec. 1.3) closely, it is then possible to apply a rate distortion argument. The Rate Distortion Theorem – for stationary, ergodic systems – states that there is a convex Rate Distortion Function that determines the minimum channel capacity, written R(D), needed to keep the average distortion at or below the limit D. See Cover and Thomas (2006) for details. The theory can be extended, with some difficulty, to nonergodic sources via an infimum argument applied to the ergodic decomposition (e.g., Shields et al. 1978).

The ‘central trick’ is to construct a Boltzmann pseudoprobability in the dominance measure Z as

$$\begin{aligned} dP(R,Z) = \frac{\exp [-R/g(Z)]dR}{\int _{0}^{\infty }\exp [-R/g(Z)]dR} \end{aligned}$$
(2)

where the function g(Z) is unknown and must be determined from first principles.

We have implicitly adopted the ‘worst case coding’ scenario of an analog ‘Gaussian’ channel (Cover and Thomas 2006). In consequence, the ‘partition function’ integral in the denominator of Eq. (2) has the simple value g(Z).

For the Gaussian channel using the squared distortion measure, the Rate Distortion Function R(D), and the distortion measure, are given as (Cover and Thomas 2006)

$$\begin{aligned} R(D)=\frac{1}{2}\log _{2}[\sigma ^{2}/D] \nonumber \\ D = \sigma ^{2}2^{-2R} \end{aligned}$$
(3)

If \(D \ge \sigma ^{2}\), then \(R=0\).
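These relations are easily checked numerically. The following minimal sketch (assuming \(\sigma^{2}=1\) purely for illustration) evaluates the Gaussian Rate Distortion Function of Eq. (3) and verifies the inverse relation:

```python
# Minimal numerical check of Eq. (3), assuming sigma^2 = 1 for
# illustration. R(D) = (1/2) log2(sigma^2/D), with R = 0 for D >= sigma^2.
import numpy as np

sigma2 = 1.0
D = np.linspace(0.05, 1.5, 100)
R = np.where(D < sigma2, 0.5 * np.log2(sigma2 / D), 0.0)

# The inverse relation D = sigma^2 * 2^(-2R) holds wherever R > 0:
assert np.allclose(sigma2 * 2.0 ** (-2.0 * R[D < sigma2]), D[D < sigma2])
```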

From these relations, using Eq. (2), and after some manipulation, it is possible to calculate the ‘average average distortion’ \(<D>\) as

$$\begin{aligned} <D>=\frac{\sigma ^{2}}{\log (4)g(Z)+1} \end{aligned}$$
(4)

For the ‘natural channel’

$$\begin{aligned} D=\frac{\sigma ^{2}}{1+R} \nonumber \\ <D> = \frac{\sigma ^{2} {\mathrm e}^{\frac{1}{g(Z)}} \mathrm {Ei}_{1}\! \left( \frac{1}{g(Z)}\right) }{g(Z)} \end{aligned}$$
(5)

where \(\mathrm {Ei}_{1}\) is the exponential integral of order 1.

What, then, is g(Z)?

Here we abduct more formalism from statistical mechanics, using the ‘partition function’ in Eq. (2) to define an ‘iterated free energy’ as

$$\begin{aligned} &\exp [-F/g(Z)]=\int _{0}^{\infty }\exp [-R/g(Z)]dR = g(Z) \nonumber \\& F=-\log [g(Z)]g(Z) \nonumber \\ &g(Z) = \frac{-F(Z)}{W(n,-F(Z))} \end{aligned}$$
(6)

where W(n, x) is the Lambert W-function of order n, solving the relation \(W(n,x)\exp [W(n,x)]=x\), which is real-valued only for \(n=0, \, -1\), and then only over limited ranges. These are, respectively, \(x > -\exp [-1]\) for \(n=0\), and \(-\exp [-1]< x < 0\) for \(n=-1\). These conditions are important and impose themselves on the expressions for \(<D>\).

The next step is to define an ‘iterated entropy’ in the standard manner as the Legendre transform of the ‘free energy’ analog F, and from it, impose a simple first-order version of the Onsager treatment of nonequilibrium thermodynamics (de Groot and Mazur 1984), in the context of a nonequilibrium steady state, so that \(dZ/dt=0\). The relations are then

$$\begin{aligned} &S(Z) \equiv -F(Z) + Z dF(Z)/dZ \nonumber \\ &dZ/dt \propto dS/dZ = Z d^{2}F/dZ^{2} = 0 \nonumber \\& F(Z) = C_{1}Z+C_{2} \end{aligned}$$
(7)

where the \(C_{i}\) are constants set by appropriate boundary conditions, and g(Z) is then given by the last expression in Eq. (6).

For the Gaussian channel, taking the Lambert W-function of order zero, and setting \(C_{1}=-1, \, C_{2}=1/10\), gives Fig. 2.

For the ‘natural’ channel, again with the W-function of order zero, taking the same values for the \(C_{i}\) gives Fig. 3.
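The calculations behind Figs. 2 and 3 can be sketched numerically as follows. This is a hedged illustration, not the original computation: the channel variance \(\sigma^{2}=1\) is an assumption, while the boundary conditions \(C_{1}=-1, \, C_{2}=1/10\) follow the text.

```python
# Sketch of the model behind Figs. 2 and 3: F(Z) = C1*Z + C2 (Eq. 7),
# g(Z) = -F/W(0, -F) (Eq. 6), and the mean distortions of Eqs. (4)-(5).
import numpy as np
from scipy.special import lambertw, exp1

C1, C2 = -1.0, 0.1   # boundary conditions from the text
sigma2 = 1.0         # channel variance, an assumption for illustration

Z = np.linspace(0.2, 5.0, 200)       # dominance index range (assumed)
F = C1 * Z + C2
g = np.real(-F / lambertw(-F, 0))    # order-0 branch, real for -F > -1/e

D_gauss = sigma2 / (np.log(4.0) * g + 1.0)                 # Eq. (4)
D_natural = sigma2 * np.exp(1.0 / g) * exp1(1.0 / g) / g   # Eq. (5)
# Both <D> curves fall monotonically as Z rises: higher status
# disjunction implies lower distortion, i.e., closer coupling.
```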

Fig. 2 Mean distortion \(<D>\) for the Gaussian channel vs. the dominance index Z. Higher status disjunction implies closer coupling during social interaction, according to this model

Fig. 3 Mean distortion \(<D>\) for the ‘natural’ channel vs. the dominance index Z. Again, higher status disjunction implies closer coupling during social interaction

Higher status disjunction, in this model, implies closer coupling between individuals.

Similar results will follow for any Rate Distortion Function, since all must be convex in D and zero-valued for \(D \ge \sigma ^{2}\) (Cover and Thomas 2006).

A fairly elementary model of social interaction under dominance relations, based on a somewhat counter-intuitive dynamic adaptation of the Rate Distortion Theorem, thus produces results consistent with the empirical observations of Kingsbury et al. (2019). Extension of the model to the nonergodic and nonstationary conditions more likely to mirror the real world – or at least beyond the nonequilibrium steady state assumption – will require further work.

The model can, in theory, be extended using the ergodic decomposition of a nonergodic process using the methods of Shields et al. (1978).

Otherwise, we are constrained to ‘adiabatically piecewise stationary ergodic’ (APSE) systems that remain as close to ergodic and stationary as needed for the Rate Distortion Theorem to work, much like the Born-Oppenheimer approximation of molecular physics, in which rapid electron dynamics are assumed to quasi-equilibrate about much slower nuclear oscillations, allowing calculations using ‘simple’ quantum mechanics models.

Cognition

Cognition is not correlation, and requires more general address. Indeed, cognition has become a kind of shibboleth in theoretical biology, seen by some as the fundamental characterization of the living state at and across its essential scales and levels of organization (Maturana and Varela 1980). In this regard, a central inference by Atlan and Cohen (1998), in their study of the immune system, is that cognition, via mechanisms of choice, demands reduction in uncertainty, implying the existence of information sources ‘dual’ to any cognitive process. The argument is unambiguous and direct, and serves as the foundation of our general approach.

A first step is to view ‘information’ as a biological and social resource matching the importance of metabolic free energy and other overtly material agents and agencies.

Information and other resources

Here, we must move beyond ‘simple’ measures of dominance between individuals in social interaction.

At least three resource streams are required by any cognitive entity facing real-time, real-world challenges. The first is measured by the rate at which information can be transmitted between elements within the entity, determined as an information channel capacity, say \(\mathscr {C}\) (Cover and Thomas 2006). The second resource stream is sensory information regarding the embedding environment – here, primarily social interaction – available at a rate \(\mathscr {Q}\). It is along this channel that ‘neural representations’ will be shared.

The third regards material resources, including metabolic free energy – in a large sense – available at a rate \(\mathscr {M}\).

These three rates may well – but not necessarily – interact, a matter characterized by a 3 × 3 matrix analogous to, but not the same as, a simple correlation matrix. Let us write this as \(\mathbf {Z}\).

An n-dimensional square matrix has n scalar invariants \(r_{i}\) defined by the relation

$$\begin{aligned} &p(\gamma ) = \det [\mathbf {Z} - \gamma \mathbf {I}] = \nonumber \\ &\gamma ^{n} - r_{1}\gamma ^{n-1} + r_{2}\gamma ^{n-2}-... +(-1)^{n}r_{n} \end{aligned}$$
(8)

\(\mathbf {I}\) is the n-dimensional identity matrix, \(\det\) the determinant, and \(\gamma\) a real-valued parameter. The first invariant is usually taken as the matrix trace, and the last as ± the determinant.

These scalar invariants make it possible to project the full matrix down onto a single scalar index \(Z=Z(r_{1}, ..., r_{n})\) retaining much of the basic structure, analogous to conducting a principal component analysis. The simplest index might be \(Z = \mathscr {C} \times \mathscr {Q} \times \mathscr {M}\) – the matrix determinant for a system without cross-interaction. However, scalarization must be appropriate to the individual circumstance studied, and there will almost always be important cross-interactions between resource streams.
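As an illustrative sketch, the invariants \(r_{i}\) can be read directly off the coefficients of the characteristic polynomial; the matrix entries below are invented for demonstration only.

```python
# Sketch of Eq. (8): scalar invariants of a 3 x 3 resource interaction
# matrix. np.poly returns [1, -r1, r2, -r3] for a 3 x 3 input matrix.
import numpy as np

Zmat = np.array([[1.0, 0.2, 0.1],     # hypothetical C/Q/M interaction
                 [0.2, 1.5, 0.3],     # matrix; entries are illustrative
                 [0.1, 0.3, 2.0]])    # assumptions only

coeffs = np.poly(Zmat)                # characteristic polynomial coefficients
r1, r2, r3 = -coeffs[1], coeffs[2], -coeffs[3]

assert np.isclose(r1, np.trace(Zmat))        # first invariant: the trace
assert np.isclose(r3, np.linalg.det(Zmat))   # last invariant: the determinant
```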

Clever scalarization, however, enables approximate reduction to a one dimensional system.

Taking \(\mathscr {M}\) out of the equation – equalizing it across the social structure – might generate two independent ‘orthogonal’ indices, for example the determinant and the trace of the ‘interaction matrix’ separately, so that Z becomes a two dimensional vector. General expansion of Z into vector form leads to difficult multidimensional dynamic equations (e.g., Wallace 2021c). See the Mathematical Appendix for details.

Cognition and information

Only in the case that cross-sectional and longitudinal means are the same can information source uncertainty be expressed as a conventional Shannon ‘entropy’ (Khinchin 1957; Cover and Thomas 2006). Here, we require only that source uncertainties converge for sufficiently long paths, not that they fit some particular functional form. It is the values of those uncertainties that will be of concern, so we study ‘Adiabatically Piecewise Stationary’ (APS) systems, in the sense of the Born-Oppenheimer approximation for molecular systems, which assumes nuclear motions are so slow in comparison with electron dynamics that the two can be effectively separated – at least on appropriately chosen trajectory ‘pieces’ that may characterize the various phase transitions available to such systems. Extension of this work to nonstationary circumstances remains to be done.

This approximation can be carried out via a fairly standard Morse Function iteration (Pettini 2007).

The systems of interest here are composed of cognitive submodules that engage in crosstalk. At every scale and level of organization, all such submodules are constrained both by their own internal structure and developmental paths, and by the persistent regularities of the embedding environment, including the cognitive intent of colleagues, in a broad sense, and the regularities of ‘grammar’ and ‘syntax’ imposed by the embedding social structure.

Further, there are structured uncertainties imposed by the large deviations possible within that environment, again including the behaviors of adversaries who may be constrained by quite different developmental trajectories and ‘punctuated equilibrium’ evolutionary transitions.

Recapitulating somewhat the arguments of Wallace (2018, 2020a), the Morse Function construction assumes a number of interacting factors:

  • As Atlan and Cohen (1998) argue, cognition requires choice that reduces uncertainty. Such reduction in uncertainty directly implies the existence of an information source ‘dual’ to that cognition at each scale and level of organization. The argument is unambiguous and compelling.

  • Cognitive physiological processes, like the immune and gene expression systems, are highly regulated, in the same sense that ‘the stream of consciousness’ flows between cultural and social ‘riverbanks’. That is, a cognitive information source \(X_{i}\) is generally paired with a regulatory information source \(X^{i}\).

  • Environments (in a large sense) also have sequences of very high and very low probability: night follows day, hot seasons follow cold, and so on.

  • ‘Large deviations’, following Champagnat et al. (2006) and Dembo and Zeitouni (1998), also involve sets of high probability developmental pathways, often governed by ‘entropy’ analog laws that imply the existence of yet one more information source.

Full system dynamics must then be characterized by a joint, nonergodic information source uncertainty

$$\begin{aligned} H(\{X_{i}, \, X^{i}\}, V, L_{D}) \end{aligned}$$
(9)

that is defined path-by-path and not represented as an ‘entropy’ function (Khinchin 1957). Consequently, each path will have its own H-value, but the functional form of that value is not specified in terms of underlying probability distributions.

The set \(\{X_{i}, \, X^{i}\}\) includes the internal interactive cognitive dual information sources of the system of interest and their associated regulators, V is taken as the information source of the embedding environment. This may include the actions and intents of adversaries/symbionts/colleagues, as well as ‘weather’. \(L_{D}\) is the information source of the associated large deviations possible to the system, possibly including ‘punctuated equilibrium’ evolutionary transitions.

Again, we are projecting the spectrum of essential resources onto a scalar rate Z.

The underlying equivalence classes of developmental or dynamic system paths used to define groupoid symmetries can be defined fully in terms of the magnitudes of the individual path source uncertainties \(H(x^{j})\), where \(x^{j} \equiv \{x^{j}_{0}, \, x^{j}_{1}, \, ..., \, x^{j}_{n}, \, ...\}\), alone.

Recall the central conundrum of the ergodic decomposition of nonergodic information sources. It is formally possible to express a nonergodic source as the composition of a sufficient number of ergodic sources, much as it is possible to reduce planetary orbits to a Fourier sum of circular epicycles, obscuring the basic dynamics. Hoyrup (2013) discusses the problem further, finding that ergodic decompositions are not necessarily computable. Here, we need focus only on the values of the source uncertainties associated with dynamic paths.

Dynamics

The next step is to build an iterated ‘free energy’ Morse Function (Pettini 2007) from a Boltzmann pseudoprobability, based on enumeration of the high probability developmental pathways \(j=1, 2, ...\) available to the system, so that

$$\begin{aligned} P_{j}=\frac{\exp [-H_{j}/g(Z)]}{\sum _{k} \exp [-H_{k}/g(Z)]} \end{aligned}$$
(10)

where \(H_{j}\) is the source uncertainty of the path j, which – again – we do not assume to be given as a ‘Shannon entropy’ since we are no longer restricted to ergodic sources.

The essential point is the ability to divide individual paths into two equivalence classes: a small set of high probability paths consonant with an underlying ‘grammar’ and ‘syntax’, and a much larger set of paths of vanishingly low probability, essentially a set of measure zero.

The temperature-analog characterizing the system, written as g(Z) in Eq. (10), can be calculated – or at least approximated – via a first-order Onsager nonequilibrium thermodynamic approximation built from the partition function, i.e., the denominator of Eq. (10) (de Groot and Mazur 1984).

We define the ‘iterated free energy’ Morse Function as

$$\begin{aligned} \exp [-F/g(Z)] \equiv \sum _{k}\exp [-H_{k}/g(Z)] \equiv h(g(Z)) \nonumber \\ F(Z) = -\log [h(g(Z))]g(Z) \end{aligned}$$
(11)

where the sum is over all possible high probability developmental paths of the system, again, those consistent with an underlying grammar and syntax. Again, system paths not consonant with grammar and syntax constitute a set of measure zero that is very much larger than the set of high probability paths.
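A toy numerical sketch of Eqs. (10) and (11) may clarify the construction; the path uncertainties \(H_{j}\) and the temperature analog g(Z) below are invented values for illustration only.

```python
# Toy sketch of the Boltzmann pseudoprobability (Eq. 10) and the
# iterated free energy (Eq. 11) over an assumed finite set of high
# probability paths. All numerical values are illustrative assumptions.
import numpy as np

H = np.array([0.5, 0.8, 1.1, 2.0])   # hypothetical path uncertainties H_j
gZ = 0.7                             # assumed temperature analog g(Z)

weights = np.exp(-H / gZ)
P = weights / weights.sum()          # Eq. (10): pseudoprobabilities P_j
F = -np.log(weights.sum()) * gZ      # Eq. (11): F = -log[h(g(Z))] g(Z)
```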

Feynman (2000) makes the direct argument that information itself is to be viewed as a form of free energy, using Bennett’s ‘ideal machine’ that turns a message into work. Here, we invoke an iterated – rather than a direct – free energy construction.

F, taken as a free energy, then becomes subject to symmetry-breaking transitions as g(Z) varies (Pettini 2007). These symmetry changes, however, are not those associated with physical phase transitions, which are represented by standard group algebras. Cognitive phase changes involve shifts between equivalence classes of high probability developmental pathways, to be represented as groupoids – a generalization of the group concept in which a product is not necessarily defined for every possible element pair (Brown 1992; Cayron 2006; Weinstein 1996). See the Mathematical Appendix for an outline of the theory.

The disjunction described above – into high and low probability equivalence classes representing paths consonant with, or discordant from, underlying grammar and syntax – should be seen as the primary ‘groupoid phase transition’ affecting cognitive systems. It is essentially the biological ‘big bang’ of Maturana and Varela (1980). Think about this carefully.

Dynamic equations follow from invoking a first-order Onsager approximation akin to that of nonequilibrium thermodynamics (de Groot and Mazur 1984) in the gradient of an entropy measure constructed from the ‘iterated free energy’ F of Eq. (11):

$$\begin{aligned} &S(Z) \equiv -F(Z) + Z dF(Z)/dZ \nonumber \\ &\partial Z/\partial t \approx dS/dZ = f(Z) \nonumber \\ &f(Z)=Z d^{2}F/dZ^{2} \nonumber \\ &g(Z) = \frac{-C_{1} Z - \left( \int \frac{f(Z)}{Z}dZ \right) Z + C_{2} + \int f(Z) dZ}{\mathrm{RootOf}\left( {\mathrm e}^{Q} - h\! \left( \frac{-C_{1} Z - \left( \int \frac{f(Z)}{Z}dZ \right) Z + C_{2} + \int f(Z) dZ}{Q} \right) \right) } \end{aligned}$$
(12)

where the last relation follows from an expansion of the third part of Eq. (12) using the second expression of Eq. (11). \(C_{1}\) and \(C_{2}\) are the two constants of integration arising from the indefinite integrals of \(d^{2}F/dZ^{2}\), and Q is the formal variable of the expression whose root is taken.

Three important – and somewhat subtle – points:

  1. The ‘RootOf’ construction actually generalizes the not-so-well-known Lambert W-function (e.g., Yi et al. 2010; Mezo and Keady 2015). This leads to deep waters: since ‘RootOf’ may have complex number solutions, the temperature analog g(Z) enters the realm of the ‘Fisher Zeros’ characterizing phase transitions in physical systems (e.g., Dolan et al. 2001; Fisher 1965; Ruelle 1964, Sec. 5).

  2. Information sources are not microreversible; that is, palindromes are highly improbable – ‘eht’ has far lower probability than ‘the’ in English. In consequence, there are no ‘Onsager Reciprocal Relations’ in higher dimensional systems. The necessity of groupoid symmetries appears to be driven by this directed homotopy.

  3. Typically, it is necessary to impose a delay in provision of Z, so that, for example, \(dZ/dt = f(Z) = \beta - \alpha Z(t)\), giving \(Z(t) = \beta/\alpha + (Z(0) - \beta/\alpha)\exp[-\alpha t]\), so that \(Z \rightarrow \beta /\alpha\) at a rate determined by \(\alpha\).

Suppose, in the first of Eq. (11), it is possible to approximate the sum with an integral, so that

$$\begin{aligned} \exp [-F/g(Z)] \approx \int _{0}^{\infty }\exp [-H/g(Z)]dH = g(Z) \end{aligned}$$
(13)

g(Z) must be real-valued and positive. Then

$$\begin{aligned} F(Z)=-\log [g(Z)]g(Z) \nonumber \\ g(Z)=-F(Z)/W_{L}[n,-F(Z)] \end{aligned}$$
(14)

Again, \(W_{L}\) is the ‘simplest’ Lambert W-function that satisfies \(W_{L}[n,x]\exp [W_{L}[n,x]]=x\). It is real-valued only for \(n=0, \, -1\) and only over limited ranges of x in each case.

In theory, specification of any two of the functions f, g, and h permits calculation of the third. h, however, is determined – fixed – by the internal structure of the larger system. Similarly, ‘boundary conditions’ \(C_{1}, \, C_{2}\) are externally-imposed, further sculpting dynamic properties of the ‘temperature’ g(Z), and f determines the rate at which the composite essential resource Z can be delivered. Both information and metabolic free energy resources are rate-limited.

Cognition rate

For phase transitions in physical systems, there is generally a minimum temperature for punctuated activation of the dynamics associated with a given group structure – the underlying symmetry changes associated with the transitions of ice to water, water to steam, and so on. For cognitive processes, following the arguments of Eq. (5), there will be a minimum necessary value of g(Z) for onset of the next in a series of transitions. That is, at some \(T_{0} \equiv g(Z_{0})\), with corresponding information source uncertainty \(H_{0}\), a second groupoid phase transition becomes manifest.

Taking a reaction rate perspective from chemical kinetics (Laidler 1987), we can write an expression for the rate of cognition as

$$\begin{aligned} L(Z) =\frac{\sum _{H_{j}>H_{0}}\exp [-H_{j}/g(Z)]}{\sum _{k}\exp [-H_{k}/g(Z)]} \end{aligned}$$
(15)

If the sums can be approximated as integrals, then the system’s rate of cognition at resource rate index Z can be written as

$$\begin{aligned} L(Z) \approx \frac{\int _{H_{0}}^{\infty }\exp [-H/g(Z)]dH}{\int _{0}^{\infty }\exp [-H/g(Z)]dH} = \exp [-H_{0}/g(Z)] \nonumber \\ = \exp [H_{0}W_{L}(n, -F)/F] \end{aligned}$$
(16)

where \(W_{L}(n,-F)\) is the Lambert W-function of order n in the free energy index \(F=-\log [g(Z)]g(Z)\).

Figure 4 shows L(F) vs. F, using Lambert W-functions of orders 0 and \(-1\), respectively real-valued only on the intervals \(x > -\exp [-1]\) and \(-\exp [-1]< x < 0\).

Fig. 4 Rate of cognition from Eq. (16) as a function of the iterated free energy measure F, taking \(H_{0}=1\). The Lambert W-function is only real-valued for orders 0 and \(-1\), and only if \(F < \exp [-1]\). However, if \(0< F < \exp [-1]\), then a bifurcation instability emerges, with a transition to complex-valued oscillations in cognition rate at higher values. Since F is driven by Z, there is a minimum resource rate for stability

The Lambert W-function is only real-valued for orders 0 and \(-1\), and only for \(F < \exp [-1]\). However, if \(0< F < \exp [-1]\), then a bifurcation instability emerges, with a transition to complex-valued oscillations in cognition rate at higher values.
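The curves of Fig. 4 can be sketched directly from Eq. (16); the following assumes \(H_{0}=1\), as in the caption.

```python
# Sketch of Eq. (16) behind Fig. 4: L(F) = exp(H0 * W(n, -F)/F) for the
# two real branches, n = 0 and n = -1, taking H0 = 1 as in the caption.
import numpy as np
from scipy.special import lambertw

H0 = 1.0
F = np.linspace(-0.5, np.exp(-1.0) - 1e-6, 400)  # W(n, -F) real for F < 1/e
F = F[np.abs(F) > 1e-9]                          # avoid division by zero

L0 = np.exp(H0 * np.real(lambertw(-F, 0)) / F)   # order-0 branch
Fm = F[F > 0]                                    # order -1: real only for 0 < F < 1/e
Lm1 = np.exp(H0 * np.real(lambertw(-Fm, -1)) / Fm)
```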

This development recovers what is essentially an analog to the Data Rate Theorem from control theory (Nair et al. 2007; see also the Mathematical Appendix), in the sense that the requirement \(H > H_{0}\) in Eqs. (15) and (16) imposes stability constraints on F, the free energy analog, and by inference, on the resource rate index Z driving it.

An example

We again approximate the sum in Eq. (11) by an integral – so that \(h(g(Z))=g(Z)\) – and make a simple assumption on the form of \(dZ/dt=f(Z)\), say \(f(Z)=\beta - \alpha Z(t)\). Then

$$\begin{aligned} g(Z)=-\frac{2 \ln \! \left( Z \right) Z \beta -Z^{2} \alpha +2 C_{1} Z -2 Z \beta +2 C_{2}}{2 W_{L}\! \left( n, -\ln \! \left( Z \right) Z \beta +\frac{Z^{2} \alpha }{2}-C_{1} Z +Z \beta -C_{2} \right) } \end{aligned}$$
(17)

with, again, \(L = \exp [-H_{0}/g(Z)]\), depending on the rate parameters \(\alpha\) and \(\beta\), the boundary conditions \(C_{i}\), and the order of the Lambert W-function. Proper choice of boundary conditions generates a classic ‘inverted-U’ signal transduction Yerkes-Dodson Law analog (e.g., Wallace 2020a, 2021c; Diamond et al. 2007). That is, since \(Z \rightarrow \beta /\alpha\), we examine the cognition rate for fixed \(\alpha\) and boundary conditions \(C_{i}\) as \(\beta\) increases. The result is shown in Fig. 5, for appropriate boundary conditions.

Similar results follow if

$$\begin{aligned} \exp [-F/g(Z)] = h(g(Z)) \propto A_{m}g(Z)^{m} \end{aligned}$$

that is, if the function h(g(Z)) has a strongly dominant term of order \(m >0\).
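A hedged numerical sketch of Fig. 5, evaluating Eq. (17) at the steady state \(Z = \beta/\alpha\) with the caption's parameter values, might read:

```python
# Sketch of Eq. (17) and the inverted-U of Fig. 5, using the caption's
# parameters (alpha = 1, C1 = C2 = -2, H0 = 1) and evaluating at the
# nonequilibrium steady state Z = beta/alpha as beta ('arousal') rises.
import numpy as np
from scipy.special import lambertw

alpha, C1, C2, H0 = 1.0, -2.0, -2.0, 1.0

def cognition_rate(beta, n=0):
    Z = beta / alpha                    # steady state: Z -> beta/alpha
    F = np.log(Z) * Z * beta - 0.5 * alpha * Z**2 + C1 * Z - beta * Z + C2
    g = np.real(-F / lambertw(-F, n))   # Eq. (14): g = -F/W(n, -F)
    return np.exp(-H0 / g)

betas = np.linspace(0.5, 6.0, 200)      # W remains real-valued on this range
L = np.array([cognition_rate(b) for b in betas])  # traces an inverted-U
```

With these parameters, F becomes positive near \(\beta \approx 6.5\), and once \(F > \exp[-1]\) the W-function argument leaves its real domain – the bifurcation instability discussed above.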

Fig. 5 Classic ‘inverted-U’ signal transduction for the cognition rate based on Eq. (17), setting \(\alpha =1, \, C_{1}=-2, \, C_{2}=-2, \, H_{0}=1\). Increase in \(\beta\) is taken as the ‘arousal’ measure

It is complicated – but not difficult – to incorporate stochastic effects into cognition rate dynamics based on Eq. (17), via standard methods from the theory of stochastic differential equations (e.g., Wallace 2021a, b, c). See the Mathematical Appendix for an example based on Fig. 5, fixing \(\beta =3\).

Cooperation: multiple workspaces

Individual brains – and indeed, even individual cells – are composed of interacting (often spatially distributed) cognitive submodules. Social groups are constituted by interacting individuals, separated by space, time, and/or social distance. Institutions are made up of dispersed but interacting ‘workgroups’, in a large sense. A critical phenomenon in all such examples is that the joint uncertainty of the dual information source associated with the particular level of cognition is less than or equal to the sum of the uncertainties of the independent components – the information theory chain rule (Cover and Thomas 2006). To invert the argument, preventing crosstalk between cognitive submodules requires more investment of free energy or other resources than allowing interaction, as the electrical engineers often lament. From this lemon, evolution has made lemonade (e.g., Wallace 2022).

More specifically for the work here, the emergence of a generalized Lambert W-function in Eq. (12), reducing to a ‘simple’ W-function if \(h(g(Z))=g(Z)\), is particularly suggestive. Recall that the fraction of network nodes in a giant component of a random network of N nodes with probability P of linkage between them can be given as (Newman 2010)

$$\begin{aligned} \frac{W_{L}(0,-NP\exp [-NP])+NP}{NP} \end{aligned}$$
(18)

where, again, the Lambert W-function emerges.

This expression has punctuated onset of a giant component of linked nodes only for \(NP >1\). See Fig. 6. In general, we might expect P to be a monotonic increasing function of g(Z).
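A brief sketch of Eq. (18) displays the punctuated onset:

```python
# Sketch of Eq. (18) (Newman 2010): giant-component fraction of an
# N-node random network vs. NP. The fraction is ~0 until NP exceeds 1.
import numpy as np
from scipy.special import lambertw

NP = np.linspace(0.1, 4.0, 400)
frac = np.real(lambertw(-NP * np.exp(-NP), 0) + NP) / NP
# For NP <= 1, W(0, -NP*exp(-NP)) = -NP exactly, so frac = 0:
# punctuated accession to 'global broadcast' only for NP > 1.
```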

Fig. 6 Fraction of an N-node random network within the giant component as determined by the probability of contact between nodes P. The essential point is the punctuated accession to ‘global broadcast’ if and only if \(NP>1\) (e.g., Baars 1989; Dehaene and Changeux 2011). We might expect P to be a monotonic increasing function of g(Z)

Within broadly ‘social’ groupings, interacting cognitive submodules – individuals – can become linked into shifting, tunable, temporary, workgroup equivalence classes to address similarly rapidly shifting patterns of threat and opportunity. These might range from complicated but relatively slow multiple global workspace processes of gene expression and immune function to the rapid – hence necessarily stripped-down – single-workspace neural phenomena of higher animal consciousness (Wallace 2012).

More complicated approaches to such a phase transition – involving Kadanoff renormalizations of the Morse Function free energy measure F and related measures – can be found in Wallace (2005, 2012, 2022).

Here, by contrast, while multiple workspaces are most simply invoked in terms of a simultaneous set of the \(g_{j}(Z_{j})\), individual workspace tunability emerges from exploring equivalence classes of network topologies associated with a particular value of some designated \(g_{j}(Z_{j})\). Central matters then revolve around the equivalence class decompositions implied by the existence of the resulting workgroups, leading again to dynamic groupoid symmetry-breaking as they shift form and function in response to changing patterns of threat and opportunity. See Wallace (2021a, Sec. 12) for a parallel argument from an ergodic system perspective; there, the \(g_{j}(Z_{j})\) substitute for the ‘renormalization constant’ \(\omega\) of that development. For brevity, we omit a full discussion here.

Network topology is important

We can, indeed, calculate the cognition rate of a single ‘global workspace’ as follows.

Suppose we have N linked cognitive submodules – individuals – in a social ‘giant component’, i.e., we operate in the upper portion of Fig. 6. The free energy Morse Function F can be expressed in terms of a full-bore partition function as

$$\begin{aligned}& \exp [-F/g(Z)] = \sum _{k=1}^{N} \sum _{j=1}^{M} \exp [-H_{k,j}/g(Z)] \nonumber \\& \approx \sum _{k=1}^{N} \int _{0}^{\infty } \exp [-H_{k}/g(Z)]dH_{k} = \sum _{k=1}^{N}g(Z) = N g(Z) \nonumber \\& F=-\log [Ng(Z)]g(Z), \, g(Z)= \frac{-F}{W_{L}(n, -NF)} \end{aligned}$$
(19)

The sum over j represents available states within individual submodules, the sum over k is across submodules, and \(W_{L}\) is again the Lambert W-function. Note, however, the appearance of the factor NF in the last expression.

The rate of cognition of the linked-node giant component can be expressed as

$$\begin{aligned} L=\frac{\sum _{k=1}^{N}\int _{H_{0}^{k}}^{\infty }\exp [-H_{k}/g(Z)]dH_{k}}{\sum _{k=1}^{N}\int _{0}^{\infty }\exp [-H_{k}/g(Z)]dH_{k}} \nonumber \\ =\frac{g(Z)\sum _{k}\exp [-H_{0}^{k}/g(Z)]}{N g(Z)} \nonumber \\ = \left( \sum _{k}\exp [-H_{0}^{k}/g(Z)] \right) /N \equiv <L_{k}> \end{aligned}$$
(20)

where, not entirely unexpectedly, \(<L_{k}>\) represents an averaging operation. More sophisticated averages might well be applied – at the expense of more formalism.

If we impose the approximation of Onsager nonequilibrium thermodynamics, i.e., defining \(S(Z)=-F(Z)+ZdF/dZ\), and assume \(dS/dZ=f(Z)\), we again obtain \(f(Z)=Z d^{2}F/dZ^{2}\), and can calculate g(Z) from the last expression in Eq. (19).

The appearance of N in expressions for g(Z) and L is of some note. In particular, groupthink – high values of \(H_{0}^{k}\) – may result in failure to detect important signals.

Such considerations lead to a fundamentally different picture – not just possible, but often observed – namely, a transmission model. Consider something like a colony of prairie dogs under predation by hawks. A ‘sentinel’ pattern emerges, rather than the averaging system of Eq. (20), operating through the ‘bottleneck’ of a single, very highly-optimized ‘subcomponent’ of the larger social structure. This involves a hypervigilant individual – or a small number of such, appropriately dispersed – whose special danger signal or signals can be imposed rapidly across the entire population: postures, vocalizations, or both. Then

$$\begin{aligned} L = \max _{k}{\{L_{k}=\exp [-H_{0}^{k}/g(Z)]\}} \end{aligned}$$
(21)

where \(\max\) is the maximization across the set \(\{ L_{k} \}\). However, Dore et al. (2019) show that, while brain activity in humans can track information sharing, there are likely to be important individual differences. That is, not everyone may respond the same way to an ‘alert’ message.

Other network topologies are clearly possible. For example, in a rigidly hierarchical linear-chain business, military, or political setting, L will often be given by the minimization of cognition rates across the \(L_{k}\), that is, by a bottleneck model.
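The three topologies – averaging, sentinel, and bottleneck – can be compared in a toy sketch; the thresholds \(H_{0}^{k}\) and the value of g(Z) below are invented for illustration.

```python
# Toy comparison of the network-topology cognition rates discussed
# above: averaging (Eq. 20), sentinel (Eq. 21), and linear-chain
# bottleneck. Thresholds and temperature analog are assumptions.
import numpy as np

H0 = np.array([0.6, 1.0, 1.4, 2.5])  # hypothetical per-module thresholds
gZ = 0.8                             # assumed shared temperature analog

Lk = np.exp(-H0 / gZ)
L_average = Lk.mean()     # Eq. (20): <L_k>
L_sentinel = Lk.max()     # Eq. (21): fastest detector dominates
L_bottleneck = Lk.min()   # rigid hierarchy limited by slowest module
```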

The critical dependence of a system’s cognitive dynamics on its underlying ‘social topology’ has profound implications for theories of the ‘extended conscious mind’ (Clark 2009; Lucia Valencia and Froese 2020). A particular expression of such matters involves failures of institutional cognition on wickedly hard problems (Wallace 2021b).

Time and resource constraints are important

Multiple workspaces, however, can also present a singular – and independent – problem of resource delivery, taking the scalar \(Z_{j}\) and time itself as essential resources. That is, not only are the \(Z_{j}\) limited and delayed; in most cases, there will also be a limit on possible response times, for both ‘predator’ and ‘prey’, so to speak.

If we assume an overall limit to available resources across a multiple workspace system \(j=1, \, 2, ...\) as \(Z=\sum _{j}Z_{j}\), and available time as \(T=\sum _{j}T_{j}\), then it is possible to carry out a simple Lagrangian optimization on the rate of system cognition \(\sim \sum _{j}\exp [-H_{j}^{0}/g_{j}(Z_{j})]\) as

$$\begin{aligned} \mathscr {L} \equiv \sum _{j} \exp [-H^{0}_{j}/g_{j}(Z_{j})] + \nonumber \\ \lambda \left( Z - \sum _{j}Z_{j} \right) + \mu \left( T - \sum _{j}T_{j} \right) \nonumber \\ \partial \mathscr {L}/\partial Z_{j} = 0, \, \, \, \partial \mathscr {L}/\partial T_{j} = 0 \end{aligned}$$
(22)

where we assume \(dZ_{j}/dt = f_{j}(Z_{j}(t)) = \beta _{j} - \alpha _{j}Z_{j}(t)\).

This leads, after some development, to a necessary expression for individual subsystem resource rates as

$$\begin{aligned} Z_{j} = f_{j}^{-1}(\mu /\lambda ) = \frac{\beta _{j}-\mu /\lambda }{\alpha _{j}} > 0 \end{aligned}$$
(23)

\(\mu\) and \(\lambda\) are to be viewed in economic terms as shadow prices imposed by ‘environmental constraints’, in a large sense (e.g., Jin et al. 2007; Robinson 1993).

The essential point is that cognitive dynamics – the rates of cognition driven by the rate of available resources \(Z_{j}\) and time – in this model, are strongly determined by the shadow price ratio \(\mu /\lambda\). The shadow price ratio is to be interpreted as an environmental signal. A sufficiently large shadow price ratio, according to Eq. (23), can starve essential components, driving their cognition rates to failure.
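Because \(f_{j}(Z_{j}) = \beta_{j} - \alpha_{j}Z_{j}\) is linear, the shadow price ratio implied by the total-resource constraint \(\sum_{j}Z_{j} = Z\) can be found in closed form. A toy sketch, with invented rate parameters:

```python
# Sketch of Eq. (23): subsystem resource rates Z_j fixed by the shadow
# price ratio r = mu/lambda, solving sum_j (beta_j - r)/alpha_j = Z
# for r. Rate parameters and the total Z are illustrative assumptions.
import numpy as np

beta = np.array([3.0, 2.0, 1.5])
alpha = np.array([1.0, 0.8, 1.2])
Z_total = 4.0

r = (np.sum(beta / alpha) - Z_total) / np.sum(1.0 / alpha)  # mu/lambda
Zj = (beta - r) / alpha                 # Eq. (23); each must be > 0

assert np.isclose(Zj.sum(), Z_total)
# A smaller Z_total raises r: components with beta_j < r are starved
# (Z_j <= 0), driving their cognition rates to failure.
```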

Further theoretical development

Following the arguments of Wallace (2021a), the cognition models can be extended in a number of possible directions, much as is true for ‘ordinary’ regression theory.

Perhaps the simplest next step is to replace the relation \(dZ/dt = f(Z(t))\) with a stochastic differential equation having the form \(dZ_{t} = f(Z_{t})dt + \sigma g(Z_{t})dB_{t}\), where \(dB_{t}\) represents ordinary white noise.

A next ‘simple’ generalization might be replacing the scalar index Z with a multidimensional vector quantity, \(\mathbf {Z}\), leading to an intricate set of simultaneous partial differential equations requiring, at best, Lie symmetry address. See the Mathematical Appendix for an outline.

Further development could involve expanding the ‘Onsager approximation’ in terms of a ‘generalized entropy’ \(S=\sum _{k}\epsilon _{k}Z^{k-1}F^{k-1}\), where \(F^{j} \equiv d^{j}F/dZ^{j}\). Then the dynamics might also be generalized, at least for a scalar Z, as \(\partial Z/\partial t \approx \sum _{j}\mu _{j}d^{j}S/dZ^{j} = f(Z)\), and so on toward multidimensional models.

As with regression equations much beyond \(Y = mX + b\), matters can rapidly become complicated indeed.

Discussion

Hasson et al. (2012), in a widely-cited foundational study, call for a reorientation of neuroscience from a single-brain to a multi-brain frame of reference:

Cognition materializes in an interpersonal space. The emergence of complex behaviors requires the coordination of actions among individuals according to a shared set of rules. Despite the central role of other individuals in shaping our minds, most cognitive studies focus on processes that occur within a single individual. We call for a shift from a single-brain to a multi-brain frame of reference. We argue that in many cases the neural processes in one brain are coupled to the neural processes in another brain via the transmission of a signal through the environment. Brain-to-brain coupling constrains and simplifies the actions of each individual in a social network, leading to complex joint behaviors that could not have emerged in isolation.

Kingsbury et al. (2019), Rose et al. (2021) and Baez-Mendoza et al. (2021) extend this perspective to interacting non-human populations, while Abraham et al. (2020), for humans, extend the time scale across generations.

Here, we outline something of the formal developments needed to implement such a reorientation, based on groupoid symmetry-breaking within longstanding paradigms of information and control theories, as affected and afflicted by network topologies and their dynamics. This, it can be argued, is very much a ‘rocket science’ problem, since the difficulty lies not in the individual components of a possible comprehensive approach – which are all well-studied – but in using the building blocks to construct a theoretical enterprise that accounts well for observational and experimental data. Very similar conundrums confront contemporary theories of consciousness (e.g., Wallace 2022), albeit, in this case, without the hindrance of quite so many longstanding philosophical and de facto theological presuppositions.

Application of the approach developed here to stochastic systems is straightforward if somewhat complicated, as is extension to higher approximation ‘Onsager-type’ entropy gradient models, the analog of moving from \(y=mx+b\) to \(y=mx^{2}+b\), and so on (Wallace 2021a, b).

In a deep-time sense, the underlying mechanisms have long been with us, i.e., the evolutionary exaptation of the inevitable second-law ‘leakage’ of crosstalk between co-resident cognitive processes (e.g., Wallace 2012). Crosstalk characterizes the immune system, wound-healing, tumor control, gene expression, and so on, up through and including far more rapid neural processes. It is not a great leap-of-faith to infer that similar dynamics instantiate social interactions between individuals within and across populations.

Mathematical appendix

Groupoids

We follow Brown (1992) closely. Consider a directed line segment in one component, written as the source on the left and the target on the right.

$$\begin{aligned} \bullet \longrightarrow \bullet \end{aligned}$$

Two such arrows can be composed to give a product \(\mathbf {ab}\) if and only if the target of \(\mathbf {a}\) is the same as the source of \(\mathbf {b}\)

$$\begin{aligned} \bullet {\mathop {\longrightarrow }\limits ^{\mathbf {a}}} \bullet {\mathop {\longrightarrow }\limits ^{\mathbf {b}}} \bullet \end{aligned}$$

Brown puts it this way,

One imposes the geometrically obvious notions of associativity, left and right identities, and inverses. Thus a groupoid is often thought of as a group with many identities, and the reason why this is possible is that the product \(\mathbf {ab}\) is not always defined.

We now know that this apparently anodyne relaxation of the rules has profound consequences... [since] the algebraic structure of product is here linked to a geometric structure, namely that of arrows with source and target, which mathematicians call a directed graph.

Cayron (2006) elaborates this as follows,

A group defines a structure of actions without explicitly presenting the objects on which these actions are applied. Indeed, the actions of the group G applied to the identity element e implicitly define the objects of the set G by ge = g; in other terms, in a group, actions and objects are two isomorphic entities. A groupoid enlarges the notion of group by explicitly introducing, in addition to the actions, the objects on which the actions are applied. By this approach, many identities may exist (they correspond to the actions that leave an object invariant).

It is of particular importance that equivalence class decompositions permit construction of groupoids in a highly natural manner.

Weinstein (1996) and Golubitsky and Stewart (2006) provide more details on groupoids and on the relation between groupoids and bifurcations.

An essential point is that, since there are no necessary products between groupoid elements, ‘orbits’, in the usual sense, disjointly partition groupoids into ‘transitive’ subcomponents.

The data rate theorem

Real-world environments are inherently unstable. Organisms, to survive, must exert a considerable measure of control over them. These control efforts range from immediate responses to changing patterns of threat and affordance, through niche construction, and, in higher animals, elaborate, highly persistent, social and sociocultural structures. Such necessity of control can, in some measure, be represented by a powerful asymptotic limit theorem of probability theory different from, but as fundamental as, the Central Limit Theorem: the Data Rate Theorem, first derived as an extension of the Bode Integral Theorem of signal theory.

Consider a reduced model of a control system as follows:

Fig. 7 The reduced model of an inherently unstable system stabilized by a control signal \(U_{t}\)

For the inherently unstable system of Fig. 7, assume an initial n-dimensional vector of system parameters at time t, as \(x_{t}\). The system state at time \(t+1\) is then – near a presumed nonequilibrium steady state – determined by the first-order relation

$$\begin{aligned} x_{t+1}=\mathbf {A}x_{t}+\mathbf {B}u_{t}+W_{t} \end{aligned}$$
(24)

In this approximation, \(\mathbf {A}\) and \(\mathbf {B}\) are taken as fixed n-dimensional square matrices. \(u_{t}\) is a vector of control information, and \(W_{t}\) is an n-dimensional vector of Brownian white noise.

According to the Data Rate Theorem, if H is a rate of control information sufficient to stabilize an inherently unstable control system, then it must be greater than a minimum \(H_{0}\),

$$\begin{aligned} H > H_{0} \equiv \log [\Vert \det [\mathbf {A}^{m}]\Vert ] \end{aligned}$$
(25)

where \(\det\) is the determinant of the subcomponent \(\mathbf {A}^{m}\) – with \(m \le n\) – of the matrix \(\mathbf {A}\) having eigenvalues \(\ge 1\). \(H_{0}\) is defined as the rate at which the unstable system generates ‘topological information’ on its own.

If this inequality is violated, stability fails.
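A minimal sketch of Eq. (25), using an invented system matrix and taking eigenvalue moduli for the unstable subcomponent:

```python
# Sketch of the Data Rate Theorem limit, Eq. (25): H0 = log|det A^m|,
# where A^m collects the unstable (|eigenvalue| >= 1) subcomponent.
# The matrix entries are illustrative assumptions only.
import numpy as np

A = np.array([[1.2, 0.1, 0.0],
              [0.0, 0.9, 0.2],
              [0.1, 0.0, 1.5]])          # hypothetical system matrix

eig = np.linalg.eigvals(A)
unstable = eig[np.abs(eig) >= 1.0]       # eigenvalues of A^m
H0 = np.log(np.abs(np.prod(unstable)))   # minimum control information rate
```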

Stochastic analysis for Fig. 5

Here, we apply the Ito Chain Rule (Protter 2005) to the expression \(L(Z)=\exp [-1/g(Z)]\), as based on Eq. (17) for g(Z). We set \(\alpha =1, \, \beta =3, \, C_{1}=C_{2}=-2\) and numerically calculate the solution set for the relation \(<dL_{t}>=0\) based on the underlying stochastic differential equation

$$\begin{aligned} dZ_{t} = (\beta - \alpha Z_{t})dt + \sigma Z_{t} dB_{t} \end{aligned}$$
(26)

where \(dB_{t}\) is assumed to be ordinary white noise, as associated with Brownian motion.

Figure 8 shows the result: the solution equivalence class \(\{ \sigma , \, Z \}\) for \(\beta =3\), just to the left of the peak in Fig. 5. Note that, at \(\sigma \approx 0.278\), the system becomes susceptible to a bifurcation instability, well before the ‘standard’ instability expected from a second-order Ito Chain Rule analysis based on Eq. (26). More specifically, the nonequilibrium steady state (nss) conditions associated with Eq. (26) are the relations \(<Z_{t}> = \beta /\alpha\) and Var\([Z_{t}] = \left( \beta /(\alpha - \sigma ^{2}/2) \right) ^{2} - (\beta /\alpha )^{2}\), so that the variance in Z explodes as \(\sigma \rightarrow \sqrt{2 \alpha }\) – here, \(\sqrt{2}\).
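A direct Euler-Maruyama simulation of Eq. (26) illustrates these nonequilibrium steady state conditions; this is a sketch under the stated parameter assumptions, not the numerical procedure used to generate Fig. 8.

```python
# Euler-Maruyama simulation of Eq. (26):
# dZ = (beta - alpha*Z) dt + sigma * Z dB. For sigma < sqrt(2*alpha),
# <Z_t> -> beta/alpha with finite variance.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 1.0, 3.0, 0.25   # sigma below the ~0.278 threshold
dt, n_steps = 1e-3, 200_000

Z = beta / alpha                      # start at the deterministic nss
traj = np.empty(n_steps)
for i in range(n_steps):
    dB = np.sqrt(dt) * rng.standard_normal()
    Z += (beta - alpha * Z) * dt + sigma * Z * dB
    traj[i] = Z

print(traj.mean())                    # approaches beta/alpha = 3
```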

Similar analyses to Fig. 8 across increasing values of \(\beta\) produce increasingly complicated equivalence classes \(\{\sigma , \, Z \}\), as constrained by the nss conditions on \(Z_{t}\).

Fig. 8 Numerical solution equivalence class \(\{ \sigma , \, Z \}\) for the relation \(<dL_{t}>=0\) from Fig. 5, taking \(\beta =3\), just to the left of the peak. Again, \(\alpha = 1, \, C_{1}=C_{2}=-2\). While, in this model, \(Z_{t}\) becomes unstable in variance at \(\sigma > \sqrt{2}\), the cognition rate can suffer a bifurcation instability for \(\sigma \gtrsim 0.278\)

Higher dimensional systems

Above, we have viewed systems as sufficiently well characterized by the single scalar parameter Z, mixing material resource/energy supply with internal and external flows of information. The real world, however, may often be far more complicated. That is, invoking techniques akin to Principal Component Analysis, there may be more than one independent composite entity irreducibly driving system dynamics. It may then be necessary to replace the scalar Z with an n-dimensional vector \(\mathbf {Z}\) having orthogonal components that, together, account for a good portion of the total variance in the rate of supply of essential resources. The dynamic equations are then in vector form:

$$\begin{aligned} F(\mathbf {Z})=-\log \left( h(g(\mathbf {Z})) \right) g(\mathbf {Z}) \nonumber \\ S = -F + \mathbf {Z} \cdot \nabla _{\mathbf {Z}}F \nonumber \\ \partial \mathbf {Z}/ \partial t \approx \hat{\mu } \cdot \nabla _{\mathbf {Z}}S = f(\mathbf {Z}) \nonumber \\ -\nabla _{\mathbf {Z}}F + \nabla _{\mathbf {Z}}(\mathbf {Z} \cdot \nabla _{\mathbf {Z}}F)= \nonumber \\ \hat{\mu }^{-1} \cdot f(\mathbf {Z}) \equiv f^{*}(\mathbf {Z}) \nonumber \\ \left( \left( \partial ^{2}F/\partial z_{i}\partial z_{j} \right) \right) \cdot \mathbf {Z}= f^{*}(\mathbf {Z}) \nonumber \\ \left( \left( \partial ^{2}F/\partial z_{i}\partial z_{j} \right) \right) |_{\mathbf {Z}_{nss}} \cdot \mathbf {Z_{nss}}= \mathbf {0} \end{aligned}$$
(27)

Here, F, g, h, and S are scalar functions, and \(\hat{\mu }\) is an n-dimensional square matrix of diffusion coefficients. The matrix \(\left( \left( \partial ^{2}F/\partial z_{i}\partial z_{j} \right) \right)\) is the obvious n-dimensional square matrix of second partial derivatives, and \(f(\mathbf {Z})\) is a vector function. The last relation imposes a nonequilibrium steady state condition, i.e., \(f^{*}(\mathbf {Z}_{nss})=\mathbf {0}\).

For the ‘simple’ Rate Distortion approach, \(h(g(\mathbf {Z})) \rightarrow g(\mathbf {Z})\), while, again, we assume \(\mathbf {Z}(t) \rightarrow \mathbf {Z}_{nss}\).

For \(n \ge 2\), this is an overdetermined system of partial differential equations (Spencer 1969). Indeed, for a general \(f^{*}(\mathbf {Z})\) the system is inconsistent, resulting in as many as n different expressions for \(F(\mathbf {Z})\), and hence the same number of ‘temperature’ measures as determined by the relation \(F=-\log (g)g\).