1 Introduction

It is a common observation that natural languages are ambiguous, namely, that linguistic utterances can potentially be assigned more than one interpretation and that receivers of linguistic utterances need to resort to supplementary information (i.e., the linguistic or the communicative context) to choose one among the available interpretations.

Both linguists and logicians have been interested in this observation. On the one hand, a traditional task of grammar is to illustrate and classify ambiguity, which may be of different types, as well as to determine how apparently ambiguous utterances are disambiguated at the relevant level of representation; indeed, the search for a parsimonious treatment of certain ambiguities, such as scope ambiguities, has been one of the most powerful driving forces in the development of formal inquiry into the syntax-semantics interface since its modern inception in Montague’s semiotic program (Montague 1974). It is no exaggeration, in our opinion, to say that the presence of ambiguity (particularly scope ambiguity) and the apparent mismatch between the form and the alleged semantic structure of quantified expressions in natural languages have been the two major guiding problems in the development of a formal theory of the syntax-semantics interface of natural languages.

On the other hand, logicians are in general not so much interested in describing or characterizing the phenomenon of ambiguity as in constructing unambiguous artificial languages whose primitive symbols have a univocal interpretation and whose formulae are constructed by appropriate recursive syntactic definitions and unambiguously interpreted by the relevant compositional semantic rules, formulated as recursive definitions that trace back the syntactic construction of the formulae. Not surprisingly, some philosophers, such as Wittgenstein (1992, 3.323–3.325), identified the ambiguity of ordinary language as the source of philosophical confusion, and aspired to construct a language whose signs were univocal and whose propositions mirrored the logical structure of reality itself.

It is a rather common view that the presence of phenomena such as ambiguity and garden path sentences suggests that language is poorly designed for communication (Chomsky 2008). In fact, there are at least two opposite starting hypotheses about the nature and emergence of ambiguity in natural communication systems. It could well be that ambiguity is an intrinsic imperfection, since natural, self-organized codes of communication are not perfect, but coevolved through a fluctuating medium and no one designed them. But it may equally well be that ambiguity is the result of an optimization process, inasmuch as natural, self-organized codes of communication must satisfy certain constraints that a non-ambiguous artificial language can afford to neglect. Logical languages, for instance, are constructed to study relations such as logical consequence and equivalence among well-formed formulae, for which it is desirable to define syntactic rules that do not generate syntactically ambiguous expressions.Footnote 1 However, in the design of logical languages that provide the appropriate tools for that particular purpose, certain features that may be crucial in the emergence of natural communication systems are neglected, such as the importance of the cost of generating expressions and the role of cooperation between the coder and the decoder in the process of communicating those expressions (Wang and Steels 2008), factors that are completely extraneous to the design of logical languages.

Despite the indisputability of ambiguity in natural languages and the attention that this observation has received among linguists, philosophers and logicians, it is fair to conclude that the emergence of ambiguity has not yet come under serious theoretical scrutiny.

In this article we provide a general mathematical definition of ambiguity through the computational concept of logical irreversibility, and we quantify the ambiguity of a code in terms of the amount of uncertainty involved in reversing the coding process. Finally, we capitalize on the two above-mentioned factors (the importance of the cost of generating expressions and the role of cooperation between the coder and the decoder) in order to provide a detailed argument for the idea that ambiguity is an unavoidable consequence of the following efficiency factor in natural communication systems: interacting communicative agents must attain a code that tends to minimize the complexities of both the coding and the decoding processes. As a proof of concept, we thoroughly explore a simple system based on two agents—coder and decoder—under a symmetrical—cooperative—scenario, and we show that ambiguity must emerge.

The remainder of this article is organized as follows. In Sect. 2 we introduce Landauer’s concept of logical (ir)reversibility of a given computational device and we quantify the degree of ambiguity of a code as the amount of uncertainty in reversing the coding process. In Sect. 3 we introduce Zipf’s vocabulary balance condition, a particular instance of Zipf’s Least Effort Principle (Zipf 1965), and we show how it can be properly generalized and accommodated to the information-theoretic framework adopted in Sect. 2. We conclude our reasoning by showing that, if the coding and the decoding processes are performed in a cooperative regime expressed in terms of a symmetry equation between coding and decoding complexities, a certain amount of logical uncertainty or ambiguity is unavoidable. In Sect. 4 we recapitulate our derivation of the presence of ambiguity and we stress a further important result intimately related to our development: Zipf’s law, another well-known and ubiquitous feature of natural language products, is the sole expected outcome of an evolving communicative system that satisfies the symmetry equation between coding and decoding complexities, as argued in (Corominas-Murtra et al. 2011). “Appendix” section emphasizes the relationship between logical irreversibility and thermodynamical irreversibility.

2 Ambiguity and Logical Reversibility

In this section we begin by informally presenting the concept of logical (ir)reversibility of a given computation (Sect. 2.1). Subsequently we formally explain how a code generated through logically irreversible computations is necessarily ambiguous (Sect. 2.2) and we quantify the degree of ambiguity as the minimum amount of information needed to properly reconstruct a given message (Sect. 2.3).

2.1 Logical Irreversibility

Traditionally, the concept of computation is studied as an abstract process of data manipulation and only its logical properties are taken into consideration. However, if we want to investigate how an abstract computation is realized by a physical system, such as an electronic machine, an abacus or a biological system such as the brain, it becomes important to consider the connections between the logical properties of (abstract) computations and the physical—or more precisely, thermodynamical—properties of the systems that perform those computations.

The fundamental examination of the physical constraints that computations must satisfy when they are performed by a physical system was started by Landauer (1961), and continued in several other works (Bennett 1973, 2008; Bennett and Landauer 1985; Ladyman et al. 2007; Toffoli 1980). The general objective of these approaches is to determine the physical limits of the process of computing, the “general laws that must govern all information processing no matter how it is accomplished” (Bennett and Landauer 1985, p. 48). Accordingly, the concept of computation is subject to the same questions that apply to other physical processes, and thus the following questions become central to the physical study of computational devices (Bennett and Landauer 1985, p. 48):

  1. How much energy must be expended to perform a particular computation?

  2. How long must a computation take?

  3. How large must the computing device be?

It must be remarked that by answering these questions one expects to find fundamental physical principles that govern any kind of information processing, and not the actual limits of a particular technological implementation. A central objective within this general framework is the study of the properties of reversibility and irreversibility relative to a computational process, a distinction that was first considered in relation to the problem of heat generation during the computational process. An important idea relative to question (1) is formulated in the so-called Landauer principle, according to which any logically irreversible computation necessarily generates a certain amount of heat (cf. Bennett and Landauer 1985 and also “Appendix” section). Let us thus introduce the concept of logical reversibility/irreversibility, which is crucial to our concerns.

As remarked by Bennett and Landauer (1985), no computation ever generates information, since the output is implicit in the input. However, many operations destroy information whenever two previously distinct situations become indistinguishable, in which case the input contains more information than the output. For instance, if we consider the operation + defined on \(\mathbb N \), the output 5 can be obtained from the following inputs: (0, 5), (1, 4), (2, 3), (3, 2), (4, 1), (5, 0). The concept of logical irreversibility is introduced by Landauer (1961, p. 264) in order to study those computations whose input cannot be unequivocally determined from their output:

We shall call a device logically irreversible if the output of a device does not uniquely define the inputs.

Conversely, a device is logically reversible if its inputs can be unequivocally determined from its output—see Fig. 1.

Fig. 1

A computation is said to be logically irreversible if the input cannot be univocally determined from the knowledge of the output alone. Here we have two examples of simple logic gates performing irreversible computations. The AND gate (which corresponds to the logical connective \(\wedge \)) has the truth table shown on the bottom. Although the output 1 can only be obtained from the input (1, 1), the output 0 can be obtained from (1, 0), (0, 1) or (0, 0). The presence of a 0 in the output is not enough to properly reverse the computational process, and therefore we need some extra information if we want to know the inputs with no errors. Therefore, the computations of this gate are logically irreversible. The same occurs with the OR gate, corresponding to the logical connective \(\vee \). This example has been taken from Bennett and Landauer (1985)
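
To make the discussion concrete, the following sketch (ours, not part of the original argument) enumerates the preimages of each output of the AND and OR gates; an output with more than one preimage is precisely what makes the computation logically irreversible.

```python
from itertools import product

def preimages(gate, inputs):
    """Group the possible inputs of a gate by the output they produce."""
    table = {}
    for x in inputs:
        table.setdefault(gate(*x), []).append(x)
    return table

AND = lambda a, b: a & b
OR = lambda a, b: a | b

for name, gate in (("AND", AND), ("OR", OR)):
    for out, ins in sorted(preimages(gate, product((0, 1), repeat=2)).items()):
        status = "uniquely invertible" if len(ins) == 1 else "not uniquely invertible"
        print(f"{name}: output {out} <- inputs {ins} ({status})")
```

Running the sketch shows that the output 0 of AND (and the output 1 of OR) has three preimages, so the knowledge of the output alone does not determine the input.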

2.2 Logical (Ir)Reversibility in Terms of Turing Machines

In this subsection we define logically (ir)reversible computations in terms of a Turing machine.

We informally describe, as usual, the action of a Turing machine on an infinite tape divided into squares as a sequence of simple operations that take place after an initial moment. At each step, the machine is at a particular internal state and its head examines a square of the tape (i.e., reads the symbol written on a square). The machine subsequently writes a symbol on that square, changes its internal state and moves the head one square leftwards or rightwards or remains at the same square.

Formally, a deterministic Turing machine \(\mathcal TM \) is a triple composed of a finite set of internal states Q, a finite set of symbols \(\varSigma \) (an alphabet) and a transition function \(\delta \),

$$\begin{aligned} \mathcal TM = (Q, \varSigma , \delta ). \end{aligned}$$

There is an initial state s belonging to Q and in general \(\textit{Q} \cap \varSigma = \emptyset \). Two special symbols, the blank \(\sqcup \) and the initial symbol \(\vartriangleright \), belong to \(\varSigma \). Additionally the transition function \(\delta \) is understood as follows:

$$\begin{aligned} \delta : \textit{Q} \times \varSigma \rightarrow \textit{Q} \times \varSigma \times \left\{ \textit{L}, \textit{R}, \square \right\} , \end{aligned}$$

where ‘L’ and ‘R’ mean, respectively, “move the head one square leftwards or rightwards”, and ‘\(\square \)’ means “stay at the square just examined”.

Thus, \(\delta \) is the program of the machine; it specifies, for each combination of current state \(\sigma _i \in \textit{Q}\) and current symbol \(r_k \in \varSigma \), a triple

$$\begin{aligned} \langle \sigma _j, r_{\ell }, \textit{D}\rangle , \end{aligned}$$

where \(\sigma _j\) is the state entered immediately after \(\sigma _i\), \(r_{\ell }\) is the symbol written over \(r_k\), and \(\textit{D}\,\in \,\left\{ \textit{L}, \textit{R}, \square \right\} \). A schema of how a computation is performed in a Turing machine is shown in Fig. 2.Footnote 2

Fig. 2

The sequence of a given computation in a Turing machine—see text

In general, we say that \(\mathcal TM \) performs only logically reversible computations if the inverse function of \(\delta ,\,\delta ^{-1}\), defined as

$$\begin{aligned} \delta ^{-1}: \textit{Q} \times \varSigma \times \left\{ \textit{L}, \textit{R}, \square \right\} \rightarrow \textit{Q} \times \varSigma ,\nonumber \end{aligned}$$

exists. This implies that distinct inputs yield distinct outputs and, therefore, that the process can be inverted for every element of the input set. If \(\delta ^{-1}\) does not exist then

$$\begin{aligned} \left( \exists \alpha , \beta \in \textit{Q} \times \varSigma \right) :\left( (\alpha \ne \beta )\wedge \left( \delta (\alpha )=\delta (\beta )\in \textit{Q} \times \varSigma \times \left\{ \textit{L}, \textit{R}, \square \right\} \right) \right) . \end{aligned}$$

Therefore, from the knowledge of \(\gamma =\delta (\alpha )=\delta (\beta )\) we cannot determine with certainty whether the actual input was \(\alpha \) or \(\beta \).
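
As an illustrative sketch (our own, with an invented toy machine), the condition just stated can be checked mechanically: a transition function \(\delta \), represented as a dictionary from (state, symbol) pairs to (state, symbol, move) triples, defines logically reversible computations exactly when it is injective.

```python
def is_logically_reversible(delta):
    """delta is logically reversible iff it is injective, i.e., no two distinct
    input configurations are mapped to the same output configuration."""
    outputs = list(delta.values())
    return len(outputs) == len(set(outputs))

# A toy transition function over states {'q0', 'q1'} and symbols {'a', 'b'}
# (purely illustrative; it is not taken from the article).
delta = {
    ('q0', 'a'): ('q1', 'b', 'R'),
    ('q0', 'b'): ('q1', 'b', 'R'),   # collides with the previous entry
    ('q1', 'a'): ('q0', 'a', 'L'),
    ('q1', 'b'): ('q0', 'b', 'R'),
}

print(is_logically_reversible(delta))  # False: ('q1', 'b', 'R') has two preimages
```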

After these general definitions, we shall provide a particular definition of a Turing machine suitable for the study of the coding process. This coding machine \(\mathcal T \) is composed of a set of internal states \(\textit{Q}\), a transition function \(\delta \) and two alphabets—an input alphabet \(\varOmega =\{m_1,\ldots ,m_n\}\) and an output alphabet \(\textit{S}=\{s_1,\ldots ,s_m\}\):

$$\begin{aligned} \mathcal T = \left( Q, \varOmega , S, \delta \right) . \end{aligned}$$

We shall call the elements of \(\varOmega \) referents and the elements of \(\textit{S}\) signs. We assume that \(\textit{Q} \cap S\) = \(\emptyset \) and that \(\textit{Q} \cap \varOmega = \emptyset \). For simplicity, we also assume that, in a coding process, the two alphabets are disjoint,

$$\begin{aligned} \varOmega \bigcap \textit{S} = \emptyset , \end{aligned}$$

i.e., an object is either a sign or a referent, but not both. Technically, this implies that \(\mathcal T \) can never reexamine a square where a sign has been printed, which means that \(\mathcal T \) must always move in the same direction; assume for concreteness that it moves rightwards. Accordingly, we define the transition function as follows:

$$\begin{aligned} \delta : \textit{Q} \times \varOmega \rightarrow \textit{Q} \times S \times \left\{ \textit{R}\right\} \end{aligned}$$
(1)

In order to clearly identify input and output configurations, we express the applications of \(\delta \) in the following terms:

$$\begin{aligned} \sigma _k m_i \rightarrow \sigma _js_{\ell }R, \end{aligned}$$

where \(\sigma _k m_i\) and \(\sigma _j s_{\ell } R\) (with \(\sigma _k, \sigma _j\in Q\), \(m_i\in \varOmega \), \(s_{\ell }\in S\)) are respectively the input and output configurations.

A coding machine \(\mathcal T \) will perform solely logically reversible computations if there exists the inverse function of \(\delta ,\,\delta ^{-1}\), defined as

$$\begin{aligned} \delta ^{-1}: \textit{Q} \times S \times \left\{ \textit{R}\right\} \rightarrow \textit{Q} \times \varOmega . \end{aligned}$$

This implies that distinct input configurations yield distinct output configurations and, therefore, that the process can be inverted for every element of \(\varOmega \). The nonexistence of \(\delta ^{-1}\) amounts to the fact that

$$\begin{aligned} \left( \exists \alpha , \beta \in \textit{Q} \times \varOmega \right) :\left( \left( \alpha \ne \beta \right) \wedge \left( \delta (\alpha )=\delta (\beta )\in \textit{Q} \times S \times \left\{ R\right\} \right) \right) . \end{aligned}$$

If a coding machine is logically irreversible then it generates signs which are ambiguous in the sense that they encode more than one referent.
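
The following small sketch (with an invented referent and sign inventory) makes this situation explicit: it inverts a simplified coding map and lists the signs that encode more than one referent, which are exactly the ambiguous signs produced by an irreversible coding process.

```python
# A toy coding map from referents (elements of Omega) to signs (elements of S);
# the inventory is invented purely for illustration.
coding = {
    'm1': 's1',
    'm2': 's1',   # 's1' encodes two referents: the code is ambiguous
    'm3': 's2',
    'm4': 's3',
}

# Invert the map: associate each sign with the set of referents it encodes.
referents_of = {}
for referent, sign in coding.items():
    referents_of.setdefault(sign, set()).add(referent)

ambiguous = {s: ms for s, ms in referents_of.items() if len(ms) > 1}
print(ambiguous)  # {'s1': {'m1', 'm2'}}
```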

As shown by Bennett (1973), a logically irreversible Turing machine can always be made logically reversible at every step. Thus, logical irreversibility is not an essential property of computations. Regarding the general questions (2) and (3) formulated above, it is particularly relevant for our concerns that a logically reversible machine need not be much more complicated than the irreversible machine it is associated with: computations on a reversible machine take about twice as many steps as on the irreversible machine they simulate and do not require a remarkably larger computing device.Footnote 3

Therefore, the study of the complexity of the computations of the coding device alone does not seem to offer a necessity argument for the emergence of ambiguity but only a relatively weak plausibility argument, since reversible computations do not need to be significantly more complex than irreversible computations. However, as we shall see, it is possible to obtain a strong necessity argument for the emergence of ambiguity if instead of considering the complexity of the coding machine in isolation we study how a coding machine interacts with a decoding machine in an optimal way.

2.3 Logical Reversibility and Ambiguous Codes

Before proceeding further in studying the concepts of logical (ir)reversibility in a communication system formed by two agents (a coder and a decoder) and the channel, some clarifications are in order. Firstly, we note that logical reversibility refers to the potential existence of a reconstruction or decoding algorithm, which does not entail that, in a real scenario, such an algorithm is at work; in other words, logical (ir)reversibility is a feature of the computations alone. Secondly, in the process of transmitting a signal through a channel, the presence of noise in the channel through which the output is received may be responsible for the emergence of logical irreversibility. This implies that, although the coder agent could in principle compute in a reversible regime, the noise of the channel makes the cascade system coder agent + channel analogous to a single computational device working in an irreversible regime. And finally, whereas logical (ir)reversibility is a property of the computational device (or coding algorithm) related to the potential existence of a reconstruction algorithm (or decoding algorithm), ambiguity is a property of signs: we say that a sign (an output of a coding computation) is ambiguous when the decoder can associate it with more than one referent (or input). A sign transmitted through a channel is ambiguous if the cascade coder agent + channel is logically irreversible, which may be due to the computations of the coding agent itself or to the noise of the channel.

2.3.1 Noise: Quantifying the Degree of Ambiguity

The minimal amount of additional information needed to properly reconstruct the input from the knowledge of the output is identified as the quantitative estimator of ambiguity. The more additional information we need, the more ambiguous the code is. This minimal amount of dissipated information is known as noise in standard information theory, and its formulation in terms of the problem we are dealing with is the objective of this subsection and the following one.

To study logical irreversibility in information-theoretical terms, we choose the following simpler version of the transition function \(\delta \) given in (1):

$$\begin{aligned} \delta :\varOmega \rightarrow S. \end{aligned}$$

This choice puts aside the role of the states, but it is justified for the sake of clarity and because the qualitative nature of the results does not change: the only changes are the sizes of the input and output sets. Let \(\delta _{ij}\) be the matrix defined by

$$\begin{aligned} \delta _{ij}=\left\{ \begin{array}{ll} 1\Leftrightarrow \delta (m_i)=s_j\\ 0\quad \mathrm{otherwise.} \end{array}\nonumber \right. \end{aligned}$$

Since the machine is deterministic, there is no possibility of having two outputs for a given input; therefore

$$\begin{aligned} (\forall k\le n)(\exists ! i\le m):\left[ (\delta _{ki}=1)\wedge (\forall j\ne i)(\delta _{kj}=0)\right] . \end{aligned}$$

In terms of probabilities, this deterministic behavior means that:

$$\begin{aligned} \mathbb P (s_k|m_i)=\delta _{ik}. \end{aligned}$$
(2)

To properly study the reversibility of the above-defined coding machine \(\mathcal T \), let us define two random variables, \(X_{\varOmega }\) and \(X_S\). \(X_{\varOmega }\) takes values on the set \(\varOmega \) following the probability measure \(p\), where \(p(m_k)\) is the probability of having symbol \(m_k\) as the input in a given computation. Essentially, \(X_{\varOmega }\) describes the behavior of a fluctuating environment. \(X_S\) takes values on \({S}\) and follows the probability distribution \(q\), which for a given \(s_i\in { S}\), takes the following value:

$$\begin{aligned} q(s_i)=\sum _{k\le n}p(m_k)\delta _{ki}, \end{aligned}$$
(3)

i.e., the probability of obtaining symbol \(s_i\) as the output of a computation. The amount of uncertainty in recovering the inputs from the knowledge of the outputs of the computations performed by \(\mathcal T \) is related to logical irreversibility. In fact, this amount of uncertainty is precisely the amount of extra information we need to introduce to have a non-ambiguous code. This amount of conditional uncertainty or extra information needed is well defined by the uncertainty function or Shannon’s conditional entropy:Footnote 4

$$\begin{aligned} H(X_{\varOmega }|X_S) =-\sum _{k\le m} q(s_k)\sum _{i\le n} \mathbb P (m_i|s_k)\log \mathbb P (m_i|s_k). \end{aligned}$$
(4)

To properly derive \(\mathbb P (m_i|s_k)\) in terms of the transition function of the coding machine, we know, by virtue of Bayes’ theorem, that:

$$\begin{aligned} \mathbb P (m_i|s_k)q(s_k)=\mathbb P (s_k|m_i)p(m_i), \end{aligned}$$

which, using Eqs. (2) and (3), leads us to a general expression depending only on the transition function of the coding machine and the prior probabilities \(p\):

$$\begin{aligned} \mathbb P (m_i|s_k)= p(m_i)\delta _{ik}\left( \sum _{j\le n}\delta _{jk}p(m_j) \right) ^{-1}. \end{aligned}$$

We observe that in the special case where \((\forall m_i)(p(m_i)=\frac{1}{n})\), the above equation simply reads:

$$\begin{aligned} \mathbb P (m_i|s_k)= \delta _{ik}\left( \sum _{j\le n}\delta _{jk} \right) ^{-1}. \end{aligned}$$

Eq. (4) is the amount of noise, i.e., the information that is dissipated during the communicative exchange or, conversely, the (minimum) amount of information we need to provide externally to the system in order to perfectly reconstruct the input. Consistent with this interpretation, the amount of information shared by \(X_{\varOmega }\) and \(X_{S}\), to be written as \(I(X_S;X_{\varOmega })\), will be:

$$\begin{aligned} I(X_S;X_{\varOmega })=H(X_{\varOmega })-H(X_{\varOmega }|X_S), \end{aligned}$$
(5)

which is the so-called Shannon Information or Mutual Information between the two random variables \(X_S, X_{\varOmega }\) (Cover and Thomas 1991; Ash 1990). In our particular case, this measure quantifies the amount of information about the input set that we have from the knowledge of the output set alone after the computations.
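
The quantities just introduced are straightforward to compute for a concrete coding matrix. The sketch below (our own illustration, using an invented four-referent, three-sign code with uniform priors) evaluates Eqs. (3)-(5): the output distribution \(q\), the conditional entropy \(H(X_{\varOmega }|X_S)\) that measures the ambiguity of the code, and the mutual information \(I(X_S;X_{\varOmega })\).

```python
import numpy as np

# Coding matrix delta[i, j] = 1 iff referent m_i is coded by sign s_j
# (an invented 4-referent, 3-sign code, used only for illustration).
delta = np.array([
    [1, 0, 0],   # m1 -> s1
    [1, 0, 0],   # m2 -> s1  (s1 is ambiguous)
    [0, 1, 0],   # m3 -> s2
    [0, 0, 1],   # m4 -> s3
])
p = np.full(4, 1 / 4)                      # uniform prior over referents

q = p @ delta                              # Eq. (3): q(s_j) = sum_i p(m_i) delta_ij
post = (p[:, None] * delta) / q[None, :]   # Bayes: P(m_i | s_j)

def entropy(dist):
    dist = dist[dist > 0]
    return -np.sum(dist * np.log2(dist))

H_omega = entropy(p)                                              # H(X_Omega) = 2 bits
H_cond = sum(q[j] * entropy(post[:, j]) for j in range(len(q)))   # Eq. (4): 0.5 bits
I = H_omega - H_cond                                              # Eq. (5): 1.5 bits
print(H_omega, H_cond, I)
```

In this toy code only \(s_1\) is ambiguous, and the half bit of conditional entropy is exactly the extra information a decoder would need to tell \(m_1\) and \(m_2\) apart.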

2.3.2 Ambiguity and Logical Irreversibility

The interpretation we provided for the noise equation enables us to connect ambiguity and logical (ir)reversibility. First, we emphasize a crucial fact: by the properties of Shannon’s entropy,

$$\begin{aligned} H\left( X_{\varOmega }|X_S\right) \ge 0, \end{aligned}$$

which explicitly states that information can be either destroyed or maintained but never created in the course of a given computation—as pointed out by Bennett and Landauer (1985).

If there is no uncertainty in determining the inputs from the knowledge of the outputs alone, then

$$\begin{aligned} H(X_{\varOmega }|X_S)=0, \end{aligned}$$

i.e., there is certainty when reversing the computations performed by the coding machine. Therefore, the computations performed by \(\mathcal T \) to define the code are logically reversible and the code is not ambiguous. Otherwise, if

$$\begin{aligned} H(X_{\varOmega }|X_S)>0, \end{aligned}$$

then, we need extra information (at least \(H(X_{\varOmega }|X_S)\)) to properly reverse the process, which indicates that the computations defining the code are logically irreversible and, thus, that the code is ambiguous.

We have therefore quantitatively identified the ambiguity of a code with the amount of uncertainty involved in reversing the coding process, that is, the minimal amount of additional information we need to properly reverse it. Furthermore, we have identified the source of uncertainty through the concept of logical irreversibility, which is a feature of the computations generating the code. In this way, we establish the following correspondences:

$$\begin{aligned}&\mathrm{logically\;reversible\;computations\;\Leftrightarrow \;No\;Ambiguity}\;\Leftrightarrow H(X_{\varOmega }|X_S)=0\\&\mathrm{logically\;irreversible\;computations\;\Leftrightarrow \;Ambiguity}\;\Leftrightarrow \;H(X_{\varOmega }|X_S)>0\\&\mathrm{Amount\;of\;Ambiguity\;}=H(X_{\varOmega }|X_S). \end{aligned}$$

Now that we have rigorously defined ambiguity on theoretical grounds of computation theory and information theory, we are ready to explain why it appears in natural communication systems. As we shall see in the following section, the reason is that natural systems must satisfy certain constraints that generate a communicative tension whose solution implies the emergence of a certain amount of ambiguity.

3 The Emergence of Ambiguity in Natural Communication

The tension we referred to at the end of the last section was postulated by the linguist Zipf (1965) as the origin of the widespread scaling behavior of word frequencies that bears his name. Such a communicative tension was conceived in terms of a balance between two opposite forces: the speaker’s economy force and the auditor’s economy force.

3.1 Zipf’s Hypothesis

Let us thus informally present Zipf’s vocabulary balance between two opposite forces, the speaker’s economy force and the auditor’s economy force (Zipf 1965, pp. 19–31). The speaker’s economy force (also called unification force) is conceived as a tendency “to reduce the size of the vocabulary to a single word by unifying all meanings”, whereas the auditor’s economy force (or diversification force) “will tend to increase the size of a vocabulary to a point where there will be a distinctly different word for each different meaning”. Therefore, a conflict arises from the attempt to simultaneously satisfy these two opposite theoretical forces, and the resulting vocabulary will emerge from a cooperative solution to that conflict. In Zipf’s words,

whenever a person uses words to convey meanings he will automatically try to get his ideas across most efficiently by seeking a balance between the economy of a small wieldy vocabulary of more general reference on the one hand, and the economy of a larger one of more precise reference on the other, with the result that the vocabulary of n different words in his resulting flow of speech will represent a vocabulary balance between our theoretical Forces of Unification and Diversification (Zipf 1965, p. 22).

Obviously the unification force ensures a minimal amount of lexical ambiguity, since it will require some words to convey more than one meaning, and the diversification force constrains that amount. Thus, lexical ambiguity can be viewed as a consequence of the vocabulary balance. Although Zipf’s vocabulary balance, as stated, provides a useful intuition for understanding the emergence of lexical ambiguity by emphasizing the cooperative strategy between communicative agents, it lacks the necessary generality to provide a principled account of the origins of ambiguity beyond the particular case of lexical ambiguity. In the following sections we shall present several well-known concepts in order to generalize Zipf’s informal condition and provide solid foundations for it.

We remark that Zipf conceived the vocabulary balance as a particular case of a more general principle, the Least Effort Principle, “the primary principle that governs our entire individual and collective behaviour of all sorts, including the behaviour of our language and preconceptions” (Zipf 1965, p. 22). In Zipf’s terms,

the Principle of Least Effort means, for example, that a person in solving his immediate problems will view these against the background of his probable future problems as estimated by himself. Moreover he will strive to solve his problems in such a way as to minimize the total work that must be expended in solving both his immediate problems and his probable future problems. That in turn means that the person will strive to minimize the probable average rate of his work-expenditure (over time). And in so doing he will be minimizing his effort, by our definition of effort. Least effort, therefore, is a variant of least work.

Hence, we consider the symmetry equation between the complexities of the coder and the decoder that we shall arrive at to be a particular instance of the Least Effort Principle.

3.2 Symmetry in Coding/Decoding Complexities

How can we accommodate the previous intuitions within the formal framework proposed in Sect. 2? The auditor’s economy force leads to a one-to-one mapping between \({\varOmega }\) and \(\mathcal{S}\). In this case, the computations performed by \(\mathcal T \) to generate the code are logically reversible and thus generate an unambiguous code, and no supplementary amount of information is required to successfully reconstruct \({X_\varOmega }\). However, the speaker’s economy force conspires in exactly the opposite direction. From the speaker’s standpoint, the best option is an all-to-one mapping, i.e., a coding process where any realization of \(X_{\varOmega }\) is coded through a single signal. In that case the coding computations performed by \(\mathcal T \) are logically irreversible and the generated code is ambiguous, for it is clear that the knowledge of the output tells us nothing about the input. In order to characterize this conflict, let us properly formalize the above intuitive statement: the auditor’s force pushes the code in such a way that it is possible to reconstruct \(X_{\varOmega }\) through the intermediation of the coding performed by \(\mathcal T \). Therefore, the number of bits the decoder of \(X_S\) needs to unambiguously reconstruct \(X_{\varOmega }\) is

$$\begin{aligned} H\left( X_{\varOmega }, X_S\right) =-\sum _{i\le n} \sum _{k\le m}\mathbb P (m_i, s_k)\log \mathbb P (m_i, s_k), \end{aligned}$$

which is the joint Shannon entropy or, simply, joint entropy of the two random variables \(X_{\varOmega }, X_S\) (Cover and Thomas 1991). From the codification process, the auditor receives \(H(X_S)\) bits, and thus, the remaining uncertainty it must face will be

$$\begin{aligned} H\left( X_{\varOmega }, X_S\right) -H(X_S)=H\left( X_{\varOmega }|X_S\right) , \end{aligned}$$

where

$$\begin{aligned} H(X_S)=-\sum _{i\le m}q(s_i)\log q(s_i), \end{aligned}$$

(i.e., the entropy of the random variable \(X_S\)) and

$$\begin{aligned} H(X_{\varOmega }|X_S)=-\sum _{i\le m}q(s_i)\sum _{k\le n}\mathbb P (m_k|s_i)\log \mathbb P (m_k|s_i), \end{aligned}$$

the conditional entropy of the random variable \(X_{\varOmega }\) conditioned on the random variable \(X_S\). At this point Zipf’s hypothesis becomes crucial. Under this interpretation, the tension between the auditor’s force and the speaker’s force is cooperatively solved by imposing a symmetric balance between the efforts associated with each communicative agent: the coder sends as many bits as the additional bits the decoder needs to perfectly reconstruct \(X_{\varOmega }\):

$$\begin{aligned} H(X_S)=H(X_{\varOmega }|X_S). \end{aligned}$$
(6)

This is the symmetry equation governing communication between cooperative agents when we take into account computational efforts—which have been associated here with the entropy or complexity of the code.Footnote 5 Selective pressures will push \(H(X_S)\) upwards and, at the same time, by Eq. (6), the amount of ambiguity will also grow, as a consequence of the cooperative nature of communication.Footnote 6

Equation (6) specifies that a certain amount of information must be lost (or equivalently, a certain amount of ambiguity must appear) if coder and decoder minimize their efforts in a symmetric scenario. A further question is how much information is lost due to Eq. (6). In order to measure this amount of information, we must take into consideration the properties of the Shannon Information or Mutual Information among the two random variables \(X_S, X_{\varOmega }\), defined in Eq. (5). An interesting property of Shannon information is its symmetrical behavior, i.e., \(I(X_S;X_{\varOmega })=I(X_{\varOmega }; X_S)\) (Cover and Thomas 1991; Ash 1990). Thus, by Eq. (5),

$$\begin{aligned} H(X_{\varOmega })-H\left( X_{\varOmega }|X_S\right) =H(X_S)-H\left( X_S|X_{\varOmega }\right) , \end{aligned}$$

where \(H(X_S|X_{\varOmega })=0\), because the Turing machine is deterministic.Footnote 7 Therefore, by directly applying Eq. (6) to the above equation we reach the following identity:

$$\begin{aligned} H(X_S)=\frac{1}{2}H(X_{\varOmega }). \end{aligned}$$
(7)

Thus,

$$\begin{aligned} I(X_S;X_{\varOmega })&= H(X_{\varOmega })-H(X_{\varOmega }|X_S)\nonumber \\ {by\;eq.\;(6)}&= H(X_{\varOmega })-H(X_S)\nonumber \\ {by\;eq.\;({7})}&= \frac{1}{2}H(X_{\varOmega }). \end{aligned}$$
(8)

The above derivation shows that half of the information is dissipated during the communicative exchange if coding and decoding computations are symmetrically or cooperatively optimized. Accordingly, a certain amount of ambiguity must emerge. Ambiguity is not an inherent imperfection of a communication system or a footprint of poor design, but rather a property emerging from conditions on efficient computation: coding and decoding computations have a cost when they are performed by physical agents, and it therefore becomes crucial to minimize the costs of the coding and decoding processes. Whereas studying the process of an isolated coding agent would not provide a necessity argument for the emergence of ambiguous codes (as noted in Sect. 2.2, following Bennett 1973), a formalization of an appropriately general version of Zipf’s intuitions along the lines we have developed provides a general necessity argument for the emergence of ambiguity.
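
As a minimal numerical check of this derivation (our own sketch, with an invented code), consider four equiprobable referents coded onto two signs, two referents per sign. Such a code satisfies the symmetry Eq. (6), and Eqs. (7) and (8) can be verified directly: \(H(X_S)=H(X_{\varOmega }|X_S)=\frac{1}{2}H(X_{\varOmega })=I(X_S;X_{\varOmega })=1\) bit.

```python
import numpy as np

# Four equiprobable referents coded onto two signs, two referents per sign
# (an invented code chosen so that the symmetry Eq. (6) holds exactly).
delta = np.array([
    [1, 0],   # m1 -> s1
    [1, 0],   # m2 -> s1
    [0, 1],   # m3 -> s2
    [0, 1],   # m4 -> s2
])
p = np.full(4, 1 / 4)

q = p @ delta                                  # q = (1/2, 1/2)
post = (p[:, None] * delta) / q[None, :]       # P(m_i | s_j)

def entropy(dist):
    dist = dist[dist > 0]
    return -np.sum(dist * np.log2(dist))

H_omega = entropy(p)                                              # 2 bits
H_S = entropy(q)                                                  # 1 bit
H_cond = sum(q[j] * entropy(post[:, j]) for j in range(len(q)))   # 1 bit
I = H_omega - H_cond                                              # 1 bit

assert np.isclose(H_S, H_cond)          # symmetry Eq. (6)
assert np.isclose(H_S, H_omega / 2)     # Eq. (7)
assert np.isclose(I, H_omega / 2)       # Eq. (8): half of the information is dissipated
```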

Note that the derivation in (8) has been performed by assuming that there is no noise affecting the observation of the output set. If we assume the more realistic situation in which such noise is present, the situation is even worse, and \(I(X_S;X_{\varOmega })=\frac{1}{2}H(X_{\varOmega })\) must be regarded as an upper bound; therefore, in the presence of noise in the process of output observation, this equation must be replaced by:

$$\begin{aligned} I(X_S;X_{\varOmega })<\frac{1}{2}H(X_{\varOmega }). \end{aligned}$$

We will end this section by finding a bound on the number of wrongly decoded messages one can expect for an ambiguous code arising from Eq. (6). Let \(p_e\) be the probability of wrongly decoding a given signal—which, as shown by the fact that \(H(X_{\varOmega }|X_S)>0\), will necessarily happen. Let \(h(p_e)\) be the following binary entropy:

$$\begin{aligned} h(p_e)=-p_e\log p_e-(1-p_e)\log (1-p_e), \end{aligned}$$

then, by Fano’s inequality, we have that:

$$\begin{aligned} p_e\ge \frac{H(X_{\varOmega }|X_S)-h(p_e)}{\log (n-1)}. \end{aligned}$$

Therefore,

$$\begin{aligned} p_e&\ge \frac{H(X_{\varOmega }|X_S)-h(p_e)}{\log (n-1)}\nonumber \\ {by\;eq.\;(6)}&= \frac{H(X_S)-h(p_e)}{\log (n-1)}\nonumber \\ {by\;eq.\;(7)}&= \frac{H(X_{\varOmega })-2h(p_e)}{2\log (n-1)}. \end{aligned}$$

The above equation can be rewritten, in the paradigmatic case in which

$$\begin{aligned} (\forall m_i)\left( p(m_i)=\frac{1}{n}\right) , \end{aligned}$$

as

$$\begin{aligned} p_e\ge \frac{1}{2}, \end{aligned}$$

since \(H(X_{\varOmega })=\log n\). This means that, in this case, at least one message out of two will be wrongly decoded or, in other words, that in the kind of ambiguous codes we described, and under the condition of equiprobability among the elements of \(\varOmega \), the correct decoding of at least half of the sent messages will depend on some external, additional source of information.
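
A small numerical sketch of this bound (ours; it simply evaluates the inequality for equiprobable referents) illustrates the claim: the crude bound \(\log n/(2\log (n-1))\), obtained by neglecting \(h(p_e)\), is at least \(\frac{1}{2}\), while solving the full inequality self-consistently gives a slightly smaller value that approaches \(\frac{1}{2}\) as \(n\) grows.

```python
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def fano_bound(n, iterations=200):
    """Smallest p_e compatible with p_e >= (log2 n - 2 h(p_e)) / (2 log2(n-1)),
    found by fixed-point iteration (equiprobable referents assumed)."""
    p = 0.5
    for _ in range(iterations):
        p = max(0.0, (math.log2(n) - 2 * binary_entropy(p)) / (2 * math.log2(n - 1)))
    return p

for n in (8, 64, 1024, 2**20):
    crude = math.log2(n) / (2 * math.log2(n - 1))   # h(p_e) neglected
    print(f"n = {n:>8}: crude bound {crude:.3f}, self-consistent bound {fano_bound(n):.3f}")
```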

4 Discussion

In this article we have constructed a communicative argument based on fundamental concepts from computation theory and information theory in order to understand the emergence of ambiguity.

We have identified the source of ambiguity in a code with the concept of logical irreversibility, in such a way that a code is ambiguous when the coding process performs logically irreversible computations. Since logical irreversibility is not an essential property of computations and a logically reversible machine need not be much more complicated than the logically irreversible machine it simulates, we have inquired into how a coding machine interacts with a decoding machine in an optimal way in order to identify the source of ambiguity. We have quantified the ambiguity of a code in terms of the amount of uncertainty involved in reversing the coding process, and we have subsequently formulated the intuition that coder and decoder cooperate in order to minimize their efforts in terms of a symmetry equation that forces the coder to send only as many bits as the additional bits the decoder needs to perfectly reconstruct the coding process. Given the symmetric behaviour of Shannon information, it has been possible to quantify the amount of ambiguity that must emerge from the symmetry equation regardless of the presence of noise in the channel: at least half of the information is dissipated during the communicative process if both the coding and the decoding computations are cooperatively minimized. As noted explicitly in “Appendix” section, the presence of ambiguity associated with a computational process realized by a physical system seems as necessary as the generation of heat during a thermodynamical process.

The interest of the symmetry equation from which we derive a certain amount of ambiguity in natural languages is further corroborated in Corominas-Murtra et al. (2011). In this study it is shown that Zipf’s law emerges from two factors: a static symmetry equation that solves the tension between coder and decoder (namely, our symmetry Eq. 6) and the path-dependence of the code evolution through time, which is mathematically stated by imposing a variational principle between successive states of the code (namely, Kullback’s Minimum Discrimination of Information Principle). We thus conclude this study by emphasizing the importance of the symmetry equation for the understanding of how communicative efficiency considerations shape linguistic productions.