1. INTRODUCTION

Control is understood as influencing a controlled system to ensure its required behavior [19, 27]. For the analysis and design of control systems, it is necessary to have a model of a controlled system that describes its state depending on control actions. An optimal control problem is to find a control action that optimizes a given performance criterion under given constraints. This criterion is defined on the set of controlled system states and control actions. In other words, control relates to optimization.

For solving a corresponding optimization problem, a control system needs to acquire information (about the controlled system, the current situation, etc.), perform calculations, and implement control actions. Each of these stages requires time, particularly due to the existing constraints on the capacity of communication channels, computing resources, etc. Therefore, especially in real time, the control system may fail to find and implement the optimal control action, being forced to use almost optimal, rational solutions. Another approach is to formulate conditions on the existing constraints under which the complete control cycle is possible in real time. (As an example, we mention Data Rate Theorems; see the pioneering works [55, 56] and surveys in the monographs [26, 54], the paper [3], and Sec. 3.3 of the book [27].)

Within networked control, a branch of automatic control, this problem is called C\( {}^{3} \) (control + communication + computing). In studies of organizational systems and decision models (including management), a similar problem, initiated by the Nobel laureate Herbert Simon, is called bounded rationality.

The main purpose of this paper is to establish a qualitative correspondence and common grounds for these two large branches of control theory and apply the proposed approaches to several control problems for organizational and technical systems.

In Sec. 2, approaches to classical and bounded rationality are briefly surveyed. Section 3 presents the key result—a rationality-bounding condition—considering the time (resource) constraints on the decision (optimization) process. In Sec. 4, we pose and solve an institutional control problem in the framework of a game-theoretical model of an organizational system in which decisions are made (the corresponding optimization problems are solved) by the control system and the controlled system due to the activity and purposeful behavior of the people included in the latter. Section 5 is devoted to the analytical complexity and error of solving control problems. The rationality-bounding condition is concretized for the institutional control problem under time, resource, and other constraints. Also, several illustrative examples of solving particular problems are given therein. Section 6 considers heuristics and typical solutions. In the Conclusions (Sec. 7), we discuss the main results of the paper.

2. CLASSICAL AND BOUNDED RATIONALITY

The rational behavior of agents is traditionally modeled by their desire to increase the value of some function (utility, payoff, goal, etc.) defined on the set of alternatives (actions) that the agent can choose and situations (external conditions of the agent’s activity).

Consider the case of a single agent. (In one-agent models, the subscript denoting the agent’s number will be omitted.) The agent’s interests are reflected by a real-valued goal function \( f(\theta ,y) \) (or utility function) defined on the Cartesian product of the set \( A \) of admissible actions \( y \) and the set \( \Omega \) of admissible situations \( \theta \) of the agent’s activity: \( y\in A \), \( \theta \in \Omega \), \( f:\Omega \times A\rightarrow \mathfrak {R}^1 \). Then the rational choice set is the set of all actions maximizing the goal function:

$$ C^{0}\big (f(\cdot ,\cdot ),A,\theta \big )=\mathrm {Arg}\thinspace \max \limits _{y\in A} f(\theta ,y). $$
(2.1)

A solution belonging to the set (2.1) is said to be optimal under the situation \( \theta \).

Note that the optimization principle (2.1) of decision-making is essentially deterministic and disregards possible uncertainty (e.g., situational uncertainty): the agent is supposed to be completely aware of the situation at the decision instant. At the same time, models of utility theory historically arose from probabilistic models, in which the agent knows only the probability distribution on the set of possible situations at the decision instant. In this case, it is rational to maximize the expected utility function [15, 28]. The properties of utility functions have been considered in a vast literature. Nonlinearity reflects the agent’s attitude to risk (e.g., agents with a concave utility function are risk-averse) [58]. A faster decrease in the negative domain reflects that losses are less desirable for the agent than gains of the same magnitude [51], etc. Utility theories and empirical evidence from various fields were surveyed in [38].

The optimization principle (2.1) of decision-making corresponds to the so-called classical rationality or full rationality. H. Simon proposed models of bounded rationality, replacing the agent’s desire to maximize the utility function with the desire to achieve a certain level of utility, perhaps depending on the optimum; see his first paper [63], the book [65], and the subsequent papers [64, 66]. This behavior is due to the following possible limitations:

  1. the agent’s cognitive capabilities:

     1. to receive, process, and transmit information (including calculations) in the required time;

     2. to handle several goals or criteria;

  2. the available information about the situation and (or) the consequences of different decisions.

In other words, bounded rationality occurs when the agent lacks knowledge, time, ability, or desire to choose the optimal decision. Note that the first three “reasons” correspond to the presence of constraints (the deficit or lack of something), and these constraints on the decision process will be considered below. By contrast, the absence of desire to find the optimal decision (Simon’s original interpretation of bounded rationality as satisfaction with a certain aspiration level) has nothing in common with constraints.

Consider several models of bounded rationality, partially following [18, 22].

We introduce the following assumption on the goal function and admissible sets: let the function \( f(\cdot ,\cdot ) \) be continuous, and let the sets \( A \) and \( \Omega \) be convex and compact. Under these assumptions, the set \( C^{0}(f(\cdot ,\cdot ),A,\theta ) \) is nonempty.

We denote \( y^*(\theta )=\mathrm {arg}\thinspace \max \limits _{y\in A}f(\theta ,y) \). There is a whole family of goal functions with the same maxima set (2.1). According to utility theory, the goal function is defined up to a positive linear transformation: for any number \( a \) and any positive number \( b \), the functions \( u(y) \) and \( g(y)=a+bu(y) \) have the same maxima sets, \( C^{0}(u(\cdot ),A)=C^{0}(g(\cdot ),A) \). Therefore, for the sake of simplicity, let \( f(\theta ,y^{*}(\theta ))\geq 0 \) \( \forall \theta \in \Omega \).

Following [22], consider three types of bounded rationality (BR).

The first type of BR. Suppose that the agent seeks a certain level of individual utility U (the aspiration level). Then the rational choice set is given by

$$ C^{\mathrm {\thinspace I}}\big (f(\cdot ,\cdot ),A,\theta ,U\big )=\big \{y\in A\mid f(\theta ,y)\geq U\big \}. $$
(2.2)

In a more general case, the aspiration level depends on the set of admissible actions: U(A). Deep results on the relationship between the decision principles (2.1) and (2.2) were obtained in [1].

Digressing from the main subject, we make an important methodological remark. Rational or “irrational” behavior cannot be defined separately from the behavioral model. The agent’s behavioral model includes particular assumptions on the corresponding decision procedures and can encapsulate, to varying degrees, knowledge from economics, sociology, cognitive science, and other disciplines. For example, within the optimization model (2.1), the agent chooses an action maximizing the utility. The choice of any other “nonoptimal” action that satisfies, e.g., the model (2.2) will not be rational within the optimization model. (It can be considered “boundedly rational.”) Conversely, a rational choice within the principle (2.2) will be “boundedly rational” within the principle (2.1). In decision theory, choice theory, and game theory, probably hundreds of decision models have been proposed using the apparatus of utility functions, choice functions, preference relations, etc.; see surveys in the monographs and textbooks [1, 2, 11, 25]. Accordingly, rational or optimal behavior within one model will be “boundedly rational” (or even unacceptable) within other models. Moreover, according to Yu.B. Germeier, it is always possible to select a criterion (construct an appropriate model) in which a given decision will be “optimal” [7]. From a historical point of view, the term “bounded rationality” was introduced to reflect the difference between the decisions based on the principles (2.2) and (2.1).

The second type of BR. Suppose that the agent is ready to lose a fixed value \( \varepsilon \geq 0 \) from the global maximum (the principle of \( \varepsilon \)-optimality). In this case, the rational choice set is given by

$$ C^{\thinspace \mathrm {II}}\big (f(\cdot ,\cdot ),A,\theta ,\varepsilon \big )= \Big \{y\in A\mid f(\theta ,y)\geq f\big (\theta ,y^{*}(\theta )\big ) - \varepsilon \Big \}. $$
(2.3)

Note that this method of considering the “insensitivity” and agents’ thresholds for discriminating the consequences of choice is widespread in game-theoretic models. It allows stabilizing the solution with respect to the model parameters [13] by regularizing the optimality criteria. In addition, this type of rational behavior agrees with models of organizational systems under uncertainty (including the uncertainty about the agent’s goals [21]). In a more general statement, the insensitivity threshold may depend on the alternatives being compared, the entire choice set, etc.; see the monograph [31].

The third type of BR. Suppose that the agent is ready to lose no more than a fixed share \( \delta \in (0,1] \) of the maximum payoff. In this case, the rational choice set is given by

$$ C^{\thinspace \mathrm {III}}\big (f(\cdot ,\cdot ),A,\theta ,\delta \big )= \Big \{y\in A\mid f(\theta ,y)\geq (1-\delta )f\big (\theta ,y^{*}(\theta )\big )\Big \}. $$
(2.4)

The three types of bounded rationality cover most of the well-known models of rational and boundedly rational behavior.

Let us characterize the sets (2.2)–(2.4). (Whenever the dependence on the situation is negligible, or the situation is fixed, it will be omitted.) Under the above assumptions, for any \( U\le f(\theta ,y^{*}(\theta )) \), \( \varepsilon \geq 0 \), and \( \delta \in (0,1] \), we have the following properties [22]:

  1. \( C^{0}\subseteq C^{\thinspace \mathrm {I}} \), \( C^{0}\subseteq C^{\thinspace \mathrm {II}} \), and \( C^{0}\subseteq C^{\thinspace \mathrm {III}} \).

  2. For any \( U^{\prime }\le U \), \( \varepsilon ^{\prime }\geq \varepsilon \), and \( \delta ^{\prime }\geq \delta \):

    $$ \begin {aligned} C^{\thinspace \mathrm {I}} \big (f(\cdot ,\cdot ),A,\theta ,U\big )&{}\subseteq C^{\thinspace \mathrm {I}}\big (f(\cdot ,\cdot ),A,\theta ,U^{\prime }\big ), \\ C^{\thinspace \mathrm {II}}\big (f(\cdot ,\cdot ),A,\theta ,\varepsilon \big )&{}\subseteq C^{\thinspace \mathrm {II}}\big (f(\cdot ,\cdot ),A,\theta ,\varepsilon ^{\prime }\big ), \\ C^{\thinspace \mathrm {III}}\big (f(\cdot ,\cdot ),A,\theta ,\delta \big )&{}\subseteq C^{\thinspace \mathrm {III}}\big (f(\cdot ,\cdot ),A,\theta ,\delta ^{\prime }\big )\end {aligned} $$

    (monotonicity).

  3. \( C^{\thinspace \mathrm {I}}(f(\cdot ,\cdot ),A,\theta ,f(\theta ,y^{*}(\theta )))=C^{\thinspace \mathrm {II}} (f(\cdot ,\cdot ),A,\theta ,0)=C^{\thinspace \mathrm {III}}(f(\cdot ,\cdot ),A,\theta ,0) =C^{0}(f(\cdot ,\cdot ),A,\theta ) \).

  4. For any admissible value of any parameter ( \( U\le f(\theta ,y^{*}(\theta )) \), \( \varepsilon \geq 0 \), and \( \delta \in (0,1] \)), there exist values of the other parameters under which the sets (2.2)–(2.4) coincide.

The latter property expresses, in some sense, the equivalence of the three types of BR. However, a particular type of BR is dictated by the specifics of a given model. (For example, unlike the second and third types, the first one does not require the knowledge of the absolute maximum.)

At the same time, the invariance of the choice set with respect to positive linear transformations of the goal function holds only for some types of bounded rationality. For example, for the first type of BR, the set (2.2) defined for the function \( u(\cdot ) \) will remain the same if we change \( U \) to \( a+bU \) for the function \( g(y)=a+bu(y) \). For the second type of BR, it suffices to change \( \varepsilon \) to \( b\varepsilon \). However, such a general change cannot be found for the third type of BR.
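As a numerical illustration of this point, consider the following minimal Python sketch (the quadratic utility and the constants \( a \), \( b \), \( \varepsilon \), \( \delta \) are hypothetical): the type-II set (2.3) is preserved under \( g=a+bu \) once \( \varepsilon \) is rescaled to \( b\varepsilon \), whereas the type-III set (2.4) generally changes.

```python
import numpy as np

# Hypothetical concave utility on a grid of actions A = [0, 1];
# a and b define the positive linear transformation g = a + b*u.
A = np.linspace(0.0, 1.0, 1001)
u = 1.0 - (A - 0.3) ** 2            # maximum at y = 0.3
a, b = 5.0, 2.0
g = a + b * u

eps, delta = 0.05, 0.05

# Type II (2.3): rescaling eps -> b*eps preserves the choice set.
print(np.array_equal(u >= u.max() - eps, g >= g.max() - b * eps))  # True

# Type III (2.4): the same delta yields a different choice set for g.
print((u >= (1 - delta) * u.max()).sum(),
      (g >= (1 - delta) * g.max()).sum())  # different set sizes
```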

The principles (2.1)–(2.4) correspond to one-time decision-making by the agent. There are many models describing decision-making in dynamics. For example, let the time \( t \) be discrete (\( t=0,1,2,\dots \)), \( A=\mathfrak {R}^1 \), and the situation be stationary. (Therefore, the dependence on it in the dynamic models below will be omitted.) We denote by \( y_t \) the agent’s action in period \( t \). According to the hypothesis of indicator behavior [12, 24], in period \( t \) the agent makes a step \( \gamma _t\in [0,1] \) towards the current goal position \( y_{t-1}^{*} \):

$$ y_t=y_{t-1}+\gamma _t(y_{t-1}^*-y_{t-1}),\quad t=1,2,\dots $$
(2.5)

(If there is only one agent, and the situation is stationary, then the goal position remains the same.)

The satisficing stop rule introduced by H. Simon [63] suggests the following: stop searching when a decision satisfying the aspiration level is found. The agent with the first type of BR will change actions until reaching the aspiration level. Within the procedure (2.5), this principle takes the form

$$ y_t=\begin {cases} y_{t-1}+\gamma _t(y_{t-1}^*-y_{t-1}) & \text {if } f(y_{t-1})<U \\ y_{t-1}& \text {if } f(y_{t-1})\ge U. \end {cases} $$
(2.6)

By analogy with the expression (2.6), it is possible to write dynamic decision procedures of agents with the second and third types of BR (see also dynamical models in [58]).
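The procedures (2.5) and (2.6) are easy to simulate. A minimal Python sketch follows, in which the goal function, the step size \( \gamma \), and the aspiration level \( U \) are illustrative assumptions:

```python
# Indicator behavior (2.5) combined with the satisficing stop rule (2.6):
# the agent steps towards the goal position until the aspiration level U is met.
f = lambda y: 1.0 - (y - 0.7) ** 2   # illustrative goal function, maximum at y* = 0.7
y_star, U, gamma = 0.7, 0.95, 0.5    # goal position, aspiration level, step size

y, steps = 0.0, 0
while f(y) < U:                      # first type of BR: stop once f(y) >= U
    y += gamma * (y_star - y)        # step (2.5) towards the goal position
    steps += 1
print(f"stopped after {steps} steps at y = {y:.3f}, f(y) = {f(y):.3f}")
```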

Some dynamic models of BR consider constraints on chosen actions, current values of the goal function, etc., depending on the history (the preceding values of the goal function, actions, constraints, aspiration level). In turn, the aspiration level can depend on the history [68].

The concept of an adaptive toolbox was proposed and developed in [35, 46]. According to this concept, each person has a modular set of heuristics in mind: faced with a new situation, the person does not try to optimize the decision but applies the most appropriate heuristic, choosing an acceptable (satisficing) decision; see the satisficing stop rule discussed above.

This concept is close to D. Kahneman’s idea of two cognitive systems [50]: intuition (system 1) and reasoning (system 2). System 1 includes associations and heuristics, i.e., fast (“reflex”) processes with minimum control of consciousness. This system allows making a decision quickly, albeit not an optimal one in the current situation. Some examples of heuristics are “choose the best known,” “accept the first offer,” “follow the majority opinion,” “divide equally,” and others [46]. System 2 includes logic, computing, and optimization, i.e., slow processes requiring significant cognitive efforts from the agent but yielding a more efficient choice [50, 51].

The role of emotions in the interaction of two systems during decision-making was analyzed in detail in the monograph [41]. The psychological aspects of rationality (including emotions) were considered in [67].

The following are particular cases of bounded rationality:

  1. heuristics and typical solutions (see [5, 6] and Sect. 6 below);

  2. strategic reflexion, where agents use finite reflexion ranks provided that those of their opponents are also bounded [9, 20, 23, 37];

  3. quantal response equilibrium (QRE) and other sociophysical models where the agent’s choice is stochastic but the actions yielding a greater payoff are chosen more often (under given expected payoffs, they implement, e.g., the so-called Gibbs equilibrium).

Examples of bounded rationality in game theory and economic mechanisms can be found in [33, 47, 61]. Also, models of BR are widely used in behavioral economics [42].

Many learning models can be interpreted within the framework of bounded rationality; consider, e.g., the following learning process. At each discrete time instant (period), a certain situation is realized, belonging to an a priori known finite set. If some situation \( \theta \) is realized again, the agent makes the optimal decision \( y^*(\theta ) \) in this situation. If some situation is realized for the first time, the agent makes an arbitrary decision from a given set of actions [1, 5]. This decision principle conditionally matches R. Aumann’s model: in [34], he distinguished between optimal decision-making in the current situation (act-rationality) and decision-making by predetermined rules (rule-rationality).

H. Simon emphasized the rationality of choice and the rationality of procedures: “we must give an account not only of substantive rationality—the extent to which appropriate courses of action are chosen—but also procedural rationality—the effectiveness, in light of human cognitive powers and limitations, of the procedures used to choose actions.” [66].

In addition to bounded rationality (in Simon’s narrow sense), other close types of rationality were proposed and considered in the literature:

minimal rationality: when making a decision, the agent analyzes several, albeit not necessarily all, admissible alternatives [39, p. 166];

ecological rationality: the agent’s decisions are largely determined by the experience of interaction with the environment [45];

computational rationality: the emphasis is on the computational complexity and intensiveness of decision processes [32, 44, 48, 52, 53].

Also, researchers distinguish behavioral and epistemic rationality [40, 62]. (The latter type is focused on forming a logically adequate system of agent’s beliefs.)

Recently, the problem of bounded rationality has attracted increasing interest among experts in artificial intelligence; see the survey in [10].

We refer to the thesis [30] for a fairly complete description of various concepts of rationality, including the problem statement on considering constraints on memory capacity and the number of computational operations per unit time.

Nowadays, there are dozens of models of bounded rationality with various mathematical apparatuses. For example, the paper [43] considered the following binary choice model: an agent receives dynamic information (a Wiener process) about the payoff in a particular outcome and selects an appropriate instant for decision-making (the optimal stopping time problem).

Note the especially large variety of ways to consider possible uncertainty (as a rule, about the situation). For example, the paper [48] described a transformation of probability distributions on the set of actions (a mixed choice strategy) that increases the uncertainty by the Pigou–Dalton principle. Reducing the uncertainty requires costs, and the problem is that of choosing a distribution with the maximum expected utility under bounded costs, or minimizing the costs of achieving a given value of the expected utility. In this case, the optimal mixed strategy is the Gibbs distribution [48], and \( \varepsilon \)-optimality under info-communication constraints corresponds to the optimization of \( \varepsilon \)-entropy [36].
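For instance, the Gibbs distribution mentioned above is simply a softmax over expected payoffs. A minimal Python sketch (the payoff vector and the parameter \( \beta \), which tunes the uncertainty of the mixed strategy, are assumptions):

```python
import numpy as np

def gibbs_choice(payoffs, beta):
    """Gibbs (softmax) mixed strategy: p(y) is proportional to exp(beta * f(y)).
    Large beta approaches the optimal deterministic choice; small beta gives
    a nearly uniform, maximally uncertain choice."""
    w = np.exp(beta * (payoffs - payoffs.max()))  # shift for numerical stability
    return w / w.sum()

payoffs = np.array([1.0, 0.8, 0.3])      # assumed expected payoffs of three actions
print(gibbs_choice(payoffs, beta=0.1))   # close to uniform
print(gibbs_choice(payoffs, beta=50.0))  # concentrated on the best action
```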

Now we discuss the interconnection between the C\( {}^3 \) problems of control theory and the concept of bounded rationality in decision theory.

3. RATIONALITY-BOUNDING CONDITION

Consider a control system in Fig. 1. Note that in the case of automatic control, the controlled and control systems are both technical objects. In organizational, social, economic, and other such systems, the control system (and, often, the controlled one) is an agent making decisions independently; see the Introduction and Sec. 1.1 in the book [27]. We introduce the following notation:

  1. \( t_1 \) is the time required for encoding, sampling, transmitting, decoding, and recovering information about the controlled system and, possibly, about the situation;

  2. \( t_2 \) is the time required for finding control actions (information processing, including computing and choice);

  3. \( t_3 \) is the time required for encoding, sampling, transmitting, decoding, and recovering information about the control actions;

  4. \( T \) is the directive time for implementing the control action or the characteristic time of the controlled system’s dynamics.

These times must satisfy a fairly obvious and intuitive condition further called the rationality-bounding condition (Fig. 1):

$$ t_1+t_2+t_3\le T. $$
(3.1)
Fig. 1. Control system structure and delays in the feedback loop.

In the terminology of Sec. 2, adding the constraint (3.1) changes the Principal’s decision model. Therefore, decision-making under this constraint can be considered “boundedly rational” compared to the optimization behavior without such constraints. In other words, the rationality-bounding condition (3.1) supplements the optimization principle of decision-making with time (resource) constraints and reflects the following:

  1. the constraints considered in known models of bounded rationality (cognitive, computational, informational, communication, etc.; see Sec. 2) and, in particular, the method of heuristics (typical solutions) presented in Sec. 6;

  2. Ashby’s law of requisite variety [29] stating that the control system’s variety must exceed the controlled system’s variety; in other words, the control system must be adequate to the controlled system and the conditions (situation) in which the latter operates (by the response speed, computing power, and other capabilities);

  3. the joint solution of the C \( {}^{3} \) problems (control, communication, computing); see surveys in [1, 19, 26, 27];

  4. the computational complexity of finding equilibria in algorithmic and computational game theory [32, 53] and organizational control mechanisms [14, 17, 57] (and, even more broadly, in distributed optimization; see the survey and discussion in [19, 27]);

  5. the principle of the record (best current solution) in real-time optimization, stating that a solution implemented at the directive time has the maximum efficiency among all the solutions analyzed by this time [60]. (The search process based on the satisficing stop rule (2.6) terminates when an acceptable value of the goal function is achieved. According to the principle of the best current solution, the stopping instant is a priori fixed; see the sketch after this list.)
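A minimal Python sketch of the record principle (the goal function, the random sampling of actions, and the deadline are illustrative assumptions): the search always keeps the best solution found so far and returns it when the directive time expires.

```python
import random
import time

def record_search(f, sample, deadline_s):
    """Principle of the record: keep the best current solution and
    return it when the directive time expires."""
    t_end = time.monotonic() + deadline_s
    best_x = sample()
    best_f = f(best_x)
    while time.monotonic() < t_end:
        x = sample()
        if f(x) > best_f:                # a new record
            best_x, best_f = x, f(x)
    return best_x, best_f

# Illustrative use: maximize an assumed goal function on [0, 1] within 10 ms.
f = lambda y: 1.0 - (y - 0.42) ** 2
x, fx = record_search(f, random.random, deadline_s=0.01)
print(f"record at the deadline: y = {x:.4f}, f(y) = {fx:.4f}")
```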

Below, condition (3.1) is illustrated by an example of the institutional control of an organizational and technical system. In the next section, we formulate and solve the institutional control problem; in Sec. 5, we concretize the rationality-bounding condition for this problem.

4. ACTIVITY NORMS: CONTROL PROBLEM

Consider an organizational and technical system (OTS) composed of \( n \) agents. They choose actions \( y_i\in A_i \) from compact sets \( A_i \) and have continuous goal functions \( f_i(\theta ,y) \), where \( \theta \in \Omega \) is the situation, \( y=(y_1,y_2,\dots ,y_n)\in A^{\prime }=\prod \limits _{i\in N}A_i \), \( i\in N \), and \( N=\{1,2,\dots ,n\} \) denotes the set of agents.

Recall that institutional control is the control of constraints and activity norms [21].

An activity norm is a mapping \( \aleph :\Omega \rightarrow A^{\prime } \) of the set of admissible situations into the set of admissible vectors of agents’ actions [18]. For the sake of simplicity, this mapping will be assumed to be bijective. In a practical interpretation, the \( i \)th component of the vector function \( \aleph (\cdot ) \) determines the action expected from agent \( i \) by the other agents and the Principal. The norm can be interpreted as a heuristic (Sec. 2) or a typical solution (Sec. 6).

Let the Principal’s preferences be defined on the set of situations, activity norms, and agents’ actions: \( \Phi (\theta ,\aleph (\cdot ),y) \). Supposing that the agents follow the established norms, we denote by \( K(\aleph (\cdot ))=F_\theta (\Phi (\theta ,\aleph (\cdot ),\aleph (\theta ))) \) the efficiency of the institutional control \( \aleph (\cdot ) \), where \( F_\theta (\cdot ) \) is an uncertainty elimination operator. Depending on the Principal’s awareness, this operator can be the guaranteed result on the set \( \Omega \), the mathematical expectation with a known probability distribution \( p(\theta ) \) on the set \( \Omega \), etc.

An institutional control problem under constraints \( M_\aleph \) on the activity norms has the following statement: choose an admissible norm \( \aleph ^*(\cdot )\in M_\aleph \) with the maximum efficiency [18],

$$ \aleph ^*(\cdot )=\mathrm {arg}\thinspace \max \limits _{\aleph (\cdot )\in M_\aleph } K\big (\aleph (\cdot )\big ), $$
(4.1)

provided that the agents follow the established activity norms.

The last condition needs clarification. Since each agent is active and independent, the agent’s choice will coincide with the norm only when this agent benefits from it. Let us explain the concept of benefit.

By analogy with the models of bounded rationality (2.2)–(2.4), we define the parametric Nash equilibrium (4.2) and rational behavior (4.3)–(4.5) for each of the three types of bounded rationality:

$$ E_N^0(\theta )=\big \{x\in A^{\prime }\mid \forall i\in N, \forall y_i\in A_i\; f_i(\theta ,x)\geq f_i(\theta ,x_{-i},y_i)\big \}, $$
(4.2)
$$ E_N^1(\theta ,\bar {U})=\big \{x\in A^{\prime }\mid \forall i\in N\; f_i(\theta ,x)\geq \bar {U}_i\big \}, $$
(4.3)
$$ E_N^2(\theta ,\varepsilon )=\big \{x\in A^{\prime }\mid \forall i\in N, \forall y_i\in A_i\; f_i(\theta ,x)\geq f_i(\theta ,x_{-i},y_i)-\varepsilon _i\big \}, $$
(4.4)
$$ E_N^3(\theta ,\delta )=\big \{x\in A^{\prime }\mid \forall i\in N, \forall y_i\in A_i\; f_i(\theta ,x)\geq (1-\delta _i)\; f_i(\theta ,x_{-i},y_i)\big \}. $$
(4.5)

An activity norm \( \aleph (\cdot ) \) is said to be compatible with the \( j \)th type of bounded rationality [18], \( j=0,\dots ,3 \), if

$$ \forall \theta \in \Omega \; E_N^j(\theta )\cap \aleph (\theta )\neq \emptyset . $$
(4.6)

Condition (4.6) can be interpreted as follows: an activity norm implements an equilibrium if, for any situation, the normal choice does not contradict the rational behavior of the agents, i.e., gives them a corresponding payoff and (or) makes a unilateral deviation from the norm unbeneficial (in the case of Nash equilibrium).

Since \( \aleph (\cdot ) \) is a bijective mapping, the Principal’s choice of a compatible activity norm can be treated as a narrowing of the set of equilibria (e.g., a hint about the existence of a focal point, etc.). Hence, control of activity norms can be viewed as the problem of implementing a group choice correspondence, in which \( \theta \in \Omega \) is the vector of agents’ characteristics.

We denote by \( y_{-i}=(y_1,\dots ,y_{i-1},y_{i+1},\dots ,y_n)\in A_{-i}=\prod \limits _{j\ne i}A_j \) the opponents’ action profile for agent \( i \), \( i\in N \).

Conditions (4.2) and (4.6) can be combined as follows: an activity norm \( \aleph (\cdot ) \) is compatible if and only if

$$ \forall \theta \in \Omega , \forall i\in N, \forall y_i\in A_i\; f_i(\theta ,\aleph (\theta ))\geq f_i(\theta ,\aleph _{-i}(\theta ),y_i). $$
(4.7)

According to condition (4.7), an activity norm is compatible with the agents’ interests if each agent benefits from following this norm provided that the other agents also follow it. Conditions (4.3)–(4.5) can be written similarly to condition (4.7).

An activity norm \( \aleph (\cdot ) \) is said to be strongly compatible if the normal choice is the dominant strategy of each agent:

$$ \forall \theta \in \Omega , \forall i\in N, \forall y_{-i}\in A_{-i}, \forall y_i\in A_i\; f_i\big (\theta ,y_{-i},\aleph _i(\theta )\big )\geq f_i(\theta ,y_{-i},y_i). $$
(4.8)

Conditions (4.3)–(4.5) can be written similarly to condition (4.8).
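On finite grids, conditions (4.7) and (4.8) can be verified by direct enumeration. A brute-force Python sketch for two agents follows; the goal functions (with an additive dependence on the opponent’s action, so that the norm \( \aleph (\theta )=(\theta ,\theta ) \) is a dominant strategy) and the grids are illustrative assumptions.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 21)    # A_i restricted to a uniform grid
thetas = np.linspace(0.0, 1.0, 11)  # sampled situations
f = [lambda th, y: -(y[0] - th) ** 2 + 0.5 * y[1],  # assumed goal functions:
     lambda th, y: -(y[1] - th) ** 2 + 0.5 * y[0]]  # opponent enters additively
norm = lambda th: (th, th)          # hypothetical activity norm

def strongly_compatible(theta):
    """Check (4.8): the normal action of each agent weakly dominates every
    grid deviation, whatever the opponent's action on the grid."""
    x = norm(theta)
    for i in (0, 1):
        for y_opp in grid:          # opponent's action y_{-i}
            for y_dev in grid:      # candidate deviation y_i of agent i
                y_n = (x[0], y_opp) if i == 0 else (y_opp, x[1])
                y_d = (y_dev, y_opp) if i == 0 else (y_opp, y_dev)
                if f[i](theta, y_n) < f[i](theta, y_d) - 1e-12:
                    return False
    return True

print(all(strongly_compatible(th) for th in thetas))  # True for this norm
```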

What awareness should the agents have for a compatible activity norm to exist? Clearly, the game conditions—the set of agents, their goal functions and admissible sets, the activity norm, and the realized situation (action profile)—should be common knowledge [23]. (Recall that in game theory, common knowledge is a fact such that (a) all players know it; (b) all players know (a); (c) all players know (b), and so on. This chain is generally infinite.)

Indeed, to calculate a parametric Nash equilibrium or a dominant strategy equilibrium (DSE) under the existing activity norms, each agent must be sure that the other agents will calculate the same equilibrium. For this purpose, each agent must put themselves in the place of the other agents, simulating their behavior, etc. One way to create common knowledge is to inform all agents brought together. This is why modern firms pay so much attention to developing corporate culture, corporate standards of conduct, etc., through informal communication between employees, loyalty to the company, and so on, thereby creating the impression and beliefs that the employees belong to a common venture and share common values. All these components are necessary for the existence of common knowledge.

Thus, we will understand the institutional control [18] of activity norms as problem (4.1), (4.6) of finding a norm with the maximum efficiency on the set of admissible and compatible ones.

We denote by \( S_\aleph \) the set of activity norms (all possible mappings \( \aleph :\Omega \rightarrow A^{\prime } \)) satisfying condition (4.6). Then the problem of compatible institutional control can be written as follows:

$$ K\big (\aleph (\cdot )\big )\rightarrow \max \limits _{\aleph (\cdot )\in M_\aleph \cap S_\aleph }. $$
(4.9)

That is, the control problem of activity norms is solved in the following stages:

  1. Find the set \( S_\aleph \) of compatible norms.

  2. Find the set \( S_\aleph \cap M_\aleph \) of norms that are simultaneously compatible and admissible.

  3. Choose from this set a norm with the maximum efficiency from the Principal’s viewpoint.

The first stage in solving problem (4.9) is an incentive-compatible control problem [21]. This problem has a high computational complexity because the desired variable is the mapping \( \aleph : \Omega \rightarrow A^{\prime } \). We will examine it in greater detail.

Let institutional control be used jointly with motivational control [18, 21]. The goal function of agent \( i \) has the form

$$ G_i(\theta ,y,\sigma _i)= f_i(\theta ,y)+\sigma _i\big (\theta ,\aleph (\cdot ),y\big ),\quad y\in A^{\prime },\quad i\in N, $$
(4.10)

where \( \sigma _i:\Omega \times M_\aleph \times A^{\prime }\rightarrow \mathfrak {R}_{+}^1 \) is the incentive function (scheme) of agent \( i \). Compatibility conditions for activity norms were established in [18]. A strong compatibility condition is given below.

Proposition 1.

(a) Under the Principal’s motivational control

    $$ \sigma _i(\theta ,\aleph (\cdot ),y)= \begin {cases} s_i\big (\theta ,\aleph (\cdot ),y_{-i}\big ), & y_i =\aleph _i(\theta ) \\ 0, &y_i\ne \aleph _i(\theta ), \end {cases} \quad i\in N, $$
    (4.11)

    where

    $$ s_i\big (\theta ,\aleph (\cdot ),y_{-i}\big ) = \max \limits _{y_i\in A_i} f_i(\theta ,y_{-i},y_i)- f_i\big (\theta ,y_{-i},\aleph _i(\theta )\big )+ \mu _i,\quad i\in N, $$
    (4.12)

    and \( \mu _i>0 \), \( i\in N \), is an arbitrarily small strictly positive constant, the activity norm \( \aleph (\cdot ) \) is strongly compatible. Moreover, the normal actions form the unique DSE in the agents’ game.

(b) If \( \mu _i=0 \), there exists no other nonnegative motivational control implementing \( \aleph (\cdot ) \) as a DSE in the agents’ game with strictly smaller Principal’s control costs.

Proof of Proposition 1. We fix an arbitrary situation \( \theta \in \Omega \), an arbitrary agent \( i\in N \), an arbitrary action \( y_i\in A_i \) such that \( y_i\neq \aleph _i(\theta ) \), and an arbitrary opponents’ action profile \( y_{-i}\in A_{-i} \) for this agent. Substituting the incentive scheme (4.12) into condition (4.8) for the goal function (4.10), we obtain

$$ f_i\big (\theta ,y_{-i},\aleph _i(\theta )\big )+\max \limits _{z_i\in A_i} f_i(\theta ,y_{-i},z_i)- f_i\big (\theta ,y_{-i},\aleph _i(\theta )\big )+ \mu _i\geq f_i(\theta ,y_{-i},y_i)+0. $$

Hence, \( \max \limits _{z_i\in A_i}f_i(\theta ,y_{-i},z_i)\geq f_i(\theta ,y_{-i},y_i)-\mu _i \), which always holds. Item (a) is proved.

For proving item (b), assume on the contrary that there exists an incentive scheme \( \hat {\sigma }(\theta ,\aleph (\cdot ),y) \) implementing \( \aleph (\cdot ) \) as a DSE in the agents’ game with strictly smaller Principal’s control costs. Due to condition (4.8), \( \forall \theta \in \Omega \), \( \forall i\in N \), \( \forall y_{-i}\in A_{-i} \), \( \forall y_i\in A_i \)

$$ f_i\big (\theta ,y_{-i},\aleph _i(\theta )\big )+\hat {\sigma }_i\big (\theta ,\aleph _i(\theta ),y_{-i}\big )\geq f_i(\theta ,y_{-i},y_i)+\hat {\sigma }_i(\theta ,y_i, y_{-i}). $$
(4.13)

We fix an arbitrary situation \( \theta \in \Omega \), an arbitrary agent \( i\in N \), and an arbitrary opponents’ action profile \( y_{-i}\in A_{-i} \) for this agent. Inequality (4.13) holds for any action of agent \( i \), and the incentive scheme in its right-hand side is nonnegative. Therefore,

$$ \hat {\sigma }_i\big (\theta ,\aleph _i(\theta ),y_{-i}\big )\geq \max \limits _{y_i\in A_i} f_i(\theta ,y_{-i},y_i)- f_i\big (\theta ,y_{-i},\aleph _i(\theta )\big ). $$

Due to (4.12), in the case of \( \mu _i=0 \), we have \( \hat {\sigma }_i(\theta ,\aleph _i(\theta ),y_{-i})\geq s_i(\theta ,\aleph (\cdot ),y_{-i}) \), which contradicts the initial assumption. The proof of Proposition 1 is complete. \( \quad \blacksquare \)

According to item (b) of Proposition 1, the expression (4.12) characterizes the minimum Principal’s control costs to motivate agent \( i \) to follow the activity norm \( \aleph (\cdot ) \). If all agents choose the normal actions, summing the expressions (4.12) over all agents yields

$$ c\big (\theta ,\aleph (\cdot )\big )=\sum _{i\in N}\max \limits _{y_i\in A_i} f_i\big (\theta ,\aleph _{-i}(\theta ),y_i\big )-\sum _{i\in N}f_i\big (\theta ,\aleph (\theta )\big ). $$
(4.14)

This value is the minimum Principal’s costs for strongly compatible (joint institutional and motivational) control. Note that these costs are the same under compatibility and strong compatibility: both suppose that all agents follow the norms. Let the Principal’s goal function \( \Phi (\theta ,\aleph (\cdot ),y) \) be the difference between the income \( D(\theta ,y) \) and the control costs \( c(\theta ,\aleph (\cdot )) \). Due to the compatibility of control, we obtain

$$ \Phi \big (\theta ,\aleph (\cdot )\big )=D\big (\theta ,\aleph (\theta )\big )-c\big (\theta ,\aleph (\cdot )\big ). $$
(4.15)

In this case, the efficiency of the institutional control \( \aleph (\cdot ) \) can be defined as

$$ K\big (\aleph (\cdot )\big )=F_\theta \Big (D\big (\theta ,\aleph (\theta )\big )-c\big (\theta ,\aleph (\cdot )\big )\Big ), $$

where \( F_\theta (\cdot ) \) is an uncertainty elimination operator. (The uncertainty should be eliminated if the Principal does not know the realized situation at the decision instant.)

The strongly compatible institutional control problem

$$ F_\theta (D(\theta ,\aleph (\theta ))-c(\theta ,\aleph (\cdot )))\rightarrow \max \limits _{\aleph (\cdot )\in M_\aleph } $$
(4.16)

differs from problem (4.9): maximization is performed over the set of all admissible activity norms, and the compatibility condition is embedded in the maximand. Since the activity norm is assumed to be a bijective mapping, it suffices to use motivational control with a flexible plan [18]. (A flexible plan is a plan that depends on the situation; see formula (4.17) below.) In other words, there exists a motivational control of no less efficiency for any institutional control in this model. Meanwhile, calculating motivational control is much simpler than institutional control: in the former case, maximization is performed over the set of agents’ actions; in the latter case, over the set of mappings.

If the Principal knows the realized situation at the decision instant, the strongly compatible institutional control problem takes the following form: find an optimal vector of compatible plans

$$ x(\theta )=\mathrm {arg}\thinspace \max \limits _{x\in A^{\prime }} \big [D(\theta ,x)-c(\theta ,x)\big ] $$
(4.17)

considering the Principal’s control costs. In this case, the Principal’s payoff is

$$ \Phi _0(\theta )=D\big (\theta ,x(\theta )\big )-c\big (\theta ,x(\theta )\big ). $$
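To summarize this section computationally, below is a brute-force Python sketch of problem (4.17) on uniform grids, with the minimum control costs computed by (4.14); the agents’ goal functions and the Principal’s income are illustrative assumptions.

```python
import itertools
import numpy as np

grid = np.linspace(0.0, 1.0, 21)                    # A_i on a uniform grid
f = [lambda th, y: -(y[0] - th) ** 2 + 0.5 * y[1],  # assumed agents' goal functions
     lambda th, y: -(y[1] - th) ** 2 + 0.5 * y[0]]
D = lambda th, y: 2.0 * (y[0] + y[1]) * th          # assumed Principal's income

def costs(theta, x):
    """Minimum control costs (4.14) of implementing the plan x as a DSE."""
    c = 0.0
    for i in (0, 1):
        dev = [(y_i, x[1]) if i == 0 else (x[0], y_i) for y_i in grid]
        c += max(f[i](theta, y) for y in dev) - f[i](theta, x)
    return c

def optimal_plan(theta):
    """Flexible plan (4.17): maximize D - c over the action grid."""
    return max(itertools.product(grid, grid),
               key=lambda x: D(theta, x) - costs(theta, x))

theta = 0.4
x = optimal_plan(theta)
print(f"plan = ({x[0]:.2f}, {x[1]:.2f}), "
      f"payoff = {D(theta, x) - costs(theta, x):.3f}")
```

For the assumed functions, the sketch returns the plan \( x(0.4)=(0.8,0.8) \) with the Principal’s payoff \( 0.96 \).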

5. ANALYTICAL COMPLEXITY AND ERROR OF SOLUTIONS

Consider the institutional control problem of the previous section under the assumptions that the sets \( \{A_i\}_{i\in N} \) and \( \Omega \) are intervals on \( \mathfrak {R}_{+}^1 \) and the agent’s real-valued goal function \( f(\theta ,y) \) is Lipschitz with a constant \( l \). (For the sake of definiteness, the \( l_\infty \)-norm will be used below.) Note that a function satisfying the Lipschitz condition is uniformly continuous and hence continuous on its whole domain. In turn, this property implies that the maximum and minimum functions are also continuous and Lipschitz (e.g., see Theorem 1.1.4 in [16]).

Recall that an oracle is a source of information about the values of a function (zero order), its gradient (first order), etc. [16]. We denote by \( A|_h \) the set of points from \( A \) on a uniform grid with a step \( h\ll 1 \). Let the function \( f(\cdot ,\cdot ) \) be defined in tabular form using uniform grids with steps \( H \) and \( h \) in the first and second variables, respectively (the zero-order oracle). This definition corresponds to a uniform search. In this case, the analytical complexity \( W_y \) of calculating the rational choice set (2.1) (the agent’s decision model) \( \mathrm {Arg}\thinspace \max \limits _{y\in A}f(\theta ,y) \) has order \( \frac {|A|}{h} \), i.e., \( W_y\sim \Theta (\frac {|A|}{h}) \), where \( |A| \) is the length of the interval \( A \). This model yields the maximum value of the agent’s goal function with the error (accuracy) \( \Delta _f\approx \frac {l\thinspace h}{2} \); see the general results in [16] and the analytical complexity of solving control problems in organizational and technical systems in [17]. We can use the methods described in [6] for an incompletely known goal function or interval optimization methods [7, 49] for a goal function with uncertain coefficients. Note that choosing \( \mu _i=lh/2 \) in the expression (4.12) compensates for the unknown values of the agent’s goal function outside the grid nodes.

An error in calculating the optimum corresponds to the second and third types of bounded rationality. Indeed, for the grid step \( h=\varepsilon /l \) or \( h=\delta f(\theta ,y^*(\theta ))/l \), the agent behaves in accordance with the expressions (2.3) or (2.4).
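A short numerical check of this correspondence (Python; the 1-Lipschitz goal function is an assumption): a zero-order grid search with step \( h=\varepsilon /l \) returns an action satisfying (2.3).

```python
import numpy as np

l, eps = 1.0, 0.01
f = lambda y: 1.0 - abs(y - 1 / 3)     # assumed goal function, Lipschitz constant l = 1
h = eps / l                            # grid step matched to the admissible loss

grid = np.arange(0.0, 1.0 + h / 2, h)  # uniform grid on A = [0, 1]
y_grid = grid[np.argmax(f(grid))]      # zero-order oracle: values at grid nodes only

# The true maximum is f(1/3) = 1; the grid solution is eps-optimal as in (2.3).
print(f(y_grid) >= 1.0 - eps)          # True
```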

We emphasize that:

  1. Under sequential optimization (e.g., summation of maximums or minimums), the corresponding complexities and errors are added to each other.

  2. Under recursive calculation of maximums/minimums of a certain function, the corresponding complexities are multiplied by each other.

The complexity \( W \) of calculating the agent’s rational choice \( y^*(\theta ) \) depending on the situation has the order \( \frac {|A|}{h}\frac {|\Omega |}{H} \), i.e., \( W\sim \Theta (\frac {|A|}{h}\frac {|\Omega |}{H}) \).

Let the function \( D(\cdot ,\cdot ) \) be Lipschitz with a constant \( L \) and defined in tabular form on a uniform grid with steps \( H \) and \( h \) in the first and second variables, respectively. Assume that the Principal knows the realized situation at the decision instant. Due to (4.11) and (4.15), calculating the Principal’s goal function

$$ \Phi _0(\theta )=\max \limits _{x\in A^{\prime }|_h}\left [D(\theta ,x)+\sum _{i\in N} f_i(\theta ,x)-\sum _{i\in N}\max \limits _{y_i\in A_{i}|_h} f_i(\theta ,x_{-i},y_i)-\sum _{i\in N}\mu _i\right ] $$
(5.1)

for all situations has the analytical complexity

$$ W_x\sim \Theta \left (\frac {|\Omega |}{H} \left (2n+\sum _{i=1}^n\frac {|A_i|}{h}\right ) \prod _{j=1}^n\frac {|A_j|}{h}\right ) $$
(5.2)

and the error

$$ \Delta \approx \frac {LH\sqrt {n}+lhn\left [\sqrt {n}+\sqrt {n-1}\thinspace \right ]}{2}. $$
(5.3)

The set of admissible actions may generally have different grid steps for different agents, and their Lipschitz constants may also differ.

Assume that the variation \( \max \limits _{\theta \in \Omega }\Phi _0(\theta )-\min \limits _{\theta \in \Omega } \Phi _0(\theta ) \) of the values of the Principal’s goal function depending on the situation has an a priori accurate estimate of the order \( \Delta \). Then a rational approach is to replace the flexible (situation-dependent) plan (4.17) with the following one (not depending on the situation):

$$ x^*=\mathrm {arg}\thinspace \max \limits _{x\in A^{\prime }|_h} \min \limits _{\theta \in \Omega |_H} \left [D(\theta ,x)+\sum _{i\in N}f_i(\theta ,x)- \sum _{i\in N}\max \limits _{y_i\in A_i|_h} f_i(\theta ,x_{-i},y_i)-\sum _{i\in N}\mu _i\right ]. $$
(5.4)

(The Principal has to use the same strategy if the realized situation is unknown at the decision instant (plan assignment).)

Using the expressions (5.2) and (5.3) for the complexity and error of Principal’s decision-making, we can formulate different problems with constraints on the Principal’s cognitive, communication, computational, and other capabilities; see the general rationality-bounding condition (3.1).

Consider the sequence of moves in the system (stages 1–6) and the awareness of its participants (also, see Fig. 1):

  1. The Principal receives information about the agents’ goal functions (their values at \( n\frac {|\Omega |}{H}\prod \limits _{j=1}^n\frac {|A_{j}|}{h} \) points).

  2. The Principal solves problem (4.11), (5.1).

  3. The Principal and agents receive information about the realized situation \( \theta \in \Omega \).

  4. The Principal informs the agents of the incentive scheme (4.11) with \( \aleph _i(\theta )=x_i(\theta ) \), \( i\in N \). (It suffices to transmit only \( 2n \) values, i.e., for each agent, the plan and the reward for plan fulfillment. The incentive scheme (4.11) is therefore very “economical” in terms of information: it requires transmitting two values per agent instead of a function of the situation \( \theta \) and the vector \( y \).)

  5. The agents simultaneously and independently choose their actions (the plans, due to Proposition 1) and receive their rewards.

  6. The Principal receives information about the actions of all agents (\( n \) values).

Now we concretize the general condition (3.1) for this institutional control problem.

The Principal must receive or transmit \( 1+3n+n\frac {|\Omega |}{H}\prod \limits _{j=1}^{n}\frac {|A_{j}|}{h} \) values through an available communication channel; see stages 1, 3, 4, and 6. Let the information be transmitted through this channel without coding, delays, or distortions. We denote by \( R \) the channel capacity (in bits per unit time). Hence

$$ t_1+t_3=\frac {1}{R}\log _2\left (1+3n+n\frac {|\Omega |}{H}\prod _{j=1}^{n}\frac {|A_{j}|}{h}\right ). $$

Generally speaking, an incentive scheme is defined by a set of values on the entire grid of admissible actions.

We fix a time interval of length \( T \). Let \( \tau \) be the time of one oracle call (characterizing the rate of computing). With the analytical complexity \( W \) of this problem, the total “computation time” is \( t_2=W\tau \). Thus, we have established the following fact.

Proposition 2.

For the institutional control problem, the rationality-bounding condition (3.1) takes the form

$$ \frac {1}{R}\log _2\left (1+3n+n\frac {|\Omega |}{H}\prod _{j=1}^{n} \frac {|A_{j}|}{h}\right )+ \left (\frac {|\Omega |}{H}\left (2n+\sum _{i=1}^{n}\frac {|A_{i}|}{h} \right )\prod _{j=1}^{n}\frac {|A_{j}|}{h}\right )\tau \le T. $$
(5.5)

Condition (5.5) rests on the assumption that the values of the goal function are known at the grid nodes only. Additional information about the goal function gradients (the first-order oracles [16] if available) can be used for reformulating condition (5.5).

The first and second terms in (5.5) can be conventionally called the communication and computational constraints, respectively. In the case \( h=H=1/x \), the communication constraint has the order \( n\ln x \) and the computational constraint has the order \( nx^n \). In other words, as the number of agents grows or the grid step decreases, the computational costs grow much faster than the communication ones.

Thus, the rationality-bounding condition (5.5) can be interpreted in the institutional control problem as:

  1. A real-time constraint (all calculations and information processing must be completed by time \( T \)).

  2. A constraint on the Principal’s cognitive capabilities (one elementary operation requires \( \tau \) units of time).

  3. A capacity constraint on the Principal’s communication channels.

(Also, see the paper [17].)

In practice, condition (5.5) means that the sum of times for receiving, processing, and transmitting information by the Principal must not exceed the directive time for making the decision (implementing the control action).
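Condition (5.5) is straightforward to evaluate numerically. A minimal Python sketch, assuming unit-length intervals \( |\Omega |=|A_i|=1 \) (as in Example 1 below):

```python
import math

def rationality_bound(n, h, H, R, tau):
    """Left-hand side of (5.5) for |Omega| = |A_i| = 1: communication time
    t1 + t3 plus computation time t2 = W * tau; feasible iff lhs <= T."""
    values = 1 + 3 * n + n / (H * h ** n)          # data to receive and transmit
    t_comm = math.log2(values) / R                 # t1 + t3
    W = (1 / H) * (2 * n + n / h) * (1 / h) ** n   # analytical complexity (5.2)
    return t_comm + W * tau

T = 1e-5
lhs = rationality_bound(n=1, h=0.1, H=0.1, R=1e6, tau=1e-9)
print(f"lhs = {lhs:.2e} s, feasible: {lhs <= T}")
```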

Several problem statements are possible:

  1. Error minimization: choosing appropriate grid steps \( h \) and \( H \), minimize the error (5.3) subject to the complexity constraint (5.5).

  2. Complexity minimization: minimize the complexity (5.2) subject to the error constraint

     $$ \Delta \le \Delta ^{*} $$
     (5.6)

     (see the second and third types of bounded rationality considered above).

  3. The critical channel capacity or the critical rate of computing: find the critical value of \( R \) (of \( \tau \), respectively) that satisfies (5.5) under constraints on the grid steps or errors.

  4. The maximum number of controlled subsystems (subordinates in organizational and technical systems): find the maximum natural number \( n \) such that inequalities (5.5) and (5.6) hold simultaneously.

Consider the limit case: a Principal with infinitely large information-handling capabilities (arbitrarily fast computing, \( \tau =0 \)) and an infinite-capacity communication channel (\( R=+\infty \)). In this case, the constraint (5.5) becomes insignificant: the Principal can surely find the optimal solution. In all other cases (with constraints), the Principal makes a boundedly rational decision.

Example 1 (boundedly rational control).

Let \( A_i=\Omega =[0,1] \). From (5.5) it follows that

$$ \frac {1}{R}\log _2\left (1+3n+\frac {n}{Hh^n}\right )+\frac {n}{Hh^n}\left (2+\frac {1}{h}\right )\tau \le T. $$
(5.7)

The error minimization problem (finding a control action with the maximum efficiency, i.e., the minimum error, under information handling constraints) takes the form:

$$ LH\sqrt {n}+lhn\left [\sqrt {n}+\sqrt {n-1}\thinspace \right ]\to \min \limits _{h,H} $$
(5.8)

subject to the constraint (5.7). Note that the goal function (5.8) is linear.

In the optimal solution, the constraint (5.7) holds as equality. Neglecting the linear terms and assuming that \( H\ll 1 \) and \( h\ll 1 \), we write it in the approximate form

$$ \frac {1}{R}\log _2\left (\frac {n}{Hh^n}\right )+\frac {\tau n}{Hh^{n+1}}=T. $$

Consider two special cases.

Case 1 (only communication constraints, \( \tau =0 \)). From the expression (5.7) we find

$$ H=\frac {n}{h^{n} \left (2^{TR} -3n-1\right )}. $$
(5.9)

We substitute (5.9) into (5.8) and use the first-order optimality conditions in the variable \( h \) to obtain

$$ h=\left [\frac {Ln\sqrt {n}}{l\left (\sqrt {n}+\sqrt {n-1}\thinspace \right ) \left (2^{TR}-3n-1\right )}\right ]^{\frac {1}{n+1}} . $$
(5.10)

For example, for \( L=l=1 \), \( n=1 \), \( T=10^{-5} \) s, and \( R=10^{6} \) bits/s, formula (5.10) gives \( h\approx 0.0313 \). Substituting this result into (5.9), we have \( H\approx 0.0313 \).

Let the grid steps be given: \( h=H=0.001 \). Then inequality (5.7) yields the following lower bound on the channel capacity:

$$ R\ge \frac {1}{T}\log _2\left (1+3n+\frac {n}{Hh^{n}}\right )\approx 3\times 10^4\text { bits/s}. $$
Case 2 (only computational constraints, \( R=+\infty \)). From the expression (5.7) we find

$$ H=\frac {n}{h^{n}}\left (2+\frac {1}{h}\right )\frac {\tau }{T}. $$
(5.11)

We substitute (5.11) into (5.8) and use the first-order optimality conditions in the variable \( h \) to obtain the algebraic equation

$$ \frac {1}{h^{n}} \left (2+\frac {1}{h} \right )=\frac {T\thinspace l}{\tau \thinspace L}\thinspace \frac {\sqrt {n}+\sqrt {n-1}}{n\sqrt {n}}. $$
(5.12)

For example, for \( L=l=1 \), \( n=1 \), \( T=10^{-5} \) s, and \( \tau =10^{-8} \) s, formula (5.12) gives \( h\approx 0.0326 \). Substituting this result into (5.11), we have \( H\approx 1 \).

Let the grid steps be given: \( h=H=0.1 \). Then inequality (5.7) yields the following upper bound on the rate of computing: \( \tau \le 8.3 \) ns.
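Both cases of this example can be reproduced directly; a Python sketch (formulas (5.9) and (5.10) for Case 1; bisection on (5.12) and substitution into (5.11) for Case 2):

```python
import math

L_, l_, n, T = 1.0, 1.0, 1, 1e-5

# Case 1 (tau = 0): formulas (5.10) and (5.9).
R = 1e6
denom = 2 ** (T * R) - 3 * n - 1
h1 = (L_ * n * math.sqrt(n)
      / (l_ * (math.sqrt(n) + math.sqrt(n - 1)) * denom)) ** (1 / (n + 1))
H1 = n / (h1 ** n * denom)
print(f"Case 1: h = {h1:.4f}, H = {H1:.4f}")    # h = H = 0.0313

# Case 2 (R = infinity): solve (5.12) for h by bisection, then (5.11) for H.
tau = 1e-8
rhs = (T * l_ / (tau * L_)) * (math.sqrt(n) + math.sqrt(n - 1)) / (n * math.sqrt(n))
lo, hi = 1e-6, 1.0                # the left-hand side of (5.12) decreases in h
for _ in range(100):
    h2 = (lo + hi) / 2
    if (1 / h2 ** n) * (2 + 1 / h2) > rhs:
        lo = h2
    else:
        hi = h2
H2 = (n / h2 ** n) * (2 + 1 / h2) * tau / T
print(f"Case 2: h = {h2:.4f}, H = {H2:.4f}")    # h = 0.0326, H = 1
```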

Example 2 (the maximum norm of controllability).

Let the model parameters have the following values: the grid steps \( h=H=0.01 \), the channel capacity \( R=10^{6} \) bits/s, the rate of computing \( \tau =10^{-9} \) s, and the directive time \( T=0.1 \) s. Then inequality (5.7) yields the following upper bound on the number of agents controlled by the Principal: \( n\le 5 \).

We have considered a “static” model with a single interaction between the Principal and agents; see stages 1–6 above. If this interaction occurs many times (repeating sequentially in time) and (or) the controlled system has dynamics, the model can be generalized using the framework of repeated or differential games with control constraints under discrete states (controls) and limited capacities of communication channels; see [1, 26, 54–56].

6. TYPICAL SOLUTIONS

Let us analyze the agent’s decision problem, in which the agent chooses an appropriate action to maximize the goal function. As was mentioned above, the search for the agent’s best action \( y^*(\theta ) \) in a situation \( \theta \) has the complexity \( W_y\sim \Theta (\frac {|A|}{h}) \). For any fixed situation, this search yields the maximum of the agent’s goal function with the error \( \Delta _f\approx \frac {lh}{2} \).

For the sake of simplicity, let the goal functions be such that the rational choice set is a singleton. If \( \theta \in [0,1]|_H \), we find the agent’s action maximizing the goal function for each of \( 1/H \) situations. The result is the (\( 1+[1/H] \))-dimensional vector \( y^* \). If \( A=[0,1]|_h \), the complexity \( W \) of calculating the agent’s rational choice \( y^*(\theta ) \) depending on the situation has the order \( \frac {1}{hH} \), i.e., \( W\sim \Theta (\frac {1}{hH}) \), and the error of each component is \( \Delta _f \).

We denote by \( \{Q_i\} \), \( i=1,\dots ,k \), a partition of the unit interval into \( k \) connected sets. Let \( y^{\prime }_i \) be the solution of the problem

$$ \min \limits _{\theta \in Q_i|_H} f(y,\theta ) \to \max \limits _{y\in [0;1]|_h},\quad i=1,\dots ,k. $$
(6.1)

The \( k \)-dimensional vector \( y^{\prime } \) representing the solution of (6.1) was called the typical solution [5]. (Also, see the unified solutions of control problems for organizational systems in [21].) Typical solutions [5, 6] suggest the following: the complete set of decision situations (here, the unit interval or \( 1/H \) situations for the discrete set \( \Omega \)) is replaced by \( k \) typical situations in which the agent should use heuristics, i.e., choose corresponding typical solutions.

For making such typical decisions, the agent has to diagnose the current situation: identify to which of the sets \( \{Q_i\} \), \( i=1,\dots ,k \), the value \( \theta \) belongs. (When this diagnosis problem is solved without errors, the complexity of implementing the typical solution can be estimated as \( \Theta (k) \).) Then the agent has to choose the corresponding component of the vector \( y^{\prime } \) as the action [5]. If \( k\ll \frac {1}{hH} \), then the complexity of a typical solution is much lower compared to the complete solution of the agent’s decision problem.

The error of a typical solution (formula (6.1)) is determined by \( \Delta ^{\prime }\approx \frac {l}{2}\max \{h,H\} \); for a given partition \( \{Q_i\} \), \( i=1,\dots ,k \), it has the analytical complexity \( W^{\prime }\sim \Theta (\frac {k}{hH}) \).

However, a typical solution should be characterized not in terms of errors but the price of standardization, i.e., the goal function losses due to replacing the complete solution with a typical one:

$$ \Delta ^{\prime \prime }\big (\{Q_i\}, i=1,\dots ,k\big )=\max \limits _{i=1,\dots ,k}\; \max \limits _{\theta \in Q_i}\;\left [\;\max \limits _{y\in [0;1]}f(y,\theta )-f(y^{\prime }_i,\theta )\right ]\approx l\max \limits _{i=1,\dots ,k}\mathrm {diam}\thinspace Q_i. $$
(6.2)

The considerations above have proceeded from the hypothesis that the partition \( \{Q_i\} \), \( i=1,\dots ,k \), is given. Generally speaking, typical solution design includes two steps: (i) find an optimal number \( k \) of typical situations (taking into account the decision maker’s cognitive capabilities and other constraints); (ii) calculate an optimal partition minimizing the price of standardization (6.2). The analytical complexity of the latter problem can be very high, especially in the case of multidimensional sets of admissible actions and decision situations.
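For a given partition, the typical solutions (6.1) and the price of standardization (6.2) can be computed directly. A Python sketch (the goal function and the partition into \( k \) equal subintervals are illustrative assumptions):

```python
import numpy as np

h = H = 0.01
A = np.arange(0.0, 1.0 + h / 2, h)           # action grid
f = lambda y, th: 1.0 - (y - th) ** 2        # assumed goal function, y*(th) = th
k = 4
Q = [(i / k, (i + 1) / k) for i in range(k)] # partition into k equal intervals

def typical_solution(q):
    """Problem (6.1): the maximin action over the situations in Q_i."""
    thetas = np.arange(q[0], q[1] + H / 2, H)
    return A[np.argmax([min(f(y, th) for th in thetas) for y in A])]

y_typ = [typical_solution(q) for q in Q]
print("typical solutions:", [float(y) for y in y_typ])  # near the midpoints of Q_i

# Price of standardization (6.2): worst-case loss against the full solution.
loss = max(max(f(th, th) - f(y_typ[i], th)
               for th in np.arange(q[0], q[1] + H / 2, H))
           for i, q in enumerate(Q))
print(f"price of standardization: {loss:.4f}")  # close to (1/(2k))**2 here
```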

Fig. 2. Rationality-bounding condition and related concepts.

Thus, typical solutions are justified if they are calculated once but applied many times [5].

7. CONCLUSIONS

The rationality-bounding condition (3.1) imposes mutual constraints on the time to handle information and the time to make and implement a corresponding decision (control action). This condition states: when jointly solving control, communication, and computing problems, an optimal solution (control action) can be impossible to find due to real-time requirements, and almost optimal solutions have to be used instead (the best ones found under the existing constraints on the search procedure).

Moreover, the rationality-bounding condition interconnects common concepts in control and optimization such as requisite variety, bounded rationality, C \( {}^{3} \), heuristics, and records in real-time optimization, demonstrating their unity and deep relationship; see Fig. 2.

Among the particular results of this paper, we mention the concretization (5.5) of the rationality-bounding condition for the institutional control problem of organizational and technical systems. The constructiveness of condition (5.5) allows posing and solving various practically relevant problems: minimizing the error or complexity, finding the critical capacity of a communication channel or the critical rate of computing, and determining the maximum number of controlled subsystems. Several illustrative examples have been presented in Sec. 5.

Note that the rationality-bounding condition has been formulated in terms of time costs. However, decision-making can also require financial costs, which possibly determine the information available at the decision instant (for example, see models with payments for reducing uncertainty [21]) and the corresponding times of data transmission, processing, etc. These costs can be considered, e.g., by additive inclusion in goal functions. Models optimizing decision processes (by time, accuracy, cost, and other interrelated criteria) are a promising area of further research.

Also, it seems interesting to consider the following factors in rationality-bounding conditions: dynamics of the controlled system, information coding, noises in communication channels, and uncertain conditions in which the control system operates.