
1 Introduction

Artificial General Intelligence (AGI) is the art of building thinking machines: machines able to understand, learn, and perform any intellectual task that a human can. In contrast to (narrow) AI, AGI treats intelligence as a whole, aiming at versatile, general-purpose intelligent systems that can learn, reason, plan, communicate, and carry out any other task at the human level of intelligence, or perhaps ultimately well beyond it.

The original version of AGI Brain, proposed in 2019 [2], worked well in a number of linear/nonlinear, continuous/discrete, single-agent/multi-agent deterministic environments, but it lacked the ability to perform well in stochastic environments. In this paper, an upgraded version of the model, called AGI Brain II, is proposed, which also performs well in stochastic environments. To evaluate the upgraded model, it was tested on a portfolio optimization scenario as a stochastic environment. In order to compare the two versions, an index called the versatility index (VI) is suggested, which measures the versatility of AGI systems.

2 Versatility Index

AGI systems are meant to be as versatile as possible. Versatility is a necessary condition for an intelligent system to be called an AGI system. According to Legg and Hutter, AGI systems have to perform well in a very large range of environments [1]. In this light, if an intelligent system is to be called an AGI system, two questions must be answered: in how many environments can the system perform, and how well does it perform in each of them? The number of different operating environments of an intelligent system, combined with how well it performs in each of them, can therefore be taken as a measure of the versatility of a candidate AGI system. We call this measure the versatility index (VI), defined as the sum of the performances of an AGI system over its environments:

$$ VI = \sum\limits_{i = 1}^{N} {\alpha_{i} } $$
(1)

where \(N\) is the number of different operating environments of the system, and \(\alpha_{i}\) is the performance of the system in environment \(i\). Since \(N\) is a positive integer and each \(\alpha_{i}\) is a positive real number, the VI is a dimensionless positive real number.

Since AI systems are problem-specific, their VI values will obviously be low compared to those of AGI systems, so the VI can be considered a way of distinguishing AI from AGI systems. The VI also provides a quantitative ground for comparing different AGI systems: the more versatile a system is, the higher its VI. The VI, in combination with other evaluation methods, might also be considered an alternative way to measure the efficiency and intelligence level of AGI systems (or even of the human brain), which will be discussed in Sect. 6.

Example 1.

AGI system A is able to perform three AI tasks, namely speech recognition, image processing, and intelligent control, with performances \(\alpha_{1} = 85\%\), \(\alpha_{2} = 62\%\), and \(\alpha_{3} = 93\%\), respectively. The VI of AGI system A is therefore calculated as follows:

$$ VI_{{{\text{AGI}}\;{\text{system}}\;{\text{A}}}} = \sum\limits_{i = 1}^{N} {\alpha_{i} = 240} $$
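For concreteness, Eq. (1) amounts to a simple summation. The minimal sketch below (in Python, with performances expressed as percentages as in Example 1) computes the VI of AGI system A.

```python
def versatility_index(performances):
    """Versatility Index (Eq. 1): the sum of the performances alpha_i
    that a system achieves over its N operating environments."""
    return sum(performances)

# Example 1: AGI system A performs three tasks with the given accuracies (%).
print(versatility_index([85, 62, 93]))  # -> 240
```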

3 The Original AGI Brain

AGI Brain is a unified learning and decision-making framework for artificial general intelligence systems based on modern control theory. It treats intelligence as a form of optimality: in AGI Brain, intelligence means optimization of the surrounding world towards common goals. The design of AGI Brain emphasizes versatility, i.e., building a general-purpose artificial brain. Figure 1 illustrates the general schematic of the world \(\gamma\), consisting of the artificial agent \(\omega\) and the object \(\psi\).

Fig. 1.

The world \(\gamma\) consisting of the artificial agent \(\omega\) and its brain \(\Gamma\), and the object \(\psi\). Observed feedbacks: (\({\mathbf{x}}_{\omega }\): vector of the bodily states of the agent \(\omega\), \({\mathbf{a}}\): action vector, \({\mathbf{x}}_{\psi }\): vector of the states of the object \(\psi\), \({\varvec{r}}\): response vector of the object \(\psi\)). The artificial brain \(\Gamma\) can observe these actions and states fully or partially by its sensors.

At every time step \(n\), the artificial brain \(\Gamma\) produces commands \({\varvec{u}}\) (e.g., hormones or neural signals) which change the states of \(\omega\)’s body, i.e., \({\mathbf{x}}_{\omega }\), which in turn leads to performing action \({\mathbf{a}}\) on the object \(\psi\). This action changes \(\psi\)’s states \({\mathbf{x}}_{\psi }\), which consequently leads to \(\psi\)’s response \({\varvec{r}}\). Like a natural brain, \(\Gamma\) can observe these actions and states fully or partially through its sensors (Fig. 1).

By benefiting from the powerful modelling capability of the state-space representation, as well as the learning ability of neural networks (NNs), AGI Brain tries to duplicate intelligence using a unified strategy. The model emulates the three learning stages a human goes through when learning about the surrounding world. In AGI Brain, these three stages are called: 1) infancy stage (random actions), 2) decision-making stage (action selection via \(EM\)), and 3) expert stage (autonomous action via \(IM\)) (Fig. 2).

In its decision-making stage, the agent selects the best policy from its set of possible alternatives as follows:

$$ \begin{aligned} & {\mathbf{U}}^{*} = \left\{ {\varvec{u}}(n) \;\middle|\; \mathop{\mathrm{ArgMax}}\limits_{{\varvec{u}} \in \aleph } \sum\limits_{n = n_{1} }^{n_{f} } \left[ R = {\varvec{P}}^{T} {\varvec{J}} \right] \right\} \\ & \text{s.t.} \\ & \left\langle \begin{matrix} {\hat{\mathbf{x}}}(n + 1) \\ {\hat{\mathbf{y}}}(n + 1) \end{matrix} \right\rangle \xleftarrow{EM} \left\langle \begin{matrix} {\mathbf{x}}(n) \\ {\mathbf{y}}(n) \\ {\varvec{u}}(n) \end{matrix} \right\rangle \end{aligned} $$
(2)

where \({\mathbf{U}}^{*}\) is the optimal policy, \({\varvec{u}}(n)\) is the possible action at time \(n\), \(\aleph\) is the set of all possible alternatives, \(R\) is the reward value, \({\varvec{P}}\) is the personality vector, \({\varvec{J}}\) is the vector of objectives, \({\varvec{x}}(n)\) is the vector of states, \({\varvec{y}}(n)\) is the vector of outputs, and \({\hat{\mathbf{x}}}(n + 1)\) and \({\hat{\mathbf{y}}}(n + 1)\) are the states and outputs estimated by the agent’s explicit memory \(EM\).
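To make the selection rule of Eq. (2) concrete, the sketch below implements a single-time-step version of it in Python (the horizon sum over \(n_{1}\) to \(n_{f}\) is collapsed to one step). The helper names `em_predict` and `objectives_fn` are placeholders for the explicit memory's estimator and for the computation of the objective vector \({\varvec{J}}\); they are not part of the original paper.

```python
import numpy as np

def decide(alternatives, em_predict, objectives_fn, x, y, P):
    """One-step sketch of the decision-making stage (Eq. 2):
    score every candidate command u by the personality-weighted
    objectives evaluated on the explicit memory's prediction,
    and return the highest-reward command."""
    best_u, best_R = None, -np.inf
    for u in alternatives:                    # u drawn from the set of alternatives (aleph)
        x_hat, y_hat = em_predict(x, y, u)    # EM: <x(n), y(n), u(n)> -> <x^(n+1), y^(n+1)>
        J = objectives_fn(x_hat, y_hat)       # vector of objectives J
        R = P @ J                             # reward R = P^T J
        if R > best_R:
            best_u, best_R = u, R
    return best_u
```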

In the original version, the explicit memory \(EM\) is made up of neural networks (NNs) and works as a state/output estimator. The original model also benefits from other features, such as an implicit memory (\(IM\)) for autonomous policy selection, as well as emotions (stress) for moderating the exploration/exploitation ratio.

In addition, the model benefits from shared explicit and implicit memories for multi-agent problems, where the agents can easily share their experiences with each other in order to improve their performance.

Fig. 2.

Working cycle of AGI Brain. Paths: 1) Infancy stage, 2) Decision making stage, and 3) Expert stage

The original model was tested on three different continuous and hybrid (continuous and discrete) Action/State/Output/Reward (ASOR) space scenarios in deterministic single-agent/multi-agent worlds. Successful simulation results demonstrated the versatile applicability of the original version of AGI Brain in deterministic worlds.

4 AGI Brain II

4.1 ProMem

Due to its neural network estimators, the original AGI Brain lacked the ability to perform well in stochastic environments. In order to equip the original model with stochastic capabilities, its state/output estimator was replaced with a modified Mamdani fuzzy inference system, which we call ProMem. This results in the upgraded and more versatile version of the model, AGI Brain II.

By estimating the probability density function (PDF) of the observed data, ProMem is able to estimate the state/output that results from a certain action in stochastic as well as deterministic worlds. Applying ProMem to the decision-making problem of Eq. (2), we have:

$$ \begin{aligned} & {\mathbf{U}}^{*} = \left\{ {\varvec{u}}(n) \;\middle|\; \mathop{\mathrm{ArgMax}}\limits_{{\varvec{u}} \in \aleph } \sum\limits_{n = n_{1} }^{n_{f} } \left[ R = {\varvec{P}}^{T} {\varvec{J}} \right] \right\} \\ & \text{s.t.} \\ & \left\langle \begin{matrix} {\hat{\mathbf{x}}}(n + 1) \\ {\hat{\mathbf{y}}}(n + 1) \end{matrix} \right\rangle \xleftarrow{\text{ProMem}} \left\langle \begin{matrix} {\mathbf{x}}(n) \\ {\mathbf{y}}(n) \\ {\varvec{u}}(n) \end{matrix} \right\rangle \end{aligned} $$
(3)
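The paper does not spell out the internals of ProMem beyond its being a modified Mamdani fuzzy inference system that estimates the PDF of the observed data. As an illustration only, the sketch below uses a Gaussian-kernel density over stored \(\langle {\mathbf{x}}, {\mathbf{y}}, {\varvec{u}} \rangle \rightarrow \langle {\mathbf{x}}(n+1), {\mathbf{y}}(n+1) \rangle\) samples to produce the conditional expectation required in Eq. (3); it is a stand-in for, not a reconstruction of, the actual ProMem.

```python
import numpy as np

class ProMemSketch:
    """Illustrative PDF-based state/output estimator (stand-in for ProMem).

    Stores observed <x(n), y(n), u(n)> -> <x(n+1), y(n+1)> transitions and
    predicts the expected next state/output under a Gaussian kernel density
    placed over the stored samples."""

    def __init__(self, bandwidth=0.1):
        self.inputs, self.targets = [], []
        self.h = bandwidth

    def observe(self, x, y, u, x_next, y_next):
        # One training sample: current state/output/command -> next state/output.
        self.inputs.append(np.concatenate([x, y, u]))
        self.targets.append(np.concatenate([x_next, y_next]))

    def predict(self, x, y, u):
        z = np.concatenate([x, y, u])
        Z = np.asarray(self.inputs)
        T = np.asarray(self.targets)
        # Kernel weights: relative density of the query point around each stored sample.
        w = np.exp(-np.sum((Z - z) ** 2, axis=1) / (2 * self.h ** 2))
        w /= w.sum()
        return w @ T    # expected <x^(n+1), y^(n+1)> under the estimated PDF
```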

The other components of AGI Brain II are the same as in the original version. Figure 3 illustrates the architecture of AGI Brain II. The new model has been tested on a portfolio optimization problem as a stochastic world, as described below.

Fig. 3.

Architecture of AGI Brain II (inside the artificial brain \(\Gamma\) of Fig. 1). Observed feedbacks: (\({\mathbf{x}}_{\omega }\): vector of the bodily states of the agent \(\omega\), \({\mathbf{a}}\): action vector, \({\mathbf{x}}_{\psi }\): vector of the states of the object \(\psi\), \({\varvec{r}}\): response vector of the object \(\psi\)), \({\varvec{P}}\): personality vector, \(EM\): explicit memory (ProMem), \(IM\): implicit memory, \({\varvec{J}}\): vector of objectives, \(Rnd.\): random action generator, \(DM\): decision making unit (Eq. 3), \(\aleph\): set of all possible alternatives, \(Str.\): stress simulator unit, \({\varvec{u}}\): vector of output commands, 1: infancy stage, 2: decision making stage, 3: expert stage.

5 Simulation

5.1 Portfolio Optimization

Assume a world \(\gamma\) which consists of a hypothetical stock market with three assets, A, B, and C, as the objects \(\psi_{1}\), \(\psi_{2}\), and \(\psi_{3}\). The single AGI Brain II (ProMem estimator) agent \(\omega\) has to maximize its net wealth through optimal allocation of the assets in its portfolio. The agent's possible actions are the numbers of shares placed at each time step in the various assets: the agent may sell, buy, or hold predefined portions of its shares at each time step.

The set of equations governing the evolution of the system is as follows:

$$ \left\{ \begin{aligned} x_{A} (n) &= 1 + \sin \left(\frac{2\pi n}{100}\right) + r_{A} (n) \\ x_{B} (n) &= 1 + \cos \left(\frac{2\pi n}{100}\right) + r_{B} (n) \\ x_{C} (n) &= 1 + 2\sin \left(\frac{2\pi n}{100}\right)\cos \left(\frac{2\pi n}{100}\right) + r_{C} (n) \end{aligned} \right. , \quad 0 \le n \le 1000 $$
(4)

where \(x_{i} (n)\) is the close price of asset \(i \in \{A, B, C\}\), and \(r_{i} (n)\) is a random number with \(0 \le r_{i} (n) \le 0.25\), mean \(\mu_{r} = 0.125\), and standard deviation \(\sigma_{r} = 0.0725\).
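A minimal sketch of the price generator of Eq. (4) is given below. It assumes the noise \(r_{i}(n)\) is drawn uniformly from \([0, 0.25]\), which reproduces the stated mean of 0.125 and (approximately) the stated standard deviation.

```python
import numpy as np

def close_prices(n_steps=1001, seed=0):
    """Close prices of assets A, B and C over 0 <= n <= 1000 (Eq. 4).
    The noise terms r_A, r_B, r_C are assumed uniform on [0, 0.25]."""
    rng = np.random.default_rng(seed)
    n = np.arange(n_steps)
    r = rng.uniform(0.0, 0.25, size=(3, n_steps))
    x_A = 1 + np.sin(2 * np.pi * n / 100) + r[0]
    x_B = 1 + np.cos(2 * np.pi * n / 100) + r[1]
    x_C = 1 + 2 * np.sin(2 * np.pi * n / 100) * np.cos(2 * np.pi * n / 100) + r[2]
    return x_A, x_B, x_C
```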

For comparison purposes, a single agent with the original AGI Brain (NN estimator) was added to the system. The two agents start with 1000000 units of cash (e.g., US Dollars) and zero shares at the start time \(n = 800\). Using their estimators (ProMem and NN, respectively), the two agents try to estimate the close price of the next time step, \(x_{i} (n + 1)\), based on the 10 previous close prices, and allocate their assets so as to maximize their net wealth. At every time step \(n \ge 800\), they decide whether to hold, or to buy or sell 1, 5, 10, 20, 50, 100, 1000, 10000, 100000, or 1000000 shares, based on the predicted close price of the next time step. If the estimated close price of the next time step is higher than the current close price, they buy the number of shares expected to maximize their net wealth; if it is lower, they sell some of their shares; and if it is equal, they hold their shares. Note that this is an overly simplified portfolio optimization scenario: the agents make decisions based on only one time step ahead, and they do not incorporate real financial analysis tools in their decision-making process. Figures 4 and 5 show the performance of the two agents in this scenario.
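The buy/sell/hold rule described above can be sketched as follows. The paper states only that the agents pick, from the predefined lot sizes, the trade expected to maximize their net wealth; for brevity this sketch simply takes the largest affordable (or available) lot, which is an assumption.

```python
LOTS = (1, 5, 10, 20, 50, 100, 1000, 10000, 100000, 1000000)

def trade_step(price_now, price_pred, cash, shares, lots=LOTS):
    """One-step-ahead trading rule (sketch): buy if the predicted next close
    is higher than the current close, sell if it is lower, hold otherwise."""
    if price_pred > price_now:                          # expected rise -> buy
        affordable = [k for k in lots if k * price_now <= cash]
        if affordable:
            k = max(affordable)
            cash, shares = cash - k * price_now, shares + k
    elif price_pred < price_now:                        # expected fall -> sell
        sellable = [k for k in lots if k <= shares]
        if sellable:
            k = max(sellable)
            cash, shares = cash + k * price_now, shares - k
    # predicted close equal to current close -> hold
    return cash, shares
```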

Fig. 4.

Upper) Close prices (Asset values) of the three assets A, B and C. Middle) Net Wealth of the two agents. Lower) Mean overall estimation error.

Fig. 5.

Mean overall estimation accuracy of the two memories ProMem (blue) and NN (red) (Color figure online)

6 Conclusion and Future Works

As illustrated in Fig. 4, at the final time step \(n = 1000\) the net wealth of the AGI Brain agent is 62284014.51 units and the net wealth of the AGI Brain II agent is 85242829.95 units. The mean estimation error of the AGI Brain agent is 0.144 and the mean estimation error of the AGI Brain II agent is 0.122. As illustrated in Fig. 5, the average overall estimation accuracy of the AGI Brain agent is 73.83%, and the average overall estimation accuracy of the AGI Brain II agent is 75.33%.

The simulation results show that the new model, AGI Brain II, performed much better than the original one in the stochastic world: its final net wealth is higher than its predecessor's. This is because of the higher estimation accuracy (and hence lower estimation error) of ProMem compared to the NN. In other words, ProMem estimated the close price of the next time step more accurately than the NN. This accuracy is grounded in ProMem's ability to estimate the probability density function of the observed data: in its training stage, ProMem tries to form a PDF over the observed data as accurately as possible.

AGI Brain II has also been tested in the scenarios in which the original model was tested [2], and it performed well in linear/nonlinear, continuous/discrete, single-agent/multi-agent, deterministic/stochastic worlds. Table 1 lists the performances of the two models in the different scenarios:

Table 1. Performances of the two models in different environments

Thus,

$$ VI_{{{\text{AGI}}\;{\text{Brain}}}} = \sum\limits_{i = 1}^{4} {\alpha_{i} } = 365.25 $$

And,

$$ VI_{{{\text{AGI}}\;{\text{Brain}}\;{\text{II}}}} = \sum\limits_{i = 1}^{4} {\alpha_{i} } = 370.78 $$

So, based on their VI values, AGI Brain II is more versatile than its predecessor, the original AGI Brain.

Although AGI Brain II is more versatile than its predecessor, it is still far from being a real AGI. The next development stages would be augmenting its ability to perform well in 1) delayed-reward problems, and 2) environments with intelligent opponents (e.g., games).