Introduction

AI and machine learning have recently made great strides—thanks especially to the explosive developments in neural networks [1,2,3,4,5,6,7,8,9]. But several critics, including some of the leading researchers in AI, have noted that the currently dominant AI paradigm—often referred to as deep learning—diverges from natural intelligence in fundamental ways [10,11,12,13,14,15,16]. Concurring with these critiques, this position paper argues that these divergences will keep current ML systems from achieving the flexible and versatile general intelligence of the kind seen even in fairly simple animals with central nervous systems, though they will become far better than humans at solving a wide range of very specific, extremely complex problems that are traditionally associated with human intelligence. The paper argues further that building natural intelligence will require a way of thinking about AI that is rooted more firmly in biology and complex systems than in computational engineering [15]. The classical engineering paradigm based on the concepts of design, optimization, verification, validation, stability, predictability, controllability, and goal-directedness does not work well for complex adaptive systems and must be replaced by a different framework that emphasizes self-organization, autonomy, functional adequacy, versatility, flexibility, adaptivity, growth, and resilience [17]. In doing so, the focus should be on identifying and deploying the “enabling technologies” of biology that have produced functional intelligent systems of great complexity, and ultimately on developing an engineering framework based on these.

One part of the question motivating this special issue of Cognitive Computation is, “What can AI learn from neuroscience?” The main message of this position paper is that AI should learn not only from neuroscience, but also from evolutionary and developmental biology, and from the emergent behavior of animals as self-organizing complex dynamical systems.

In the 1980s, the re-emergence of neural networks was seen as liberating AI from the abstract, symbolic approach of classical AI and moving it toward a more biologically inspired, distributed, and adaptive one. However, it is now clear that not all the problems were solved, and some new ones were created. The neural networks of today may nominally use biologically inspired neurons and synapses, but in fact, the biologically implausible, hand-crafted, serial, symbolic algorithms of the past have simply been replaced by parallel, distributed algorithms trained on data, while retaining the dualistic notion that the function of intelligence can be abstracted away completely from its natural biological substrate. This position paper argues for an alternative set of principles to first understand and then, perhaps, to build a more natural AI that is grounded firmly in biology.

The proposed approach is termed deep intelligence (DI) because it sees intelligence as requiring depth along several dimensions: Structural, functional, and adaptive. But, more fundamentally, it asserts that flexible, fluent, and versatile general intelligence is a biological phenomenon that is a property of the organism as a whole and emerges naturally from the structure and dynamics of the organism as it interacts with its environment. The organisms—in this case, animals with nervous systems—are seen as self-organizing complex systems embedded in a complex, dynamic environment.

These ideas are not very new in themselves and have been explored well within disciplines relevant to AI, including evolutionary and developmental biology [18,19,20,21], neuroscience [22,23,24,25], cognitive science [26, 27], complex systems studies [28, 29], and robotics [30,31,32,33,34,35]. Unfortunately, their application in AI has remained confined to sub-disciplines such as developmental and evolutionary robotics, while the mainstream thrust of AI in recent years has been on building ever larger deep neural networks trained primarily with supervised learning—no doubt because such networks produce more immediate benefits from an applications perspective. This tendency is also rooted in a brain-centric view of intelligence, assuming that the best way to obtain general intelligence is to model higher cognitive abilities—notably language and reasoning—and that, after sufficient scaling up, such models would lead to artificial general intelligence (AGI). With this focus on large neural networks learning statistical inference from large datasets, AI has moved steadily away from the biological nature of intelligence, and thus from the viewpoint stated above. Most of the well-known neural models of today—with the possible exception of convolutional neural networks [5] and some recent work in reinforcement learning (RL) [36,37,38]—have very little of the biological inspiration that originally motivated neural networks.

Natural Intelligence

What makes an agent intelligent? Much of the work on AI has been based on a human-centric and brain-centric definition of intelligence, which sees it as an attribute comprising human—or human-like—capabilities associated with the brain: Reasoning, planning, language, complex problem-solving, etc. But, in fact, intelligence is an attribute of all animals with central nervous systems: It is an inherent understanding of the world that gives the animal the ability to exploit its environment opportunistically and pervasively for productive survival. A spider weaving a web to catch prey, an octopus changing color to camouflage itself, a leopard stalking deer on the savannah, and a human shopping for groceries are all examples of intelligence at various levels. The complexity of intelligent behaviors is related directly to the complexity of the animal’s body and its nervous system. Thus, intelligence is something that has evolved gradually over hundreds of millions of years, punctuated with sharp jumps in complexity corresponding to the emergence of novel “enabling technologies” such as the spinal cord, the neocortex, and bipedality. Human intelligence is only the latest stage of this long process. This type of intelligence will be termed natural intelligence in this paper to distinguish it from the technical definition of general intelligence in cognitive psychology [39, 40].

Natural intelligence has many specific characteristics, but the following stand out in particular:

Autonomy

Natural intelligence is autonomous in the sense that it involves only the animal and its environment (including other organisms). Intelligent behavior emerges from the interaction between the two, driven and modulated by the internal motivations, drives, and emotions of the animal. At its most basic level, it is automatic and has implicit goals, but more deliberate behaviors with subjectively perceived goals emerge in more complex animals [41].

Integrity

Intelligence is a property of the whole animal, with all its sensory, cognitive, and behavioral capacities, not a piecemeal collection of distinct functions. This means that intelligence always interacts with the world as a single entity, and, except in pathological cases, never faces the problem of integrating parts of its immediate perception, memory, and emotion post facto to make inferences across all aspects of experience. The animal always lives in a single, unified moment even though its attention may focus only on parts of it.

Fluency

Natural intelligence is always “real-time”—continually receiving sensory data and generating internal and external behaviors in the context of its internal state. It learns extremely rapidly—sometimes with almost no experience—and generalizes well out of sample. There is no opportunity for massive off-line learning—only limited intervals of mental rehearsal learning and memory consolidation. Humans have, of course, developed layers of external communication (e.g., language) and memory (e.g., written knowledge) that extend this, but that is a late emergent feature of intelligence that must have developed by leveraging more rudimentary capacities present in earlier primates and other animals.

Adaptivity

The ability to adapt over time is an essential feature of all intelligent animals—even the simplest. However, this adaptation is not confined just to neural plasticity. It includes the developmental process of the animal and its real-time behavior that is always dynamically adaptive and emergent. At a deeper level, it also includes the evolutionary process that has brought the animal to its current form.

Versatility

Intelligence in all animals is versatile in that a single, integrated brain-body system performs all the functions underlying intelligence. Importantly, these functions are always inherently coordinated because they have evolved and developed within an integrated system through evolutionary, developmental, and behavioral time.

Resilience

The quality of an animal’s intelligence depends ultimately on its ability to survive in perpetually unexpected situations, i.e., resilience. There are very specific features of the biological system that make it resilient, including modularity [19, 42], emergent coordination [28, 29], and functional diversity [43].

Evolvability

Intelligence emerged in its simplest form in very simple animals and has developed over several hundred million years into a far, far more complex functional attribute as instantiated in humans. This ability to evolve and grow over orders of magnitude is not accidental, but the result of specific biological principles that are grouped under the term evolvability [44,45,46]. Successful intelligence that can thrive in a changing world must itself be evolvable.

The term “intelligent system” is often used casually to refer to any system that can learn from data, but it is absurd to say that any one function such as vision or language is “intelligent” in any real sense. The proposed DI framework attempts to make the term “intelligent systems” more specific, applying it only to systems that possess the seven features listed above to some degree. The question is: How many of these features are present in the widely celebrated AI systems of today? The answer is that almost none are present, except adaptivity, and even that only in a superficial, narrowly defined way. The next question is: Is it possible to get to a system with these features using today’s dominant AI approaches? The case made in this position paper is that it will not be possible (or will be extremely difficult), and that a different approach—Deep Intelligence—is needed, first to understand and then to replicate natural intelligence.

Critique of Current AI Methods

Deep Learning and Natural Intelligence

This, in many ways, is the golden age of AI—a time when it is, at last, achieving real-world successes at an impressive pace that is likely to continue for the foreseeable future. But today’s successful deep learning-based machine learning—referred to henceforth as DL/ML—differs from natural intelligence in several fundamental ways:

  • Most DL/ML systems are specialists that perform only a single task (or a well-defined range of tasks) that they are explicitly trained for, whereas natural systems show versatile intelligence across a broad range of modalities and tasks.

  • DL/ML systems typically require a large amount of data and a large number of learning iterations, whereas natural systems can often learn to solve real-world problems quickly from very limited data [12, 14, 37].

  • Most of the successful, application-scale DL/ML systems use supervised learning, whereas natural systems depend much more on unsupervised and fast reinforcement learning.

  • In most DL/ML methods, data needs to be stored offline so it can be iterated through repeatedly, whereas natural systems learn mainly from real-time data with only limited offline storage and recall (rehearsal).

  • While DL/ML systems can be extremely good at generalizing within sample (after sufficient training), they remain poor at out-of-sample generalization, which natural intelligent systems handle as a matter of course [47].

  • While DL/ML systems are extremely good at pattern recognition and statistical inference, they are severely limited in terms of the symbolic processing and compositionality that would be required for human capacities such as language, domain-independent reasoning, and complex planning [12].

  • DL/ML systems have difficulty with causal inference because of their inability to handle complex temporal compositionality [10].

  • DL/ML systems have a fixed learning capacity: they converge to a good solution on a complex task and then stop, or merely maintain performance with some ongoing learning. In contrast, natural intelligent systems start with simple tasks and become more capable over their lifetime by building on the scaffolding of previous learning. Thus, learning depletes the learning capacity of DL/ML systems but enhances that of natural ones.

  • DL/ML systems do not have any notion of meaning beyond inferred statistical regularities present in their training datasets, whereas natural intelligent systems ground meaning in the experience of their physical environment [14, 15]. As a result, even extremely sophisticated ML systems are “cognitively shallow” [48], and frequently make absurd inferences [13].

  • DL/ML systems are not autonomous in the sense of being driven by internal motivations. They are trained only to serve specific purposes defined by external users and evaluated in terms of these purposes.

These differences do not matter much when AI is applied to narrow problems where a large amount of reliable data is available. Exponential increases in computational power and advances in learning algorithms are allowing DL/ML systems to show remarkable performance on tasks such as translation [49, 50], code generation [51], game playing [52, 53], image analysis [5, 54], and answering complicated questions [55], to name a few. However, this paradigm is not compatible with the goal of developing natural intelligence.

Why Is a New Framework Needed?

There are many reasons why a new framework beyond the DL/ML approach is needed, but two stand out in particular: Lack of scalability, and difficulty of integration.

Building large-scale flexible and versatile intelligent systems—such as high degree-of-freedom (DOF) autonomous intelligent robots expected to perform a wide range of functions in the real world—will require exponentially greater amounts of data, time, and computational resources with increasing system complexity, and the data- and compute-hungry DL/ML approaches will have difficulty scaling in this situation. A major reason for this is that DL/ML systems try to learn very complex tasks using neural networks that: (1) Are initially naïve, i.e., have very limited, if any, prior inductive biases; and (2) Have generic, architecturally simple forms, e.g., repeated attentional, convolutional, feed-forward layers, etc., that are expected to work across a broad range of tasks. Functional systems in animals are much more heterogeneous and function-specific. To be sure, the human brain has its own generic processor in the neocortex, but all the functions it is involved in are performed in conjunction with very specifically structured systems such as the hippocampus, basal ganglia, cerebellum, the spinal cord, etc.—and, of course, the very specific networks of sensory receptors and musculoskeletal elements. The effort to learn very complex functions with initially naïve and generic networks is what requires so much supervised learning. Animals, in contrast, do neural learning on top of a non-naïve substrate configured by evolution, and refined not only through neural learning but also via a gradual developmental process.

With regard to whether the DL/ML approach could lead to natural intelligence that is autonomous, versatile, and flexible, the main challenge is the narrow functional specificity of most systems. There are two obvious paths towards greater generality: (1) Start with a system that does a single thing, e.g., a large language model (LLM) generating text [55], and gradually add new task capabilities; (2) Implement specialist systems for all complex tasks important to intelligence, and then combine them into a single system. Both approaches have severe problems. First, there is no canonical list of tasks that an intelligent agent must perform, nor any applicable metrics: After merging N tasks, there will always be an (N + 1)th. All that can be hoped for is ad hoc combination of modalities, which is essentially what a system like DALL-E does [56]. Second, even if all the tasks could be enumerated, each of the task-specific DL/ML systems would require huge amounts of data and computation time to train, and the real world offers neither the data nor the time. Third, the world is too complex for any amount of data and training to exhaust its possibilities and ground the system in the real world, so the system will always make absurd errors. And finally, the best systems currently available for various tasks are very different from each other in fundamental ways, including the way they learn. Patching them onto each other or combining them will lead to conflicts and to the kind of serious emergent problems that arise whenever very complicated systems are combined. These problems will be especially apparent when brain-centric (rather than intrinsically embodied) AI systems trained only on large datasets are embedded into embodied real-world agents with many degrees of freedom.

Both these difficulties are addressed by the approach proposed below.

The Deep Intelligence View

Background

The alternative Deep Intelligence approach to AI proposed in this position paper begins with a crucial observation: There is only one class of systems in the world that actually have natural intelligence: animals with central nervous systems. Based on this observation, it recommends that, instead of trying to outdo Nature by devising new models of intelligence based purely on reductionistic computational thinking, AI research should build from a deeper, more comprehensive understanding of how the biological structures and processes of the animal lead to intelligence with all its capacities. While this may sound like a standard, old-fashioned statement of biological inspiration that has putatively driven neural networks for decades, it is, in fact, advocating a complete reimagining of the AI enterprise and abandoning the computational-utilitarian, “neural learning only” approach in favor of one that looks at the entire biology of intelligence. This does not deny the utility of studying the system at different levels and in different parts, but emphasizes that such analysis must not lose sight of the whole at any point. Indeed, even for specific domains and models, e.g., neural networks, more biologically grounded approaches should be adopted instead of relying on abstract computational ones.

The evolution of intelligence can be seen via a soft-core, hard-periphery model [57]. In this view, the earliest behaving animals consisted of rather rigid sensory networks connecting to rather rigid musculoskeletal networks with minimal mediation by the intervening nervous system. Such animals were like low-order Braitenberg vehicles [58] where specific stimuli elicited fast, stereotypical responses. Since then, evolution has done three things in modular fashion:

  1. Complexified the sensory networks—both by adding modalities, and by making the networks for each modality more complex—but keeping the network structure fairly rigid and stereotypical, e.g., the pattern of receptors in the retina.

  2. Complexified the musculoskeletal networks by producing bodies with increasing degrees of freedom and more complex architectures, but here too, keeping the structure quite stereotypical, e.g., segmented, bilaterally symmetric bodies [20].

  3. Greatly complexified the nervous system network mediating between the other two networks by making its architecture wider, deeper, and more sophisticated, thus adding an enormous number of adaptable degrees of freedom into the system.

Each of these coevolving networks has constrained the others, ensuring that the sensory and behavioral capacities of the animal remain matched with the capacity of its central nervous system. More recently—and especially in the evolution from pre-hominids to humans—the soft cognitive component has grown much more rapidly than the hard periphery, layering more levels to create increasing cognitive depth. This has allowed wholly new capacities such as natural language, symbolic reasoning, abstraction, complex causal inference, etc., to emerge—creating, in layers, a System 2 on top of the more primitive System 1 in the terminology of Kahneman [41]. AI should understand this process in depth, and use it as a template for systematically building intelligent systems of increasing complexity.
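
To make the soft-core, hard-periphery picture concrete, the following toy sketch (in Python, with purely hypothetical wiring and numbers, not drawn from any cited model) shows a low-order Braitenberg-style agent in which a rigid, genetically fixed sensory-to-motor mapping produces fast, stereotyped responses, with only a thin adaptable core in between. Complexification in the sense of the three steps above would enlarge both peripheral networks while adding many adaptable degrees of freedom to the core.

```python
import numpy as np

# Hypothetical toy: a low-order Braitenberg-style agent.
rng = np.random.default_rng(0)

# Hard periphery: fixed (non-learned), stereotyped sensor-to-motor wiring.
# Rows = 4 light sensors (left to right); columns = 2 motors (left, right).
W_periphery = np.array([[1.0, 0.0],
                        [0.5, 0.0],
                        [0.0, 0.5],
                        [0.0, 1.0]])

# Soft core: the only adaptable parameters; initially a near-identity mapping.
W_core = np.eye(2) + 0.01 * rng.standard_normal((2, 2))

def respond(stimulus):
    """Fast, stereotyped response: stimulus -> rigid periphery -> thin core -> motors."""
    return np.tanh(stimulus @ W_periphery @ W_core)

# Light on the left drives mostly the left motor output.
print(respond(np.array([1.0, 0.2, 0.0, 0.0])))
```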

The Significance of Depth

In the DI framework, depth does not refer just to the structural depth of deep neural networks (though that too is useful), but also functional and adaptive depth. Structural depth follows from the fact that the brain-body system of the animal is organized into multiple levels, each instantiated by a structurally deep network. These include the musculoskeletal network of the body, and the networks of the sensory receptors, thalamus, spinal cord and brainstem, the midbrain, the limbic system, the basal ganglia, the hippocampus, the neocortex, etc. This structural depth induces functional depth, as each level has its own functionality, and the final behavior emerges as a result of bidirectional interaction and dynamics across all these levels. Very importantly, functional depth is accompanied by functional diversity: Each level’s structure and function are distinct, not generic. This is the case with all highly optimized complex systems [59, 60].

But the most important type of depth for AI is adaptive depth, which comes from the fact that intelligence is an emergent product of four complex, multi-scale adaptive processes:

  1. Evolution configures useful structures and processes in species over a very slow time-scale and encodes them into the genetic code of each species. These structures and processes represent prior inductive biases that are well-tuned to the environment in which the organisms of that species have to survive and reproduce [61].

  2. Development instantiates the design specified by evolution in individual organisms, using a staged process of interleaved growth and learning to produce an extremely complex, well-trained intelligent agent at maturity. Each stage makes the system a little more complex and learns in the context of what prior stages have set up, thus constraining the complexity of the learning process at each stage [27].

  3. Learning in the nervous system works in tandem with development to create detailed maps, programs, and control strategies to exploit the physical configuration produced by evolution and development extremely efficiently for survival in the animal’s specific environment. Once development slows down or stops, neural plasticity becomes the primary mechanism for further learning.

  4. Emergent behavior is the result of real-time, dynamic assembly of synergistic coordination modes in neural and musculoskeletal networks to generate ongoing external behaviors (actions) and internal perceptual and cognitive states [28, 29, 62]. This is what enables a deep complex system with relatively slow components to generate real-time responses [63, 64].

Figure 1 shows this deep adaptation process, which differs fundamentally from the current ML practice of shallow adaptation, in which all adaptation beyond the initial design of the naïve agent is packed into neural learning (Fig. 2).

Fig. 1: The deep adaptation process

Fig. 2: The shallow adaptation process used in most machine learning
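
To make the contrast between Fig. 1 and Fig. 2 concrete, the following structural skeleton (a sketch in Python with hypothetical names and deliberately trivial stand-ins, not a working system) shows how the four adaptive processes nest in time: an outer evolutionary loop shaping the inherited design, a developmental loop interleaving growth with neural learning, and behavior feeding back as fitness. The shallow alternative packs all adaptation into a single training run on a fixed architecture.

```python
import random

# Trivial stand-ins: each function marks where a real biological mechanism would go.
def seed_genome():               return {"modules": 1}
def instantiate(genome):         return {"modules": genome["modules"], "skill": 0.0}
def grow(agent):                 agent["modules"] += 1; return agent              # development: add structure
def neural_learning(agent, env): agent["skill"] += 0.1 * agent["modules"]; return agent
def fitness(agent, env):         return agent["skill"]                            # emergent behavior, scored
def vary_and_select(genome, fit, rng):
    return {"modules": genome["modules"] + (1 if rng.random() < 0.5 else 0)}

def deep_adaptation(env, generations=3, stages=4, seed=0):
    rng, genome = random.Random(seed), seed_genome()
    for _ in range(generations):                 # 1. evolution: slow, species-level loop
        agent = instantiate(genome)              # 2. development starts from the inherited design
        for _ in range(stages):                  #    staged growth interleaved with ...
            agent = grow(agent)
            agent = neural_learning(agent, env)  # 3. ... neural learning on the current substrate
        genome = vary_and_select(genome, fitness(agent, env), rng)   # 4. behavior feeds back as fitness
    return agent

def shallow_adaptation(dataset):
    model = {"skill": 0.0}                       # fixed, naive architecture designed up front
    model["skill"] = 0.01 * len(dataset)         # all adaptation packed into one training run
    return model

print(deep_adaptation(env=None), shallow_adaptation(dataset=range(1000)))
```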

The Role of Evolution

“Nothing in biology makes sense except in the light of evolution.” That famous quote by Theodosius Dobzhansky [65]—a major figure in the Modern Synthesis of evolutionary biology—is an excellent principle for thinking about the fundamentally biological phenomenon of intelligence. Looking at evolution helps answer two fundamental questions about intelligence: (1) How do animals learn rapidly in a complex world? (2) What makes animals with extremely complex intelligence possible?

The answer to the first question is that evolution configures useful priors, or inductive biases, in the embodiment of animals, which make rapid learning possible. For example, the genetically specified connectivity patterns of neurons in the visual cortex enable mammals to learn feature detectors very rapidly [66, 67]; or the connectivity of the spinal cord neural networks and muscles enables many animals to walk or swim immediately after birth. In these cases, evolution can be seen as a designer of extremely learning-ready systems with excellent inductive biases. However, the deeper question—and one of profound relevance for AI—is how such complex, well-tuned systems are possible in the first place.

The evolutionary paradigm has had a place in AI for a long time [68, 69], mainly as an optimization mechanism [30, 70,71,72,73], but the truly important thing AI can learn from evolution is the set of strategies it uses to generate more and more complex organisms, i.e., evolvability [44,45,46]. As engineers know all too well, building more complex systems can increase the risk of failure exponentially. Evolvability is the capacity to avoid this explosion of risk. It is what makes natural intelligence possible.

The key insight that has emerged from the study of evolvability is that modularity and its diverse modes of deployment play a central role [19, 42]. These modes include hierarchical modular composition [74,75,76], encapsulation of critical functions [77, 78], creation of neutral spaces for exploration [79, 80], re-use of modules for different functions [20, 78], and emergent coordination between modules [81, 82]. The brain too has a hierarchical modular structure [22,23,24]. Two important examples of hierarchical modularity enabling complex functions are the cerebral cortex in humans, with its structure of columns [83], hypercolumns [84], cell assemblies [85, 86], etc., and the hierarchical networks underlying motor control [36, 87,88,89,90,91]. Hierarchical modularity is also important because it means that the system is nearly decomposable, i.e., it minimizes cross-module dependencies, which is a key attribute of successful and evolvable complex systems [92, 93]. Evolution exploits this to produce increasingly complex viable systems by deepening modular hierarchies (Fig. 3).

Fig. 3: A conceptual view of how evolution builds more complex systems by deepening modular hierarchies with composition and variation
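
The following minimal sketch (hypothetical representation, in Python) illustrates the duplicate-and-vary style of hierarchical modular composition discussed above: variation touches only a clone’s internal parts, composition deepens the hierarchy one level at a time, and whole modules can be re-used at higher levels, keeping the system nearly decomposable.

```python
import copy
import random

def make_module(name, parts):
    # A module: a named node whose internal parts are hidden from its siblings.
    return {"name": name, "parts": parts}

def vary(module, rng):
    # Variation acts only on internal parts (recursively), never on the module's interface.
    module["parts"] = [vary(p, rng) if isinstance(p, dict) else p + rng.gauss(0.0, 0.1)
                       for p in module["parts"]]
    return module

def duplicate_and_vary(module, suffix, rng):
    clone = copy.deepcopy(module)
    clone["name"] += suffix
    return vary(clone, rng)

def compose(name, submodules):
    # Composition deepens the hierarchy by one level without rewiring the submodules.
    return make_module(name, submodules)

rng = random.Random(0)
receptor = make_module("photoreceptor", [1.0, 0.5])
eye = compose("eye", [receptor, duplicate_and_vary(receptor, "_2", rng)])
head = compose("head", [eye, duplicate_and_vary(eye, "_b", rng)])   # whole-module re-use
print(head["parts"][1]["name"])   # "eye_b": a varied copy of the entire eye module
```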

The Role of Development

While evolution has the role of designing potentially useful structures and processes available in all multicellular organisms of any given species, development ensures that each individual organism realizes that potential effectively and efficiently. The importance of a developmental approach to learning has been asserted before in the framework of autonomous mental development, with a focus on autonomy and internal motivation rather than external tuning [24, 31, 34].

From an AI engineering viewpoint, development enables two crucial abilities in the adaptive system:

  1. Development releases the eventually available degrees of freedom in the system gradually, in stages, allowing each newly released set to settle into coordination patterns with those released in previous stages, and only then making new degrees of freedom available (see the toy sketch after this list). It also ensures that the behavioral degrees of freedom are released in conjunction with increased perceptual and cognitive capacity, so that the complexity of the behaviors the organism is learning at any stage is matched with the complexity of the environment in which it perceives itself as operating. This turns what would have been an extremely complex learning problem of coordinating the full set of degrees of freedom all at once into a sequence of simpler, more constrained problems that are much likelier to converge to good solutions without requiring a lot of data and training.

  2. Development enables the construction of increasingly complex functionality by hierarchical functional modularization. In humans, for example, this is apparent in linguistic learning, as simple words and sentences become building blocks for more complex ones in multiple stages. The same is true in motor learning, where simpler actions can serve as functional modules for the construction of more complex actions [94, 95]. The lack of this developmental process is a major reason why DL/ML systems are not lifelong learners.
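
The toy sketch below (an assumed curve-fitting problem in Python, chosen only for brevity) illustrates the first point: the model's higher-order degrees of freedom are held at zero and released stage by stage, so that each stage learns in the context of what earlier stages have already settled, rather than all degrees of freedom being fit at once.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 0.5 + 1.5 * x - 2.0 * x**3 + 0.05 * rng.standard_normal(x.size)   # unknown target

max_degree = 5
X = np.vander(x, max_degree + 1, increasing=True)   # features: 1, x, x^2, ..., x^5
w = np.zeros(max_degree + 1)                         # all degrees of freedom start frozen at zero

for stage in range(1, max_degree + 1):               # developmental stages
    active = slice(0, stage + 1)                     # release one more degree of freedom per stage
    for _ in range(2000):                            # learn using only the released DOF
        grad = X[:, active].T @ (X[:, active] @ w[active] - y) / x.size
        w[active] -= 0.5 * grad
    print(f"stage {stage}: mean squared error = {np.mean((X @ w - y) ** 2):.4f}")
```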

Using a developmental approach to learning in neural networks was proposed by Elman in a seminal paper [96], where he noted: “Maturational changes may provide the enabling conditions which allow learning to be most effective. …. these models work best (and in some cases, only work at all) when they are forced to “start small” and to undergo a developmental change which resembles the increase in working memory which also occurs over time in children. This effect occurs because the learning mechanism in such systems has specific shortcomings which are neatly compensated for when the initial learning phase takes place with restricted capacity.” Unfortunately, there was very little follow-up on Elman’s suggestions until the recent emergence of curriculum learning [97], which has since seen significant growth [98, 99]. Curriculum learning focused initially on managing the presentation of data, and has since expanded to include models of growing networks, though without connecting to the biology of development. Work on developmental robotics [34, 35, 99,100,101,102] has more in common with the DI framework, with its focus on autonomous learning with a developmental construction of internal models.

The Importance of Inherent Integration

In addition to structural, functional, and adaptive depth, the other major attribute that makes systems with natural intelligence possible is that they are inherently integrated at every level of the deep, multi-scale adaptive process described above. Whether the animal is simple or complex, infant or mature, it always experiences the world as a whole, while DL/ML systems only experience the parts captured by their training data and their task. An animal does not need to “maintain consistency” between its models explicitly, or to merge them post facto; it has only one multimodal, multi-scale model of how the world works in all its complexity, and the model inherently integrates the perceptual and behavioral affordances of the animal. Natural intelligence is always general, even at the simplest level.

A crucial capacity enabled by this integration is pervasive zero-shot generalization. In an integrated world model that is self-consistent across all modalities and experiences, even very novel stimuli can find system-wide resonances leading to sensible inference—especially after the animal has significant experience of its world. All intelligent animals have this instinctive, inherent “common sense.” In contrast, for DL/ML systems, there is no world outside of their training data, which is a very limited simulacrum of the world, so many real-world stimuli fall far out of sample. That is a major reason why common sense has so far eluded AI, and why producing AI with this critical attribute will require an approach based on inherent integration.

The Importance of Embodiment

In cognitive psychology, embodiment is the idea that mental function is not just a product of the brain, but of the brain and the rest of the body as an integrated system embedded within a specific environment [103, 104]. It is a powerful idea that is often seen in opposition to a purely computational view of the mind and purely control theoretic models of movement. Among other things, this is because the embodied agent can generate cognitive states and motor behavior through emergent coordination rather than explicit information processing or signaling [64, 105, 106].

From an AI perspective—and thus within the DI framework—embodiment is especially important because it grounds mental functions in the physical reality experienced by the agent rather than just in a dataset. The critical point is that the embodied agent (animal or AI system) and the environment it is embedded in (the real world) are both integrated systems operating under the same laws, i.e., the laws of physics, and can actually be seen as comprising a single integrated complex system. This means that the agent’s experience is inherently self-consistent, and thus grounded and generalizable. Contrast this with a purely computational (simulated) agent that is not necessarily bound by the laws of physics, experiencing a dataset that is, at best, an extremely limited, distorted, and selective view of reality. Expecting such a system to then be intelligent in the real world is unrealistic.

To be sure, advanced intelligence builds abstractions on top of direct sensorimotor experience, and one of the most important questions that should be explored in the DI framework is how, in the course of evolution, the ability to create abstract representations has arisen in animals. There has long been great debate in AI about symbolic processing and compositionality, which has been called “the central problem of AI” [107]. However, once the dualistic view of mind and body [108, 109] is rejected, it is obvious that any symbolic processing emerges necessarily from the physics of the brain-body system—especially the neural networks of the brain [24]. The central question, then, is: How does a brain-like physical system achieve this? Several models have been proposed to address this at the level of artificial neural networks [110,111,112,113,114,115,116], but all of them have serious limitations. There also have been experimental studies to understand how concepts, numbers, words, and other symbolic entities might be represented and composed in the human brain [117,118,119,120,121,122]. However, interpreting these results is complicated by the immense complexity of the systems being considered (e.g., the human brain), resulting in a focus on simplistic tasks and experiments. From a DI perspective, with its focus on evolution, development, and embodiment (in addition to neural learning), the way to understand the mechanisms of abstraction and symbolic processing is to study it first in simpler animals, to understand their simpler underlying neural mechanisms, and then build systematically upon that understanding. For example, experiments have shown that a sense of numerical value and numerical order exists in birds [123], and perhaps even in fish [124]. By understanding the neural basis of this simplest type of abstract processing, it might be possible to understand the more complex kind seen in humans. Similarly, it has been suggested that the abstract, high-dimensional representations of the cortex may be built on the scaffolding of spatial, 2-dimensional concrete representations of place in the hippocampus [125]. Exploring the path from embodiment to abstraction is, therefore, the most principled path to understanding the mechanisms of higher cognition. Doing this will be extremely difficult in practice, but the first step must be to devise new experiments that explore the neural processes underlying any primitive abstract processing capacities in simpler animals, focusing on the underlying neural architectures, modules, and processes. Computational modeling based on the results can then be used to generate further hypotheses in more complex animals, and these can be rejected or validated using new experiments. It would also be useful to apply lessons from purely computational models [110,111,112,113,114,115,116, 126, 127], and from the vast quantitative literature on conceptual representation and processing in humans [120, 128,129,130,131]. A potentially promising, general, and biologically plausible way to understand how the ability for abstract thinking could emerge from evolutionarily more primitive cognitive tasks such as sensorimotor prediction might be to use free-energy and predictive coding-based approaches to mental function [132, 133]. Indeed, recent work has demonstrated how such processing could lead to the emergence of abstractions from a sub-symbolic neural substrate [134,135,136].

The Significance of Modularity

Hierarchical modularity is perhaps the single most important “enabling technology” underlying the emergence of intelligence (and all other attributes of complex living organisms) [42]. Not only does it allow evolution to build systems of arbitrary complexity without encountering catastrophic failure, it is also crucial to the ability of a complex agent to generate useful complex behavior in the real world because it allows behavior to be produced through selection, combination and hierarchical encapsulation of modular primitives rather than explicit construction. This is a principle familiar to engineers at the structural level—most complex design and construction is now done using modules—but biology uses modularity in both structure and function. In cognitive science, the latter has been studied most intensively in the context of motor control. The embodiment of any organism configures coordination modes or synergies throughout the brain-body system, so that neural structures and muscles across several joints are constrained collectively, and global responses can arise without information propagating explicitly through all layers of the deep but slow brain-body system [63, 64]. For example, specific muscles act in an inherently coordinated way because of their connectivity with the central pattern generators (CPGs) of the spinal cord [87,88,89,90,91], and groups of muscles develop prototypical activation patterns called muscle synergies that are used as primitives in the construction of a whole range of complex movements [25, 63, 94, 137,138,139,140]. This means that the actual degrees of freedom available to the system in any specific situation are fewer than the entire combinatorial space of all degrees of freedom—thus addressing the so-called degrees of freedom problem [141, 142]. Essentially, the modules and their configuration predefine a rich but lower-dimensional latent repertoire of behaviors, and a high-level controller—the brain—simply needs to specify the code that unlocks a specific behavior rather than directing the individual muscles in detail [91]. This idea is also implicit in the subsumption model of behavior [143, 144], and the use of motor primitives in robots [145, 146].
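
A minimal numerical sketch (in Python, with arbitrary illustrative numbers) of the synergy idea: the muscle pattern is composed as m = Wc, where the columns of W are stereotyped co-activation patterns assumed here to have been configured by evolution and development, and the controller only specifies a few synergy activations c rather than addressing every muscle individually.

```python
import numpy as np

N_MUSCLES, N_SYNERGIES = 8, 3

# Synergy matrix W: each column is a stereotyped co-activation pattern over the 8 muscles,
# standing in for structure configured by evolution and development, not by the task.
rng = np.random.default_rng(1)
W = np.abs(rng.standard_normal((N_MUSCLES, N_SYNERGIES)))
W /= W.sum(axis=0)                               # normalize each synergy

def movement(c):
    """Compose a full muscle pattern from 3 synergy activations instead of 8 muscle commands."""
    return W @ np.asarray(c)

reach = movement([1.0, 0.2, 0.0])                # the controller's low-dimensional "code"
grasp = movement([0.1, 0.0, 0.9])
blend = movement([0.5, 0.1, 0.5])                # new behaviors by recombining the same primitives
print(reach.round(2), grasp.round(2), blend.round(2))
```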

While synergies are seen most clearly in motor control, the concept is widely applicable across the entire network-of-networks system, which is why synergies have been termed “the atoms of brain and behavior” [147]. Attractors in recurrent networks are an example of coordination modes and are likely to be widely used across the nervous system [148]. So are the patterns generated by CPGs [88]. It has been suggested that the entire cortex could usefully be seen as a very complex, hierarchical central pattern generator consisting of modules of neural subpopulations forming interacting CPGs [149]. Others have also noted the hierarchical modular organization of the cortex [84, 150], and a general theory of intelligence has recently been proposed that sees cortical columns as information processing modules that represent information and learn by making local predictions [151, 152]. An especially interesting hypothesis is to see intelligence, understanding, and even life in terms of emergent modules within the self-organizing networks comprising an organism [57, 62, 153,154,155]. This is completely consistent with the DI perspective, which sees all mental processes—perception, cognition, memory, and behavior—as the emergence of synergistic activation patterns across networks of sensors, neurons, and musculoskeletal elements. In this sense—and as also implied by [62]—there is no essential difference between “thought” and “action”. It is just that the networks involved in “thought” are networks of neurons, and those involved in action are networks of both neurons and musculoskeletal elements. It is worth noting that coordination modes can also be dynamic—emerging as metastable, context-dependent attractors in multi-scale networks [106, 156,157,158,159,160,161].

Most neural architectures are inherently—though superficially—modular, and the idea of exploiting structural modularity more directly to enable higher-level cognitive processes has been well-explored [5, 23, 85, 86, 162,163,164,165,166]. It is now being applied explicitly to symbolic tasks [47, 167], albeit for specific problems and deriving more from symbolic abstractions than embodied biology. Modular self-organized neural networks have also been proposed as the basis of sensorimotor integration [168, 169].

One place where evolutionary and developmental adaptation in a hierarchically modular agent could be applied fruitfully to AI is in selective attention. It has been shown that the reason humans can learn new reinforcement learning (RL) tasks rapidly is that they abstract the complexity of the given stimulus into lower-dimensional representations through attentional mechanisms [170], but it is not clear how they learn which features to attend to. In the DI framework, evolution would have already provided modular priors that privilege specific feature classes, and developmental learning starting with very simple tasks would have allowed the agent to refine them to learn what types of features are generally useful to attend to in the real world. Thus, the system goes into any specific task with strong generically useful attentional biases, and RL simply needs to select and shape them rather than discovering and learning them from scratch.
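
The following toy sketch (in Python; the task, dimensions, and learning rule are all assumptions made purely for illustration) shows this mechanism in its simplest form: a fixed attentional prior restricts a high-dimensional stimulus to a few privileged features, and a simple reward-prediction update, standing in for a full RL algorithm, then only has to learn weights in that low-dimensional space.

```python
import numpy as np

rng = np.random.default_rng(2)
STIM_DIM, ATTENDED = 100, 4

# Attentional prior: a fixed mask privileging a small class of features
# (in the DI view, shaped by evolution and refined by development).
attention = np.zeros(STIM_DIM)
attention[:ATTENDED] = 1.0

# Environment: reward happens to depend only on the privileged features.
true_w = np.zeros(STIM_DIM)
true_w[:ATTENDED] = rng.standard_normal(ATTENDED)

w = np.zeros(STIM_DIM)                                 # value weights shaped by learning
for trial in range(200):
    s = rng.standard_normal(STIM_DIM)
    s_att = s * attention                              # attended, low-dimensional view of the stimulus
    reward = true_w @ s + 0.1 * rng.standard_normal()
    w += 0.1 * (reward - w @ s_att) * s_att            # reward-prediction update on attended features only

print(np.abs(w[:ATTENDED] - true_w[:ATTENDED]).max())  # residual error on the attended weights stays small
```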

Engineering Deep Intelligence

Defining a Feasible Framework

While the DI framework is motivated mainly as a way to understand intelligence better, the goal for AI must be to turn it into an engineering framework. In doing so, the goal would not be to replicate the entire process of animal evolution and development—an impossible task in any case. Rather, the approach would be as follows: a) To operationalize the principles underlying the success of evolution and developmental learning; and b) To incorporate into AI systems the architectures, modules, and processes that underlie intelligence in behaving animals.

Evolutionary and developmental biologists have explicated the principles of evolvability in great detail over the last several decades [20, 44,45,46,47] (as discussed in the section entitled “The Deep Intelligence View”), and more recently under the rubric of evolutionary developmental biology (EvoDevo) [20, 171,172,173]. A DI-based system would use these principles to build sequentially more complex intelligent systems by explicit complexification rather than artificial evolution. Each system would display integrated intelligence at its own level, and become the basis for the next more complex system. At each level, the brain and body architectures, modules, and processes would derive from those observed in animals—albeit with some abstraction, and at a feasible level of detail.

The learning process would begin with a simple, modular system with limited but integrated perception, cognition, and behavior. The system would learn neurally how to exploit its limited capabilities in its limited environment, then add a bit more complexity through modular operators such as duplication, growth, splitting, modularization, etc., creating new sensory, cognitive, and motor modalities emergently as variations or combinations of the prior ones, learning more by building on what has already been learned, and so on, bootstrapping to a full-scale complex system by repeated cycles of alternating complexification and additive learning—remaining integrated all the while. The developmental complexification and learning could be nested within the outer loop of evolutionary complexification, but it would probably be more feasible to structure the process as alternating between architectural and modular complexification (evolution) and functional complexification (development) interleaved with learning. Figure 4 illustrates this process conceptually in comparison with the DL/ML approach.

Fig. 4: Conceptual view of how the DL/ML and DI approaches would produce complex natural intelligence. The red arrows for the DL/ML systems indicate explicit integration; the red frame for the DI systems indicates inherent integratedness

An extremely simple version of this approach can be seen in the work of Sims [74, 75] and others, but the explicit use of simulated evolution would need to be replaced by a more scalable framework, one that incorporates evolutionary developmental insights more directly with neural learning. Developing such a framework—even for neural networks alone—is quite non-trivial. At a minimum, it would require defining canonical repertoires of (a) modules; (b) architectures; (c) adaptive mechanisms; (d) developmental operators to complexify modules; and (e) evolutionary operators to grow and reconfigure the system. All of these would be grounded in biology and would range across the spectrum of vertebrate and arthropod evolution, development, and neural learning, and, as a result, across many spatial and temporal scales, as is the case in biological systems. Of course, all five repertoires would need to be instantiated in computational or physical models. Human AI engineers would focus on designing richer repertoires and generative programs rather than specific large-scale neural architectures and training algorithms. Most importantly, the modules, architectures, and mechanisms of this generative framework would come from those of animal biology rather than abstract formalisms such as Markov decision processes, predicate logic, causal analysis, or even uniformly structured neural networks. The animal may be a kludge produced by evolution’s tinkering over billions of years [174], but it is this kludge that is actually intelligent in ways that human ingenuity still cannot replicate. AI should respect the kludge and stop trying to fit the complexity of Nature’s imagination into simplified, abstract boxes. This does not mean that every molecular detail and every voltage spike has to be accounted for, or that mathematical models cannot be used. Quite the contrary! The goal should be to build better mathematical and computational models that capture more of the essential features of the biology of intelligence at the level that is most appropriate—and, of course, feasible. To do this, it is critical to understand all of the biology underlying intelligence, not just neuroscience.
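
The sketch below (in Python; every entry and operator name is a hypothetical placeholder, not an implemented component) merely illustrates the shape such canonical repertoires and their use by a generative program might take; the real work lies in grounding each entry in biology and instantiating it in computational or physical models.

```python
import random

# Hypothetical placeholders standing in for biologically derived components.
repertoires = {
    "modules": ["central_pattern_generator", "cortical_column", "hippocampal_map", "muscle_synergy"],
    "architectures": ["segmented_body", "reflex_arc", "cortico_basal_ganglia_loop"],
    "adaptive_mechanisms": ["hebbian_plasticity", "reinforcement_learning", "self_organizing_map"],
    "developmental_operators": ["grow_module", "split_module", "release_degrees_of_freedom"],
    "evolutionary_operators": ["duplicate_module", "vary_parameters", "rewire_between_modules"],
}

def complexify(agent, repertoires, rng):
    """One step of the generative program: apply a sampled operator to a sampled module."""
    op = rng.choice(repertoires["developmental_operators"] + repertoires["evolutionary_operators"])
    target = rng.choice(agent["modules"])
    agent["history"].append((op, target))    # the agent is specified by its generative history
    return agent

rng = random.Random(0)
agent = {"modules": ["reflex_arc"], "history": []}
for _ in range(3):
    agent = complexify(agent, repertoires, rng)
print(agent["history"])
```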

Role of Supervised Learning

Given the critique of supervised learning-based approaches laid out in this paper and the ubiquity of such approaches in AI today, it might be asked if supervised learning has a place within the DI framework at all. Clearly, it must, because many complex behaviors and skills can only be learned through supervision and corrective feedback. In humans, this includes things such as learning to play a musical instrument, to do mathematics, or even to use language correctly. The key here is that supervised learning must build upon and exploit the fundamentally integrated and self-organizing nature of the DI system rather than replacing it. In general, supervised learning should be seen as a late-stage mechanism in a system where the DI process has configured—and continues to configure—the primitives that supervised learning needs. This paradigm of self-organizing processes laying the groundwork for more complex, incremental, and careful learning is seen in many parts of the brain. For example, the evolutionary architecture of the early visual system and the self-organization of feature detectors during development provide a general basis for the rapid learning of more detailed skills such as object segmentation, recognition, etc. that may need more corrective feedback. Another example is how muscle synergies [137, 138, 142]—presumably configured through evolution and early development—can then form the primitive basis functions [145, 146] for an ever-growing repertoire of complex movements, many of them, e.g., dance moves, requiring careful supervised learning. The contention is that a system produced by the DI process will, in fact, be more ready to do supervised (and reinforcement) learning across a range and combination of modalities, and will do so much more rapidly, than purely supervised systems, thus coming closer to the ideal seen in animals. This point has been demonstrated in a recent paper from my lab, where rapid unsupervised learning in a simple, hippocampus-like model generated a place field substrate for subsequent one-shot reinforcement learning of goal-directed navigation [175].
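
The toy sketch below (in Python) conveys the flavor of this result; it is only an illustration under simplified assumptions, not the model of [175]: a pre-configured place-field substrate over a one-dimensional track makes a single rewarded visit to the goal sufficient to define a value gradient that subsequently guides navigation.

```python
import numpy as np

centers = np.linspace(0.0, 1.0, 20)        # place-field centers: the pre-configured substrate
sigma = 0.08

def place_activity(pos):
    return np.exp(-((pos - centers) ** 2) / (2 * sigma ** 2))

# One-shot learning: a single rewarded visit to the goal at pos = 0.9.
value_w = np.zeros_like(centers)
goal, reward = 0.9, 1.0
value_w += reward * place_activity(goal)   # one Hebbian-style, reward-gated update

def value(pos):
    return value_w @ place_activity(pos)

# Greedy navigation afterwards: from any start, step toward higher value.
pos = 0.1
for _ in range(50):
    pos += 0.02 if value(pos + 0.02) > value(pos - 0.02) else -0.02
print(round(pos, 2))                        # ends up near the goal at 0.9
```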

To some degree, this approach of using pre-configuration of priors to facilitate supervised learning is already used in a simplistic way when self-supervised restricted Boltzmann machines [3] are used to learn initial features for subsequent supervised learning [176], and in the use of feature transfer to enable rapid learning across tasks [177]. However, this idea needs to be generalized and applied in integrated systems rather than within narrow modalities. Supervised learning in animals is also unlikely to use back-propagation, though that is not necessarily a barrier in artificial systems once the basis system has been configured. In many—perhaps all—cases, more biologically plausible alternatives such as contrastive [176], self-supervised [175, 178], or resonance-based [23, 179] learning as well as free-energy and predictive coding approaches [132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156] might be sufficient to achieve the same goal when paired with a DI process generating good prior biases.

Conclusion

There is currently a great deal of debate on whether AI is going in the “right direction” with its focus on scaling up deep learning systems [10,11,12,13,14,15,16, 180,181,182]. The debate focuses on such issues as symbolic processing, causal reasoning, compositionality, etc., with some experts suggesting that these things will need to be incorporated by design into the current models. A major aim of this position paper has been to suggest that, since all these capabilities arise naturally in a self-organizing complex system, i.e., the embodied brain, their origins and mechanisms can be understood by studying them in that natural system rather than coming up with unnatural, biologically implausible engineering methods and symbolic abstractions. Here, it is important to point out that, while the mainstream of AI today is focused on the DL/ML approaches, and expects to achieve general intelligence through that route, it is more likely that such intelligence will emerge from the work in areas such as evolutionary, developmental, and cognitive robotics [30,31,32,33,34,35, 100,101,102], where embodied agents learn complex tasks in a more biologically motivated framework. However, this work is at a very early stage and is still focused on specific functions or modalities, such as morphology, control, imitation learning, language acquisition, vision, etc. A full DI framework would eventually need to apply these methods to inherently integrated systems.

One final point: Natural intelligence will not be achieved as long as the focus of AI is on building systems purely to serve human purposes. This only creates glorified screwdrivers. A system with natural intelligence must be autonomous, have its own—probably unexplainable—purposes, and learn all its life in an open-ended way. Such a system may not be immediately useful and may even be dangerous if sufficiently complex, but until such systems are built, AI is just the building of smart tools, not intelligent systems [183].