
1 Introduction: From Strokes to Conscious Percepts and Back

Whenever an artist manipulates a canvas, say by applying a dab of color, he or she immediately experiences a conscious percept of the result. This percept emerges from all the brain machinery whereby we consciously see and know about our visual world. Artists typically have no explicit knowledge about the brain processes that mediate between painterly manipulations and the resulting conscious percepts. Yet despite this intellectual chasm between action and percept, the particular interests and aesthetic sensibilities of different artists have led each of them to emphasize different combinations of these brain processes, and thereby to create their own artistic style. In the hands of a master, the results can be both astonishing and transcendently beautiful.

The corpus of works of art on two-dimensional surfaces, across time and culture, provides an incredible richness of scientific and aesthetic issues. This chapter reviews several of these issues through a discussion of specific paintings by well-known artists that have been chosen to illustrate how different combinations of brain processes were used to achieve their aesthetic goals. Illustrative paintings or painterly theories by nine artists were given a unified analysis in Grossberg and Zajac [44] using neural design principles and mechanisms that have been articulated and computationally characterized by the most advanced neural models of how advanced brains consciously see. That article also summarized, where possible, descriptions of an artist’s stated goals, or reviews of the artist’s work written by art historians, curators, or critics.

The current chapter does not attempt to scientifically explain why a painting looks beautiful, or how it may arouse strong emotions. Such an analysis would require the study of how perceptual, cognitive, and emotional processes interact. Some promising approaches to understanding aesthetic emotions using mathematical models of the mind have been described (e.g., Perlovsky [54]). The current goal is to first try to better understand the brain mechanisms of perception and cognition whereby humans consciously see paintings, and whereby painters have achieved their aesthetic goals. Further studies of beauty and of aesthetic emotions may benefit from the considerable neural modeling literature about the brain processes that create coordinated conscious experiences of seeing, knowing, and feeling (e.g., Grossberg [32, 34]). These more comprehensive theoretical insights would, in any case, need to build upon insights such as those described herein.

In addition, Grossberg [34] summarizes some (but not all!) of the basic brain processes that are needed to understand how we perceive and recognize music.

The current summary will provide comments about the numbered PowerPoint slides in the lecture with the same title as the current article, which can be found at Online Resource 12 and Online Resource 13.

2 A Step-by-Step Theory of How We See Art and How Artists Make It

Let’s begin by raising the basic question of how various painters struggled to intuitively understand how they see in order to generate desired aesthetic effects in their paintings (Slides 1–3). Answering this question is made possible by neural modeling work that clarifies what goes on in each brain as it consciously sees, hears, feels, or knows something. In Grossberg [34], I provide a self-contained, non-technical summary of current modeling knowledge about how this happens. The current article focuses only on one aspect of how we consciously see. It also summarizes a claim concerning why evolution was driven to discover conscious states in the first place. This analysis begins with Slide 136 in the Supplementary Materials to this book. It proposes how conscious perception is used to close the loop between perception and action, in this case between manipulating a painting, seeing it, and then manipulating it again.

In brief, the chapter and its Supplementary Materials will explain how multiple processing stages overcome the incompleteness and ambiguities of the raw sensory data that reaches our brains. These sensory data are hopelessly inadequate for triggering effective actions that can enable us to survive in a changing world that is filled with potentially life-threatening challenges. After these processing stages do their work, the result is sufficiently complete, context-sensitive, and stable perceptual representations upon which to base effective actions. In civilized societies, these actions include the strokes that create a painting. The article hereby proposes that evolution discovered conscious states in order to mark, or “light up”, the sufficiently complete, context-sensitive, and stable perceptual representations that can support effective actions, notably feature-category resonances for consciously knowing about objects, and surface-shroud resonances for consciously seeing them and triggering actions based upon them. These resonances will be defined and discussed below.

Slide 5 summarizes some of the painters whose work will be discussed. They include Jo Baer, Banksy, Ross Bleckner, Gene Davis, Charles Hawthorne, Henry Hensche, Henri Matisse, Claude Monet, Jules Olitski, and Frank Stella. These painters were chosen to demonstrate how the paintings of different artists, and even of different artistic movements, can often be easily recognized due to their emphasis on different combinations of brain processes. Works of several other artists, such as Rembrandt, Graham Rust, Georges Seurat, and Sean Williams, will also be briefly mentioned to make specific points.

A reader can rightly ask: How can this kind of insight about paintings be discovered in the first place? In order to understand this, one needs to appreciate how scientists have been discovering and developing brain models of psychological processes, including artistic processes like painting. Slides 6–9 emphasize that, since “brain evolution needs to achieve behavioral success,” neural models that hope to link brain to mind need to discover and model the level of brain processing that governs behavioral success. A half-century of modeling has consistently shown that these are network and system levels, which is why we study neural networks.

In order to complete such a model, individual neurons must be designed and connected in networks whose emergent, or interactive, properties give rise to successful behaviors. Keeping all these levels in mind at once—behavior, network, neuron—requires an appropriate modeling language with which to link them. Such a mathematical model makes it much simpler to understand how brains give rise to minds, not only by articulating appropriate brain design principles and mechanisms, but also by explaining the emergent properties that they generate when they interact together in response to a rapidly changing world. Unaided intuition cannot, by itself, understand these emergent properties.

Although rigorous mathematical modeling and computational analyses are needed to understand how brains give rise to minds in a way that feels inevitable, it is nonetheless possible to explain the ideas upon which these models are based using simple, self-contained, and intuitively understandable stories. That is what these articles try to illustrate. In so doing, they clarify that perhaps the hardest obstacle to understanding mind and brain is to know how to think about each problem. Once one is on the right path, the technical details can then often follow in a natural way. Finding such paths requires guidance from lots of data.

This perspective argues that, as illustrated in Slides 10 and 11, to deeply understand how brains work, you need to understand how evolution selects brain designs based on their behavioral success. That is why the modeling method and cycle that I have developed with many colleagues over the past 50 years always starts with behavioral data, often scores or even hundreds of experiments in a given area of psychology. Having lots of data to guide one’s thinking helps to rule out incorrect, but initially appealing, ideas.

The Art of Modeling consists in large part of figuring out how to understand behavioral data, which one receives as static curves that plot one variable against another, as interactive, or emergent, properties of individual behaviors as they adapt autonomously in real time to a changing world. For example, one might be trying to understand why the curve that summarizes the number of correct responses at each position in a list, after a fixed number of learning trials, has the shape that it does, with more correct responses at the beginning and the end of the list than in its middle. This kind of bowing effect occurs during essentially every experience we have when we are trying to remember sequences of events. If you look at these data in the right way, you can see that they embody lots of exciting philosophical paradoxes.

The results of such top-down analyses from behavioral data have always been the discovery of brain design principles that are translated into the simplest possible mathematical models (Slide 11). Then mathematical and computational analyses of these models are used to generate emergent behavioral properties that explain much more behavioral data than went into the hypotheses from which the model was derived. In this way, the modeling loop between behavior-to-design-to-model-to-behavior is closed.

In addition, and of critical importance, is the fact that the mathematical models always look like part of a brain. As a result, despite using no facts about the brain to derive these models, they explain a body of known brain data, as well as predict as yet unreported new brain data. Because this derivation proceeds from behavior-to-design-to-model-to-brain, it often proposes novel functional explanations of both known and unknown brain data.

Once the connection is made between behavior and brain, one can explain and predict lots of behavioral and brain data using the currently derived model. After the explanatory and predictive range of the model in its current form is understood, one can press both top-down from behavioral data, and bottom-up from brain data, to identify an additional design principle that the model does not currently embody. Then this new design principle is consistently added, “embedded”, or “unlumped”, into an expanded model, and the cycle begins again, leading to a broader range of interdisciplinary data that can be explained and predicted.

This cycle has been repeated many times during the past 50 years. As a result, we now have models that can individually explain and predict psychological, neuroanatomical, neurophysiological, biophysical, and even biochemical data. In this sense, the classical mind/body problem is incrementally being solved.

After going through this modeling cycle, what is the result? Is the brain just a “bag of tricks” as even famous neuroscientists like my colleague V. S. Ramachandran have claimed in the past (Slide 12)? If that were the case, true theories would be impossible.

Instead, as illustrated in Slide 13, a small number of fundamental equations have sufficed to explain thousands of interdisciplinary experiments, just as in physics. A somewhat larger number of modules, or microcircuits, that are defined using these fundamental equations, are used in specialized forms to compute useful, but not universal, combinations of properties. These modules, in turn, are assembled into modal architectures for carrying out different kinds of biological intelligence. The word “modal” stands for different modalities of intelligence, such as vision, audition, cognition, emotion, and action. None of them computes all possible computable functions in the manner of a modern von Neumann computer. However, each of them is general-purpose within its own modality of intelligence, can respond adaptively to a wide range of environmental challenges, and can seamlessly interact with other modal architectures to generate autonomous adaptive intelligence as we know it.

What principles determine how modal architectures are designed (Slide 14)? It is here that the novel computational paradigms, and corresponding design principles, that underlie brain computing play a critical role in ensuring that we can autonomously adapt to rapidly changing environments that are filled with unexpected events. Two of these paradigms are called Complementary Computing and Laminar Computing (Slide 15). Together they also imply a third fundamental brain design that I call the Hierarchical Resolution of Uncertainty. It is this latter design that requires multiple processing stages before our brains can compute perceptual representations that are complete, context-sensitive, and stable enough to be used to generate effective actions. Because only such complete representations can be selectively used to generate effective actions, conscious states “light up” these representations, and not earlier ones, for this purpose. These are the processing stages that enable a painter to apply paint to a canvas and consciously see and appreciate his or her handiwork.

Complementary Computing asks what is the nature of brain specialization (Slide 18). It provides an alternative to the earlier idea that brains compute using independent modules (Slide 17). There are lots of specialized brain regions in the visual cortex, and at least three parallel cortical processing streams with which to activate them. However, independent modules should compute each property—such as luminance, motion, binocular disparity, color, and texture—independently of the others. In reality, huge perceptual and psychophysical databases show that there are strong interactions between these various perceptual qualities.

Complementary Computing explains how such specialization coexists with, and indeed requires, these interactions by providing a very different answer to the question: What is the nature of brain specialization? Complementary Computing identifies new principles of uncertainty and complementarity that clarify why multiple parallel processing streams exist in the brain, each with multiple processing stages to realize a hierarchical resolution of uncertainty (Slide 19).

There are analogies to computationally complementary properties, such as a key fitting into a lock, and puzzle pieces fitting together (Slide 20), but these analogies do not explain the dynamism that is required to carry out Complementary Computing. In particular, computing one set of properties at a processing stage prevents that stage from computing a complementary set of properties. These complementary parallel processing streams are balanced against one another. This kind of balance is reminiscent of classical ideas about Yin and Yang, but again not explained by them. Instead, prescribed interactions between these streams, at multiple processing levels, overcome their complementary weaknesses and support intelligent and creative behaviors. They do so, in particular, by creating conscious visual states that can be used to guide looking and reaching behaviors, including those used to create and see paintings.

Each row in Slide 21 summarizes a pair of computationally complementary processes and the cortical streams in which they are proposed to occur. This list is not, however, exhaustive of all the complementary processes in our brains (Figs. 1 and 2).

Fig. 1 What is a visual boundary or grouping? (Slide 25)

Fig. 2 Visual boundary and surface computations are complementary (Slide 26)

When one puts together the first four of them (Slide 22), one is led to an emerging unified theory of visual intelligence, starting at our photosensitive retinas and ending at the prefrontal cortex, or PFC (Slide 23). Each box in the slide functionally describes a basic process that occurs in the corresponding part of the brain, and both the What and Where cortical streams are included. The What, or ventral, cortical stream carries out processes of perception and recognition, whereas the Where, or dorsal, cortical stream carries out processes of spatial representation and action. The modeling work that I and my colleagues have carried out over the years to explain hundreds of interdisciplinary experiments supports my hypothesis that the bottom-up, horizontal, and top-down interactions between these various processes help to overcome complementary processing deficiencies that each process would experience if it had to act alone.

Slides 24–26 begin to show what it means for visual boundaries and surfaces to be complementary. Much psychophysical evidence has supported my prediction that 3D boundaries and surfaces are the basic functional units in natural vision. This prediction was first made in Grossberg [25] and was supported by computer simulations of perceptual and psychophysical data in Grossberg and Mingolla [39, 40] and Grossberg and Todorovic [43]. I began to extend it in Grossberg [26, 27] to explanations and simulations of data about 3D vision and figure-ground perception using the Form-And-Color-And-DEpth (FACADE) model of 3D vision and figure-ground separation, and its 3D LAMINART model extension to simulate identified cell types within the laminar circuits of visual cortex. This major research program was carried out with multiple Ph.D. students and postdoctoral fellows, including Rushi Bhatt, Yongqiang Cao, Nicolas Foley, Gregory Francis, Alan Gove, Simon Hong, Piers Howe, Seungwoo Hwang, Frank Kelly, Levin Kuhlmann, Jasmin Leveille, John Marshall, Niall McLoughlin, Steven Olson, Luiz Pessoa, Rajeev Raizada, William Ross, Aaron Seitz, David Somers, Karthik Srinivasan, Guru Swaminathan, Massimiliano Versace, James Williamson, Lonce Wyse, and Arash Yazdanbakhsh. The vision models were complemented by the SACCART, SAC-SPEM, TELOS, and lisTELOS models of the saccadic and smooth pursuit eye movements that occur during visual perception and planning, and invariant object category learning. A parallel but distinct line of work also developed the 3D FORMOTION model of visual motion perception, with its extensions to visually-based navigation and target tracking. See my personal web page sites.bu.edu/steveg for many such archival articles and https://en.wikipedia.org/wiki/Stephen_Grossberg for a list of the names of available models and the areas of biological intelligence to which they contribute.

Visual boundaries are emphatically not just edge detectors. Rather, boundaries can form in response to many different kinds of images and scenes. Boundaries hereby give rise to properties of texture pop-out, 3D shape from texture, figure-ground separation, and visual illusions, among others (Slide 25). This versatility spares our brains from having to use specialized detectors for each of these types of stimuli, only to have to figure out at a later processing stage how to put all the information together. Such specialization cannot, in any case, work in response to natural scenes if only because edges, shading, texture, and figure-ground properties are often overlaid at the same perceptual positions in a scene.

Neon color spreading is one of the visual illusions that provide lots of useful information about the complementary properties of visual boundaries and surfaces (Slide 26). A typical neon-inducing image is constructed of black and blue arcs, where the contrast of the blue arcs relative to the white background is smaller than that of the black arcs. When these arcs are properly arranged, they trigger both the boundary completion and the surface filling-in of a neon color spreading illusion. The boundary completion generates the illusory square that passes through the positions where the blue and black arcs touch. The surface filling-in causes the square to be filled with a bluish hue.

Three properties of boundary completion and surface filling-in are illustrated by neon color spreading (see the bottom of Slide 26). The first two boundary properties are that boundaries are completed between pairs of inducers in an oriented and inward fashion. If outward completion were possible, then a single dot in an image could cause a radial proliferation of boundaries that could seriously obstruct vision. By comparison, the spread of the blue color through the square is generated by small breaks in the blue boundaries where they touch the more contrastive black boundaries. The blue color can then spread in an unoriented manner outward in all directions until it hits the square illusory boundaries. These boundary and surface properties are manifestly complementary: oriented versus unoriented; inward versus outward.

Where do these boundaries and surfaces form? Slide 27 shows that boundaries are completed within several processing stages of the interblob cortical stream from the lateral geniculate nucleus, or LGN, through V1 interblobs, V2 interstripes, and V4. The surfaces are completed in the parallel blob cortical stream processing stages of the V1 blobs, V2 thin stripes, and V4. These are two of the brain’s computationally complementary processing streams (Fig. 2).

Fig. 3 Seeing versus knowing (Slide 29)

What does the third boundary completion property of “insensitive to direction-of-contrast” mean in Fig. 2 (Slide 28)? This has to do with the classical distinction between seeing versus knowing, or seeing versus recognition. For example, in Fig. 3 (Slide 29), the lower left image shows an Ehrenstein Figure that is generated by blue lines pointing toward the center of an imagined disk. One can both see and recognize this disk because its interior is brighter than its background. This brightness difference is a visual illusion that is due to filling in of “brightness buttons” that are generated just beyond each of the line ends, whence this brightness spreads within the illusory circle that is also generated through the line ends.

In contrast, in response to the Offset Grating to the right of the Ehrenstein Figure, a vertical boundary is generated that passes through the line ends of the horizontal blue lines. We can recognize this vertical boundary, but we cannot see it: It is not brighter or darker, or nearer or farther, than the rest of the background. This percept shows that one can consciously recognize objects that one cannot see. There are hundreds of such amodal percepts.

One plausible answer to the question “Why do we see?” is that “We see things to recognize them”. However, we can recognize the vertical boundary that is generated by the Offset Grating without seeing it, so this percept is a counterexample to that hypothesis. This conclusion does not deny that seeing objects often helps us to recognize them, but it shows that there must be a different answer to the question “Why do we see?”

I earlier noted that, due to hierarchical resolution of uncertainty, our brains seem to have created conscious states of seeing so that we can selectively use those perceptual representations upon which to base actions like looking and reaching.

Slide 29 shows that some boundaries are invisible. Slide 30 provides one of several reasons why all boundaries are invisible, at least within the interblob cortical stream that generates boundaries. In particular, consider what happens if you move along the circumference of the gray disk in the right figure of this slide. One passes from gray-to-white, then gray-to-black, then gray-to-white, and so on, all along the circumference. These reversals of relative contrast are often found when an object is seen in front of a textured background.

If our brains only had separate boundaries that compute dark-to-light contrasts (e.g., gray-to-white) or light-to-dark contrasts (e.g., gray-to-black), then each type of boundary would have big holes in it. Brightness and color could spread through these holes during the filling-in process and thereby seriously degrade vision.

Slide 31 shows that boundary computation does begin with oriented local contrast detectors, called simple cells, that individually can respond to either a dark-to-light oriented contrast, or a light-to-dark oriented contrast, but not to both. If boundary processing ended here, then there would be big holes in the resulting boundaries.

Instead, at each position, pairs of like-oriented simple cells that are sensitive to opposite contrast polarities input to cells at the next processing stage that are called complex cells. Each complex cell can respond to both dark-to-light and light-to-dark contrasts at, and close to, its preferred position and orientation. Thus, by the time complex cells respond at the circumference of the gray disk image in Slide 30, they would build a boundary at every position around its circumference.
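For computationally minded readers, this polarity-pooling step can be sketched in a few lines of Python. The sketch below is my illustration, not the published model: a one-dimensional luminance profile and a local difference operator stand in for oriented simple cell receptive fields, and all numbers are arbitrary.

```python
import numpy as np

# 1D luminance profile: a light bar on a dark background, so that its two
# edges have opposite contrast polarities.
luminance = np.array([0.2] * 10 + [0.8] * 10 + [0.2] * 10)

# Local contrast: a crude 1D stand-in for an oriented receptive field.
contrast = np.diff(luminance)

# Simple cells: half-wave rectified, each polarity coded by separate cells.
simple_dark_to_light = np.maximum(contrast, 0.0)   # responds at the left edge
simple_light_to_dark = np.maximum(-contrast, 0.0)  # responds at the right edge

# Complex cell: pools like-oriented simple cells of both polarities, and so
# responds at BOTH edges; it is "insensitive to direction-of-contrast".
complex_cell = simple_dark_to_light + simple_light_to_dark

print(np.flatnonzero(simple_dark_to_light))  # [9]
print(np.flatnonzero(simple_light_to_dark))  # [19]
print(np.flatnonzero(complex_cell))          # [ 9 19]
```

It is this pooling that lets complex cells build an unbroken boundary around the gray disk of Slide 30, whatever the direction of contrast along its circumference.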

It is precisely because they pool signals from both polarities—that is, are insensitive to direction-of-contrast—that complex cells cannot represent visual qualia like differences in relative luminance or color, as noted in Slide 32. Said in another way: All boundaries are invisible! We can experience how salient boundaries may be, but strong boundary salience does not imply a visible difference of qualia.

Despite being invisible, boundaries are extremely useful in helping us to recognize objects, especially objects that are partially occluded in a three-dimensional scene, as in Slide 33. The dashed red lines in Slide 34 illustrate where amodal boundaries of partially occluded objects may be created in order to help to recognize these objects. The three abutting rectangles in the right image of Slide 35 give rise to a compelling 3D percept of a vertical rectangle that is partially occluding, and in front of, a horizontal rectangle. Even though we “know” that the horizontal rectangle continues “behind” the vertical rectangle, we do not see its occluded portion.

This property of figure-ground separation is exploited in all pictorial art, movies, and TV that use a 2D image to generate representations of 3D objects. For example, the face in the famous Mona Lisa painting of Leonardo da Vinci in Slide 35 partially occludes the background of the scene. The occluded collinear background boundaries can nonetheless be amodally completed behind her, at least in the upper part of the painting.

There are several basic reasons why boundary completion and surface filling-in occur. One of these reasons is clarified by inspecting Slide 36, which shows a side view of the interior of an eye. After light passes through the lens of the eye and the retinal fluid that helps to maintain the eye’s shape, it needs to go past the nourishing retinal veins and all the other cell layers in the retina before it hits the photoreceptors. The photoreceptors that are activated by the light then send signals through the retina’s neural layers, whose output axons carry them via the optic nerve to the brain.

Slide 37 shows a top-down view of the retina. It includes the fovea, which is the part of the retina that is capable of high acuity vision. Our eye movements focus the fovea upon objects of interest several times each second. There is also a blind spot that is as big as the fovea. Here is where the axons that carry the retina’s output signals are bundled together to form the optic nerve. No light is registered on the blind spot.

Even the simplest objects may be occluded by retinal veins and the blind spot at multiple positions before they can activate the retina. Slide 38 shows how this can happen to even a simple image like a blue line. This state of affairs raises several questions. For one, why do we not see retinal veins and the blind spot? We do not because our eyes rapidly jiggle in their orbits, even when we think that they are not moving. This jiggle generates transient visual signals from objects in the world. These transients refresh the neural responses to these objects. The veins and blind spot do not, however, generate such transients because they move with the eye. They are thus stabilized images. Hence, they fade. You may have noticed your own retinal veins or blind spot in an ophthalmologist’s or optometrist’s office when he or she moves a small light alongside your eye in order to examine it. That motion can create transients with respect to the borders of the veins and blind spot and make them momentarily visible.

Another important question is this: How do we see even images like a line if they can be occluded in multiple positions? Slide 39 shows that boundary completion completes boundaries within occluded regions and surface filling-in spreads colors and brightnesses from surrounding regions to complete the surface percepts of the occluded regions within these boundaries.

The percepts that are generated across the occluded regions are constructed at higher brain regions. Because they are not provided directly by visual inputs to the retinas, they are, mechanistically speaking, visual illusions. On the other hand, we often cannot tell the difference between the regions on the line that receive their signals directly from the retina, and those that have completed boundaries and filled-in colors and brightnesses. Both kinds of regions look equally “real”. This raises the question in Slide 40: What do we call a visual illusion? I believe that we tend to call illusions those combinations of boundary and surface properties that look unfamiliar or unexpected, as in the case of the invisible vertical boundary that is generated by the Offset Grating in Slide 29.

If boundaries are invisible, then how do we consciously see? Slide 41 suggests that we see the results of surface filling-in after boundaries define the compartments within which lightness and color spread. Slide 42 summarizes the fact that the stimulus that generates the percept called the Craik-O’Brien-Cornsweet Effect has the same background luminance on both sides, with a less luminous cusp abutting a more luminous cusp in the middle of the image (see the red line labeled stimulus). These two regions are surrounded by a rectangular black frame. The percept is, however, one of two uniform gray regions (see the blue line labeled percept). This percept may be explained by the fact that the boundaries which surround the gray regions restrict filling-in to each of them. Then filling-in of the less luminous cusp in the left region leads to the percept of a uniformly darker gray region than does the filling-in of the more luminous cusp in the right region. A more complete explanation, and simulations, of this percept are given in Grossberg and Todorovic [43], as well as of the very different percept that is seen when the black region is replaced by a gray region that matches the gray of the stimulus background. Many other brightness percepts are also explained and simulated within that article.
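The core of this explanation can also be sketched computationally. The toy simulation below is my illustration, not the Grossberg and Todorovic [43] model, and its parameters are arbitrary: filling-in is treated as diffusion of activity whose “permeability” is set to zero wherever a boundary signal exists. A one-dimensional Craik-O’Brien-Cornsweet luminance profile then fills in to two nearly uniform grays of different levels, as in the percept.

```python
import numpy as np

n = 100
stimulus = np.full(n, 0.5)                 # identical background luminances
decay = np.exp(-np.arange(10) / 3.0)
stimulus[40:50] -= 0.2 * decay[::-1]       # darker cusp left of center
stimulus[50:60] += 0.2 * decay             # lighter cusp right of center

# Boundary signals gate diffusion: permeability is ~0 at the central
# contrast edge, so each region fills in separately.
perm = np.ones(n - 1)
perm[49] = 0.0

filled = stimulus.copy()
for _ in range(20000):                     # iterate diffusion to equilibrium
    flux = perm * np.diff(filled)
    filled[:-1] += 0.25 * flux
    filled[1:] -= 0.25 * flux

print(round(float(filled[:50].mean()), 3), round(float(filled[:50].std()), 6))
print(round(float(filled[50:].mean()), 3), round(float(filled[50:].std()), 6))
# Each compartment becomes nearly uniform: the left region fills in to a
# uniformly darker gray than the right region, despite equal backgrounds.
```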

We can now understand the last computationally complementary property of boundary completion and surface filling-in that is shown at the bottom of Slide 43. As I earlier noted, “insensitive to direction-of-contrast” can also be summarized by the statement that “all boundaries are invisible”. “Sensitive to direction-of-contrast” can be recast as “filling-in of visible color and lightness” since filled-in surfaces are what we can consciously see. Slide 44 can now summarize my prediction from 1984 that all boundaries are invisible in the interblob cortical stream, whereas all visible qualia are surface percepts in the blob cortical stream. I know of many confirmatory experiments, but no contradictory ones, up to the present time.

3 Toward a Mechanistic Understanding of the Aesthetic Struggles of Various Painters

We can now begin to apply these ideas to provide a better mechanistic understanding of the aesthetic struggles of various painters. Let us start with Henri Matisse. Slide 46 raises the provocative question: Did artists like Matisse know that all boundaries are invisible? Consider his painting, The Roofs of Collioure, from 1905, to understand a sense in which the answer to this question is Yes. Note that Matisse constructed much of this painting using patches of color to suggest surfaces. Slide 47 provides some quotations from Matisse about his life-long struggle to understand “the eternal conflict between drawing and color”. He wrote that “Instead of drawing an outline and filling in the color…I am drawing directly in color”.

The bottom image in this slide illustrates what this means. The color patches in this painting trigger the formation of amodal boundary webs in the cortical boundary stream. These boundary webs are then projected to the cortical surface stream, where they organize the painting’s color patches into surfaces. These surface colors are what we see in the painting. By not “drawing an outline” to define these surfaces, Matisse ensured that he did not darken these colors. Generating vivid colors in their paintings was one of the goals of the Fauve artistic movement to which some of Matisse’s paintings contributed (Figs. 4 and 5).

Fig. 4 Complementarity! Many invisible boundaries! (Slide 48)

Fig. 5 Continuously induced and sparsely induced surfaces (Slide 49)

Thus, as Slide 48 notes, when discussing The Roofs of Collioure with your friends, you can impress them by saying that this painting illustrates Complementary Computing in art because it generates so many invisible boundary representations to define its colorful surfaces.

Another Matisse painting from 1905, Open Window, Collioure, is illustrated in Slide 49. This painting brilliantly combines surfaces that are created with sparse surface color patches, as well as surfaces that are rendered with continuously applied paint. Both types of surfaces blend together into a single harmonious scene.

Many artists have experienced Matisse’s struggle to be “drawing directly in color”, as noted in Slide 50. Slides 51 and 52 include quotes that summarize the approach to painting by two famous plein air painters who belonged to the Cape Cod school of art, including its founder, Charles Hawthorne, and his most famous student, Henry Hensche. Hawthorne wrote, in part, “Let color make form—do not make form and color it. Forget about drawing…” Hensche expressed his own approach by summarizing the view of the great Impressionist painter, Claude Monet, that “color expressing the light key was the first ingredient in a painting, not drawing…Every form change must be a color change…” Monet himself reduced this perspective to its essentials by writing, as summarized more fully in Slide 53, that “here is a little square of blue, here an oblong of pink…paint it just as it looks to you,…”

Slide 54 further illustrates this perspective using the famous painting Femmes au bord de l’eau of the French pointillist painter, Georges Seurat. Despite the fact that this painting is constructed from little spots, or “points”, of color, it is consciously perceived due to the way in which boundaries complete between regions where feature contrasts change, and colors fill-in within these boundaries to form visible surface percepts. Slides 55 and 56 point out (in blue) that there are both large-scale boundaries that group regions of this painted scene, and small-scale boundaries that surround the individual color patches with which the painting was created. We can see both scales as our attention focuses upon different aspects of the painting.

It is all very well and good to discuss boundary completion and surface filling-in using words and images. But can we really understand these processes well enough to develop rigorous neural models that can process complex scenes? Slides 57–59 illustrate that the answer to this question is emphatically Yes. Indeed, the same brain processes of boundary completion and surface filling-in that enable us to appreciate Impressionist paintings also enable us to process natural images and images that are derived from artificial sensors.

Slides 57 and 58 illustrate this by showing how a Synthetic Aperture Radar, or SAR, image can be transformed by such a neural model into an image that can be easily interpreted by human observers. SAR is the kind of radar that can see through the weather, and is thus very useful in remote sensing and international treaty verification applications where SAR sensors in satellites and other airborne observers can observe activities on the ground even during bad weather conditions. The Input image in the upper left corner of Slide 57 contains five orders of magnitude in the radar return. This huge dynamic range is hard to represent on a PowerPoint slide, and much of the image is darkened relative to the sparse, but very high intensity, pixels in it. The Feature image in the upper right corner of Slide 58 results from a process of “discounting the illuminant”, or compensating for variable intensities or gradients of illumination that could otherwise prevent the extraction of information about object form. This process normalizes the Input image without distorting its relative intensities. Despite this normalization process, the resulting image still exhibits its individual pixels, just as in the painting by Seurat.
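The “discounting the illuminant” step can be illustrated by the steady state of a shunting on-center off-surround network, a network type that recurs throughout these models. The sketch below is schematic rather than the SAR simulation itself: the parameters and the simple box-shaped surround are illustrative assumptions. Scaling all inputs by a factor of 1000 leaves the outputs bounded and preserves the spatial pattern of relative activation.

```python
import numpy as np

def shunting_steady_state(I, A=1.0, B=1.0, surround_size=5):
    """Steady state of a shunting on-center off-surround network:
    dx_i/dt = -A*x_i + (B - x_i)*C_i - x_i*S_i  =>  x_i = B*C_i/(A + C_i + S_i),
    where C_i is the on-center input and S_i the off-surround input."""
    C = I
    kernel = np.ones(surround_size) / surround_size   # broad box surround
    S = np.convolve(I, kernel, mode='same')
    return B * C / (A + C + S)

reflectance = np.array([0.1, 0.3, 0.9, 0.3, 0.1])
dim = shunting_steady_state(reflectance)                # dim illumination
bright = shunting_steady_state(reflectance * 1000.0)    # 1000x brighter

print(np.round(dim, 3))     # small, ordered activities
print(np.round(bright, 3))  # still bounded below B, same relative pattern
```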

The Boundary image in the lower left corner of Slide 58 shows the completed boundaries around and between sets of pixels with similar contrasts. Finally, the Feature image fills-in within the Boundary image. The result is the Surface Filling-In image in the lower right corner of Slide 58. One can here see a road that runs diagonally downward from the middle of the top of the image toward its lower right. One can also see individual posts along this road, the highway that runs beneath it, and the trees and shadows that surround the roads. The pixels in the Input image have here been largely replaced by shaded object forms that human observers can understand.

Slide 59 shows that the filled-in surface representation in Slide 58 is the result of processing the Input image using three different spatial scales: small, medium, and large. The small boundary scale detects local image contrasts best, such as the individual posts on the road. The large boundary scale detects more global features, such as the collinear structure of the road. A separate surface network corresponds to each boundary scale, and fills-in surface brightnesses within the completed boundaries at each of these three boundary scales. The final Surface Filling-in image in Slide 58 is a weighted sum of the three Surface Filling-In images in the bottom row of Slide 59.
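Expressed in code, this final combination step is simply a weighted sum across scales. In the sketch below the three filled-in images are random placeholders, and the weights are illustrative assumptions; the article does not specify the values used in the SAR simulations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the filled-in surface images at each boundary scale; in the
# actual simulations, each results from filling-in within that scale's
# completed boundaries.
surfaces = {'small':  rng.random((64, 64)),
            'medium': rng.random((64, 64)),
            'large':  rng.random((64, 64))}

weights = {'small': 0.2, 'medium': 0.3, 'large': 0.5}  # illustrative weights
final_surface = sum(weights[s] * surfaces[s] for s in surfaces)
```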

4 Neural Models of Boundary Completion by Bipole Cells

The next group of slides explains how these processes work in a non-technical way. To this end, Slide 60 asks how our brains compute boundaries inwardly, and in an oriented fashion, between two or more approximately collinear inducers with similar orientations.

Slide 61 proposes that the cortical cells which complete boundaries obey a property that I have called the bipole property. This name describes the fact that these cells receive signals from nearby cells via receptive fields that have two branches, or poles, on either side of the cell body. Suppose, for example, that a horizontal edge, as in one of the pac men of a Kanizsa square stimulus, activates such a cortical cell (shown in green). It then sends excitatory signals via long-range horizontal connections (in green) to neighboring cells. These signals do not, however, activate these neighboring cells because inhibitory cells (in red) are also activated by the excitatory signals. These inhibitory cells inhibit the cells that the excitatory cells are trying to excite. The excitatory and inhibitory signals are approximately the same size, so the target cell cannot get activated. It is a case of “one-against-one”.

Slide 62 shows the case in which an entire Kanizsa square is the stimulus. Now there are two pac men that are like-oriented and collinear on each side of the stimulus. Consider the pair of pac men at the top of the figure. Each of them can activate a cell whose long-range excitatory connections try to activate intervening cells. As before, they also activate inhibitory interneurons that try to inhibit these target cells. Why, then, does the total inhibition not cancel the total excitation, as before?

This does not happen because the inhibitory interneurons also inhibit each other (see red connections). This recurrent inhibition converts the network of inhibitory interneurons into a recurrent, or feedback, competitive network. I proved in Grossberg [23] that such a network tends to normalize its total activity. Thus, no matter how many inhibitory interneurons get activated, their total output remains approximately the same. The total inhibition to the target bipole cell thus does not summate like the excitatory signals do as more inhibitory cells are activated. This is thus a case of “two-against-one” so that the bipole cell can get activated if two or more approximately like-oriented and collinear neighboring cells send signals to it. This explains why boundary completion occurs inwardly and in an oriented manner from two or more neighboring cells, as noted in Slide 29. Slide 62 also includes, at its upper right corner, a schematic way to represent the longer-range excitatory (in green) and shorter-range inhibitory (in red) effects on a bipole cell’s firing.
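The “two-against-one” logic can be caricatured in a few lines of code. In this sketch, which abstracts away the published shunting equations, each pole’s input passes through a saturating signal function before it summates, and the normalizing recurrent inhibitory network is reduced to a fixed cap on total inhibition; the numbers are arbitrary.

```python
import math

def signal(x):
    """Saturating (sigmoidal) signal function applied to each pole's input."""
    return math.tanh(x)

def bipole_response(left_pole, right_pole, threshold=0.5):
    # Excitatory signals from the two poles summate (each saturates below 1).
    excitation = signal(left_pole) + signal(right_pole)
    # The recurrent inhibitory interneurons are driven by the same signals,
    # but their mutual competition normalizes their TOTAL output, abstracted
    # here as a hard cap of 1.
    inhibition = min(signal(left_pole) + signal(right_pole), 1.0)
    net = excitation - inhibition
    return net if net > threshold else 0.0

print(bipole_response(5.0, 0.0))  # one-against-one: 0.0, however intense
print(bipole_response(2.0, 2.0))  # two-against-one: fires (~0.93)
```

However intensely a single pole is driven, its saturated excitation can never exceed the inhibitory cap, whereas two moderately driven poles can. This is why boundary completion proceeds inwardly, between two or more inducers.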

Do bipole cells exist in our brains? I predicted that they do in an article that I published in 1984. That same year, a famous article was published in Science by von der Heydt et al. [66] that provided experimental support for the prediction in cortical area V2; see Slide 27. Slide 63 summarizes key properties of their neurophysiological data. In particular, either direct excitatory inputs to a bipole cell body, or similarly oriented excitatory inputs to both “poles,” or receptive fields, of a bipole cell, are needed to activate it. Moreover, an input to a receptive field is still effective in activating the cell if it is moved around within this receptive field. If, however, only one pole gets activated, then no matter how intensely this is done, the bipole cell does not fire.

Slide 64 shows additional evidence for this kind of horizontal activation of cells in cortical area V1, which is the cortical area that feeds into V2, and which itself receives inputs from the Lateral Geniculate Nucleus, or LGN; see Slide 27. Both the longer-range excitatory influence (in blue) and the shorter-range inhibitory influence (in red) were found in both psychophysical and neurophysiological experiments by Kapadia et al. [49]. These excitatory effects are, however, of shorter range than they are in V2, and typically modulate, or sensitize, V1 cells to fire more to inputs directly to them, rather than fire them without such direct inputs.

Slide 65 shows some of the anatomical evidence for cells with long-range oriented horizontal connections.

The top left image in Slide 66 shows the oriented bipole cell receptive field that Ennio Mingolla and I used to simulate boundary grouping and completion properties in articles that we published in 1985 (Grossberg and Mingolla [39, 40]). The dot at the center of this image represents the position of the bipole cell body. The lines at either side of the cell body represent how strongly the cell body gets activated by inputs to the cell’s two receptive fields. In particular, the length of each line at every position and orientation represents the relative strength of the connection to the bipole cell body in response to an input with that position and orientation. Note that inputs can be received by the cell body from both collinear and nearly collinear positions and orientations, with the most collinear positions and orientations delivering the largest inputs, other things being equal. The upper right image represents psychophysical data of Field et al. [17] that support bipole cell properties. The two images in the bottom row represent the bipole receptive fields that were used in modeling studies by two sets of other authors.
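A simplified stand-in for such a receptive field can be written down directly. The function below is illustrative only, not the actual Grossberg-Mingolla kernel: it weights an input by its distance from the cell body, by how closely its offset direction lies along the cell’s preferred (here horizontal) axis, and by how closely its orientation matches that axis.

```python
import numpy as np

def bipole_weight(dx, dy, orientation, sigma=5.0, sharpness=4):
    """Connection strength from an input at offset (dx, dy) with the given
    orientation (radians) to a horizontally tuned bipole cell at the origin."""
    d = np.hypot(dx, dy)
    if d == 0:
        return 0.0                        # the two poles exclude the cell body
    direction = np.arctan2(dy, dx)
    radial = np.exp(-d**2 / (2 * sigma**2))           # falls off with distance
    # Even exponents make the two poles symmetric (direction 0 or pi).
    collinear = np.cos(direction) ** sharpness        # offset near the axis
    like_oriented = np.cos(orientation) ** sharpness  # orientation near axis
    return radial * collinear * like_oriented

print(bipole_weight(4, 0, 0.0))        # collinear, like-oriented: strongest
print(bipole_weight(4, 2, np.pi / 8))  # nearly collinear: weaker
print(bipole_weight(0, 4, 0.0))        # perpendicular offset: ~0
```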

5 Boundary Formation by the Laminar Circuits of Visual Cortex

We are now ready to consider some of the main concepts and mechanisms of Laminar Computing which, as Slide 68 notes, is another new paradigm for understanding how our minds work. Laminar Computing tries to clarify why all neocortical circuits are organized into layers of cells, often six characteristic layers in perceptual and cognitive cortices. Said more directly: What do layers have to do with intelligence?

Slide 69 depicts a simplified diagram of the circuits in cortical layer 2/3 that carry out perceptual grouping using long-range, oriented, horizontal excitatory connections, supplemented by short-range disynaptic inhibitory interneurons, in the manner that I already summarized in Slides 61–66. This slide also lists some of the authors and dates of articles that have supported this conception. Slide 70 asks what happens before layer 2/3. In particular, how do inputs reach the grouping layer 2/3?

Slide 71 provides more information about how the oriented local contrast detectors called simple cells, which were mentioned in Slide 31, do their job. Simple cells are the first cortical stage at which cells fire in response to preferred orientations at their preferred positions and spatial scales. Each simple cell can respond to either an oriented dark-to-light contrast or an oriented light-to-dark contrast, but not both. Slide 72 notes that simple cells are not sufficient, as I already noted when discussing Slide 30. Slide 73 reminds us that, as already noted in Slide 31, simple cells of like orientation and position, but opposite contrast polarities, add their output signals at complex cells.

Slide 74 notes that complex cells are also not sufficient because they do not respond adequately at line ends or corners. Indeed, as Slide 75 remarks, multiple processing stages are needed to accomplish another hierarchical resolution of uncertainty. This one compensates for weaknesses in the ability of simple cells to detect oriented contrasts.

Slide 76 illustrates what goes wrong if only simple and complex cells process line ends. At a bar end, these oriented cells can respond at each position, as illustrated by the red lines in the left image. However, they cannot respond at a line end, as illustrated by the gap in the red boundary there. This problem occurs for every choice of simple cell scale. One just needs to choose the width of the line accordingly. Slide 77 asks: Who Cares? Why is this a problem in the first place?

Slide 78 shows that it is, in fact, a very serious problem because color could flow out of every line end during the process of surface filling-in, thereby leaving the scenic representation awash in spurious color.

Slide 79 summarizes the problem that needs to be solved: Somehow the brain needs to create a line end, called an end cut, after the stage where complex cells act. After the end cut forms, color will be contained within the line end. Slide 80 emphasizes that the process which creates end cuts carries out a context-sensitive pattern-to-pattern map, not a pixel-to-pixel map, since it would be impossible, looking just at a pixel with no boundary, to decide if it needs to be part of an end cut, or just left alone because nothing is happening in the scene at that pixel.

Yet another processing stage is needed to carry out this hierarchical resolution of uncertainty. Slide 81 depicts a circuit that contains, in addition to simple and complex cells, a subsequent stage of hypercomplex (or endstopped complex) cells that are capable of generating end cuts. The hypercomplex cells respond in two stages. The first competitive stage is defined by an on-center off-surround, or spatial competition, network. Using this network, each complex cell excites like-oriented hypercomplex cells at its position while inhibiting like-oriented hypercomplex cells at nearby positions. In addition to receiving these excitatory and inhibitory inputs, these hypercomplex cells are also tonically active; that is, they are activated even in the absence of external inputs, due to an internal source of activation.

In the absence of inputs from the first competitive stage, firing of the hypercomplex cells due to their tonic activation is inhibited by the second competitive stage, which is realized by a competition between hypercomplex cells at the same position that are tuned to different orientations. Maximal inhibition is delivered between hypercomplex cells that are preferentially tuned to perpendicular orientations. When all the hypercomplex cells receive only tonic activation, they can inhibit each other equally using this orientational competition.

Slide 82 explains how end cuts are created at the end of a vertical black line on a white background. Near the end of the vertical line, its vertical edges can activate vertical complex cells which, in turn, can activate vertical hypercomplex cells at its position, and inhibit vertical hypercomplex cells at nearby positions, including positions beyond the end of the line. Inhibition of these vertically oriented hypercomplex cells removes their inhibition from other oriented hypercomplex cells at the same positions. The most inhibition is removed from hypercomplex cells that are tuned to perpendicular orientations. When the activities of these cells are disinhibited, their tonic activation can drive them to fire. An end cut can hereby form.
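The two competitive stages can be traced in a one-dimensional sketch along the axis of such a vertical line. All parameter values below are made up for illustration; only the circuit logic follows the text.

```python
import numpy as np

# Positions along the axis of a vertical line occupying positions 0-9:
# vertical complex cells respond along the line but not beyond its end.
n = 20
v_complex = np.zeros(n)
v_complex[:10] = 1.0
h_complex = np.zeros(n)   # the line supplies no horizontal contrasts here

TONIC = 0.5               # internal (tonic) activation of hypercomplex cells

def first_competitive_stage(complex_cells, surround_strength=0.3):
    """On-center off-surround spatial competition within one orientation."""
    surround = surround_strength * np.convolve(
        complex_cells, [1.0, 0.0, 1.0], mode='same')  # inhibit neighbors
    return np.maximum(TONIC + complex_cells - surround, 0.0)

v_hyper = first_competitive_stage(v_complex)
h_hyper = first_competitive_stage(h_complex)

# Second competitive stage: competition across orientations at each position,
# maximal between perpendicular orientations. Equal tonic activities cancel;
# where vertical was inhibited below baseline, horizontal is disinhibited.
v_out = np.maximum(v_hyper - h_hyper, 0.0)
h_out = np.maximum(h_hyper - v_hyper, 0.0)

print(np.round(h_out, 2))
# Horizontal activity appears only at position 10, just beyond the line end:
# the end cut that will keep color from flowing out of the line.
```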

Slide 83 shows the results of a computer simulation of how complex cells (left image) and hypercomplex cells (right image) respond to a line end. The line end is shown in gray in both images. The lengths of the oriented lines are proportional to the responses of the cells at those positions and orientational preferences. The complex cell responses in the left image exhibit strong vertically, and near vertically, oriented responses along the vertical sides of the line. Despite these strong responses along the sides of the line, there are no responses at the bottom of the line. This is due to the elongated shape of oriented simple and complex cells. In the current simulation, the receptive field size is shown by the dark dashed lines.

The hypercomplex cell responses in the right image of Slide 83 show a strong end cut that is perfectly aligned with the bottom of the line end (hyperacuity!) but also generates responses at multiple nearly horizontal orientations (fuzzy orientations). These near-horizontal hypercomplex cell responses result from the near-vertical complex cell responses.

Slides 84–86 illustrate some of the consequences of these end cut properties. In particular, Slide 84 notes that some kinds of printed fonts, such as Times and Times New Roman fonts, build in their own end cuts, in the form of serifs, which are marked in red. Thus, despite the fact that “our brains try to make their own serifs” using end cuts, adding serifs in fonts can facilitate readability. Slide 85 notes that the fuzzy orientations that occur in end cuts allow lines that are not perfectly parallel to nonetheless generate emergent boundaries by cooperation among their end cuts. Finally, Slide 86 notes that the global grouping that forms through line ends may, or may not, go through their preferred perpendicular orientations. In the upper two images, the emergent boundary is perpendicular to all the line ends. In the lower image, it is not. The boundary that ultimately forms is the one that has the most support from all the inducers with which it can group.

Slide 87 reminds us that all of these possibilities are due to the fuzzy receptive fields of individual bipole cells. This state of affairs raises the question: Why are the groupings that form using fuzzy bipole cells not themselves fuzzy, which would cause a significant loss of acuity if it were true? Why, moreover, do bipole cells have such fuzzy receptive fields in the first place?

Slide 88 suggests that a fuzzy band of possible groupings often does form initially (left image), and that this is a good property: If bipole cell receptive fields were too sharply defined, then there would be a close-to-zero probability that a grouping could ever get started. Keep in mind that our brains are made of meat, not silicon. Initial fuzziness is essential to initiate the grouping process using such an imperfect medium. Having gotten a grouping started, then the challenge is to choose the grouping with the most evidence, while suppressing weaker groupings (right image). This is done using another hierarchical resolution of uncertainty.

Slide 89 notes that sharp boundaries emerge from fuzzy bipole cells due to interactions within the larger network of which bipole cells form a part.

The computer simulations that are summarized in Slide 90 illustrate some of the sharp groupings that bipole cells can create in such a network. Images (a), (c), (e), and (g) represent the inputs to such a network. The length of each line in these images is proportional to the size of the input to a cell centered at the middle of the line and with the vertical orientational preference of the line. Thus, every input is composed of a “bar” of vertical features. The inputs differ only in whether or not the bars are aligned in rows, columns, or both. In (a), only the columns are aligned. In (c), both columns and rows are aligned. In (e), only the rows are aligned. And in (g), the rows are aligned and closer together.

Images (b), (d), (f), and (h) depict the steady-state responses of the bipole cells in this network. In (b), vertical boundaries are created between the bars. In (d), vertical and horizontal boundaries are created. In (f), horizontal boundaries are created. And in (h), both horizontal and diagonal boundaries are created, even though there are no diagonal orientations in the inputs. These simulations illustrate that the network is sensitive to the collinearity and orientations of input inducers, and that sharp boundaries can be completed using fuzzy bipole cell receptive fields. The simulation in (h) also shows how emergent diagonals can be created if there is enough evidence for them in the input inducers, just as they are in response to the bottom display in Slide 86. The rows needed to be brought closer together for this to happen so that they fell within the span of the diagonally oriented bipole cell receptive fields.

Slide 91 includes images that induce percepts which illustrate the properties of the simulations in Slide 90. In response to the upper left image of an E that is composed of smaller A’s, the top horizontal boundary of the E groups diagonal orientations of the A boundaries. The top horizontal boundary of the S emerges from the perpendicular line ends of the H’s, whereas the right vertical boundary of the S emerges from collinear grouping of the right sides of the H’s.

These properties have inspired works of art. Slide 92 shows a typography portrait of Sean Williams in which all the facial features and the hair exploit these properties of boundary completion.

Slides 93–98 show how the processes that have already been reviewed can explain the percept of neon color spreading. Slide 94 depicts a neon color spreading image that is composed of black crosses abutting red crosses. In this image, the contrast of the red crosses with respect to the white background is smaller than the contrast of the black crosses with respect to the white background. In response to this image, one of several percepts can occur. One can either perceive red neon color filling local shapes around the individual red crosses, such as diamonds or circles, or one can perceive diagonal streaks of color passing through a collinear array of red crosses.

Slide 95 depicts how neon color can appear to spread beyond a red cross and be contained by the illusory circle that is induced where the black and red regions touch. Let us now see how the first steps in generating a neon percept are caused in the simple-complex-hypercomplex network of Slide 96.

Slide 97 considers what happens where a pair of collinear black and red line ends touch. Vertically oriented complex cells respond along their vertical boundaries. Because the black-to-white contrast is larger than the red-to-white contrast, the complex cells that are along the black line end become more active than those along the red line end. Because of the first competitive stage, the black vertical complex cells inhibit red vertical hypercomplex cells more than conversely near where the two line ends touch. As a result, these red boundaries are inhibited, or at least significantly weakened, thereby causing a hole, or weakening, in them that is called an end gap. Red color can spread outside the red crosses through these end gaps during surface filling-in.

Due to the second competitive stage, the weakening of the red vertical hypercomplex cell activities disinhibits other oriented hypercomplex cells at those positions, especially horizontal hypercomplex cells, thereby creating end cuts, just as in the case of the line end in Slides 82 and 83.

After these end cuts form, the bipole cells that they activate can create an emergent boundary that best interpolates the end cuts, as illustrated by Slide 98. The red color that spreads outside the red crosses is blocked from spreading beyond this circular illusory boundary.
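As a toy illustration of these two competitive stages, and not of the model’s actual dynamics, the following Python fragment uses made-up numbers: the stronger black vertical boundary inhibits the weaker red one at the position where the line ends abut (creating an end gap), and the weakened vertical response then disinhibits the horizontal response at that position (creating an end cut). Both inhibition coefficients are illustrative assumptions.

```python
# Hypothetical boundary strengths where a black line end (high contrast)
# abuts a red line end (low contrast). All numbers are illustrative.
black_vertical = 1.0
red_vertical = 0.4

# First competitive stage: same-orientation inhibition across nearby
# positions. The stronger black boundary suppresses the weaker red one,
# opening an "end gap" through which red color can later fill in.
k1 = 0.8  # assumed inhibition coefficient
red_vertical_after = max(0.0, red_vertical - k1 * black_vertical)

# Second competitive stage: orientations compete at each position, so
# weakening the vertical response disinhibits the perpendicular
# (horizontal) response, creating an "end cut".
k2 = 0.5  # assumed disinhibition coefficient
horizontal_after = k2 * (red_vertical - red_vertical_after)

print(f"red vertical boundary: {red_vertical_after:.2f} (end gap)")
print(f"horizontal end cut:    {horizontal_after:.2f}")
```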

We can now apply these insights to better understand how various paintings look, starting with the paintings of Jo Baer (Slide 99). Slide 100 shows a group of three of Jo Baer’s paintings side-by-side. All of them have a black border. Within this border is a less contrastive border with a specific color: red, green, or blue, from left to right. The percepts show reddish, greenish, and bluish hues spreading throughout the interior of each canvas. How does this percept happen?

The main effect can be explained by the spatial competition of the first competitive stage (Slide 96), followed by surface filling-in. The black-to-white and black-to-red contrasts are larger than the red-to-white contrasts in the leftmost image. As a result, the red-to-white boundary is weakened, so red color can spread through the interior of the canvas. The same holds true for the green and blue contrasts.

A more vivid version of this effect was developed by Baingio Pinna, who calls it the watercolor illusion [56, 57]. In the image in Slide 101, there are four closed regions in which a dark blue wiggly line abuts a light blue wiggly line, which encloses a white interior region. The percept within these regions is one of light blue color filling their interiors. This happens for the same reason as the Jo Baer effects do: the contrast of the dark blue line with respect to both the white background and the light blue line is larger than the contrast of the light blue line with respect to the white background. The effect is made stronger by using corrugated, or wiggly, lines, whose surface area relative to the surrounded white interiors is much larger than straight lines would allow, thereby creating many more positions at which light blue color can flow through the weakened boundaries to fill the white interiors.

Slide 102 calls attention to the fact that the bluish regions also seem to bulge slightly in front of the white backgrounds that surround them. This may be explained as a special case of how cells with multiple receptive field sizes, or spatial scales, influence how we see objects in depth. Slide 103 shows more examples of this using shaded images that create compelling percepts of objects in depth. These techniques are called chiaroscuro and trompe l’oeil. Slide 104 notes that similar effects make many shaded and textured objects in 2D pictures appear to have a 3D rounded shape. I will now explain how responses of receptive fields with multiple sizes can create form-sensitive webs of boundaries that control filling-in of surfaces at multiple depths, thereby leading to these rounded percepts.

Slide 105 describes one factor that helps to explain how this happens. As an object approaches an observer, its image on the retina gets bigger. As a result, other things being equal, a larger retinal image signals a closer object. Slide 106 notes that receptive fields with smaller scales respond better to smaller retinal images, whereas receptive fields with larger scales respond better to larger ones, so that, other things being equal, bigger scales can become associated with nearer depths during years of experience with perception-action cycles.

A big image on the retina is not, however, always due to a nearer object. For example, a very large object far away, and a smaller object nearby, can both generate retinal images of the same size. Both retinal image size and depth from an observer need to work together to disambiguate these different situations. How this “size-disparity correlation” generates more informative depth percepts is explained in Grossberg [27, 28].

Slides 107–113 describe some of the processes that enable an object like a shaded ellipse in a 2D picture to generate a compelling percept of a 3D ellipsoid. Slide 107 notes that, if boundaries were just edge detectors, there would be just a bounding edge of the ellipse (shown in red). Slide 108 shows how the ellipse would then look after filling-in occurs. It would have a uniform gray color after filling-in within the bounding edge, and would look flat. We know, however, from Slide 71 that simple cells are oriented local contrast detectors, not just edge detectors.

Slide 109 notes that, because of the way that simple cells respond to shaded images, detectors of different sizes generate dense form-sensitive boundaries, which I have called “boundary webs” for short, at different positions and depths along the shading gradient. Slides 110–112 show that increasingly large receptive fields are sensitive to broader bands of shading, starting from the bounding edge and working toward the ellipse interior. Other things being equal, the smallest scales signal “far”, larger scales signal “nearer”, and the biggest scales signal “nearest”.
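The following Python sketch illustrates this multiple-scale idea in one dimension, under strong simplifying assumptions: odd-symmetric kernels of several sizes stand in for simple cells, and the widths over which they respond to a shading gradient stand in for the boundary webs of each scale. The kernel shapes and scale values are illustrative, not those of the published simulations.

```python
import numpy as np

def boundary_web_widths(shading, sigmas):
    # For each scale, convolve the 1D shading profile with an
    # odd-symmetric contrast-sensitive kernel of that size, then
    # measure how broad a band of the gradient activates it.
    x = np.arange(len(shading), dtype=float)
    widths = {}
    for sigma in sigmas:
        g = np.exp(-0.5 * ((x - x.mean()) / sigma) ** 2)
        kernel = np.roll(g, int(sigma)) - np.roll(g, -int(sigma))
        kernel /= np.abs(kernel).sum()
        response = np.abs(np.convolve(shading, kernel, mode="same"))
        widths[sigma] = int((response > 0.5 * response.max()).sum())
    return widths

# Shading that brightens gradually away from a dark bounding edge:
shading = np.clip(np.linspace(-0.5, 1.5, 160), 0.0, 1.0)
print(boundary_web_widths(shading, sigmas=[2, 6, 18]))
# Larger scales respond over broader bands of the shading gradient,
# consistent with bigger scales signaling nearer depths.
```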

Fig. 6: Equiluminant light creates less depth in the painting (Slide 118)

Fig. 7: Strongly non-uniform light creates more depth in the painting (Slide 119)

Fig. 8: T-junctions where vertical boundaries occlude horizontal boundaries, or conversely, lead to more depth in the painting (Slide 120)

As noted in Slide 113, the boundary web corresponding to each scale captures the gray shading in the small form-sensitive boundary compartments that it projects to the surface stream, where it regulates how the gray color will fill-in within that scale. We see this pattern of shading as it is distributed across all the scales. Because different scales tend to be associated with different depths, we perceive a shaded percept in depth.

This view of how 3D shape percepts are generated is supported by many computer simulations of human data about visual perception. In particular, it has succeeded in quantitatively simulating psychophysical data about human judgments of depth in shape-from-texture experiments. In Slide 114, although the 2D images of all five disks are composed of spatially discrete black shapes on a white disk, the ones to the left appear to have a rounded shape in depth, whereas those to the right appear increasingly flat. These percepts were quantitatively simulated using multiple-scale boundary webs and the multiple-scale filled-in surface representations that they induce.

Coming back in Slide 115 to the watercolor illusion, we can now explain its bulge in depth as a consequence of a multiple-scale boundary web, albeit one that is generated by just a few abutting wiggly lines of decreasing contrast. The chiaroscuro and trompe l’oeil images in Slide 116 also generate multiple-scale boundary webs but use gradual changes in contrast to induce them, so that more scales can be involved, leading to more gradual and vivid perceived changes in depth.

Slides 117–120 propose why the famous paintings by Claude Monet of the Rouen cathedral at different times of day lead to different conscious percepts. In Fig. 6 (Slide 118), the cathedral was painted at sunset when lighting was almost equiluminant across most of the painting. As a result, color, rather than luminance, differences defined most of the boundaries, which were correspondingly weakened. Fine architectural details were not represented, so that coarser and spatially more uniform boundary webs were created, thereby leading to less perceived depth in the painting.

Figure 7 (Slide 119), in contrast, shows the cathedral in full sunlight that is very non-uniform across the painting, thereby creating strong boundaries due to both luminance and color differences. Due to the increased amount of detail, the boundary webs that form are finer and more non-uniform, leading to a more depthful percept.

Figure 8 (Slide 120) emphasizes another consequence of full sunlight by marking some of the T-junctions that are now clearly visible in the painting, thereby providing additional cues for perceiving relative depth, as in the percept of a partially occluded rectangle shown in red in this slide, and further discussed in Slides 34 and 35.

Let us now consider how these same mechanisms help to explain how quite different combinations of painterly properties are perceived. Let us start with the color field paintings of Jules Olitski (Slide 121). Slide 122 summarizes four of these “spray” paintings, so called because of the method that was used to create them. Slide 123 contrasts the percepts created by these spray paintings with those of Monet and other Impressionists. In the spray paintings, there are no discrete colored units (or at least very few), and no structured color or luminance gradients. Instead, diffuse boundary webs are spread over the entire surface. When they fill in, the resulting surface percepts are of a space filled with a colored fog and a sense of ambiguous depth. The quote from Olitski at the bottom of Slide 123 summarizes his intention to create this kind of effect.

Quite different percepts are seen in paintings of Ross Bleckner (Slide 124). Slide 125 refers the reader to some of his paintings that create self-luminous effects. To explain self-luminous percepts requires a deeper analysis of how we see surface color and brightness. Slide 126 claims that at least two different processes can create these effects: boundary web gradients and lightness anchoring.

Slide 127 presents some examples of how a picture can seem to glow if boundary web gradients exist; that is, if the shading that creates boundary webs varies systematically across space, from darker to lighter. Because the stronger boundaries can inhibit the weaker boundaries more than conversely, brightness can spread out of the inhibited weaker boundaries into regions where it can be trapped. The four images in the upper left corner illustrate how this brightness is trapped within the interior square of the images.

These four images, working from left to right in the top row, and then from left to right in the bottom row, have increasingly steep boundary web gradients. The steepest gradients enable stronger boundaries to more completely inhibit the weaker boundaries near to them, allowing more brightness to flow beyond them. This brightness summates in the interior square, thereby creating an increasingly bright result that, in the final square, appears self-luminous.
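A deliberately simplified Python calculation, with made-up numbers and no claim to reproduce the model’s simulations, conveys the logic: each boundary in the gradient is inhibited by its stronger outer neighbor, and the steeper the gradient, the more completely the inner boundaries collapse, so more brightness leaks through to summate in the interior square.

```python
def interior_brightness(boundary_strengths, source=1.0, k=1.0):
    # Each boundary is inhibited by its stronger outer neighbor; what
    # survives of the boundary blocks a proportional share of the
    # brightness flowing inward. All coefficients are illustrative.
    passed = source
    for outer, inner in zip(boundary_strengths, boundary_strengths[1:]):
        inhibited = max(0.0, inner - k * (outer - inner))
        passed *= 1.0 - min(1.0, inhibited)
    return passed

shallow = [1.0, 0.9, 0.8, 0.7]   # shallow boundary-web gradient
steep = [1.0, 0.7, 0.4, 0.1]     # steep gradient, as in the final square
print(interior_brightness(shallow))  # little brightness reaches interior
print(interior_brightness(steep))    # much more: the square looks brighter
```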

The right column of Slide 127 shows a similar effect in its top row with the example of the double brilliant illusion. The rows beneath that summarize computer simulations using the Anchored Filling-In Lightness Model (aFILM) that I developed with my Ph.D. student, Simon Hong [35]. More will be said about aFILM in the next few slides, since it can explain the brightening effects due to boundary web gradients, as well as those due to lightness anchoring.

A remarkable percept is shown in the left pair of images in the bottom row, where two vases are shown side by side. The rightmost vase looks matte, or dull. A highlight was manually attached to this dull vase to create the vase in the left image. Now the entire vase looks glossy! This can be explained by the fact that the highlight includes luminance gradients that match the shape of the surrounding vase. The boundary web of the highlight can thus be assimilated into the boundary web of the rest of the vase, thereby allowing brightness to spread from the highlight across the vase. Beck and Prazdny [2], who reported this percept, also rotated the highlight and removed its luminance gradients. Both manipulations prevented the rest of the vase from looking glossy, as would be expected from the above explanation, because the brightness could then not flow into the other shape-sensitive boundary webs of the vase.

Slide 128 asks what lightness anchoring is, while Slide 129 notes that we have thus far only considered how discounting the illuminant preserves the relative activities of luminance values, without saturating them, as they are converted into perceived brightnesses. The phenomenon of lightness anchoring shows that more is going on when we perceive brightness.

Lightness anchoring additionally raises an issue that is summarized in Slide 129; namely, how is the full dynamic range of a cell used, not just its relative activities? Another way of saying this is to ask: How do our brains compute what is perceived to be white in a scene?

Slide 130 summarizes one hypothesis about how white is perceived. The great American psychologist, Hans Wallach, suggested that the highest luminance in a scene is perceived as white: the so-called highest luminance as white (HLAW) rule. Slide 131 shows that this rule sometimes works, as in the top row of images. However, the bottom row of images shows that, if there is a very intense light source in a scene, renormalizing it to make the light source white can drive the rest of the scene into darkness.

My Ph.D. student, Simon Hong, and I realized that if one, instead, computes the blurred highest luminance as white (BHLAW), then that problem can be avoided, as shown by the computer simulations in Slide 132.

Slides 133 and 134 illustrate how the BHLAW rule works. Slide 133 shows a cross-section of a luminance profile in green, and the spatial kernel that defines the BHLAW rule in red. In this situation, the width of the luminance step is considerably narrower than that of the blurring kernel. As a result, when this scene is anchored to make the blurred highest luminance white, the maximal brightness of the step is more intense than white. It therefore appears to be self-luminous.

In contrast, as shown in Slide 134, if the luminance step in a scene is at least as wide as the blurring kernel, then when the scene is anchored to make the blurred highest luminance white, the entire luminance of the step is seen as white.
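A minimal one-dimensional Python sketch of the BHLAW idea follows; the Gaussian kernel, its width, and the luminance values are illustrative assumptions rather than the model’s implementation. Anchoring divides the luminance profile by the maximum of its blurred version, so a step that is much narrower than the kernel is rescaled above “white” and appears self-luminous, whereas a sufficiently wide step anchors at white.

```python
import numpy as np

def anchor_bhlaw(luminance, kernel_sigma):
    # Blur the luminance profile, take the blurred maximum as the
    # anchor for "white", and rescale. Anchored values above 1.0
    # appear self-luminous.
    x = np.arange(len(luminance))
    kernel = np.exp(-0.5 * ((x - x.mean()) / kernel_sigma) ** 2)
    kernel /= kernel.sum()
    blurred = np.convolve(luminance, kernel, mode="same")
    return luminance / blurred.max()

profile = np.ones(200) * 0.2
profile[95:105] = 1.0                    # narrow luminance step
print(anchor_bhlaw(profile, 20).max())   # > 1.0: looks self-luminous

profile_wide = np.ones(200) * 0.2
profile_wide[40:160] = 1.0               # step wider than the kernel
print(anchor_bhlaw(profile_wide, 20).max())  # ~1.0: seen as white
```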

Returning now to the two examples of Bleckner’s paintings in Slide 135, we can see that the small bright regions look self-luminous because of lightness anchoring, whereas larger spatial luminance gradients look self-luminous due to the escape of brightness from graded boundary webs.

6 How Do We Consciously See a Painting?

None of the above results would make much sense if we could not consciously see objects in the world, including paintings. Fortunately, there has been considerable progress during the past 40 years in incrementally understanding both how and why, from a deep computational perspective, we become conscious. Slide 138 summarizes a definition of the Hard Problem of Consciousness that expresses these issues. Readers who want to study more details about the Hard Problem than I will summarize here are invited to read my non-technical article about this topic, Grossberg [34], which I published Open Access and also put on my web page sites.bu.edu/steveg. In particular, Slide 138 asks why any physical state is conscious rather than unconscious, and why conscious mental states “light up” in an observer’s brain. Slides 139–141 summarize my hypothesis that our brains “light up” to embody a conscious state when they go into a resonant state. Slide 142 additionally proposes that “all conscious states are resonant states”. As Slide 143 notes, not all brain dynamics are resonant, so consciousness is not just a “whir of information processing”.

Slide 144 provides a non-technical definition of what a resonant state is. Namely, a resonant state is a dynamical state during which neuronal firings across a brain network are amplified and synchronized when they interact via reciprocal excitatory feedback signals during a matching process that occurs between bottom-up and top-down pathways.

Slide 145 summarizes my central claim that conscious states are part of adaptive behavioral capabilities that help us to adapt to a changing world. Conscious seeing, hearing, and feeling help to ensure effective actions of one kind or another. In particular, conscious seeing helps to ensure effective looking and reaching, conscious hearing helps to ensure effective communication and speaking, and conscious feeling helps to ensure effective goal-oriented action. This lecture does not describe the brain machinery that clarifies why evolution may have been driven to discover conscious states. Grossberg [34] does attempt to do this.

In brief, that article argues that evolution was driven to discover conscious states in order to use them to mark perceptual and cognitive representations that are complete, context-sensitive, and stable enough to control effective actions. This link between seeing, knowing, consciousness, and action arises from the fact that our brains use design principles such as complementary computing, hierarchical resolution of uncertainty, and adaptive resonance. In particular, hierarchical resolution of uncertainty shows that multiple processing stages are needed to generate a sufficiently complete, context-sensitive, and stable representation upon which to base a successful action. Using earlier stages of processing could trigger actions that lead to disastrous consequences. Conscious states “light up” the processing stages that compute representations that can control effective actions.

Slides 37–39 already illustrated this problem in the case of visual perception. How, for example, can you look at a part of a scene that is occluded by the blind spot? As summarized in Slide 39, processes like boundary completion and surface filling-in at higher processing stages are needed to overcome these occlusions. Boundary completion and surface filling-in are examples of hierarchical resolution of uncertainty. After a sufficiently complete surface representation is generated, a resonance develops that marks this representation as an adequate one upon which to base looking and reaching.

Slide 146 focuses on this question for the case of seeing and reaching. Slide 147 asks: What is this resonance? It proposes that a surface-shroud resonance “lights up” surface representations that are proposed to occur in prestriate visual cortical area V4. Surface-shroud resonances are predicted to occur between V4 and the posterior parietal cortex, or PPC, where a form-fitting distribution of spatial attention occurs in response to an active surface representation, and begins to resonate with it in the manner that I will explain in Slides 154–157.

Slide 148 proposes that, just as a surface-shroud resonance supports conscious seeing of visual qualia, a feature-category resonance supports conscious recognition of, or knowing about, visual objects and scenes.

How are feature-category resonances formed? Slides 149–153 briefly describe how feature-category resonances are generated using mechanisms and circuits of Adaptive Resonance Theory, or ART. As summarized in Slides 149 and 150, ART models how we learn to attend, recognize, and predict objects and events in a changing world, without being forced to forget things that we already know just as quickly. In other words, ART proposes a detailed mechanistic solution of the stability-plasticity dilemma that is summarized in Slide 149; namely, how can we learn quickly without being forced to forget just as quickly? I am glad to be able to write that ART is currently the most advanced cognitive and neural theory of these competences, with the broadest explanatory and predictive range. Its predictive successes include psychological and neurobiological experiments that have supported all of the main ART predictions.

ART’s explanatory range has also enabled it to shed mechanistic insight on how brain mechanisms may become imbalanced in ways that generate behavioral symptoms of mental disorders that afflict millions of individuals, including Alzheimer’s disease, autism, Fragile X syndrome, schizophrenia, ADHD, visual and auditory neglect, medial temporal amnesia, and problems with slow wave sleep [19, 29, 33, 34, 36, 42].

In addition to clarifying properties of mental disorders, ART has been used in many large-scale applications to engineering and technology that need its learning and recognition properties. Some of these applications are listed in Slide 151, including the use of ART by the Boeing company in a parts design retrieval system that was used to design the Boeing 777.

ART can be used with confidence because its properties of learning, recognition, and prediction have been mathematically proved and demonstrated through extensive computer simulations on benchmark problems in a series of articles with Gail Carpenter during the 1980s and 1990s (e.g., Carpenter [5, 6]; Carpenter et al. [7–14]), including the property that it solves the stability-plasticity dilemma, which is also often called the problem of catastrophic forgetting. Most learning algorithms do experience catastrophic forgetting, including the currently popular Deep Learning algorithm. During learning by such an algorithm, an unpredictable part of previously learned memories can suddenly collapse. In other words, learning in these algorithms is unreliable.

Their learning is also often inexplicable. One cannot verify that even correct predictions have been made for sensible reasons. This is a serious drawback when considering whether to depend upon them for life and death decisions, such as medical decisions. In contrast, the adaptive weights of ART algorithms such as Fuzzy ARTMAP [9] can, at any stage of learning, be represented as Fuzzy IF-THEN rules which provide a transparent explanation of how the algorithm is making its decisions.

How does ART manage to achieve these useful properties? Intuitively, it is because ART models learn expectations about the world that focus attention upon the combinations of features that they expect to be useful. But why do we learn expectations and pay attention? Why are we intentional and attentional beings? Slide 152 notes that top-down attentive feedback encodes learned expectations that dynamically stabilize learning and memory. In other words, learned expectations and attention help us to solve the stability-plasticity dilemma! ART models the neural networks that embody how top-down expectations are learned, and how they enable us to focus our attention upon information that past experience suggests will be informative.

Feature-category resonances are part of this stability-plasticity expectation-attention story. Slide 153 summarizes how a feature-category resonance develops between an attended pattern of features, called a critical feature pattern (depicted in light green), and an active recognition category at the next processing stage. The reciprocal bottom-up and top-down excitatory signals synchronize, amplify, and prolong cell activations. During such a resonance, the adaptive weights, or LTM traces, in both the bottom-up adaptive filters and the top-down expectations can learn to selectively fire the active critical feature pattern and category when a similar input pattern is experienced in the future. It is because such a resonance triggers learning that I have called the theory Adaptive Resonance Theory.
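To make this resonance-gated learning cycle concrete, here is a minimal Python sketch in the style of Fuzzy ART: bottom-up category choice, top-down matching against a vigilance threshold, and learning only when resonance occurs. It omits complement coding and other features of the published algorithms, and the parameter names and values are illustrative.

```python
import numpy as np

class FuzzyART:
    """Minimal Fuzzy ART-style sketch: category choice, vigilance
    matching, and resonance-gated learning of a critical feature
    pattern. Complement coding is omitted for brevity."""
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.w = []  # one adaptive weight vector per learned category

    def train(self, inp):
        # Bottom-up choice: order categories by the choice function.
        order = sorted(range(len(self.w)),
                       key=lambda j: -np.minimum(inp, self.w[j]).sum()
                                     / (self.alpha + self.w[j].sum()))
        for j in order:
            # Top-down matching: does the expectation match well enough?
            match = np.minimum(inp, self.w[j]).sum() / inp.sum()
            if match >= self.rho:  # resonance: learning is gated on
                self.w[j] = (self.beta * np.minimum(inp, self.w[j])
                             + (1 - self.beta) * self.w[j])
                return j
        self.w.append(inp.astype(float).copy())  # mismatch: new category
        return len(self.w) - 1

net = FuzzyART()
print(net.train(np.array([1.0, 0.0, 1.0, 0.0])))  # creates category 0
print(net.train(np.array([1.0, 0.1, 0.9, 0.0])))  # resonates with it
```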

Feature-category resonances help to support conscious recognition of visual objects and scenes, but they do not directly support conscious “seeing”. Slides 154–158 provide some basic information about the surface-shroud resonances that do support conscious seeing. But first, what is an attentional shroud? Slide 155 notes that an attentional shroud is a surface-fitting distribution of spatial attention. Several excellent visual experimentalists had earlier noted that spatial attention tends to fit itself to the surfaces that are attended. I predicted, in addition, how such a shroud enables learning of view-invariant object categories [19]. A view-invariant object category is a recognition category that can be activated by any view of an observed familiar object. I showed how shrouds support learning of such invariant categories by keeping the cells that will become invariant categories active while our eyes explore an object’s various views, thereby driving the category learning process. This insight was later generalized to explain how view-, position-, and size-invariant categories are learned [14–16, 18, 38]. How this learning process is proposed to happen is reviewed in Grossberg [34]. Some of the archival articles that preceded this review were written with various Ph.D. students, postdoctoral fellows, and other faculty. They are listed in Slide 155. Here I focus on related issues.

Slide 156 illustrates a one-dimensional cross-section of a simple scene in which two luminous bars occur, the left one a little more luminous than the right one. Both bars send topographic bottom-up excitatory signals to the spatial attention region, where they trigger a widespread spatial competition for attention.

In addition, as Slide 157 summarizes, the activated spatial attention cells send topographic top-down excitatory signals back to the surfaces that activated them. The totality of these interactions defines a recurrent, or feedback, on-center off-surround network whose cells obey the membrane equations of neurophysiology, also called shunting interactions. I mathematically proved in Grossberg [23]—see also the review in Grossberg [24]—how such a network can contrast-enhance the attentional activities that focus upon the more luminous bar while also inhibiting the attention focused on the less luminous one. Because such a network tends to normalize the total activity across the network, increasing attention to one bar automatically diminishes the attention that is paid to the other bar.

The net effect of these recurrent interactions is a surface-shroud resonance. Due to the top-down excitatory signals, the attended surface appears to have greater contrast, a property that has been reported both psychophysically and neurophysiologically.
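The following Python sketch integrates a two-cell version of such a recurrent shunting network; the faster-than-linear signal function and all parameter values are illustrative assumptions, chosen only to exhibit the contrast-enhancing and activity-bounding properties described above.

```python
import numpy as np

def run(x, I, steps, dt=0.01, A=0.5, B=2.0):
    # Recurrent shunting on-center off-surround dynamics:
    # dx_i/dt = -A*x_i + (B - x_i)*(I_i + f(x_i)) - x_i*sum_{k!=i} f(x_k),
    # with a faster-than-linear signal function f(x) = x**2.
    for _ in range(steps):
        fx = x ** 2
        x = x + dt * (-A * x + (B - x) * (I + fx) - x * (fx.sum() - fx))
    return x

bars = np.array([1.05, 1.00])            # left bar slightly more luminous
x = run(np.zeros(2), bars, steps=300)    # inputs on: both cells activate
x = run(x, np.zeros(2), steps=3000)      # inputs off: recurrent competition
print(x)  # the left cell's small advantage has been contrast-enhanced,
          # while the shunting term (B - x_i) keeps activity bounded
```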

Slide 158 summarizes the claim that an active surface-shroud resonance means that sustained spatial attention focuses on the object surface. The recurrent interactions sustain the attentional focus.

Slide 159 summarizes the critical claim that, in addition to its role in sustaining spatial attention on an object, a surface-shroud resonance supports conscious seeing of the attended object, in particular, a painting, while our eyes explore it. The talk does not summarize the large amount of psychological and neurobiological data that are consistent with this claim, but my article Grossberg [34] does do this.

Slide 160 summarizes the distinct resonances that support knowing versus seeing. A surface-shroud resonance, with the shroud in posterior parietal cortex (PPC), supports conscious seeing, whereas a feature-category resonance, with the category in inferotemporal cortex (IT), supports knowing. We can know about a familiar object when we see it because both resonances can synchronize their activities via shared circuits in prestriate visual cortical areas such as V2 and V4.

This distinction also enables us to understand various clinical data. For example, Slide 161 notes that, if the knowing resonance is damaged, then patients with visual agnosia can nonetheless accurately reach toward an object even if they cannot describe the orientation or other spatial properties of the object toward which they are reaching. This example dramatizes the claim that seeing supports reaching, even when knowing does not occur.

Slide 162 emphasizes dual, but coordinated, functions of PPC in doing this. First, there is the top-down attention from PPC to V4 that focuses sustained spatial attention upon an object as part of a surface-shroud resonance. In addition, there is a bottom-up command from this attentive focus to motor control networks further downstream that carries out an intention to move to the attended object. Attention and intention are both well-known to be parietal cortical functions, and some of the articles that have contributed to this insight are listed on the slide. The theory clarifies why this is so from the perspective of explaining how and why we become conscious of visual qualia.

My final Slide 163 summarizes some of the brain designs that this lecture has used to explain properties of how we consciously see and know things, and how these processes help to guide artists in making visual art. These designs clarify that our brains compute very differently from traditional computers, and from the currently popular algorithm in machine learning and AI called Deep Learning. Adaptive Resonance Theory has also been used in machine learning and AI applications, as Slide 151 has illustrated. ART can thus shed light upon the artistic process, as well as provide algorithms for large-scale applications in engineering and technology that require autonomous adaptive intelligence in response to rapidly changing environments that may be filled with unexpected events. As I have already noted above, ART has also been used to provide mechanistic neural explanations of mental disorders that afflict millions of individuals, such as Alzheimer’s disease, autism, Fragile X syndrome, schizophrenia, ADHD, visual and auditory neglect, medial temporal amnesia, and problems with slow wave sleep. How ART contributes to such an understanding is explained in a series of articles with several collaborators [19, 29, 33, 34, 36, 42]. Deep Learning cannot do any of these things. I therefore welcome artists, as well as scientists and technologists, to further study ART and to help develop its ability to provide new insights and applications in all of these fields.