
1 Introduction

The study of perception has been a major part of my research work, and it is the main topic of this review. I will not extend the discussion to thinking and the neural basis of thinking, because the phenomena related to perception are often confusing enough.

I will show that by doing neurophysiological experiments, and subsequently imaging experiments, one can find some very interesting correlations between neural activities of some sort - there are different types of neural activities - and the perceptual phenomena we want to investigate. However, I will also raise the question as to whether this is really bringing us even a single step forward. Over the last years, I have become more and more convinced that these correlations may lead us into a dead end and that what we need are better theories and not necessarily more data.

In this article I will mainly report on results of single-cell recordings of neural activities (spike potentials) during the performance of binocular rivalry tasks. These measurements have been made in different visual areas of the cortex of monkeys. One of the conclusions that ensues from this is that there is no such thing as a single neuron or even a single area in the brain which is responsible for the interesting suppression effects in the context of binocular rivalry. We will not make significant progress unless we have a better understanding of the activities of neural networks and their interaction.

2 Binocular Rivalry Experiments

We know many cases where a physical stimulus is far above threshold and could be perceived very well, and yet it repeatedly leaves and re-enters our perception. In order to get a better understanding of what is happening in different visual areas, we conducted animal experiments (Logothetis and Schall 1989; Leopold and Logothetis 1996; Sheinberg and Logothetis 1997). Furthermore, we used the paradigm of binocular rivalry which, in general, allows better control of the experimental situation. In this case, two patterns that are obviously very different were presented (Fig. 1). For very interesting physiological reasons these patterns never fuse into some kind of transparency. It seems that already very early in the visual system there are certain assumptions about the world that are literally instantiated in the connectivity and the interactions in the visual areas, and because logically one cannot have two objects at the same time in one place, the system does not permit any kind of perception of transparency here. We perceive the two images in alternation.

Fig. 1

Binocular rivalry. In a simultaneous presentation of two different types of stimuli, only one of them is perceived consciously while the other is suppressed. After a few seconds the conscious perception switches and the other image is suppressed

During the last years we investigated the problem of what happens in the brain when the perception of a stimulus is suppressed. Does it entirely disappear, leaving no representation in the visual areas? Or are there active representations of that particular perception, which do not reach the level of activity that corresponds to conscious perception?

In many experiments, we did not use the free-running rivalry which I just described but we presented one of the objects to one of the eyes first (the so-called dichoptic presentation), and 500-600 ms later we presented the other object to the other eye. The advantage of this dichoptic set-up is that, after the onset of the second object, one is guaranteed to perceive this object. The so-called flash suppression of the first object is aligned with the onset of the second stimulus. This is much more convenient for studying the changes in neural activity (Sheinberg and Logothetis 1997; Wolfe 1984). As far as the physiological mechanisms are concerned, the two situations (free-running rivalry and rivalry with triggered flash suppression) are interchangeable.
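
To make the timing of this paradigm concrete, here is a minimal sketch in Python; the 500-600 ms stimulus-onset asynchrony is the value quoted above, while the function name, the default of 550 ms, and the event labels are purely illustrative assumptions.

```python
# Minimal sketch of the event schedule in one flash-suppression trial.
# The 500-600 ms stimulus-onset asynchrony (SOA) comes from the text;
# the names and the default value are illustrative assumptions.

def flash_suppression_trial(soa_ms: float = 550.0):
    """Return the event schedule (time in ms, description) of one trial."""
    return [
        (0.0, "object A onset, eye 1 (perceived immediately)"),
        (soa_ms, "object B onset, eye 2 (perceived from now on)"),
        (soa_ms, "flash suppression of object A (time-locked to B's onset)"),
    ]

for t, event in flash_suppression_trial():
    print(f"{t:6.1f} ms  {event}")
```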

The kind of perception in binocular rivalry is similar (but not identical) to the satiation experiments in the context of bistable perception that psychologists did for many years. An example is shown in Fig. 2. In the ambiguous version one sees a circle (that is perceived as an ellipse) together with a bar. In roughly half of the cases one perceives the bar as going from left to right into the circle and in the other half the bar seems to be going from right to left. If, however, one is primed for a few seconds with the unambiguous version of the image first (Fig. 2b), and is then presented with the ambiguous version, one sees the bar going in the opposite direction.

Fig. 2

Ambiguous perception. In the ambiguous figure (a), we sometimes perceive the bar as going from right to left through the circle and sometimes the other way round. If we are primed for a few seconds with the unambiguous version (b), we will perceive the bar going in the opposite direction when the ambiguous version is presented

These are some of the phenomena for which we studied the physiological activities. For our experiments we used trained monkeys. I will not describe how these monkeys are trained (see e.g., Logothetis 1999), but you can be absolutely sure that the monkeys are indeed reporting what a human would report. To train these animals appropriately is a whole science in its own right. We are convinced that when these monkeys report perceptual changes, these are really perceptual changes, just like those experienced by human observers. They do not randomly hit the lever. This is clear from the temporal characteristics of the perceptual alternations as well as from the almost identical psychometric functions of humans and monkeys in experiments in which stimulus strength is varied and its effect on alternation rate is examined.

Naturally, correct behavior follows extensive training and the use of many different clever tricks that can serve as telltale signs of the animal's perception. Using all these behavioral tricks, we trained the animals to pull and hold one lever when they see one pattern and to pull and hold the other lever when they see the other pattern. We also trained them to refrain from holding or pushing levers if they see a mixture of the two, which happens for 200-500 ms during the transition time, because we wanted to be as sensitive as possible to the changes in the cell activity.

3 Extracellular Recording in the Visual Cortex

We recorded from the visual cortex, which in monkeys is almost identical to the human visual cortex (Fig. 3, left). The information starts in the retina. Here one has a beautiful map of the stimulus, and it proceeds to the geniculate body where one finds another map which is isomorphic to the first one in a precise mathematical sense. Then there is the optic radiation that goes back to the visual area V1. The main difference in the visual cortex between humans and monkeys is that the monkey fovea is projected more to the front compared to humans. The growth of the frontal lobe during the evolution of the human brain pushed some of the visual association areas and the primary visual cortex backwards. These relations can also be shown nicely in MRI images (Fig. 3, right).

Fig. 3

(left) The optic pathway in the brain of a monkey. From the retina the optic nerves project to the left and right geniculate nuclei. The optic radiation goes back to V1. In the human cortex, the fovea projects further back due to the growth of the frontal lobe. (right) MRI image of the brain of a monkey. The two geniculate bodies of the thalamus light up as well as the areas of the primary visual cortex (V1)

However, the primary visual cortex, or V1, is just the beginning of the story. In Fig. 4a, one recognizes different colors and different names which correspond to the different visual areas. In humans and monkeys about 40-45% of the neocortex is visual. Obviously, a lot of cortical machinery is devoted just to vision. By digitizing the brain and unfolding the image one can generate maps like the one in Fig. 4b which shows the positions of the different areas (Lewis and van Essen 2000). It is possible to tell exactly from which of these areas one is recording.

Fig. 4

(a) The different areas in the visual cortex. (b) By digitizing and unfolding this image one obtains very precise maps of the different visual areas

There are many recordings from the different visual areas during the execution of the tasks I mentioned in the beginning. It is well known that a lot of cells fire even if the subject is under anesthesia or sleeping. We have long known that we are mostly unaware of the activity in the brain that maintains the body in a stable state - one of its evolutionarily most ancient tasks. Our experiments show we are also unaware of much of the neural activity that generates our conscious experiences. The surprising result from electrophysiology is that there are many neurons that continue to be stimulus-selective in conditions in which we have no conscious experiences. The following questions are thus reasonable: Are there active neurons that determine whether you see an object or not? Are the active neurons everywhere or are they concentrated in one area? Is this area controlling everything?

Our laboratory mostly made extracellular recordings. This is important, because it is essentially the methodology that I will put under criticism.

When a stimulus - basically a change in the concentration of neurotransmitters - reaches a neuron it causes a depolarization of the membrane. An electrode outside but close to the neuron senses a negativity, because the positive ions flow into the neuron. Due to Poisson's law, these currents run in loops and, therefore, there will be a positivity far away in the non-activated area. These loop currents generate dipole fields which lead to a voltage difference on the order of a few hundred microvolts that can be measured very precisely. If there were only one neuron, we could deduce the cell activity quite well by measuring these voltage differences. However, most of us happen to have more than one neuron, which makes things very complicated because different dipole fields can influence and even cancel each other.
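
As a rough illustration of how such loop currents produce a dipole-like extracellular field, the following sketch uses the textbook point-source approximation for the potential in a homogeneous medium, phi(r) = I / (4*pi*sigma*|r - r_s|). The conductivity, current, and distances are assumed order-of-magnitude values, not measurements from these experiments.

```python
# Point-source approximation of the extracellular potential generated by a
# current sink (where positive ions enter the cell) and its return-current
# source. All numerical values are assumed, order-of-magnitude numbers.
import numpy as np

SIGMA = 0.3  # extracellular conductivity, S/m (typical order of magnitude)

def dipole_potential(point, sink_pos, source_pos, current):
    """Potential (V) at `point` from a -I sink and a +I return source."""
    d_sink = np.linalg.norm(point - sink_pos)
    d_source = np.linalg.norm(point - source_pos)
    return current / (4 * np.pi * SIGMA) * (1.0 / d_source - 1.0 / d_sink)

sink = np.array([0.0, 0.0, 0.0])          # synaptic current inflow
source = np.array([0.0, 0.0, 300e-6])     # return current ~300 micrometres away
electrode = np.array([50e-6, 0.0, 0.0])   # electrode 50 micrometres from the sink

# Negative and only a few microvolts for a 1 nA current; many synchronously
# active, geometrically aligned neurons are needed to reach the hundreds of
# microvolts mentioned above.
print(dipole_potential(electrode, sink, source, 1e-9))
```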

Basically, the fields depend on the geometry of the cell arrangement. However, the geometry of the cortex, as well as the geometry of the hippocampus, the cerebellum and other areas, helps us very much. As one can see in Fig. 5 (left), the fascicles of the apical dendrites run together from deeper layers to upper layers in very strong bundles (Peters and Kathleen 1994). These bundles generate a so-called open field, where one type of charge is concentrated in one location and the opposite charge in another (Fig. 5, right). This can be measured, and with the appropriate electrode one finds very strong sum potentials (Logothetis et al. 2007).

Fig. 5

Bundles of fascicles of apical dendrites generate the so-called open field

If the electrode is very close to a neuron, one can actually measure something that is called an extracellular spike or extracellular action potential. A simple mathematical transformation of this spike potential allows the precise determination of the intracellular action potential that was measured long ago by biophysicists. This kind of recording is what people, including us, have been doing for many years, basically ignoring all other information - unfortunately.

The electric signal is characterized by time-varying spatial distributions of action potentials superimposed on relatively slowly varying field potentials (Fig. 6a). This signal is turned into a binary function by detecting the spikes and setting the function to 1 at their time of occurrence and to 0 otherwise (Fig. 6b). One then generates a representation of the instantaneous rate by counting the number of spikes produced within time bins on the order of 20, 50 or 100 ms (Fig. 6c).
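
A minimal sketch of this processing chain follows; simple threshold crossing stands in for proper spike sorting (which in practice involves band-pass filtering and waveform classification), and all parameter values are assumptions.

```python
# Sketch of the pipeline in Fig. 6: raw voltage -> binary spike train -> binned rate.
# Threshold crossing stands in for real spike sorting; parameters are assumptions.
import numpy as np

def spikes_and_rate(voltage, fs_hz, threshold, bin_ms=50):
    """voltage: extracellular trace; fs_hz: sampling rate; threshold: detection level."""
    # (b) Binary spike function: 1 at upward threshold crossings, 0 elsewhere.
    above = voltage > threshold
    crossings = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    spikes = np.zeros(voltage.size, dtype=int)
    spikes[crossings] = 1

    # (c) Instantaneous rate: spike count per bin divided by the bin duration.
    bin_samples = int(round(bin_ms * 1e-3 * fs_hz))
    n_bins = spikes.size // bin_samples
    counts = spikes[: n_bins * bin_samples].reshape(n_bins, bin_samples).sum(axis=1)
    rate_hz = counts / (bin_ms * 1e-3)
    return spikes, rate_hz
```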

Fig. 6

(a) Recorded voltage variation as a function of time, (b) binary spike function, (c) spike rate

Figure 7 shows a recording from the area V1 in the striate cortex together with the report of the monkey (Leopold and Logothetis 1996). The data in Fig. 7a correspond to an unambiguous presentation of different stimuli. As one would expect, the tuned cells fire if the stimulus corresponds to the preferred orientation of the cell and they do not fire in the other case. Figure 7b shows the data when the stimulus is presented in a binocular rivalry task. Now the firing of the cells is quite random and independent of the report of the animal.

Fig. 7

Recording from the area V1. The bars underneath the instantaneous firing rate of the neuron show the periods where the animal reports seeing the preferred orientation of the cell. (a) Firing rate during the unambiguous presentation of the stimulus, (b) firing rate during the binocular rivalry task

The situation changes quite drastically if one records from the areas V4 or MT, where V4 represents mostly the color and form of objects, and MT represents information about motion. In this case, the neurons tend to fire about 300-400 ms before the animal reports the preferred orientation. This corresponds roughly to the reaction latency of the animal. So, the firing rate of the neuron is correlated with the bars which indicate what the animal sees (Fig. 8).
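
One simple way to quantify such a correlation is sketched below, under assumed variable names: label each bin of the firing rate with the percept the animal reports a fixed latency later, and compare the mean rate during reported dominance of the preferred stimulus with the mean rate during its suppression. This is a minimal proxy, not the analysis actually used in the cited studies.

```python
# Hedged sketch: compare firing rate during bins in which the preferred stimulus
# is reported dominant vs. suppressed, compensating for an assumed report latency.
import numpy as np

def dominance_vs_suppression(rate_hz, reports_preferred, latency_bins):
    """rate_hz: binned firing rate; reports_preferred: boolean per bin (True while
    the animal reports the cell's preferred stimulus); latency_bins: assumed report
    latency (e.g. 300-400 ms expressed in bins)."""
    # Label each neural bin by the percept reported `latency_bins` later.
    labels = np.concatenate([reports_preferred[latency_bins:],
                             np.zeros(latency_bins, dtype=bool)])
    return rate_hz[labels].mean(), rate_hz[~labels].mean()
```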

Fig. 8

Recording from the visual areas V4 and V5 (MT). The activity of the neurons is correlated with the reports of the monkey. Some of the neurons fire only when, according to the animal's report, the preferred stimulus is hidden

While in the striate cortex about 10-13% of the neurons are active during such stimulation, in the extrastriate cortex about 40-45% of the neurons participate (Leopold and Logothetis 1999). They modulate their activity according to the perceptual changes. But about half of them will fire only when the preferred stimulus is hidden. These results show that there is a representation of both stimuli in the extrastriate cortex. However, only one of them reaches consciousness; the other does not and remains hidden.

One step higher one arrives at the so-called inferior temporal cortex, where people, including ourselves, have described the existence of very complex physiological properties. In this area, cells will only fire if the stimulus corresponds to complex objects - faces, hands, etc. In the recordings, we find cells whose activity is correlated with the report of the monkey (Sheinberg and Logothetis 1997). During the rivalry task (gray area in Fig. 9) one observes a long period where the cell is not firing and then, suddenly, there is an increase of the cell activity; immediately after this increase comes the report of the monkey indicating: "I see the right stimulus," which happens to be the preferred stimulus of the cell.

Fig. 9

Recording from the inferior temporal cortex. The gray area corresponds to the rivalry situation; in the periods before and after, the stimulus is unambiguous

In the inferior temporal cortex over 95% of the neurons participate in this activity. The neurons will fire exclusively when the preferred stimulus is consciously perceived and they will not fire for the hidden stimulus. All the mutually antagonistic interactions - excitatory and inhibitory activations or membrane potentials - happen in the early visual areas, but not at all in the very late visual areas. The inferior temporal cortex is the last station where exclusively visual information is processed in the cortex. After that, all other areas are multimodal.

What do we learn from all that? The most important lesson is that no single area alone is responsible for the suppression. There was a long-lasting psychophysical discussion claiming that there is an area where this suppression occurs (for a history of this discussion see e.g., Blake and Logothetis 2002). It seems to me that, at least from a physiological point of view, we cannot uphold this claim, because the activity is distributed all over the place.

From these results we come to the following conclusions:

  • The vast majority of V1 neurons are active whether or not the stimulus is perceived. That explains a huge number of psychophysical effects which will not be discussed here.

  • The cells in IT follow the sequences of perceptual dominance and suppression.

  • The neurons in the early extrastriate cortex may fire selectively for the dominant or for the suppressed stimulus.

  • In primary and early extrastriate cortices, small changes in the firing of cells may be sufficient to instigate a perceptual shift.

  • The interneuron response-coherence is related to the system’s stability.

Concerning the last point, one can show that the synchronization between cells breaks down in the ambiguous situation, whereas there is considerable synchronization, even in the spontaneous activity, for an unambiguous stimulation.
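
Response coherence can be quantified in several ways (spectral coherence, cross-correlograms); the sketch below uses the simplest possible proxy, the zero-lag correlation of binned spike counts from two simultaneously recorded cells, computed separately for ambiguous and unambiguous epochs. The function name and bin size are assumptions, not the measure used in the original analysis.

```python
# Minimal proxy for response coherence: zero-lag correlation of binned spike counts
# from two simultaneously recorded cells. Bin size and names are assumptions.
import numpy as np

def count_correlation(spikes_a, spikes_b, bin_samples):
    """spikes_a/b: binary spike trains of equal length; bin_samples: samples per bin."""
    n = (spikes_a.size // bin_samples) * bin_samples
    counts_a = spikes_a[:n].reshape(-1, bin_samples).sum(axis=1)
    counts_b = spikes_b[:n].reshape(-1, bin_samples).sum(axis=1)
    return np.corrcoef(counts_a, counts_b)[0, 1]

# Computed separately over unambiguous and ambiguous (rivalry) epochs, the claim in
# the text is that the correlation is substantially weaker in the ambiguous case.
```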

4 Rivalry Related Tasks

As far as the physiology is concerned, all these results seem to be very general, whether one considers binocular rivalry, bistable perception like the Necker cube (Necker 1832), or figure-ground illusions like the vase and the faces (Rubin 1958). I will describe two more examples related to the perception of objects.

The set-up of the first example is again a dichoptic presentation where one eye looks at some object (e.g., a red circle) and the other eye looks at a background of moving dots (Fig. 10), which do not intersect with the area where the object is presented (Wilke et al. 2003). Furthermore, object and background are not presented simultaneously: the object is presented first, and with a small time shift the background is flashed on. During the flashing of the background, the red stimulus sometimes disappears from our conscious perception. In this case, there is no rivalry, because there is no coexistence of forms (the dots of the background do not enter the part of the visual field which is occupied by the object in the other eye).

Fig. 10

In a dichoptic asynchronous presentation, the perception of the object (left) sometimes vanishes with the onset of the background (right)

It turns out that the probability of disappearance of the red object, the so-called mask effect, depends on several parameters: it depends on the density of the moving dots, on the distance the moving dots keep from the central object (the larger this distance, the later the onset of the disappearance and the later the onset of the neural responses), and, finally, on the target-surround asynchrony. It does not depend on the nature of the object; what is going to disappear could be virtually anything: a face, an object, several objects, etc. The probability of disappearance is maximal (almost a hundred percent) if there is a shift of one second between the onset of the first image - the object - and the flashing of the background.

The second example also exhibits interesting relationships between cell activation in the inferior temporal cortex and actually seeing the object. We are all familiar with the situation of searching for our keys: We look a hundred times and probably even gaze at the keys during this search but we do not see them. Then at some point there is an “Aha!” and we find the keys.

This is exactly what the monkey does in the following task. The animal is supposed to find the cat in the picture of Fig. 11; it knows the object. In general, monkeys are very good at detecting embedded objects. In the beginning, the gaze of the monkey has a distance of ten degrees from the embedded object. The monkey looks around and at some stage the distance becomes very small - the object is within the fovea - but there is no cell activity and there is no response of the animal. Then, a few hundred milliseconds after the gaze of the monkey has left the site of the object, there is a sudden increase of cell activity and immediately after that the monkey gives the response that it sees the object. So, even if the gaze of the monkey is right on the object, the cell doesn't fire - not until the monkey sees the object consciously.
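
A sketch of the kind of analysis behind Fig. 11 follows; the names, thresholds, and the foveal criterion are illustrative assumptions. The idea is simply to compute the angular distance between the gaze and the hidden target over time, and to compare the first moment the gaze lands on the target with the first moment the firing rate rises.

```python
# Hedged sketch: when does the gaze first land on the hidden target, and when does
# the cell's rate first rise? Thresholds and names are illustrative assumptions.
import numpy as np

def distance_to_target(gaze_deg, target_deg):
    """gaze_deg: (n, 2) eye position in degrees; target_deg: (2,) target position."""
    return np.linalg.norm(gaze_deg - np.asarray(target_deg), axis=1)

def first_index_below(values, threshold):
    hits = np.flatnonzero(values < threshold)
    return int(hits[0]) if hits.size else None

def first_index_above(values, threshold):
    hits = np.flatnonzero(values > threshold)
    return int(hits[0]) if hits.size else None

# t_on_target = first_index_below(distance_to_target(gaze, target), 1.0)  # ~foveal
# t_cell      = first_index_above(rate_hz, criterion)
# The observation in the text: t_cell (and the behavioural report shortly after it)
# can come well after t_on_target, so looking at the object is not yet seeing it.
```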

Fig. 11

Hidden in the picture is the image of a cat which the monkey is supposed to find. Below one sees the distance of the gaze of the monkey (measured as an angle) from the object. At one point, the gaze is right on top of the object, but only after the cell activity increases does the monkey report seeing the object

5 Things We Learn and Things We Do Not Learn

Once more I raise the question: What do we learn from these data? This is a problem that has occupied me for the last 7-8 years. There is no question that we learn a lot about cell properties. We now have a huge repertoire of results, and with these results we can basically characterize cells, and we even get some ideas about the complexity of the connectivity of the cell. Furthermore, we learn about areas that may or may not be related to particular tasks or to particular stimuli.

We learn something about the percentage of neurons showing certain characteristics. We know that particular areas have a lot of neurons that respond to the color red and other areas have a lot of neurons that respond to movement. Furthermore, we learn something about the effects of mental states on the firing of neurons. People have described effects of attention, short-term memory, and all kinds of things that seem to have an impact on the cell.

The following is a list of things which we do not learn, however:

  • Computational rules. We have almost no idea how the things I mentioned actually happen. On the other hand, we will never really make progress unless we have a better understanding of this. What we do today is some kind of microphrenology: describing correlates of everything one can imagine.

  • The role of recurrence, feedback, excitation and inhibition. We often pretend that the cortex is just there, and when an input comes in, there is some data processing in the cortex and it sends the results somewhere else. But the input, usually from the previous site, the thalamus or another area, is extremely small. Then there is an amplification through positive feedback. It could lead to runaway excitation, but it is very tightly controlled by inhibitory neurons. Only the deviations in this balance between excitation and inhibition create a response (a minimal rate-model sketch of this idea follows the list). To understand these mechanisms is much more difficult than to simply assume that the neuron is just sitting there and does absolutely nothing until the stimulus comes in and then it gets excited.

  • The role of interneurons. Interneurons are very crucial in all these processes but I will not go into detail about this.

  • The cell-network-behavior relationship. This is a very interesting relationship whose relevance is only slowly being accepted by the community. At present, we often try to get rid of the "network." We assume that there is only the cell and the behavior. But this assumption is ridiculous. Instead, we should think of the cell as having a crucial role within the network, which is very interesting to understand, and then the network plays a role in the behavior.
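
To illustrate the second point (recurrence, feedback, excitation and inhibition), here is a minimal, generic excitatory-inhibitory rate model. It is not the circuit of any specific cortical area, and all weights and time constants are assumed values chosen only to show a small input being amplified by recurrent excitation while feedback inhibition prevents runaway activity.

```python
# Generic two-population excitatory-inhibitory rate model (assumed parameters).
# Recurrent excitation alone (W_EE > 1) would run away; feedback inhibition keeps
# the response finite, so a weak input is amplified but remains under control.
import numpy as np

W_EE, W_EI, W_IE, W_II = 2.0, 1.2, 2.0, 1.0   # connection weights (assumed)
TAU_E, TAU_I = 0.020, 0.010                   # time constants in seconds (assumed)

def simulate(input_e, dt=0.001, steps=1000):
    r_e = r_i = 0.0
    for _ in range(steps):
        drive_e = max(W_EE * r_e - W_EI * r_i + input_e, 0.0)
        drive_i = max(W_IE * r_e - W_II * r_i, 0.0)
        r_e += dt / TAU_E * (-r_e + drive_e)
        r_i += dt / TAU_I * (-r_i + drive_i)
    return r_e

# A feed-forward input of 0.1 settles near 0.5: roughly fivefold amplification,
# stable only because inhibition tracks the recurrent excitation.
print(simulate(input_e=0.1))
```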

To make the last point clearer, let me describe what happens when a stimulus is presented. For instance, we presented objects with different cues, and we tried to see whether there is some kind of cue constancy or shape dependence. One observes a widespread activation which is far from being random: in repeated presentations of the same stimulus one always finds the same activated spots. One also observes local activations, but at the same time one finds these activations in many different areas.

We also know that these areas are incredibly interconnected, sometimes in the most confusing way. One area separates the input and sends the information to the next area, and then the next area ignores the separation made by the first area. Obviously the brain does the job, but not in the simple way that we sometimes want it to.

Furthermore, the cortical manifold interacts extensively with different substructures. We know, for example, that the pulvinar may play a critical role in synchronizing different cortical areas that are supposed to respond together, and the claustrum also plays a very important role in the communication between these areas. We have only just started to understand some of the relevant interplay between the different parts of the brain.

6 Conclusion

I have used the example of binocular rivalry and dichoptic presentations to illustrate the complicated interplays between different areas of the brain which lead to the difference between what we see and what we have in mind. Not only is the step from neurons to perception a huge one, but there are all kinds of intermediate levels which we are only beginning to understand.

New methods which we are presently developing may help to collect more data about these intermediate levels. Some of these methods include intracortical recording, physiological and functional cerebral blood flow studies, studies using neuromodulator injections, the investigation of functional neurovascular coupling, microstimulation combined with fMRI, perfusion and hypercapnia studies, spectroscopic imaging, the investigation of structural neurovascular coupling, the investigation of connectivity with MR tracers, and, finally, molecular imaging, i.e., the development of chemical substances that can replace hemodynamic responses with either calcium or voltage imaging. But, as I said in the beginning, in the end what we need are not necessarily more data but a theory and a plausible theoretical context within which data can be better (and more intelligently) interpreted.