Introduction

A classical approach to human cognition proposes that the processes related to perception, high-level cognition (such as decision making), and action, take place in successive and relatively independent stages (e.g., Sternberg, 1969). This approach has been challenged by the view that all mental processes and their underlying brain mechanisms are strongly shaped by the need to serve action (Clark, 1996; Noë, 2004; Barsalou, 2008). In the development of action-based theories of cognition the concept of affordance (Gibson, 1979), for which objects and the environment around us are seen by organisms in terms of the opportunity for the actions they offer, has a key role. Behavioural and brain imaging experiments have shown that the simple sight of an object tends to elicit (internal) motor representations, suggesting that for organisms the very notion of object has fundamental action components (Jeannerod, 1994; Arbib, 1997; Rizzolatti, Fogassi, & Gallese, 1997).

Cognitive psychologists have developed an experimental paradigm to investigate these issues from a behavioral perspective, the affordance compatibility paradigm. In a typical experiment (Tucker & Ellis, 2001; 2004), the participants are requested to respond to visual objects, for example to classify them as natural or artifact, by producing either a precision or power grip action on a custom joystick. Performance is enhanced (with faster responses and fewer errors) whenever the response grip is the same as that afforded by the object, compared to the incompatible cases.

A previous computational model (TRoPICALS, see Table 1 for this and all other acronyms) of these phenomena (Caligiore, Borghi, Parisi, & Baldassarre, 2010a) attempted to capture the key principles underlying compatibility effects (see the original paper for an indication of the effects accounted for). These principles state that: (a) the brain is organised along two broad major neural pathways (Ungerleider & Mishkin, 1982; Milner & Goodale, 1995): a dorsal neural pathway, within which object affordances are translated into potential actions (Rizzolatti, Luppino, & Matelli, 1998), and a ventral neural pathway, where information on context and object categories is elaborated (Grill-Spector & Malach, 2004; Weiner & Grill-Spector, 2012); (b) the prefrontal cortex (PFC), acting within the ventral neural pathway, modulates the selection of actions based on affordances on the basis of the current goals of the agent (Miller & Cohen, 2001; Fuster, 2001); (c) information on the actions afforded by the object and information from PFC on the agent’s goals are sent to a “clearing mechanism”, based on a neural competition (Erlhagen & Schöner, 2002; Cisek, 2007), that generates the reaction times (RTs) of action initiation (a fourth and last principle of the model, related to language, is not considered here as it is outside the scope of this work). When these mechanisms work in an integrated fashion, they explain the RTs found in compatibility effect experiments: when the information on affordances (dorsal pathway) and on goals (PFC in the ventral pathway) agree, the neural competition generates a fast response, whereas when they disagree it generates relatively slow RTs.
The value of the model resides not only in its capacity to account for several compatibility effects but also in the fact that such an account is based on a macro-architecture of the model, and on specific functions of its components, that have been constrained on the basis of data on the broad anatomical organisation and functioning of the brain areas involved (see Caligiore et al., 2010a, on this and other methodological principles used to build the model, together forming the computational embodied neuroscience approach).

Table 1 Acronyms used in the article

The potential of TRoPICALS in explaining compatibility effects is also due to the “embodied nature” of some of its features. In particular, the use of a realistic two-dimensional simulated retina allowed the two neural pathways of the model (dorsal and ventral) to process different aspects of objects, namely object affordances (based on object shape) and object categories (based on the object’s general appearance): this would not have been possible by using an abstract representation of objects (e.g., a symbolic one). Moreover, the necessity of the system to use all available information to finally produce a unique motor behaviour generated the need to have the “clearing mechanism” that ultimately generated the RTs comparable with those of the target experiments (one idea of the embodied cognition perspective is that all information available to the brain needs to funnel into overt actions and this profoundly influences the internal representations and processing of such information within the system, see Cisek, 2007, on this, and also Parisi, Ceccon, & Nolfi, 1990; Nolfi, 2009, on computational perspectives on the effects of embodiment on cognition). To avoid false expectations in the reader, however, it is also important to anticipate that the “level of embodiment” of the system used here (as in the original TRoPICALS) is quite low as it uses only a simple image and a two-degrees-of-freedom motor output. Indeed, the value of this work does not reside in the computational and robotic sophistication of the model, but rather in the bio-constrained system-level account of the target experiments, as is further explained below (see Borghi, Di Ferdinando, & Parisi, 2011, for another model on compatibility effects that does not include constraints on brain-anatomy but has a stronger embodiment).

This new work accomplishes another important step, with respect to the original formulation of TRoPICALS, in understanding the grounding of cognition in the real world. The world is full of objects and features most of which are irrelevant to the agent’s purposes (distractors). The key idea investigated here is that if the internal representation of objects involves various aspects of the affordances they elicit, then the processing of the affordances related to distractors might influence the representation of target objects in complex ways. This is indeed what has been observed in the experiments of Ellis, Tucker, Symes, and Vainio (2007). The authors required their participants to categorize the shape of abstract, three-dimensional target objects, in displays containing two objects, by performing a precision or power grip. The usual target compatibility effect was observed: it was easier to classify the object with a response that was congruent with the grip it afforded. In contrast, the ignored object (the distractor) gave rise to a negative compatibility effect: it was actually harder to respond to the target when the distractor afforded the same grip as the response than in the incompatible cases.

The explanation and modeling of these phenomena are quite challenging as a number of different, and possibly contrasting, pieces of information reach the brain areas that have to process them: (a) the top-down information on the categorisation task to be performed, based on the target; (b) the top-down information on the need to ignore the distractors; (c) the bottom-up information on the target affordances; (d) the bottom-up information on the distractor affordances. What could be the nature of the mechanisms that succeed in merging these pieces of information and that result in the RTs measured in the behavioural experiments? Ellis et al. (2007) suggest that the target-related compatibility effects are produced by the agreement/disagreement of the task response with the target object’s afforded actions (as also proposed by Tucker & Ellis, 2001; 2004), whereas the novel distractor-related effects are a consequence of the need to inhibit the distractor, including inhibition of the actions associated with it. However, the detailed mechanisms that might lead to these effects, and the interplay between the various sources of information, are not clear. Furnishing this account with a model having a macro-architecture that fulfills biological constraints, as done here, renders the explanation even more challenging.

This work presents a version of TRoPICALS where the PFC control of action and the affordance representations have been extended to take into account the information related to the distractors, in particular their top-down and bottom-up effects on action selection (points “b” and “d” mentioned above; note that some other parts of the original model, necessary to account for compatibility effects related to language or to the performance of reaching movements, are not used here as they are not relevant to the effects under investigation).

More specifically the model has been extended as follows. First, the dorsal and ventral pathways of the model are now capable of processing information related to the distractor and to the target in parallel. Second, the PFC control of action is extended to include an inhibitory control along with the original excitatory control: this allows the model to refrain from executing the actions suggested by the distractors. This extension is based on the idea that PFC might play a double role in its top-down guidance of action selection, namely (Knight, Staines, Swick, & Chao, 1999): (a) producing a positive bias that facilitates the triggering of the actions requested by the target and goals; (b) producing a negative bias that inhibits the execution of actions that are suggested by objects but that are not needed based on current context and goals.

With respect to the last point, key empirical evidence that gives important insights on excitatory and inhibitory mechanisms involved in compatibility effects comes from the research on Parkinson’s disease (PD) patients (Lang & Lozano, 1998; Redgrave et al., 2010). PD involves damage to the excitatory and inhibitory mechanisms underlying action selection and execution. These deficits are caused by the loss of dopaminergic cells of the nigrostriatal pathway injecting dopamine into the basal ganglia (BG), in particular into their portions that form loops with the premotor (PMC) and the motor cortex (MC). These loops play a key role in action learning, selection, and preparation and their damage in PD patients has a particularly strong effect on the initiation of voluntary movements. This deficit is attributed to the low activation of the supplementary motor area (SMA), again caused by dopaminergic deficits in this case involving the portions of BG that form loops with this cortical area (Jahanshahi et al., 1995). Indeed, the SMA bridges the PFC (where goals and needs are represented) to the PMC/MC (responsible for action preparation and execution) and so plays a crucial role in generating actions with an internal origin (Nachev, Kennard, & Husain, 2008; Haggard, 2008). In contrast to this difficulty in initiating voluntary actions, PD patients are strongly affected by affordances (Galpin, Tipper, Dick, & Poliakoff, 2010). It has been suggested that such sensitivity to externally evoked actions can help the movements of PD patients by compensating for the effects of the low activation of the SMA (e.g., see Galpin, Tipper, Dick, & Poliakoff, 2010; Oguro et al., 2009). Given that the sites of brain damage in PD patients are known, it is possible to simulate similar damage in our model and furnish empirical predictions on compatibility effects in PD patients.

Summarising the goals of the paper, this work presents an enhanced version of (some components of) the model TRoPICALS that furnishes detailed hypotheses on the possible mechanisms underlying the compatibility effects produced by target objects and distractors. These hypotheses are based on system-level architectural principles constrained by the known macro-anatomy and macro-functions of the relevant areas of the brain. The model also furnishes some detailed predictions on the possible behaviour that PD patients might exhibit in compatibility experiments involving both targets and distractors.

The target psychological experiment and its simulation

Ellis, Tucker, Symes, and Vainio (2007) had their participants select a target object (cued by its colour) from a two-object scene and classify it as a ‘curved’ or ‘straight’ object by pressing a response device with either a precision or power grip. The stimuli consisted of combinations of four abstract, three-dimensional objects: two large objects (cylinder and parallelepiped) and two small objects (sphere and cube). The target and distractor on each trial always differed in terms of their response category (curved or straight).

The simulations aimed at reproducing this experiment, simplifying secondary aspects of it. The simulated participant could see eight different objects drawn from the original experimental set: four blue target objects (small sphere, large cylinder, small cube, large parallelepiped), and four red distractor objects with the same shape as the target objects. The small objects could be grasped with a precision grip, whereas the large objects could be grasped with a power grip. In the simulated experiments, the nervous system of 25 participants was simulated using 25 different instances of the model, obtained with 25 different seeds of a random number generator (hence, different initial connection weights and learning history). After training (described below), the response RTs of the participants were recorded.

Before the experiment, the simulated participant first learned to associate a suitable kind of grip (e.g., a precision one) to each object (e.g., a small sphere). This learning procedure was used to mimic what happens in the life of real participants when they learn to suitably respond to affordances of objects. Note how this is an essential element of the explanation of the compatibility effects presented here as such explanation relies on the hypothesis of the reactivation of internal representations of affordances acquired before the psychological experiment.

Methods

The body of the simulated participants (camera and robotic hand)

The model sent grasp commands to a simulated agent endowed with a human-like hand and visual system (Fig. 1a; see Caligiore et al., 2010a, for more details). The simulated hand had the same parameters as the humanoid robot iCub (http://www.icub.org). The visual system was formed by a simulated “eye” represented by a 630 × 630 pixel RGB camera. The eye was controlled by a hardwired colour-based “focusing reflex” that allowed it to foveate the barycentre of target objects. During the experiments the agent was exposed to a scene showing two objects: a target and a distractor. The target was chosen from four different blue objects: two large objects (cylinder: radius 34 mm, length 70 mm; parallelepiped: base side 60 mm, length 80 mm), and two small objects (sphere: radius 15 mm; cube: side 25 mm); the distractor was chosen from the same objects as the target, but with a different colour (red instead of blue). The image of the object was directly sent to the simulated camera. The hand and the objects were simulated on the basis of a 3D physical engine (NEWTON) whereas the eye was simulated based on a 3D graphic interface (OpenGL).

Fig. 1

a The simulated robotic hand and eye, used to test the model, interacting with a simulated cylinder. The line that goes to the object marks the gaze direction, the other four lines mark the visual field. b Architecture of the modified TRoPICALS model used in this work. The figure highlights the hardwired connections and the connections which are updated with Hebbian or anti-Hebbian covariance learning rules, or with a Kohonen learning rule

The grasping movement was implemented in a minimalistic way using two “virtual fingers” (Iberall & Arbib, 1990). In particular, the model issued only a two-value command to the hand in order to implement a grip. The degrees of freedom (dfs) of the thumb were changed proportionally to the first command value, whereas the dfs of the four remaining fingers were changed proportionally to the second command value. The arm and wrist were kept still as in the target experiment. The grasping signal was encoded by the output 2D neural map (PMC in Fig. 1b). The activation of such neurons represented the desired hand posture in terms of joint angles (equilibrium points, Feldman, 1986). These angles were sent to a proportional derivative controller (PDC) used to mimic, in a simplified way, the spring and damping properties of muscles (see Berthier, Rosenstein, & Barto, 2005; Caligiore, Guglielmelli, Borghi, Parisi, & Baldassarre, 2010b, for similar approaches, and Caligiore et al., 2010a, for the equations and parameters). The PDC generated torques that decreased the difference between the desired joint angles and the actual ones. Gravity had no effect as the fingers moved horizontally.
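As a rough illustration of the proportional derivative control described above, the following Python sketch drives a single joint towards a desired equilibrium angle. The gains `kp` and `kd`, the unit inertia, the time step, and the function names are illustrative assumptions and are not taken from the model; the actual equations and parameters are those reported in Caligiore et al. (2010a).

```python
def pd_step(angle, velocity, desired, kp=20.0, kd=5.0, dt=0.01, inertia=1.0):
    """One Euler step of a PD-controlled joint (all constants are assumed)."""
    # Spring-like term pulls towards the desired angle, damping term opposes motion.
    torque = kp * (desired - angle) - kd * velocity
    velocity += (torque / inertia) * dt
    angle += velocity * dt
    return angle, velocity


def simulate_joint(desired, steps=2000):
    """Integrate the joint from rest and return its final angle."""
    angle, velocity = 0.0, 0.0
    for _ in range(steps):
        angle, velocity = pd_step(angle, velocity, desired)
    return angle


# Two "virtual fingers": one command for the thumb, one for the other fingers.
thumb_angle = simulate_joint(0.3)
fingers_angle = simulate_joint(0.8)
```

With these assumed gains the joint settles on the commanded angle, so different command pairs yield different final hand apertures.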

Despite using only two dfs, the set-up illustrated above is sufficient to produce different actions in terms of different final apertures of the hand. This minimal flexibility of the system allowed it to learn to perform different grips (or, more precisely, “hand apertures”) in correspondence to different sizes of the objects (see below). This allowed the system to perform small and large apertures as needed in the simulation of the target experiments (see Caligiore et al., 2010a, for some examples). This minimal level of sophistication does not aim to compete with the accuracy and richness of other computational models in reproducing grasping actions (e.g., Oztop, Bradley, & Arbib, 2004), but is enough to tackle the target problems discussed in the introduction while at the same time keeping the model as simple as possible. In the following we shall use “power grip” and “precision grip” instead of “large hand aperture” and “small hand aperture”, respectively. This keeps the terms used for the simulations consistent with those used in the experiment with real subjects, which makes the data comparison easier.

The architecture of the model

Figure 1b illustrates the architecture of the model. This is formed by five components each corresponding to a different brain cortical area: the visual cortex (VC; this is formed by three RGB maps of 21 × 63 neurons), the anterior intraparietal area (AIP, located in the parietal cortex, PC; one map of 21 × 63 neurons), the premotor cortex (PMC; one map of 21 × 21 neurons), the ventral occipito-temporal cortex (VOT; one map of 21 × 21 neurons), and the prefrontal cortex (PFC; one map of 21 × 21 neurons). The choice of these components broadly agrees with brain imaging evidence showing which cortical areas are active during the performance of compatibility-effect experiments (Grèzes, Tucker, Armony, Ellis, & Passingham, 2003). The functions played by the components of the model and the biological reasons for assuming them are now considered in detail. Note that all the equations for the implementation of the components and their parameters that are not reported here can be found in Caligiore et al. (2010a). Preliminary ideas about the model presented here were also discussed in Caligiore et al. (2011).

Visual cortex (VC)

Neuroscientific evidence on primates and humans (Van Essen et al., 2001; Grill-Spector & Malach, 2004) shows that VC extracts increasingly abstract information from images in succeeding stages: from simple edges to complex visual features (Hubel, 1988; Vinberg & Grill-Spector, 2008). These processes are important for both the visual ventral neural pathway (e.g., they support object recognition in VOT) and the visual dorsal neural pathway (e.g., object shape and other features contribute, together with somatosensory information, to the extraction of affordances from objects within AIP).

In contrast to the original model, VC processes the image of a target and a distractor at the same time. As before, VC is formed by three maps encoding three colours (red, green, and blue). However, now VC has a central region representing the fovea and its surroundings, and two lateral regions representing the peripheral left and right parts of the retina (this strong simplification is enough for the purposes of this work). The central region is always activated by the target object whereas either one of the peripheral regions is activated by the distractor (the model assumes that the eye always foveates the target on the basis of the focusing reflex illustrated above). The neurons of the central region have an activation which ranges in [0, 1] whereas those of the peripheral regions have an activation which ranges in [0, 0.4] to simulate the lower density of receptors of the peripheral areas of the retina. The three colour maps encode the information about shape and colour of the seen object obtained through three distinct Sobel filters (Sobel & Feldman, 1968) applied to the three colour maps. These processes abstract the edge detection processes performed by the retina and the subsequent early stages of VC.
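The edge-detection step and the reduced activation range of the peripheral regions can be sketched as follows. This is only an illustration under stated assumptions: the normalisation by the peak edge response, the zero-valued borders, and the function names are hypothetical, whereas the Sobel kernels themselves are the standard ones and the ranges [0, 1] and [0, 0.4] are those given above.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(channel):
    """Gradient magnitude of one colour channel (border pixels left at zero)."""
    rows, cols = len(channel), len(channel[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    pixel = channel[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


def vc_region_activation(channel, peripheral=False):
    """Scale edge responses to [0, 1] (fovea) or [0, 0.4] (periphery)."""
    edges = sobel_magnitude(channel)
    peak = max(max(row) for row in edges)
    if peak == 0.0:
        return edges  # uniform image: no edges, no activation
    gain = 0.4 if peripheral else 1.0
    return [[gain * value / peak for value in row] for row in edges]


# A toy 5 x 5 channel with a vertical step edge.
step_image = [[0, 0, 1, 1, 1] for _ in range(5)]
fovea = vc_region_activation(step_image)
periphery = vc_region_activation(step_image, peripheral=True)
```

The same image therefore produces a weaker VC activation when it falls on a peripheral region, which is how the distractor enters the model with less weight than the foveated target.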

Anterior intraparietal area (AIP)

AIP is a key region for the detection of affordances (Fagg & Arbib, 1998; Oztop, Bradley, & Arbib, 2004). In this respect, evidence from monkeys (Rizzolatti, Luppino, & Matelli, 1998; Murata, Gallese, Luppino, Kaseda, & Sakata, 2000) and humans (Culham & Kanwisher, 2001; Simon, Mangin, Cohen, Le Bihan, & Dehaene, 2002) shows that AIP encodes information important for guiding the control of object manipulation (e.g., object shape and size).

In the model, AIP simply encodes the object shape, in that its neurons are activated with the average of the activation of the three corresponding RGB edge-detection neurons of VC. This implies that the model assumes that when the system processes two objects located in different positions at the same time (e.g., target and distractor) such processing activates different areas of AIP (Behrmann, Geng, & Shomstein, 2004). Note that the representation of only shape is a strong simplification with respect to affordance information encoded in AIP. This assumption is however sufficient for the scope of this work (cf. Caligiore et al., 2010a, for further discussions on this).

As in Caligiore et al. (2010a), the activation of AIP neurons is scaled according to the object size using a coefficient equal to 1 for large objects and 3.2 for small ones. This assumption is derived from the evidence that small objects activate a greater number of AIP neurons than large ones (cf. Ehrsson et al., 2000). As shown in pilot experiments, this scaling avoids possible distortions of the RTs due to the number of VC neurons activated by different objects.
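The AIP computation described in this section (average of the three corresponding VC edge-detection neurons, followed by size-dependent scaling) can be sketched as below. The function name and the list-of-lists representation of the maps are assumptions made for illustration; only the coefficients 1 and 3.2 come from the text.

```python
def aip_activation(red, green, blue, small_object):
    """Average three VC edge maps and scale by object size.

    The scaling coefficient is 1 for large objects and 3.2 for small ones.
    """
    scale = 3.2 if small_object else 1.0
    return [
        [scale * (r + g + b) / 3.0 for r, g, b in zip(row_r, row_g, row_b)]
        for row_r, row_g, row_b in zip(red, green, blue)
    ]


# Toy 1 x 2 maps: only the first neuron receives edge input.
red, green, blue = [[0.3, 0.0]], [[0.6, 0.0]], [[0.0, 0.0]]
large = aip_activation(red, green, blue, small_object=False)
small = aip_activation(red, green, blue, small_object=True)
```

Because the three colour maps are averaged, the resulting AIP activation depends on the shape of the object and not on its colour.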

Premotor cortex (PMC)

Experiments on monkeys and humans (Rizzolatti, Luppino, & Matelli, 1998; Rizzolatti et al., 1998; Rizzolatti & Craighero, 2004) show that some PMC neurons (“mirror neurons”) encoding grasping actions fire both when actions are performed and when they are observed. As a part of the system forming loops with the BG (Kandel, Schwartz, & Jessel, 2000; Cisek & Kalaska, 2005; Redgrave, Prescott, & Gurney, 1999), PMC also plays an important role in action selection. For simplicity, here we do not explicitly simulate the BG but implement a neural competition within PMC that abstracts the processes underlying action selection performed by the BG–PMC system.

In the model, PMC encodes the motor commands issued to the robotic hand. The desired hand angles are “read out” from the PMC map as a weighted average of the desired angles of each neuron (encoded by their position within the neural map), with weights of the average corresponding to the activation of the neurons themselves (“population code hypothesis”, Pouget, Dayan, & Zemel, 2000). Importantly, in the model PMC implements the selection of hand postures through a dynamic neural competition process involving leaky neurons connected by reciprocal inhibitory connections. In detail, the PMC is a dynamic field network (Erlhagen & Schöner, 2002) that gets as input the (one-to-one) signals from AIP and the PFC. The leaky PMC neurons have short lateral excitatory connections, which form neural clusters, and long lateral inhibitory connections, which lead to competition between neural clusters. In particular, each neuron sends to every other neuron of the map a connection whose weight equals a Gaussian function of the distance between them (the height α of the Gaussian was set to 1.2 and its standard deviation σ to 0.6) minus a fixed inhibitory value (I = 0.4).
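The lateral connectivity and the population-code readout described above can be written compactly as follows. The values of α, σ and I are those given in the text; the unit in which the distance is measured and the reduction of the readout to a one-dimensional list of preferred angles are assumptions made for this sketch.

```python
import math


def lateral_weight(distance, alpha=1.2, sigma=0.6, inhibition=0.4):
    """Gaussian of the inter-neuron distance minus a fixed inhibitory value."""
    return alpha * math.exp(-distance ** 2 / (2.0 * sigma ** 2)) - inhibition


def read_out(activation, preferred_angles):
    """Population-code readout: activation-weighted mean of preferred angles."""
    total = sum(activation)
    if total == 0.0:
        return None  # no active neuron, hence no command
    return sum(a * p for a, p in zip(activation, preferred_angles)) / total


near = lateral_weight(0.0)   # excitatory at zero distance
far = lateral_weight(10.0)   # approaches -inhibition at long range
command = read_out([3.0, 1.0], [0.0, 1.0])
```

The weight is positive for nearby neurons and tends to −I for distant ones, which is what produces local clusters that compete with one another.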

When one cluster wins a competition it suppresses all other clusters and when a given threshold (set to 0.7) is exceeded, a grasping action is triggered with command values based on the reading out of the map described above. The model simulates RTs as the time needed by at least one neuron of the PMC winner cluster to reach the threshold. In the real experiments RT is measured as the time elapsing between the visual stimulus presentation and the completion of the grip action (cf. Ellis, Tucker, Symes, & Vainio, 2007). However, the real experiments use a customised joystick for which the hand “is already in contact” with the part of the joystick that it has to act on, so the actual duration of the movement is negligible and hence we have not considered it in the model. Also, we did not consider the time needed by the signal from premotor cortex to reach the motor cortex, spinal cord, and muscles as: (a) we wanted to keep the model as simple as possible, so we decided not to simulate these further neural areas; (b) the time needed by the signal for this further propagation is expected to be similar in compatible and incompatible cases; (c) the model aimed to reproduce qualitative differences between compatible and incompatible cases, not quantitative ones.
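To make the threshold-crossing definition of RT concrete, the sketch below reduces the PMC map to two mutually inhibiting leaky units, one per grip. This is a strong simplification of the dynamic field used in the model: the time constant, the inhibition strength, the input values, and the function name are all assumptions, and only the threshold of 0.7 comes from the text.

```python
def reaction_time(input_a, input_b, threshold=0.7, inhibition=0.5,
                  tau=20.0, dt=1.0, max_steps=1000):
    """Steps until either of two mutually inhibiting leaky units crosses threshold."""
    u_a = u_b = 0.0
    for step in range(1, max_steps + 1):
        # Both updates use the activations from the previous step.
        da = (-u_a + input_a - inhibition * max(u_b, 0.0)) * dt / tau
        db = (-u_b + input_b - inhibition * max(u_a, 0.0)) * dt / tau
        u_a, u_b = u_a + da, u_b + db
        if max(u_a, u_b) >= threshold:
            return step
    return None  # no unit reached the threshold


# Assumed drives: the PFC (goal) signal is stronger than the AIP (affordance) one.
pfc_drive, aip_drive = 1.05, 0.45
rt_compatible = reaction_time(pfc_drive + aip_drive, 0.0)  # both support grip A
rt_incompatible = reaction_time(pfc_drive, aip_drive)      # AIP supports grip B
```

In this toy version the compatible case crosses the threshold earlier than the incompatible one, which mirrors the qualitative target compatibility effect; the absolute number of steps has no empirical meaning.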

The PFC-PMC and the AIP-PMC connection weights, which were trained (see below), could achieve a maximum value of 0.35 and 0.15, respectively. This constraint allowed PFC signals to overwhelm the AIP affordance-related signals when necessary (Miller & Cohen, 2001; Caligiore et al., 2010a).

Ventral occipito-temporal cortex (VOT)

The inferior temporal cortex in monkeys (IT), and its homologous VOT in humans, is the highest-level visual processing stage of the ventral neural pathway and plays a key role in visual object recognition (Van Essen et al., 2001; Logothetis, Pauls, & Poggio, 1995; Grill-Spector & Malach, 2004; Vinberg & Grill-Spector, 2008).

In the model, VOT is represented by one self-organising map (“SOM”; Kohonen, 1997). The map receives all-to-all connections from the three RGB maps of the VC. An important assumption of the model is that when the VC-VOT connection weights corresponding to one of the three regions of VC are updated (see below), the corresponding weights of the other two regions are also updated (but those of the peripheral regions are updated with a learning rate that is 40% of that used for the fovea-region connections to reflect their lower density of receptors, cf. Grill-Spector, 2008). This technique (cf. Plunkett & Elman, 1997) is used to ensure a spatially-invariant representation of objects within VOT typical of the high-level visual processing stages of the brain. Note that this assumption also allows VOT to represent two or more different objects at the same time when these are perceived contextually. The SOM map is activated using the same equations and parameters used in the previous version of TRoPICALS.
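The shared update of the VC-VOT weights across the three VC regions can be sketched as follows for a single VOT unit. The base learning rate, the dictionary layout, and the omission of the SOM neighbourhood function are assumptions of this illustration; the 40% factor for the peripheral regions is the one stated above.

```python
def update_shared_weights(region_weights, pattern, base_rate=0.1):
    """Move the weights of one VOT unit towards an input pattern in all regions.

    region_weights maps 'fovea', 'left' and 'right' to weight vectors; the
    peripheral regions learn at 40% of the foveal rate.
    """
    rates = {"fovea": base_rate, "left": 0.4 * base_rate, "right": 0.4 * base_rate}
    return {
        name: [w + rates[name] * (x - w) for w, x in zip(weights, pattern)]
        for name, weights in region_weights.items()
    }


initial = {"fovea": [0.0, 0.0], "left": [0.0, 0.0], "right": [0.0, 0.0]}
updated = update_shared_weights(initial, pattern=[1.0, 0.5])
```

An object seen in one region thus also shapes the weights of the other two regions, which is what lets the same VOT unit respond to that object wherever it appears.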

Prefrontal cortex (PFC)

Primates exhibit very flexible behaviour thanks to their capacity to learn rich repertoires of actions. This, however, also generates the problem of the potential interference between actions as many of them can be executed at each moment. PFC plays a key role in biasing the selection of the actions to be performed at each moment on the basis of the current context and goals (Fuster, 2001; Miller & Cohen, 2001; Deco & Rolls, 2003). Importantly, PFC implements working memory, so it is capable of keeping track of the recent past and use it to make decisions (Fuster, 1997).

In the original version of TRoPICALS, the PFC received information not only about objects (VOT) but also on the task to accomplish, and integrated it on the basis of a Kohonen algorithm. To solve the experimental task considered here, the PFC needs only the information from the visual scene, so it gets information only from VOT. To simulate the working memory properties of the PFC, in the model it is a map of leaky neurons, each activated through a one-to-one connection by the corresponding neuron of VOT. Pilot tests showed that this property of PFC neurons prevents the PFC inhibitory signals from suppressing the signals reaching the PMC via the dorsal pathway before they have an effect on RTs.
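The behaviour of a single leaky PFC neuron driven by its VOT counterpart can be illustrated with a simple leaky integrator. The time constant, the time step, and the input sequence are assumed values chosen only to show the gradual build-up and the persistence of the activation.

```python
def leaky_trace(inputs, tau=10.0, dt=1.0):
    """Activation of one leaky unit driven by a sequence of input values."""
    activation, trace = 0.0, []
    for value in inputs:
        activation += (dt / tau) * (-activation + value)
        trace.append(activation)
    return trace


# VOT input switched on for 30 steps and then removed for 10 steps.
trace = leaky_trace([1.0] * 30 + [0.0] * 10)
```

The activation rises gradually while the input is present and decays slowly after it is removed, so the top-down PFC signal builds up with a delay and retains a short memory of the recent input.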

Learning mechanisms

The model is trained in two learning stages which mimic learning during life and learning during the psychological experiment.

Phase 1. Learning to interact with objects during life

The first phase simulates the participants’ learning to grasp objects during life. During this phase the model acquires: (a) AIP-PMC connections (affordances) within the dorsal pathway; (b) VC-VOT connections (objects’ identity) within the ventral pathway. The training was performed by repeatedly presenting, one by one, the eight objects to the model (trials). For each object presentation we systematically varied the colour (target: blue; distractor: red), and the position in space of the object (central or peripheral positions). At each object presentation, VC performed colour-based edge detection of the object image and AIP performed colour-independent shape detection.

The AIP-PMC all-to-all connection weights were updated to form Hebbian associations between the perceived shape of the object (AIP) and the corresponding hand posture (PMC). To this purpose, the object was set close to the hand palm, the hand dfs were progressively decreased, and the resulting hand angles (averaged for each virtual finger) were used for learning based on a covariance Hebb learning rule (Dayan & Abbott, 2001; Caligiore et al., 2008). This allowed the model to perform a suitable grasp action with the hand in correspondence to the seen object. The maximum value of the weights was set to 0.15. The VC-VOT connection weights were updated on the basis of a Kohonen learning rule (Kohonen, 1997). This allowed the ventral stream to acquire the capacity to categorise objects on the basis of their appearance.
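A generic covariance Hebb update with an upper bound on the weights might look like the sketch below. The learning rate, the use of fixed mean activations as thresholds, and the enforcement of the maximum by hard clipping are assumptions; the exact rule and parameters are those referenced above, and only the bound of 0.15 is taken from the text.

```python
def hebb_covariance(weights, pre, post, pre_mean, post_mean,
                    rate=0.05, w_max=0.15):
    """Covariance Hebb step; weights[i][j] connects pre[j] to post[i]."""
    return [
        [min(w_max, w + rate * (post_i - post_mean) * (pre_j - pre_mean))
         for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


# Toy 2 x 2 example: AIP unit 0 and PMC unit 0 are repeatedly co-active.
weights = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(100):
    weights = hebb_covariance(weights, pre=[1.0, 0.0], post=[1.0, 0.0],
                              pre_mean=0.5, post_mean=0.5)
```

Connections between units whose activations deviate from their means in the same direction grow up to the bound, whereas the others decrease.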

Phase 2. Learning to accomplish the experimental task

The second learning phase mimicked learning to perform the experimental task. This involved repeated interactions (trials) with the objects presented in isolation (either the target or the distractor). At each step of a trial the model perceived the object and this activated the VC, AIP, VOT, and the PFC.

If the perceived object was the target, the PMC was activated so as to perform the grip requested by the psychological tasks (power grip for straight objects, precision grip for spherical objects); this amounts to assuming that the correct grip, dependent on the experimental instructions and apparatus, was performed thanks to memories and processes related to such instructions not explicitly simulated here. In particular, the PFC-PMC connection weights were updated on the basis of the Hebb covariance learning rule mentioned above (the maximum weight value was set to 0.35).

If the perceived object was a distractor, the PMC was activated so as to perform the grip according to the affordance evoked by the object (power grip for a large distractor, precision grip for a small distractor), that is, always in agreement with the AIP signal that had to be inhibited. The PFC-PMC connection weights were updated on the basis of a negative learning coefficient so as to implement an anti-Hebbian covariance learning rule that progressively forms inhibitory connections (Lisman, 1989).
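The effect of a negative learning coefficient can be illustrated on a single PFC-PMC connection. The rate, the mean activations, and in particular the lower bound of −0.35 (chosen here to mirror the maximum excitatory value of 0.35) are assumptions of this sketch and are not stated in the text.

```python
def anti_hebbian_step(weight, pre, post, pre_mean, post_mean,
                      rate=-0.05, w_min=-0.35):
    """Covariance step with a negative coefficient, bounded from below."""
    delta = rate * (post - post_mean) * (pre - pre_mean)
    return max(w_min, weight + delta)


# A PFC unit coding a distractor is repeatedly co-active with the PMC
# cluster coding the grip afforded by that distractor.
w = 0.0
for _ in range(200):
    w = anti_hebbian_step(w, pre=1.0, post=1.0, pre_mean=0.5, post_mean=0.5)
```

Co-activation now drives the weight towards negative values, so the PFC representation of the distractor ends up inhibiting the PMC cluster for the grip that the distractor affords.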

Results

This section reports and discusses the results of the simulations of selecting and responding to a target object with a simultaneously present distractor. The results of the simulations replicate and account for the main results of Ellis, Tucker, Symes, and Vainio (2007). The use of firing-rate neurons, which reproduce the functioning of real neurons with a relatively high level of abstraction (Dayan & Abbott, 2001), entailed the derivation of only qualitative data on RTs (cf. Caligiore et al., 2010a, for a further discussion on this point). In addition, the section also presents two testable predictions on the possible consequences that damage to excitatory and inhibitory mechanisms has on volitional movements in PD patients (cf. Haggard, 2008; Knight, Staines, Swick, & Chao, 1999).

During the experiment, the simulated participants were shown scenes containing the target in a central position and the distractor in one of the two peripheral positions. All data reported below refer to 25 repetitions of the experiment run with different simulated participants having different initial, random connection weights.

Given that the distribution of the simulated data was not normal, we applied a logarithmic transformation to the data [Log 10 (RTs)]. The transformed simulated data were subjected to an analysis of variance (ANOVA) with the factors: target (large vs. small), distractor (large vs. small) and grip (power vs. precision). All main effects and all interactions were significant. The main effect of target (F(1,24) = 35.42, MSe = 0.013, p < 0.0001) was due to the fact that large targets (M = 2.31) were responded to more slowly than small targets (M = 2.22); the effect of distractor (F(1,24) = 20.65, MSe = 0.002, p < 0.001) was due to the fact that processing large distractors (M = 2.25) required less time than processing small distractors (M = 2.28). Both results differ from those found in the experiments being replicated, but reproducing them was not an aim of this study (as suggested by Ellis and colleagues, the device used by the participants to perform the grips was harder to use for a precision grip, simply because of mechanical issues, and this tended to reduce the difference between power and precision responses to precision objects, thus disguising the distractor effect in this case). The main effect of grip reflected the results found with human participants (F(1,24) = 103.34, MSe = 0.008, p < 0.0001), as precision grip responses (M = 2.20) were faster than power grip ones (M = 2.33).
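As a rough illustration of this analysis, the sketch below log-transforms RTs and computes the F statistic for one two-level factor. It assumes a within-participant design, which is consistent with F(1,24) and 25 simulated participants but is not stated explicitly in the text. In that case the F value of a two-level effect equals the squared paired t statistic computed on per-participant contrast scores. The RT values and function names are hypothetical and are not data from the simulations.

```python
import math
from statistics import mean, variance


def log10_rts(rts):
    """Apply the Log10 transformation to a list of reaction times."""
    return [math.log10(rt) for rt in rts]


def within_subject_f(contrast_scores):
    """F statistic for a two-level within-participant effect.

    contrast_scores holds one difference score per participant
    (e.g. mean log RT with power grip minus mean log RT with precision grip).
    Returns (F, (df_effect, df_error)); F equals the squared paired t value.
    """
    n = len(contrast_scores)
    m = mean(contrast_scores)
    s2 = variance(contrast_scores)  # sample variance (n - 1 denominator)
    return n * m * m / s2, (1, n - 1)


# Hypothetical per-participant mean RTs (arbitrary time units), NOT data from the paper.
power = [215.0, 230.0, 208.0, 221.0, 240.0]
precision = [160.0, 171.0, 150.0, 158.0, 182.0]

scores = [a - b for a, b in zip(log10_rts(power), log10_rts(precision))]
f_value, dof = within_subject_f(scores)
print(f"F{dof} = {f_value:.2f}")
```

In a full 2 × 2 × 2 analysis, each participant's contrast score for a main effect would be obtained by averaging the log RTs over the levels of the other two factors before taking the difference.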

The interaction between target and distractor was significant (F(1,24) = 23.27, MSe = 0.003, p < 0.0001). Post hoc Newman-Keuls tests showed that, while with small targets there was no difference between distractors, with large targets RTs were faster with large distractors (M = 2.28) than with small ones (M = 2.35) (Newman-Keuls, p < 0.001).

The interaction between target and grip was significant (F(1,24) = 453.69, MSe = 0.008, p < 0.0001). Post hoc Newman-Keuls tests showed that all the comparisons were significant, apart from the not very interesting comparison between a large target grasped with a precision grip and a small target grasped with a power grip. These results accord with those described by Ellis, Tucker, Symes, and Vainio (2007): responses are faster when the response grip is compatible with the grip afforded by the target, and slower when it is incompatible with it. The advantage of compatible pairs was particularly marked with small targets, which elicited a precision grip.

Post hoc Newman-Keuls tests on the interaction between distractor and grip (F(1,24) = 54.30, MSe = 0.003, p < 0.0001) showed that all comparisons were significant. Interestingly, while with large distractors responses with the power grip (M = 2.34) were significantly slower than those with the precision grip (M = 2.15), with small distractors the precision grip (M = 2.25) was significantly faster than the power grip (M = 2.31). The overall pattern of results is similar to the one found by Ellis, Tucker, Symes, and Vainio (2007) and confirms the presence of a negative compatibility effect.

Neural mechanisms underlying target and distractor effects

The target-related compatibility effects shown in Fig. 2a, b can be accounted for by considering that in the target-incompatible trials the processing of the target by the ventral pathway (VC-VOT-PFC-PMC) evokes an action different from the action evoked by the dorsal pathway (VC-AIP-PMC) (e.g., a precision grip to categorise a large cylinder), thus causing a conflict within the PMC (Fig. 3a). As the PFC-PMC signal is stronger than the AIP-PMC signal, the excitatory bias from PFC wins the competition (e.g., by triggering a precision grip to correctly categorise the large cylinder), but the resulting RTs are relatively long. Indeed, when the PFC and AIP signals activate different neural clusters, the PMC leaky neurons that will eventually win the competition charge more slowly, so they take longer to reach the threshold required to trigger the action. Instead, in the target-compatible trials (Fig. 3b) the signals from PFC and AIP match and converge onto the same action represented by the neurons within PMC; these neurons charge rapidly and reach the action-triggering threshold, and so the RTs are relatively fast.
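The mechanism described above can be illustrated with a deliberately reduced sketch in which the PMC map is replaced by two leaky units, one for the requested grip and one for the alternative grip, that inhibit each other. All parameter values (input strengths, time constant, inhibition strength, threshold) and the function name are illustrative assumptions, not those of TRoPICALS; the only constraint taken from the text is that the PFC bias is stronger than the AIP bias.

```python
def reaction_time(input_requested, input_other, tau=100.0, dt=1.0,
                  beta=1.0, threshold=0.8, max_steps=5000):
    """Time for the 'requested grip' unit to reach threshold.

    Two leaky units with mutual inhibition (strength beta), integrated
    with the Euler method; activations are rectified at zero.
    Returns the time in the units of dt, or None if the threshold is never reached.
    """
    u_req, u_oth = 0.0, 0.0
    for step in range(1, max_steps + 1):
        # Compute both derivatives from the current state before updating.
        du_req = (-u_req + input_requested - beta * u_oth) / tau
        du_oth = (-u_oth + input_other - beta * u_req) / tau
        u_req = max(0.0, u_req + dt * du_req)
        u_oth = max(0.0, u_oth + dt * du_oth)
        if u_req >= threshold:
            return step * dt
    return None


pfc_bias, aip_bias = 1.0, 0.6  # hypothetical strengths, with PFC > AIP

# Compatible trial: PFC and AIP signals converge on the same unit.
rt_compatible = reaction_time(pfc_bias + aip_bias, 0.0)
# Incompatible trial: the AIP signal drives the competing unit.
rt_incompatible = reaction_time(pfc_bias, aip_bias)
print(rt_compatible, rt_incompatible)
```

Under these assumptions the requested unit wins in both cases, but it reaches the threshold later when the AIP signal feeds its competitor, which is the qualitative pattern of the target-related compatibility effect.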

Fig. 2

Average reaction times (y-axis) versus kind of grip (x-axis). a, c Data from real participants in the experiments of Ellis, Tucker, Symes and Vainio (2007) (reproduced with permission). b, d Data produced by the model. a, b Data relative to the target-based compatibility effects. c, d Data related to the distractor-based compatibility effects

Fig. 3

a-b Neural mechanism underlying target-related compatibility effects. a Example of PMC activation in the case of incompatibility: the signals from PFC and AIP generate two neuron clusters that compete until the PFC cluster suppresses the other and starts the action corresponding to it. b Activation of the PMC in the case of compatibility: the biases from the PFC and the AIP overlap and cause only one cluster of neurons to form and generate the action to execute. Activations after 70, 100, and 300 ms. c Neural mechanism underlying distractor-related compatibility effects: average signal (projected on one dimension of the map) received by PMC neurons in response to a scene requiring a power grip to categorise as “straight” a small cube-target, and to inhibit the automatic response elicited by a large cylinder-distractor

The distractor effects shown in Fig. 2c, d can be explained by noting that the processing of the distractor by the ventral pathway (VC-VOT-PFC-PMC) always sends signals to the neurons representing the same action evoked by that object via the dorsal pathway (VC-AIP-PMC), but such signals travel along inhibitory connections (indeed, these connections are developed by the participants precisely to inhibit the affordances when these should not lead to an action execution). When the action requested by the experiment is the same action evoked by the distractor, the inhibition from the distractor tends to inhibit such action and this results in longer reaction times.

As an example, consider the case reported in Fig. 3c related to a distractor-compatible trial where the action requested by the experiment is the same action evoked by the distractor: a power grip to categorise a straight target object (a small cube) with a large distractor (cylinder). In this case, the inhibition caused by the distractor via the ventral pathway fully inhibits (as it is larger) the affordance-related activation caused by the same distractor along the dorsal pathway. However, such inhibition also partly inhibits the target-related activation travelling along the same ventral pathway and so slows down the production of the action requested by the psychological experiment (also note the excitation produced via the dorsal pathway by the precision-grip affordance of the target, which is incompatible with the power-grip action requested by the same target to accomplish the task).
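The example above can be turned into a small worked calculation. The sketch assumes a single leaky unit driven by a constant net input I, for which the time to reach a threshold θ has the closed form t = −τ ln(1 − θ/I). The unit stands for the PMC neurons coding the requested grip. Lateral competition and the dorsal affordance of the target are ignored for simplicity, and all numerical values and names are hypothetical; the only constraint taken from the text is that the distractor-related inhibition is larger than the distractor-related affordance signal.

```python
import math


def time_to_threshold(net_input, threshold=0.6, tau=100.0):
    """Time for a leaky unit with constant input to reach the threshold.

    Solves u(t) = net_input * (1 - exp(-t / tau)) = threshold for t.
    """
    if net_input <= threshold:
        raise ValueError("net input too weak: the threshold is never reached")
    return -tau * math.log(1.0 - threshold / net_input)


target_bias = 1.0             # ventral (PFC) excitation for the requested grip
distractor_affordance = 0.30  # dorsal (AIP) excitation evoked by the distractor
distractor_inhibition = 0.45  # ventral inhibition, larger than the affordance

# Distractor-compatible trial: affordance and inhibition both fall on the requested grip.
net_compatible = target_bias + distractor_affordance - distractor_inhibition
# Distractor-incompatible trial: both fall on the other grip, leaving the requested one untouched.
net_incompatible = target_bias

rt_distractor_compatible = time_to_threshold(net_compatible)
rt_distractor_incompatible = time_to_threshold(net_incompatible)
print(round(rt_distractor_compatible, 1), round(rt_distractor_incompatible, 1))
```

Because the inhibition exceeds the affordance signal, the net input to the requested grip is lower in the distractor-compatible case, so the threshold is reached later; this is the negative compatibility effect in its simplest form.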

Predictions of the model: compatibility effects in Parkinson’s disease (PD) patients with damage in the volitional movements circuits and in the action selection circuits

We used the model to simulate the impairments that the dopamine deficit of PD causes in the loops formed by the BG with the SMA, the cortical “bridge” which allows the PFC to exert voluntary executive control on the PMC, and also in the BG loops that allow the PMC (and motor cortex) to perform action selection. Assuming that the first type of damage renders both the PFC-PMC excitatory and inhibitory biases less effective (Jahanshahi et al., 1995), we reproduced the impairments in the model by reducing the maximum absolute value achievable by the connection weights of the PFC-PMC pathway (from 0.35 to 0.25). Assuming that the second type of damage renders the action selection process of the BG-PMC loops less effective, in particular that the lower dopaminergic levels imply a less effective disinhibitory mechanism within the BG (Lang & Lozano, 1998; Redgrave et al., 2010), we simulated this damage by reducing the strength of the excitatory signals which fuel the dynamical competition within PMC (the height α of the Gaussian function was reduced from 1.2 to 0.5). The training processes used for the intact model were also used with the lesioned models.
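The four conditions that result from these two manipulations can be written down compactly. The sketch below only encodes the parameter values quoted in the text (maximum absolute PFC-PMC weight 0.35 or 0.25, Gaussian height α 1.2 or 0.5); the class, the function names, the Gaussian form, and its width sigma are hypothetical and are not taken from the model's implementation.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelParams:
    w_max: float = 0.35  # maximum absolute PFC-PMC weight (intact value from the text)
    alpha: float = 1.2   # height of the Gaussian excitatory signal (intact value from the text)


def lesion_volitional(p):
    """Damage to the PFC-SMA-PMC pathway: weaker excitatory and inhibitory biases."""
    return replace(p, w_max=0.25)


def lesion_selection(p):
    """Damage to the BG-PMC loops: weaker excitation fuelling the PMC competition."""
    return replace(p, alpha=0.5)


def clip_weight(w, w_max):
    """Bound a PFC-PMC weight to [-w_max, +w_max]."""
    return max(-w_max, min(w_max, w))


def gaussian_input(distance, alpha, sigma=1.0):
    """Excitatory signal at a given distance from the Gaussian centre (sigma is assumed)."""
    return alpha * math.exp(-distance ** 2 / (2.0 * sigma ** 2))


intact = ModelParams()
conditions = {
    "intact": intact,
    "volitional only": lesion_volitional(intact),
    "selection only": lesion_selection(intact),
    "both": lesion_selection(lesion_volitional(intact)),
}
for name, params in conditions.items():
    print(name, params, gaussian_input(0.0, params.alpha))
```

Keeping the lesions as pure functions over an immutable parameter set makes it straightforward to train and test each condition with the same procedure, as was done for the intact and lesioned models.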

Figure 4a and d show target- and distractor-related compatibility effects exhibited by PD patients simulated by TRoPICALS by implementing both impairments described above, or either one of the two impairments. The data related to the double lesion condition can be considered to represent the condition of real PD patients. The data related to the single lesions could not possibly be obtained with real patients and are available only because focused lesions can be implemented in the model. These tests are important as they allow the isolation of specific aspects of complex diseases, such as PD, and of their effects on observed behaviour. The data reported in the figures and analysed below are averages obtained by repeating the experiment with twenty-five different simulated participants for each lesioning condition.

Fig. 4

Average reaction times (y-axis) versus kind of grip (x-axis) for different kinds of PD damage simulated with the model. a-c Compatibility effects related to the target objects. d-f Negative compatibility effects related to the distractors. a, d Compatibility effects related to a fully lesioned model reproducing two types of PD damage: that related to the volitional PFC-SMA-PMC neural pathway, and that related to the action-selection BG-PMC circuit. b, e Behaviour of the model with only the damage to the volitional PFC-SMA-PMC neural pathway. c, f Behaviour of the model with only the damage to the action-selection BG-PMC circuit

As in the baseline simulation, we performed an ANOVA on the transformed data [Log 10 (RTs)] with the factors: target (large vs. small), distractor (large vs. small), and grip (power vs. precision). When both circuits were lesioned, the effect of grip was the only significant main effect (F(1,24) = 99.63, MSe = 0.046, p < 0.0001): RTs were slower with power (M = 2.81) than with precision grips (M = 2.50). The interaction between target and grip was significant (F(1,24) = 366.54, MSe = 0.044, p < 0.0001), indicating that the compatibility effect was preserved. The interaction between distractor and grip was significant as well (F(1,24) = 7.75, MSe = 0.022, p < 0.05), due to the fact that with the power grip large distractors (M = 2.83) were processed more slowly than small distractors (M = 2.77); with the precision grip, instead, the difference between distractors did not reach significance. Thus we found only a partial negative compatibility effect.

In the single lesioning condition in which only the BG-SMA circuit was damaged, the main effects of target, distractor and grip were preserved. Large targets (M = 2.39) were processed more slowly than small ones (M = 2.29) (F(1,24) = 29.31, MSe = 0.02, p < 0.0001); large distractors (M = 2.31) were processed faster than small ones (M = 2.37) (F(1,24) = 53.46, MSe = 0.03, p < 0.0001); and the power grip (M = 2.41) was slower than the precision grip (M = 2.27) (F(1,24) = 58.57, MSe = 0.016, p < 0.0001). In addition, the interaction between target and distractor was significant (F(1,24) = 7.74, MSe = 0.004, p < 0.05), due to the fact that with small targets, the disadvantage of the precision grip over the power one was more marked than with large targets. Finally, we found a significant interaction between target and grip (F(1,24) = 1,086.29, MSe = 0.005, p < 0.0001), indicating that the compatibility effect was preserved.

In the ANOVA applied to the transformed data obtained by lesioning only the BG-PMC competitive mechanism, all main effects and interactions were significant. Large targets (M = 2.51) were responded to more slowly than small targets (M = 2.40) (F(1,24) = 10.39, MSe = 0.049, p < 0.01), large distractors (M = 2.43) were processed faster than small distractors (M = 2.48) (F(1,24) = 17.12, MSe = 0.005, p < 0.001), and the power grip (M = 2.54) was slower than the precision grip (M = 2.37) (F(1,24) = 41.82, MSe = 0.035, p < 0.0001). The interaction between target and distractor (F(1,24) = 15.45, MSe = 0.005, p < 0.01) was due to the fact that, while with small distractors no difference was present, the performance with large targets was better with large than with small distractors. The interaction between target and grip (F(1,24) = 141.82, MSe = 0.06, p < 0.0001) revealed that the compatibility effect was preserved. Finally, the interaction between distractor and grip (F(1,24) = 48.99, MSe = 0.004, p < 0.0001) revealed that, with the precision grip, responses were faster with large distractors than with small ones.

These analyses highlight some important points. First, the model with both lesions (most similar to fully expressed PD damage) predicts that PD patients with a level of impairment comparable to that of the model would still exhibit target-related compatibility effects while failing to exhibit clear distractor-related negative compatibility effects. Second, the simulations of the specific types of damage caused by PD revealed that damage to the PFC-SMA-PMC pathway eliminates the distractor-related (negative) compatibility effect, as the weaker “volitional signals” related to the distractor and supported by this pathway are not enough to exert a strong influence on action. Last, damage to the BG-PMC circuit, which underlies the integration of information from various sources and implements action selection as an outcome, would leave all effects intact. Concerning the interaction between the target and the grip, the reason why it is significant both in the control and the lesioned group is that the effect of the affordance, even if reduced, is still present. Instead, the reason why the interaction between the distractor and the grip is maintained is different: the cause of the effect, namely the top-down suppression mechanism, is not impaired by such a lesion.

Conclusions

This paper presented an enhanced version of the embodied computational model TRoPICALS. Caligiore et al. (2010a) showed that TRoPICALS, thanks to the constraints used to formulate its overall functioning principles and specific assumptions (neuroscientific data, behavioural data, embodiment, and reproduction of learning processes), was able to replicate the results of a number of experiments on object-to-action compatibility effects, to provide a neural-based account of such results, and to advance new predictions to test in novel experiments. The present work shows that TRoPICALS also replicates and accounts for further results on compatibility effects in scenes with multiple objects. It also allows the formulation of specific predictions on the possible outcome of the same experiments if run with PD patients.

The major novelty of the present work is the inclusion, within TRoPICALS, of two different circuits connecting the prefrontal cortex to motor areas, one excitatory and one inhibitory. Both are involved in the accomplishment of task responses when target and distractor objects are presented simultaneously. This enhanced the model in two ways. First, it allowed it to replicate and provide a brain-based neural account of the results obtained by Ellis, Tucker, Symes, and Vainio (2007) on compatibility effects in the presence of distractors. This account is based on the idea that the prefrontal cortex might play a double role in its top-down guidance of action selection: (a) a positive bias in favour of the action requested by the experimental task; (b) a negative bias directed to inhibiting the action automatically evoked by the distractor (Knight, Staines, Swick, & Chao, 1999).

The hypothesis concerning the excitatory/inhibitory connections linking the prefrontal cortex to motor areas also had a second advantage: it allowed us to advance specific predictions on the behaviour that PD patients would exhibit in multiple-object experiments. The predictions indicate that: (a) target-related compatibility effects are still present in PD patients (in line with Oguro, Ward, Bracewel, Hindle, & Rafal, 2009); (b) distractor-related compatibility effects would tend to disappear in PD patients, mainly due to the overall higher inhibitory effects resulting from the dopamine depletion caused by the disease.

Overall, we believe that the results presented here have a number of important implications for the literature on compatibility effects and for how knowledge of objects and the world is represented in the mind. First, the replication of the experimental results on compatibility effects in the presence of distractors provides a neural account of the mechanisms underlying them. Second, the model allows specific predictions that can be verified with PD patients. In this respect, the possibility of separately lesioning the different circuits affected by PD makes it possible to understand which specific aspects of the disease produce which specific effects on behaviour and knowledge representation. Third, the finding that with PD-like lesions the main target-related compatibility effects are preserved while the distractor-related ones tend to disappear has important theoretical implications, as it suggests that the excitatory mechanisms underlying compatibility effects are more prominent and robust than the inhibitory ones. Importantly, all these results point to the fundamental role played in cognition by the embodied/action-based components of the internal representations of objects. These components are related both to the affordances of objects and to the specific actions that can be implemented on them, or should not be implemented on them, on the basis of prefrontally-driven higher-level cognitive processing.