1 Introduction: Seeing-In and the Visual System

The debate about the nature of picture perception is one of the most important in the current philosophy of perception (Lopes 2005; Matthen 2005; Kulvicki 2006; Hopkins 2003, 2010, 2012; Nanay 2010, 2011, 2017).

Picture perception puts us in a peculiar visual state: seeing-in. Seeing-in is related to a particular visual phenomenon: we simultaneously visually represent both the depicted object and some of the properties of the picture’s surface (the locus classicus is Wollheim 1980, 1987, 1998; but see Nanay 2011, 2017; Lopes 1996, 2005). Simultaneity is about visual representations in general, which can be conscious or unconscious. Thus, this notion does not entail that we consciously perceive both the picture’s surface and the depicted object. Most of the time, we consciously see the depicted object, while we may or may not attend to the surface—and, indeed, we don’t (Nanay 2011: 461–464, 2017: Sect. 2): “If we are simultaneously attending to both the depicted scene and the picture surface, then there seems to be something contradictory or disjoint about our simultaneous experience of both of these. But, crucially, this objection does not apply if pictorial twofoldness is understood not as simultaneous attention, but as simultaneous (conscious or unconscious) representations” (Nanay 2015: 192; I cannot review all the arguments in support of this view here, see Hopkins 2010; Nanay 2011, 2017; Ferretti 2017c, forthcoming).

A crucial contribution to the debate on picture perception was offered when Nanay (2010, 2011, 2015) proposed, following Matthen (2005), that in order to understand the particular visual state we are in when we see an object in a picture–that is, in order to understand seeing-in–we should turn to vision neuroscience. To this end, the simultaneity of the visual representations occurring during picture perception can be investigated by analyzing what the most important account we have from vision science, the two visual systems model (henceforth: TVSM) (Milner and Goodale 1995/2006), teaches us about the functioning of our visual system. The account of picture perception that follows the results offered by the TVSM is called the ‘Dorsal/Ventral Account of Picture Perception’ (henceforth: DVAPP) (Nanay 2015; Ferretti 2017a). The DVAPP is a sound philosophical theory because it investigates the nature of the perceptual state we are in (as well as the nature of the peculiar visual experience related to such a state) during picture perception (Nanay 2015: Sect. 4; 2011; 2017). It is, moreover, a sound psychological theory because it carries out such an investigation by using the evidence from vision science, i.e. by analyzing, for example, the psychological and the neurophysiological underpinnings of such a perceptual state (Nanay 2015: Sect. 3; Ferretti 2017a).

Now, the TVSM suggests the presence of (at least) two main visual pathways in our visual cortex, which have distinct anatomo-functional characteristics (Milner and Goodale 1995/2006): a ventral stream (the occipito-temporal network, from the primary visual cortex to the infero-temporal cortex) for conscious (but also unconscious) visual object recognition, which subserves perception from an allocentric frame of reference; a dorsal stream (the occipito-parietal network, from the primary visual cortex to the posterior parietal cortex, with specific connections to the premotor areas) for the unconscious visual guidance of action and the related attribution of action properties to the objects we perceive, which subserves perception from an egocentric frame of reference.

Following Hopkins (1998), Nanay suggests that “any account of seeing-in must be able to tell how this experience represents the picture’s surface and how it represents the depicted object” (Nanay 2010: 199). Then, the DVAPP suggests, in the light of the TVSM, that “the twofold experience of pictures corresponds to the dichotomy between our dorsal visual processing of the surface of the picture and our ventral visual processing of the depicted scene” (Nanay 2011: 464). This is for a simple reason. First, our visual system is divided into two main visual streams with the computational characteristics described above. Second, pictures are objects characterized by two main perceptual components, a surface/vehicle and a depicted scene/object. Third, seeing-in is indeed based on the simultaneous occurrence of two visual representations: one related to the surface and one related to the depicted object.

Such an analysis of the nature of seeing-in leads, according to Nanay (2010, 2011, 2015), to the following four claims:

  • (a) The depicted scene is represented by ventral perception.

  • (b) The depicted scene is not represented by dorsal perception.

  • (c) The picture’s surface is represented by dorsal perception.

  • (d) The picture’s surface is not necessarily represented by ventral perception.

Seeing-in can occur only when dorsal vision represents the surface whereas ventral vision represents the depicted object (Nanay 2011: 466, 477), or when ventral vision represents both the surface and the depicted object (Ibid.). The reason is that, following (b), dorsal perception of the depicted object is not possible. Indeed, seeing-in is a peculiar visual state in which the visual brain of the subject has to simultaneously visually represent both the surface and the depicted object. Then, in order to enter such a peculiar visual state, at least one stream must represent the surface and at least one stream, not necessarily the same one, must represent the depicted object.

As I said, according to the DVAPP, (conscious) visual recognition, which is subserved by ventral processing, can sometimes represent both the surface and the depicted object. In this case seeing-in is, according to the DVAPP, inflected, i.e. “the characterization of the properties by which a certain subject is seen in a given picture as having refers to the design properties of the picture’s vehicle, i.e., to the visible surface properties of that vehicle that are responsible for the fact that one such subject is seen in it, precisely taken in such a design role” (Voltolini 2013). Nanay suggests that “any account of inflected seeing-in must be able to tell how our inflected experience represents these two entities and how these representations are different from the way we represent surface and scene in the case of uninflected seeing-in. It is not enough to say that we see both the picture’s surface and the depicted object, because in this case it remains unclear how seeing the design in the case of uninflected seeing-in is different from doing so in the case of inflected seeing-in” (Nanay 2010: 199). Thus, inflection depends on the fact that not only the depicted object, but also the picture’s surface is represented by the cortical portion of our visual system that is involved in (conscious) object recognition—i.e. the ventral stream (p. 202)–so that we consciously visually represent (i.e. recognize) the depicted object as having design-scene properties (p. 203). Inflection is not possible when we only have a dorsal representation of the surface (along with a ventral representation of the depicted object) because dorsal vision is not responsible for (conscious) visual recognition.

However, seeing-in can also be uninflected: we (our ventral processes) do not consciously represent the design-scene properties of the surface as such (p. 197)—the surface being only dorsally represented.

That said, it has recently been shown (Ferretti 2016a, b, c, 2017a, c) that dorsal vision can also represent depicted objects: it can attribute action properties to depicted objects that are apparently presented in the peripersonal action space of the observer. This leads us to reformulate (b) in the following way:

  • (b1) The depicted object can be represented by dorsal perception.

This claim can be defended without any conflict with the DVAPP (Ferretti 2016a: 4.2). Also, this claim does not endanger the notion of inflection offered by the DVAPP: even if dorsal perception can also represent the depicted object, seeing-in is, as said, inflected only when the part of the visual system involved in (conscious) visual recognition, that is, the ventral stream, is attuned to both the picture’s surface and the depicted object. Dorsal vision is unconscious, and is not responsible for visual recognition; thus, it cannot be responsible for such a peculiar visual effect.Footnote 1 I’ll get back to inflection in Sect. 4.4.

The aim of the present paper is to suggest a new angle on picture perception, by showing that the answer to the question about the nature of the perceptual state we are in during picture perception is much more complex. This will be done by starting from the result concerning the possibility of a dorsal representation of depicted objects and on the basis of further recent evidence from visual neuroscience. If we admit the possibility of a dorsal representation of depicted objects, we come full circle: both streams can represent both the surface and the depicted object (Ferretti 2016a: Sect. 5). This point opens up the possibility, in line with neuroscience, of an important new claim, which is the main twofold claim of this paper: both when we perceive the picture’s surface/vehicle and when we perceive the depicted object, the perceptual state we are in is subserved by both dorsal and ventral vision. Furthermore, since recent experimental evidence crucially suggests that most of our visual capacities are generated by an interaction between the streams, it is possible to suggest that seeing-in is subserved by dorsal/ventral interactions.

This twofold claim, which will be further divided into four sub-claims below, allows us to improve the DVAPP by taking into account the current status of the TVSM in vision neuroscience, in relation to the new results concerning interstream interaction, and by using these results to give a reliable and updated empirical background to our best empirically informed philosophical theory of depiction. Before cashing out my account, we have to briefly examine these results on interstream interaction.

2 Seeing-In: Beyond the Initial Formulation of the Two Visual Systems Model

The DVAPP follows the version of the TVSM that suggests the presence of a functional dissociation between the streams (Nanay 2011: 465; Ferretti 2016b, 2017a). Such a dissociation is based on two general arguments: the argument from the studies on cortical lesions and the one from illusions. The first suggests that the two visual paths can be dissociated due to cortical lesions. Dorsal lesions impair visual guidance of action (optic ataxia), but not object recognition; ventral lesions impair visual recognition (visual agnosia), but not action guidance (see Milner and Goodale 1995/2006; Jacob and Jeannerod 2003). The second shows that, in healthy humans, only ventral perception is fooled by perceptual illusions (Milner and Goodale 1995/2006; Aglioti et al. 1995; Nanay 2011: 465, 2015: 184; Briscoe 2009; Briscoe and Schwenkler 2015; Ferretti 2016b: 5.2).

Now, in general, the original view proposed by the TVSM about the dissociation has been recently questioned, both on philosophical (Kozuch 2015; Wu 2014; Shepherd 2015; Mole 2009; Briscoe 2009; Ferretti 2016b, c; Zipoli Caiani and Ferretti 2016; Brogaard 2011a, b; Nanay 2013) and empirical grounds and concerning several visual tasks (Verhoef et al. 2011; Perry and Fallah 2014; Wokke et al. 2014; Borra et al. 2007; Van Polanen and Davare 2015; Hoshi and Tanji 2007; Cohen et al. 2009; Ferretti and Chinellato In press). First, such a dissociation is not very deep in healthy humans, because all the complex forms of human visual processing seem to rely on an anatomo-functional interplay between the streams (henceforth: interstream interplay) (Jacob and Jeannerod 2003: 255; Briscoe 2009: footnote 8; Chinellato and Del Pobil 2016: 2.3.1.3; Ferretti 2016b; Fogassi and Luppino 2005; Borra et al. 2007; Zanon et al. 2010). Accordingly, neurophysiology of vision suggests that there is no rigid functional separation between the streams at various points in the visual processing (Schenk and McIntosh 2010; Ferretti 2016b: 5; Chinellato and Del Pobil 2016). Indeed, the streams interact via anatomical (Kravitz et al. 2011, 2013) and functional connections (Jacob and Jeannerod 2003; Briscoe 2009; Cloutman 2013; Chinellato and Del Pobil 2016; Ferretti 2016b: 5). Summing up, recent treatments of the topic have suggested a crucial reformulation of the TVSM, especially in relation to the old view of a deep dissociation (de Haan et al. 2018; Rossetti et al. 2017; Goodale and Milner 2018).

In particular, the above-mentioned arguments in support of the dissociation are not reliable for the following reasons. First, even dorsal vision-for-action can be affected by illusions (Kopiske et al. 2016; Bruno and Battaglini 2008; McIntosh and Schenk 2009; Briscoe 2009; Ferretti 2016b: 5.2). Second, even ventral conscious vision can generate egocentric representations of the encoded targets (Briscoe 2009). Third, the contents of visual consciousness and awareness should not be exclusively identified with the content of ventral vision (Clark 2009; Schenk and McIntosh 2010; Ferretti 2016b: 5.5), because no conclusive evidence suggests that dorsal vision is unconscious (Nanay 2013; Gallese 2007; for a critique see Brogaard 2011a, b; see also Ferretti 2016b: 5.5). Also, most of the visual processing involved in vision-for-action can result from interstream interaction (Chinellato and Del Pobil 2016) and, thus, ventral visual recognition can contribute to dorsal action guidance (Nanay 2013; Brogaard 2011a, b; Briscoe 2009; Ferretti 2016b: 5.5). Similarly, ventral conscious visual recognition makes use of the information managed by the dorsal stream (Gallese 2007; Brogaard 2011a). So, though object recognition and visual action guidance are mainly subserved, respectively, by ventral and dorsal processing, they are not exclusively subserved by them: they depend on different kinds of interstream interaction, in which one stream plays the predominant role and the other stream offers a computational contribution (Sects. 4, 5).

The DVAPP is the best philosophical account of picture perception we have in the light of vision science and represents a crucial step forward for the debate on pictures. But it investigates picture perception by assuming dissociation between the streams. However, our best model of picture perception should say something about how we should conceive seeing-in in the light of the new evidence concerning the TVSM, which suggests interstream interaction. My proposal describes the neural dynamics of seeing-in by taking into account this new evidence.

If we accept that the dorsal stream also represents the depicted object (Ferretti 2016a) (Sect. 1), and we follow the evidence about interstream interaction, it is possible to claim that both the depicted object and the surface are visually represented by both streams in a specific sense: they are visually represented by dorsal/ventral interactions, in relation to both object recognition, i.e. the cases described by (a) and (d), and vision for action concerning the attribution of action properties, i.e. the cases described by (b1) and (c).

In light of this possibility, here I will defend the following four new claims, which are inspired by Nanay’s account:

  • (A) The visual recognition of the depicted object, a function attributed to ventral vision (a), is shaped by interstream interplay.

  • (B) The attribution of action properties to the depicted object, a function attributed to dorsal vision (b1), is shaped by interstream interplay.

  • (C) The attribution of action properties to the picture’s surface, a function attributed to dorsal vision (c), is shaped by interstream interplay.

  • (D) The visual recognition of the picture’s surface, a function attributed to ventral vision (d), is shaped by interstream interplay.

3 Interstream Interplay in Picture Perception

In what follows, I provide an argument for each of the four points listed above. First, I analyze the claims concerning interstream interplay about object recognition and the attribution of action properties in the case of depicted objects: A (Sect. 3.1), B (Sect. 3.2). Then, I analyze these two functions in relation to the picture’s surface: C (Sect. 3.3), D (Sect. 3.4). I will also explain the nature and the relation of these different interplays (Sect. 4). Finally, I will list the advantages of my account with respect to the original formulation of the DVAPP (Sect. 5).

3.1 (A) Visual Recognition and Interstream Interplay: The Depicted Object

Following the version of the TVSM that suggests dissociation, ventral vision is responsible for (conscious) visual recognition. So, visual recognition of the depicted object should be subserved by ventral vision (Nanay 2011). However, several sets of evidence suggest that visual recognition is actually subserved by interstream interplay. Thus, visual recognition of the depicted object should also be the result of such interplay. We can now analyze the evidence in support of this idea.

Visual object recognition can be conscious or unconscious (Milner and Goodale 1995/2006; Jacob and Jeannerod 2003). This holds for picture perception, as well as for face-to-face perception (Nanay 2011, 2015, 2017). Now, we have evidence of a dorsal contribution to ventral processing during object recognition. This contribution can sometimes be responsible for the conscious dimension of object recognition, and sometimes responsible for its unconscious dimension. Such a contribution is at work also during picture perception. We can now analyze the compelling experimental results in favor of this perceptual fact.Footnote 2

Let us start from conscious recognition. While the previous view was that ventral vision is, alone, responsible for high quality conscious visual processing, there is now substantial evidence that dorsal processing plays an important role in such a perceptual task. For example, in the case of neglect, dorsal lesions disrupt the conscious awareness of the quality of objects (Gallese 2007: 8). In particular, we are talking about lesions at the inferior parietal lobule (IPL), which is related to a subcomponent of the dorsal stream, the ventro-dorsal stream, which is active in picture perception (Ferretti 2016a, b, c, 2017a, b).Footnote 3 Drawing on this evidence, it has been suggested that, without the contribution of dorsal processing, ventral processing is insufficient to obtain high-level conscious (spatial) perception. But even if dorsal processing is functionally necessary, it is functionally insufficient, on its own, for normal visuospatial awareness (for a critical review see Brogaard 2011a, b; Ferretti 2016b: 5.5).

Indeed, we can recognize the role played by the IPL in visuospatial awareness, without being committed to the claim that dorsal representations are, alone, conscious–which is a controversial claim at the moment (Brogaard 2011a, b). Gallese suggests that “Neglect patients are able to process stimuli presented within the neglected field up to a categorical semantic level of representation. However, they are not aware of them in the absence of IPL processing. This implies that the parieto-premotor circuits of the ventro-dorsal stream must be intact for achieving awareness even of those stimuli, such as fruits or animals that are mostly analyzed in the ventral stream. Lesions of sensory-motor circuits, whose primary function is that of controlling movements of the body or of body parts towards or away from objects, produce deficits that do not exclusively concern the capacity to orient towards objects or to act upon them. These lesions produce also deficits in body, space, and object awareness” (2007: 9). For Brogaard (2011a), “one hypothesis is that the IPL transmits information to the ventral stream, perhaps via feedback to striate cortex, and that this feedback of information is required in order for ventral stream processing to give rise to conscious spatial representations”. According to Brogaard, “This hypothesis is consistent with Jean Bullier et al.’s (2001) suggestion to the effect that feedback from the dorsal stream to striate cortex can influence ventral stream processing. On this view, the two visual streams interact via extrastriate-striate or parietal-striate feedback” (2011b: 1094). This piece of evidence clearly suggests that conscious vision responsible for object recognition is subserved by dorsal/ventral interactions, given the role played by the dorsal stream in the shape recognition mainly subserved by ventral processing.

In line with these results–and while it is usually suggested that the scope of dorsal shape processing is for motor purposes (Theys et al. 2015)–reliable evidence shows that dorsal responses are partially involved in shape processing for object recognition that is not directly or necessarily related to motor processing for action performance (Grill-Spector and Malach 2004; Grill-Spector et al. 1999; Sawamura et al. 2005; Sereno and Maunsell 1998; Konen and Kastner 2008; Laycock et al. 2009; Sim et al. 2015) and that, unlike the case mentioned above, is not necessarily consciously accessed, but rather remains at the subpersonal level of visual processing that generates unconscious visual states. Indeed, while dorsal processing has a peculiar selectivity for (depicted) tools (Ferretti 2016b; Rice et al. 2007; Fang and He 2005), it does not necessarily respond only to manipulable objects (Laycock et al. 2011).Footnote 4 In this respect, different neural families within the same portion of the dorsal stream can generate different responses, either related to action, or related to recognition, and with respect to different target objects. For example, different neurons in the famous AIP-F5 circuit mainly involved in computations for grasping (Ferretti 2016a, b) respond to different (depicted or real) objects and encode shapes, size and orientation for object recognition, which are only sometimes used for grasping purposes (Chinellato and del Pobil 2016: 2.3.2.1; Raos et al. 2006; Theys et al. 2012, 2015; Ferretti 2016a; Murata et al. 2000). 
This is also suggested by the construction of experimental settings in order to study object recognition in dorsal areas, with respect to different kinds of targets: “Investigating the neural representation of object shape demands systematic stimulus manipulations (e.g., stimulus reduction), therefore visual object representations can be primarily studied in neurons that respond to images of objects (either 3D or 2D), as in AIP and F5a” (Theys et al. 2015: 7)—for example, most of the F5a neurons responsive for objects’ depth structure are visual-dominant (see also Chinellato and del Pobil 2016). “Neurons in F5p and F5c, in contrast, respond selectively to real-world objects (Raos et al. 2006) but not to 3D images of objects (Theys et al. 2012), most likely because these areas represent grip types, which are not activated by images of objects” (Theys et al. 2015: 7; see also Ferretti 2016a). Accordingly, we have reliable evidence that the parietal cortex (the inferior intraparietal sulcus) plays an important role in object identification (Xu 2009) and that, thus, object recognition is not exclusively a ventral affair. This involvement of the parietal cortex (in particular of the inferior intraparietal sulcus) in object processing is confirmed by different experimental results (Xu and Chun 2009: 168; Xu 2009: 516; Bettencourt and Xu 2013)—note that the posterior parietal cortex responds to stimuli that are not associated with action. Furthermore, the response of the lateral intraparietal cortex for shapes that do not exhibit any graspable features shows that the dorsal responses to shapes can be effectively detached from the computations related to motor interaction (Sereno and Maunsell 1998). Therefore, like the AIP (and F5, as well as, in general, the ventro-dorsal stream) (Ferretti 2016a), also the lateral intraparietal cortex responds both to 2-D and 3-D shapes (Durand et al. 2007).

These results suggest that different neural groups and sub-cortical portions of the dorsal stream respond to both normal (real) and depicted objects in relation to shape processing involved in object recognition that is not related to action.Footnote 5 It is not by chance that:

“while neuroanatomical dissociations do exist between a dorsal and ventral visual pathway, interpretations of the function of these streams is less certain. Specific tasks such as object and face recognition may not be subserved exclusively by ventral stream mechanisms, and there is some emerging evidence to suggest that certain aspects of object recognition, such as recognition of an object’s orientation in space, may be processed by dorsal-stream mechanisms (…). Dorsal-stream mechanisms may be more integral to visual perception than previously thought, and may be directly implicated in object recognition mechanisms that are thought to be purely “ventral”” (Farivar 2009: 151).

Also,

“Taken together, the studies that have directly assessed the response to shape in the dorsal and ventral streams seem to suggest that dorsal regions do encode certain aspects of the objects including shape, size, orientation and viewpoint in manner similar to the representations in the ventral stream” (Ibid: 148).

So, in many cases, both streams show very similar response patterns–“the dorsal system for object information showed very similar response patterns to those in the ventral system” (Konen and Kastner 2008: 229)–and their computation of shape does not seem to differ much, given the possibility that “there is parallel encoding of object information in the two pathways” (Ibid.), as suggested by results concerning their comparable activations during the encoding of shapes (Zachariou et al. 2014) and the evidence that different cortical portions of the dorsal and of the ventral pathways are selective for 3-D shapes (Sereno et al. 2002). Consistent with these findings, “units in posterior parietal cortex contribute to attending to and remembering shape features in a way that is independent of eye movements, reaching, or object manipulation. These units show shape selectivity equivalent to any shown in the ventral pathway” (Sereno and Maunsell 1998: 500). However, dorsal responses may differ as a function of the point of view, i.e. they are viewpoint-variant, whereas ventral responses are viewpoint-invariant (Konen and Kastner 2008: 224; Farivar 2009; see Chinellato and Del Pobil 2016: Sect. 2 for a complete review).Footnote 6

In addition, were the reader not satisfied with this evidence, dorsal processing is also crucial in the recognition of words during reading, insofar as it is involved in the spatial encoding of the position of the letters the word is composed of (Pammer et al. 2006). Note that, in these experimental settings, the words are shown on a screen. This genuinely counts as a case in which dorsal vision is performing recognition, during reading, in relation to a depicted object, i.e. the depicted word on the screen (for the behavior of dorsal processing with pictures and screens see Ferretti 2016a; Zipoli Caiani and Ferretti 2016).Footnote 7

All I have said so far suggests a dorsal involvement in visual object recognition, even in the case of pictures. But there is also further evidence we can consider. One of the most important pieces of evidence concerning interstream interplay during visual recognition shows that dorsal processing facilitates object processing performed by the temporal areas of the ventral stream, insofar as dorsal vision, with its orbito-frontal and fronto-parietal projections, is responsible for directing the attention necessary for ventral processing to start object reconstruction. The importance of dorsal vision in spatial attention is well recognized in the literature (Ikkai et al. 2011; Valyear et al. 2006; Noudoost et al. 2010).Footnote 8 However, it is undeniable that the subcomponents of the occipito-temporal cortex, related to the ventral stream, remain the crucial areas mainly involved in object recognition (Bar 2003; Bar et al. 2006; Barrett and Bar 2009; Milner and Goodale 1995/2006). Furthermore, it has been suggested that there is a functional division of work in object recognition, specifically concerning object individuation and object identification, between the intraparietal sulcus related to the dorsal visual brain and the lateral occipital complex related to the ventral visual brain (Xu 2009: 516; see also Chinellato and Del Pobil 2016: Sect. 2). Accordingly, important evidence concerning dorsal response to object information has been reported by Freud et al. (2015a). They showed that while ventrally damaged subjects show serious problems in object recognition (this is a famous point, see Milner and Goodale 1995/2006; Briscoe 2009), there is, still, also little dorsal response to object structural information. And even if dorsal and ventral computations in visual processing are built independently, in healthy humans the presence of several anatomo-functional connections between the two streams suggests that they interact (Freud et al. 2015b).
This suggests that, as for the case of object awareness described above, dorsal vision cannot be, alone, responsible for object recognition. However, its activity may play an important role in this process. Indeed, dorsal processing is involved in low-level processing concerning object discrimination (Laycock et al. 2011) and seems to play a key role in the “derivation of the global geometry of the object” (Freud et al. 2015a: 11), thanks to the parietal activity involved in the encoding of spatial and 3-D information (Durand et al. 2009; Georgieva et al. 2009; Nelissen et al. 2009; James et al. 2002; Taira et al. 2001; Sakata et al. 2003; Srivastava et al. 2009; Orban 2011; Tsutsui et al. 2002, 2005; Verhoef et al. 2015). In this respect, Freud et al. (2015a) discuss several results showing that “dorsal regions (…) are sensitive to object shape (…) and interact closely with ventral cortex to aid recognition” (p. 11). This is in accordance with the evidence that perception of 3-D shape is subserved by a ventral-dorsal interplay (Peuskens et al. 2004). This is also in line with the evidence that different visual stages of object processing are based on specific anatomo-functional interactions between the streams, such as in the case of the vertical occipital fasciculus (Takemura et al. 2016; Yeatman et al. 2014) and of the caudal intraparietal sulcus (Chinellato and del Pobil 2016).

Finally, note that the AIP, which is taken to be a crucial area of the dorsal stream (Culham et al. 2006), participates in object recognition performed by the inferotemporal areas in the ventral stream, such as Tem, Te, Teo, with which the AIP has reciprocal connections (Verhoef et al. 2011; Zanon et al. 2010; Fogassi and Luppino 2005; Tanaka 1996; Chinellato and Del Pobil 2016; Rizzolatti and Sinigaglia 2008). Moreover, the above mentioned evidence on the functional responses to shapes by the lateral intraparietal cortex is in accordance with that showing its anatomical projections to the ventral stream (Webster et al. 1994).

There is also another important point. As mentioned previously, depicted objects can be visually represented as apparently falling within the peripersonal action space of the observer and as apparently reachable (Ferretti 2016a)—note that peripersonal localization, which is the localization of an object within the motor space within our reach, requires egocentric localization (but not vice versa): to localize an object in one’s action space, the space within one’s reach, means localizing it from one’s point of view. This claim concerns the subpersonal level of perception, insofar as it regards dorsal vision. But some authors suggested that even our conscious vision, mainly (but not totally) subserved by the ventral stream, can quasi-egocentrically represent, to some extent, depicted objects (Briscoe 2009). The idea that we can quasi-egocentrically perceptually represent depicted objects by representing their relative depth cues (i.e. depth relations within and among objects) is accepted in the literature (Briscoe 2009: 447, 6.1–6.4, 2016; Grush 2000; Millikan 2004: 123; Ferretti 2016a: 4.1; Vishwanath 2014: 155; Cutting 2003; Hecht et al. 2003)—even if absolute egocentric localization (i.e. depth relations concerning the observer’s possible motor action within the peripersonal space) is not possible with depictions (see Sects. 3.4, 4.1) (Ibid.). This is in line with the idea that also conscious ventral vision mainly works in egocentric coordinates (Foley et al. 2015; Briscoe 2009: 447) and with the evidence on interstream interplay that each stream can access the information managed by the other stream even concerning the frame of reference (Briscoe 2009; Briscoe and Schwenkler 2015).Footnote 9

Summing up, we saw that dorsal vision computes the structures of shapes even when motor response is not recalled by the properties of the target encoded. The information resulting from this computation can then be used by ventral vision. This crucially suggests that “normal object recognition likely requires the integrative action of the dorsal and ventral streams” (Farivar 2009: 145). This is true even in the case of depicted objects: both streams respond to object shapes, in different manners and for different purposes, in the case of both 2-D and 3-D targets.

The evidence presented in this section supports the claim that the recognition of depicted objects is subserved by complex computational processes realized through interstream interplay. However, this interplay is mainly ventral, given that ventral processing is the cutting edge of object recognition. The reader should note that saying that an interplay is ‘mainly ventral’ means that, although its processing is given by interstream interactions, ventral processing plays the predominant role with respect to dorsal processing in the task subserved by this interplay. The same holds when I talk of a ‘mainly dorsal interplay’. I will return to this point in Sects. 4.1 and 4.2, where I will explain the role of the mainly ventral interplay in both pictorial recognition and the recognition of real objects, like surfaces.

3.2 (B) The Attribution of Action Properties and Interstream Interplay: The Depicted Object

We saw that our vision-for-action can attribute action properties to depicted objects apparently presented in the peripersonal action space of the observer (Sect. 2). Here I keep the discussion at the visuomotor subpersonal level. Saying that depicted objects offer action possibilities does not mean, of course, that we are consciously perceiving a real action possibility, but just that some part of our visuomotor brain behaves as if the action properties pertaining to the depicted object pertained to a real object (Ferretti 2016a: 4.1, 2017c).Footnote 10 We have reliable evidence that such a visual attribution is subserved by interstream interplay, which is mainly dorsal.

The crucial argument for the claim that the attribution of action properties to depicted objects is subserved by interstream interplay is the following. Everyday objects offer us a variety of action possibilities and, thus, different motor acts to perform upon them. The selection of the appropriate motor act depends not only on the layout properties displayed by the object, but also on our motor expertise, as well as on what we intend to do with it (Zipoli Caiani and Ferretti 2016; Ferretti and Zipoli Caiani 2018). The interplay between the analysis of the physical properties (pragmatic analysis) and the analysis of the object identity (semantic analysis) is due to connections between the ventro-dorsal stream and the ventral one (Ferretti 2016b: 5.3; Zipoli Caiani and Ferretti 2016; Chinellato and Del Pobil 2016).

Even when we attribute action properties to depicted objects, the action represented is very specific, depending on the kind of depicted object we perceive (a depicted mug and a depicted hammer will recall different action properties). Indeed, it has been suggested that, most of the time, the dorsal response related to the surface (this point is related to claim (c) of the DVAPP) is different from the one related to the depicted object because the action properties recalled by the two objects are different (Ferretti 2016a: 4.1). But if we want to account for this specificity, we have to account for the mix between semantic and pragmatic responses, which is given by the following interplay. The attribution of action properties to depicted objects is possible thanks to the parieto-premotor circuit composed of the anterior intraparietal area (AIP) and the most rostral part of the ventral premotor cortex, namely F5. The AIP selects the geometrical properties to be translated into action properties and to be sent to F5 for the encoding of proper motor acts (for a review concerning this visuomotor transformation see Ferretti 2016a, b). The detection of the action properties related to the semantic functions of the object (e.g. think about the different grips we can use in order to use a pen in different ways: writing vs. throwing the pen) is possible because the AIP participates in the object recognition mainly performed by the inferotemporal areas related to the ventral stream, with which the AIP is connected (Fogassi and Luppino 2005: 627; Rizzolatti and Sinigaglia 2008: 36–38; Ferretti 2016b: 5.3; Sect. 3.1). After the semantic analysis, the information is sent from the AIP to F5.
At this point, on the basis of this analysis, the neural populations in this circuit generate a competition that will determine the selection of the most appropriate motor act, among those computed, with respect to the action possibilities detected (Rizzolatti and Sinigaglia 2008: 36–38; Kandel et al. 2013: Chap. 19; Zipoli Caiani and Ferretti 2016).Footnote 11

Furthermore, we know that, while the spatial location of the object is mainly computed by the dorsal system, which does not need high-quality visual information managed by the ventral stream for this task, in order to build a reliable representation of a possible motor interaction–especially during the grip shaping–dorsal vision needs the selection of the object features on the basis of the semantic encoding performed by ventral areas (Goodale and Milner 2004). In particular, dorsal processing can access memory-stored information about objects processed by the ventral stream (Singhal et al. 2007, 2013; for an analysis see Briscoe 2009) to compute suitable motor processing.

Summing up, we have to account for the fact that we perceive that the action property of the surface is usually different from that of the depicted object. But we also have to account for the fact that we can perform different action property attributions in relation to different depicted objects. The computation of semantic information is necessary to perform highly specific motor responses with respect to different action properties related to the semantic functions of the objects we deal with. This is true even in the case of pictorial action properties (Ferretti 2016a: 4.1). Indeed, if the ascription of action properties needs a semantic representational component, and if action properties can be ascribed to depicted objects, then, when we ascribe action properties to depicted objects, we need such a semantic representational component. But this representational component, crucial for action preparation, and activated even in the case of subpersonal visuomotor responses to depicted objects, is subserved by a ventral-dorsal interplay. All this suggests that motor responses and action property attribution in the case of depicted objects are given by the functional interstream interplay of dorsal pragmatic processing and ventral semantic processing: the dorsal visuomotor response about the potential grip can distinguish between the functional differences of several objects with the help of the ventral semantic encoding.Footnote 12

3.3 (C) The Attribution of Action Properties and Interstream Interplay: The Picture’s Surface

The attribution of action properties to the picture’s surface, which is a real object, is similar to the processing reported in Sect. 3.2 concerning the attribution of action properties to the depicted object. Such an attribution is, thus, subserved by the same kind of interstream interaction. As already pointed out, since action representations are very specific, the contribution of the semantic computational component given by ventral processing is important for the pragmatic dorsal component in order to represent suitable action properties related to the surface. Thus, interstream interplay makes it possible to rely on specific motor responses, related to specific action properties, with respect to the surface and the depicted object, given that those related to the surface are usually different from those related to the depicted object (most of the time, we represent a precision grip with the surface, while objects depicted might recall very different kinds of grips). The interplay discussed here is the same as that discussed in Sect. 3.2: a mainly dorsal one. Moreover, it cannot be consciously accessed, and, as we shall see, it cannot be responsible for the visual detection of presence for reliable motor interaction. For this visual detection we need a mainly ventral interplay, which is the same as that discussed in Sect. 3.1 and which can distinguish between real objects like surfaces and depicted objects, except for special cases (Sects. 3.4, 4.1, 4.2).

Now I can describe the interplay related to the recognition of the surface.

3.4 (D) Visual Recognition and Interstream Interplay: The Picture’s Surface

The cortical visual mechanisms at the basis of recognition discussed in Sect. 3.1, concerning the recognition of both real and depicted objects, are the same as those at the basis of the recognition of the surface, although, as we shall see, the representational activity of these same cortical mechanisms is what really allows us to distinguish between the real and the pictorial (Sect. 4.2). We saw that, while ventral processing is the cutting edge of visual recognition, it also needs information managed by the dorsal stream in performing such a task.

But there is something more to be specified here. Unlike a depicted object, to which we can ascribe action properties, the surface is perceived as a real and present object we can really interact with. In Sects. 4.1 and 4.2, I will explain the difference between pictorial recognition and the recognition of real objects like surfaces. Before doing that, it is important to note that the capacity to recognize the presence of a surface as a real and present object that offers possibilities of reliable interaction is very important for our experience of pictures, that is, for us to be in a visual state of seeing-in. This point deserves careful examination.

Vision science has shown that perceiving the presence of a surface as a real object we can interact with is what allows us not to be under the illusion that the depicted object is a real and present one (Sect. 4.3). Indeed, evidence shows that, when there is no possibility of having a visual representation (whether conscious or unconscious) of the surface, the depicted object looks present (Vishwanath 2011, 2014; Vishwanath and Hibbard 2013; for a discussion see Ferretti 2016c, forthcoming). The less visible the surface is, the more the object looks real (Ibid.): “In the absence of visible picture surfaces, it is plausible that the brain attributes the accommodation response to the pictorial objects, and assigns any associated distance information to them, allowing absolute depth values to be derived” (Vishwanath and Hibbard 2013: 1682–1683; see also Vishwanath 2014: 159–160). Perception of egocentric absolute depth concerns the fact that the “observer has knowledge of the depth relations scaled in some meaningful way to the actions of the observer” (Vishwanath 2011: 222; see also p. 206).Footnote 13 Ascribing absolute egocentric depth values to the objects we deal with is what leads us to visually represent an object as present in our peripersonal action space and as offering a reliable possibility for motor interaction (Ferretti 2016c). When the surface is not visible, our visual system (mis-)represents the possibility of reliable motor interaction in a pictorial space that no longer looks pictorial at all. This happens, for example, in the case of the famous trompe l’oeil pictures, in which the feeling of presence can be accounted for in terms of interstream interplay (Ferretti 2016c), in line with the account of trompe l’oeils offered by the DVAPP (Nanay 2015) (Sect. 4.3). However, most of the time this does not happen because we can perceive, at least unconsciously (see Sects. 4.3, 4.4), the surface as present (Ferretti 2017c, forthcoming).

All this suggests that perceiving the surface prevents us from being fooled into taking the depicted object to be real and present. This is what allows us to reach a visual state of seeing-in during picture perception: a twofold visual state in which we simultaneously visually represent a depicted object as such, i.e. a pictorial content (most of the time represented consciously), and the surface in which the pictorial content is visually encoded, i.e. the bearer of such a content, which we visually represent, at least unconsciously (Sect. 4.4), as a real object we can interact with.

From what I have said, it follows that the representation involved in the recognition of presence for reliable motor interaction (Vishwanath 2014) is, in picture perception, usually related to the surface. It has been suggested that this representation depends on interstream interplay (Ferretti 2016c). Here I suggest that this is a mainly ventral interplay. Since this is a crucial point, in the next sections I address the differences between a mainly ventral and a mainly dorsal interplay, and explain their role in picture perception, in relation to recognition, ascription of visual presence and action property attribution. I will also explain how we can have action property attribution with, but also without, the ascription of visual presence, as well as how we can have visual recognition with, but also without, the ascription of visual presence.

4 Different Kinds of Interstream Interplay

We saw, through different examples, that both the visual representation of the surface and the visual representation of the depicted object are subserved by an interstream interplay. This is true for different tasks of (conscious or unconscious) recognition and action property attribution. Indeed, we can recognize, consciously or unconsciously, the depicted object (Sect. 3.1). We can unconsciously attribute action properties to both the picture’s surface (Sect. 3.3) and the depicted object (Sect. 3.2). We can recognize—mainly consciously, but also unconsciously (I’ll get back to this point in Sect. 4.4)–as present and as offering reliable motor interaction, only the surface (Sect. 3.4). Therefore, during picture perception, sometimes action property attribution is linked to the ascription of visual presence, as in the case of the perception of the surface, while sometimes it is not, as in the case of the perception of the depicted object. But also visual recognition is sometimes linked to the ascription of visual presence, as in the case of the perception of the surface, and sometimes it is not, as in the case of the perception of the depicted object. However, all these representations are subserved by interstream interplay, which can be mainly ventral or mainly dorsal. How is this possible? What is the difference between the interplays that subserve these processes? Which interstream interplay can be consciously accessed? Furthermore, we know there are pictorial illusions, like trompe l’oeils, which foster in us the visual feeling of presence of the depicted object, which is perceived as if it were a real one. What is the behavior of the interstream interplay in the case of trompe l’oeil? These questions need an answer in order for my theory to be sound. In this section, I will offer an answer to these questions. Before doing that, we need to analyze the general differences between the interplays described.

Consider the interplay of the kind reported in (Sects.  3.2, 3.3), thanks to which our visual system subpersonally ascribes action properties to both the surface and the depicted object. This is a mainly dorsal interplay. The reader should recall that, when I say that an interstream interplay is ‘mainly dorsal’, it means that, though its processing is given by interstream interactions, the functional processing of the dorsal stream plays the major role with respect to the one played by the ventral one in the task subserved by this interplay. The same holds when I talk of a ‘mainly ventral interplay’ (Sect. 3.1). This is perfectly in line with the results from the empirical literature (Kravitz et al. 2011, 2013; Ferretti 2016b, c; Chinellato and Del Pobil 2016). In the case of this interplay, we are dealing with a low-level attribution of action properties, which cannot be consciously accessed, and which is related to a subpersonal motor response. The reader should note that this process cannot be responsible for the perception of the object as a real object we can interact with. Indeed, this interplay represents both depicted and normal objects without computing the visual difference concerning presence. In this case, ventral perception only elaborates, at the subpersonal level, the information about the semantic properties of the object shape that are usually related to the grip that has to be used, which is computed by dorsal processing. We can also address the neural correlates involved in this interplay. For example, the subcomponent A of the ventral stream and the subcomponent B of the dorsal stream interact for a specific purpose. A (e.g. inferotemporal cortical sub-portions related to the ventral stream, Sects. 3.1, 3.2) is responsible only for the semantic information related to the shape, which is not consciously accessed in this case. This information is used by the subcomponent B of the dorsal stream (e.g. AIP-F5) to build a potential motor act (Sect. 
3.2).Footnote 14 The motor act is stored in a sort of neural motor quiver and is not consciously accessible (Jeannerod 2006; Chinellato and Del Pobil 2016; Raos et al. 2006; Ferretti 2016a, b). This process is subpersonally and automatically triggered every time very specific low level visuomotor cues are detected thanks to the geometrical characteristics of the object (Ferretti 2016a, b, c, 2017b).Footnote 15

Consider now the kind of interplay that represents whether the object we deal with is really present and reliably manipulable, like the surface, or is just a pictorial one. This is a mainly ventral interplay. We can address the neural correlates here too. For example, the subcomponent A1 of the ventral stream given by the visual areas that are responsible for high-level object processing (e.g. inferotemporal areas and the lateral occipital area (LO) contained in the lateral occipital complex (LOC), etc.) (Chinellato and Del Pobil 2016; Briscoe and Schwenkler 2015) and the subcomponent B1 of the dorsal stream (e.g. inferior parietal lobule, ventro-dorsal stream) interact for specific purposes, which are the following. In this case, A1 triggers not only a semantic representation, but also–and this is the main difference with respect to the former interplay–its peculiar ‘response selection’, which is the process of high-level recognition of particular visual features (Sect. 4.2) by which the ventral stream attests that the object is a real, present one, and not a pictorial one, and thus offers reliable motor interaction (Westwood et al. 2002; Ferretti 2016c). Here B1’s processing is responsible for the computation of the information, used by the ventral stream, that is related to this high-level object reconstruction—we saw that some portions of the dorsal stream, the specific parietal areas mentioned in (Sect. 3.1), are also involved in recognition. Thanks to the process of response selection, the ventral stream is the pathway mainly involved in the selection of targets for reliable action, a process called action planning, though some of its subcomponents also select specific ways of interaction, with respect to the semantic information related to the object, which is used by the dorsal stream, as we saw, for the interplay above discussed.

Note that the dorsal stream alone, as well as the mainly dorsal interplay described above, cannot distinguish whether an object is real or pictorial (Westwood et al. 2002; Ferretti 2016a, b, c). On the basis of the geometrical properties detected, the mainly dorsal interplay represents the action possibilities recalled by the geometrical arrangement encoded and in relation to its semantic meaning, thanks to a minimal contribution of the ventral stream (which as said, in this case, is not related to response selection, but only offers a minimal contribution about semantic processing). Thanks to this representation, a set of motor acts is stored, as we saw, in a sort of neural visuomotor quiver, regardless of the fact that the object is real or depicted (Ferretti 2016a, b). Indeed, a mainly dorsal interplay can distinguish between the action possibility offered by a pen and the one offered by a handle, on the basis of their geometrical arrangement and their semantic aspect. It can also respond to these differences in the case of objects in pictures and can distinguish between graspable and non-graspable depicted objects (Rice et al. 2007; Ferretti 2016a). But it cannot distinguish between real and depicted objects (Westwood et al. 2002; Ferretti 2016a, c).

However, it is only by truly recognizing an object as real and present (i.e. as not depicted) that response selection triggers action planning toward it. Now, action planning depends on the mainly ventral interplay (see below). It is important to note that it is only when action planning is triggered toward an object,Footnote 16 and we decide to act on it, that we can effectively use the thin motor parameters computed by the mainly dorsal interplay, which are represented on the basis of the geometrical arrangement of the object, and are stored in our visuomotor quiver, to generate overt visuomotor interaction with it (Milner and Goodale 1995/2006; Zipoli Caiani and Ferretti 2016; Goodale and Milner 2004; Milner and Goodale 2008; Chinellato and Del Pobil 2016; Ferretti 2016b, c; Ferretti and Chinellato, In Press). This point is in line with the initial distinction between ‘ventral action planning’ and ‘dorsal motor programming’ suggested in the literature (Dijkerman et al. 2009; Milner and Goodale 1995/2006, 2008; Briscoe and Schwenkler 2015).Footnote 17 The former allows us to establish the nature of the object we face and, thus, whether it can be selected for action; it also establishes the general way to act with respect to our intentions (Milner and Goodale 1995/2006: 244). The latter computes the specific, thin motor parameters to be used for overt interaction (Ibid.).Footnote 18

That said, we have clear evidence that, though action planning (related to response selection) is a mainly ventral phenomenon, and motor programming is a mainly dorsal phenomenon, they are indeed subserved, respectively, by a mainly ventral and a mainly dorsal interplay. Consider first the evidence that motor programming is subserved by a mainly dorsal interplay. We know that not only is the lateral occipital complex (LOC) of the ventral stream crucial for high-level recognition (conscious or unconscious) (Chinellato and Del Pobil 2016: 2.4.1; Briscoe and Schwenkler 2015: 3.2.1), but that it is also involved in real-time visuomotor processing (Briscoe and Schwenkler 2015: 3.2.1). Indeed, the lateral occipital (LO) area contained in the LOC “contributes viewer-relative information about a target’s shape that augments the dorsal stream’s own bottom-up sources of input” (Briscoe and Schwenkler 2015: 1453).Footnote 19 This clearly suggests that ventral processing can also contribute, to some extent, to motor programming (p. 1437). But we should not forget that it is the visuomotor transformation performed by the dorsal stream, described above (Sects. 3.2, 3.3), that has the main role in computing, during motor programming, the specific, thin motor parameters for interaction. Consider now the evidence that action planning is subserved by a mainly ventral interplay. Even if ventral processing is the cutting edge of response selection and action planning, it is not, alone, sufficient for this task, in the light of the evidence (Sect. 3.1) that some dorsal areas are, to some extent, crucial for visuospatial awareness and recognition of objects presented in the peripersonal action space. 
Indeed, though high-level object recognition is mainly ventrally subserved, the information coming from dorsal processing (especially the IPL and the V-D) is crucial for ventral processing to accurately build representations concerning visuospatial awareness of objects (see the analysis by Brogaard 2011a: 1094 of Gallese 2007 and Bullier et al. 2001; Goodale and Milner 2004; Ferretti 2016c: Sect. 3; see also Sect. 3.1). This clearly suggests that dorsal processing is also, even if to a minimal extent, crucially involved in the manipulation of the information managed by ventral processing for object recognition, which is crucial to generate response selection and action planning. However, it remains true that the main role in establishing, through response selection, whether an object is real and can thus be selected for action planning,Footnote 20 is played by the computational resources of the ventral stream.

My analysis is important to understand the role of interstream interplay in generating seeing-in, i.e. to understand the difference between the mainly ventral interplay that detects presence, which, in usual picture perception, represents the surface as present for motor interaction and the depicted object as a non-present object, and the mainly dorsal interplay involved in the low-level action property attribution, which can be performed also with a depicted object, not only with real objects like the surface. In the next two sections (Sects. 4.1, 4.2), I will explain the relation between action property attribution, ascription of presence and visual recognition. This will explain why, though the representations of both the depicted object and the surface are given by interstream interplay, we visually feel only the surface as present, while representing the depicted object as a non-present, pictorial object. As we shall see, this representational equilibrium is crucial for us in order to enter seeing-in.

4.1 Interstream Interplay, Action Properties and Visual Presence

We can attribute action properties to present objects, like a surface (Sect. 3.3). But we can also attribute action properties to non-present objects, like a depicted object (Sect. 3.2). How can we have action property attribution both with and without the ascription of visual presence? In order to answer this question, consider here two examples of different interplays.

In the case of the surface, as with all the real objects we deal with, the mainly ventral interplay establishes its real presence and triggers action planning toward it. Then, should we decide to act, the computational operations realized by the mainly dorsal interplay, concerning action property attribution, can be used for overt action. But things become more interesting in the case of depicted objects. During the visual representation of the depicted object, the interplay responsible for the representation at the basis of high-level recognition of presence, which is mainly ventral, activates a response selection that establishes that the object is depicted. Thus, action planning, usually performed only with present objects, like the surface, is not triggered. As a result, in the visual representation of the depicted object, the visuomotor response given by the mainly dorsal interplay concerning action property attribution is triggered, but the congruent motor act cannot be performed.Footnote 21

At this point, the reader should note that saying that we can ascribe action properties to depicted objects does not mean, of course, that we perceive the possibility to reliably act upon them. It just means that some parts of our visuomotor brain (involved in generating the mainly dorsal interplay) directly respond to the geometrical configuration that, in the case of normal objects, would instantiate action properties (Ferretti 2016b: 3804, 2017c).Footnote 22 This is possible for a simple reason. Even without response selection and action planning, given by the mainly ventral interplay, the mainly dorsal interplay elaborates and stores the motor act related to the action property. Thus, action property attribution happens before the object is visually felt as present, and even without it being visually felt as present (Ferretti 2016b, 2017c). This is because dorsal vision cannot distinguish between normal and depicted objects (Ferretti 2016b; Westwood et al. 2002) and because dorsal responses are faster than ventral responses: due to the magnocellular advantage, the (mainly) dorsal low-level computations of the very specific motor parameters for action performance are triggered before the high-level (mainly) ventral processing involved in recognition occurs (Barrett and Bar 2009; Laycock et al. 2007). But the result of these computations concerning motor programming, performed by the mainly dorsal interplay, is stored and can be effectively used only after response selection and action planning are performed by the mainly ventral interplay involved in the encoding of presence (Briscoe and Schwenkler 2015; Briscoe 2009; Goodale and Milner 2004; Milner and Goodale 2008; Chinellato and Del Pobil 2016; Ferretti 2016b, c; Ferretti and Chinellato In Press; Zipoli Caiani and Ferretti 2016; Westwood et al. 2002).
So, in the case of depicted objects, these mainly dorsal computations for motor programming are not used and then decay.Footnote 23 Conversely, when we perceive the surface, response selection given by the mainly ventral interplay allows us to represent it as present. And this elaboration triggers action planning, which can subsequently make use of the previously stored information about the specific motor act we might actually use, represented by the mainly dorsal interplay.

Summing up, while, to some extent, in both of these interplays there can be action property attribution, these attributions are different. The mainly ventral interplay establishes whether or not there is a possibility for action planning, that is, whether or not the object we deal with is real and present for interaction. The mainly dorsal one establishes how, from the point of view of motor programming, we can effectively act on the object on the basis of the action properties elicited by its geometrical arrangement and elaborates the thin motor parameters regardless of whether such an arrangement pertains to a depicted or to a normal object. Even if the action property attribution related to the visuomotor parameters computed by the mainly dorsal interplay is activated in the case of both normal and depicted objects, it is only when the mainly ventral interplay recognizes that the object is present, as in the case of the surface, that action planning is triggered and the computations made by the mainly dorsal interplay can be used (should we want to act upon it) for overt motor action.Footnote 24 The information related to these mainly dorsal computations remains stored and cannot be used for overt action in the case of depicted objects (Sect. 3.2). Therefore, without the mainly ventrally subserved response selection, which establishes the real presence of the target for suitable action planning, every low-level action property attribution, mainly dorsally subserved, remains disconnected from the representation of presence (cf. Sect. 3.2).Footnote 25 It is always stored at the service of the mechanism of visual recognition of presence for action planning performed by the mainly ventral interplay.

All this explains why we can have action property attribution both with the ascription of visual presence, as in the case of the surface (Sect. 3.3), and without it, as in the case of the depicted object (Sect. 3.2).Footnote 26

Now, a mainly ventral interplay is crucial to recognize the surface as a present object. And this recognition is crucial to obtain seeing-in, i.e. to perceive a pictorial content as such, encoded in a surface represented as a real and present object. Without this recognition, given by this specific interplay, we would have an illusion of presence of the depicted object (as suggested in Sect. 3.4, but see also Sect. 4.3). The representation performed by the mainly ventral interplay in the case in which it tracks the presence of the surface is different from the representation performed by this interplay in the case of the recognition of the depicted object, which is recognition of a pictorial object that does not display any visual presence for interaction. The next section explains the different representational activities realized by the mainly ventral interplay in relation to these two perceptual situations. This will allow me to explain how it is possible to have visual recognition with ascription of visual presence, as in the case of the surface (Sect. 3.4), but also without it, as in the case of the depiction (Sect. 3.1).

4.2 Interstream Interplay, Visual Recognition and Visual Presence

Unlike recognition of presence for interaction, as in the perception of the surface (Sect. 3.4), pictorial recognition is recognition without the ascription of presence (Sect. 3.1). How can we have visual recognition both with and without the ascription of visual presence? Consider the interplay reported in Sect. 3.1, which subserves the recognition of the depicted object, and the one involved in the recognition of the picture’s surface, described in Sect. 3.4. In both cases of recognition, the interplay at work is a mainly ventral one, because it is the one involved in the recognitional ability to distinguish between objects that are present for suitable action planning, such as the surface, and merely pictorial objects. In what follows, I explain how the difference between these two cases concerns the representational result of the computations made by the mainly ventral interplay.

We already saw that response selection, performed by the mainly ventral interplay, visually discriminates whether an object is real or merely pictorial. Crucially, vision science has shown that response selection concerning the presence of an object is possible on the basis of the high-level visual detection of peculiarly enhanced visual features that only real objects seem to display in such an enhanced manner, whose detection allows ascribing absolute depth to the object, which, in turn, permits to ascribe presence, in line with the discussion provided in (Sect. 3.4). We are talking about the detection of an enhanced sense of vividness, glossiness, plasticity, spatial immersion of the object, which can be displayed, except for special cases (Sect. 4.3), only by real and present objects (Vishwanath 2014). In particular, response selection triggers action planning only with real objects because only the perception of real objects, like the surface, offers special visual features such as a sense of ‘real separation in depth’ of the objects in the visual scene, a ‘characteristic visual impression of solidity’ (Vishwanath 2014: 174), ‘a sense of clarity and visual sharpness’ which leads to ‘a more enhanced impression of color and color variation’ and the perception of ‘material qualities such as glossiness, shininess, roughness etc.’ (ibid.) as well as a sense of tangibility as an ‘impression of the manipulability of a real material object’ and of spatial immersion as an ‘impression of the capacity to move through a palpable negative space’ (ibid.; see also Vishwanath and Hibbard 2013; Barry 2009). 
All this allows us to perceive the ‘vivid sense of protrusion where a tangible solid object reaches or looms out through the negative space toward the observer’ (ibid.), which allows us to reach the peculiar visual sense of reality that in vision science is called the ‘plastic effect’ (Vishwanath 2011: 224, 225, 2014; Vishwanath and Hibbard 2010), which is related to the visual sense of a capacity to effectively interact with present objects in our peripersonal action space.

The visual representation of these features, which, as said, only real objects seem to display in such an enhanced manner, leads to the possibility of an ‘absolute egocentric depth localization’ of the object (Sect. 3.4), which is related to the possibility of representing the object as falling in the subject’s action space and as salient with respect to the observer’s motor action. Thus, the object is represented as present (Vishwanath 2011, 2014; Vishwanath and Hibbard 2010, 2013; Ferretti 2016c, 2017c, forthcoming). Therefore, the representation of such visual features is linked to response selection and action planning: when our visual brain detects them, we perceive the object as a manipulable, present object falling in our peripersonal action space (Vishwanath 2014; Ferretti 2016c). This explains how the peculiar nature of the feeling of presence, related to the perception of the possibility of reliable action detected by action planning, depends on a particular visual recognitional process, linked to response selection, of particular visual features.Footnote 27 The recognition of these features allows our visual system (our mainly ventral interplay) to detect the actual presence of a real object in our peripersonal action space and to detect an actual possibility of action that can be satisfied by the motor act previously computed by our mainly dorsal visuomotor system.

Now, the surface is a real object. Thus, in usual picture perception, we perceive only the surface as present because we attribute absolute depth and, thus, absolute egocentric localization only to it. This is in virtue of the fact that only the surface displays, as a real object, these peculiar visual properties in a particularly enhanced manner. Following vision science, this perception of the surface modulates our perception of the depicted object. Such a representation, related to the surface, of these peculiarly enhanced visual properties, which automatically leads to the ascription of absolute depth, egocentric localization and presence for reliable interaction, prevents us from visually ascribing those same peculiarly enhanced properties and, consequently, absolute depth and egocentric localization to the depicted object, an ascription that would carry the risk of falling into the illusion of its presence in our peripersonal action space (Sect. 3.4; Ferretti 2016c, 2017c, forthcoming).Footnote 28

Indeed, for this reason, except for special cases of pictorial illusion in which the surface is not visible (Sect. 4.3), even the most wonderful depicted objects do not display these features in the same enhanced manner as real objects, like the surface, do. Thus, with depicted objects, we can partially access (i.e. we access in a less enhanced manner) these visual features: pictorial vividness, pictorial color variation, pictorial shininess (etc.) are always less enhanced than the vividness, color variation and shininess (etc.) real objects display. Given this, we cannot ascribe to normal, non-trompe l’oeil, depicted objects absolute egocentric depth (Vishwanath 2014; Vishwanath and Hibbard 2013; Barry 2009; Ferretti 2016c). For this reason, depicted objects only display pictorial presence, not real presence. Therefore, with them we have a complete absence of the ‘plastic effect’ (Vishwanath 2011: 225, 227; for a complete philosophical analysis of this evidence see Ferretti 2016c: 2.4).Footnote 29

Of course, as we have seen, visual recognition, even of high-level nature, is crucial for the appreciation of depicted objects, especially in the case in which we deal with wonderful naturalistic and painterly (non-trompe l’oeil) pictures, whose complex pictorial arrangement stimulates our thin visual recognition (as well as our emotional responses, Ferretti 2017a). Nonetheless, we do not visually represent them as present precisely because this form of recognition is not the same as the one concerning the perception of the surface and related to the detection of visual presence. On the one hand, we talk about pictorial recognition: recognizing complex features of a non-trompe l’oeil depicted object–no matter how fine grained this recognition may be and how complex the painting is–is, still, recognizing these features as pictorial (e.g. pictorial vividness, pictorial color variation, etc.), as in the case concerning the interplay reported in (Sect. 3.1). This is, one might say, recognition about presence in the realm of pictures, or pictorial presence (Aasen 2015). On the other hand, we talk about recognition concerning presence for motor interaction, which relies on the detection of particular visual features that, as suggested, depicted objects, and even complex paintings, cannot display. This is why we recognize (non-trompe l’oeil) paintings–even those able to stimulate our visual recognition with a very well-made pictorial arrangement and with accurate visual features (e.g. particular colors, geometry, etc.)–as pictorial objects: with them, we lack the possibility of detecting and recognizing those particularly enhanced visual features responsible for the perception of absolute depth and, in turn, of visual presence, with the consequent representation of suitable action planning. Our visual system–i.e. the mainly ventral interplay–normally attributes these peculiar visual features only to real objects like the surface.Footnote 30

Now, both pictorial recognition (Sect. 3.1) and the recognition of presence related to the surface (Sect. 3.4) are subserved by a mainly ventral interplay, whose cortical portions (Sect. 4.1) are activated in both cases. At this point, the reader should note that the crucial difference between these two cases lies at the representational level.

In the case of pictorial recognition, related to the depicted object, the activation of the mainly ventral interplay subserves response selection, which cannot track the peculiar visual features, described above, that are responsible for the ascription of visual presence. Thus, this interplay recognizes that the object is a pictorial, non-present object and action planning is not triggered. By contrast, in the case of recognition of presence related to the surface, the activation of the mainly ventral interplay subserves response selection, which tracks the peculiar visual features described above that are responsible for the ascription of visual presence. Thus, this interplay recognizes the object as a real and present one and triggers action planning toward it. Again, it is the possibility of building this representation of the surface as present that leads us to represent the depicted object as pictorial. The equilibrium between these two representations allows us to enter pictorial experience, i.e. seeing-in (Sect. 3.4; Ferretti 2016c, 2017c, forthcoming).Footnote 31

This explains how it is possible to have visual recognition both with the ascription of visual presence, as in the case of the surface, and without it, as in the case of the depicted object. And, as we shall see, in line with the idea of simultaneity expressed in the literature (Sect. 1), the interplay involved in the recognition of the depicted object (cfr. Sect. 3.1) is usually conscious, but might be, in particular cases, unconscious (Nanay 2011, 2017), while the one related to the recognition of the surface as a present object we can interact with (cfr. Sect. 3.4) is usually unconscious, but can be consciously accessed,Footnote 32 with some constraints that I will specify in (Sect. 4.4).

Summing up, coupling this point with the one proposed in (Sect. 4.1), the theory developed here suggests that in both the perception of the surface and that of the depicted object there are different interplays respectively responsible for recognition, ascription of presence and action property attribution. In the case of the surface, the mainly dorsal interplay computes the action properties, while the mainly ventral interplay recognizes, through response selection, the object as a present one we can interact with, on the basis of its peculiar visual features. Thus, action planning can be triggered by the mainly ventral interplay toward the surface, and the thin computations concerning overt motor interaction, performed by the mainly dorsal interplay, can be effectively used. In the case of the depicted object, the mainly dorsal interplay computes the action properties, but this representation remains stored in the motor quiver and cannot be used. This is because the mainly ventral interplay recognizes the object as being a pictorial one. Thus, response selection cannot trigger reliable action planning. Note that saying that response selection triggers suitable action planning does not mean that we will effectively act on the object recognized as a real one, but only that we recognize the possibility for reliable interaction.

All this suggests that the mainly ventral interplay ascribes presence for interaction only to the surface.Footnote 33 This is a countervailing factor that blocks the feeling of presence of the depicted object. For this reason, in line with (Sect. 3.4), the mainly ventral interplay represents the depicted object as non-present. As I will explain in the next section, in the case of trompe l’oeil perception–as well as in specific experimental settings able to avoid surface visibility–no countervailing factor of this kind is possible. Thus, what actually is a depicted object is visually felt as present.

4.3 Interstream Interplay and Trompe l’oeil

As said in Sect. 3.4, sometimes the representation subserved by the mainly ventral interplay can attribute presence to the depicted object. This happens only when the surface is not visible, because the picture is skillfully constructed in such a way that our visual system cannot detect its surface. Following the evidence from vision science, it seems that, only in this case, due to the invisibility of the surface, can the peculiar visual features described above (Sect. 4.2), related to absolute egocentric localization, the plastic effect and, thus, visual presence, be fully ascribed, due to the illusion, to the depicted object. Thus, the depicted object looks present for reliable motor interaction (Vishwanath and Hibbard 2013: 1682–1683; Vishwanath 2011, 2014: 159–160; Ferretti 2016c). This happens, for example, in the case of trompe l’oeil pictorial illusions. Now, while both normal picture perception and trompe l’oeil perception are subserved by interstream interplay, there is a main difference between these two perceptual situations.

In the case of trompe l’oeil pictures, the mainly ventral interplay responsible for high-level visual recognition of peculiarly enhanced visual features, as well as for the consequent ascription of visual presence and suitable action planning, cannot visually represent the surface, which is not visible. This leads this interplay, as said, to attribute visual presence to the pictorial object (in line with Sect. 3.4). In this special illusory case, with respect to the depicted object, this interplay works as it usually does with the surface in the case of usual (non-trompe l’oeil) picture perception, or with a normal object during face-to-face perception, where no surface is present. Conversely, in normal cases of picture perception, this same interstream interplay is always able to ‘find’ (detect, or track the presence of) the surface, i.e., to visually represent the surface as present. As seen, this perceptual fact rules out the possibility of an attribution of visual presence to the pictorial object, which is recognized as such. Therefore, in the trompe l’oeil case, the visual feeling of presence still depends on the representational activity of the mainly ventral interplay, which, due to the illusion, is attuned to the depicted object, because it cannot find the surface. This is, indeed, a misrepresentation, concerning the processing of response selection and action planning, about the presence of the depicted object. Recall that the action property attribution of the mainly dorsal interplay is active even with depicted objects (Sects. 3.2, 4.1). Thus, its behavior cannot be responsible for this deception: normal and depicted objects make no difference to the mainly dorsal visuomotor responses.
Only the representational activity of the mainly ventral interplay is responsible for the distinction between present and pictorial objects, and for the trigger of action planning with present objects, which can subsequently make use of the computational results of the mainly dorsal interplay. Thus, only a misrepresentation of this interplay can lead to a breakdown in picture perception, like the one reached with trompe l’oeils.Footnote 34

4.4 Interstream Interplay and Conscious Accessibility

We saw that the recognition performed by the mainly ventral interplay can be, in general, conscious or unconscious (Sect. 3.1).Footnote 35 We also saw that the visual representation of the surface as present, subserved by the mainly ventral interplay, is what allows us to have usual pictorial perception, avoiding the illusion of presence of the depicted object, as in the case of trompe l’oeils. Here I suggest that, in ordinary picture perception, we need to have an at least unconscious representation of this kind. This is in line with the notion of simultaneity reported in the literature (Sect. 1). I also suggest the possibility of a conscious accessibility of this representation and explain the relation between my commitment to this possibility and my commitment to the notion of inflection endorsed by the DVAPP. So, why do we need an at least unconscious representation, subserved by the mainly ventral interplay and involved in the visual detection of presence, related to the surface?

In order to understand this point, we can consider the case of trompe l’oeils. We are consciously visually focused on the pictorial object which, however, looks present and not pictorial at all. This is because our visual brain cannot visually represent (either consciously or unconsciously) the presence of the surface.Footnote 36 For this reason, we cannot even shift our conscious visual attention to the surface.Footnote 37 If we had, at least, an unconscious representation of the surface, this shift would be possible by consciously accessing this unconscious representation.Footnote 38 Thus, we would not be fooled. This suggests that, in order not to be fooled into perceiving the depicted object as present, we must be capable of relying, at least, on an unconscious visual representation of the surface, given by the mainly ventral interplay responsible for the detection of visual presence. The presence of this unconscious representation is what allows us, with normal pictures, to remain consciously visually focused on the depicted object, without experiencing any visual feeling of its presence, but with the possibility of shifting our conscious visual attention to the surface. Indeed, if the surface can be visually represented at least unconsciously, even if most of the time our conscious visual attention ignores it, we can always subsequently direct our conscious visual attention to it (Nanay 2011: 473). This is in line with the idea that even an unconscious perception of the surface can deeply influence the way we consciously perceive the depicted object (Nanay 2011, 2017; Ferretti 2016c, 2017c, forthcoming): the unconscious perception of the presence of the surface rules out the possibility of consciously perceiving the depicted object as present.
This also implicitly suggests that we are fooled by trompe l’oeil not because we visually represent the surface unconsciously, but because we also lack an at least unconscious representation, subserved by the mainly ventral interplay, of the surface: we cannot rely on any representation of it. Indeed, if we were tempted to suppose that with trompe l’oeils we unconsciously represent the surface, then usual picture perception (during which we usually represent the surface unconsciously) and trompe l’oeil perception would be the same phenomenon from the point of view of the notion of simultaneity of the visual representations advocated in the literature (Sect. 1). Both of them would depend on an unconscious representation of the surface and a conscious representation of the depicted object. However, only during trompe l’oeil perception does the object look present. Thus, the hypothesis cannot be right: the trompe l’oeil effect depends on a lack of an at least unconscious representation of the surface.

One might argue that this idea recalls the famous point advocated by Gombrich (1960), but it is not exactly so. Gombrich holds that either we ‘see’ the picture’s surface or we ‘see’ the depicted object and ‘seeing’ both of them at the same time is not possible. The alternation here is about ‘seeing’ in general and concerns any kind of (conscious or unconscious) visual representation: “is it possible to ‘see’ both the plane surface and the battle horse at the same time? (…) the demand is for the impossible. To understand the battle horse is for a moment to disregard the plane surface. We cannot have it both ways” (Ibid.: 279). The shift I am talking about only concerns conscious visual attention, and here it is assumed that we are always visually representing (in any sense of visually representing, which can be either conscious or unconscious) both the surface and the depicted object (something not possible in Gombrich’s account, see Nanay 2017), although, as suggested at the start, we do not consciously visually attend to the surface, but only to the depicted object (Nanay 2011, 2015, 2017). Thus, the point made here is neither in accordance with Gombrich’s idea, nor in conflict with the interpretations of Wollheim’s account of seeing-in offered in the literature on picture perception (Nanay 2011, 2015, 2017; Lopes 2005; Cavedon-Taylor 2011; Hopkins 2010, 2012).

Although my commitment in this paper is only about the simultaneity concerning the conscious visual representation, given by the mainly ventral interplay, of the depicted object (i.e. pictorial recognition) and the at least unconscious representation of the surface (i.e. recognition of presence for interaction), as the reader can note in the lines above, I also endorse the possibility of a conscious accessibility of the representation of presence, given by the mainly ventral interplay, related to the surface.Footnote 39 What is the relation between this possibility and the conscious perception of the depicted object? At this point, I should say something about the notion of simultaneous consciousness of both the surface and the depiction. The DVAPP follows the notion of representational simultaneity reported above, but also endorses that we might sometimes be conscious, at the same time, of both the surface and the depicted object. In these cases, some properties of the surface are (consciously) represented as design-scene properties (Nanay 2010: 203) by the cortical portion of our visual system that is involved in conscious visual object recognition, that is, according to the DVAPP, the ventral stream (p. 202). This is supposed to count, according to the DVAPP, as inflected seeing-in (Nanay 2010). It has been suggested that this perceptual fact about simultaneous consciousness is implausible, because it would lead to a very odd pictorial experience (Hopkins 2010, 2012; Nanay 2017: Sect. 2). It is debated whether, and to what extent, inflection is either necessary (Voltolini 2013), or only possible (Nanay 2010; Lopes 2005), or problematic (Hopkins 2010, 2012), and about whether having a conscious representation both of the vehicle and of the depicted object is a necessary condition of inflection (for different positions see Nanay 2010, 2011, 2017; Lopes 2005; Hopkins 2010, 2012; Voltolini 2013; Cavedon-Taylor 2011). Recently, even Nanay (2017: Sect. 2) has suggested that if we endorse the notion of simultaneous representation, without endorsing the notion of simultaneous consciousness of both the surface and the depicted object, we can avoid the problems addressed in the literature (Hopkins 2010, 2012).

First of all, I do not want to take a stand on the issue about whether having a conscious representation of both the vehicle and the depicted object is a necessary condition for inflected seeing-in (for the debate see Nanay 2011, 2017; Hopkins 2010, 2012; Voltolini 2013; Cavedon-Taylor 2011).

Second, I am not defending the idea that we can have a conscious representation both of vehicle and of the depicted object simultaneously (independently of whether this counts as inflection or not). As said, I endorse the possibility of a conscious accessibility of the representation, given by the mainly ventral interplay, of the surface as a present object we can interact with. Thus, I endorse the possibility of shifting our conscious visual attention from the depicted object to the surface, which is, most of the time, at least unconsciously represented. But this does not entail the idea that we simultaneously consciously represent both the surface and the depicted object. I am just saying that, while we often consciously see the depicted object, while unconsciously visually representing the surface, it is also possible that we consciously access the representation of the surface. However, at this point, i.e. when we consciously perceive the surface, we are not consciously representing the depicted object anymore. This is perfectly in line with the notion of simultaneity about representation, not about consciousness, endorsed in the literature (Nanay 2011, 2015, 2017). Unconsciously representing the surface as present, through the mainly ventral interplay, is related here to the notion of ‘surface-seeing’, which is not ‘design-seeing’ (the latter being, according to the DVAPP, the one related to inflection; as specified, a commitment that I do not endorse here) (Lopes 2005: 37; Cavedon-Taylor 2011: 275; Nanay 2010: 203; Voltolini 2013: Sect. 3). Indeed, I am not saying that the conscious accessibility of ‘surface-seeing’ occurs along with a conscious representation of the depicted object so that, at this point, some properties of the surface are seen as design-properties. 
My point is in line with the idea that our recognitional visual apparatus usually represents the surface unconsciously and can, sometimes, consciously represent the surface without representing ‘design-scene’ properties (Nanay 2010: 203). And this conscious representation of the surface does not occur with the conscious representation of the depicted object. In line with my discussion of trompe l’oeils (Sect. 4.3), and in relation with the notion of simultaneity expressed above, this recalls the idea by Lopes that “a picture may only trompe l’œil when it suppresses not just design seeing but also surface seeing, for seeing face to face is not surface seeing” (2005: 37; Cavedon-Taylor 2011: 276–278).

Now, even if I am neutral about the possibility of simultaneous consciousness and inflection, my point is important because it offers an explanation of the nature of the conscious representation of the surface as present, which, as explained, depends on the conscious accessibility of the mainly ventral interplay. This explanation is, in principle, relevant for all parties in the literature, for it is important for all those who want to explain the nature of the conscious representation of the surface as present, regardless of whether they endorse that this representation can occur along with the conscious representation of the depicted object.

5 The Advantages of My Account

Now, we can see the main differences between my account and the DVAPP. Following the assumption of dissociation between the streams, the DVAPP suggests that ventral vision represents the depicted object and, possibly, also the surface. Dorsal vision, conversely, cannot represent the depicted object, but only the surface. This is supposed to explain why, in picture perception, we feel only the surface as present for interaction: only this component of the picture can be represented dorsally. This explanation, however, does not take into account some crucial empirical facts shown by the current literature on the visual streams, which are, indeed, meticulously taken into account by my proposal.

First, dorsal representations of action properties are triggered even with depicted objects (Sect. 1): both the surface and the depicted object can be dorsally represented. Therefore, we cannot explain the fact that the depicted object does not look present, but only the surface does, by claiming that dorsal vision can represent only the latter, but not the former.

An investigation into the nature of picture perception has to invoke, then, the evidence on interstream interplay. As I suggested, the fact that we do not perceive the depicted object as present is due to the fact that we perceive the surface as such (Sects. 3.4; 4.3). The crucial role in such representations is played by a mainly ventral interplay (Ibid.). Thus, neither dorsal visuomotor processing alone nor the mainly dorsal interplay is sufficient for this task (Sects. 4, 4.1), though the dorsal stream has a minimal role in it.

This also shows that trompe l’oeil experience cannot be explained, as the DVAPP suggests, by saying that only in this case do we have a dorsal representation of the depicted object: this representation is always at work also during usual, non-trompe l’oeil picture perception. The explanation is that, in this case, the mainly ventral interplay cannot track the presence of the surface, qua not visible. Thus, it is attuned to the depicted object (Sect. 4.3).

In light of what I said, the explanation provided by DVAPP is the best we have if we accept dissociation (Ferretti 2016b, 2017a, b). But it is not the right one if we extend our explanation to the new evidence about interstream interaction, which must be invoked by a philosophical account of picture perception that aims to be an empirically informed one.

We saw that, whereas it is in principle true that ventral processing is mainly (but, as said, not totally or exclusively) involved in recognition and planning, which can be conscious or unconscious, while dorsal processing is mainly involved in motor programming for action guidance, which turns out to be only unconscious (Kozuch 2015; Clark 2009; Ferretti 2017b), each stream contributes, to some extent, and with respect to different contexts and tasks, to the specific functional operations for which the other one is specialized (Chinellato and Del Pobil 2016: Sect. 2.4.1, 2.3.1.3). Thus, we must talk about different forms of interstream interaction, rather than simply dorsal or ventral processing. That said, we should bear in mind that: “interconnections do not imply duplication of function” (Milner and Goodale 2010; see also Clark 2009; Ferretti 2017b): ventral areas represent (with the dorsal contributions needed) the cutting edge of recognition, while dorsal areas represent (with the ventral contributions needed) the cutting edge of motor processing. This is in agreement with the evidence that both streams can be involved, respectively, in very different visual activities, albeit respectively mainly inscribed in object recognition and action guidance (Kozuch 2015; Clark 2009; Ferretti 2017b).

Furthermore, both streams are complex ensembles of different cortical portions with different neural families, interacting in different contexts and with different computational results (Kravitz et al. 2011, 2013; Ferretti 2016b, c; Chinellato and Del Pobil 2016). Therefore, talking about dorsal/ventral interactions only in general terms, without mentioning the interactions between these subportions, and distinguishing between a mainly ventral and a mainly dorsal interplay, with their representational context dependence, would have been improper.Footnote 40

In light of this evidence, it is safe to say that the neural dynamics of seeing-in are shaped by different forms of interstream interplay in relation to different tasks.

My proposal can take into account all these things together. I offered a full explanation of the behavior of the mainly dorsal interplay and of the mainly ventral interplay, in relation to the perception of the surface and of the depicted object and concerning action property attribution, visual recognition and the ascription of visual presence. This explanation respects the complexity of these interactions.

The analysis of the activity of these neural mechanisms suggests that the real visual difference between depicted objects and real objects is due to the computational activity of the mainly ventral interplay, which tracks the presence of the surface and, in doing so, avoids the possibility of ascribing presence to the depicted object. Therefore, it plays a crucial role in shaping seeing-in.

Summing up, there is an equilibrium between the recognition of the surface as present and the recognition of the depicted object as non-present, both subserved by a mainly ventral interplay. This equilibrium is always maintained in normal picture perception, but can be broken by trompe l’oeil pictures. This explains both the relation and the difference between trompe l’oeil, face-to-face and normal picture perception.

6 Conclusion

The DVAPP is our best model of picture perception in the light of the TVSM. However, it relies on the notion of functional dissociation between the streams, and this notion has recently been questioned on both conceptual and experimental grounds (Sect. 2). My account inherits the philosophical idea proposed by the DVAPP, but extends it in the light of the new evidence on interstream interaction from the neurophysiology of vision. If my explanation is not endorsed, all the experimental results about interstream interaction, which are crucial to explaining how we can obtain seeing-in, remain unexplained under the DVAPP. At the same time, the philosophical notion of seeing-in proposed by the DVAPP is maintained. My account is important because explaining how we visually represent the surface and the depicted object is desirable for any account of seeing-in (Hopkins 1998, 2010; Nanay 2010). Moreover, offering such an explanation by following the most up-to-date evidence from the TVSM is the main aim of an empirically informed philosophical theory of picture perception like the DVAPP (Nanay 2011, 2015; Ferretti 2016a, c, 2017a, b). To this extent, my proposal has explained the complex neural dynamics of seeing-in by reconciling our best philosophical theory of picture perception with the contemporary neurophysiology of vision. In doing so, I have been inspired by the idea that only a combination of empirical and philosophical research can give us a satisfying theory of picture perception in particular (Nanay 2015), as well as a well-founded theory of perception in general (Block 2014: 570–571).Footnote 41