1 Introduction

Representationalism about perceptual experience is driven by two observations. First, that perceptual experience often puts us in contact with the world around us: when it is successful, we are aware of objects and their properties. Second, that perceptual experience can fail to constitute such awareness: when we fall prey to illusion or hallucination, we are not immediately aware of objects and their properties. It is suggested that we can reconcile these two observations by thinking of perceptual experience, on the model of intentional states like belief or desire, as fundamentally a matter of entertaining content. When a subject is perceptually aware of a material object, for instance in a veridical experience as of a blue ball, their experience is fundamentally one of entertaining content of a certain sort, and it would remain one of entertaining content (albeit perhaps a different, object-independent content) were they to experience a subjectively indistinguishable hallucination. In this way, perceptual experiences may be occasions for the awareness of objects and their properties even if, qua instances of that mental kind, they are not fundamentally occasions of such awareness.

In making their characteristic claims about the nature of perceptual experience, representationalists are not simply claiming that perception involves the transmission of content through some level of one’s mental economy. Their claim is that perceptual experience is a conveying to the subject that their environment is such-and-such a way, and, simply by virtue of so-conveying, that things are such-and-such a way is available for judging, doubting, etc. [Footnote 1] As Wilson (2018) puts it:

Part of the appeal of representationalism stems from the intuitive idea that every experience has a […] ‘face value’ at which it may be accepted or declined […]. It follows from this conception of experience that they convey […] a representational content whose accuracy conditions describe the circumstances under which that experience may be considered veridical (202).

To refute this general claim about the interface between perceptual experience and cognition, it is not enough that perceptual experience be richer or more fine-grained than thought (Speaks 2005; Tye 2006). The idea is not that the total content of a perceptual experience may be endorsed or doubted in the form of one complex judgment. It is that the subject is in a position, perhaps by attending or deploying demonstrative thought-vehicles, to endorse or doubt any given aspect of that total content.

The question we will be concerned with here is whether perceptual experience might present a subject with an object despite her inability to represent it in thought. The working hypothesis has naturally been that perceptual experience is not cognitively elusive in this sense: if an experience represents an object as having some property, perhaps even in some analog, iconic, non-propositional or nonconceptual format, it is possible on the basis of that experience for the subject to author a perceptually-based judgment like That is F (or The F is G), where the extension of the subject term is the object represented in the relevant way.

Face value:

If one’s perceptual experience has content which involves, is satisfied by, or is otherwise about an object, one is in a position, by attending and taking one’s experience at face value, to make a perceptually-based judgment (or other cognitive attitude) that involves or is satisfied by that object.

By ‘involves’ we mean something like ‘is singular with respect to’ or ‘is partly constituted by’. While we shall remain neutral on the nature of perceptual-experiential content, we assume a broadly structured ‘Russellian’ view of content at the level of thought, for convenience. [Footnote 2]

We take face value to be a plausible and widely held thesis among representationalists. For instance, Burge (2010a), while holding that the format of perceptual-experiential representation is iconic (174, n. 48; 540), [Footnote 3] appears to endorse it when he writes:

the occurrent singular elements in perception—what I call singular, context-bound perceptual applications—are also connected to occurrent singular elements in a propositional content—what I call singular, context-bound applications in thought. […] The latter can take over the referents of the counterpart perceptual applications (546). [Footnote 4]

Burge’s claim is that appropriately-formed perceptual judgments inherit representational features from perceptual experience, in particular their referents. Perceptual experiences carry content which might be illustrated as follows: ThatF (‘That’ being what Burge calls the singular applicational element and ‘F’ the attributive). A corresponding perceptually-based judgment, That is F, will express a proposition with the very object(s) and property instance(s) represented in the perceptual experience as constituents.

In this paper we show that face value is one of a package of orthodox representationalist claims which are in tension. The remainder is structured as follows. In Sect. 2.1 we recover a principle which links the content of visual experience with facts about what one sees. Section 2.2 then presents a case of seeing, adapted from Anscombe (1974). Along with face value, the principle in Sect. 2.1 leads us to predict that the subject in the case is in a position to make a perceptually-based judgment involving or satisfied by the objects she sees. But this prediction is not borne out. After articulating the tension in the form of an explicit argument (Sect. 3), we explore several possible solutions in Sect. 4. Each reveals something important about perceptual experience and its interface with cognition and related phenomena.

2 Seeing and content, in two cases

Face value posits a semantic connection between the content of a perceptual experience and the content of cognitive attitudes (such as judgment) which are available to its subject. In this section we recover the following thesis from the literature: if a subject sees an object, the content of her visual experience involves, is satisfied by, or is otherwise about that object. Given face value, we then have a sufficient condition for such content at the level of perceptual judgment. This will form the basis for the puzzle presented in Sect. 3.

2.1 The mirror argument

When we are characterizing the content(s) of a subject’s perceptual experience, does their seeing an object entail that their visual experience represents it as being a certain way? An affirmative answer is suggested by standard uses of the following kind of case.

Case 1

Sally is looking ahead at what is, unbeknownst to her, the reflection of a cube in a cleverly disguised mirror placed at a 45° angle. Although the reflected cube is white, lighting conditions make it appear yellow. Obscured by the mirror and directly behind it is an exactly similar cube which is yellow.

As Grice (1961: 69–70) concluded, Sally sees the white cube and does not see the yellow cube. [Footnote 5] A lesson to be drawn is that what we shall call austere descriptivism about the content of perceptual experience cannot be right. For according to austere descriptivism,

we can take perceptual content to be existentially quantified content. A visual experience may present the world as containing an object of a certain size and shape, in a certain direction, at a certain distance (Davies 1992: 26).

If this view were correct, Sally’s perceptual experience would be veridical: there is a yellow cube of the sort she takes there to be in the location in which she takes there to be one. Yet what Sally sees is the white cube, and so the content of her visual experience must characterize it as being some way (namely, yellow). Since her perceptual experience therefore cannot be veridical, austere descriptivism must be incorrect.

…according to [austere descriptivism], my experience is accurate or veridical. It ‘says’ there is a yellow cube located in front of me, and there is such a cube. But I do not see that cube. I see something […] that does not have the properties in question (Tye 2009: 79).

In sum, as the standard line of argument goes, austere descriptivism mischaracterizes the accuracy-conditions of perceptual experiences, and it makes faulty predictions of veridicality as a result. [Footnote 6]

This ubiquitous form of argument assumes a tight connection between one’s seeing an object and one’s visual experience being veridical to the extent that (inter alia) that object has the properties one’s experience represents it as having. The claim that although the subject sees the white cube her visual experience is veridical, being satisfied by the yellow cube, just seems confused (Searle 1983; Soteriou 2000). In what follows we adopt this connection between seeing and veridicality conditions. Without it we lose much of our intuitive grip on the notion of veridicality conditions in visual experience.

Seeing:

If at t one sees an object o, at least part of the content of one’s visual experience at t involves, is satisfied by, or is otherwise about (inter alia) o. [Footnote 7]

Besides austere descriptivists, of whom there are few, we know of no representationalists who explicitly deny seeing. [Footnote 8]

2.2 Anscombe’s matchboxes

There is a different kind of case which seems to suggest a more fractured connection between seeing and the veridicality conditions of visual experience, at least assuming face value.

Case 2

A stereoscope apparatus with two eyepieces is so contrived that two exactly similar matchboxes, B and C, suitably placed in front of a subject with binocular vision appear as just one matchbox. Sadie puts on the apparatus and, so it seems to her, has an experience as of one yellow matchbox a few feet ahead which she is viewing with both eyes.

As Anscombe noted in her original presentation of this sort of case, “one can ask here, ‘Which matchbox am I seeing?’ and [we] ought to say that we see both matchboxes” (1974: 68).

We agree with Anscombe’s claim about what Sadie sees here (and, as indicated below, we are not alone). Notice, for instance, that if one of the matchboxes is illuminated so as to appear red, Sadie’s visual experience will alter: the difference between what each eye receives as input will result in binocular rivalry. Though we do not rest our confidence in Anscombe’s judgment on the following consideration, here is an appealing way to think about things. If Sadie closes her left eye, what does she see? Obviously, she sees a matchbox. Likewise if she closes her right eye. Now with both eyes open, why, to echo Parfit’s words (1971: 5), should a double-success be a failure?

In a case of this sort, although the subject sees both objects she is, intuitively, in no position to entertain perceptually-based singular judgments about either of them. At first pass, the problem is that perceptual judgments like That is yellow authored in these conditions require (at least) that there be a unique relevant causal source of the perceptual experience as of the purported demonstratum. [Footnote 9]

It seems impossible in these circumstances [to have] thoughts about one rather than another of the two match boxes. There seems to be no relevant relation he bears to either one which might allow him to think about it which he does not also bear to the other (Peacocke 1981: 196).

Discussing a structurally similar case, Hawthorne and Scala (2000) offer the following diagnosis:

There is no qualitative change that [B] could undergo such that it creates perceptual life of type x in [Sadie] but if [C] had undergone that change, a different type of perceptual life would be created. [It] is this indifference that makes [Sadie] unable to […] have perceptual demonstrative thoughts about [B or C] in particular (200).

So Sadie sees both B and C but is unable to author a perceptual judgment which is singular with respect to either matchbox. [Footnote 10] In that case, what is the content of the perceptual judgment about B and C which face value predicts (given seeing) Sadie is in a position to author?

While the definite descriptive perceptual judgments Sadie may be disposed to form (e.g. The object in front of me is F) are not uniquely satisfied and so cannot help, it may be proposed that there is indefinite descriptive content satisfied by B and C which can preserve the link in question. This indefinite descriptive content could then be the content of Sadie’s perceptually-based judgment, and since both matchboxes would be in the extension of ‘An object in front of me…’ the link posited by the conjunction of seeing and face value would be preserved. In the remainder of this section we illustrate that this proposal will not work. Let us combine Cases 1 and 2 to see why.

Case 3

Elizabeth is looking through the binocular apparatus from Case 2 towards a cleverly disguised mirror, which is placed at a 45° angle, behind which is a single yellow matchbox, A. Reflected on appropriate parts of the mirror’s surface, so as to give the appearance of just one matchbox to the wearer of the apparatus, are two white matchboxes to Elizabeth’s right, B and C, both illuminated by a yellow light.

In Case 3, the indefinite descriptive content An object in front of me is F will be satisfied by A, and not by B or C. Yet Elizabeth sees B and C, not A. Such content therefore cannot underpin the link entailed by the conjunction of seeing and face value. While An object causing this experience is F would offer a content for perceptual judgment satisfied by both B and C (and not by A), that is not much of a vindication of face value. These judgments are not made available simply by the having of the experience but by reflecting on the experience at a distance. As Soteriou (2000) has put it (in a different context), “[w]hen we have a visual experience it just does not seem to us as if we are aware of a causal relation between the apparent object of experience and our experience” (181).

Now we have a tension. We were led to conclude from the commonplace use of Case 1 that if one sees an object it must somehow figure in the veridicality conditions of one’s perceptual experience (seeing). In conjunction with face value this gives us the thesis that if one sees an object then one is in a position to author a perceptually-based judgment which is about the object one sees, either (i) by virtue of its being a constituent of the content judged or (ii) by virtue of its satisfying a definite or indefinite descriptive element of one’s judgment. Case 2 then seemed to provide a clear counterexample to expectation (i), and Case 3 to (ii). So when one sees an object, it sometimes cannot figure in the veridicality conditions of any perceptually-based judgments. But face value (plus seeing) told us to expect that it can! Something has gone wrong.

3 Formalizing the puzzle

This section makes the foregoing reasoning explicit. Section 4 then explores the most promising strategies for escaping the puzzle and, for each, the lesson(s) we learn should we choose to embrace it.

(1) If the content of one’s perceptual experience involves, is satisfied by, or is otherwise about an object, one is in a position, by attending and taking one’s experience at face value, to make a perceptually-based judgment (or other cognitive attitude) that involves or is satisfied by that object. (Face value)

(2) If at t one sees an object o, at least part of the content of one’s visual experience at t involves, is satisfied by, or is otherwise about (inter alia) o. (Seeing)

(3) Elizabeth sees B and sees C.

(4) Therefore, Elizabeth is in a position, by attending and taking her experience at face value, to make a perceptually-based judgment that involves or is satisfied by B and C. (From 1, 2, 3)

(5) Elizabeth cannot make perceptually-based singular judgments (by taking her visual experience at face value) with either B or C as constituents.

(6) Elizabeth cannot make perceptually-based definite descriptive judgments (by taking her visual experience at face value) that are (uniquely) satisfied by either B or C.

(7) Elizabeth can only make perceptually-based indefinite descriptive judgments (by taking her visual experience at face value) that A satisfies at least as well as B or C (and so which cannot underpin the link with seeing given that she does not see A).

(8) Therefore, it is not the case that Elizabeth is in a position, by attending and taking her experience at face value, to make a perceptually-based judgment that involves or is satisfied by B and C. (From 5, 6, 7)

Contradiction. (From 4 and 8)
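
The structure of this argument can be checked in a proof assistant. The following Lean 4 sketch is only one way of regimenting steps (1) to (8), and it rests on simplifying assumptions that belong to the sketch rather than to the text. The predicate names are illustrative labels. Premise (7) is rendered as the claim that no indefinite descriptive judgment of the link-preserving kind is available for B or for C. And the step to (8) is made to depend on an explicit exhaustiveness premise (`kinds`), on which singular, definite descriptive, and indefinite descriptive judgments are the only candidates.

```lean
-- Illustrative regimentation of the puzzle; all names are hypothetical labels.
section Puzzle

variable (Obj : Type) (B C : Obj)
-- Sees o     : Elizabeth sees o
-- About o    : the content of her visual experience involves or is satisfied by o
-- CanJudge o : she can make a perceptually-based judgment involving or satisfied by o
variable (Sees About CanJudge : Obj → Prop)
-- The three kinds of judgment canvassed in (5), (6) and (7).
variable (Singular Definite Indefinite : Obj → Prop)

theorem puzzle_inconsistent
    (faceValue : ∀ o, About o → CanJudge o)        -- (1)
    (seeing    : ∀ o, Sees o → About o)            -- (2)
    (seesBoth  : Sees B ∧ Sees C)                  -- (3)
    -- Assumed exhaustiveness of the three kinds (implicit in the step to (8)).
    (kinds     : ∀ o, CanJudge o → Singular o ∨ Definite o ∨ Indefinite o)
    (noSing    : ¬ Singular B ∧ ¬ Singular C)      -- (5)
    (noDef     : ¬ Definite B ∧ ¬ Definite C)      -- (6)
    (noIndef   : ¬ Indefinite B ∧ ¬ Indefinite C)  -- (7), simplified
    : False := by
  -- (4), for B: from (1), (2) and (3).
  have hB : CanJudge B := faceValue B (seeing B seesBoth.1)
  -- (8), for B: each kind of judgment is ruled out, contradicting (4).
  exact (kinds B hB).elim noSing.1 (fun h => h.elim noDef.1 noIndef.1)

end Puzzle
```

On this regimentation the contradiction already follows for B alone, and the same reasoning applies to C. Dropping any one hypothesis, as the responses in Sect. 4 propose for (3) and for (1), blocks the derivation.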

4 Solutions

We now clarify and explore the most promising responses, drawing the respective lessons as we go.

4.1 Denying (3)

One might look askance at (3) and wonder whether Elizabeth sees the pair (ensemble, plurality), B and C, while failing to see each—to see B and to see C. (3) would then be false. ‘Elizabeth sees B and C’, which could replace (3), would be true only on a non-distributive reading (like the true reading of ‘‘A Day in the Life’ was co-authored by Lennon and McCartney’). So (1) and (2), whose antecedents are about individual objects, would entail nothing about the perceptual judgments Elizabeth is in a position to make. (4) would not follow.

What might it mean to say that a subject sees a plurality o_1 … o_n without seeing each of o_1 … o_n? An independently plausible answer can be found in vision-scientific studies of ‘ensemble perception’. In a now-classic experiment, participants saw a set of circles followed by a test circle. They could accurately report whether the test circle was larger or smaller than the mean size of the circles in the set but were unable to report whether it was itself a member of the set (their performance was at chance). So while participants were able to judge the mean size of an ensemble, they failed to judge the sizes of individual circles (Ariely 2001). What is the basis of the participants’ judgment of mean size here? Bayne and McClelland (2019) argue convincingly that it is an experience of mean size. In other words, the content of the participants’ experience includes an attribution of mean size, and, more broadly, of other ‘ensemble properties’, such as mean orientation or mean facial expression. The ensemble property mean size is distinct from what we might call the particular property, size (a property of an individual object). To represent one is not to represent the other (Bayne & McClelland 2019: 9).

The phenomenon of ensemble perception sheds light on the notion of non-distributive plural seeing, for instance in a case where it is tempting to describe a subject as seeing a crowd of faces without seeing any particular face. If a subject does not see the individual objects, she does not see their particular properties. She only sees ensemble properties: mean size, mean shape, etc.

Let us apply this conception of non-distributive plural seeing to Case 3. If, as the present response supposes, Elizabeth sees B and C without seeing each, she must experience the ensemble properties of the pair without experiencing the particular properties of each.

The plausibility of this suggestion turns on whether the processes producing Elizabeth’s ‘merged’ matchbox-experience (namely, stereopsis and binocular summation) produce a visual representation of ensemble properties. They don’t. Stereopsis calculates depth on the basis of the slight differences between the stimuli reaching each eye. Binocular summation calculates features such as brightness on the basis of the brightness detected by each eye. The resulting experience represents depth/brightness, a particular property, not mean depth/mean brightness. [Footnote 11] While the calculations involved in stereopsis and binocular summation may involve averaging of information from each eye, the output of this process is not a perceptual representation of ensemble properties. In this respect, stereopsis is similar to monoscopic processes which compute features like depth on the basis of shading, or brightness on the basis of contrast with the surrounding context. These, too, involve complex statistical calculations over a population of items, and they output a representation of depth/brightness, not of mean depth/mean brightness. Since Elizabeth’s experience is produced by stereopsis and binocular summation, and since each eye is stimulated by one object, her experience represents particular properties, not ensemble properties. In that case, the phenomenon of ensemble perception cannot be used to explain what the relevant form of non-distributive plural seeing consists in (what seeing B and C without seeing each might consist in).

These observations do not entail that Elizabeth sees B and sees C if she sees the pair. But we think someone who questions this move incurs the burden of explaining what it means for there to be non-distributive plural seeing in such a way that it is present in Case 3. As we have shown, drawing analogies with seeing a crowd of people despite failing to see any individual in the crowd (for example) offers no support, since what explains that phenomenon is ensemble perception, and Case 3 is not an instance of ensemble perception. [Footnote 12]

4.2 Denying (1)

Given the case in favour of face value (Sect. 1) and its centrality to representationalism, we think this way out of the puzzle requires a principled defence. One way of mounting such a defence would be to explain that we should not be surprised to find cases in which perceptual experience referentially eludes cognition. For there are, the response goes, reasons to think that a ‘lossy’ conversion process is required to get from the kind of personal-level representation involved in perceptual experience to the kind of personal-level representation involved in cognition. It is of the nature of this conversion process, moreover, that perceptually-based thoughts can systematically fail to preserve representational properties in virtue of which particular objects are perceptually picked out. Since face value is compromised only in this limited respect, it may remain an important thesis for representationalists in restricted form.

The natural way to develop this reply is to appeal to a general difference in format between visual experience and perceptual judgment. Perhaps visual experience is systematically iconic (picture-like) in format whereas cognitive attitudes available for use in inferential reasoning are discursive (sentence-like) (see Block ms.; Burge 2014; Kulvicki 2015; Lande 2018a, b; and Quilty-Dunn 2016, 2019). As we said in Sect. 1, face value is compatible with the existence of format differences. We provided an example of a prominent philosopher who takes perception to be iconic but for whom face value, and the capacity for cognition to ‘take over’ the referential properties of perceptual experiences, is nonetheless an important commitment. So if face value is false due to differences in format, this would be newsworthy. To motivate a rejection of face value on the basis of format differences, then, we need an argument.

We will now develop such an argument. It aims to establish that in Case 3 Elizabeth’s visual experience has plural content, <<B, C>, are yellow> (i.e. content of the kind expressed by the sentence ‘Those are yellow’), expressed by means of an iconically-formatted vehicle. This perceptual-experiential content captures the sense in which (2) and (3) are true. However, when Elizabeth attempts to convert this iconic representation to a discursive representation by essaying a perceptually-based judgment of the form That is F, some constraint on successful conversion fails to be met, and she winds up with a judgment which expresses gappy content <<_>, is yellow>, or perhaps no content at all. For this reason, (1) is false, (4) does not follow, and contradiction is averted.

To develop this response, we need to hear more about the proposed format difference, and, in particular, about the alleged constraint on successful conversion which is not met in Case 3.

A format is a type of representational medium. Maps, sentences, photographs, and hieroglyphs are formats. Tokens of distinct formats never compose or otherwise combine (at least assuming each token of a format-type is essentially a token of that type). A token of the word ‘coffee’ adjacent to a photograph of Roger Federer is not a token of any format and fails to represent. The significance of picture-like formats for cognitive science goes back to debates about visual mental imagery in the 1970s and 1980s. Kosslyn (1980) conceives of iconic representations as involving a two- (or three-) dimensional array, in which each cell corresponds to a spatial location in the scene, and the cells of which represent various features such as texture and colour. The shapes of objects and the distances between them can be read off from the two-dimensional array; they are implicitly represented. This leads to various constraints on the semantics of representations in iconic format. One constraint is as follows:

Any portion of an image [i.e. an iconic representation] is a representation of a portion of the represented object. For example, any portion of an image of my grandmother’s car is an image of a portion of the car. […] In contrast […,] although ‘grandmother’ is part of ‘my grandmother’s car,’ ‘grandmother’ does not represent a part of the car (Kosslyn 1980: 33, 35; see also Kulvicki 2015; Lande 2018a; Quilty-Dunn 2016, 2019).

In other words, it is a semantic constraint on iconic representations that a part of an iconic representation of an object represent a part of that object. A further constraint is that

the internal symbols used to represent an object in an image are not arbitrarily related to the object. If the internal ‘space’ is 2-dimensional, for example, once patterns in two local regions of the space are used to depict two portions of an object, the regions where patterns can be placed to depict the rest of the object are then determined (Kosslyn 1980: 33, 35).

Interestingly, these constraints tell us nothing about the number of objects an iconic representation may be about. [Footnote 13] The constraints which are supposed to characterize iconic format do not encode constraints on number, and there is no independent reason for thinking such a constraint is in effect. At the very least, while we lack anything like a definition of iconicity, the question is open. [Footnote 14]

(A similar strategy is available with respect to the idea that perception is analog in format, in contrast to the digital format of cognition (Block ms.). According to Beck (2019),

[a]nalog representation […] involves the representation of one magnitude by a second magnitude such that the second magnitude has the function of increasing or decreasing with the first (334).

For example, if a neuron’s firing rate is (or has the function of being) proportional to the magnitude of external luminance, then the neuron is an analog representation of luminance. Contrast this with a digital representation: the string ‘101’ in binary code represents the number 5, and ‘110’ represents the number 6. There is no sense in which the magnitude of ‘101’ is lower than that of ‘110’ (see Beck 2019). In this way, analog representations involve distinctive semantic constraints which digital representations lack. Yet, as with iconicity, these constraints which are characteristic of analog format are neutral with respect to the cardinality of represented objects.)
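
The contrast between the two kinds of vehicle can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than part of Beck's account: the linear `firing_rate` function, its `gain` parameter, and the choice of "number of 1s" as a candidate magnitude of a binary numeral are all hypothetical.

```python
def firing_rate(luminance: float, gain: float = 2.0) -> float:
    """Toy analog vehicle (hypothetical): the rate rises with the represented luminance."""
    return gain * luminance


def to_binary(n: int) -> str:
    """Digital vehicle: the binary numeral for n, e.g. 5 -> '101'."""
    return format(n, "b")


def ones_count(code: str) -> int:
    """One candidate 'magnitude' of a binary numeral (an assumption): its number of 1s."""
    return code.count("1")


# Analog: ordering the represented magnitudes also orders the vehicle magnitudes.
luminances = [0.5, 1.0, 3.0]
rates = [firing_rate(x) for x in luminances]
analog_tracks = rates == sorted(rates)

# Digital: the numerals for 5 and 6 do not differ on this measure,
# and the numeral for 8 scores lower than the numeral for 7.
five, six = to_binary(5), to_binary(6)
digital_tracks = ones_count(to_binary(7)) < ones_count(to_binary(8))
```

On this toy measure the analog vehicle covaries with what it represents while the digital one does not. Other measures of a numeral's magnitude could be chosen; the sketch only illustrates the kind of semantic constraint at issue, and it is silent, like the constraint itself, on how many objects are represented.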

The upshot of these considerations about iconicity (and analogicity) of format is that they seem to make available the following move in response to our puzzle. The distinction between iconic and discursive formats is perhaps the most influential of its kind in the cognitive sciences. According to some, propositional attitudes like judgment are discursive, and this accounts for the systematic and productive potential of the constituents of such attitudes (Fodor 1975; Burge 2010a). Now, while there are no cardinality constraints on the objects iconically represented in visual experience, there are such constraints in the case of perceptually-based judgments of the form That is yellow. For the judgment-constituent That refers to at most one matchbox, much like the deictic expressions ‘This F’ and ‘That’, and the definite description ‘The matchbox’. [Footnote 15] The judgment-constituent That is supposed to pick out the unique object to which the subject of the experience is (in the relevant way) perceptually related. But in Case 3 Elizabeth is seeing two matchboxes. The demonstrative constituent of her perceptually-based judgment fails to refer. In this way, we have independent reason to expect face value to fail.

Of course, positing this systematic difference in format at the interface of perception and cognition is controversial. Some writers claim that there are no general differences in format between perception and cognition (Prinz 2002; Pylyshyn 2003), others that there is no border of any kind (Shea 2014; Lupyan 2015). In a recent contribution, Quilty-Dunn (2019) attempts to resolve the debate by marshalling evidence to argue that, while some visual representations are plausibly iconic, vision outputs discursive object-representations. This helps to explain how such object-representations feed so seamlessly and quickly into practical deliberation and action. [Footnote 16] This explanation may be unavailable to proponents of the present solution. Still, we find this line of reply promising.

An alternative way of targeting (1) is to appeal to a difference in whether some further factor—perhaps attention—constrains aboutness in perception versus in cognition.

According to Block’s (2014) overflow thesis, subjects can (consciously) see more objects than they can attend to (and hence store in working memory). Block describes visual attention as having roughly four ‘slots’ to be allocated to seen objects, capturing the fact that it is possible to attend to roughly four seen objects at most. [Footnote 17] Now imagine a subject allocating each slot to two seen objects, in some sense thereby attentionally ‘covering’ eight. This would not be a way of attending to eight seen objects, in Block’s sense. The reason is that slots model ways of processing information with a presupposed single source. Perhaps it is in exactly this way that Elizabeth ‘attends’ to B and C, allocating one attentional slot for what appears to be one matchbox but is in fact two. In Block’s sense, then, Elizabeth is not attending to B or to C. (Of course, this does not deprive her of seeing B and C. In that sense, Case 3 is, at least structurally, a case of overflow.)

According to several authors (Campbell 2002; Dickie 2015; Smithies 2011; for criticism see Kelly 2004), attention is necessary for perceptually-based singular thought. Now if the notion of ‘attention’ in play is that on which Block’s above claims are true (if they are true), we have an available explanation as to why Elizabeth cannot form perceptually-based singular thoughts about B and C despite seeing them, and despite their figuring in the content of her perceptual experience: because Elizabeth cannot attend to B or C—even though she sees B and C in the sense of ‘sees’ operative in (2) and (3)—she cannot form perceptually-based singular judgments about them. Therefore (1) is false.

It is crucial to emphasise that this response does not merely turn on the claim that attention is required for perceptually-based singular thought. It requires the claim—which entails the structural possibility of overflow (a highly controversial thesis)—that an object could be a constituent of the content of one’s visual experience despite one’s inability to attend to it.

On both the format- and attention-based replies, perceptual experience is more ‘permissive’ than cognition, and this undermines face value. On the format reply, this is because iconic format involves no semantic constraints on number of referents (in contrast to the discursive format of thought). On the attention reply, this is because, unlike perceptually-based thought, perception has no attentional constraints placed on its semantic scope. While both of these theses are controversial, that they appear to provide elegant solutions to our puzzle counts in their favour.

Before moving on, we should address a worry: it might seem as though taking non-discursive formats seriously puts principles like face value in trouble anyway, rendering our puzzle otiose. In particular, it was said that any part of an iconic object-representation is a representation of a part of the represented object. Depending on the mereological complexity of icons, then, the iconicity of perceptual-experiential representations would entail that they may be about many things indeed: the many parts of represented objects. And perhaps this is enough to threaten face value. For face value implies that subjects would be in a position, by attending, to make perceptually-based judgments which involve or are satisfied by each of the represented parts. However, remember that the idea behind face value is not that the total content of a perceptual experience may be endorsed in the form of one complex judgment. It is that the subject is in a position, perhaps by attending or deploying demonstrative thought-vehicles, to endorse or doubt any given aspect of that total content. The only clear way in which iconicity alone may threaten face value is if the represented parts are, say, too small (in terms of angular size) to attend to. (If one were able to attend to the represented parts, why would a perceptual judgment not be possible?) If this were the case, face value would indeed be in trouble already. Yet this assumes not only that perceptual-experiential representation is iconic but that a particularly strong kind of overflow is possible: a perceptual experience may represent objects it is not even possible for its subject to attend to.Footnote 18 Our puzzle shows that face value may be false without relying either on this controversial claim or on the assumption of iconicity [as conceived by Kosslyn (1980)].

4.3 Positing equivocation between (2) and (3)

We can imagine an objector drawing our attention to a ‘purely causal’ disambiguation of the perceptual transitive verb ‘sees’ which, they would argue, is not connected to the content(s) of perceptual experience in the way posited by seeing. The thought would be that it is only in this attenuated sense of ‘seeing’ that (3) is true, and this attenuated sense of ‘seeing’ is insufficient to render seeing true. To bring this out, consider the following case.

Case 4

Imagine a wall which is perfectly white except for a patch of red paint. This patch happens to be obscured by a no less perfectly white, and perfectly camouflaged, sheet of paper, which has been pasted to the wall. Alice is sitting in the room and is facing the wall with her perceptual faculties fully functioning. She forms a perceptual demonstrative judgment about the wall to the effect that it is completely and perfectly white.Footnote 19

Although Alice does not notice the paper, an observer might truly utter ‘Alice can’t see the red patch because she’s seeing the paper.’ Arguably, all that is being tracked by the truth of this sentence is the paper’s playing a certain causal role in the production of her visual experience. And it is far from clear that true ‘sees’-claims of this sort impact on content. By analogy, then, it is unclear that the truth of (3) entails that the content of Elizabeth’s experience involves or is otherwise about B and C.

Our reply to this complaint is as follows. We simply cannot detect any reading of ‘Elizabeth sees B and sees C’ which is false. As a result, we cannot detect any purported ‘strong’ reading of (3), required to entail (4) along with (1) and (2), which is false. To our ears, there is just no good sense in which Elizabeth does not see B and see C.

While we are tempted to leave this response here, we expect those sympathetic to this way out of the puzzle will not be appeased. So we continue our reply. That there are multiple interpretations of ‘sees’-locutions, some of which do not support seeing, does not by itself undermine our appeal to seeing in Case 3 and in our argument. What must be shown by a critic of the argument above is that the only sense of ‘sees’ in which ‘Elizabeth sees B and sees C’ is true is a sense which is too weak to vindicate seeing. How might this be argued for?

One might claim that Elizabeth’s seeing B and C, in any sense strong enough to vindicate seeing, conflicts with the popular view on which perception is a matter of ‘discriminating and singling out’ particulars.

Perception’s functioning to single out particulars figures in the veridicality conditions of a perceptual state. Whether the state is accurate or not hinges on whether it succeeds in singling out relevant particulars (Burge 2010b: 27).

Perception is constitutively a matter of employing perceptual capacities that function to discriminate and single out particulars (Schellenberg 2018: 13). […] To a first approximation, singling out a particular is a proto-conceptual analogue of referring to a particular. […] discriminating and singling out a particular from its surround is a [metaphysically] necessary condition for perceiving the particular (2018: 25).

One natural response, of course, is to see Case 3 as a straight-up counterexample to these a priori claims, that is, as a failure of these writers to digest Anscombe’s (1974) original thought experiment. Elizabeth cannot discriminate or single out either of B and C from the other, yet, as (3) says, she sees both. Sure, Burge and Schellenberg’s proposed constraint may look plausible if one restricts one’s focus to typical, everyday cases of perceptual experience, and in particular to visual experience (though Schellenberg (2018: 25) conjectures that the constraint applies to all sense modalities). But atypical cases which exemplify the complexity of the phenomenon raise question marks. To take another example, it is not clear that any ‘discrimination’ or ‘singling out’ occurs in a Ganzfeld visual display, in which a subject is presented with a homogeneous, ‘space-filling fog’ of a single colour. But seeing obviously does (see Block ms.). It is also far from clear that the psychological capacity picked out by ‘singling out’ in the case of vision, if there is one, will be present in olfaction or touch. Leaning on an appeal to ‘singling out and discriminating’ is not a particularly convincing way of motivating a strong notion of ‘seeing’ required to vindicate seeing.

Perhaps Burge and Schellenberg’s remarks are intended simply as an expression of common sense, to be refined by further theorizing.Footnote 20 If that is their role, then, while we are not unsympathetic, the correct response is not to see them as providing grounds to reject (3) but to take the implausibility of denying (3) as offering a constraint on how we should precisify the ‘single out and discriminate’ proposal. Here, notice that there are some senses in which Elizabeth is in a position to single out and discriminate each of B and C. She is able to detect the presence or absence of each of B and C (since the removal of either would likely induce binocular rivalry), detect changes in the visible properties of each of B and C (since certain changes in colour, shape, orientation etc. would induce binocular rivalry), differentiate B and C from their respective backgrounds, and successfully categorize B and C (for instance, as matchboxes).

Elaborating her view, Schellenberg writes: “it is unclear what it would be to perceive a particular without at the very least discriminating and singling it out from its surround” (2018: 25). This is said to involve “scene segmentation, border and edge detection, and region extraction” (ibid.). It is quite plausible that Elizabeth has many of these capacities. Schellenberg does not say whether she takes these capacities to each comprise a necessary condition without which there is no perception, or instead as a cluster of features which serve to home in on a natural kind which is the subject of vision science, some of which may be absent in certain cases (Block (ms.)). The latter, more charitable reading is suggested by the following remark: “If there is no discriminatory activity, it is unclear how he could be perceptually aware of the cup. [P]erceiving the cup will involve discriminating it in some way from its surround” (25; emphases added). In that case, (3) is compatible with Burge and Schellenberg’s thesis about the nature of perception.Footnote 21 That thesis cannot on its own be used to escape the puzzle.

To summarize, the notion of ‘discriminating and singling out’ stands in need of precisification. We propose that the difficulty of hearing any sense of (3) which is false gives us insight into how we ought to precisify Burge and Schellenberg’s constraint.

Similar remarks would apply to the suggestion that the kind of seeing required to vindicate seeing is some kind of attentional seeing [perhaps because all conscious seeing is attentional (Prinz 2012)], which Elizabeth is incapable of. As described in Sect. 4.2, a responder might point out that since Elizabeth does not allocate one ‘attentional slot’ to B or C, and hence is not attending to either, she does not see either in the sense of ‘sees’ operative in seeing. So (4) does not follow. As with the ‘discriminating and singling out’-based reply, while the attentional constraint on seeing sounds plausible in paradigm cases, the difficulty of hearing any sense of (3) which is false means that insofar as we are convinced of the attentional constraint on seeing, the truth of (3) should be used as a constraint on precisifying the relevant notion of attention required to vindicate seeing. Should those efforts to precisify fail, Case 3 may be taken to undermine the attentional constraint itself, it being an instance of a specific kind of overflow.

4.4 Denying (5)

Finally, representationalists may look to (5) for a way out of the puzzle. Perhaps Elizabeth’s perceptually-based judgment That is yellow expresses the plural content <<B, C>, are yellow>? Even assuming that the perceptual-demonstrative thought-constituent That does not have cardinality constraints of the sort assumed in Sect. 4.2, however, it is not very plausible that one’s self-awareness of what one is judging could be so off that one takes oneself to be thinking about an individual when one is in fact thinking about a plurality.

A somewhat more plausible way in which representationalists might deny (5) is by claiming that it is indeterminate which content Elizabeth’s perceptually-based judgment That is yellow has. Either it is <<B>, is yellow> or <<C>, is yellow>; it is just indeterminate which. Of course, given the determinate truth of (2) and (3), this reply suggests that it is indeterminate which matchbox Elizabeth sees. And this is not the right result: she sees both.

We think the most promising route in the vicinity is to suggest that Elizabeth’s perceptually-based judgment That is yellow conveys multiple contents, some singular with respect to B and some with respect to C. When Elizabeth has the thought That is yellow, this one perceptual-demonstrative singular judgment-token is about both B and C by virtue of expressing multiple contents. Whether Elizabeth’s visual experience also expresses multiple contents, or instead the plural content <<B, C>, are yellow>, we can say that she is, in accordance with (1), in a position to take her visual experience at face value and author a perceptually-based judgment which is about B and about C. On this approach, while (4) goes through, contradiction is avoided because (8) does not.

Note that this response denies an assumption we leaned on in Sect. 4.2 to the effect that the judgment-constituent That is about at most one object. While this strategy may seem ad hoc, similar features underpin the interest in multiple contents views in the philosophy of language.

Consider, for example, the literature on vagueness. Call a function from sentences, or vehicles of thought, to semantic values an interpretation. On one way of implementing supervaluationist ideology, inspired by some of David Lewis’s remarks, vagueness is a partly metasemantic phenomenon.Footnote 22 “Whatever it is that we do to determine the ‘intended’ interpretation of our language determines not one interpretation but a range of interpretations” (Lewis 1993: 172).Footnote 23 Vague languages have not one admissible interpretation but many. If our use of a language L is too coarse-grained in this way to induce a total ordering on interpretations, a vague sentence of L on an occasion of use may express the many contents which it is assigned by its admissible interpretations—contents sufficiently similar as to go undetected by language users.

On this setup, the truth of a sentence of L on an occasion of use is truth on an admissible interpretation of L. Unlike standard supervaluationism, this view may retain bivalence. How? Won’t ‘Prince William is bald’ come out both true and false, being true on some admissible interpretation(s) of L and false on others? Not if we deny that sentential truth is a monadic property. There is no way to evaluate a sentence of L for truth simpliciter, even on an occasion of its use at a context. A use of ‘Prince William is bald’ is instead true-on-i1, false-on-i2, etc. Of course, contents will still instantiate the usual monadic properties of truth and falsehood.Footnote 24

Now whether or not this is the right way to go about diagnosing vagueness in language, we might apply this way of thinking to the form of representation in Case 3. We can even put the point by saying that the reference-determining facts in Case 3 are too coarse-grained for Elizabeth’s perceptually-based That-thoughts to have a unique admissible interpretation. Suppose that what determines the reference of a perceptually-based demonstrative judgment is its being caused in an appropriate way by an object, or perhaps the judgment’s being caused in an appropriate way by an object many of whose properties one is reliably able to get right by taking advantage of that causal relation. Since both B and C meet these kinds of conditions with respect to Elizabeth’s perceptually-based That-judgments, those judgments express multiple contents—some singular with respect to B and others singular with respect to C. In general, when a subject exercises a perceptual-demonstrative thought-vehicle in such cases, she thereby entertains all (and only) those propositions to which that thought-vehicle is mapped by admissible interpretations. It is not indeterminate which thing, B or C, Elizabeth’s perceptually-based judgment That is yellow is about. It is about both.

For those who like to think of the vehicles of singular thought as being mental files, we could put this by saying that it is possible for one perceptual-demonstrative file to be about many things. A possible worry here is that this endangers the validity of the distinctive inferential transitions which a mental file allows. A subject with a mental file treats beliefs in that file as being about the same thing. She will be disposed to ‘trade on identity’—to transition from beliefs of the form α is Φ and α is Ψ to Something is both Φ and Ψ (Campbell 1987). It is often held that if one (at least synchronically) authors two perceptual-demonstrative judgments, e.g. That is yellow and That is a matchbox, one is in a position to know that the two demonstrative thought-tokens co-refer if they refer at all (Recanati 2012: 132). The idea is that one’s basis for thinking the pair of thoughts affords a basis for one’s recognizing that the two referentially stand or fall together. Such ‘mental files’ theorists will need reassurance that our multiple contents picture does not threaten the validity of such thought-patterns as That is yellow, That is matchbox-shaped, Therefore, something is both yellow and matchbox-shaped, if authored synchronically.

The many contents view does not threaten the validity of trading on identity, nor one’s capacity to know immediately (Campbell 1987) or even infallibly (Recanati 2012) that one trades on identity when one does so. In a chain of reasoning (where the range of admissible interpretations remains fixed) of the form That is F; That is G; therefore, That is both F and G, we are to treat the interpretation of the demonstrative as uniform throughout, so that when one reasons in this way one comes out as entertaining many univocal patterns of argument each of which is valid.Footnote 25

We find this solution sufficiently plausible to be worth further exploration. That representations might express not a unique content but instead the multiplicity of contents assigned to them by admissible interpretations is not (yet) a popular view. This is not due to its implausibility, however, but because of assumptions which have long been taken for granted about the relation between sentences or thought-vehicles and their contents. Once again, this would be a surprising conclusion to reach, yet it is perhaps the least revisionary of those we have considered.

5 Conclusion

This paper has presented a puzzle for representationalism about perceptual experience. The solutions (which need not be seen as mutually exclusive) have significant implications for our understanding of the nature of perceptual experience and its interface with cognition, attention, and related phenomena. Representationalists must investigate these implications further.

First and foremost, perhaps face value is false. Perhaps perceptually-based thoughts can systematically fail to preserve representational properties in virtue of which some object is perceptually picked out; either because (a) perceptual-experiential representation is iconic (or analog) in format whereas cognition is discursive in format, and iconic (or analog) formats—unlike discursive formats—do not carry constraints on number of referents, or (b) the objects a perceptual experience may be about need not be attended-to whereas the objects a perceptually-based judgment may be about must be attended-to (in a sense of ‘attended-to’ which the subjects in Cases 2 and 3 are incapable of).

As we indicated in Sect. 4.2, positing the systematic format difference in (a) is controversial. While the idea is consonant with enough of the literature to be rather promising, this could change as the notions of iconicity and analogicity are sharpened by empirical investigation. As it stands, however, we think this may be the most compelling moral to draw from our puzzle. Representationalists should take seriously the idea that perceptual-experiential representation has a format somehow less constrained than cognition, lending it a broader semantic scope. The fruit of the puzzle is that it provides an insight into the nature of this format, suggesting that there is an important sense in which the iconic or analog representations involved in perceptual experience do not encode constraints on number (of representanda) while the vehicles of perceptual-demonstrative thought do.

On the other hand, the somewhat more controversial diagnosis in (b) leads us to tentatively raise our credence in the existence of a specific kind of overflow, so that the scope of one’s perceptual awareness outstrips one’s capacity to attend (where the notion of attention here is required for perceptual-demonstrative aboutness). In line with the kind of evidence interpreted and marshalled by Block (2012, 2014), we are not unsympathetic to this sort of conclusion. But the distinctive fruit of our puzzle is that it provides further insight into the nature of such attention. It must be something unavailable to the subjects in Cases 2 and 3. This means that the relevant difference between seeing and attending is not one of capacity limitations (Sperling 1960; Block 2014) or ‘grain’ size (Block 2012); rather, the difference is that whereas attention involves something like a ‘one slot per object’ requirement, seeing does not.

An alternative conclusion to which we are somewhat attracted is that a perceptually-based visual-demonstrative judgment of the form That is yellow may express multiple singular contents, differing in respect of which seen object they are about, and between which the subject cannot discriminate. This is arguably the least revisionary conclusion to be drawn, saving the orthodox representationalist principles which underpin the puzzle. It also may have much broader application, for example in the case of demonstrative thoughts about places—e.g. Here is warm—or lumps of matter—e.g. That constitutes a tree—whose imprecision as to the microphysically specific lump or region referred to could be rendered as the expression of many singular contents, one for each of the most eligible candidates. While it may seem ad hoc in the context of the present puzzle, then, its non-revisionary character and its capacity to generalize across to other puzzles of referential imprecision make this reply an empirically safe fallback option should the above two fall on hard times.

Less promisingly, in Sect. 4.3 we considered the reply that in order to see an object in a sense which suffices for it to figure in the content of one’s visual experience one must (a) single out and discriminate it, or (b) attend to it (in a sense of ‘single out and discriminate’ or ‘attend’ which the subjects in Cases 2 and 3 are incapable of). Since these are a priori claims leveraged from typical cases, a natural response is to see our puzzle as an opportunity to precisify what is meant by ‘attend’ and by ‘single out and discriminate’ as they occur in each of these claims, so that the subjects in Cases 2 and 3 do achieve it. In any event, readers in the grip of these claims who are willing to trust their intuitive sense of what they mean may see fit to conclude that the subjects can see B and see C only in a sense too weak to substantiate seeing. We only wish to point out that these principles do require independent motivation, motivation strong enough to prise us away from the sense that there is no reading on which (3) is false. And so far, defenders of these claims have failed to provide such motivation.Footnote 26

Finally, as the discussion in Sect. 4.1 made clear, the claim that Case 3 involves non-distributive plural seeing is, so far as we can see, unmotivated. It is no good drawing analogies with seeing a pile of sand despite failing to see any individual grain, since what explains that phenomenon is ensemble perception, and Case 3 is not an instance of ensemble perception.