Introduction

In many studies that investigate discourse processes, reading time has been measured by presenting a text one sentence at a time, with participants asked to press a key to progress to the next sentence (e.g., Gerrig, Love, & McKoon, 2009; Rapp, 2008; Rapp & Taylor, 2004). This paradigm is similar in some ways to rapid serial visual presentation (RSVP; Forster, 1970), in which text is displayed word-by-word in the middle of the screen for set durations. It is also similar to the moving-window paradigm, in which readers press a key to reveal each successive word (Just, Carpenter, & Woolley, 1982). All three methods allow researchers to identify exactly when certain content is presented to participants and for how long. But reading a text in these ways departs from normal reading as it occurs in the real world. With respect to the RSVP paradigm, saccadic eye movements typical of normal reading are eliminated entirely (Just & Carpenter, 1980), which results in changes in reading behavior such as an increase in attentional focus (Castelhano & Muter, 2001; Juola, Ward, & McNamara, 1982; Potter, 1984) and faster reading times (Öquist & Goldstein, 2002). Similarly, reading a narrative one sentence at a time prevents readers from backtracking or regressing to previous sentences, something that can occur quite often during everyday real-world reading (Rayner & Pollatsek, 1989; Vitu & McConkie, 2000). So although these paradigms provide a convenient way to either control or measure the flow of information during reading, it is important to consider whether adopting these approaches sacrifices external validity. It may be that reading a narrative one sentence at a time, contingent on a key-press, prevents readers from becoming fully engaged in the events being represented. Similarly, this method of text presentation might lead readers to adopt different reading strategies, or to invest different levels of effort in comprehending a text, compared to typical reading.
On the other hand, it may be that readers have little trouble adapting to a sentence-by-sentence presentation, such that they process the text and engage with its contents with ease. Given the prevalence of this paradigm in reading research, it is important to investigate the ecological validity of reading narratives in this way. In order to do so, we examined whether self-paced sentence-by-sentence presentations influence various reading outcomes, namely reading comprehension, text recall, and narrative transportation (i.e., feeling immersed and engaged with story events; Gerrig, 1993; Green & Brock, 2000).

To date, there is insufficient relevant research to make strong predictions as to whether sentence-by-sentence presentations are ecologically valid. Some have questioned this approach and similar methods, whereas other work seems to indicate that there is little issue with respect to ecological validity. For example, Schotter, Rayner, and Tran (2014) argue that systematically controlling eye movements in an RSVP paradigm decreases the accuracy of text comprehension, since this paradigm prevents the regressions that support higher-level linguistic processing (see also Baccino & Pynte, 1998). A faltering in comprehension typically triggers regressions, so preventing readers from looking back at earlier text also prevents them from repairing their understanding. In their study, Schotter et al. (2014) asked participants to read garden-path sentences (i.e., sentences that are temporarily ambiguous, leading readers toward a false interpretation) and unambiguous sentences, both normally and in a condition where the words became masked once participants moved their eyes. The authors found that preventing regressions within a sentence negatively impacted comprehension accuracy for both garden-path sentences and normal sentences. Since the sentence-by-sentence reading paradigm also prevents regressions, at least between sentences, it is possible that this paradigm might have a similar influence on comprehension accuracy. A different study by Just et al. (1982) compared the self-paced moving-window paradigm to normal eye gaze with respect to word reading time and recall for short passages; these researchers found that the former produced longer reading times on each word but similar performance on recall. Based on the results from these two studies, presenting text in a way that prevents regressions appears to impair sentence comprehension but to preserve recall.

Previous studies have also examined differing ways of presenting discourse with the aim of uncovering the most preferred and efficient method of presenting text on small displays (Castelhano & Muter, 2001; Rahman & Muter, 1999). These studies employed short passages, often just one paragraph, and examined reading preferences, reading speed, and comprehension. The text was presented via self-paced sentence-presentation, self-paced page-presentation, or RSVP of single words presented at a predetermined rate. These studies found that sentence-presentation and page-presentation were both preferred over RSVP, and to an equal degree, but no differences were observed between any of the text presentation formats with respect to comprehension (Castelhano & Muter, 2001; Rahman & Muter, 1999). Hence, these two studies show that sentence-by-sentence text presentations can be as efficient and effective as conventional page reading by some metrics, which is encouraging news for researchers who use the former to measure reading times. These studies, however, employed very short passages, and so it would be valuable to investigate this question using longer, more ecologically-valid narratives that are more akin to the texts employed in discourse comprehension studies and to what people read in the real world. Doing so would allow us to determine the ecological validity of past work and also allow for more diverse aspects of reader response to be examined, such as engagement with the story.

Fictional narratives have been shown to impact readers in many ways. They can change our attitudes and beliefs (Appel & Mara, 2013; Escalas, 2004; Praxmarer, 2011; Prentice, Gerrig, & Bailis, 1997), self-concept (Djikic, Oatley, Zoeterman, & Peterson, 2009; Richter, Appel, & Calio, 2014), and possibly even our theory-of-mind abilities (Kidd & Castano, 2013; Koopman & Hakemulder, 2015; Mar & Oatley, 2008; cf. Panero et al., 2016). These outcomes seem related to a key feature of stories: enabling individuals to escape their everyday lives and enter a fictional world, forming strong emotional connections with characters and engaging with the plot. The experience of being immersed and absorbed into a fictional world is known as “transportation,” based on the metaphor of being transported into the narrative world (Gerrig, 1993). Transportation is a distinct mental process integrating attention (or loss of access to real-world information), affect (emotional reactions), and ability to form mental imagery (Busselle & Bilandzic, 2009; Green & Brock, 2000). Stories that are well-crafted, have a high degree of realism, and report on events that are worth telling elicit greater transportation (Green, 2004). Because transportation is a central aspect of engaging with fictional narratives, it is important to examine whether self-paced sentence-by-sentence reading has an impact on reader engagement. In order to do this, lengthier and more ecologically-valid texts must be presented, in comparison to the stimuli employed in past work. Thus, for our study we presented a professionally-written short story, A Small, Good Thing by Carver (1989). Employing a piece of professionally-written narrative fiction as a text not only allows us to study narrative transportation, but also provides greater ecological validity compared to previous studies that employed only short paragraphs or sentences.

The present study thus aims to validate the self-paced sentence-by-sentence paradigm by investigating the effect of presentation format on comprehension, recall, and narrative transportation, whilst using a more realistic narrative text. Sentence-presentations were compared to self-paced presentation of whole pages, the latter of which better parallels a normal reading experience as it allows for regressions between and within sentences. If sentence-presentations result in reading outcomes approximately equivalent to those of page-presentations, this will provide important support for the ecological validity of this widely-used method of measuring reading times. In contrast, if sentence-presentations result in different outcomes, then past studies that rely on this paradigm will have to be examined with greater scrutiny, with an eye toward determining whether their results can be generalized to real-world reading.

Methods

Participants

One hundred and two students were recruited from an undergraduate research participant pool and all received academic credit for completing two separate testing sessions, held approximately 1 week apart. Eight participants were removed due to missing data. Hence, the final sample consisted of 94 participants (24 males; age: M = 19.71, SD = 4.49). Thirty-four of the 94 participants indicated that they had learned a language other than English as their first language.

Measures and materials

Stimulus text

All participants were asked to read the entirety of a short story by Carver (1989), A Small, Good Thing. The story is about a young boy who dies 3 days after being struck by a car while walking to school on his birthday. This story was chosen as it contains a great deal of emotional content, a suspenseful plotline, and is presented in an accessible and engaging prose style. It is 9635 words in length. This narrative was presented either one sentence at a time, controlled by the reader using a key-press (sentence-presentation), or one page at a time, also self-paced (page-presentation), with the latter serving as a control condition that more closely mimics real-world reading. Please see Table 1 for information on the level of difficulty of the text, assessed via the Lexile Framework for Reading (www.lexile.com) and Coh-Metrix Text Easability Assessor (http://tea.cohmetrix.com).

Table 1 Text characteristics for “A Small, Good Thing” by Raymond Carver

Immediate and delayed comprehension test

Comprehension of the text was measured using two separate tests, each comprising 10 multiple-choice questions (four response options per question), one presented directly after reading (Immediate) and the other a week later (Delayed). Items tested knowledge of specific details about the story, with some requiring the retrieval of factual details (e.g., “What age will Scotty be on his birthday?”) and others requiring some degree of inference (e.g., “What made Ann realize who the caller was?”). All questions appear in the Supplementary Materials. Both scales had low levels of internal reliability, likely due to the variability in difficulty across items along with the different types of story knowledge tested across items (Schmitt, 1996). The immediate comprehension test had a Cronbach’s alpha of .16 and the delayed comprehension test had a Cronbach’s alpha of .29.
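
For readers unfamiliar with the reliability statistic reported above, the following is a minimal sketch of how Cronbach’s alpha can be computed from item-level scores. The function name and the small response matrix are hypothetical and are not the study’s data; the sketch assumes each item is scored 0 (incorrect) or 1 (correct) and uses the standard formula based on item variances and total-score variance.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of k item scores per participant)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across participants.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each participant's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses for four participants on three items (not study data).
demo = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(demo), 3))  # 0.75
```

Alpha rises when items covary strongly relative to their individual variances, so a test whose items tap different kinds of story knowledge can yield a low alpha even when each item is a reasonable question.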

Immediate and delayed free recall test

Memory for story details was measured by asking participants to recall as much information from the story as possible, both directly after reading (Immediate) and a week later (Delayed). Two independent coders counted the number of unique idea units recalled by each participant, with the participant awarded one point for every unique detail of the story remembered. The inter-rater reliability coefficients between the two coders for immediate and delayed recall were .96 and .92, respectively.
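
The text does not state which inter-rater reliability coefficient was used. As an illustration only, the sketch below assumes a Pearson correlation between the two coders’ idea-unit counts; an intraclass correlation would be another common choice. The function name and the counts are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical idea-unit counts assigned by two coders to five participants.
coder_a = [12, 7, 15, 9, 20]
coder_b = [11, 8, 15, 10, 19]
print(round(pearson_r(coder_a, coder_b), 2))
```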

Transportation scale

Engagement with the narrative was evaluated using the Transportation Scale developed by Green and Brock (2000). The scale contains 16 questions about participants’ ability to visualize the events of the story, emotional engagement, and attentional focus. Responses were made using a 7-point Likert scale, ranging from “not at all” to “very much.” Higher scores on the scale represent a greater degree of engagement with the text. Past research has shown that scores on this measure predict how persuaded people are by the themes of a narrative (van Laer, de Ruyter, Visconti, & Wetzels, 2014; Wyer, Adaval, & Colcombe, 2002). The scale had a Cronbach’s alpha of .81 for the first session and .87 for the second session.

Demographic questionnaire

Participants were requested to provide demographic and background information, including gender, age, and years of education.

Procedure

In the first session, participants indicated their consent and then completed all measures on a computer. Participants were randomly assigned to one of two conditions: sentence-presentation (n = 47) or page-presentation (n = 47). They then read the target story either one sentence at a time or one page at a time, hitting the spacebar or clicking the mouse to progress in both cases. Text was displayed in black 14-point Times New Roman font on a white background (resolution: 600 × 800 px). For sentence-presentation, sentences were centered on the screen horizontally, consistent with common research methodology (Rahman & Muter, 1999). For page-presentation, text was single-spaced and left-justified, with paragraph indentations included to be consistent with real-world text presentations. Participants were allowed to re-read within the sentence or page; however, once they had advanced, they were unable to go back to the previous page or sentence. The order of presentation for the other measures was the same for all participants. Once the reading of the narrative was completed, participants were administered the Transportation Scale, Immediate Free Recall Test, and Immediate Comprehension Test. A general demographics questionnaire was administered last. The second session took place approximately 1 week later, during which participants were administered the Delayed Free Recall Test, Delayed Comprehension Test, and Transportation Scale once again. At the end of the second session, participants were debriefed by the experimenter and received partial course credit for their time. Each testing session took approximately 1 hour to complete.

Results

Mean scores and standard deviations by text presentation format and session for all measures are reported in Table 2. Participants in the sentence-presentation and page-presentation conditions were roughly equivalent in age and years of education, Age: t(92) = .21, p = .84, d = .04, 95 % CI [−.45, .36]; Education: t(92) = .38, p = .70, d = .08, 95 % CI [−.48, .32].

Table 2 Mean scores and standard deviations on all measures by text presentation format and session

Bayesian independent samples t tests were performed to compare the evidence that the two presentation formats result in differences in comprehension, recall, and transportation (H1) against the evidence that the two presentations result in equivalent outcomes (H0), using JASP v.0.7.5.5 (JASP Team, 2016; jasp-stats.org). Bayesian model selection makes it possible to use prior knowledge to compare hypotheses based on specific expectations. The output is a Bayes Factor (BF), which quantifies how strongly the observed data support one hypothesis relative to the other (Kass & Raftery, 1995; Rouder, Speckman, Sun, Morey, & Iverson, 2009). A BF01 is the likelihood of the data given the null hypothesis over the likelihood of the data given the alternative hypothesis (BF10 is its reciprocal). For example, a BF01 of 2.00 would indicate that the data observed are twice as likely under the null hypothesis (of no difference) compared to the alternative hypothesis (of a difference, the magnitude and range of which is specified by the prior). One advantage of using Bayesian statistics is that the assumptions are not based on normality, making it a suitable approach for relatively small sample sizes (Gill, 2014). Moreover, Bayesian analyses allow researchers to evaluate the likelihood of the null hypothesis, unlike traditional null hypothesis significance testing (NHST), which assumes the null is true (Fraley & Marks, 2007).
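
The verbal definition above can be written compactly. The notation below is introduced here for illustration, with D denoting the observed data; the second line is the standard identity relating posterior odds to prior odds, which shows that the Bayes factor is the factor by which the data shift the relative plausibility of the two hypotheses.

```latex
\[
\mathrm{BF}_{01} = \frac{p(D \mid H_0)}{p(D \mid H_1)}, \qquad
\mathrm{BF}_{10} = \frac{p(D \mid H_1)}{p(D \mid H_0)} = \frac{1}{\mathrm{BF}_{01}}
\]
\[
\frac{p(H_0 \mid D)}{p(H_1 \mid D)} = \mathrm{BF}_{01} \times \frac{p(H_0)}{p(H_1)}
\]
```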

Bayesian independent t tests were performed using the Jeffreys–Zellner–Siow (JZS) prior (Jeffreys, 1961), which was also used in a previous reading study (Abbott & Staub, 2015). In addition, a Cauchy prior width of .40 was chosen because it corresponds to a medium effect size (Cohen, 1988). See Fig. 1 for the Bayes factor robustness check of each measure within each session. These figures illustrate how the results would change if wider priors, indicating greater a priori uncertainty regarding the likelihood of an effect, were adopted. In light of the dearth of past research, greater uncertainty regarding the outcome would seem to be warranted.
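
As a rough illustration of what such an analysis computes, the sketch below approximates a JZS Bayes factor for an independent-samples t test from the t statistic and group sizes, following the integral formulation in Rouder et al. (2009). It is not the JASP implementation. It assumes that the reported prior width of .40 corresponds to the Cauchy scale parameter r, and the function name, integration range, and step count are hypothetical choices. The integral over the prior variance g is rewritten with the substitution g = 1/z², which turns the prior on g into a half-normal density on z and allows a simple Simpson’s rule approximation.

```python
from math import exp, pi, sqrt


def jzs_bf10(t, n1, n2, r=0.40, steps=4000, z_max=12.0):
    """Approximate BF10 for an independent-samples t test under a JZS prior."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    nu = n1 + n2 - 2  # degrees of freedom
    a = (n1 * n2 / (n1 + n2)) * r ** 2  # effective sample size times squared prior scale

    def integrand(z):
        if z == 0.0:
            return 0.0  # limit as g = 1 / z**2 goes to infinity
        s = 1.0 + a / (z * z)  # equals 1 + N * r**2 * g
        half_normal = 2.0 * exp(-0.5 * z * z) / sqrt(2.0 * pi)
        return half_normal * s ** -0.5 * (1.0 + t * t / (s * nu)) ** (-(nu + 1) / 2)

    # Composite Simpson's rule on [0, z_max].
    h = z_max / steps
    total = integrand(0.0) + integrand(z_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    marginal_h1 = total * h / 3.0
    marginal_h0 = (1.0 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0


# Example call with the overall reading time t value and group sizes from the text.
bf10 = jzs_bf10(2.54, 47, 47)
print(round(bf10, 2), round(1.0 / bf10, 2))  # BF10 and BF01
```

Passing a larger r to the same function mimics the robustness check, since a wider prior spreads more mass over large effects and typically shifts the Bayes factor toward the null when the observed effect is small.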

Fig. 1 Bayes factor robustness check for immediate and delayed comprehension, recall, and narrative transportation. All figures are taken directly from the JASP output

We found that overall reading times were longer for the sentence-presentation condition than for the page-presentation condition, t(92) = 2.54, p = .013, d = .52, 95 % CI [.11, .93], BF01 = .25, BF10 = 3.97. For the measures administered in the first session, participants were about equally engaged with the story across conditions, as measured by the Transportation Scale, t(92) = 1.21, p = .23, d = .25, 95 % CI [−.17, .65], BF01 = 1.67, BF10 = .60. Thus, the data are roughly 1.7 times more consistent with there being no difference in narrative engagement based on whether one is reading sentence-by-sentence or a page at a time. Small differences were observed for both immediate recall and comprehension, with those in the sentence-presentation condition exhibiting better performance in both cases. The NHST results fell just above the traditional threshold for statistical significance (and with confidence intervals that include zero) and the Bayesian results indicated weak evidence in favor of H1 for immediate comprehension and immediate recall, Immediate Comprehension Test: t(92) = 1.93, p = .057, d = .40, 95 % CI [−.01, .81], BF01 = .71, BF10 = 1.41; Immediate Free Recall Test: t(92) = 1.72, p = .089, d = .35, 95 % CI [−.06, .76], BF01 = .96, BF10 = 1.04. Comprehension and recall were weakly, and not statistically significantly, related immediately after reading the short story, r(94) = .16, p = .13, 95 % CI [−.05, .35].
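
The reported effect sizes and Bayes factors can be cross-checked with two simple relations. The sketch below assumes that Cohen’s d was computed with a pooled standard deviation, in which case d follows directly from t and the group sizes, and uses the fact that BF01 is the reciprocal of BF10. The function names are hypothetical.

```python
from math import sqrt


def cohens_d_from_t(t, n1, n2):
    """Cohen's d implied by an independent-samples t statistic (pooled SD assumed)."""
    return t * sqrt(1.0 / n1 + 1.0 / n2)


def bf01_from_bf10(bf10):
    """BF01 is the reciprocal of BF10."""
    return 1.0 / bf10


# Overall reading time comparison reported above: t(92) = 2.54, n = 47 per group.
print(round(cohens_d_from_t(2.54, 47, 47), 2))  # 0.52
print(round(bf01_from_bf10(3.97), 2))  # 0.25
```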

For measures administered in the second session, participants in the sentence-presentation condition did not differ much from those in the page-presentation condition on our measures of narrative engagement, comprehension, or recall. In all cases the Bayesian results showed either greater evidence in favor of the null, or no clear evidence in favor of either the null or the alternative hypothesis, Transportation Scale: t(92) = .68, p = .50, d = .14, 95 % CI [−.27, .54], BF01 = 2.45, BF10 = .41; Delayed Free Recall Test: t(92) = 1.33, p = .19, d = .27, 95 % CI [−.13, .68], BF01 = 1.49, BF10 = .67; Delayed Comprehension Test: t(92) = 1.33, p = .19, d = .27, 95 % CI [−.14, .68], BF01 = 1.49, BF10 = .67. Comprehension and recall were moderately associated when measured during this second session, r(94) = .37, p < .001, 95 % CI [.18, .53]. See Fig. 1 for the Bayes factor robustness check for each measure and session.

Discussion

The goal of the present study was to investigate the ecological validity of implementing a sentence-by-sentence paradigm when examining the reading of discourse, using a longer and more ecologically-valid text compared to past studies. To achieve this, a short story high in emotional content was presented as sentences and the outcomes were compared to those observed after reading the same story presented as individual pages. In general, we found little evidence that reading a story sentence-by-sentence leads to differences in long-term comprehension and recall, compared to reading one page at a time. A major goal of this study was to examine whether different text presentation paradigms influence how engaged readers become with a story. We observed no differences between text presentation formats with respect to narrative transportation, validating past work that relies on sentence-by-sentence presentation paradigms with respect to narrative engagement. Overall, these findings provide an important validation of the sentence-by-sentence paradigm, indicating that it can be employed to study reading times with little fear that reading processes and outcomes will not resemble those observed during real-world naturalistic reading.

On the surface, self-paced sentence-presentation formats look like a rather artificial method of reading, raising the question of whether this approach provides an experience that is ecologically valid. However, participants were equally engaged in the storyline across the two forms of text presentation, demonstrating that sentence-by-sentence paradigms also have the ability to absorb readers into a fictional world, where they identify with characters and feel emotionally involved with the narrative. In addition, the differences in comprehension and recall across presentation formats appear to be rather minimal.

Some exceptions to this lack of difference between conditions were observed. In particular, participants read the short story more slowly when it was presented as isolated sentences compared to pages. As a result, reading time measurements based on sentence-by-sentence presentations may differ from those found during natural reading based on pages of text. That said, in most cases researchers are interested in differences in reading times between key sentences rather than the reading times themselves, and our data do not permit any inferences regarding whether differences in reading time are affected by sentence-presentations. The sentence-presentation condition also resulted in higher scores for comprehension and recall when measured immediately after reading the short story. Although the magnitude of these differences was not large, Bayesian analyses found greater evidence for the alternative hypothesis (a difference between presentations) for immediate comprehension, recall, and overall reading times. One possible explanation for these differences is that readers in the sentence-presentation condition may have been more likely to read each sentence carefully, knowing that this was the only opportunity to process this information (i.e., it would be impossible to re-read the sentence by regressing to it once a key has been pressed). In contrast, readers in the page-presentation condition may have read more quickly, skimming the text because they felt overly comfortable in the knowledge that they could re-read parts of the page if they felt they did not understand what was happening. Hence, it is possible that readers of each presentation type employed different strategies. These are purely speculative post hoc interpretations, however, and it should be emphasized that the differences between conditions for these outcomes were not large.

One limitation of the current study is that only a single narrative was presented and therefore it is unclear whether the results generalize to other short stories or lengthier pieces of published fiction. It is possible that with longer pieces of fiction, the small effects in favor of sentence-presentation for immediate comprehension will be diluted and possibly disappear. In addition, it is worth noting that Raymond Carver’s style is straightforward and rather expository, which may have reduced the readers’ need to regress to earlier portions of the text. Other styles of writing (e.g., frequent use of free indirect discourse) may be more challenging and more likely to necessitate regressions, resulting in poorer comprehension for sentence-presentations. Hence, future studies should aim to examine the ecological validity of the sentence-by-sentence presentation paradigm with longer narratives and stories that contain other styles of writing (e.g., word play and allusions). It would also be useful to replicate this study using scales that capture other relevant experiences associated with reading that are not captured by the transportation scale (e.g., reflection and insight; Miall & Kuiken, 1995). Furthermore, this study could be improved upon in the future by employing a larger sample. This would improve statistical power and reduce the width of confidence intervals around our point estimates. The Bayesian analyses we report are independent of sample size, however, and so are not influenced by this factor.

In conclusion, our data demonstrate that it is appropriate to use sentence-reading paradigms in discourse research. Based on these findings, any differences in how readers process a text sentence-by-sentence versus page-by-page do not appear to produce large differences in outcomes like comprehension, recall, or engagement. As a result of this study, past research on narrative that utilizes a sentence-presentation paradigm can now be viewed with greater confidence with respect to ecological validity.