In a now seminal paper, Wineburg (1991) compared the reasoning of expert historians to that of Advanced Placement history students. He found experts’ historical reasoning to be characterized by the use of three heuristics: sourcing, corroboration, and contextualization. Sourcing refers to experts’ tendency to consider document information (i.e., text metadata like author and publisher) in interpreting texts’ content. Corroboration refers to experts’ comparison of texts to one another, while contextualization describes historians’ placement of texts in a temporal and spatial context. Since Wineburg’s (1991) initial work, examinations of students’ use of these three heuristics have been carried out primarily in the domain of history (Britt & Aglinskas, 2002; Nokes, Dole, & Hacker, 2007; Rouet, Favart, Britt, & Perfetti, 1997; Stahl, Hynd, Britton, McNish, & Bosquet, 1996), with some indication that these heuristics may be more broadly applicable as well (Bråten, Strømsø, & Salmerón, 2011; Goldman, Braasch, Wiley, Graesser, & Brodowinska, 2012; Shanahan, 2009; Stadtler & Bromme, 2013). Consistent with Wineburg’s (1991) initial work, students, even at the graduate level, have been found to rarely engage in any of these heuristics, with corroboration and contextualization used least often of all (Nokes et al., 2007; Rouet et al., 1997). In this study, we build on this prior work in two primary ways. First, we examine students’ application of the multiple text use heuristics that Wineburg (1991) identified beyond the domain of history, when students are asked to complete a multiple text task addressing a controversial topic in education (i.e., gifted education). Second, considering the infrequency with which students have been found to use these heuristics, we examine performance differences when students are explicitly instructed to engage in sourcing, corroboration, or contextualization during a multiple text task. More generally, we examine the associations among students’ text access, note taking, and written response composition during multiple text task completion.

Sourcing, corroboration, and contextualization

In the most comprehensive study following Wineburg’s (1991) initial work, Rouet et al. (1997) examined history (i.e., disciplinary specialists) and psychology (i.e., disciplinary novices) graduate students’ engagement in sourcing, corroboration, and contextualization when completing a multiple text task about the construction of the Panama Canal. Sourcing was assessed according to the number of references that students included in their essays composed based on multiple texts, with no differences found between graduate students in psychology and history. Corroboration was identified when students connected documents with similar points of view (i.e., positive connections) or identified documents as conflicting with one another (i.e., negative connections) in their essays, with some students linking groups of texts together more generally (i.e., general references). Nevertheless, Rouet et al. (1997) found graduate students, across domains, to include less than one instance of corroboration in the essays that they composed. Finally, contextualization was assessed by students’ provision of information based on their general world knowledge (i.e., general context statements), domain knowledge of history (i.e., historical context statements), or specific topic knowledge about the construction of the Panama Canal (i.e., problem context statements). It was here, in the contextual statements that individuals produced, that domain novices and domain specialists differed, with history graduate students including significantly more statements related to historical context knowledge in their essays than their psychology counterparts. As such, Rouet et al.’s (1997) investigation is important for at least two reasons. First, it provides a model for how sourcing, corroboration, and contextualization may be assessed in students’ writing, rather than during processing, as was done in Wineburg’s (1991) study. Second, it suggests that these three heuristics may be used not only by discipline experts reasoning about history but by discipline novices as well.

Assessing sourcing, corroboration, and contextualization

While Wineburg (1991) used a think-aloud approach to identify experts’ and novices’ use of sourcing, corroboration, and contextualization, follow-up studies have favored trace-based and writing-based approaches to capturing students’ use of these strategies. Trace-based approaches focus on students’ processing during multiple text use. These approaches have examined various indicators, including computer log-files, data from eye-tracking, and students’ notes, for records of strategic processing when students learn from multiple texts (e.g., Gottschling, Kammerer, & Gerjets, 2019; Hagen, Braasch, & Bråten, 2014; Salmerón, Gil, & Bråten, 2018; Wiley et al., 2009). For instance, List and colleagues (List & Alexander, 2017, 2018; List, Alexander, & Stephens, 2017) used log files to capture the frequency with which students accessed document information (e.g., author background and credentials) when engaged in a multiple text task, while Britt and Aglinskas (2002) captured sourcing according to the volume of document and author information included in students’ notes.

Writing-based assessments of sourcing, corroboration, and contextualization have looked to students’ task products or written responses for evidence of strategy use. Sourcing has been examined most directly as reflected in students’ citation use during written response composition (Britt & Aglinskas, 2002; Du & List, 2020; List et al., 2017; List, Du, Wang, & Lee, 2019; Rouet et al., 1997; Wiley & Voss, 1999). Corroboration has been assessed both via students’ explicit juxtaposition of information from across texts (e.g., Rouet et al., 1997) and as reflected in students’ use of various connective terms (e.g., causal, temporal) to link information across texts (Britt & Aglinskas, 2002). Contextualization has been assessed by categorizing the statements included in students’ written responses as reflecting prior (i.e., contextual) knowledge (Rouet et al., 1997) or as transformed, corresponding to the combination of text-based information with information not from texts (Wiley & Voss, 1999). In this study, we use both trace-based and writing-based methods to determine the extent to which students engage in sourcing, corroboration, and contextualization, when explicitly instructed to do so during completion of a multiple text task.

Sourcing, corroboration, and contextualization beyond history

While the bulk of studies examining students’ use of these three heuristics have been in the domain of history, there is reason to believe that sourcing, corroboration, and contextualization may be more generally applicable strategies for learning from multiple texts. First, sourcing, or the evaluation of texts based on author and document information, has been found to be a key multiple text use strategy across domains (Bråten, Strømsø, & Britt, 2009; Bråten et al., 2011; Du & List, 2020; List et al., 2019; Thomm & Bromme, 2016). Stadtler and Bromme (2014) explicitly emphasized the importance of sourcing for learning in science. This is because students often lack the knowledge and resources necessary to independently verify scientific claims (i.e., make primary trustworthiness judgments); in such instances, students are required to make secondary judgments about which author(s) to defer to or which sources to trust. These secondary judgments reflect sourcing, or the evaluation of texts’ quality based on author characteristics. More generally, students’ engagement in sourcing through citation, or information attribution, is an academic practice widely used across domains (Burton & Chadwick, 2000; Hyland & Bondi, 2006).

Corroboration, as a multiple text use heuristic, has likewise been investigated across areas of study. Wiley et al. (2009) examined students’ multiple text strategy use during completion of a science inquiry task. In particular, students were provided with a set of seven texts, deliberately constructed to include some overlapping but otherwise unique information on the eruption of Mt. St. Helens. Although uncommon, when evaluating the reliability of the websites provided, students were found to engage in corroboration to at least some extent, describing websites that “matched” and “kinda went along with” other texts (p. 1083). At the same time, echoing results from the domain of history, only 13% of students identified the corroboration of information across texts as a method of determining the reliability of scientific information. In a follow-up study, Goldman et al. (2012) found that students performing comparatively better on a multiple text task revisited more reliable websites than did poorer performing students, a possible indicator of the association between corroboration and multiple text task performance in science.

In the area of literature, students’ intertextual reading, requiring corroboration, has likewise been examined as key to literary interpretation (Goldman, 2004; Hartman & Hartman, 1993). Many (1996), in a study of Scottish high school students, found them to engage in intertextual strategy use across both literary and informational genre texts. For literary texts, in particular, students engaged in intertextuality to develop an enriched understanding of a given text (i.e., to understand an imaginary world more richly and deeply) and for personal acknowledgment (i.e., to form a personal connection to text content), with both of these processes considered to be associative, or corroborative, in nature. As such, across the areas of history, science, and literature, corroboration, intertextuality, and the juxtaposition of texts in relation to one another have been demonstrated to be foundational skills for learning from multiple texts (Goldman, 2004; Goldman et al., 2016; McCarthy & Goldman, 2019).

Contextualization as a multiple text use heuristic has received perhaps the least attention of all, likely due to studies of multiple text use often involving low-knowledge samples, with students having limited contextual knowledge available to bring to bear on processing (Reisman & Wineburg, 2008; Van Boxtel & Van Drie, 2012). Indeed, students’ irrelevant or inaccurate prior knowledge engagement has sometimes even been found to hinder comprehension (Wolfe & Goldman, 2005). Nevertheless, given that prior knowledge has commonly been associated with students’ performance on multiple text tasks, across domains (Bråten, Anmarkrud, Brandmo, & Strømsø, 2014), we can assume that knowledge-based contextualization may help with inferencing and with the interpretation and evaluation of texts’ content across domains (Anmarkrud, Bråten, & Strømsø, 2014; Strømsø, Bråten, & Samuelstuen, 2003).

Throughout this earlier work, contextualization, or students’ prior knowledge use, was examined as a deliberate strategy, engaged by learners to support comprehension. Still, other work has suggested that the engagement of prior knowledge to validate claims in texts may also be an automatic process. In particular, the Two-Step Model of Validation (Richter, 2011; Richter & Maier, 2017) suggests that during comprehension students engage in two types of validation based on their prior knowledge (or beliefs). The first type, non-strategic validation, is automatic and tends to result in students’ construction of prior-knowledge-biased or prior-belief-biased mental models of the information presented across texts. This reflects students insufficiently considering conflicting or knowledge- or belief-inconsistent information during reading. The second type of validation, strategic elaboration, involves students’ deliberate reasoning about knowledge- or belief-inconsistent information and results in students constructing a more balanced representation of information introduced via texts, including both belief-consistent and belief-inconsistent information.

In this study, we deliberately cued students to contextualize or evaluate information using prior knowledge, in the hope of engaging this latter, deliberate form of validation. Additionally, we chose a topic for our multiple text task that we expected students to be broadly familiar with. That is, we expected students to have both personal and academic knowledge of or beliefs about gifted education, and we expected these to support students’ strategic elaboration. Indeed, Barzilai, Thomm, and Shlomi-Elooz (2020) recently demonstrated that students rely more on prior knowledge when evaluating information about a familiar vis-à-vis an unfamiliar topic.

Still, based on Richter and Maier’s (2017) work, we expected students, across conditions, to engage in non-strategic validation as a routine part of their reading process. We did not expect such non-strategic validation to necessarily be adaptive, as it may have led students to ignore or neglect prior-knowledge- or belief-inconsistent information during processing, in favor of information that comported with their existing knowledge or beliefs (Britt, Rouet, Blaum, & Millis, 2019).

Topic-specific instantiations of sourcing, corroboration, and contextualization

In this study, we examine students’ sourcing, corroboration, and contextualization when learning about a controversial topic in education (i.e., gifted education). This topic was selected for three primary reasons. First, nested in education, this topic allowed us to examine students’ multiple text reasoning beyond the realm of history. Second, this topic was complex, with a variety of stakeholders (i.e., parents, teachers, researchers) involved. This allowed us to attribute texts to a variety of differentially qualified and motivated authors and therefore to examine students’ sourcing when reading about a controversial topic across multiple texts. Relatedly, this topic was multifaceted and controversial in nature, allowing both complementary and conflicting texts to be presented to students, facilitating cross-textual corroboration. Finally, this topic was selected because of the contextualization that it would allow. Specifically, this topic was expected to tap both students’ general, world knowledge about schooling in the United States and more specific, academic knowledge about gifted education. In particular, while we expected all students to have some general, contextual knowledge about schooling from their own experiences as students, we also knew that all of the students in our sample would have domain knowledge about education, and that a number of students would have taken specific courses in gifted and talented education, resulting in topic knowledge relevant to the task. As such, we expected this topic to allow students to engage in contextualization at all three levels of knowledge identified by Rouet et al. (1997).

In sum, the topic of gifted and talented education afforded us the opportunity to examine sourcing, corroboration, and contextualization as reasoning strategies within a novel domain context (i.e., that of education). Sourcing was expected to reflect students’ consideration of author and document information in evaluating texts. In particular, texts in this study were variably attributed to expert authors or to individuals holding applied expertise (i.e., a parent and a teacher). Sourcing in this context was expected to result in students’ recognition of these author features and consideration of these features when evaluating and writing about texts. We expected the use of the sourcing strategy, in this case, to generally reflect what students are often asked to do when reading various perspectives in the popular press (e.g., students may elect to defer to research-based vis-à-vis applied points of view, or vice versa).

Corroboration, in this case, involved students comparing and contrasting information across texts. While Wineburg (1991) specifically examined corroboration as the comparing of facts and details in historical documents as a means of ascertaining their accuracy and veracity, we view corroboration as having both this local-level purpose (i.e., verifying specific information) and a more global goal, wherein diverse perspectives can be juxtaposed across texts, with this more global connection formation better reflected in Rouet et al.’s (1997) work. To capture both of these aspects of corroboration, the texts used in this study reflected both overarching perspectives supporting and opposing gifted and talented education and specific information able to be compared across texts (e.g., one text reporting that students learned better within homogeneous ability groups, while another text presented evidence for heterogeneous grouping).

Within the context of this study, we also viewed corroboration as a component of (although not synonymous with) integration. Corroboration was thought to reflect the explicit comparison of information across texts, whereas integration was considered to encompass both students’ connection formation across texts during processing (i.e., integrative reasoning) and the ultimate representation of texts that students formed after reading (i.e., integration performance as assessed through writing). While corroboration is a form of integrative reasoning, other types of integrative reasoning have also been documented (e.g., elaboration, or the use of one text to expand on information in another; comprehension support, or the use of information in one text to better understand another; Anmarkrud et al., 2014). In this study, we expected corroboration, alongside other types of integrative reasoning, to contribute to students’ integration performance, or to their formation of a singular, coherent mental model of information presented across multiple texts.

Finally, contextualization in this study was viewed as students’ marshalling of prior knowledge or experience to understand and evaluate information in texts. We considered contextualization to be a highly domain-general strategy, applicable beyond the domain of history. At the same time, we also expected prior knowledge to manifest in a somewhat topic-specific way in this study. That is, while the facilitative role of prior knowledge for reading may be considered to be a domain-general characteristic of contextualization, the type of knowledge engaged across domains may be somewhat different. Because students lack direct personal experience with history, Wineburg (1991) and others (Reisman, 2012) view contextualization as the placement of texts within a historical and temporal context, whereas, in this study, students could bring to bear not only academic knowledge (e.g., about theories of learning) but also world knowledge and knowledge based on personal experience in reasoning about the texts provided. Indeed, work examining students’ reasoning about topics in education has often found students to rely on lay theories, rather than on academic knowledge, in doing so (Fives & Buehl, 2012; Kendeou, Robinson, & McCrudden, 2019; List & Rubenstein, 2019). At the same time, Wolfe and Goldman (2005), in a think-aloud study, found students, albeit in middle school, to nevertheless deploy irrelevant, personal associations (e.g., their uncle traveling to Italy, when learning about the fall of the Roman Empire) even when reasoning about a topic in history. This speaks to the domain-general role of prior knowledge, of different types (e.g., domain, topic, and world knowledge and personal experience), in supporting (or impeding) comprehension across domains.

While we expected students’ general engagement of relevant prior knowledge to play a facilitative role in learning, in this study, we specifically selected a topic and texts that would speak both to students’ general world knowledge of schooling practices in the United States and to their own personal experiences as learners. We were unsure regarding the specific role such personal experiences would play in the effectiveness of students’ reasoning. We further thought that the controversial nature of this topic might prompt students to consider and reflect on their own opinions and points of view when interacting with the study texts. In this study, we were particularly interested in whether elements of contextualization (e.g., inclusion of prior knowledge or opinion) would manifest in the notes and written responses that students composed when learning based on multiple texts.

Task assignment

With these topic-related affordances in mind, in this study, we randomly assigned students to one of three task conditions, each asking them to engage one of three heuristics during multiple text use (i.e., to engage in sourcing, corroboration, or contextualization). In accordance with the task assignments that students received, we examined students’ processing and writing performance for evidence of sourcing, corroboration, and contextualization. Prior work has found task assignment (e.g., asking students to write a summary or an argument) to be a powerful manipulation for improving students’ integration and writing based on multiple texts (Cerdán & Vidal-Abarca, 2008; Gil, Bråten, Vidal-Abarca, & Strømsø, 2010a, 2010b; Wiley & Voss, 1999), with task assignment serving a number of functions, including directing students’ attention to task-relevant information and helping students decide on which strategies to use (McCrudden & Schraw, 2007). Nevertheless, prior work manipulating task assignment within the context of students’ learning from multiple texts has done so by assigning students to produce different types of academic outcomes, with assignments focused on knowledge transforming (e.g., arguments), rather than knowledge telling (e.g., summaries), found to offer benefits for learning (List, Du, & Wang, 2019; Wiley & Voss, 1999). In this study, however, in addition to asking students to produce one particular type of performance outcome (i.e., to compose a research report), we also explicitly directed students to engage in a particular type of strategic approach (i.e., sourcing, corroboration, or contextualization) during processing. We did this to direct students’ decision making regarding strategy use during task completion, suggested by Bohn-Gettler and McCrudden (2018) as a promising avenue for encouraging students’ more integrative processing when learning from multiple texts.

Thus, we were interested in the extent to which students’ engagement in sourcing, corroboration, and contextualization increased learning from multiple texts. In this study, we assessed learning from multiple texts by examining students’ writing. As per the Documents Model Framework (Britt et al., 1999; Perfetti et al., 1999) and related empirical work in this area (e.g., Wiley & Voss, 1999), we expected better quality writing to reflect (a) students’ inclusion of substantial information from the study texts, (b) their elaboration or transformation of this information, as a contrast to only directly paraphrasing text-based information, (c) their integration or thematic linking of information across texts, and (d) their attribution of information to source(s) of origin, allowing for verification. We expected students’ assignment to each of the three heuristic conditions examined in this study to specifically target at least one of these aspects of writing quality. For instance, contextualization was expected to support students’ elaboration or explanation of text-based information using prior knowledge. Corroboration was expected to increase students’ integration or thematic linking of information across texts, while sourcing was expected to increase students’ citation use or attribution of information to sources of origin. This was in addition to our more general expectation that students who were more engaged during multiple text use (e.g., accessing more texts, for longer periods of time, including more information in their notes) would also demonstrate better writing performance when composing a research report based on multiple texts (Bråten, Brante, & Strømsø, 2018).

We expected students’ engagement of all three heuristics (i.e., sourcing, corroboration, contextualization), jointly, to produce maximum benefits for students’ learning, as has been demonstrated in prior work (Rouet et al., 1997; Wineburg, 1991). Nevertheless, in this study, we directed students to engage in only one type of strategy use per condition. We did this for two related reasons. First, we wanted to analyze and compare the distinctive manifestation of each of these heuristics when students completed a multiple text task, beyond the domain of history. That is, we wanted to establish the extent to which all three of the strategies identified by Wineburg (1991) were comparably effective for students’ multiple text learning or whether some of these strategies seemed to have an outsized effect. Second, we wanted to determine whether brief, process-focused task instructions were sufficient to elicit variable strategic processing on the part of learners. Because of this second goal, we expected the introduction of all three strategies prior to reading to be overly demanding for the students in our sample. Indeed, when the use of all three of these heuristics has been examined in prior work, it has been investigated with relatively expert samples (e.g., professors, graduate students; e.g., Rouet et al., 1997; Wineburg, 1991), rather than with undergraduates, although this has not always been the case (e.g., Nokes et al., 2007).

We had the following research questions:

1. To what extent does students’ information use during a multiple text task differ in association with strategy assignment (i.e., asking students to engage in sourcing, corroboration, or contextualization)?

Given prior work on the effects of task assignments on subsequent processing (Gil et al., 2010a; McCrudden & Schraw, 2007; Wiley & Voss, 1999), we expected task assignments to change processing in association with each condition. That is, we expected students in the sourcing condition to access more document information (e.g., author and publisher information) during reading, to include more sourcing information in their notes, and to include more citations in their written responses. Similarly, we expected students in the corroboration condition to revisit texts more often and to include more cross-textual connections in their notes and written responses. We had less clear expectations regarding the processing of students assigned to the contextualization condition; nevertheless, we expected these students to include more information based on prior knowledge and experience in their notes and in the written responses that they composed.

2. What is the nature of the association between students’ information use during multiple text processing and writing based on multiple texts (i.e., what is the association among the number of texts that students access, the volume of text-based information that students include in their notes, and the volume of text-based information students include in the written responses that they compose)?

Consistent with prior work examining the associations among students’ multiple text processing and writing quality (Anmarkrud et al., 2014; Du & List, 2020; Hagen et al., 2014), we expected all of these indicators to be associated with one another. In particular, we expected students accessing more texts to then include more text-based information in their notes and in the written responses that they composed. This is also consistent with Kobayashi’s (2009) work on the use of external strategies (e.g., note-taking) supporting students’ learning from multiple texts.

3. To what extent do students’ strategy assignment and patterns of information use predict performance on a multiple text task?

Given prior work identifying the association between deep-level strategy use and task performance (Bråten, Anmarkrud, Brandmo, & Strømsø, 2014; Du & List, 2020; List & Alexander, 2020; List, 2020), we expected both task assignment and resulting strategy use (e.g., sourcing, corroboration) to be associated with improved writing based on multiple texts. We did not have a specific hypothesis regarding whether sourcing or corroboration would offer larger benefits for task performance; however, we did expect students assigned to the sourcing and corroboration conditions to outperform students assigned to the contextualization condition. While prior knowledge has been found to play a facilitative role in students’ learning from multiple texts (Barzilai & Strømsø, 2018; Gil et al., 2010a, 2010b; Le Bigot & Rouet, 2007), in this study, we did not specifically expect these prior knowledge-related benefits to outweigh those associated with instructing students to engage in deliberate multiple text processing-related strategy use (e.g., corroboration). This expectation was based on prior research which has found prior knowledge to have not only a facilitative effect for learning but, at times, a detrimental one as well (Wolfe & Goldman, 2005).

With regard to strategic processing, we expected variables reflective of increased engagement with text-based information to be associated with improved task performance. Specifically, we expected that accessing more texts, for longer periods of time, and recording more information in notes would be associated with improvements in writing quality. We considered this to be the case both because increased information access was likely to facilitate students’ text-based response composition (e.g., allowing more text-based evidence to be included in writing, Du & List, 2020) and because we considered increased access to serve as an indicator of students’ effort, persistence, and deeper-level processing during reading (Bråten, Brante, & Strømsø, 2018; List, Stephens, & Alexander, 2019).

Methods

Participants

Participants were 72 undergraduate students from a large university in the Midwestern United States (age: M = 19.97, SD = 1.27). Students were majority female (70.83%, n = 51; male: 23.61%, n = 17) and White (75.00%, n = 54). Two students (2.78%) reported their ethnicities as Black or African American. One student (1.39%) reported his or her ethnicity as Hispanic/Latino. Students came from a variety of education-related majors, including elementary education, special education, and speech-language pathology. Four students did not report any demographic information. Fifteen students did not report race/ethnicity.

Procedures

This study consisted of two primary parts. First, participants were asked to complete a variety of individual difference measures (i.e., prior knowledge, interest, and attitudes with regard to gifted education). Second, participants were asked to complete a multiple text task on the topic of gifted education. Specifically, this multiple text task included a research phase and a response phase, during which students were asked to use a library of six digital texts to research the topic of gifted education (i.e., research phase) before writing an argument on the issue (i.e., response phase). Prior to completing the multiple text task, students were assigned to one of three heuristic conditions (i.e., asked to engage in sourcing, corroboration, or contextualization).

Individual difference measures

Three individual difference measures were collected in this study (i.e., prior knowledge, interest, and students’ attitudes toward gifted education). While prior knowledge was used as a control in analyses, interest and attitudes are presented descriptively. Therefore, information about how interest and attitudes were assessed is included in Appendix 1. Neither interest (p = .79) nor any of the attitude-related measures (ps > .07) were found to significantly differ across task conditions.

Prior knowledge Prior knowledge was assessed via a term identification measure. Students were asked to provide their definitions for eight terms related to the topic of gifted and talented education. Each term was binary coded as correct or incorrect. Two researchers coded 28 student responses, reflecting 38.89% of participants. Cohen’s kappa inter-rater agreement was .89, with 94.64% exact agreement between raters. The internal consistency of prior knowledge items was Cronbach’s \(\alpha\) = .42; this increased to \(\alpha\) = .51 when one term was excluded.
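For concreteness, the sketch below shows how the two reliability indices reported above, Cohen’s kappa for the double-coded subsample and Cronbach’s alpha for the eight binary prior knowledge items, can be computed. The ratings are invented placeholders, not the study’s data.

```python
# A minimal sketch, with made-up ratings, of the reliability indices
# reported above (Cohen's kappa and Cronbach's alpha).
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
rater_a = rng.integers(0, 2, 28)           # 28 double-coded responses
rater_b = rater_a.copy()
rater_b[:2] = 1 - rater_b[:2]              # two disagreements, for illustration
print("kappa:", round(cohen_kappa_score(rater_a, rater_b), 2))
print("exact agreement:", round((rater_a == rater_b).mean(), 4))

def cronbach_alpha(items: np.ndarray) -> float:
    """items: a participants x items matrix of 0/1 scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

scores = rng.integers(0, 2, size=(72, 8))  # 72 participants, 8 terms
print("alpha:", round(cronbach_alpha(scores), 2))
```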

Multiple text task

The multiple text task had three parts. First, students were assigned to a heuristic condition (i.e., asked to engage in sourcing, corroboration, or contextualization). Then students were asked to research the topic of gifted education using a library of six digital texts as well as to compose a written response on the topic of gifted education. Specifically, students were asked to respond to the prompt: “Write an argument explaining whether or not your elementary school should have a gifted and talented program.”

Experimental assignment Prior to starting the multiple text task, students were randomly assigned to one of the three experimental conditions, representing the three heuristics that Wineburg (1991) identified. Specifically, students in the sourcing condition were instructed: “In researching your argument, please consider where the information you are using comes from and who wrote it.” Students in the corroboration condition were asked to “compare the information sources to determine what they agree or disagree about” while researching. Finally, students in the contextualization condition were asked to “consider how this information relates to your prior knowledge and how it may apply in real-world school contexts.”

Research phase During the research phase, students were provided with a library of six digital texts to use in researching the target prompt, with three texts presenting arguments for gifted and talented education and three texts presenting arguments against it. All texts were intended to be trustworthy in nature (i.e., written in a professional style and attributed to reputable publishers). Texts were drawn from the Room for Debate segment of the New York Times and modified for inclusion in this study: https://www.nytimes.com/roomfordebate/2014/06/03/are-new-york-citys-gifted-classrooms-useful-or-harmful. Texts ranged from 249 to 256 words in length (M = 254.33, SD = 2.43). See Table 1 for information about the texts used in this study.

Table 1 Descriptions of each text

During the research phase of the study, log data of students’ text access were gathered and students were allowed to take notes on the texts provided. Table 2 includes a description of all indicators examined in this study.

Table 2 Descriptions of variables examined

Log data Log data included information on which and how many texts students accessed, whether or not students revisited texts, and the total time that students devoted to text access. Moreover, when students accessed texts in the digital library, only the title and content of a given text were loaded. Students had to access document information about each text (e.g., information about author and publisher) by clicking an additional button labeled: “click here to learn more about this source.” Including this button allowed us to further log whether or not students accessed document information in association with each text visited.

Students’ document information access was used as a metric of sourcing, while text revisits were intended as a metric of corroboration, corresponding to students’ iterative access of texts to compare these to one another (List & Alexander, 2018). No log-based measures of contextualization were used.
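As an illustration only, the sketch below derives the log-based indicators named above (texts accessed, revisits, total time, and document information access) from a hypothetical click record; the event and field names are our own assumptions, not the study’s actual logging schema.

```python
# A hypothetical sketch of deriving the log-based indicators; the
# schema (event and field names) is invented for illustration.
import pandas as pd

log = pd.DataFrame({
    "student": [1, 1, 1, 1],
    "event":   ["open_text", "open_source_info", "open_text", "open_text"],
    "text_id": ["T1", "T1", "T2", "T1"],
    "seconds": [95, 12, 140, 40],
})

opens = log[log["event"] == "open_text"]
indicators = pd.DataFrame({
    # how many distinct texts a student opened
    "n_texts_accessed": opens.groupby("student")["text_id"].nunique(),
    # whether any text was opened more than once (revisiting)
    "revisited": opens.groupby("student")["text_id"]
                      .apply(lambda t: t.duplicated().any()),
    # total time devoted to text access
    "total_time": opens.groupby("student")["seconds"].sum(),
    # texts for which document information was opened
    "n_doc_info": log[log["event"] == "open_source_info"]
                      .groupby("student")["text_id"].nunique(),
})
print(indicators)
```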

Notes Students’ notes were analyzed using a four-step process. First, notes were segmented into statements, each typically including a single distinct idea set off via a bullet or spacing. Then, statements were coded according to their content, including: (a) information about the task instructions, (b) source information (e.g., author, credentials), and (c) text-based content (i.e., number of texts included, number of statements drawn from each text). Inter-rater agreement for the number of statements from each text included in students’ notes was determined by two raters scoring 25 students’ notes (34.72%). Inter-rater agreement was 90.67% (n = 136), with disagreements resolved through discussion.

Notes were also holistically examined for their inclusion of evaluations, cross-textual connections, and prior knowledge or opinions (see Table 2). Specifically, evaluations, along with the inclusion of sourcing information in students’ notes more generally, were intended to reflect sourcing. Cross-textual connection formation was taken to reflect corroboration. Finally, the inclusion of statements in notes reflecting prior knowledge or personal opinion was used as an indicator of contextualization. Holistic scoring was done by two raters for all students’ notes; exact agreement ranged from 100% (n = 60) for whether or not students included task instructions in their notes to 70% (n = 42) for the inclusion of prior knowledge. All disagreements were resolved through discussion.

As a final step in coding, information from students’ notes was compared to the information ultimately included in the written responses that students composed. This included considering (a) how many texts, of those accessed and included in students’ notes, were reflected in students’ written responses, (b) how many statements, from which texts, students included in the responses that they composed, and (c) whether the information included in students’ writing followed the same order as information in students’ notes, or whether backtracking (i.e., recursion or the inclusion of notes-based information out of sequence) was evidenced. This backtracking metric, in particular, was considered to reflect students’ more flexible, dynamic, and cross-textually focused information use. See Appendix 2 for a sample notes page.
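Under our reading of the backtracking metric (one possible operationalization, not necessarily the exact coding procedure used), backtracking can be counted as each point at which a written response returns to an earlier position in the notes, as in the sketch below.

```python
# A minimal sketch of the backtracking count, assuming each notes-based
# statement in the response is indexed by its position in the notes.
def count_backtracks(note_positions_in_response: list[int]) -> int:
    backtracks = 0
    for prev, curr in zip(note_positions_in_response,
                          note_positions_in_response[1:]):
        if curr < prev:  # the response jumps back to an earlier notes position
            backtracks += 1
    return backtracks

# Positions 5 -> 3 and 6 -> 4 are out of sequence, so two backtracks.
print(count_backtracks([1, 2, 5, 3, 6, 4]))  # -> 2
```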

Response phase Students were asked to write an argument in favor of or in opposition to their school adopting a gifted and talented program. Students’ responses were coded in two ways, analytically and for quality. While the analytic response coding classified each sentence included in students’ writing into one of ten possible categories, the quality score assigned to students’ writing reflected both the degree of argumentative reasoning that students’ writing exhibited and students’ overall writing performance.

Analytic response coding Students’ responses were coded analytically. That is, each sentence identified in students’ writing was first coded as text-based or not, and then further identified according to its purpose (e.g., stating students’ positions for or against gifted and talented education; linking information across two texts). Sentences in students’ writing were chosen as the focal units of analysis as these represented distinct segments of thought, as indicated by students themselves. A total of ten categories were identified, five corresponding to text-based sentences included in students’ written responses and five capturing sentences that were non-text-based. See Table 2 for definitions of each coding category and Table 3 for examples. See Appendix 3 for our coding of a sample student response. Two raters coded each sentence included in 45 students’ written responses (62.50% of the sample). Exact agreement for each sentence coded in target students’ responses was 81.15% (n = 451), out of a total of 556 sentences coded (Cohen’s \(\kappa\) = .79). Disagreements were resolved through discussion. As a final analytic metric, the number of unique citations, referring to any one of the six library texts by title or author, was further totaled.

Table 3 Sample response coding and examples of different types of sentences

Although a variety of analytic metrics characterizing students’ written responses were examined in this study, some of these metrics were specifically analyzed for evidence of sourcing, corroboration, and contextualization. First, students’ citation use was used as an indicator of sourcing. Inter-textual connection formation was used as an indicator of corroboration. Finally, the numbers of statements reflecting prior knowledge and personal experience were used as indicators of contextualization. See Table 2 for a mapping of response indicators to heuristic task conditions. See Table 3 for sample statements and responses.

Quality scores Consistent with prior work (Anmarkrud et al., 2014), students’ written responses were assigned a single score reflecting writing quality. A six-point scale was used, reflecting (a) whether students generated a claim regarding the inclusion of gifted and talented education in their schools or not (i.e., assigned a 1); (b) whether students provided a single reason or multiple reasons to support this claim (i.e., assigned scores of 2 and 3, respectively); and (c) whether students acknowledged that there was a conflicting perspective, provided evidence consistent with this conflicting perspective, and refuted, reconciled, or otherwise balanced this conflicting perspective with their own point of view (i.e., assigned scores of 4, 5, and 6, respectively). As such, responses receiving a 6 on our rubric were both more elaborated in support of students’ positions on the inclusion of gifted and talented education in schools and two-sided in the arguments they introduced. Scores were assigned in a hierarchical fashion, with higher quality responses including all of the features of their lower-level counterparts. For instance, if students made conflicting claims regarding the quality of gifted education but did not provide any evidence in support of those claims, this was coded as a 1, rather than as a 4. That is, a score of 1 was applied because, in this example, no supporting evidence was provided, even though students did recognize a conflicting perspective introduced across texts. However, such responses were exceedingly rare. As demonstrated in the sample responses in Table 4, the vast majority of students’ responses were scored as at least a 2 (i.e., indicating that students provided at least a single piece of evidence in support of a single claim). The only response scored using this decision rule was assigned a 1 and is included in Table 4. Cohen’s kappa inter-rater agreement was .66, indicating moderate agreement and corresponding to 71.83% exact agreement between raters. All disagreements were resolved through discussion.

Table 4 Sample responses for quality scores
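To make the hierarchical scoring rule concrete, the sketch below encodes the six-point rubric as a function; the boolean feature flags are assumed summaries of a coded response, not the authors’ actual coding interface.

```python
# A sketch of the six-point hierarchical quality rubric described above.
def quality_score(has_claim: bool, n_reasons: int,
                  acknowledges_conflict: bool,
                  evidences_conflict: bool,
                  balances_conflict: bool) -> int:
    if not has_claim:
        return 0
    if n_reasons == 0:
        return 1                      # a claim with no supporting evidence
    score = 2 if n_reasons == 1 else 3
    if acknowledges_conflict:         # higher scores subsume lower features
        score = 4
        if evidences_conflict:
            score = 5
            if balances_conflict:     # refutes, reconciles, or balances
                score = 6
    return score

# A one-sided argument offering several reasons scores a 3.
print(quality_score(True, 3, False, False, False))
```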

Results

Because of the volume of data collected as a part of this study, results are presented in three parts. First, each data source (i.e., log data, students’ notes, and students’ written responses) is compared across conditions, when students were asked to engage in sourcing, corroboration, or contextualization. Next, descriptives of students’ multiple text use are presented, across conditions. Finally, a series of hierarchical regression models are run predicting components of students’ written responses and overall response quality based on condition, log data of students’ text access, and information included in students’ notes. All variables analyzed and their correspondence to the three assigned heuristic conditions examined are presented in Table 2. Descriptive information is in Table 5 and information about skewness and kurtosis is in Table 6. Two variables of interest (i.e., the total number of text-based sentences and the number of inter-textual connections that students formed) were found to be non-normally distributed to a considerable extent. Analyses of these should be interpreted with caution, although the analytic procedures used (i.e., the F-test) are robust to violations of normality (Blanca et al., 2017).

Table 5 Descriptives of key variables
Table 6 Measures of skewness and kurtosis for each outcome of interest

Research question 1. Multiple text use by condition

Our first research question examined the differences in multiple text use, as captured by log data, notes, and written responses, among students assigned to different heuristic conditions (i.e., asked to engage in sourcing, corroboration, or contextualization).

Log data Among the log data indicators examined for this question, we expected document information access to be associated with students’ assignment to the sourcing condition and text revisits to be particularly common among students in the corroboration condition.

Overall One-way ANOVAs were run to examine differences in the number of texts visited and the total time devoted to text access among students assigned to different heuristic conditions. However, none of these results were significant, ps > .59.

Document information access Another one-way ANOVA was run to examine the differences in the number of texts for which students accessed document information in association with heuristic condition. The ANOVA was significant, F(2, 65) = 14.51, p < .001, \(\eta^{2}\) = .31. Post-hoc analyses using Tukey’s HSD showed that students who were assigned to the sourcing condition accessed document information about significantly more texts (M = 3.70, SD = 2.38) than students assigned to the corroboration (M = 1.52, SD = 2.14) and contextualization (M = .58, SD = 1.47) conditions, ps < .01.
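For readers wishing to reproduce this style of analysis, a sketch with simulated placeholder data follows: a one-way ANOVA on the number of texts with document information accessed, by condition, followed by Tukey’s HSD post-hoc comparisons. The simulated values will not reproduce the study’s statistics.

```python
# A sketch, on simulated data, of the one-way ANOVA and Tukey HSD
# comparisons reported above; results will differ from the study's.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
sourcing      = rng.normal(3.70, 2.38, 24).clip(0, 6)
corroboration = rng.normal(1.52, 2.14, 24).clip(0, 6)
context       = rng.normal(0.58, 1.47, 24).clip(0, 6)

f, p = stats.f_oneway(sourcing, corroboration, context)
print(f"F = {f:.2f}, p = {p:.4f}")

groups = np.r_[sourcing, corroboration, context]
labels = ["sourcing"] * 24 + ["corroboration"] * 24 + ["context"] * 24
print(pairwise_tukeyhsd(groups, labels))
```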

Revisits A Chi squared test of association was run to examine whether students revisiting texts during access was associated with heuristic condition. However, this association was not significant, p = .54.
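The Chi squared tests used throughout this section follow the pattern sketched below, crossing condition with whether a student revisited any text; the cell counts are invented for illustration.

```python
# A sketch of a Chi squared test of association with invented counts.
import numpy as np
from scipy.stats import chi2_contingency

#                 revisited   did not revisit
table = np.array([[14,         10],   # sourcing
                  [16,          8],   # corroboration
                  [13,         11]])  # contextualization
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")
```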

All told, only the volume of document information access was found to differ across heuristic conditions and to be more frequent among students assigned to engage in sourcing while learning from multiple texts.

Notes With regard to the information included in students’ notes, we expected that students assigned to the sourcing condition would include both more document information and more frequent evaluations in their notes. We expected students in the corroboration condition to include more cross-textual connections in their notes and to evidence more backtracking. Finally, we expected students assigned to the contextualization condition to include more statements reflecting their prior knowledge or opinions in their notes.

Overall A Chi squared test of association was run to examine whether students’ inclusion of task instructions in their notes, or not, was associated with heuristic condition. However, this association was not significant, p = .81. Next we examined whether students’ notes across heuristic conditions differed in the amount of document information included, the number of texts featured, and the number of text-based statements recorded. A series of one-way ANOVAs were run. The volume of document information included in students’ notes differed by heuristic condition, F(2, 67) = 19.49, p < .001, \(\eta^{2}\) = .37. Post-hoc analyses using Tukey’s HSD showed that students who were assigned to the sourcing condition included significantly more document information in their notes (M = 2.74, SD = 2.51), as compared to students assigned to the corroboration (M = .18, SD = .66) and contextualization (M = .28, SD = .89) conditions, ps < .001. However, across conditions, students’ notes did not differ in the number of texts that they featured (p = .32) nor in the number of text-based statements that they recorded (p = .97). Students’ engagement in backtracking was also not found to differ by heuristic condition (p = .38).

Evaluation, cross-textual connection formation, and prior knowledge/opinion inclusion A series of Chi squared tests were run to examine whether the inclusion of evaluations, cross-textual connections, and prior knowledge or opinion in students’ notes differed across heuristic conditions. However, none of these differences was significant, ps > .58.

Only students’ inclusion of document information in their written notes was found to differ by heuristic condition.

Analytic written response scores A series of one-way ANOVAs were run to determine whether the analytic components of the responses that students composed differed in association with heuristic condition. Citation use was expected to be associated with students’ membership in the sourcing condition. We expected the number of inter-textual connections in students’ writing to be associated with their assignment to the corroboration task condition. Further, we expected students’ inclusion of prior knowledge and personal experiences in their written responses to be associated with assignment to the contextualization condition.

The number of text-based and non-text-based sentences, of various types, included in students’ written responses were not found to differ across task conditions, ps > .21. Still, the proportion of different types of sentences included in students' written responses, by condition, is presented in Figure 1. A Chi squared test of association was then run to examine whether the formation of cross-textual connections or not was associated with heuristic condition. However, this association was not found to be significant, p = .10. A one-way ANOVA was run to examine the differences in the number of citations included among students in different heuristic conditions. However, these differences were again not significant, p = .16.

Fig. 1 Proportions of different types of sentences in students’ written responses across task conditions

Written response quality A final one-way ANOVA was run to determine whether students’ overall writing quality differed by task condition. However, no significant differences were found, p = .98.

No measures of writing quality were found to differ across task conditions.

Research question 2. Students’ process of multiple text use

Our second research question examined (a) log data of students’ text access, (b) students’ notes, and (c) written responses, composed in response to a multiple text task. Given the limited differences across heuristic conditions found in Research Question 1, analyses for Research Question 2 are collapsed across conditions. Descriptive information from these data sources is presented in Table 5. Information about students’ accessing and use of specific texts is included in Table 7. We further examined the associations among these. Due to concerns over length, only significant results are presented.

Table 7 Number of texts accessed, recorded in notes and reflected in students’ written responses

Associations among log data of students’ text access and information in students’ notes First, students were found to take notes on the majority of the texts that they accessed (81.94%, SD = .74). The number of texts accessed was significantly associated with both the number of texts included in students’ notes, r(66) = .53, p < .001, and the total number of statements included, r(66) = .41, p < .001. As such, students accessing more information during multiple text use also included more of this information in their notes.

Second, including document information in notes was associated with the number of texts for which document information was accessed, r(66) = .71, p < .001. In particular, among students accessing any document information during navigation (45.83%, n = 33), an average of 52.19% (SD = .47) of texts were then tagged with document information in students’ notes. That is, students accessing more document information during reading also included more of this information in their notes.
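The correlational analyses in this and the following subsections share a single template, sketched below with invented vectors: Pearson’s r between the number of texts accessed and the number of texts included in notes.

```python
# A sketch of the Pearson correlations reported in this section,
# computed on invented data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
texts_accessed = rng.integers(1, 7, 68)                  # 1-6 library texts
texts_in_notes = np.minimum(texts_accessed,
                            rng.integers(0, 7, 68))      # notes cover a subset
r, p = pearsonr(texts_accessed, texts_in_notes)
print(f"r({len(texts_accessed) - 2}) = {r:.2f}, p = {p:.3f}")
```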

Associations among log data of text access and students’ analytic response scores The total number of texts that students accessed was significantly associated with the number of text-based sentences included in students’ written responses, r(68) = .30, p < .05. The number of citations included in students’ written responses was associated with log data of students’ document information access, r(68) = .29, p < .05. That is, students who accessed more texts included more text-based information in the written responses that they composed. Similarly, accessing more document information was associated with greater citation use.

Associations among log data of text access and the quality of students’ written responses The total number of texts that students accessed was significantly associated with the quality of their written responses, r(68) = .30, p < .05. Accessing more texts was associated with better response quality.

Associations among notes and analytic written response scores Students’ written responses were further examined in association with the notes that students composed during the research phase of the study.

Document information in notes and written responses Students’ citation use was found to be significantly associated with their inclusion of source information in their notes, r(69) = .46, p < .001.

Texts reflected in students’ notes and written responses Overall, students included information from an average of 2.61 (SD = 1.36) texts in the written responses that they composed. The number of texts featured in students’ written responses, even if these were not explicitly cited, was significantly associated with the number of texts included in students’ notes, r(69) = .58, p < .001. In particular, students included an average of 70.45% (SD = .43) of the texts featured in their notes in the written responses that they composed. As such, students taking notes on more information then included this information in the written responses that they composed.

Content reflected in students’ notes and written responses Students included an average of 10.14 (SD = 8.02) text-based statements in their notes and an average of 4.59 (SD = 3.56) notes-based statements in their written responses. The number of text-based statements in students’ notes and notes-based statements in their written responses were significantly associated with one another, r(69) = .49, p < .001. This corresponded to students including an average of 54.70% of the information from their notes in the written responses that they composed (SD = .47). For us, this finding reflected students’ transference and culling of information from texts to their notes and from their notes to their writing.

Backtracking Backtracking is the non-linear use of information in notes. On average, students engaged in 1.11 (SD = 1.19) instances of backtracking during response composition. However, 38.89% (n = 28) of students only used their notes in a linear fashion. Students’ backtracking was significantly associated with the total number of text-based statements included in students’ notes, r(69) = .41, p < .001, as well as with the number of text-based statements, r(70) = .57, p < .001, and notes-based statements, r(70) = .73, p < .001, included in students’ written responses. Backtracking, as an indicator of more dynamic response composition, was associated with both the volume of information included in students’ notes and with the volume of text-based and notes-based information reflected in students’ writing.

Associations among students’ notes and the quality of students’ written responses Only the total number of texts included in students’ notes was associated with response quality, r(69) = .29, p < .05.

Associations among students’ analytic scores and written response quality Figure 2 shows the distribution of different types of statements in each student's written response. Students’ writing quality scores were significantly associated with the total number of sentences included in students’ responses [r(71) = .46, p < .001], the number of text-based sentences included [r(71) = .36, p < .01], the number of non-text-based sentences included [r(71) = .31, p < .01], as well as with the number of inter-textual connections formed during writing [r(71) = .27, p < .05]. The scores for writing quality assigned to students’ responses were significantly associated with the volume of information that students included (i.e., both text-based and non-text based) and with this information being integrated or linked across texts.

Fig. 2 Proportions of different types of sentences in each student’s written response

Research question 3. Predicting multiple text task performance

Our final research question examined the extent to which heuristic condition, log-data indicators of multiple text access, and metrics of text-based content in students’ notes could predict analytic written response components and overall writing quality. In particular, five multiple regression models were run, predicting the total number of sentences, the number of text-based sentences, the number of non-text-based sentences, the number of cross-textual connections, and the number of citations included in students’ writing, respectively. Citation use and cross-textual connection formation were of particular interest to us in this study as these have been considered to be key indicators of successful multiple text integration in prior work (Britt & Aglinskas, 2002; List, Du, Wang, & Lee, 2019; Wiley & Voss, 1999). In this study, citation use was expected to be associated with students’ assignment to the sourcing condition. Cross-textual connection formation was expected to be associated with students’ membership in the corroboration task condition. Students’ inclusion of more non-text-based sentences, overall, was expected to be predicted by their membership in the contextualization condition. A final, sixth, regression model was run predicting students’ overall writing quality.

For each regression model, prior knowledge was controlled for at Step 1; task condition was entered as a predictor at Step 2, with the sourcing condition coded as the referent group; log-data indicators (i.e., total number of texts accessed, amount of document information accessed) were entered at Step 3; and metrics of students’ notes (i.e., number of statements in notes, volume of document information in notes, number of cross-textual connections, amount of backtracking) were entered at Step 4. In selecting indicators, we were interested in choosing both metrics that reflected the volume of information that students accessed and recorded (e.g., number of texts accessed; number of text-based statements in notes) and metrics that reflected the quality of students’ information access and note-taking (e.g., document information access; cross-textual connections). Correlations among key variables are presented in Table 8.

Table 8 Correlations among key variables
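To make the four-step model-building sequence concrete, a minimal sketch in Python (statsmodels) is provided below. All variable names (e.g., prior_knowledge, n_backtracks) are hypothetical stand-ins for the study’s measures, and the incremental R² printout is our illustration of how such hierarchical entry is typically inspected, not the authors’ exact analysis script.

```python
# Minimal sketch of the four-step hierarchical regression described above.
# All column names are hypothetical stand-ins for the study's measures.
import pandas as pd
import statsmodels.api as sm

# Predictor blocks entered cumulatively, one per step; the two condition
# dummies leave the sourcing condition as the referent group (9 predictors
# total, matching the df reported below).
BLOCKS = [
    ["prior_knowledge"],                                # Step 1: control
    ["cond_corroboration", "cond_contextualization"],   # Step 2: task condition
    ["n_texts_accessed", "doc_info_accessed"],          # Step 3: log-data indicators
    ["n_note_statements", "doc_info_in_notes",
     "n_note_connections", "n_backtracks"],             # Step 4: note metrics
]

def hierarchical_ols(df: pd.DataFrame, outcome: str, blocks=BLOCKS):
    """Fit OLS models, adding one predictor block per step, and report
    R-squared and its change at each step; returns the final model."""
    predictors, prev_r2, fit = [], 0.0, None
    for step, block in enumerate(blocks, start=1):
        predictors = predictors + block
        X = sm.add_constant(df[predictors])
        fit = sm.OLS(df[outcome], X).fit()
        print(f"Step {step}: R2 = {fit.rsquared:.3f} "
              f"(change = {fit.rsquared - prev_r2:.3f})")
        prev_r2 = fit.rsquared
    return fit
```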

After running each model, a residuals plot was examined to check for violations of normality and homoscedasticity. Three of the dependent variables examined (i.e., the number of inter-textual connections and citations in students’ written responses, and students’ writing quality scores) were found to have heteroscedastic residuals. To account for this, these three outcomes were predicted using weighted least squares regression. Multicollinearity statistics (i.e., tolerance and VIF) were also inspected, with no problems identified (tolerance > .31, VIF < 3.20).
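The diagnostic-and-refit workflow just described might look as follows. The tolerance and VIF computation mirrors standard practice; the particular weighting scheme shown (the inverse of the error variance estimated from an auxiliary regression) is one common WLS recipe that we assume for illustration, not necessarily the exact weights used in these analyses.

```python
# Minimal sketch of the diagnostics described above: tolerance/VIF for
# multicollinearity, and a WLS refit for heteroscedastic outcomes.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_report(X: pd.DataFrame):
    """Print VIF and tolerance (1/VIF) for each non-constant predictor."""
    for i, col in enumerate(X.columns):
        if col == "const":
            continue
        vif = variance_inflation_factor(X.values, i)
        print(f"{col}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")

def wls_refit(y, X):
    """Refit with weighted least squares when OLS residuals fan out.
    Weights here are one common choice -- the inverse of the error
    variance estimated from an auxiliary regression of the log squared
    OLS residuals on the predictors."""
    ols = sm.OLS(y, X).fit()
    aux = sm.OLS(np.log(ols.resid ** 2), X).fit()   # model the error variance
    weights = 1.0 / np.exp(aux.fittedvalues)        # inverse estimated variance
    return sm.WLS(y, X, weights=weights).fit()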

Total number of sentences A multiple regression model was run to predict the total number of sentences included in students’ written products. The final model was significant, F(9, 56) = 2.62, p < .05, \(R_{adj}^{2}\) = .18. Significant predictors in the model included the number of cross-textual connections (β = .25, p < .05) and backtracks (β = .33, p < .05) evidenced in students’ notes (see Table 9).

Table 9 Regression model predicting number of sentences in students’ responses

Total number of text-based statements A second regression model was run to predict the total number of text-based sentences (i.e., including text-direct statements, elaborations, transformations, intra-textual connections, and inter-textual connections) included in students’ writing. The overall model was significant, F(9, 56) = 6.21, p < .001, \(R_{adj}^{2}\) = .42. The number of texts accessed (β = .29, p < .05), the volume of document information included in students’ notes (β = .36, p < .05), the number of cross-textual connections (β = .23, p < .05), and the number of backtracks (β = .53, p < .001) were all individually significant predictors in the model (see Table 10).

Table 10 Regression model predicting number of text-based sentences

Total number of non-text-based statements A third multiple regression model was run to predict the total number of non-text-based sentences featured in students’ written responses. However, the overall model was not significant, p = .73.

Number of cross-textual statements A fourth, weighted multiple regression model was run to predict the number of cross-textual connections that students included in the written responses that they composed. The overall model was significant, F(9, 56) = 4.37, p < .001, \(R_{adj}^{2}\) = .32. However, students’ backtracking through notes was the only individually significant predictor in the model (β = .61, p < .001; see Table 11).

Table 11 Weighted least squares regression model predicting the number of cross-textual statements

Citations A weighted multiple regression model was next run to predict the number of unique citations included in students’ written responses. The overall model was significant, F(9, 56) = 4.01, p < .001, \(R_{adj}^{2}\) = .29. Prior knowledge (β = .39, p < .05), the volume of document information included in students’ notes (β = .57, p < .001), and the number of backtracks (β = .30, p < .05) were found to be individually significant predictors in the model (see Table 12).

Table 12 Weighted least squares regression model predicting number of citations

Scores for writing quality A final, weighted multiple regression model was run to predict students’ writing quality scores. This model was significant, overall, F(9, 56) = 5.69, p < .001, \(R_{adj}^{2}\) = .39. The number of texts that students accessed (β = .38, p < .01) and the number of cross-textual connections included in students’ notes (β = .71, p < .001) were found to be individually significant predictors in the model (see Table 13).

Table 13 Weighted least squares regression model predicting students’ writing quality scores

Discussion

This study was carried out with both a primary and secondary purpose in mind. The primary purpose of this study was to determine whether three multiple text use strategies (i.e., sourcing, corroboration, and contextualization), commonly emphasized in the domain of history, would support students’ multiple text use in another social scientific domain (i.e., education). Drawing on prior work using task instructions as a means of inculcating strategy use among learners (Anmarkrud, McCrudden, Bråten, & Strømsø, 2013; Wiley & Voss, 1999), in this study, we asked students to engage in sourcing, corroboration, or contextualization, and examined the extent to which this was associated with differences in multiple text processing and task performance. In particular, we captured students’ processing both through log data of their multiple text access and through the notes that they recorded while researching the topic of gifted and talented education. We captured task performance by analyzing the written responses that students composed based on their accessing of multiple texts.

Our secondary goal in this study was to track students’ information use throughout the course of multiple text task completion. This included examining log data of students’ multiple text access, the notes that students recorded, and the information included in the written responses that they composed. In particular, we adopted an analytic approach, tracing each statement included in students’ written responses either to students’ prior knowledge or to their notes and the texts they accessed during the research phase of the study. We unite these dual purposes by examining whether information use, as captured via log data, students’ notes, and written responses, differed in association with assigned heuristic condition. In this discussion, we first overview the extent to which students’ strategy use and multiple text task performance differed when students were asked to engage in sourcing, corroboration, or contextualization. Then we consider, more generally, the nature of students’ information use throughout the course of multiple text task completion. We conclude with a discussion of the implications and limitations associated with this study.

Multiple text processing and performance across heuristic conditions

In this study, we prompted students’ use of three strategies found to characterize the nature of expert multiple text use in the domain of history (i.e., sourcing, corroboration, and contextualization; Wineburg, 1991). In particular, we were interested in whether the use of these strategies would facilitate the multiple text use of novices outside of the domain of history, in the field of education. We wanted to interrogate the assumption that strategies like comparing and contrasting information across texts (i.e., corroboration) and applying prior knowledge to understand information in texts (i.e., contextualization) would be facilitative of learning beyond history. Findings from this study suggest that while prompting students to engage in sourcing, or the consideration of texts’ document information, like author or publisher, resulted in differences in multiple text processing, directing students to engage in corroboration or contextualization did not produce noteworthy differences. Indeed, students in the sourcing condition were found to access more document information during multiple text use and to include more document information in their notes than students assigned to the corroboration and contextualization heuristic conditions. However, no other differences in multiple text access or note-taking emerged. These results are consistent with Nokes et al. (2007), who found sourcing to be the strategy that high school students most commonly used, as compared to corroboration or contextualization, after receiving an intervention intended to foster historical reasoning. For instance, while 70% of students were found to engage in sourcing, only 7% of students used contextualization.

These sourcing-centered results can be explained in one of three primary ways. First, it may be the case that while sourcing is a strategy that can be engaged through the provision of directed task instructions, the other two heuristics (i.e., corroboration and contextualization) cannot be elicited simply by asking students to use them during processing. Rather, students may need more specific instruction in how to engage in corroboration and contextualization to be able to use these effectively during processing. Such instruction may need to be substantial, as Nokes et al. (2007) found no improvement in high school students’ use of sourcing, corroboration, and contextualization after a 15-lesson intervention. At the same time, some prior work has found promise in developing interventions for sourcing, as a principal strategy for improving students’ learning from multiple texts (e.g., Britt & Aglinskas, 2002; Bråten & Strømsø, 2009; Pérez et al., 2018; Stadtler & Bromme, 2007).

Second, sourcing may be a more overt strategy than corroboration or contextualization. In this study, there were a number of indicators readily available to capture the nature of students’ sourcing during both processing (i.e., accessing document information, recording document information in notes) and performance (i.e., citation use). But such indicators were less plentiful and less effective in capturing students’ engagement in corroboration and contextualization. For instance, in this study, we considered students’ revisiting of texts during the research phase of the study to reflect corroboration. While this interpretation is consistent with prior work (Goldman et al., 2012; List et al., 2019; List & Alexander, 2017; Wiley et al., 2009), students could certainly have revisited texts without engaging in corroboration, or could have corroborated information across texts using only their notes, without revisiting any of the texts they accessed. Likewise, we examined the inclusion of prior knowledge or personal opinion in students’ notes as an indicator of contextualization; at the same time, students may have brought their prior knowledge to bear on the information they accessed (i.e., engaging in contextualization) without necessarily recording any of this processing in their notes. As such, it is important to recognize that all three of the strategies examined in this study are primarily cognitive in nature. Capturing them during multiple text processing and performance is therefore a methodological challenge, one that may have attenuated the associations between heuristic condition and processing indicators identified in this study.

Finally, it may also be the case that while sourcing is a strategy that is more universally applied across domains, corroboration and contextualization function in a more domain-specific fashion. Certainly, across domains, there is a need to corroborate or compare and contrast information across texts. At the same time, the corroboration of historical accounts may be somewhat distinct from juxtaposing arguments for and against a particular issue that marshal diverse evidence in support of conflicting claims. Likewise, contextualization, or the application of topic, domain, and world knowledge to understanding information presented across texts (Rouet et al., 1997), may function differently in the history domain vis-à-vis domains where more general world knowledge or personal experience may apply. Additional research is also needed to identify those strategies, beyond sourcing, corroboration, and contextualization, that may be central to effective learning from multiple texts in non-history domains. A key limitation of this study was that it did not include a control group. Therefore, we are unable to determine whether students assigned to different strategy conditions performed better than students would have otherwise (i.e., if not prompted to engage in strategy use at all). Rather, we can only compare the relative effectiveness of instructions to engage in sourcing, corroboration, or contextualization. Further, it may be the case that we did not see differences across strategy conditions because students already use these strategies to a considerable extent (i.e., and do not need further prompting to do so). We nevertheless consider this latter explanation to be unlikely, given the limited evidence of sourcing, corroboration, and contextualization demonstrated in students’ notes and other measures of multiple text processing, and considering prior work (e.g., Anmarkrud et al., 2014; List & Alexander, 2020).

Although sourcing, unlike corroboration and contextualization, was associated with differences in students’ processing of multiple texts, the limitations of this strategy for multiple text task performance ought to be recognized as well. That is, while sourcing was associated with differences in students’ multiple text access and processing (e.g., document information accessed; document information included in students’ notes), it was not ultimately associated with differences in multiple text task performance, as captured via both analytic and quality-related written response scores. This may be the case for a variety of reasons. First, unlike prior work which has used task instructions as a means of directing students to produce some type of outcome (e.g., an argument or a research report, Wiley & Voss, 1999), in this study we asked students to engage in a mode of processing (i.e., to engage in sourcing, corroboration, or contextualization) during multiple text use. This focus on processing may have carried with it some limitations. That is, unlike simply being assigned to a task condition for which students may have a robust schema (e.g., students know what writing a summary entails, List et al., 2019), students may not have had the declarative, procedural, and conditional knowledge necessary to effectively engage any of the strategies that they were assigned to use (Paris, Lipson, & Wixson, 1983). Indeed, extensive work on strategy instruction-related interventions suggests that task instructions alone may not be sufficient to engage students in integrative processing when learning from multiple texts (Macedo-Rouet, Braasch, Britt, & Rouet, 2013; Martínez, Mateos, Martín, & Rijlaarsdam, 2015). In this study, only brief prompts were used to elicit students’ strategy use during reading. Although these were intentionally written in a manner that would be actionable for students, these brief prompts may not have been sufficiently elaborated to change the nature of students’ processing. Future work should consider using richer strategy inductions to improve students’ processing during multiple text task completion. Indeed, results from the regression analyses suggest this direction. That is, students instructed to engage in sourcing did attend more to document information during processing. At the same time, assignment to the sourcing condition did not predict analytic writing performance, while the accessing of document information did, after heuristic condition was controlled for. This suggests that strategy assignment may result in differential strategy use (i.e., attendance to document information during processing), with that strategy use then manifesting in differences in task performance (i.e., the number of text-based statements and citations included in students’ responses).

Second, it may be that directions intended to foster any type of strategy use during multiple text access produce only a superficial, rather than deep-level, type of engagement. That is, instructions asking students to engage in sourcing did result in students accessing and making note of document information during text access, but these instructions were not associated with indicators of evaluation in students’ notes, nor with cross-textual connection formation or citation use in the written responses that students composed. This may mean that while sourcing instructions do prompt students’ attendance to document information, more metacognitive volition or deliberate strategic engagement is needed for students to engage in deeper-level multiple text evaluation and integration. More generally, writing based on multiple texts may be such a complex and multidimensional process that improving it requires a variety of strategies, beyond sourcing (Anmarkrud et al., 2014; Mateos, Solé, Martin, Cuevas, Miras, & Castells, 2014; Spivey, 1990). To the extent that providing task instructions to elicit the whole range of strategies that students may need to compose a quality written response is not feasible, more research is needed on how to encourage strategic writing on the part of learners. For insight into the types of strategies that may be required, we next examine the nature of students’ information use throughout the course of writing based on multiple texts.

Information use throughout text access, note-taking, and response composition

In addition to examining processing across heuristic conditions, we were further interested in examining students’ information use, more broadly, throughout the course of text access, note-taking, and written response composition. In the broader literature on learning from multiple texts, a variety of processes associated with information use have been identified (List & Alexander, 2019; Rouet & Britt, 2011). These include information selection, processing, evaluation, and integration. These processes were likewise manifest in this study, to varying extents. In particular, two key processes (i.e., selection and integration) were notable in this study, one for its relative prevalence and the other for its conspicuous absence.

Analytically tracking students’ information use throughout text access, note-taking, and response composition revealed that these processes were primarily guided by a process of information selection or filtration. We use the term information selection here to refer to students’ deliberate decisions regarding which information in texts is important or relevant for task completion and which information is not. Such selection includes both the cognitive appraisal of information as relevant or not and the behavioral act of note-taking, recording the information deemed to be relevant or important. In this study, students selected a subset of information from texts to include in their notes and further filtered that information to determine which content to ultimately include in their written responses. Indeed, while metrics of text access, information included in students’ notes, and text-based information included in students’ written responses were all associated with one another, as demonstrated in Research Question 2, there was nonetheless less information included in students’ notes than was available in the texts provided, and less text-based information in students’ written responses than was included in their notes. As such, multiple text processing and written response composition can be thought of as tasks requiring the progressive reduction of information (see Table 7). Although it is difficult to determine how much information is sufficient for students to include in their notes or in the written responses that they compose, students did seem adept, at the least, in selecting from among the volume of text-based information available.

At the same time, while selection seemed to be a robust and well-developed competency for learners, integration was found to be considerably less so. A variety of indicators capturing students’ cross-textual connection formation during both multiple text processing and response composition were examined in this study. These included students revisiting texts during multiple text access, formulating cross-textual connections in their notes, and including instances of integration in the written responses that they composed. Across these various indicators, students’ cross-textual connection formation, associated with corroboration, was found to be quite limited in nature, despite its association with the quality of students’ writing performance. Similar challenges with cross-textual integration have been documented in prior work (Anmarkrud et al., 2014; Barzilai, Zohar, & Mor-Hagani, 2018; List, Du, Wang, & Lee, 2019; Wiley & Voss, 1999). This study both provides further evidence of such challenges and suggests that they cannot be ameliorated with task instructions alone (e.g., asking students to engage in corroboration). As such, developing more robust interventions to foster cross-textual integration, in addition to other strategies related to learning from multiple texts, represents an important direction for future work (Barzilai et al., 2018).

Further, more work is needed to consider which specific processing strategies may be most associated with students’ performance under various task conditions. In this study, students were asked to write an argument based on multiple texts. The outcome measures used in this study pertained most to the multiple text aspect of this assignment. That is, by analytically coding each sentence included in students’ writing, we specifically focused on students’ use of text-based information, their transformation of this information or its elaboration using prior knowledge, and, most of all, students’ cross-textual information integration. In other words, the dependent variables examined in this study carried with them the tacit assumption that both more information (i.e., more sentences) and more text-based and cross-textual information, in particular, were reflective of better response quality. Certainly, this is consistent with models of learning from multiple texts that identify both the comprehensive coverage of key issues about complex topics and the integrative understanding of these issues, as they are described across texts, to be key facets of effective learning from multiple texts (Britt et al., 1999; Du & List, 2020; List, Du, Wang, & Lee, 2019; Perfetti et al., 1999). This is further consistent with work suggesting that students’ provision of evidence (e.g., the inclusion of more text-based information) in their writing is, in part, associated with better argument writing performance (Iordanou & Constantinou, 2014; Zembal-Saul, Munford, Crawford, Friedrichsen, & Land, 2002). Indeed, in the present study students’ writing quality scores were predicted both by the number of texts that students initially accessed (i.e., information volume) and by the number of cross-textual connections that students included in their notes. Nevertheless, additional aspects of argument writing (e.g., forming a claim, developing counter-arguments) were not examined as reflected in students’ multiple text access or note-taking. As such, considering which argument-construction-related processing variables are associated with writing performance, and, indeed, what these variables may even be, should be taken up in future research.

Limitations

In addition to these major discussion points, a number of limitations associated with the study must also be acknowledged. First, the fundamental design of this study may have limited its applicability to the real world. While students in this study were deliberately assigned to engage in only one strategic approach when learning from multiple texts, in reality sourcing, corroboration, and contextualization, alongside other strategies, are all needed to foster students’ learning from multiple texts. As such, students’ assignment to only a singular heuristic task condition presents a major limitation of this study. Moreover, inherent in definitions of strategic processing and strategy use is the notion that such processing should be deliberate, goal-directed, and metacognitively controlled on the part of learners (Paris et al., 1983). It remains unclear to what extent assigning students to adopt a particular type of heuristic processing during task completion adequately simulates experts’ use of sourcing, corroboration, or contextualization, when they themselves decide to engage such strategies during processing. This methodological limitation is compounded by the fact that we did not include a control group in this study, thereby limiting our understanding of what students may do by default when presented with a multiple text task, such as the one used in this study. Moreover, the specific task instructions that students received may not have been sufficient to elicit the depth of strategy engagement that we intended. For instance, directing students to engage in sourcing may have resulted in their only attending to source information, rather than considering it in their interpretations of text content. Likewise, asking students to engage their prior knowledge and experiences, in the contextualization condition, may have over-directed students to rely on personal experience, rather than directing them to contextualize information using their academic knowledge of effective educational practice.

Second, the sample size used in this study was smaller than was likely needed. For instance, G*Power suggests that to conduct an ANOVA with three groups, at conventional levels of \(\alpha\) = .05 and power (1 − \(\beta\)) = .80, a sample of 159 students is needed to identify a medium effect (f = .25). Under these same conditions, we were only able to identify large effects (f > .40) for Research Question 1, with identifying these requiring a sample of only 66 students or more. Likewise, for a multiple linear regression with 9 predictors (\(\alpha\) = .05, power = .80), a sample of 114 students would have been needed to identify medium-sized effects (\(f^{2}\) = .15, \(R^{2}\) > .13). At the same time, we were able to identify large effects for Research Question 3, for which only a sample of 54 students, or more, was needed (\(f^{2}\) = .35, \(R^{2}\) > .26).
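For readers wishing to verify these figures without G*Power, the ANOVA computation can be reproduced in statsmodels, as sketched below; note that statsmodels returns the exact fractional total N, whereas G*Power rounds up to whole cells, so the two may differ by a participant or two. The regression-model \(R^{2}\) thresholds above follow from the relation \(f^{2} = R^{2}/(1 - R^{2})\).

```python
# Minimal sketch of the ANOVA power analysis reported above, using
# statsmodels; an analogous computation with FTestPower would cover the
# nine-predictor regression models.
from statsmodels.stats.power import FTestAnovaPower

anova_power = FTestAnovaPower()

# Total N required for alpha = .05, power = .80, and three groups:
n_medium = anova_power.solve_power(effect_size=0.25,  # medium effect, f = .25
                                   k_groups=3, alpha=0.05, power=0.80)
n_large = anova_power.solve_power(effect_size=0.40,   # large effect, f = .40
                                  k_groups=3, alpha=0.05, power=0.80)

# Fractional totals close to the 159 and 66 cited above (G*Power rounds
# up to equal-sized groups, which accounts for small discrepancies).
print(n_medium, n_large)
```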

Third, a methodological challenge arising in this study was associated with our desire to assess students’ sourcing, corroboration, and contextualization, as these ultimately constitute cognitive strategies that cannot be directly observed. The fundamentally cognitive nature of these heuristics may have resulted in the limited associations between task condition and indicators of strategy use identified in this study. Nevertheless, the future use of data-intensive methods, including eye-tracking (e.g., to trace students’ gaze paths across texts) and think-alouds, to supplement the analyses carried out in this study, may prove to be a fruitful way of capturing sourcing, corroboration, and contextualization. We can also draw on other research paradigms to better explore students’ strategy use during processing. For instance, some studies have used experimental designs to introduce students to foregrounding information (or not), thereby manipulating the potential for contextualization (Kurby, Britt, & Magliano, 2005; Maier & Richter, 2013). Students can also be asked to report their prior knowledge activation or engagement in corroboration each time it occurs, similar to how Stadtler and Bromme (2007) asked students to rate the quality of their comprehension during reading. Still, investigating the extent to which students engage prior knowledge of various types when learning from multiple texts, and engage in sourcing and corroboration more broadly, remains a direction for future work.

Conclusion

In this study we sought to evaluate the effectiveness of three multiple text use strategies (i.e., sourcing, corroboration, and contextualization), previously identified based on expert studies in history (Wineburg, 1991). We sought to elicit the use of these strategies by providing students with process-focused task instructions, asking them to engage in sourcing, corroboration, or contextualization. While we found evidence for students’ use of sourcing during multiple text access and in students’ notes, use of corroboration and contextualization seemed to be more limited. At the same time, we found evidence of students’ progressive filtering of information as they moved from gathering information during reading, to recording text-based information in their notes, to including this information in the written responses that they composed. Moreover, we found some students to engage in integration and cross-textual relation formation, albeit to a limited extent. This was reflected in students drawing connections among texts in their notes, backtracking through notes or using notes dynamically during writing, and formulating integrative statements in the written responses that they composed. Although such integrative strategy use was not found to be facilitated by any of the three heuristic conditions to which students were assigned, this type of processing represents a key target for future intervention.