Examining trace data to explore self-regulated learning

Theories of self-regulated learning (SRL) emphasize fine-grained, dynamic adaptation, but few instruments satisfactorily capture such data (Pintrich et al. 2000; Winne et al. 2002; Winne and Perry 2000). An abundance of self-report instruments capture information about relatively stable propensities to engage in SRL and studying activities, but no instruments capture the strategic adaptations and developments students make within and across studying sessions (Winne and Perry 2000). Even if dynamic data could be gathered, there is little guidance about how to analyze these data to research metacognition and SRL, in particular how learners use tactics and how they monitor that use at multiple points in time during common tasks embedded in authentic contexts. Consequently, important questions remain unanswered in research on SRL. What do students do to engage with content? How do they select, monitor and adapt tactics for learning? How do they pattern tactics that comprise strategies for studying? How do students strategically allocate time across alternate studying tactics?

Examining SRL as a process developing in sophistication across multiple episodes poses methodological challenges for researchers. To date we know very little about how SRL develops while students study, or how they adapt studying across studying episodes (Hofer et al. 1998). While self-report data provide invaluable information about learners’ perceptions of learning, they do not measure how students actually employ studying tactics (Winne and Jamieson-Noel 2002), or how tactics are strategically adapted to specific learning contexts (Hadwin et al. 2001). Moreover, current self-report protocols reveal almost nothing about how learners adapt tactics and interweave them to form an efficient strategy. Research needs to augment self-reports of SRL with fine-grained traces of actual student actions as they study. In addition, research should examine relationships among actual actions and learners’ reflections, self-evaluations, and self-perceptions of learning activities.

Trace data have not been central in research about self-regulated learning, but they have a strong history in an array of other disciplines. The literature describes trace data as audit trails, dribble files, log files, navigation trails, event recordings, and event traces. Uses of log file data include usability testing in computer science and human-computer interaction, studying aggregated patterns of engagement in library sciences, data mining in computer science, and profiling web-user statistics. The diverse nature and uses of log file data introduce challenges for analysis and interpretation. In the Learning Kit project, we have developed new computer technologies that unobtrusively collect detailed information about students’ studying actions by logging the time and context of every learning event. Traces recorded in gStudy are artifacts of tactics and strategies: a log of fine-grained, temporally identified data that can advance research about how learners go about learning.

The purpose of this exploratory case study (Yin 2003) was to examine in depth the studying activities of eight students. The study also examined ways trace data might be used to add depth to our understanding of students’ self-regulated learning. Consistent with exploratory case study methodologies, findings are not intended to be generalized to a population, but rather to inform theory and analysis regarding SRL (Yin 2003). Following Hadwin and Leard (2001) and Leard and Hadwin (2001), we analyzed log file data in four ways to construct profiles of self-regulated learning activity across participants: (a) frequency of studying events, (b) patterns of studying activity, (c) timing and sequencing of events, and (d) analysis of the content of student notes and summaries.

Methods

Research context

Computer based tools for investigating SRL

gStudy

gStudy (Winne et al. 2006a) is a cross-platform software tool for researching learning. Content (styled and hyperlinked text, graphics, and audio and video clips) is assembled into learning kits displayed in a web browser. Researchers can manipulate a kit’s elements to operationalize experimental variables corresponding to research hypotheses (see Winne et al. 2006b). When students use gStudy, they can examine and annotate a kit’s content using a number of tools, including making notes based on a choice of schemas, classifying information by properties, constructing new glossary entries, and making links that assemble information within and across elements of the content. We describe the three major tools used in this study. (1) Quicknotes are labels or annotations students assign to a segment of highlighted text, such as “important,” “disagree,” or “don’t understand.” Students can choose a quicknote from a drop-down list or create their own quicknote to use during studying. (2) Glossary notes are note templates designed to support the creation of a concept, term, or name definition. They provide fields for stating the term, defining it, and giving an example. (3) Notes include a series of templates for creating categories of notes. For example, a debate note provides fields for stating position 1, evidence for position 1, position 2, evidence for position 2, and choosing a position.

Trace data

Trace data or log data are precise time-stamped records of everything a student does in gStudy. Events that modify the state of the learning kit by adding new information or permanently modifying information are model events. Events that modify how a student views content, such as scrolling or clicking on a menu, are view events. The logging system records events and writes the data to an XML file. Log file analysis is performed on recorded XML files using LogAnalyzer (Hadwin et al. 2005). LogAnalyzer was used to produce frequency counts, transition statistics, transition graphs, and time-event-position graphs.
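A minimal sketch of this kind of processing follows, in Python. The XML layout (an `<event>` element with `time`, `kind` and `code` attributes) and the view-event codes SCROLL and MENU are hypothetical stand-ins, not gStudy's actual log schema, and the functions are not part of LogAnalyzer. The sketch shows only how model events can be separated from view events before frequency counts are taken.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical log layout, NOT gStudy's actual schema: one <event> element per
# logged action with a timestamp (seconds), a kind ("model" or "view") and a code.
SAMPLE_LOG = """
<log participant="P1">
  <event time="0.0" kind="view" code="SCROLL"/>
  <event time="4.2" kind="model" code="H"/>
  <event time="9.8" kind="model" code="CQLI"/>
  <event time="12.1" kind="view" code="MENU"/>
  <event time="15.0" kind="model" code="H"/>
</log>
"""


def load_events(xml_text):
    """Return (time, kind, code) tuples in logged order."""
    root = ET.fromstring(xml_text.strip())
    return [(float(e.get("time")), e.get("kind"), e.get("code"))
            for e in root.iter("event")]


def model_event_counts(events):
    """Frequency count of model events; view events are dropped."""
    return Counter(code for _, kind, code in events if kind == "model")


print(model_event_counts(load_events(SAMPLE_LOG)))  # Counter({'H': 2, 'CQLI': 1})
```

Restricting the counts to model events mirrors the focus of the analyses below, which use only events that change the learning kit.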

Context

The study took place in an Introductory Educational Psychology course taught by the second author, with tutorials led by teaching assistants. The context for this study was a reflection assignment worth five percent of the final grade. For this assignment, students were graded on a two-page report in which each student summarized: (a) studying activities and how they used gStudy tools, (b) profiles based on subscale scores from the Motivated Strategies for Learning Questionnaire (MSLQ; Pintrich et al. 1991) that all students had taken in week 3 of the course, and (c) reflections about the relationship between their MSLQ profile and their studying behaviors while using gStudy. In addition to completing a graded reflection assignment based on their studying, students were responsible for knowing the chapter material for the final exam. We therefore assume students took this studying task as seriously as they would take studying any other text for the course.

Procedures

Students completed the MSLQ in week three of the course. In week eight, during class time in a computer lab, participants received a minimum of 50 minutes of training in using the software. Approximately one week later, during class time, they studied one chapter from their course textbook using gStudy’s tools for approximately 2 hours over 2 studying sessions.

Participants

Based on cluster analysis of data described later, eight volunteer participants were selected from a sample of 188 volunteer undergraduate students.

Results

Purposeful sampling through cluster analysis

We selected only the 10 MSLQ items that corresponded to studying activities that could actually be traced in gStudy. Each of these items represents a traceable studying event. We conducted a cluster analysis of students’ self-report responses to these 10 items to identify 8 students for close examination. These students were representative of four different self-report clusters: low summarizer, low questioner, medium summarizer, and high summarizer. A four-cluster solution presented a relatively even distribution of cases over clusters. This allowed us to purposefully sample cases that were similar to one another yet different, in their self-reports, from participants in other clusters. A proximity matrix returned by the cluster analysis was used to identify the two typical cases statistically closest to the centre of each cluster. Table 1 includes each of these cases’ scores on the 10 selected MSLQ items and the median scores for each cluster.
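The selection of typical cases can be illustrated with a short Python sketch. It assumes that proximity is the Euclidean distance from a participant's item responses to the cluster mean, which may differ from the proximity matrix the original cluster procedure produced. The participant IDs, cluster labels and scores are hypothetical.

```python
import math


def closest_to_centroid(scores, labels, k=2):
    """Return, for each cluster, the k members nearest the cluster centroid.

    scores: participant id -> list of item responses (equal lengths)
    labels: participant id -> cluster label
    Proximity is Euclidean distance to the cluster mean (an assumption).
    """
    picks = {}
    for cluster in sorted(set(labels.values())):
        members = [p for p in scores if labels[p] == cluster]
        n_items = len(scores[members[0]])
        centroid = [sum(scores[p][i] for p in members) / len(members)
                    for i in range(n_items)]
        members.sort(key=lambda p: math.dist(scores[p], centroid))
        picks[cluster] = members[:k]
    return picks


# Hypothetical responses to three 7-point items (the study used 10 items).
scores = {"s1": [2, 3, 2], "s2": [1, 2, 2], "s3": [7, 1, 1],
          "s4": [6, 6, 5], "s5": [5, 6, 6], "s6": [7, 5, 6]}
labels = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B", "s6": "B"}
print(closest_to_centroid(scores, labels))  # {'A': ['s1', 's2'], 'B': ['s4', 's5']}
```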

Table 1 Number of participants (N) and median scores of four clusters on 10 selected MSLQ items

Frequency of studying activities

Frequencies were computed for traced actions in gStudy that matched each of the 10 selected MSLQ items. For each of the eight case study participants, Table 2 presents:

(a) a list of traceable model object events (column 1);
(b) the raw frequency (row 1) and the proportional frequency (row 2) of that collection of model object events, relative to the total number of events for that participant;
(c) MSLQ self-reported scores (row 3), summarized as High (H; a raw score of 5, 6, or 7), Medium (M; a raw score of 4), or Low (L; a raw score of 1, 2, or 3); and
(d) the corresponding MSLQ self-report item (last column).

When an event did not occur, proportional frequency is listed as “–”. MSLQ responses are reported as High, Medium, or Low to ease interpretation, since self-report and event data are on different scales. Traceable model object events were conceptually matched to specific MSLQ items. For example, one MSLQ item (Q63) referred to “outline concepts.” In gStudy, the two primary tools for outlining concepts and terms are creating or editing a glossary entry and creating a quicknote that identifies the segment of text as a principle. Similarly, students can “summarize main ideas” by creating a new note or a new glossary entry. We attempted to be as inclusive as possible in identifying gStudy actions that corresponded with each MSLQ item. Therefore, if a student created a new glossary item, it was counted as a traceable action for both “outline concepts” (Q63) and “summarize main ideas” (Q67).
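The matching and scoring rules described above can be sketched as follows. The event codes in `ITEM_EVENTS` are hypothetical placeholders, and only two of the ten items are shown. The sketch illustrates two points: one event can count toward more than one item, and raw MSLQ responses collapse into H, M, and L.

```python
from collections import Counter

# Hypothetical item-to-event mapping; all codes are placeholders
# (CG = create glossary entry, UG = update glossary entry,
#  CQLP = quicknote labelled "principle", CN = create note).
ITEM_EVENTS = {
    "Q63": {"CG", "UG", "CQLP"},  # "outline concepts"
    "Q67": {"CN", "CG"},          # "summarize main ideas"
}


def self_report_level(raw):
    """Collapse a 1-7 MSLQ response into L (1-3), M (4) or H (5-7)."""
    if raw >= 5:
        return "H"
    return "M" if raw == 4 else "L"


def item_frequencies(events):
    """Raw and proportional frequency of the events matched to each item.

    `events` is one participant's list of logged event codes; proportions are
    relative to that total. A code such as CG counts toward both items.
    The proportion is None when no matching event occurred ("–" in Table 2).
    """
    counts = Counter(events)
    table = {}
    for item, codes in ITEM_EVENTS.items():
        raw = sum(counts[c] for c in codes)
        table[item] = (raw, raw / len(events) if raw else None)
    return table


log = ["H", "CG", "CN", "H", "CQLP", "CN", "H", "H"]
print(item_frequencies(log))  # {'Q63': (2, 0.25), 'Q67': (3, 0.375)}
print(self_report_level(6))   # H
```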

Table 2 Contrast of raw counts and proportional event data from log files (rows 1 & 2) and self-reports (L, M, H) per MSLQ item

Frequency counts and proportional frequencies provide important information about metacognitive control. They contrast the tactics students actually engage in while studying with what students describe as their general approach to studying when answering an MSLQ item.

Prominence of studying activities

For all but one of our eight cases (P2 from cluster 1), the highest frequency and proportion of studying events focused on identifying important information (M = .58, SD = .14). The next most common studying event was making lists of important questions and memorizing lists of items (M = .35, SD = .27), followed by remembering and summarizing main ideas (M = .19, SD = .15). However, the latter two categories varied greatly across cases.

Trends in the calibration of self-reports and studying events

We defined calibration as occurring when participants’ self-reports of a particular studying activity match the proportional frequency of that studying behavior as recorded in the log file traces of actual studying activities. In this sense, calibration is a measure of the extent to which learners study as they say they do (based on responses to the MSLQ). In Table 2, high calibration is marked with an asterisk. Calibration between self-reports and actual studying events was best for identifying important information (item 42 on the MSLQ), for which six of eight students were well calibrated. This was also the most frequently occurring event across all participants. The second most highly calibrated studying activity was summarizing important information (item 67 on the MSLQ) for which five of the eight participants were well calibrated.
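Under this definition, calibration reduces to comparing two categorical levels. The section does not state how proportional frequencies were mapped onto High, Medium, and Low. The cutoffs in this Python sketch are therefore hypothetical and serve only to make the comparison concrete.

```python
def trace_level(proportion, low_cut=0.10, high_cut=0.30):
    """Bucket a proportional event frequency into L, M or H.

    low_cut and high_cut are hypothetical thresholds for illustration only.
    A proportion of None (event never occurred) is treated as L.
    """
    if proportion is None or proportion < low_cut:
        return "L"
    return "H" if proportion >= high_cut else "M"


def is_calibrated(self_report_level, proportion):
    """A participant is calibrated on an item when both levels agree."""
    return self_report_level == trace_level(proportion)


print(is_calibrated("H", 0.58))  # True: frequent event, high self-report
print(is_calibrated("H", None))  # False: high self-report, event never traced
```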

Calibration between self-reports and events for each cluster and case participant

Comparisons of participants across clusters of self-reported studying methods indicate differences in calibration. Participants in cluster 2 (P3 and P4) demonstrated the best calibration between studying events and self-report responses. Participants in cluster 1 (P1 and P2) were also well calibrated, on three classes of studying activities each. They were followed by the two participants from cluster 4, who were well calibrated on two (P7) and three (P8) items. Participants in cluster 3 (P5 and P6) were the most poorly calibrated, showing calibration on two and one items, respectively.

Overall, these findings indicate there was considerable variance in the number of studying events logged for each participant (M = 92.75, SD = 50.36), as well as types of events that were logged. Importantly, some participants’ self-reports of studying tactics on some of the MSLQ items were not well calibrated with studying events that gStudy traced as students studied. The most highly calibrated participants were well calibrated on 4 of 10 (or 40 percent) of self-reported studying activities. The average calibration across participants was closer to 27 percent.
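The calibration percentages above can be checked with simple arithmetic. The per-participant counts come from the preceding paragraphs. The value of 4 items for P3 and P4 is inferred from two statements: cluster 2 was the best calibrated, and the best-calibrated cases matched on 4 of 10 items.

```python
# Number of the 10 selected MSLQ items on which each case was well calibrated.
# P3 and P4 = 4 is inferred from the text rather than stated directly.
calibrated_items = {"P1": 3, "P2": 3, "P3": 4, "P4": 4,
                    "P5": 2, "P6": 1, "P7": 2, "P8": 3}
N_ITEMS = 10

best = max(calibrated_items.values()) / N_ITEMS
mean_rate = sum(calibrated_items.values()) / (N_ITEMS * len(calibrated_items))
print(f"best = {best:.0%}, mean = {mean_rate:.1%}")  # best = 40%, mean = 27.5%
```

The mean of 22 calibrated items out of 80 possible (27.5 percent) is consistent with the figure of roughly 27 percent reported above.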

Patterns of studying activity and self-regulated learning

Patterns of activity concern sequences of events. Theoretically, strategies that comprise SRL often involve multiple actions or events enacted to serve a common purpose or goal. Thus, it is important to examine participants’ transitions across fine-grained studying events.

We used transition matrices to examine patterns across studying events. In a transition matrix, each studying event is listed as a row and as a column. Each cell at the intersection of a row and a column represents a move from the action named in the row to the action named in the column. Transition matrices for this research included every type of study event (model event) logged by any of the eight participants. There were 56 distinct events in total, so each matrix had 56 rows and 56 columns. The Appendix lists each possible event and its corresponding abbreviation.
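A transition matrix of this kind can be built from a participant's ordered sequence of model events. In the Python sketch below, the three-event vocabulary and the sequence are hypothetical; the study used 56 event types. Every participant's matrix shares the same rows and columns, so matrices can be compared cell by cell.

```python
def transition_matrix(sequence, event_types):
    """Count moves from each event (row) to the next event (column).

    sequence: one participant's model-event codes in logged order
    event_types: the shared list of event codes defining rows and columns
    Self-transitions (e.g. H followed by H) land on the diagonal.
    """
    index = {code: i for i, code in enumerate(event_types)}
    n = len(event_types)
    matrix = [[0] * n for _ in range(n)]
    for prev, nxt in zip(sequence, sequence[1:]):
        matrix[index[prev]][index[nxt]] += 1
    return matrix


event_types = ["H", "CQLI", "CCL"]
seq = ["H", "CCL", "H", "CCL", "CCL", "CQLI"]
for code, row in zip(event_types, transition_matrix(seq, event_types)):
    print(code, row)
# H [0, 0, 2]
# CQLI [0, 0, 0]
# CCL [1, 1, 1]
```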

Transition graphs

In the transition graphs presented in Figs. 1, 2, 3 and 4, each event appears as a node and each transition between events is a directional line with a number indicating its frequency. Light grey lines in the transition graph represent transitions that occurred only once. Self-referencing loops (e.g. CQLD–CQLD = 20) are included.

Fig. 1

Transition graphs for participant 1 (left) and participant 2 (right) from cluster 1

Fig. 2

Transition graphs for participant 3 (left) and participant 4 (right) from cluster 2

Fig. 3

Transition graphs for participant 5 (left) and participant 6 (right) from cluster 3

Fig. 4

Transition graphs for participant 7 (left) and participant 8 (right) from cluster 4

The transition graphs in Figs. 1, 2, 3 and 4 provide information about (a) the amount and type of activity and (b) predominant transition patterns, or studying strategies. As with a frequency table, the graphs show that P1 (Fig. 1) was quite an active learner: the graph has many nodes and many transitions. In contrast, P8 (Fig. 4) was not an active learner, because the graph contains very few nodes and very few transitions. Finally, P4 (Fig. 2) was an active learner but limited studying mainly to highlighting (H) and creating links to concepts (CCL).

Examining transition patterns in graphs helps distinguish students who experiment with studying from students who settle into regular sequences of events. For example, P6 (Fig. 3) experimented with a large variety of studying events. This learner frequently highlighted (H) and followed highlighting with a variety of kinds of notes (C.N...) and updates to glossary concepts (UG...). Similarly, P1 (Fig. 1) was an active learner who used a variety of tactics. For P1, creating quicknotes (CQ..., or CLQ…) was a predominant activity, and P1’s strategy consisted of completing sequences of quicknotes (labels).

Graph theoretic statistics

Graph theoretic statistics summarize properties of transition graphs (cf. Polanco 2003; Winne et al. 1994). Density is a graph theoretic measure ranging from 0 to 1. It compares the number of transitions (links between event nodes) occurring in a graph to the number of possible transitions. Lower values indicate that a few predominant transitions recur repeatedly. Higher values indicate much diversity in the types of transitions, with less regularity and repetition of patterns. Table 3 provides density statistics for each participant’s transition graph. The first measure, session density, considers all possible transitions for a single participant. Self-referencing nodes were included. The second measure, overall density, considers all possible transitions based on the total collection of possible events across all participants. We posit that participants with lower overall densities have formed some distinct and regular studying patterns, whereas participants with higher densities are experimenting with tactics and strategies. That is, these latter students are engaged in more metacognitive monitoring and, hence, more active SRL. The participant with the highest overall density, and therefore the most experimental approach to studying, was P1 (Fig. 1), with an overall density of .02. The participant with the lowest overall density, and therefore the most stable studying pattern, was P8 (Fig. 4), with an overall density of .003.
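The two density measures can be sketched as ratios of distinct observed transitions to possible ordered pairs of events. The sketch treats the number of possible transitions as k × k for the k distinct events a participant used (session density) and 56 × 56 for all event types (overall density), with self-loops included. This is our reading of the description above rather than a documented formula, and the sequence is hypothetical.

```python
def densities(sequence, n_event_types=56):
    """Session and overall density of one participant's transition graph.

    Distinct directed transitions (self-loops included) are divided by the
    number of possible ordered pairs: k*k for the k event types this
    participant used, or n_event_types**2 for all types in the study.
    """
    edges = set(zip(sequence, sequence[1:]))
    k = len(set(sequence))
    session = len(edges) / (k * k) if k else 0.0
    overall = len(edges) / (n_event_types * n_event_types)
    return session, overall


session, overall = densities(["H", "CCL", "H", "CCL", "CCL", "CQLI"])
print(round(session, 3), round(overall, 4))  # 0.444 0.0013
```

Because the overall denominator is shared by all participants, overall densities can be compared directly across students, whereas session densities are scaled to each student's own repertoire of events.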

Table 3 Graph density statistics for each of the 8 participants

Centrality identifies central or pivotal nodes in a network (see Table 4). These are “hub” events in studying. Centrality statistics include indegree and outdegree, which count the number of unique events that precede or follow a focal event, respectively. A high overall degree (indegree plus outdegree) means that many nodes are connected with a focal node. High degree may signal studying activities that have been used in a variety of ways in strategies. Alternatively, it may signal events the learner inserts freely, independent of preceding and following activities.
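Indegree and outdegree, as defined here, count unique neighbouring events rather than transition frequencies. The hypothetical sequence below loosely mirrors the highlighting pattern described for P4, in which highlighting is followed by CQLI, CQLE, CSNL or CCL, but it is not logged data. Counting a self-loop as its own predecessor and successor is an assumption.

```python
from collections import defaultdict


def degree_centrality(sequence):
    """Map each event to (indegree, outdegree, degree).

    Indegree/outdegree count the unique events that immediately precede or
    follow the focal event; a self-loop counts the event itself once.
    """
    preds, succs = defaultdict(set), defaultdict(set)
    for prev, nxt in zip(sequence, sequence[1:]):
        succs[prev].add(nxt)
        preds[nxt].add(prev)
    return {n: (len(preds[n]), len(succs[n]), len(preds[n]) + len(succs[n]))
            for n in set(sequence)}


seq = ["H", "CCL", "H", "CQLI", "H", "CCL", "H", "CQLE", "H", "CSNL"]
deg = degree_centrality(seq)
print(deg["H"])  # (3, 4, 7)
```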

Table 4 Graph centrality statistics for each of the 8 participants

Examining centrality provides information about how individuals and samples build on specific studying events to make them part of complex strategies. Although the most frequently occurring event is often the most central, this is not always the case.

Table 4 summarizes frequency, density, and centrality for commonly observed studying events. Due to space limitations, low frequency events that occurred for two or fewer participants and were also low in centrality were excluded. Examining data in rows identifies common patterns of activity (strategies) across participants. The most common patterns across students included creating a quicknote to label information as an example (CQLE). CQLE was a node for all participants. Overall, it was not a high frequency event. For participants 1, 3, 5, and 7, however, it was a central event (high centrality), used in a variety of ways or in conjunction with a number of other studying events. The other participants also created these kinds of quicknotes but either used them in a repetitive way or only once. Highlighting (H), by comparison, was a high frequency activity for four participants (P1, P4, P5, P6) but a very central activity for only two (P5 and P6). For these two participants, highlighting was used in conjunction with a wide range of other studying activities. In contrast, participant 4 used highlighting frequently but in a similar, repetitive way. Figure 2 indicates that participant 4 experimented with highlighting in conjunction with several events (a degree of 4: CQLI, CQLE, CSNL, CCL) but most frequently followed highlighting with making a concept/glossary note (CCL). We consider centrality to be one marker of metacognitive monitoring within a strategy, that is, within a specialized arrangement of studying events used repeatedly or centrally within studying sessions. However, the power of this type of data has yet to be fully mined. Future work could explore more sophisticated graph theoretic statistics and augment trace data with other information about participant intent.

Time based analyses and self-regulated learning

Simple time-based analyses include time on task, or the duration of time spent studying, which predicts student performance (Zimmerman et al. 1994). We found that the time participants devoted to studying varied dramatically. Participants were asked to study one chapter for 2 hours to complete their assignment. As can be seen in Table 5, participants actually studied for as little as 20 min and as much as 2 hr and 47 min. Not surprisingly, time spent studying was strongly positively correlated with frequency of studying events, r(7) = .86, p < .01, r² = .75.
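Time on task and its correlation with event frequency can be computed directly from timestamps. This Python sketch assumes a hypothetical input format, one list of event timestamps in seconds per studying session, and implements Pearson's r by hand. It does not reproduce Table 5.

```python
import math


def time_on_task(sessions):
    """Total minutes studied, summing (last - first) timestamp per session.

    sessions: list of sessions, each a list of event timestamps in seconds.
    Pauses inside a session count toward the total; gaps between sessions do not.
    """
    return sum(max(ts) - min(ts) for ts in sessions if ts) / 60


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(time_on_task([[0, 600, 1200], [30, 1830]]))  # 50.0
```

Given per-participant minutes and event totals, `pearson_r(minutes, totals)` gives r, and squaring it gives the proportion of shared variance (r²).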

Table 5 Duration of time spent studying and total frequency of events

Content analysis

Content analysis was used to examine participants’ annotations of the content. When participants annotated text using the note templates, the glossary, or the linking feature, gStudy recorded which template was chosen and the information or text participants added to the template’s fields. We examined how actively participants annotated the text by asking three questions. (a) How many fields did participants fill in when annotating text? (b) Were annotations written in participants’ own words, or were they cut and pasted from the text? (c) Did annotations demonstrate active, generative processing of the content? For example, when participants wrote in their own words, did they merely paraphrase the text or add thoughtful extensions to it?
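The section does not describe an automated measure for these questions. The Python sketch below is a swapped-in heuristic, not the procedure used in this study. For question (a), it counts filled template fields. For question (b), it estimates the share of an annotation's word trigrams that appear verbatim in the source text. The note and source strings are hypothetical.

```python
def filled_fields(note):
    """Count template fields that contain non-blank text."""
    return sum(1 for value in note.values() if value and value.strip())


def copied_share(text, source, n=3):
    """Share of the annotation's word n-grams found verbatim in the source.

    Values near 1 suggest cut-and-paste; values near 0 suggest own words.
    Words are lower-cased and split on whitespace only (no punctuation handling).
    """
    def grams(s):
        words = s.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    note_grams = grams(text)
    if not note_grams:
        return 0.0
    return len(note_grams & grams(source)) / len(note_grams)


source = "Learning and memory share basic notions about cognition"
print(round(copied_share("share basic notions about learning", source), 2))  # 0.67
```

Such a score could only flag candidate cut-and-paste annotations. Judging generative processing, question (c), still requires reading the notes.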

Participant 1 (Cluster 1)

P1 engaged in behavior consistent with her MSLQ self-report, in that she made only 3 elaborations on the content while working on the material. Participant 1 self-regulated in two ways: (a) using different types of quicknotes to discriminate between content, such as “important,” “example,” and “I agree,” and (b) creating custom quicknote types (mainly definition) to annotate information within the text. This reflects a generative process, as the participant determined how the interface could be adapted to suit her purposes for studying. This participant rarely elaborated but did use labeling to make discriminations about the text.

Participant 2 (Cluster 1)

In the mere 20.5 minutes P2 spent studying, quicknotes were primarily used to flag important concepts (CQLI). Some glossary items were updated with small annotations or transformations (UC) to original content, and some concepts were linked together (CL). P2 showed little evidence of elaborating on the content.

Participant 3 (Cluster 2)

P3 predominantly used the quicknote feature to annotate text. A variety of quicknote types were used, indicating that P3 was discriminating among different types of content. P3 also used the summary note template frequently. Early in the studying session, the participant used just one of the fields and provided labels for ideas presented within the text. Later in the studying episode, P3 elaborated with statements such as “teaching tip,” “how to apply new knowledge,” or “when to use guided and unguided learning.” Collectively, analyses of P3’s notes indicate very little attempt to change the wording of the text or translate it into new words.

Participant 4 (Cluster 2)

P4 primarily focused on highlighting information within the text. However, she was more active in creating new concepts and filling in the title and description fields of the concept template. Similar to P3, P4 did little to summarize or organize information. She primarily reproduced or copied definitions, or highlighted without labeling (quicknotes). These are surface activities that do not generate personalized representations of the text.

Participant 5 (Cluster 3)

P5 was the most active participant. This participant wrote mainly summary notes, filling in the fields to describe and elaborate on concepts and ideas presented within the content, and augmented elaboration with highlighting and quicknotes. For example, the participant created a summary note with the topic “cognitive views of learning,” the key details “learning and memory,” and the main idea “share basic notions about learning and memory.” Most of the notes demonstrate surface-level transformations of the content, staying close to the original wording in the text. This participant experimented with different tools in the program, occasionally double-checking actions and deleting objects (notes, quicknotes, etc.) afterward. This participant coordinated different actions, transitioning between more active processing through notes and less active tactics such as highlighting and making quicknote labels.

Participant 6 (Cluster 3)

P6 predominantly highlighted information. P6 experimented by making a debate note, but only filled in the names of two theorists with no indication of the position or topics proposed by those theorists. Updates P6 made to glossaries primarily involved small edits and formatting. In contrast to self-reports, P6 evidenced very little summarization or active processing of the text.

Participant 7 (Cluster 4)

P7 predominantly used the quicknote feature, flagging information that she did not understand or needed to remember. On the few occasions when P7 made a link to a note or a link to a concept, template fields were either left blank or copied directly from the text. P7 generated one fact-based question: “what is a gestalt? How does it organize sensory data?” Although P7’s self-reports on the MSLQ items suggest active processing, this was not demonstrated in traceable activities.

Participant 8 (Cluster 4)

In the 23 minutes that P8 studied, annotations of content occurred rarely. Note contents were sparse, primarily serving as reminders to go back to information rather than reflecting on the content itself (e.g., “major concepts I should know in the chapter”). While P8 evidenced self-regulation in terms of preparing content for later study, there is no evidence that P8 actually returned to the notes to elaborate or process more deeply.

Overall, content analysis for the 8 participants demonstrates that they rarely strayed far from the original wording of the text. When notes were created they primarily involved slight rewording of information which is characteristic of lower level or less generative processing regardless of the cluster to which students belonged.

Discussion

Overall, our findings demonstrate that these eight participants varied in SRL. Our study also challenges past research on SRL and metacognition that relies heavily on self-report data to examine strategy use. For the eight participants in our study, self-reports were poorly calibrated with actual traceable studying events. We acknowledge that other factors could explain poor calibration. These include the time delay between self-report and the studying episode, differences in studying activity associated with the use of gStudy, and the possibility that self-reports capture what students perceive they do at a more general rather than a specific level. However, this finding corroborates past research (e.g., Winne and Jamieson-Noel 2002) and warrants further investigation.

We propose that trace data of student activity in e-learning environments such as gStudy are important in furthering our understanding of SRL. Data collected from log files provide information about the frequency, patterns, and duration of actual studying activities. For the eight participants in our study, analysis of log file data told a different story than the self-report data used to cluster students according to similar self-reports about studying. Analyses of trace data demonstrated few similarities within clusters. Furthermore, analyses of trace data revealed different ways students regulated learning over time by (a) experimenting with a variety of studying events, (b) experimenting with collections of events or settling into routine patterns, and (c) varying the depth of engagement with particular content. We acknowledge that although students had been trained to use the gStudy software, some of their activity likely reflected experimentation with the gStudy tools themselves.

Frequency counts are the most common method for synthesizing and summarizing trace data (e.g., Beasley and Vila 1992; Fitzgerald and Semrau 1998; Kelly and O’Donnell 1994; Reed and Oughton 1997). Frequency counts provide information about the distribution of user events (or actions) across a range of possible events. They can be used to compare user events for participants grouped by performance or motivation (e.g., Fitzgerald and Semrau 1998), or to conduct hierarchical cluster analysis thereby grouping participants by patterns in their studying actions (e.g., Lawless and Kulikowich 1996, 1998).

Although frequency counts are common, analyses based solely on frequency counts of events may not provide an accurate representation of learner engagement (Guzdial et al. 1995). Relationships across events are difficult to interpret based on frequency counts (Misanchuk and Schwier 1992).

Examining transitions between events can depict learners’ actions in a complex hypermedia-learning environment (Winne et al. 1994). However, most research examines transitions in very simple hypermedia environments that involve a small number of logged events, rather than in complex environments with dozens of user choices and logged events. Large transition matrices, such as those representing studying in gStudy, are challenging to interpret. Transition graphs, for their part, provide imprecise but intuitive insight into patterns of activity (Jones and Jones 1997). Moreover, the pattern analyses described in this study provide insufficient data about the timing of actions and the context of those actions in the original text (e.g., the concept or lines of text being targeted). Figure 5 shows a method for examining the timing, sequence, type, and context of activity. The y-axis presents the character position in each of three source html documents (distinguished by shading). Each symbol represents a different studying event (note, glossary, highlight), and the x-axis follows time from the start to the completion of the studying session. We posit that this type of graphical summary of studying events has the potential to inform researchers and students about studying actions and sequences. It shows when students shift back and forth between chapters, and where most activity occurs in a chapter (beginning, end, or distributed evenly across the chapter). These are markers of metacognitive monitoring and control. A limitation of graphical representations such as this is that they are difficult to aggregate across participants or study sessions. Plotting multiple participants on one graph would produce so many lines and events that patterns would be difficult to decipher. Separate graphs for each participant, in turn, would be difficult to synthesize or present in a published report.

Fig. 5

Sequence and timing of each studying event located in the context of each html page of text. Note: Position refers to the character position in the text. Shading denotes each html document. Symbols represent different studying events (notes, highlighting, glossaries, etc.)
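The information plotted in a time-event-position graph can also be summarized numerically, which may ease aggregation across participants. This hypothetical Python sketch assumes that each event is recorded as (time, document, character position), sorted by time, and that document lengths are known. The document names, lengths, and the split into thirds are illustrative choices.

```python
from collections import Counter


def document_switches(events):
    """Number of times consecutive events fall in different html documents."""
    docs = [doc for _, doc, _ in events]
    return sum(1 for a, b in zip(docs, docs[1:]) if a != b)


def activity_by_region(events, doc_lengths):
    """Count events in the first, middle and last third of each document."""
    names = ("beginning", "middle", "end")
    regions = Counter()
    for _, doc, pos in events:
        third = min(2, 3 * pos // doc_lengths[doc])
        regions[(doc, names[third])] += 1
    return regions


# Hypothetical (seconds, document, character position) records in time order.
events = [(5, "ch1", 100), (20, "ch1", 2500), (40, "ch2", 300), (60, "ch1", 2900)]
print(document_switches(events))  # 2
```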

Finally, time-based analyses describe the time students spend studying and how they allocate time across studying events. In this study we observed dramatic differences in the time students spent on the studying task. However, elaborate time-based analyses of log file data are rare. Most studies examine total time or mean time in the learning environment (e.g., Lawless and Kulikowich 1998; Lickorish and Wright 1994) or time spent on a particular activity (Andris 1996; Fitzgerald and Semrau 1998; Schroeder and Grabowski 1995). Horney and Anderson-Inman (1994), on the other hand, extended the measurement of time spent on a given node by categorizing durations into levels of engagement and then counting the frequency of each category.
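A duration-categorizing approach in the spirit of Horney and Anderson-Inman (1994) can be sketched by binning the gaps between consecutive events. The cut points and level names here are hypothetical, not taken from that study. Treating each gap as time engaged with the earlier event is an assumption.

```python
from collections import Counter

# Hypothetical cut points in seconds (upper bounds, exclusive) and level names.
LEVELS = [(10, "glance"), (60, "brief"), (float("inf"), "engaged")]


def engagement_profile(timestamps):
    """Bin the time between consecutive events into levels and count them."""
    profile = Counter()
    for start, end in zip(timestamps, timestamps[1:]):
        duration = end - start
        for limit, label in LEVELS:
            if duration < limit:
                profile[label] += 1
                break
    return profile


print(dict(engagement_profile([0, 5, 50, 200, 204])))
# {'glance': 2, 'brief': 1, 'engaged': 1}
```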

In summary, this study is exploratory and relies on only one type of logged event (model events, where actual changes are made to content). Even so, it demonstrates three potential contributions of log data for augmenting understandings of metacognition and SRL. First, examining logs of trace data affords opportunities to examine the intersection between what students perceive about their studying and what they do when they study (calibration). Calibration is an important foundation for productive metacognitive monitoring and adaptation. Second, transition graphs and statistics provide foundations for defining strategies as purposeful collections of activity transitions. These data have the potential to inform students and researchers about the strategies students employ in studying and the opportunities that branches in strategies provide for metacognitive monitoring. Finally, our analyses offer a window into the kind of data students need to collect about their learning to successfully control and adapt their studying metacognitively.

To fully mine the potential of analyses of logs of trace data, future work should focus on developing more sophisticated statistical techniques and methods for examining patterns across groups of students. To date, filtering and analyzing log data has been laborious and time consuming.