1 Introduction

With the advancement of the information and communications technologies (ICT) environment, data analytics concerning the behavior and consumption of ICT users has been extensively investigated (Cho and Moon 2015). Similarly, considerable effort in technology-enhanced learning (TEL) has gone into analyzing instructional data and applying the findings to the betterment of learning environments. In 2012, the United Nations Educational, Scientific and Cultural Organization (UNESCO) categorized instructional data analytics into learning platform analytics, predictive analytics, adaptive learning analytics, social network analytics, discourse analytics, and assessment using ICT (Shum 2012). Thus, the scope of instructional data analytics was extended.

Notably, diverse approaches have been taken in learning analytics to delve into teaching–learning activities and to characterize individual learners’ interests and traits. The rationale is that learning analytics is conducive to highlighting information that improves learners’ achievements, develops personalized learning environments and services, helps learners maintain and improve attention and concentration, and improves learning methods or content (Drachsler and Kalz 2016).

A teaching–learning activity consists of continuous actions. To better understand teaching–learning activities, it is necessary to analyze the data from a stream of actions arising in teaching and learning environments and thus to characterize such activities. Hence, this paper proposes a data model based on activity theory to facilitate the collection and structuration of action data arising from teaching–learning activities. The proposed model allows analysis of the relationship between learners and learning media and highlights the continuity and persistence of learning activities.

Table 1 Comparison of analytics frameworks and models

The remainder of this paper is organized as follows. Section 2 describes the theoretical rationale. Section 3 deals with the design of the teaching–learning data model based on activity theory. Section 4 applies the designed model to actual data analysis. Finally, Section 5 presents the conclusion.

2 Literature review

2.1 Learning analytics

Learning analytics has been well documented. Still, the definition of learning analytics varies among researchers. Siemens (2010) defined learning analytics as the use of intelligent data, learner-generated data, and analysis models to discover informative and social connections and to predict and inform learning. Long and Siemens (2011) defined learning analytics as collecting, measuring, analyzing, and reporting data about learners and learning contexts for the purpose of understanding and optimizing learning and the environments in which it occurs (Kim 2016).

Other definitions of learning analytics in the literature can be summarized as follows: Learning analytics is a series of processes for collecting and analyzing data about learners and learning contexts with the intent to understand the learning environments, as well as learning itself, and to provide optimal learning environments (e.g., personalized services, achievement prediction, and instruction) (Jeong 2016; ECAR-ANALYTICS Working Group 2015). Based on the foregoing definitions, this paper redefines learning analytics as a data-based decision-making process that forms a cycle: extracting, storing, and analyzing data, visualizing the findings, predicting future behaviors, and applying the findings with a view to offering personalized and more effective learning experiences.

Table 1 outlines several frameworks established for learning analytics. Each model defines a series of processes for analyzing and applying the data arising from teaching–learning activities. In brief, the analytical models come down to a process involving Select, Capture, Aggregate & Report, Predict, Use, Refine, and Share (Jeong 2016; Lias and Elias 2011).

Data models are standardized in tandem with research on the frameworks for learning analytics. That is, the frameworks for learning analytics involving the collection, analysis, and management of data arising in the systems, content, learning tools, and users in heterogeneous environments are standardized. The standardization is led by the ISO/IEC and IMS Global (Jeong 2016; Pardos et al. 2016; Kim and Moon 2014).

  • ISO/IEC JTC1 SC 36 (Information Technology for Learning, Education and Training) works on the standardization of the reference model for learning analytics by deriving use cases from the perspectives of learners, instructors, and institutions.

  • The IMS Global Learning Consortium (GLC) published Learning Measurement for Analytics and the Learning Measurement Framework (IMS Caliper) for learning analytics, and has continued to refine them.

Despite the foregoing efforts, learner-oriented personalized services reflecting a learner’s characteristics (e.g., learning style) call for a data model that represents and analyzes the information of teaching–learning activities as actions. In addition, the persistence and iteration of certain activities should be represented across diverse types of learning activities, which will enable learning analytics that takes learners’ characteristics into account.

2.2 Activity theory

Activity theory is a theoretical framework intended to help understand human interactions by analyzing the use of tools and media. In activity theory, a human action system is defined as an activity. The relationship between subjects and objects is viewed as a unit and mediated by the presence of tools (Hashim and Jones 2007; Choi 2016; Uden 2006).

Fig. 1 Activity system

An activity system consists of three levels: activity, action, and operation. A human activity exists in the form of actions; that is, actions belong to and constitute an activity, and they convey meaning in the context of that activity. Exploring actions within an activity therefore enables human actions to be conceptualized from the perspective of wider contexts. An operation, in turn, is a method of realizing an action.

According to Hashim and Jones (2007), a series of operations constitutes a higher-order action. Actions are linked to individuals’ skills and knowledge. Engaging in an activity is to carry out a limited series of actions. Actions can be classified into individual or collective actions. The highest-order activity is relevant to goals and motivation. Actions are associated with goals. Operations directly depend on the conditions for the achievement of goals. Figure 1 shows how these three notions are related (Hashim and Jones 2007).

Literature on learning analytics based on activity theory in TEL environments mostly applies activity theory to lay a qualitative foundation for analysis (Xing et al. 2014; Mwalongo 2016).

Gifford and Enyedy (1999) argued that activity theory would be an appropriate framework for knowledge-generation models in computer-based collaborative learning activities. They asserted that activity theory could establish the characteristics of collaborative activities and how people could engage in social interactions based on technology, allowing the effective use of computer-supported collaborative learning activities (Hashim and Jones 2007).

Scanlon and Issroff (2005) measured the outcome in higher education by analyzing data using a framework based on activity theory on the grounds that the outcome (i.e., more learning) is underpinned by an organic combination of the components of activity theory. They defined the components of activity theory and the attributes of each component as follows: tools (learning technology), subjects (students), objects (tasks or learning situations), rules (ethics as appropriate), communities (higher education institutions), and division of labor (who controls what) (Hashim and Jones 2007).

Zurita and Nussbaum (2007) drew on activity theory to propose a conceptual framework and design for analyzing a mobile computer-supported collaborative learning system. In approaching computer-supported collaborative learning, they divided (components of) activity theory largely into roles and rules, networks, and collaborative activities. Roles and rules involve the rules from activity theory. The networks component involves tools, subjects, and communities. Collaborative activities involve objects and division of labor. They applied these components to understand the subjects and the activities arising in relevant contexts. That is, they intended to analyze the interrelationship between the components based on activity theory (Zurita and Nussbaum 2007).

Liaw et al. (2007) investigated learners’ attitude factors in e-learning systems based on activity theory. They elicited four components of learner attitudes, i.e., the learner autonomy environment, the problem-solving environment, the multimedia learning environment, and instructors’ roles as supporters. They considered activity theory as an appropriate approach to address the challenges associated with e-learning systems and environments, and moreover, as a positive element for problem-solving in e-learning environments (Liaw et al. 2007).

Florian et al. (2011) proposed an activity-based learner model to monitor learners. Their activity-based learner model was intended to provide data for competency assessment applicable to adaptive learning support, evaluation, competency analysis, and recommendation. To that end, a method of building a Moodle activity–learner model was proposed based on an activity theory model and an actuator-indicator model, where social roles arising in a teaching–learning process are viewed as a principal component to provide learners and instructors with the prospect of adaptation to learning (Florian et al. 2011).

2.3 Teaching–learning activities and stream data

Engagement actions constitute a teaching–learning activity. These actions are continuous and tend to be iterative, so the data they generate arise in the form of a stream. To derive meaningful information from teaching–learning activity data, activity indicator components are needed. That is, analyzing a teaching–learning activity process requires both activity indicators and stream data of actions. Teaching–learning activity indicators include the frequency and duration of a learner’s access to a learning source, as well as learner attributes such as gender (Brooks et al. 2015).

Previous studies on teaching–learning activity indicators for learning analytics in a TEL environment are discussed below.

Ruipérez-Valiente et al. (2015) proposed an extended-learning analytics model to better understand the learning process. They built on learning analytics to support the Khan Academy platform. Specifically, their model expanded the features of the tools for visualization of the results from learning analytics, supporting the analysis of entire classes and individual learners. Ultimately, their model supported instructors and learners with decision making about learning processes. They categorized the indicators for learning analytics into six groups: total use of the platform, correct progress on the platform, time distribution for the use of the platform, gamification habits, exercise-solving habits, and affective states (Ruipérez-Valiente et al. 2015).

Mukala et al. (2015) used a fit linear model to analyze learning patterns in massive open online course (MOOC) environments. A learner visits a MOOC during a semester, eagerly engaging in video instructions and quizzes. The learner clicks to search instructional videos or quizzes, leaving a trace of click events before logging out. Such actions constitute a click stream. Click streams constitute learning via videos and quizzes. So do page views. These data indicate how the learner interacts with the instructional videos or quizzes (Mukala et al. 2015).

Gašević et al. (2016) drew on learning analytics to investigate the instructional conditions influencing the predictors of successful learning. Using Moodle, they classified learner data into 12 types and extracted the following data from each course: assignments, books, chats, course logins, feedback, forums, lightbox galleries, maps, quizzes, resources, Turnitin, and virtual classrooms. Based on these data, they performed learning analytics (Gašević et al. 2016).

3 Designing a teaching–learning data model based on activity theory

3.1 Overview

Teaching–learning in TEL environments involves participants (instructors/learners), learning objects (learning sources/learning tools), interactions, and learning outcomes. That is, learning is achieved via interactions between students, between students and instructors, between students and learning sources/content, between students and learning tools, between instructors and learning sources/content, and between students or instructors and the system. Such interactive actions and activities underlie the implementation of learning. The likelihood that a learner will complete the learning is predicted by analyzing the learner’s relationship with each component. In TEL environments, a learner often logs onto the system more than once daily. Such actions should be represented for learning analytics. That is, the activities performed by learners over a certain period of time, as well as each time they access the system, should be represented. To that end, this paper draws on activity theory to design the model for data collection.

The components of activity theory applied to the proposed model include tools, subjects, objects, rules, communities, and division of labor, as shown in Fig. 2 (Hashim and Jones 2007; Xing et al. 2014, 2015; Zurita and Nussbaum 2007; Liaw et al. 2007).

Fig. 2 Components of activity theory

Fig. 3 Learning process based on activity theory

Table 2 Components of activity theory relevant to those of the learning activity

Subjects are motivated to achieve goals, use tools, and engage in a learning process (objects). Here, the learning process is completed by an activity, which is comprised of continuous actions, as shown in Fig. 3 (Carvalho et al. 2015). In other words, a learning process consists of a series of actions that constitute an activity, through which learning goals are achieved.

3.2 Defining relationships between activity theory and teaching–learning activities

Data should indicate whether a learner participates in a learning activity and the extent of engagement over time. Thus, the teaching–learning data model based on activity theory starts from the relationship between activity theory components and teaching–learning components, as defined in Table 2, where primary teaching–learning attributes are linked with each component of activity theory defined (Xing et al. 2014, 2015).

Subjects are motivated to achieve learning goals by engaging in learning activities (objects) using tools. Therefore, this paper proposes the Activity-Subject, Object, Tool (A-SOT) model involving subjects, objects, and tools among other components of activity theory.

First, the term subject means the characteristics of participants in learning. Learners have their own style of learning. Styles of learning vary with the methods of perceiving information and those of processing information. The styles of learning in terms of the perception of information are divided into user groups preferring visual components (such as pictures, photos, tables, charts, and graphs) and groups preferring verbal components (such as words, sentences, and descriptions). Also, in terms of information processing, user types are divided into groups preferring practical components or active activity components (e.g., participation in a group or discussion) and groups preferring reflective or passive activity components (e.g., individual activities, attentive listening, and thinking). Classification helps to better understand users (Kim et al. 2015).

Next, objects comprise activities, actions, and operations, which together represent the system of activities. That is, a teaching–learning activity consists of a series of actions. Each action may be represented as a time frame, so that its start and end times can be extracted.

In TEL environments, a learner uses a range of learning sources and tools. Tools denote specific features of an LMS/MOOC in a TEL environment. They are the sources or instruments for learning (e.g., instructional materials, videos, discussions, quizzes, blogs, and reports) that a learner uses as the media in learning activities. Learning actions take place continuously on the basis of these tools; that is, a teaching–learning process is a series of actions. To understand the tools, indicators for learning tools need to be set up. In this paper, the components of the indicators for tools are limited to Notice, Video/Text Content, Grade, Group, Forum, and Blog. Figure 4 shows the A-SOT data model based on subject, object, and tool components.

Fig. 4 Class diagram in the A-SOT data model

To save the model’s data in Learning Record Store (LRS) format, the relationships should be represented in detail. To that end, the data are represented as a tuple of the form <Attribute: Value>.

First, to extract data over time, Duration is set as Day, Week, or Month. Therefore, each time frame is stated as in formula (1).

$$\text{Time Frame} = {<}\text{Timestamp: Starttime-Endtime},\ \text{Duration: D}{>}$$
    (1)

The sub-components defined in the foregoing class diagram are defined in formulae (2)–(4).

$$\begin{aligned} \text{Subject} &= \{\text{Individual Learning Style},\ \text{Group Learning Style}\} \\ \text{Learning Style Type} &= \{\text{Visual (Vi)/Verbal (Ve)},\ \text{Active (A)/Reflective (R)}\} \end{aligned}$$
    (2)

$$\text{Tool} = \{\text{Notice},\ \text{Video/Text Content},\ \text{Grade},\ \text{Group},\ \text{Forum},\ \text{Blog}\}$$
    (3)

$$\begin{aligned} \text{Object} &= \{\text{Activity},\ \text{Action},\ \text{Operation}\} \\ \text{Action} &= \sum \text{Operation} = O_1, \ldots, O_n \\ \text{Activity} &= \sum \text{Action} = A_1, \ldots, A_n \end{aligned}$$
    (4)

Thus, the learning activity data associated with a learner’s use of tools are represented in formula (5). The components of activity theory and those of learning activities are represented in the <Attribute: Value> format, which can be used to state whether, and how often, a learner engages in teaching–learning activities.

$$\begin{aligned} {<}&\text{Subject: individual},\ \text{Tool: Notice}, \\ &\text{Object: Number of operation},\ \text{Duration: D}{>} \end{aligned}$$
    (5)

For example, the following string indicates that user 1 engaged in learning three times in one day using the video content tool: <Subject: student1, Tool: Video Content, Object: 3, Duration: D>. In this way, teaching–learning activities are represented as a series of actions based on the data generated in a teaching–learning process. Hence, the relationship between a learner (subject) and a tool (medium for a learning activity), as well as the continuity of an object (teaching–learning activity system), can be identified.
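As an illustration of the tuple format in formula (5), the following minimal Python sketch counts operations per subject, tool, and day and renders them as <Attribute: Value> strings. The event layout, the sample events, and the names `EVENTS` and `daily_tuples` are assumptions made for this sketch; they are not taken from the system used in this paper.

```python
from collections import Counter
from datetime import datetime

# Hypothetical click events: (subject, tool, ISO timestamp).
# These are illustrative values, not data from the experiment.
EVENTS = [
    ("student1", "Video Content", "2017-03-06T09:15:00"),
    ("student1", "Video Content", "2017-03-06T13:40:00"),
    ("student1", "Video Content", "2017-03-06T21:05:00"),
    ("student1", "Forum", "2017-03-06T21:30:00"),
    ("student2", "Notice", "2017-03-07T08:00:00"),
]


def daily_tuples(events):
    """Count operations per (subject, tool, day) and render A-SOT tuples."""
    counts = Counter()
    for subject, tool, stamp in events:
        day = datetime.fromisoformat(stamp).date()
        counts[(subject, tool, day)] += 1
    # Duration is fixed to D (Day) because counts are grouped by calendar day.
    return [
        f"<Subject: {subject}, Tool: {tool}, Object: {n}, Duration: D>"
        for (subject, tool, day), n in sorted(counts.items())
    ]


for line in daily_tuples(EVENTS):
    print(line)
```

With these sample events, the three video accesses by student1 on the same day collapse into a single tuple whose Object value is 3, matching the example string above.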

4 Application of the teaching–learning data model based on activity theory

4.1 Overview

When the teaching–learning support systems (e.g., MOOC and LMS) are used in TEL environments, a huge amount of data is generated in the teaching–learning process. Such data are not suitable for immediate use because of the differences in the types of data. Thus, collected data should be processed before the attributes of activity-based teaching–learning components can be identified quantitatively for analysis.

To analyze the data based on activity theory, the data collection process should be separated from data analysis. First, data collection is based on teaching–learning activity indicators. A learner’s click-stream data are collected. A data set is constructed based on the click-stream data. To begin with, the duration of learning is set as Day, Week, or Month to process the data. Also, the tools used are classified. Next, the analysis of data is classified into analysis of actions and analysis of activities (Leslie et al. 2016). Analysis of actions involves analyzing a series of actions per teaching–learning tool. Analysis of activities involves cluster analysis, where comparable clusters are formed, and the use of tools is characterized per cluster.

Table 3 Experimental data set

Fig. 5 Standardized data values of learning tools used by each user

4.2 Experimental setting

In the experiment for this paper, teaching–learning action data generated in an engineering class for junior undergraduates at H University using a supplementary teaching–learning support system are collected and analyzed. The data concern a series of learning actions via the medium of the tool in the teaching–learning support system. The daily and weekly click-stream data are processed and used as the action data.

Based on the stream data collected from the participants, the frequency and the duration (days) of engagement are preprocessed. Table 3 shows the data collected from the 47 participants.
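The preprocessing step can be sketched as follows, assuming that frequency means the number of recorded click events and that duration means the number of distinct calendar days on which a learner was active. Both readings, the function name `engagement_summary`, and the sample records are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def engagement_summary(events):
    """Return {learner: (frequency, active_days)} from (learner, ISO timestamp) pairs."""
    clicks = defaultdict(int)   # number of click events per learner
    days = defaultdict(set)     # distinct calendar days with at least one event
    for learner, stamp in events:
        clicks[learner] += 1
        days[learner].add(datetime.fromisoformat(stamp).date())
    return {learner: (clicks[learner], len(days[learner])) for learner in clicks}


# Hypothetical click-stream records, not data from the experiment.
sample = [
    ("learnerA", "2017-03-06T10:00:00"),
    ("learnerB", "2017-03-06T09:00:00"),
    ("learnerB", "2017-03-06T18:30:00"),
    ("learnerB", "2017-03-08T11:45:00"),
]
summary = engagement_summary(sample)
print(summary)
```

Here learnerB has three events spread over two distinct days, so the sketch separates how often a learner engages from how many days the engagement spans.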

The experimental data are standardized with the max–min method, which is widely used to make components comparable, as in formula (6) (Kim et al. 2015):

$$\text{Max--Min standardization} = \frac{x - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}$$
    (6)
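Formula (6) can be sketched in a few lines of Python. The function name `min_max_standardize`, the handling of a constant column, and the sample counts are assumptions made for this illustration.

```python
def min_max_standardize(values):
    """Rescale values to the range [0, 1] using (x - Min) / (Max - Min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption of this sketch: a constant column is mapped to 0.0
        # to avoid dividing by zero; formula (6) does not cover this case.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical access counts for one tool across five learners.
forum_counts = [2, 10, 6, 2, 18]
scaled = min_max_standardize(forum_counts)
print(scaled)  # [0.0, 0.5, 0.25, 0.0, 1.0]
```

After rescaling, the least active learner maps to 0 and the most active to 1 for every tool, which allows tools with very different raw counts to be compared on one axis.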

4.3 Data analytics

The proposed A-SOT model is used to characterize the subjects (individuals/groups), tools (notices, content, grades, groups, forums, and blogs), and objects (actions and activities).

Figure 5 shows the results of analyzing the user action data relevant to the tool components as the media for teaching–learning. Based on the characteristics of tools, individual-activity components are separated from group-activity components. The individual-activity components include the notices, content, and grades, while the group-activity components include the groups, forums, and blogs. In Fig. 5, ID29 is less active, whereas ID36 is more active in using all learning tools.

Now, user characteristics are clustered based on the tool components. The clustering methods include hierarchical and non-hierarchical methods. First, hierarchical clustering is used to derive the most suitable number of clusters. In hierarchical clustering, the Ward method is used to determine the variance in correlation coefficients, and the value at which the variance is large is chosen. That is, \(K=5\) (K value \(= 46 - 41\)), meaning the ideal number of groups is 5. Next, the k-means method is used for non-hierarchical clustering.
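The non-hierarchical step can be illustrated with a minimal pure-Python sketch of k-means (Lloyd's algorithm). It does not reproduce the Ward step, and it is not the software used in the experiment. The farthest-point seeding, the function name `k_means`, and the two-dimensional sample scores are assumptions chosen so that the sketch is deterministic.

```python
import math


def k_means(points, k, iterations=100):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    # Seeding is an assumption of this sketch: start from the first point,
    # then repeatedly add the point farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


# Hypothetical standardized scores per learner: (individual tools, group tools).
scores = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9), (0.05, 0.05)]
labels, centers = k_means(scores, k=2)
print(labels)
```

In this toy input the learners with low scores on both axes fall into one cluster and the two highly active learners into the other; with the paper's data, k would be set to the value obtained from the hierarchical step.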

Table 4 Clustering by tool component for individual activities
Table 5 Clustering by tool component for group activities

Table 4 shows the results of clustering by the tools component for individual activities, together with the median values of sub-components in each cluster. In Table 4, Cluster 2 most actively uses individual-activity tools, such as grades, notices, and content.

Also, Table 5 shows the results of clustering by tool component for group activities together with the median values of sub-components in each cluster. In Table 5, Cluster 4 most actively uses group-activity tools, such as groups, forums, and blogs. Cluster 3, which has the most members, uses the tools for group activities the least.

Next, the relationship between individual- and group-activity tool clusters is identified to characterize the subjects (individuals/groups) or learners who engage in learning activities using tools. Figure 6 shows the correlation between the individual- and group-activity tool clusters. Cluster 3 in the tools for group activities shows the lowest level, while playing a central role in the relationship with the individual-activity tool cluster. Cluster 1 in individual-activity tools and Cluster 5 in group-activity tools actively use both individual- and group-activity components.

Next, daily and weekly data are used for clustering so as to identify the teaching–learning activity systems (objects). This is intended to derive comparable groups based on teaching–learning activity systems. Table 6 shows the results of clustering by daily frequency of engagement. Cluster 4 shows the most active daily engagement and participates in learning several times a day.

Fig. 6 Correlations in individual- and group-activity clusters

Table 6 Clustering by frequency of engagement in learning
Table 7 Clustering by duration of engagement in learning (number of days)

Table 7 shows the results of clustering by duration of engagement in learning in terms of the number of days. Cluster 3 engages in learning activities most regularly and continuously.

Based on Tables 6 and 7, the relationship between the frequency and the duration of engagement among the clusters is determined, as shown in Fig. 7. Cluster 2 is placed in the middle in terms of duration (number of days), playing a central role in clustering by frequency of engagement. Cluster 4 in terms of duration (number of days) of engagement, and Cluster 1 in terms of frequency of engagement show the lowest levels.

In addition, the clustering relationship based on weekly data between the number of days (duration) and the frequency of engagement is shown on the left side in Fig. 8. Cluster 2 in terms of the weekly duration of engagement (number of days) is placed in the middle, playing a central role in relation to the frequency of engagement.

As mentioned before, the relationship between teaching–learning subjects, teaching–learning objects, and teaching–learning media is analyzed with clustering. That is, individual and group activities are clustered based on the characteristics of learning tools used by learners. Also, based on daily and weekly cycles and the frequency of learning activities, learning-action groups are clustered. Moreover, based on weekly cycles and frequency of learning activities, learning activity groups are clustered.

Users ID29 and ID36 are characterized in Table 8. ID29 hardly uses the tools for individual and group activities and rarely engages in learning, whereas ID36 actively uses the tools for individual and group activities and frequently engages in learning daily. Yet, the latter’s weekly engagement in learning is about average.

Fig. 7 Correlation between clusters by frequency and duration of engagement

Fig. 8 Correlation between clusters by weekly frequency and duration of engagement

Table 8 Tool–subject–object clusters by user

5 Conclusion

In a TEL-based teaching–learning activity, each action generates data, which are automatically accumulated in the system. The accumulated data can be restructured or repurposed. That is, such data may be used to improve learner achievements and to develop personalized learning environments and services through learning analytics, which requires efficient analytic models for the data’s collection and structuration. Hence, this paper proposes the A-SOT model (based on activity theory) for data collection. The proposed model is intended to identify and use a teaching–learning process that is iteratively and continuously implemented through activities and actions.

The A-SOT model represents a teaching–learning activity as a series of actions based on data generated in the process. That is, subjects for learning have tendencies toward individual and group activities. Six components of learning tools are derived. Activity systems are defined as having operation, action, and activity levels. With the proposed model and definitions, both the relationship between learners (subjects) and tools (media for learning activities) and the continuity and persistence of teaching–learning activity systems (objects) are determined.

The proposed A-SOT model is applicable to clustering and sheds light on learners’ tendencies toward individual and/or group activities, as well as the frequency, iteration, and continuity of learning activities. Also, the proposed model is conducive to explicating individual learners’ tendencies toward learning and to predicting their achievements in learning activities.

These findings are meaningful on the following grounds. First, as a model based on activity theory and intended for data collection, the A-SOT model represents the interactions arising in TEL environments and allows extraction of the generated data in tuple format over time. Second, stream data are collected and analyzed by deriving and applying teaching–learning activity indicators. Third, a learning activity is approached and characterized in terms of action and activity. Also, learners are characterized collectively and individually in light of their tendencies toward frequency, cycle, and continuity of learning activities.

The present findings should be reinforced by further research on standardization of collected teaching–learning data, data-storage models, and teaching–learning activity indicator–profile models in TEL environments, and through replication of the proposed model. In addition, the proposed model will be upgraded continuously so it can be applied to a range of teaching–learning models for online instruction, offline supplementary instruction, and online–offline instruction. Furthermore, it will be made more sophisticated in order to predict the collective and individual achievements of learners.