Keywords

1 Introduction

Since 2012, MOOCs (Massive open online courses) have attracted a large number of learners all over the world. Every day, millions of people with different education and demographics background study free online courses in popular MOOC platforms, such as CourseraFootnote 1 and EdxFootnote 2. Compared with traditional university-level education, MOOCs dramatically speed up the way how college courses are delivered to normal crowd. From the view of higher education, MOOCs may have the potential to largely reduce the cost of higher education in a near future.

However, despite its increasing popularity, MOOCs still suffer from the problem of high drop-out rate (or low completion rate). According to the annual report of Coursera, the drop-out rate in Coursera’s start-up stage ever reaches 91% [1]. Such a high drop-out rate not only means a waste of huge amount of human learning efforts but also brings negative opinions to MOOCs industry. Why does the vast majority of students fail to finish free online courses? One reason may be due to the difficulty in understanding learning demand and user interests. On the one hand, a good modeling of user interests can effectively help understand what kind of course students may like. On the other hand, a good modeling of learning motivation can also help estimate how likely a student will finish courses and thus avoids unnecessary course drop-out. However, due to the limitation of current MOOC platforms, it is usually difficult for users to explicit reveal their true interests and motivation.

To tackle this problem, personalized course recommendation technology has become a hot research topic in e-learning and data mining community. One of the key components in classic personalized course recommendation technology is user modeling which tries to learn the latent user interests from historic behavior data so that the decision module in recommendation system, i.e., a machine learning system, can estimate the likelihood of course preferences more accurately. Many works have been proposed, such as course enrollment pattern analysis [4, 13], profile text mining [5], and resource visiting log analysis [6] and so on. These methods have shown good recommendation performance. However, they are all passive as they do not actively query user interests from users but passively collect user behavior data to learn latent user interests. Due to the sparsity and noise problem, the modeling effect may not be satisfying.

Recently, the interactive course recommendation approaches have become more and more popular. Main-stream MOOC platforms, such as Coursera, have adopted such techniques to help students choose online courses (see Fig. 1). In these systems, before a user selects courses, the platform will show a list of survey tables (or questionnaires) to let her choose what topics she likes most. After the user submits the survey tables, the recommender system will integrate these explicit interest data with user demographics data together for better recommendation. For example, in Fig. 1, after a user selects the topics “Data Science” and “Machine Learning”, the Coursera platform will recommend related courses, such as Machine Learning Foundations and Probabilistic Machine Learning. This recommendation mechanism has been shown to give more accurate courses to users. However, very few of research works have systematically studied the effect of human-system interaction in user interest acquisition.

Fig. 1.
figure 1

The user interest survey table in Coursera’s interactive course recommendation module.

In this paper, we study the user interest acquisition problem based on the interactive course recommendation framework (ICRF). We first give a formal model of this framework. After that, we systematically discuss different settings on integrating user interests into recommender system. We propose the ICRF user interest acquisition algorithm that selectively acquires interests from a small group of representative users and propagates the acquired interests to all other users. With extensive experiments on real-world MOOC course enrollment datasets, we find that selectively request users to show their interests before course selection do help improve the recommendation performance. On average, our strategy can significantly reduce the performance-cost score by 30.25% compared to the traditional aggressive acquisition strategy.

The rest of this paper is organized as follows. Section two reviews previous work in the field of course recommendation with a focus on user modeling methods. Section three describes the model of ICRF and proposes the ICRF user interest acquisition algorithm. Section four discusses the dataset, experimental setting and results. The last section concludes this paper and presents future works.

2 Related Works

In this section, we review literature of recommender system in the field of course recommendation domain with a focus on user modeling methods. Different from the movie or music recommendation, the course recommendation domain usually have complex context [1, 2, 7, 10].

Traditional data mining methods are first explored for user modeling in course recommendation. Aher et al. [4] use k-means clustering and Apriori association rule algorithm. Zhang et al. [13] develop a course recommendation system that implements distributed Apriori algorithm on computer clusters. These user modeling methods can reveal students’ course enrollment pattern and show reasonable recommendation performance. However, they rely on analyzing the historical course enrollment behavior. Thus, traditional rule based methods may suffer from the cold-start problem when serving new users without any historical behavior [3].

To tackle the cold-start problem, several works try to explore contextual information, such as user profile, course prerequisite and other auxiliary information. Piao et al. [5] propose a similarity based user modeling method for MOOC course recommendation based on mining profile keywords such as job titles and skills. Jing et al. [6] propose a hybrid user modeling framework that mixes three item-based collaborative filtering scores based on user access behavior, user demographics and course prerequisite. Yu et al. [8] propose a method to mine discussion comments for user recommendation. Salehi et al. [11] propose a method combing sequential pattern mining and attribute based collaborative filtering. Social network based methods are also discussed in Zhang et al. [12]. These methods can be viewed as an extension of neighborhood-based collaborative filtering methods. They are all based on complex similarity metrics learned from contextual data. However, a major limitation of these methods is a lack of differentiable model that can be mathematically optimized to improve generalization ability.

Recently, latent factor models have become a popular research direction in the field of course recommendation. Due to their solid mathematical ground and good ability on modeling latent contextual behavior, these methods have shown better performance than item-based collaborative filtering methods. For example, Elbadrawy et al. [9] propose an ensemble of matrix factorization models each of which learns latent factors of students and courses at different levels of granularity. Sahebi et al. [14] propose a tensor factorization model for quiz recommendation. Their methods can model the temporal relationship between student knowledge and skill improvement.

Due to the effectiveness of latent factor model in course recommendation, we also use this type of recommendation methods in this paper. More specifically, we use the popular factorization machine [15] which learns the second-order interactions between features. This is especially useful for datasets with extreme sparsity, such as course recommendation datasets.

3 Interactive Course Recommendation Framework

In this section, we first describe the general model of the interactive course recommendation framework (ICRF). Second, we discuss various settings of user interest acquisition. Last, we propose the ICRF user interest acquisition algorithm to acquire user interest in a cost-effective way.

3.1 Framework Modeling

In this paper, we adopt the popular rating based recommendation approach. The goal of rating based recommendation is to learn a regression function that can accurately estimate user ratings on different courses.

Different from the traditional rating based recommendation, in ICRF, we explicitly model user interests. Specifically, we use U to denote the set of all users, C to denote the set of all courses, I to denote the domain of user interests and R to denote the rating interval (usually between zero and one)Footnote 3. The framework can be defined as

$$ f:\left( {U,C,I} \right) \to R $$
(1)

We use factorization machine (FM) [15] as the basic regression function. The regression function of FM of degree two in ICRF is defined as

$$ \hat{f}\left( {\varvec{x};\varvec{w}, \varvec{V}} \right) = w_{0} + \sum\nolimits_{i = 1}^{n} {w_{i} x_{i} } + \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = i + 1}^{n} {\left\langle {\varvec{v}_{i} ,\varvec{v}_{j} } \right\rangle x_{i} x_{j} } } $$
(2)

where w and V are the model parameters and x is the feature vector that concatenates u, c and Iu as \( \left[ {\varvec{u}^{T} ,\varvec{c}^{T} ,\varvec{I}_{\varvec{u}}^{T} } \right] \). The most important part of FM is the inner product of vi and vj on the cross product of the i-th feature and the j-th feature. This has been proved to allow high quality estimation of pair-wise feature interaction under sparsity [15].

In a typical ICRF application scenario, the recommendation process usually contains four steps (as shown in Fig. 2). First, a user tags a set of topic options in the interest survey table. The survey tables usually contains many topic options which include, for example, academic topics, habit topics and professional career topics. Second, the system acquires interest data from the survey table. Third, the system conducts rating estimation on the triplet (u, c, Iu) for any \( c \in C \). Finally, the estimated top rated courses will be shown as a ranking list to the user for further selection.

Fig. 2.
figure 2

The recommendation process in ICRF.

The most important part of ICRF is the second step (user interest acquisition). To represent user interests I in a computational way, we assume that any user interests can be modeled as a multivariate normal distribution of a k-dimensional random vector

$$ I_{u} = \left[ {I_{u}^{\left( 1 \right)} ,I_{u}^{\left( 2 \right)} , \cdots ,I_{u}^{\left( k \right)} } \right] \sim N\left( {\varvec{\mu},\sum } \right) $$
(3)

where \( I_{u}^{\left( i \right)} \) represents the degree of likelihood of user interests on the i-th application specific topic and \( \varvec{ \mu } \) and \( {\sum } \) are the parameters.

3.2 User Interest Acquisition

Based on the above discussion, many interesting research questions can be discussed in ICRF. In this paper, we systematically study the user interest acquisition problem by answering how user and recommender system interact with each other. In real-world ICRF application, we can find four setting of user interest acquisition scenarios (see Table 1).

Table 1. Different setting on user interest acquisition.

From the view of recommender system, there are two type of acquisition strategies including aggressive acquisition and selective acquisition.

Aggressive acquisition. The recommender system sends survey tables to every users. By doing so, the recommender system can build a user interests database for all users. Therefore, with the supplementary user interest information, the recommender system can model users better.

Selective acquisition. From the view of cost-sensitive learning, acquiring user interests is not free. Answering professional or privacy related questions can take users’ time and even confuse them. To reduce the acquisition cost, it may be more realistic to intelligently acquire user interests from a partial group of representative users. Other user’s interests can be deduced from these representative users. Thus, a significant amount of acquisition cost can be saved.

On the other side, from the view of users, there are two type of answering strategies including cooperative answering and reluctant answering.

Cooperative answering. When users receive survey tables either through emails or web portal, they are always willing to provide their course preferences and interests. For recommender systems, it means that users are very cooperative to provide personal interest data. Thus, the cost of sending survey tables will not be wasted.

Reluctant answering. A user may be reluctant to fill survey tables either because they are too busy or uncertain about questions. In such a case, the recommender system wastes amount of cost to send survey tables.

3.3 ICRF User Interest Acquisition Algorithm

Understanding the advantage and limitation of different settings of user interest acquisition is important for ICRF. In this paper, we propose the ICRF user interest acquisition algorithmFootnote 4 (see Algorithm 1) to empirically study this problem. As the SR setting (selective acquisition + reluctant answering) is the most complex, we will mainly discuss ICRF algorithm for SR in this section. Other settings can be easily modified from SR. The algorithm consists of an initialization part, a selective acquisition part and an interest propagation part.

In the initialization part, as it is very costly to collect true user interest online (step 1 and step 2 in Fig. 2), we have to resort simulation approaches. Specifically, we simulate the interest acquisition procedure by designing an oracle componentFootnote 5 (i.e., oracles in active learning literature [18]) which can generate the true user interests. Specifically, for each user, we assume that her interests can be estimated based on the average topic distribution (learned by the LDA model [20]) on all her enrolled courses (step 1). Second, we define the probabilistic function e as a reluctant trigger which mimics the reluctant behavior (i.e., user will answer only when \( e\left( u \right) > h \)) for AR (aggressive acquisition + reluctant answering) and SR (selective acquisition + reluctant answering) settings (step 2).

In the selective acquisition part, we first adopt the representative sampling strategy to build the selective user pool \( U^{\left( s \right)} \) (step 3). Specifically, we use k-means clustering to group users into different set and choose the users that are most closed to the clustering centroids as representative users. The recommender system with selective acquisition strategy will only query interest data for users in \( U^{\left( s \right)} \). Second, for each user, we generate her user interests either as \( O\left( u \right) \) or \( \emptyset \) based on the in-pool condition and reluctant trigger condition (step 5). We also add an acquisition cost to the total cost V (step 6). The acquired user interests Iu will be added in a new training set \( \tilde{D}^{T} \) (step 7).

In the interest propagation part (from step 8 to step 11), we propose a simple interest propagation algorithm to iteratively update interests for not-queried users (i.e., \( u{ \in }U - U^{\left( s \right)} \)). The idea is borrowed from the famous PageRank [19] algorithm which has been applied to many domains, such as keyword extraction [16]. The interest propagation algorithm relies on the user nearest neighbor graph. We form the graph edges by connecting each pair of users if they are among each other’s k nearest neighbors (measured by Euclidean distance on demographics data). The damping factor d plays the role of absorbing interest from other random users.

4 Experiments

In this section, we first describe the datasets used in our experiments. Second, we present the experimental configuration. Third, we conduct experiments to study the following research questions:

  • RQ1. Is the selective acquisition strategy better than the aggressive acquisition strategy in terms of the performance-cost score?

  • RQ2. How does the interest propagation method improve performance-cost score?

  • RQ3. Are the reluctant answering behavior harmful?

4.1 Dataset

The dataset used in this work is provided by a MOOC platform deployed at our universityFootnote 6. As there are usually no explicit rating data for MOOC courses, we use video watching progress as pseudo ratings. Generally speaking, the higher the ratings, the more likely students may finish all videos. Thus, the goal of course recommendation becomes recommending courses that students are likely to finish all videos.

This dataset contains 319,408 course enrollment events spanning from Sept 2014 to April 2015. During this period, 93,063 students have enrolled in 79 MOOC courses. For each student and each course, the dataset provides categorical features and text features (see Table 2). For example, an undergraduate student in College of Compute Science can be represented as a sparse 0/1 feature vector where only the feature index in “Undergraduate”, “Student” and “College of Computer Science” are one. For each course, the dataset also provides title, introduction and syllabus text that can be used to learn topic distribution.

Table 2. Feature description.

In our experiments, we split the full dataset into six groups by a time window of three months (see Table 3). Specifically, in each time window, the first two months of data is used as the training set and the rest one is used as the testing set. Thus, in total, six groups of datasets will be used in our experiments.

Table 3. Dataset split.

4.2 Configuration

Experimental Protocols. For each group of dataset, we use the ICRF user interest acquisition algorithm to generate explicit user interests for users in both training and testing sets. For the selective acquisition strategy, we set the cluster number of k-means algorithm as 500. The k parameter to build the nearest neighbor graph is set as 100. Following the work of [16], we also set the damping factor d in the interest propagation algorithm as 0.85. When the oracle queries a user, we accumulate the total cost by one.

We use the famous libFM [17] library as the implementation of FM. The popular SGD (stochastic gradient descent) optimization method is used to train FM regression functions on the training set. In our pilot experiments, we find that setting the learning rate as 0.001 and the regularization parameter as 0.0001 for SGD can have reasonable performance. Thus, we use these hyper-parameters to train FM. The number of training iteration is restricted as 50.

To evaluate each strategy in a cost-sensitive way, we compare the performance-cost score (PCS) of different strategies. Intuitively, a good user interest acquisition strategy should not only have good recommendation performance (e.g., low RMSE score) but also have low acquisition cost (i.e., the lower the PCS score is, the better the strategy performs). Thus, we define a new evaluation metric by combining RMSE and cost acquisition as

$$ PCS = \sqrt {RMSE \times Cost} $$
(4)

Implement of Other Settings. We summarize the differences of the four settings in Table 4. By default, the ICRF algorithm implements the SR settings. We can easily modified ICRF algorithm to implement the other settings by removing not used components.

Table 4. Differences of the four user interest acquisition settings.

4.3 Aggressive Acquisition VS Selective Acquisition (RQ1)

Table 5 compares the PCS scores between AC (aggressive acquisition + cooperative answering) and SC (selective acquisition + cooperative answering). We can see that on all the six datasets, SC outperforms AC significantly. On average, the SC strategy can reduce the PCS score by 30.25% compared to AC. This means that in real-world interactive course recommendation application, the selective acquisition strategy that intelligently acquires interests from representative users and propagates their interest to other similar users is much better than the aggressive acquisition strategy that blindly acquires interest from all users.

Table 5. Comparing the PCS scores between AC and SC.

4.4 Effect of Interest Propagation (RQ2)

The selective acquisition strategy relies on the interest propagation (IP) algorithm to propagate interests to users not in the selected pools. How about the performance of SC without IP? To study this issue, we compare SC with two alternatives. For each user not in the selected pools, the SC-IP strategy assigns no interests while the SC-IP+Center strategy assigns the interests of cluster centers where the user belongs to.

From Table 6, we can see that on all the six datasets the SC strategy outperforms the other two variants. This demonstrates that propagating interests to similar users is important in the ICRF user interest acquisition algorithm.

Table 6. Compare PCS scores of SC, SC-IP and SC-IP+Center.

4.5 Is Reluctant Answering Harmful (RQ3)

When users refuse to answer survey tables, the recommender system will waste amount of cost. How will be such reluctant behavior harmful to ICRF framework? To study this issue, we conduct experiments for SR (selective acquisition + reluctant answering) and AR (aggressive acquisition + reluctant answering) on the I and the V datasets where we find the smallest and biggest PCS improvement of SC over AC (both strategies do not consider reluctant behavior). The reluctant thresholds are set from 0.3 to 0.7.

Table 7 tabulates the PCS scores at different reluctant thresholds. We can see that with the increasing of reluctant thresholds, the PCS scores of both AR and SR continue to increase. This means that reluctant answering behavior do have negative impact on performance-cost score. However, the PCS scores of SR are always lower than AR in all cases. It means that the selective acquisition strategy is more robust than the aggressive acquisition strategy when reluctant users exist.

Table 7. Comparing PCS scores at different reluctant thresholds.

5 Conclusion

In this paper, we study the problem of user interest acquisition in interactive course recommendation. Different from traditional course recommendation methods where user modeling is conducted by mining historic user behavior data, in this paper, we study a new paradigm where recommender systems can directly query user interests through survey tables or questionnaires. We describe the formal model of ICRF (Interactive Course Recommendation Framework) and propose the ICRF user interest acquisition algorithm to reduce performance-cost score. Specifically, we use k-means clustering to choose representative users for querying oracles and propose an interest propagation algorithm to deduce interests for not selected users. With extensive experiments on real-world MOOCs enrollment datasets, we empirically demonstrate that our selective acquisition strategy is very effective and it outperforms the traditional aggressive acquisition strategy by 30.25% in terms of performance-cost score.

In our future work, we plan to study the ICRF framework by reinforcement learning where interest acquisition actions will be decided by a policy agent that learns the optimal policy from the environment to maximize expected recommendation reward.