
1 Introduction

The emergence and rapid development of the Internet have greatly changed the traditional way of choosing courses by providing detailed course information. As the number of courses matching students' needs has increased tremendously, the problem has become how to determine the courses most suitable for each student accurately and efficiently. A plethora of methods and algorithms [2, 3, 11, 15] for course recommendation have been proposed to deal with this problem. Most recommendation methods can be grouped into three categories: collaborative filtering [1, 8], content-based [7, 14], and knowledge-based [5, 8, 17] approaches, which have been applied in many different fields. For example, [4] proposed a collaborative filtering method embedded with an artificial immune system for course recommendation for college students, in which ratings from professors were exploited as ground truth to examine the results.

Inspired by the idea from [4] and the optimization framework in [9], we propose a sparse linear method for top-N course recommendation that uses expert knowledge as the ground truth. The method extracts the coefficient matrix for the courses in the recommender system from the student/course matrix by solving a regularized optimization problem, in which sparsity is exploited to represent the sparse characteristics of the recommendation coefficient matrix. The sparse linear method (SLIM) [9] was proposed for top-N recommender systems but has rarely been exploited in course recommender systems. Due to the characteristics of course recommendation in Chinese universities, our method focuses on accuracy more than efficiency, which differs from previously proposed SLIM-based methods [6, 9, 10, 18] that mainly address real-time applications of top-N recommender systems. The framework of our proposed course recommender system is shown in Fig. 1.

Fig. 1. The framework of our proposed course recommender system

According to our observation of common recommendation system matrices, most entries take the same value (zero or one), and the gradients between neighboring entries also mostly take the same value (zero or one). Therefore, the sparse counting strategy of \( L_{0} \) regularization terms [16] is incorporated into the optimization framework of SLIM. The \( L_{0} \) terms globally constrain the number of non-zero entries and non-zero gradients of the recommendation system matrix, which is the main contribution of our proposed method. Different from the previously used regularization terms (the \( L_{1} \) and \( L_{2} \) terms), the \( L_{0} \) terms can maintain the subtle relationship between the entries of the recommendation system matrix.
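To make concrete what these two counting terms measure, the following toy sketch (with made-up values) counts the non-zero entries of a small binary student/course matrix and the non-zero differences between its neighboring entries.

```python
import numpy as np

# A toy binary student/course matrix (values are illustrative only):
# rows = students, columns = courses, 1 = taken, 0 = not taken.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0]])

# Entry-wise L0 term: number of non-zero entries of A.
l0_entries = np.count_nonzero(A)

# Gradient L0 term: number of non-zero differences between
# horizontally and vertically neighboring entries.
grad_x = np.diff(A, axis=1)          # differences along courses
grad_y = np.diff(A, axis=0)          # differences along students
l0_gradient = np.count_nonzero(grad_x) + np.count_nonzero(grad_y)

print(l0_entries, l0_gradient)       # counts of non-zero entries and gradients
```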

After the data gathering process shown in Fig. 1, comparative experiments between state-of-the-art methods and our method are conducted. The results of both the state-of-the-art methods and our method are evaluated against the course recommendations provided by seven experts using a voting strategy.

The rest of the paper is organized as follows. In Sect. 2, we describe the details of our proposed method. In Sect. 3 the dataset that we used in our experiments and the experimental results are presented. In Sect. 4 the discussion and conclusion are given.

2 Our Method

2.1 The Formation of the Method

In the following, \( t_{j} \) and \( s_{i} \) denote a course and a student in the course recommender system, respectively. The courses taken by the students are represented by a matrix A of size \( m \times n \), in which an entry is 1 if the student has taken the course and 0 otherwise.
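As an illustration, the following minimal sketch assembles such a binary matrix from a list of (student, course) enrollment records; the identifiers are hypothetical.

```python
import numpy as np

# Hypothetical enrollment records: (student_id, course_id) pairs.
records = [("s1", "t1"), ("s1", "t3"), ("s2", "t2"), ("s3", "t1"), ("s3", "t2")]

students = sorted({s for s, _ in records})   # m students
courses = sorted({c for _, c in records})    # n courses
row = {s: i for i, s in enumerate(students)}
col = {c: j for j, c in enumerate(courses)}

# Binary m x n student/course matrix A: 1 = taken, 0 = not taken.
A = np.zeros((len(students), len(courses)), dtype=int)
for s, c in records:
    A[row[s], col[c]] = 1

print(A)
```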

In this paper, we introduce a sparse linear method (SLIM) to implement top-N course recommendation. In this approach, the recommendation score of each un-taken course \( t_{j} \) for a student \( s_{i} \) is computed as a sparse aggregation of the courses that have already been taken by \( s_{i} \), as shown in Eq. (1).

$$ \bar{a}_{ij} = a_{i}^{T} w_{j} $$
(1)

where \( a_{i}^{T} \) is the i-th row of A (the courses already taken by student \( s_{i} \)), \( w_{j} \) is a sparse vector of aggregation coefficients, and \( \bar{a}_{ij} \) is the predicted recommendation score. In matrix form, the SLIM model is represented as:

$$ \bar{A} = AW $$
(2)

where \( \bar{A} \) is the student/course score matrix, A denotes the latent binary student/course matrix (initialized with the observed matrix described below), W denotes the \( n \times n \) sparse matrix of aggregation coefficients, in which the j-th column corresponds to \( w_{j} \) in Eq. (1), and each row \( \bar{a}_{i} \) of \( \bar{A} \) contains the course recommendation scores on all courses for student \( s_{i} \). The final course recommendation for each student is obtained by sorting the un-taken courses in decreasing order of these scores and recommending the top-N courses in the sequence.
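The scoring and ranking step described above can be sketched as follows; the coefficient matrix W here is a random placeholder standing in for the matrix learned by our method.

```python
import numpy as np

def recommend_top_n(A, W, n=3):
    """Return, for each student, the indices of the top-n un-taken courses
    ranked by the SLIM-style scores A_bar = A W (Eq. (2))."""
    A_bar = A @ W                       # recommendation scores
    A_bar[A == 1] = -np.inf             # exclude courses already taken
    order = np.argsort(-A_bar, axis=1)  # sort scores in decreasing order
    return order[:, :n]

# Toy data: 3 students, 4 courses; W is a random illustrative coefficient matrix.
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
W = np.random.rand(4, 4) * (1 - np.eye(4))   # non-negative, zero diagonal
print(recommend_top_n(A, W, n=2))
```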

In our method, the initial student/course matrix is extracted from the learning management system of a specific university in China. With the extracted student/course matrix of size \( m \times n \), the sparse matrix W of size \( n \times n \) in Eq. (2) is iteratively optimized by an alternating minimization method. Different from the objective function previously proposed in [9], shown in Eq. (3), our proposed objective function is shown in Eq. (4).

$$ \mathop {\hbox{min} }\limits_{W \ge 0} \frac{1}{2}\left\| {A - AW} \right\|_{F}^{2} + \frac{{\beta_{1} }}{2}\left\| W \right\|_{F}^{2} + \lambda_{1} \left\| W \right\|_{1} $$
(3)
$$ \mathop {\hbox{min} }\limits_{W \ge 0} \frac{1}{2}\left\| {A - AW} \right\|_{F}^{2} + \frac{{\beta_{2} }}{2}\left\| W \right\|_{F}^{2} + \lambda_{2} \left\| A \right\|_{0} + \mu \left\| {\nabla A} \right\|_{0} $$
(4)

where \( \left\| \cdot \right\|_{F} \) denotes the Frobenius norm of a matrix, \( \left\| \cdot \right\|_{1} \) is the entry-wise \( L_{1} \) norm, and \( \left\| \cdot \right\|_{0} \) denotes the entry-wise \( L_{0} \) norm, i.e., the number of non-zero entries. The data term \( \left\| {A - AW} \right\|_{F}^{2} \) measures the difference between the calculated model and the training data. The Frobenius norm regularizes the coefficient matrix W, the \( L_{1} \) norm regularizes W in Eq. (3), and the \( L_{0} \) norms constrain A and \( \nabla A \) in Eq. (4). The parameters \( \beta_{1} ,\beta_{2} ,\lambda_{1} ,\lambda_{2} \), and \( \mu \) weight the regularization terms in the objective functions.

In our proposed final objective function, the Frobenius norm is introduced to turn the optimization into an elastic-net-like problem [19], which prevents potential overfitting. Moreover, the \( L_{1} \) norm in Eq. (3) is replaced by \( L_{0} \) norms in our objective function. These \( L_{0} \) norms [12, 13, 16] are introduced to constrain the sparseness of A and \( \nabla A \).

Due to the independence of the columns of matrix W, the final objective function in Eq. (4) can be decoupled into a set of objective functions as follows:

$$ \mathop {\hbox{min} }\limits_{{w_{j} \ge 0,\,w_{jj} = 0}} \frac{1}{2}\left\| {a_{j} - Aw_{j} } \right\|_{2}^{2} + \frac{{\beta_{2} }}{2}\left\| {w_{j} } \right\|_{2}^{2} + \lambda_{2} \left\| {a_{j} } \right\|_{0} + \mu \left\| {\nabla a_{j} } \right\|_{0} $$
(5)

where \( a_{j} \) is the j-th column of matrix A and \( w_{j} \) denotes the j-th column of matrix W. Since each instance of Eq. (5) contains two unknown variables, it is a typical ill-posed problem and needs to be solved by an alternating minimization method: in each iteration, one of the two variables is fixed and the other is optimized.
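The alternating scheme can be summarized as in the sketch below. The two update routines are simplified stand-ins provided only for illustration (a ridge-style least-squares update for \( w_{j} \) and a blend-and-threshold update for \( a_{j} \)); they are not the exact sub-problem solvers of Sect. 2.2.

```python
import numpy as np

def update_w(A, a_j, beta2):
    # Simplified stand-in for sub-problem 1: a ridge-style least-squares
    # update for w_j (the FFT-based form of Eq. (7) is not reproduced here).
    n = A.shape[1]
    w = np.linalg.solve(A.T @ A + beta2 * np.eye(n), A.T @ a_j)
    return np.maximum(w, 0.0)          # keep the non-negativity constraint

def update_a(A, w_j, a_obs, lam2):
    # Simplified stand-in for sub-problem 2: blend the current prediction
    # with the observed column and hard-threshold small entries (L0-style).
    a = 0.5 * (A @ w_j + a_obs)
    a[np.abs(a) < lam2] = 0.0
    return a

def alternating_minimization(A_obs, beta2=0.5, lam2=0.1, iters=10):
    """Alternately update w_j and a_j for every column j (illustrative only)."""
    m, n = A_obs.shape
    W = np.zeros((n, n))
    A = A_obs.astype(float)
    for _ in range(iters):
        for j in range(n):
            W[:, j] = update_w(A, A[:, j], beta2)
            W[j, j] = 0.0              # w_jj = 0 as in Eq. (5)
            A[:, j] = update_a(A, W[:, j], A_obs[:, j], lam2)
    return W, A
```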

2.2 The Solver of Our Proposed Method

Sub-problem 1: computing \( w_{j} \)

The \( w_{j} \) computation sub-problem is represented by the minimization of Eq. (6):

$$ \frac{1}{2}\left\| {a_{j} - Aw_{j} } \right\|_{2}^{2} + \frac{{\beta_{2} }}{2}\left\| {w_{j} } \right\|_{2}^{2} $$
(6)

By eliminating the \( L_{0} \) terms of Eq. (5), the function in Eq. (6) has a global minimum, which can be computed by gradient descent. The analytical solution to Eq. (6) is shown in Eq. (7):

$$ w_{j} = F^{ - 1} \left( {\frac{{F\left( {a_{j} } \right)}}{{F\left( {a_{j} } \right) + \frac{{\beta_{2} }}{2}\left( {F\left( {\partial_{x} } \right)^{ * } \cdot F\left( {\partial_{x} } \right) + F\left( {\partial_{y} } \right)^{ * } \cdot F\left( {\partial_{y} } \right)} \right)}}} \right) $$
(7)

where \( F\left( \cdot \right) \) and \( F^{ - 1} \left( \cdot \right) \) denote the Fast Fourier Transform (FFT) and the inverse FFT, respectively, and \( F\left( \cdot \right)^{ * } \) is the complex conjugate of \( F\left( \cdot \right) \).
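As noted above, without the \( L_{0} \) terms Eq. (6) has a global minimum that can also be reached by gradient descent. The sketch below is a minimal projected-gradient alternative to the FFT-based update of Eq. (7); the step size and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def solve_wj_gradient_descent(A, a_j, beta2=0.5, lr=0.01, iters=500):
    """Minimize 0.5*||a_j - A w||_2^2 + 0.5*beta2*||w||_2^2 (Eq. (6))
    by projected gradient descent; lr and iters are illustrative choices."""
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = -A.T @ (a_j - A @ w) + beta2 * w   # gradient of Eq. (6)
        w -= lr * grad
        w = np.maximum(w, 0.0)                    # project onto w >= 0
    return w
```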

Sub-problem 2: computing \( a_{j} \) and \( \nabla a_{j} \)

With the intermediate estimate of \( w_{j} \), \( a_{j} \) and \( \nabla a_{j} \) can be computed by minimizing Eq. (8):

$$ \frac{1}{2}\left\| {a_{j} - Aw_{j} } \right\|_{2}^{2} + \lambda_{2} \left\| {a_{j} } \right\|_{0} + \mu \left\| {\nabla a_{j} } \right\|_{0} $$
(8)

By introducing two auxiliary variables h and v corresponding to \( a_{j} \) and \( \nabla a_{j} \), respectively, this sub-problem can be transformed into Eq. (9):

$$ \frac{1}{2}\left\| {a_{j} - Aw_{j} } \right\|_{2}^{2} + \lambda_{2} \left\| {a_{j} - h} \right\|_{2}^{2} + \mu \left\| {\nabla a_{j} - v} \right\|_{2}^{2} + \lambda \left( {\left\| h \right\|_{0} + \left\| v \right\|_{0} } \right) $$
(9)
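In splitting schemes of this kind [16], once \( a_{j} \) is fixed, the sub-problems in h and v decouple element-wise and admit closed-form hard-thresholding solutions. The sketch below illustrates this step under the parameterization of Eq. (9); the vectors and parameter values are made up for illustration.

```python
import numpy as np

def hard_threshold(x, weight, lam):
    """Element-wise minimizer of weight*(x - z)^2 + lam*||z||_0 over z:
    keep an entry of x only if zeroing it would cost more than the L0 penalty."""
    z = x.copy()
    z[weight * x ** 2 <= lam] = 0.0
    return z

# With a_j fixed, the h- and v-sub-problems of Eq. (9) decouple element-wise:
a_j = np.array([0.9, 0.1, 0.0, 0.7])   # toy current estimate of a_j
grad_a_j = np.diff(a_j)                # toy discrete gradient of a_j
lam, lam2, mu = 0.05, 1.0, 1.0         # illustrative parameter values

h = hard_threshold(a_j, lam2, lam)
v = hard_threshold(grad_a_j, mu, lam)
print(h, v)
```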

To verify the performance of our proposed method, comparative experiments between state-of-the-art methods and our method are carried out on the gathered dataset with expert knowledge as ground truth. The experiments are described in detail in the following section.

3 Experimental Results

3.1 Datasets

In order to verify the performance of our proposed method and apply it in practical scenarios, we gathered data on five classes of the information management specialty from the learning management system of our university. The records of courses and students were extracted from the Department of Management Information System, Shandong University of Finance and Economics, and the Department of Electronic Engineering Information Technology, Shandong University of Sci&Tech. The most important information about the courses and students concerns the grades corresponding to the courses. All of the students of the information management specialty are freshmen in our university. Most of them have taken the first-year courses of their curriculum, except for three students who failed to advance to the next grade; we therefore first eliminate the records of these three students. Meanwhile, we collect additional knowledge, including the programming skills they have mastered, through a questionnaire. The courses they have taken and the content they have grasped are combined into the final dataset. A part of the dataset is shown in Table 1, where 1 denotes that student \( s_{i} \) has mastered course \( t_{j} \), and 0 denotes the opposite.

Table 1. The initial dataset from the five classes

After gathering the data of the students in the five classes, comparative experiments between state-of-the-art methods and our method are conducted. We choose several state-of-the-art methods, including the collaborative filtering methods itemkNN and userkNN, and the matrix factorization method PureSVD.

3.2 Measurement

The knowledge of several experts on the courses of the information management specialty is adopted as ground truth in the experiments. To measure the performance of the compared methods, we use the Hit Rate (HR) and the Average Reciprocal Hit-Rank (ARHR), which are defined in Eqs. (11) and (12).

$$ HR = \frac{\# hits}{\# students} $$
(11)

where #hits denotes the number of students whose held-out course in the testing set is also recommended by the experts, and #students denotes the total number of students in the dataset.

$$ ARHR = \frac{ 1}{\# students}\sum\limits_{i = 1}^{\# hits} {\frac{1}{{p_{i} }}} $$
(12)

where \( p_{i} \) is the position of the hit course in the ordered recommendation list for the i-th hit.
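As an illustration of Eqs. (11) and (12), the following sketch computes HR and ARHR from a hypothetical list of hit positions.

```python
def hit_rate(hit_positions, num_students):
    """HR = #hits / #students, Eq. (11)."""
    return len(hit_positions) / num_students

def average_reciprocal_hit_rank(hit_positions, num_students):
    """ARHR = (1/#students) * sum over hits of 1/p_i, Eq. (12),
    where p_i is the rank of the hit in the recommendation list."""
    return sum(1.0 / p for p in hit_positions) / num_students

# Hypothetical example: 5 students, 3 of whom had their held-out course
# recommended at ranks 1, 2 and 4, respectively.
positions = [1, 2, 4]
print(hit_rate(positions, 5))                     # 0.6
print(average_reciprocal_hit_rank(positions, 5))  # (1 + 0.5 + 0.25) / 5 = 0.35
```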

3.3 Experimental Results

In this section, the experimental results obtained on the practical dataset are presented. Table 2 shows the results of the compared methods for top-N course recommendation.

Table 2. The performance of the comparing methods

where \( HR_{i} \) and \( ARHR_{i} \) denote the HR and ARHR for class i, respectively. The results in Table 2 demonstrate that our proposed method outperforms the state-of-the-art methods in most of the course recommendation tasks in terms of both HR and ARHR. This shows that the sparse regularization terms based on the prior knowledge from our observation are suitable for solving the course recommendation problem.

We also examine the performance of our proposed method with respect to the number of courses and topics included in the experiments. Figure 2 shows that higher accuracy is obtained when the number of courses increases. Meanwhile, the courses included in our experiments are divided into 32 different topics; Fig. 3 shows that the accuracy is also higher when more related courses are available.

Fig. 2. Accuracy of our proposed recommendation method with respect to the number of courses

Fig. 3. Accuracy of our proposed recommendation method with respect to the number of topics

4 Conclusion

In this paper, we propose an approach to course recommendation. In our method, SLIM is introduced and novel \( L_{0} \) regularization terms are incorporated into it. Meanwhile, an alternating minimization strategy is exploited to optimize the objective. To verify the performance of our method, comparative experiments between state-of-the-art methods and our method are conducted on students from five different classes. The experimental results show that our method outperforms the previously proposed methods.

The proposed method is mainly intended for course recommendation in Chinese universities; however, it can also be exploited in other related fields. In the future, more applications of our approach will be investigated. Other future work includes modifying the objective function of our method with other regularization terms and different optimization strategies.