
1 Introduction

With the rapid development of information technology, more and more online course platforms are emerging, and anyone can choose courses on them and start learning. As a result, learning resources are growing explosively, which makes it difficult for users to find the information they need quickly and accurately among massive amounts of course information and resources [1]. The SCHOLAT course platform, a web-based course website for scholars, faces the same problem. A recommendation system can effectively alleviate information overload by analyzing user data and making personalized recommendations of courses that match each user's interests.

Collaborative filtering is one of the most widely used recommendation algorithms and is mainly divided into memory-based and model-based collaborative filtering [2]. David Goldberg et al. proposed Tapestry, a recommendation system based on collaborative filtering [3], and Paul Resnick et al. proposed GroupLens, an automated collaborative filtering recommendation system based on user ratings [4]. Since then, collaborative filtering has gradually been applied to e-commerce and social networking sites. In real-world scenarios, however, users cannot rate every item, and online course websites likewise suffer from data sparsity.

Without sufficient rating data, memory-based approaches struggle to provide accurate predictions. Model-based methods can effectively alleviate the cold-start problem by training a recommendation model on existing user information [5]. Matrix factorization is a widely used model-based method [6, 7, 8] because of its simplicity and high accuracy, and probabilistic matrix factorization (PMF) [9] has achieved good results in various application areas since it was proposed.

Using the social information available on social networking sites for recommendation, known as social recommendation, has become a hot topic in the field of recommender systems. Li [10] introduced co-author information as user association information to improve recommendation accuracy, and [11] used social tagging as the basis for recommendation. The SCHOLAT course platform differs from traditional course websites: in addition to course selection, it holds rich social and academic information about its users. However, little research has used social and academic information to recommend courses. In this paper, we analyze the similarity of users' social information and the influence of users in the social network, and use probabilistic matrix factorization to recommend courses to users more accurately.

2 Related Work

2.1 Similarity Computation of Nearest Neighbor Users

The most common measures of user similarity are cosine similarity, the Pearson correlation coefficient and the Euclidean distance. All three compare users' rating vectors and therefore rely on a strict match between object attributes, i.e., on the items that both users have rated.

  1. Cosine similarity: User ratings are treated as vectors in an n-dimensional space, and user similarity is measured by the cosine of the angle between the two vectors. Let \( \vec{i} \) and \( \vec{j} \) be the rating vectors of user i and user j in the n-dimensional space; the cosine similarity between user i and user j is:

    $$ \cos \left( {\vec{i},\vec{j}} \right) = \frac{{\vec{i}\cdot\vec{j}}}{{\left\| {\vec{i}} \right\|\cdot\left\| {\vec{j}} \right\|}} $$
    (1)
  2. Pearson correlation coefficient: Let \( X_{i} \) be the actual rating of user X on item i, \( Y_{i} \) the actual rating of user Y on item i, and \( \overline{X} \) and \( \overline{Y} \) the average ratings of users X and Y. Summing over the n items rated by both users, the Pearson correlation coefficient between user X and user Y is:

    $$ {\text{r}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {X_{i} - \overline{X} } \right)\left( {Y_{i} - \overline{Y} } \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {X_{i} - \overline{X} } \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i} - \overline{Y} } \right)^{2} } }} $$
    (2)
  3. Euclidean distance: For two users X and Y, let \( X_{i} \) and \( Y_{i} \) be the actual ratings of user X and user Y on item i; the Euclidean distance between the two users (a smaller distance means more similar users) is:

    $$ {\text{d}}\left( {{\text{X}},{\text{Y}}} \right) = \sqrt {\sum\nolimits_{i = 1}^{n} {\left( {X_{i} - Y_{i} } \right)}^{2} } $$
    (3)
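The three measures above can be sketched in Python as follows. This is a minimal illustration, assuming the two users' ratings are already aligned on the same n items; the function names are our own, and the conversion of the Euclidean distance into a similarity via 1/(1 + d) is an assumption, since Eq. (3) yields a distance rather than a similarity.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (1): cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_x * norm_y)


def pearson_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (2): Pearson correlation over the co-rated items x[i], y[i]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for constant ratings
    return cov / (sd_x * sd_y)


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (3): Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def euclidean_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed conversion 1 / (1 + d), mapping the distance into (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(x, y))
```

Cosine and Pearson similarities lie in [-1, 1] (cosine is non-negative for non-negative ratings), whereas the converted Euclidean similarity lies in (0, 1], so the three are not directly comparable without further normalization.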

2.2 Matrix Factorization

Because of its relatively low time and space complexity and high prediction accuracy, matrix factorization is widely used in recommender systems. Its goal is to decompose the user-course enrollment matrix R into a user latent feature matrix \( U \in \mathbb{R}^{M \times K} \) and a course latent feature matrix \( V \in \mathbb{R}^{N \times K} \) (\( K \ll \min(M,N) \)), where M is the number of users, N the number of courses and K the dimension of the latent feature vectors. U and V are used to predict the missing entries of R, and courses are recommended by comparing the predicted values, that is:

$$ {\text{R}} \approx UV^{T} = \widehat{R} $$
(4)
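A minimal sketch of Eq. (4) in plain Python, assuming U and V are given as lists of rows. The helper names `predict` and `recommend` are hypothetical, and ranking each user's not-yet-added courses by predicted score is one reasonable reading of comparing the predicted values, not a procedure stated in the source.

```python
from typing import List

Matrix = List[List[float]]


def predict(U: Matrix, V: Matrix) -> Matrix:
    """Eq. (4): R_hat = U V^T, with U of shape M x K and V of shape N x K."""
    return [[sum(u_k * v_k for u_k, v_k in zip(u, v)) for v in V] for u in U]


def recommend(U: Matrix, V: Matrix, R: Matrix, top_n: int) -> List[List[int]]:
    """For each user, rank the courses not yet added (R[i][j] == 0) by predicted score."""
    R_hat = predict(U, V)
    result = []
    for i, row in enumerate(R_hat):
        candidates = [j for j in range(len(row)) if R[i][j] == 0]
        # Highest predicted score first; ties keep their original course order.
        candidates.sort(key=lambda j: row[j], reverse=True)
        result.append(candidates[:top_n])
    return result
```

Excluding already-added courses keeps the recommendation list focused on the missing entries of R that the factorization is meant to fill in.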

Within collaborative filtering, probabilistic matrix factorization places probability distributions (Gaussian priors) on the latent feature vectors. This acts as a regularizer, alleviates the over-fitting problem of traditional matrix factorization, and gives good performance in collaborative filtering.

3 Probabilistic Matrix Factorization Algorithm for Course Recommendation System Fusing the Influence of Nearest Neighbor Users Based on Cloud Model

To reflect the influence of neighboring users on target users, this paper proposes a course recommendation system that integrates the influence of neighboring users into probabilistic matrix factorization.

3.1 Computation of User Similarity Based on Cloud Model

In this paper, the cloud model, a model for transforming between qualitative concepts and quantitative values proposed by academician Deyi Li, is used to compute the similarity between two users [12]. The cloud model characterizes a fuzzy concept with three parameters, \( E_{x} \) (expectation), \( E_{n} \) (entropy) and He (super-entropy), and thereby organically combines the fuzziness and randomness of things. The model is defined as follows:

Definition: Let U be a quantitative universe of discourse and C a qualitative concept on U. If a numerical value a ∈ U is a random realization of C, and the certainty degree y(a) ∈ [0,1] of a belonging to C is a random number with a stable tendency, y: U → [0,1], ∀a ∈ U, a → y(a), then the distribution of a on U is called a cloud, and each a is called a cloud droplet. Here \( E_{x} \) denotes the expectation of the distribution of the cloud droplets on U, \( E_{n} \) measures the uncertainty of the concept C, and He denotes the uncertainty of the entropy, which is determined jointly by the randomness and fuzziness of the entropy.

If two users' rating clouds are similar, the three-dimensional vectors formed by their controlling parameters (expectation, entropy and super-entropy) should also be similar. In this paper, these three parameters are estimated with the reverse cloud algorithm to form each user's three-dimensional vector, and user similarity is computed from these vectors.

Let \( r_{1} ,r_{2} , \ldots ,r_{n} \) be a user's ratings, treated as cloud droplets. The three parameters \( E_{x} ,E_{n} \) and He that control cloud droplet generation are estimated as follows:

  • Step1: Compute the sample mean of the cloud droplets, \( \overline{r} = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {r_{i} } \); their first-order absolute central moment, \( \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\left| {r_{i} - \overline{r} } \right|} \); and their sample variance, \( S^{2} = \frac{1}{n - 1}\sum\nolimits_{i = 1}^{n} {(r_{i} - \overline{r} )}^{2} \).

  • Step2: Estimate \( \widehat{Ex} = \overline{r} \);

  • Step3: Estimate \( \widehat{En} = \sqrt {\frac{\pi }{2}} \times \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\left| {r_{i} - \widehat{Ex}} \right|} \);

  • Step4: Estimate \( \widehat{He} = \sqrt {S^{2} - \widehat{En}^{2}} \).
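The reverse (backward) cloud steps can be sketched as below. This is an illustrative implementation, not the authors' code, and the function name `backward_cloud` is hypothetical. Step 4 uses the standard backward cloud estimate \( \widehat{He} = \sqrt{S^{2} - \widehat{En}^{2}} \); because \( S^{2} - \widehat{En}^{2} \) can be negative for small samples, the sketch clamps it to 0, which is our own assumption.

```python
import math
from typing import Sequence, Tuple


def backward_cloud(ratings: Sequence[float]) -> Tuple[float, float, float]:
    """Estimate (Ex, En, He) from one user's ratings (cloud droplets)."""
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are needed for the sample variance")
    mean = sum(ratings) / n                                # Step 1: sample mean
    abs_moment = sum(abs(r - mean) for r in ratings) / n   # first-order absolute central moment
    s2 = sum((r - mean) ** 2 for r in ratings) / (n - 1)   # sample variance S^2
    ex = mean                                              # Step 2
    en = math.sqrt(math.pi / 2) * abs_moment               # Step 3
    # Step 4: He = sqrt(S^2 - En^2); clamping a negative value to 0 is an assumption.
    he = math.sqrt(max(s2 - en ** 2, 0.0))
    return ex, en, he
```

For a user whose ratings are all equal, both dispersion parameters vanish and the characteristic vector reduces to (Ex, 0, 0).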

Using the reverse cloud algorithm, each user's preference is represented by the three cloud parameters computed from the user's original rating data. These three parameters form the user characteristic vector \( \vec{V} = \left( {Ex,En,He} \right) \), where Ex reflects the user's overall rating level, En reflects how dispersed the user's ratings are, and He measures the stability of the entropy. User similarity based on the cloud model is defined as follows:

Definition: Let the characteristic vectors of users i and j be \( \overrightarrow {{V_{i} }} = \left( {Ex_{i} ,En_{i} ,He_{i} } \right) \) and \( \overrightarrow {{V_{j} }} = \left( {Ex_{j} ,En_{j} ,He_{j} } \right) \). The cosine of the angle between them is called the similarity of cloud i and cloud j:

$$ \cos \left( {\overrightarrow {{V_{i} }} ,\overrightarrow {{V_{j} }} } \right) = \frac{{\overrightarrow {{V_{i} }} \cdot\overrightarrow {{V_{j} }} }}{{\left\| {\overrightarrow {{V_{i} }} } \right\|\cdot\left\| {\overrightarrow {{V_{j} }} } \right\|}} $$
(5)
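The following sketch combines the reverse cloud estimate with Eq. (5) to select each user's nearest neighbors. It assumes every user has at least two ratings and keeps the k most similar users (the experiments below settle on 6 neighbors); the names `cloud_similarity` and `nearest_neighbors` are hypothetical, and clamping a negative \( S^{2} - \widehat{En}^{2} \) to 0 is again our own assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

Cloud = Tuple[float, float, float]


def backward_cloud(ratings: Sequence[float]) -> Cloud:
    """Compact reverse cloud estimate of (Ex, En, He)."""
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are needed")
    mean = sum(ratings) / n
    en = math.sqrt(math.pi / 2) * sum(abs(r - mean) for r in ratings) / n
    s2 = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    return mean, en, math.sqrt(max(s2 - en ** 2, 0.0))


def cloud_similarity(vi: Cloud, vj: Cloud) -> float:
    """Eq. (5): cosine of the angle between two (Ex, En, He) vectors."""
    dot = sum(a * b for a, b in zip(vi, vj))
    norm = math.sqrt(sum(a * a for a in vi)) * math.sqrt(sum(b * b for b in vj))
    return dot / norm if norm else 0.0


def nearest_neighbors(ratings: Dict[str, Sequence[float]], k: int) -> Dict[str, List[str]]:
    """For every user, return the k other users with the highest cloud similarity."""
    clouds = {user: backward_cloud(r) for user, r in ratings.items()}
    neighbors = {}
    for user, v_user in clouds.items():
        others = [other for other in clouds if other != user]
        others.sort(key=lambda other: cloud_similarity(v_user, clouds[other]), reverse=True)
        neighbors[user] = others[:k]
    return neighbors
```

Because each characteristic vector summarizes a user's whole rating distribution, two users can be compared even when they have no co-rated courses, unlike the rating-vector measures of Section 2.1.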

3.2 Recommendation System

PMF is a probabilistic linear model. The enrollment records of a set of M users \( H = \{ H_{i} |i = 1,2, \ldots ,M\} \) on a set of N courses \( I = \{ I_{j} |j = 1,2, \ldots ,N\} \) constitute the observed rating matrix R, where \( R_{ij} \) denotes user i's selection of course j: 1 means that the user has added the course, and a missing or 0 entry means that the course has not been added. PMF assumes that the observed ratings, conditioned on the latent feature matrices, follow a Gaussian distribution. The variables are updated by gradient descent, a regularization term is added to prevent overfitting, and the iteration continues until the algorithm converges (Fig. 1).

Fig. 1. Structure of proposed recommendation model

We assume that the latent feature matrices of students and courses follow zero-mean spherical Gaussian prior distributions:

$$ {\text{P}}\left( {{\text{U|}}\delta_{U}^{2} } \right) = \prod\nolimits_{i = 1}^{M} {N(U_{i} |0,\delta_{U}^{2} I)} $$
(6)
$$ {\text{P}}\left( {{\text{V|}}\delta_{V}^{2} } \right) = \prod\nolimits_{j = 1}^{N} {N(V_{j} |0,\delta_{V}^{2} I)} $$
(7)

We also assume that the conditional probability of the observed student-course enrollment matrix follows a Gaussian distribution:

$$ {\text{P}}\left( {{\text{R|U}},{\text{V}},\delta_{R}^{2} } \right) = \prod\nolimits_{i = 1}^{M} {\prod\nolimits_{j = 1}^{N} {[N(R_{ij} |g\left( {U_{i} V_{j}^{T} } \right),\delta_{R}^{2} )]} }^{{I_{ij}^{R} }} $$
(8)

where I is the identity matrix, \( \delta_{U}^{2} \) and \( \delta_{V}^{2} \) are the variances of the U and V distributions, and \( \delta_{R}^{2} \) is the variance of the conditional distribution of R. \( I_{ij}^{R} \) is the indicator function: its value is 1 if user i has joined course j and 0 otherwise. The function \( g(x) \) maps the value of \( U_{i} V_{j}^{T} \) to the interval [0,1]; in this paper, \( g(x) = \frac{x - x_{min}}{x_{max} - x_{min}} \), where x is the original value and \( x_{max} \) and \( x_{min} \) are the maximum and minimum of the original data set.

By Bayes' theorem, the posterior distribution of the feature matrices is:

$$ {\text{P}}\left( {{\text{U}},{\text{V|R}},\delta_{R}^{2} ,\delta_{U}^{2} ,\delta_{V}^{2} } \right) \propto {\text{P}}\left( {{\text{R|U}},{\text{V}},\delta_{R}^{2} } \right) \times {\text{P}}\left( {{\text{U|}}\delta_{U}^{2} } \right) \times {\text{P}}({\text{V}}|\delta_{V}^{2} ) $$
(9)

Taking the logarithm of Eq. (9) gives:

$$ \begin{array}{l} \ln p\left( U,V|R,\delta_{R}^{2},\delta_{U}^{2},\delta_{V}^{2} \right) = - \frac{1}{2\delta_{R}^{2}}\sum\nolimits_{i = 1}^{M} \sum\nolimits_{j = 1}^{N} I_{ij}^{R} \left( R_{ij} - g\left( U_{i} V_{j}^{T} \right) \right)^{2} - \frac{1}{2\delta_{U}^{2}}\sum\nolimits_{i = 1}^{M} U_{i} U_{i}^{T} \\ \quad - \frac{1}{2\delta_{V}^{2}}\sum\nolimits_{j = 1}^{N} V_{j} V_{j}^{T} - \frac{1}{2}\left( \left( \sum\nolimits_{i = 1}^{M} \sum\nolimits_{j = 1}^{N} I_{ij}^{R} \right) \ln \delta_{R}^{2} + MK \ln \delta_{U}^{2} + NK \ln \delta_{V}^{2} \right) + C \end{array} $$
(10)

With the hyperparameters fixed, maximizing the log-posterior is equivalent to minimizing the following objective function, obtained by multiplying the negative of Eq. (10) by \( \delta_{R}^{2} \) and dropping the terms that do not depend on U and V:

$$ f = \frac{1}{2}\sum\nolimits_{i = 1}^{M} {\sum\nolimits_{j = 1}^{N} {I_{ij}^{R} \left( {R_{ij} - g\left( {U_{i} V_{j}^{T} } \right)} \right)^{2} } } + \frac{{\lambda_{U} }}{2}\sum\nolimits_{i = 1}^{M} {U_{i} U_{i}^{T} } + \frac{{\lambda_{V} }}{2}\sum\nolimits_{j = 1}^{N} {V_{j} V_{j}^{T} } $$
(11)

In the above formula, \( \lambda_{U} = \frac{{\delta_{R}^{2} }}{{\delta_{U}^{2} }} \), \( \lambda_{V} = \frac{{\delta_{R}^{2} }}{{\delta_{V}^{2} }} \).
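The objective in Eq. (11) can be evaluated with the sketch below. It is a minimal illustration under stated assumptions: the data term carries the coefficient 1/2 (consistent with \( \lambda_{U} = \delta_{R}^{2}/\delta_{U}^{2} \) and \( \lambda_{V} = \delta_{R}^{2}/\delta_{V}^{2} \)), the min-max range of g is taken over all entries of \( UV^{T} \), and constant predictions are mapped to 0. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def min_max(values: List[float]) -> List[float]:
    """g(x) = (x - x_min) / (x_max - x_min), over the whole set of raw scores."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: constant scores map to 0
    return [(x - lo) / (hi - lo) for x in values]


def objective(R: Matrix, I: Matrix, U: Matrix, V: Matrix,
              lam_u: float, lam_v: float) -> float:
    """Eq. (11): squared error on observed entries plus L2 penalties on U and V."""
    M, N = len(U), len(V)
    raw = [dot(U[i], V[j]) for i in range(M) for j in range(N)]  # row-major U_i V_j^T
    g = min_max(raw)  # predictions mapped to [0, 1]
    err = 0.0
    for idx, g_ij in enumerate(g):
        i, j = divmod(idx, N)
        err += I[i][j] * (R[i][j] - g_ij) ** 2
    reg_u = sum(dot(u, u) for u in U)  # sum_i U_i U_i^T
    reg_v = sum(dot(v, v) for v in V)  # sum_j V_j V_j^T
    return 0.5 * err + 0.5 * lam_u * reg_u + 0.5 * lam_v * reg_v
```

Because g normalizes over all entries at once, changing a single \( U_{i} \) can shift every prediction, which is one reason gradient-based training often substitutes a simpler g.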

The cloud model is used to compute the similarity between users, from which a set of neighboring users is obtained for each user. Each user's influence is computed from their academic information, such as academic degree and the numbers of papers, projects, patents and works, and the influence of neighboring users is filled into the neighbor influence matrix \( D \). Let \( D_{i} \in \left[ {1,2} \right] \) denote the influence value of user \( U_{i} \) in the influence matrix. If user \( U_{j} \) is a neighbor of user \( U_{i} \), \( D_{ij} \) equals the influence score of \( U_{j} \); if not, \( D_{ij} = 1 \). The rating contribution of each user is thus determined by that user's influence on the target user: the more influential a neighbor, the larger its share in the target user's rating prediction.
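The construction of D can be sketched as follows. The source does not give a formula for the influence score, so the sketch takes the scores as input; `scale_to_influence`, which linearly maps a raw academic score onto [1, 2], is a hypothetical helper added only to show how scores in that range might be produced.

```python
from typing import Dict, List


def scale_to_influence(raw_scores: List[float]) -> List[float]:
    """Hypothetical: map raw academic scores linearly onto [1, 2]."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        return [1.0] * len(raw_scores)
    return [1.0 + (s - lo) / (hi - lo) for s in raw_scores]


def build_influence_matrix(neighbors: Dict[int, List[int]],
                           influence: List[float], num_users: int) -> List[List[float]]:
    """D[i][j] = influence[j] if user j is a neighbor of user i, otherwise 1."""
    D = [[1.0] * num_users for _ in range(num_users)]
    for i, nbrs in neighbors.items():
        for j in nbrs:
            score = influence[j]
            if not 1.0 <= score <= 2.0:
                raise ValueError("influence scores are expected to lie in [1, 2]")
            D[i][j] = score
    return D
```

Using 1 for non-neighbors makes D a neutral multiplier wherever no neighbor relation exists.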

We apply gradient descent with \( U_{i} \) and \( V_{j} \) as the parameters. To reduce complexity, we set \( \lambda_{U} = \lambda_{V} = \lambda \). Both \( U_{i} \) and \( V_{j} \) are updated in every iteration, and each update of \( U_{i} \) is weighted by multiplying it with the neighbor influence matrix: \( U_{i} \leftarrow \left( {U_{i} - \gamma \cdot \frac{\partial f}{{\partial U_{i} }}} \right)D_{i} \), where \( \gamma \) is the learning rate. After training we obtain the latent feature matrices \( U \) and \( V \), from whose latent feature vectors the reconstructed rating matrix \( \widehat{R} \) is predicted.
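A minimal full-batch gradient descent sketch is shown below, under several assumptions that go beyond the text: g is taken as the identity so that the gradients of Eq. (11) stay simple, \( D_{i} \) is treated as a per-user scalar weight `d[i]` supplied by the caller (the source does not spell out how \( D_{i} \) is obtained from the matrix D), V is updated without influence weighting, and the initialization scale, K, γ and the iteration count are arbitrary illustrative values. The function name `train_pmf` is hypothetical; λ defaults to 0.01, the value selected in the experiments.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def train_pmf(R: Matrix, I: Matrix, K: int = 2, lam: float = 0.01,
              gamma: float = 0.05, iters: int = 200,
              d: Optional[List[float]] = None, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Full-batch gradient descent on Eq. (11), with g taken as the identity."""
    rng = random.Random(seed)
    M, N = len(R), len(R[0])
    weights = d if d is not None else [1.0] * M  # hypothetical per-user weight for D_i
    U = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(M)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(N)]
    for _ in range(iters):
        # Residuals on observed entries: e_ij = I_ij * (R_ij - U_i . V_j)
        E = [[I[i][j] * (R[i][j] - sum(a * b for a, b in zip(U[i], V[j])))
              for j in range(N)] for i in range(M)]
        # df/dU_i = -sum_j e_ij V_j + lam * U_i ; df/dV_j = -sum_i e_ij U_i + lam * V_j
        grad_U = [[-sum(E[i][j] * V[j][k] for j in range(N)) + lam * U[i][k]
                   for k in range(K)] for i in range(M)]
        grad_V = [[-sum(E[i][j] * U[i][k] for i in range(M)) + lam * V[j][k]
                   for k in range(K)] for j in range(N)]
        # U_i <- (U_i - gamma * df/dU_i) * D_i ; V_j <- V_j - gamma * df/dV_j
        U = [[(U[i][k] - gamma * grad_U[i][k]) * weights[i] for k in range(K)]
             for i in range(M)]
        V = [[V[j][k] - gamma * grad_V[j][k] for k in range(K)] for j in range(N)]
    return U, V
```

With `d=None` every weight is 1 and the loop reduces to plain PMF training. Weights above 1 enlarge \( U_{i} \) at every step, so γ and the number of iterations may need care when influence weighting is enabled.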

4 Experimental Setup and Analysis

4.1 Data Set

The experiments use data from SCHOLAT, a comprehensive teaching and scientific research collaboration platform developed by SCHOLAT. The data set provided by SCHOLAT covers records up to March 16, 2018, and contains 43,905 users and 1,698 courses. User information includes the number of published papers, number of applications, login score, dynamic (activity) score, academic score, academic degree, number of patents, number of teams joined, number of courses joined, number of friends, number of likes, etc.

4.2 Evaluation Measures

In this paper, the root-mean-square error (RMSE) is used as the evaluation measure. The smaller the RMSE, the higher the accuracy of the algorithm.

$$ {\text{RMSE}} = \sqrt {\frac{1}{n}\sum\nolimits_{(i,j)} {\left( {r_{ij} - \widehat{r}_{ij} } \right)^{2} } } $$
(12)

where the sum runs over the n evaluated user-course pairs, \( r_{ij} \) is the real enrollment relationship between user i and course j, and \( \widehat{r}_{ij} \) is the relationship predicted by the recommendation system.
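Eq. (12) can be computed with the short sketch below, assuming the evaluated user-course pairs are given as (observed, predicted) tuples; the function name `rmse` is our own.

```python
import math
from typing import Iterable, Tuple


def rmse(pairs: Iterable[Tuple[float, float]]) -> float:
    """Eq. (12): root-mean-square error over (observed r_ij, predicted r_hat_ij) pairs."""
    items = list(pairs)
    if not items:
        raise ValueError("RMSE needs at least one evaluated pair")
    return math.sqrt(sum((r - r_hat) ** 2 for r, r_hat in items) / len(items))
```

Squaring the errors before averaging penalizes a few large mistakes more heavily than many small ones.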

4.3 Experimental Results and Analysis

We compare the RMSE of the proposed recommendation model with that of the traditional probabilistic matrix factorization method and of nearest-neighbor-influence factorization methods that measure user similarity with cosine similarity, the Pearson correlation coefficient and the Euclidean distance, respectively.

Fig. 2 shows the RMSE of each algorithm for different values of K, with the number of neighbors and λ both set to their best values. As the dimension K of the latent feature vectors increases, the recommendation accuracy of all algorithms improves; however, a larger K can also cause overfitting and increases the computational cost. For every value of K, the proposed algorithm outperforms the traditional PMF algorithm and the other competing algorithms.

Fig. 2. RMSE values (line chart) of different algorithms for different values of K

As the number of neighboring users increases, the recommendation results gradually improve, and the RMSE stabilizes once the number of neighbors exceeds 6. Since more neighbors also increase the running time, we set the number of neighboring users to 6 (Fig. 3).

Fig. 3. RMSE of the algorithm for different numbers of neighboring users

Fig. 4 shows that the RMSE is smallest at λ = 0.01 and that the accuracy of the algorithm decreases as λ increases further; we therefore set λ = 0.01 for this recommendation system.

Fig. 4. RMSE of the algorithm for different values of λ

5 Conclusion

To address information overload on online course websites, this paper studied course data from the SCHOLAT course platform and proposed a course recommendation system that integrates the influence of neighboring users into probabilistic matrix factorization to provide course recommendations for its users. Experiments show that this method provides more accurate course recommendations than the traditional probabilistic matrix factorization method, helps users find courses more efficiently, and thus improves the service offered to them.

Although this paper incorporates social information, it does not consider the temporal order of that information. Future work could introduce a timestamp mechanism to analyze how the influence and scope of social information vary over time, so as to produce time-aware recommendations.