Probability Matrix Factorization Algorithm for Course Recommendation System Fusing the Influence of Nearest Neighbor Users Based on Cloud Model

Li, Jianguo; Chang, Chao; Yang, Zuoxi; Fu, Hailin; Tang, Yong

doi:10.1007/978-3-030-15127-0_49


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11354)

Included in the following conference series:

International Conference on Human Centered Computing


Abstract

With the explosion of data on online course websites, getting the required course information quickly and accurately becomes more and more difficult. In this paper, a probability matrix factorization algorithm for course recommendation that fuses the influence of nearest neighbor users based on the cloud model is proposed. The proposed algorithm uses the cloud model to compute user similarity and integrates social information into course recommendation. The experimental results show that the algorithm effectively improves the accuracy of course recommendation.



1 Introduction

With the rapid development of information technology, online course platforms keep emerging, and anyone can choose courses on them and start learning. As a result, learning resources are growing explosively, which makes it hard for users to find the information they need quickly and accurately among the massive course information and resources [1]. The SCHOLAT course platform, a web-based course website for scholars, encounters the same problem. A recommendation system can effectively alleviate information overload by studying user data and recommending courses that match each user’s interests.

Collaborative filtering is one of the most widely used recommendation approaches and is mainly divided into memory-based and model-based collaborative filtering [2]. David Goldberg et al. proposed Tapestry, a recommendation system based on collaborative filtering [3]. Paul Resnick et al. proposed GroupLens, an automated collaborative filtering recommendation system based on user ratings [4]. Since then, collaborative filtering has gradually been applied to e-commerce and social networking sites. In real-world scenarios, it is impossible for users to rate every item, so online course websites also suffer from data sparsity.

In the absence of sufficient statistics, it is difficult for the memory-based approach to provide accurate predictions. The model-based method can effectively alleviate the cold start problem by training the recommendation model with existing user information [5]. Matrix factorization is widely used as a model-based method [6, 7, 8] because of its simplicity and high accuracy. Probabilistic matrix factorization [9] has achieved good results in various application areas since it was proposed.

Social recommendation, that is, the use of social information from social networking sites for recommendation, has become a hot topic in the field of recommendation systems. Li et al. [10] introduced co-author information as the user’s associated information to improve the accuracy of recommendation. Rendle and Schmidt-Thieme [11] use social tagging as a basis for recommendation. The SCHOLAT course platform differs from traditional course websites: in addition to course selection, it also holds rich social and academic information about its users. However, there is little research on using social and academic information to recommend courses. In this paper, we analyze the similarity of social information and the influence of users in social networks, and use probability matrix factorization to recommend courses to users more accurately.

2 Related Work

2.1 Similarity Computation of Nearest Neighbor Users

The methods of measuring user similarity mainly include cosine similarity, Pearson correlation coefficient and Euclidean distance. All three methods are based on the similarity computation of vectors, which is a strict match between object attributes.

(1) Cosine similarity: User ratings are treated as vectors in n-dimensional space, and user similarity is measured by the cosine of the angle between the two vectors. If the ratings of user i and user j in n-dimensional space are the vectors $ \vec{i} $ and $ \vec{j} $ respectively, the cosine similarity between user i and user j is:

$$ \cos \left( {\vec{i},\vec{j}} \right) = \frac{{\vec{i}\cdot\vec{j}}}{{\left\| {\vec{i}} \right\|\cdot\left\| {\vec{j}} \right\|}} $$

(1)

(2) Pearson correlation coefficient: Let $ X_{i} $ be the actual score of user X on item i, $ Y_{i} $ be the actual score of user Y on item i, $ \overline{X} $ be the average score of user X, and $ \overline{Y} $ be the average score of user Y. The Pearson correlation coefficient of user X and user Y is:

$$ {\text{r}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {X_{i} - \overline{X} } \right)\left( {Y_{i} - \overline{Y} } \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {X_{i} - \overline{X} } \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i} - \overline{Y} } \right)^{2} } }} $$

(2)

(3) Euclidean distance: For two users $ {\text{X}} $ and $ {\text{Y}} $, let $ X_{i} $ be the actual score of user $ {\text{X}} $ on item i and $ Y_{i} $ be the actual score of user $ {\text{Y}} $ on item i. The Euclidean distance between the two users is:

$$ {\text{d}}\left( {{\text{X}},{\text{Y}}} \right) = \sqrt {\sum\nolimits_{i = 1}^{n} {\left( {X_{i} - Y_{i} } \right)^{2} } } $$

(3)
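
The three measures in Eqs. (1) to (3) can be sketched in a few lines of dependency-free Python. The function names and the two toy rating vectors are illustrative assumptions, not part of the paper; the sketch also assumes both users have rated the same n items and that neither vector is all zeros or constant.

```python
import math


def cosine_similarity(u, v):
    """Eq. (1): cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_correlation(x, y):
    """Eq. (2): correlation of the mean-centred rating vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return numerator / denominator


def euclidean_distance(x, y):
    """Eq. (3): straight-line distance between two rating vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Toy data (hypothetical): user Y rates every item exactly 2 points lower than user X.
ratings_x = [5, 3, 4, 4]
ratings_y = [3, 1, 2, 2]

cos_xy = cosine_similarity(ratings_x, ratings_y)
pearson_xy = pearson_correlation(ratings_x, ratings_y)
dist_xy = euclidean_distance(ratings_x, ratings_y)
```

On this toy pair the Pearson coefficient is 1 because the two users differ only by a constant offset, while the Euclidean distance is 4; the example illustrates that the three measures react differently to the same pair of users.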

2.2 Matrix Factorization

Because of its relatively low time and space complexity and high prediction accuracy, matrix factorization is widely used in recommender systems. The goal of matrix factorization is to decompose the user-course elective matrix R into the hidden feature matrix of the users $ {\text{U}} \in R^{M \times K} $ and the hidden feature matrix of the courses $ {\text{V}} \in R^{N \times K} $ ($ {\text{K}} \ll { \hbox{min} }\left( {{\text{M}},{\text{N}}} \right) $), where $ {\text{K}} $ is the dimension of the hidden feature vectors. $ {\text{U}} $ and $ {\text{V}} $ are used to predict the missing entries of R, and the predicted values are compared to produce recommendations, that is:

$$ {\text{R}} \approx UV^{T} = \widehat{R} $$

(4)
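
As a minimal sketch of Eq. (4), the reconstruction $ \widehat{R} = UV^{T} $ is just the table of inner products between user and course feature vectors. The function name and the hand-picked factor values below are hypothetical, and the toy sizes (M = 3, N = 4, K = 2) are chosen for readability rather than to satisfy K ≪ min(M, N).

```python
def predict_matrix(U, V):
    """Return R_hat = U V^T for U (M x K) and V (N x K) given as nested lists."""
    return [[sum(u_k * v_k for u_k, v_k in zip(u, v)) for v in V] for u in U]


# Hypothetical factors: 3 users, 4 courses, 2 hidden features.
U = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
V = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [0.5, 0.0]]

R_hat = predict_matrix(U, V)

# Rank the courses for user 1 by predicted value, highest first.
ranking_user_1 = sorted(range(len(V)), key=lambda j: R_hat[1][j], reverse=True)
```

Each row of `R_hat` holds one user's predicted values for all courses, so ranking a row gives that user's recommendation order.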

In collaborative filtering, probabilistic matrix factorization assumes an underlying probability distribution for the data, which mitigates the overfitting problem of traditional matrix factorization and performs well in practice.

3 Probability Matrix Factorization Algorithm for Course Recommendation System Fusing the Influence of Nearest Neighbor Users Based on Cloud Model

To reflect the influence of neighboring users on target users, this paper proposes a course recommendation system that integrates the influence of neighboring users into probability matrix factorization.

3.1 Computation of User Similarity Based on Cloud Model

In this paper, the cloud model, a model for converting between qualitative and quantitative descriptions proposed by academician Deyi Li, is used to compute the similarity between two users [12]. The cloud model uses three parameters, $ E_{x} $ (expectation), $ E_{n} $ (entropy) and He (super-entropy), to characterize a fuzzy concept, so that the fuzziness and randomness of things are organically combined. The model is defined as follows:

Definition: Let U be a fuzzy domain and C a qualitative concept on U. If a numerical value a ∈ U is a random realization of C, and the certainty degree y(a) ∈ [0,1] of a with respect to C is a random number with a stable tendency, y: U → [0,1], ∀a ∈ U, a → y(a), then the distribution of all a on the fuzzy domain U is called a cloud, and each numerical value a in the cloud is called a cloud droplet. Here $ E_{x} $ denotes the expectation of the distribution of a on the fuzzy domain U, $ E_{n} $ denotes the entropy, which measures the uncertainty of C, and He denotes the uncertainty of the entropy, which is determined by the randomness and fuzziness of the entropy.

If the cloud models of two users are similar, the three-dimensional vectors formed by the parameters that control cloud droplet generation (expectation, entropy and super-entropy) should also be similar. In this paper, these three parameters are obtained with the reverse cloud algorithm and combined into a three-dimensional vector for each user. User similarity is then computed from these vectors.

Let $ r_{1} ,r_{2} ,r_{3} \ldots r_{n - 2} ,r_{n - 1} ,r_{n} $ be the cloud droplets (ratings). The three parameters $ E_{x} $, $ E_{n} $ and He that control cloud droplet generation are computed as follows:

Step1: Compute the sample average of the cloud droplet: $ \overline{r} = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {r_{i} } $; Compute the first order absolute central moment of cloud droplet: $ \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\left| {r_{i} - \overline{r} } \right|} $; Compute the variance of cloud droplet: $ S^{2} = \frac{1}{n - 1}\sum\nolimits_{i = 1}^{n} {(r_{i} - \overline{r} )}^{2} $.
Step2: Estimate the expectation: $ \widehat{E}_{x} = \overline{r} $;
Step3: Estimate the entropy: $ \widehat{E}_{n} = \sqrt {\frac{\pi }{2}} \times \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\left| {r_{i} - \widehat{E}_{x} } \right|} $;
Step4: Estimate the super-entropy: $ \widehat{He} = \sqrt {S^{2} - \frac{1}{3}\widehat{E}_{n}^{2} } $.
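
The four steps can be sketched as a single function that maps one user's ratings to the triple (Ex, En, He). The sketch squares the entropy estimate inside the square root of Step 4 so that both terms are variances, and it clamps a negative radicand to zero; both choices, as well as the function name and the sample ratings, are assumptions of this illustration rather than details confirmed by the paper.

```python
import math


def reverse_cloud(ratings):
    """Estimate (Ex, En, He) from a list of ratings (cloud droplets)."""
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are needed for the sample variance")

    # Step 1: sample mean, first-order absolute central moment, sample variance.
    mean = sum(ratings) / n
    abs_moment = sum(abs(r - mean) for r in ratings) / n
    variance = sum((r - mean) ** 2 for r in ratings) / (n - 1)

    # Step 2: expectation.
    ex = mean
    # Step 3: entropy.
    en = math.sqrt(math.pi / 2) * abs_moment
    # Step 4: super-entropy; the radicand is clamped at 0 (assumption) so that
    # very concentrated ratings do not produce a math domain error.
    he = math.sqrt(max(variance - en ** 2 / 3, 0.0))
    return ex, en, he


# Hypothetical ratings of one user on five courses.
ex, en, he = reverse_cloud([5, 3, 4, 4, 2])
```

For the sample ratings the mean is 3.6 and the sample variance is 1.3, so the expectation is 3.6 and the other two parameters follow from Steps 3 and 4.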

Using the reverse cloud algorithm, the user’s preference is represented by the three cloud parameters computed from the user’s original rating data. The vector of these three parameters is called the user characteristic vector and is written $ \vec{V} = \left( {Ex,En,He} \right) $, where $ {\text{Ex}} $ reflects the user’s overall rating level, $ {\text{En}} $ reflects how concentrated the user’s ratings are, that is, a measure of dispersion, and $ {\text{He}} $ reflects the stability of the entropy. User similarity based on the cloud model is defined as follows:

Definition: Let the characteristic vectors of two users be $ \overrightarrow {{V_{i} }} = \left( {Ex_{i} ,En_{i} ,He_{i} } \right) $ and $ \overrightarrow {{V_{j} }} = \left( {Ex_{j} ,En_{j} ,He_{j} } \right) $. The cosine of the angle between them is called the similarity of cloud i and cloud j:

$$ \cos \left( {\overrightarrow {{V_{i} }} ,\overrightarrow {{V_{j} }} } \right) = \frac{{\overrightarrow {{V_{i} }} \cdot\overrightarrow {{V_{j} }} }}{{\left\| {\overrightarrow {{V_{i} }} } \right\|\cdot\left\| {\overrightarrow {{V_{j} }} } \right\|}} $$

(5)
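
Eq. (5) and the selection of neighbor users can be sketched as follows. The characteristic vectors, the user ids and the helper `nearest_neighbors` are hypothetical; the paper does not specify how ties are broken, so this sketch simply relies on the stable order of Python's `sorted`.

```python
import math


def cloud_similarity(v_i, v_j):
    """Eq. (5): cosine similarity of two (Ex, En, He) characteristic vectors."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    return dot / (norm_i * norm_j)


def nearest_neighbors(target, vectors, k):
    """Return the ids of the k users whose clouds are most similar to the target's."""
    candidates = [uid for uid in vectors if uid != target]
    ranked = sorted(
        candidates,
        key=lambda uid: cloud_similarity(vectors[target], vectors[uid]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical characteristic vectors (Ex, En, He) of three users.
vectors = {
    "a": (3.6, 1.10, 0.95),
    "b": (3.5, 1.00, 0.90),
    "c": (1.2, 2.00, 0.10),
}
neighbors_of_a = nearest_neighbors("a", vectors, k=1)
```

Here user "b" is returned as the single nearest neighbor of "a", because their parameter triples point in almost the same direction while the triple of "c" does not.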

3.2 Recommendation System

PMF is a probabilistic linear model. The records of M users $ ({\text{H}} = \{ H_{i} |{\text{i}} = 1,2,3, \ldots {\text{M}}\} ) $ adding N courses $ ({\text{I}} = \{ I_{j} |{\text{j}} = 1,2,3, \ldots {\text{N}}\} ) $ constitute the observed scoring matrix R. $ R_{ij} $ denotes user i’s selection of course j: a missing or 0 entry in R means that the course has not been added, and 1 means that the course has been added. The PMF model assumes that the conditional distribution of the observed entries of R is Gaussian, and gradient descent is used to update the variables. To prevent overfitting, a regularization term is added, and the iteration continues until the algorithm converges (Fig. 1).

We suppose that the hidden feature matrices of students and courses obey zero-mean Gaussian prior distributions:

$$ {\text{P}}\left( {{\text{U|}}\delta_{U}^{2} } \right) = \prod\nolimits_{i = 1}^{M} {N(U_{i} |0,\delta_{U}^{2} I)} $$

(6)

$$ {\text{P}}\left( {{\text{V|}}\delta_{V}^{2} } \right) = \prod\nolimits_{j = 1}^{N} {N(V_{j} |0,\delta_{V}^{2} I)} $$

(7)

We suppose that the conditional distribution of the observed student-elective scoring matrix is also Gaussian:

$$ {\text{P}}\left( {{\text{R|U}},{\text{V}},\delta_{R}^{2} } \right) = \prod\nolimits_{i = 1}^{M} {\prod\nolimits_{j = 1}^{N} {[N(R_{ij} |g\left( {U_{i} V_{j}^{T} } \right),\delta_{R}^{2} )]} }^{{I_{ij}^{R} }} $$

(8)

Here I is the identity matrix, and $ \delta_{U}^{2} $ and $ \delta_{V}^{2} $ are the variances of the distributions of $ {\text{U}} $ and $ {\text{V}} $ respectively. $ I_{ij}^{R} $ is the indicator function: its value is 1 if user $ U_{i} $ has joined course $ V_{j} $ and 0 otherwise. $ {\text{g}}\left( {\text{x}} \right) $ maps the value of $ U_{i} V_{j}^{T} $ to the interval [0,1]. In this paper, we use $ {\text{g}}\left( {\text{x}} \right) = \frac{{x - x_{min} }}{{x_{max} - x_{min} }} $, where $ {\text{x}} $ is the original value, $ x_{max} $ is the maximum of the original data set, and $ x_{min} $ is the minimum of the original data set.
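
The mapping $ g(x) $ is ordinary min-max normalization. A minimal sketch, with a hypothetical function name and made-up raw values, is given below; returning 0 for a constant input is an assumption added to avoid division by zero, since the paper does not discuss that case.

```python
def min_max_normalize(values):
    """Map each value to [0, 1] via (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant data set carries no ranking information,
        # so every value is mapped to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical raw inner products U_i V_j^T.
raw_scores = [-1.0, 0.0, 1.0, 3.0]
normalized = min_max_normalize(raw_scores)
```

The smallest raw value maps to 0 and the largest to 1, so the normalized predictions are directly comparable with the 0/1 entries of R.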

The posterior distribution function of the feature matrix can be found by Bayesian theorem:

$$ {\text{P}}\left( {{\text{U}},{\text{V|R}},\delta_{R}^{2} ,\delta_{U}^{2} ,\delta_{V}^{2} } \right) \propto {\text{P}}\left( {{\text{R|U}},{\text{V}},\delta_{R}^{2} } \right) \times {\text{P}}\left( {{\text{U|}}\delta_{U}^{2} } \right) \times {\text{P}}({\text{V}}|\delta_{V}^{2} ) $$

(9)

Taking the logarithm of the above formula gives:

$$ \begin{array}{l} \ln p\left( {U,V|R,\delta_{R}^{2} ,\delta_{U}^{2} ,\delta_{V}^{2} } \right) = - \frac{1}{{2\delta_{R}^{2} }}\sum\nolimits_{i = 1}^{M} {\sum\nolimits_{j = 1}^{N} {I_{ij}^{R} \left( {R_{ij} - g\left( {U_{i} V_{j}^{T} } \right)} \right)^{2} } } - \frac{1}{{2\delta_{U}^{2} }}\sum\nolimits_{i = 1}^{M} {U_{i} U_{i}^{T} } \\ \quad - \frac{1}{{2\delta_{V}^{2} }}\sum\nolimits_{j = 1}^{N} {V_{j} V_{j}^{T} } - \frac{1}{2}\left( {\left( {\sum\nolimits_{i = 1}^{M} {\sum\nolimits_{j = 1}^{N} {I_{ij}^{R} } } } \right)\ln \delta_{R}^{2} + MK\ln \delta_{U}^{2} + NK\ln \delta_{V}^{2} } \right) + C \end{array} $$

(10)

With the variances held fixed, maximizing the log-posterior is equivalent to minimizing the following objective function, which is obtained by negating Eq. (10), dropping the terms that do not depend on $ {\text{U}} $ and $ {\text{V}} $, and multiplying by $ \delta_{R}^{2} $:

$$ {\text{f}} = \frac{1}{2}\sum\nolimits_{i = 1}^{M} {\sum\nolimits_{j = 1}^{N} {I_{ij}^{R} \left( {R_{ij} - g\left( {U_{i} V_{j}^{T} } \right)} \right)^{2} } } + \frac{{\lambda_{U} }}{2}\sum\nolimits_{i = 1}^{M} {U_{i} U_{i}^{T} } + \frac{{\lambda_{V} }}{2}\sum\nolimits_{j = 1}^{N} {V_{j} V_{j}^{T} } $$

(11)

In the above formula, $ \lambda_{U} = \frac{{\delta_{R}^{2} }}{{\delta_{U}^{2} }} $, $ \lambda_{V} = \frac{{\delta_{R}^{2} }}{{\delta_{V}^{2} }} $.
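
To make the objective concrete, the sketch below evaluates it for small nested-list matrices. It uses the scaled form in which the data term has coefficient 1/2, which is the scaling under which $ \lambda_{U} = \delta_{R}^{2} /\delta_{U}^{2} $ and $ \lambda_{V} = \delta_{R}^{2} /\delta_{V}^{2} $ appear as regularization weights. The function names, the default identity mapping for g and the toy data are assumptions made for illustration.

```python
def identity(x):
    """Placeholder for g; the paper uses min-max normalization instead."""
    return x


def pmf_objective(R, mask, U, V, lam_u, lam_v, g=identity):
    """Squared error on observed entries plus L2 penalties on U and V."""
    data_term = 0.0
    for i, u in enumerate(U):
        for j, v in enumerate(V):
            if mask[i][j]:  # indicator I_ij^R: only observed entries count
                prediction = g(sum(a * b for a, b in zip(u, v)))
                data_term += (R[i][j] - prediction) ** 2
    reg_u = sum(a * a for u in U for a in u)  # sum_i U_i U_i^T
    reg_v = sum(b * b for v in V for b in v)  # sum_j V_j V_j^T
    return 0.5 * data_term + 0.5 * lam_u * reg_u + 0.5 * lam_v * reg_v


# Toy problem (hypothetical): 2 users, 2 courses, K = 1, one unobserved entry.
R = [[1, 0], [0, 1]]
mask = [[1, 1], [1, 0]]
U = [[1.0], [0.0]]
V = [[1.0], [0.0]]
f_value = pmf_objective(R, mask, U, V, lam_u=0.1, lam_v=0.1)
```

In this toy case every observed entry is reproduced exactly, so the value of the objective comes entirely from the two regularization terms.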

The cloud model is used to compute the similarity between users, from which the set of neighbor users of each user is obtained. The influence of a user is computed from the user’s academic information, such as academic degree, number of papers, number of projects, number of patents and number of works, and the influence of neighbor users is filled into the neighbor-user influence matrix $ {\text{D}} $. $ D_{i} \in \left[ {1,2} \right] $ denotes the rating of user $ U_{i} $ in the influence matrix. If user $ U_{j} $ is a neighbor of user $ U_{i} $, $ D_{ij} $ equals the influence score of $ U_{j} $; otherwise $ D_{ij} $ equals 1. The rating contributions of different users are determined by their influence on the target user: the more influential a user, the greater that user’s share in the rating prediction for the target user.
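
One way to read the construction of $ {\text{D}} $ is sketched below: entries default to 1 and are overwritten with the influence score of each neighbor. The paper does not give the formula that turns academic information into a score, so the sketch takes the scores in [1, 2] as given; the function name, the neighbor sets and the score values are hypothetical.

```python
def build_influence_matrix(neighbors, influence):
    """Build D with D[i][j] = influence[j] if j is a neighbor of i, else 1.

    neighbors[i] is the set of neighbor indices of user i;
    influence[j] is the (precomputed) influence score of user j in [1, 2].
    """
    m = len(neighbors)
    D = [[1.0] * m for _ in range(m)]
    for i, neighbor_set in enumerate(neighbors):
        for j in neighbor_set:
            if not 1.0 <= influence[j] <= 2.0:
                raise ValueError("influence scores are expected to lie in [1, 2]")
            D[i][j] = influence[j]
    return D


# Hypothetical data: user 0 has neighbor 1, user 1 has neighbors 0 and 2,
# user 2 has no neighbors.
neighbors = [{1}, {0, 2}, set()]
influence = [1.2, 1.8, 1.5]
D = build_influence_matrix(neighbors, influence)
```

Rows of users without neighbors stay at 1, so for them the model reduces to plain PMF.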

We use gradient descent with $ U_{i} $ and $ V_{j} $ as parameters. To reduce the computational complexity, we set $ \lambda_{U} = \lambda_{V} = \lambda $. $ U_{i} $ and $ V_{j} $ are updated in every iteration; the update of $ U_{i} $ is $ U_{i} \leftarrow \left( {U_{i} - \gamma \cdot \frac{\partial f}{{\partial U_{i} }}} \right)D_{i} $, so that each iteration is weighted by the influence of similar users, where $ \gamma $ is the learning rate. After convergence we obtain the hidden feature matrices $ {\text{U}} $ and $ {\text{V}} $, from which the reconstructed score matrix $ \widehat{R} $ is predicted.
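
The influence-weighted update can be sketched as a single gradient step. Several details are assumptions of this illustration, not statements about the authors' implementation: g is taken to be the identity so that the gradient has a simple closed form, $ D_{i} $ is passed in as one scalar per user, $ V_{j} $ receives a plain gradient step, both matrices are updated from the same old values, and all names and toy data are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(R, mask, U, V, lam):
    """Regularized squared error on observed entries (g = identity)."""
    err = sum((R[i][j] - dot(u, v)) ** 2
              for i, u in enumerate(U) for j, v in enumerate(V) if mask[i][j])
    reg = sum(x * x for row in U + V for x in row)
    return 0.5 * err + 0.5 * lam * reg


def pmf_step(R, mask, U, V, lam, gamma, D):
    """One step: U_i <- (U_i - gamma * df/dU_i) * D_i, V_j <- V_j - gamma * df/dV_j."""
    K = len(U[0])
    grad_U = [[lam * x for x in u] for u in U]
    grad_V = [[lam * x for x in v] for v in V]
    for i, u in enumerate(U):
        for j, v in enumerate(V):
            if mask[i][j]:
                err = R[i][j] - dot(u, v)
                for k in range(K):
                    grad_U[i][k] -= err * v[k]
                    grad_V[j][k] -= err * u[k]
    new_U = [[(x - gamma * g) * D[i] for x, g in zip(u, gu)]
             for i, (u, gu) in enumerate(zip(U, grad_U))]
    new_V = [[x - gamma * g for x, g in zip(v, gv)] for v, gv in zip(V, grad_V)]
    return new_U, new_V


# Toy data (hypothetical): 3 users, 3 courses, K = 2, two unobserved entries.
rng = random.Random(0)
R = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
mask = [[1, 1, 1], [1, 1, 0], [1, 0, 1]]
U0 = [[rng.uniform(0.0, 0.1) for _ in range(2)] for _ in range(3)]
V0 = [[rng.uniform(0.0, 0.1) for _ in range(2)] for _ in range(3)]

# With all D_i = 1 the procedure is ordinary PMF gradient descent.
U_fit, V_fit = U0, V0
for _ in range(200):
    U_fit, V_fit = pmf_step(R, mask, U_fit, V_fit, lam=0.01, gamma=0.05, D=[1.0, 1.0, 1.0])
f_start = objective(R, mask, U0, V0, lam=0.01)
f_end = objective(R, mask, U_fit, V_fit, lam=0.01)

# A single weighted step: user 1 has D_1 = 2, so the updated row is doubled.
U_plain, _ = pmf_step(R, mask, U0, V0, lam=0.01, gamma=0.05, D=[1.0, 1.0, 1.0])
U_weighted, _ = pmf_step(R, mask, U0, V0, lam=0.01, gamma=0.05, D=[1.0, 2.0, 1.0])
```

Because the factor $ D_{i} $ multiplies the whole updated vector, values above 1 amplify that user's feature vector at every iteration; how this interacts with convergence depends on $ \gamma $ and $ \lambda $ and is not examined in this sketch.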

4 Experimental Setup and Analysis

4.1 Data Set

In this paper, SCHOLAT, a comprehensive teaching and scientific research collaboration platform, is used for the experiments. We use the data set provided by SCHOLAT up to March 16, 2018, which records a total of 43905 users and 1698 courses. User information includes the number of papers published, number of applications, landing score, dynamic score, academic score, academic degree, patents, number of teams joined, number of courses joined, number of friends joined, number of praise points, etc.

4.2 Evaluation Measures

In this paper, the root-mean-square error (RMSE) is used as the evaluation metric. The smaller the RMSE, the higher the accuracy of the algorithm.

$$ {\text{RMSE}} = \sqrt {\frac{1}{n}\sum\nolimits_{i \in m,j \in n} {\left( {r_{ij} - \widehat{{r_{ij} }}} \right)}^{2} } $$

(12)

Here $ r_{ij} $ is the real relationship between the user and the course (whether the course was selected), and $ \widehat{{r_{ij} }} $ is the relationship between the user and the course predicted by the recommendation system.
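
A minimal sketch of the RMSE computation over a list of evaluated user-course pairs follows. The function name and the sample values are hypothetical, and the sketch assumes that the normalizing count is the number of evaluated pairs.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between real and predicted user-course relations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared_error / len(actual))


# Hypothetical test pairs: real selections (0/1) and predicted values in [0, 1].
actual = [1, 0, 1, 1]
predicted = [0.9, 0.2, 0.6, 1.0]
error = rmse(actual, predicted)
```

A perfect prediction gives an RMSE of 0, and larger deviations on any pair increase the value.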

4.3 Experimental Results and Analysis

We compare the RMSE of the proposed recommendation model with that of four baselines: the traditional probability matrix factorization method, and the nearest neighbor user influence factorization method using cosine similarity, Pearson similarity, and Euclidean distance, respectively.

In Fig. 2, the number of neighbors and λ are fixed at their best values. As the dimension K of the hidden feature vectors increases, the recommendation accuracy of all algorithms improves; on the other hand, a larger K may cause overfitting and increases the computational complexity. In every dimension, the proposed algorithm outperforms the traditional PMF algorithm and the other competing algorithms.

As the number of neighbor users increases, the recommendation results gradually improve, and the RMSE tends to stabilize once the number of neighbor users exceeds 6. Because the algorithm also consumes more time as the number of neighbor users grows, we set the number of neighbor users to 6 (Fig. 3).

From Fig. 4, we can see that the RMSE is smallest when λ = 0.01 and that the accuracy of the algorithm decreases as λ continues to increase, so we select λ = 0.01 for this recommendation system.

5 Conclusion

To address information overload on online course websites, this paper studies the course data of the SCHOLAT course platform and proposes a course recommendation system that integrates the influence of neighboring users into probability matrix factorization to recommend courses to the platform’s users. Experiments show that this method provides more accurate course recommendations than the traditional probability matrix factorization method, makes it more efficient for users to find courses, and thus provides better services for users.

Although this paper incorporates social information, it does not consider the time sequence of that information. In future research, a time stamp mechanism can be considered to analyze the duration and scope of the influence of social information over time, so as to obtain time-aware recommendation results.

References

1. Xia, Z., Song, A., Fang, D., et al.: A collaborative filtering recommendation mechanism for cloud computing. J. Comput. Res. Dev. 51(10), 2255–2269 (2014)
2. Chen, L., Chen, G., Wang, F.: Recommender systems based on user reviews: the state of the art. User Mod. User-Adap. Inter. 25(2), 99–154 (2015)
3. Goldberg, D., Nichols, D., Oki, B.M., et al.: Using collaborative filtering to weave an information tapestry. Commun. ACM 35(12), 61–70 (1992)
4. Resnick, P., Iacovou, N., Suchak, M., et al.: GroupLens: an open architecture for collaborative filtering of netnews. In: ACM Conference on Computer Supported Cooperative Work 1994, pp. 175–186. ACM, Chapel Hill (1994)
5. Pagare, R.A., Patil, S.: Study of collaborative filtering recommendation algorithm scalability issue. Int. J. Comput. Appl. 67(25), 10–15 (2014)
6. Pham, T.A.N., Li, X., Cong, G., et al.: A general graph-based model for recommendation in event-based social networks. In: International Conference on Data Engineering 2015, pp. 567–578. IEEE (2015)
7. Bokde, D., Girase, S., Mukhopadhyay, D.: Matrix factorization model in collaborative filtering algorithms: a survey. Procedia Comput. Sci. 49, 136–146 (2015). Icac
8. Song, Y., Zhuang, Z., Zhao, Q., et al.: Real-time automatic tag recommendation. In: International ACM SIGIR Conference on Research and Development in Information Retrieval 2008, vol. 6, pp. 515–522. ACM (2008)
9. Mnih, A., Salakhutdinov, R.R.: Probabilistic matrix factorization. Adv. Neural. Inf. Process. Syst. 20(2), 1257–1264 (2007)
10. Li, J., Xia, F., Wang, W., et al.: ACRec: a co-authorship based random walk model for academic collaboration recommendation. In: International Conference on World Wide Web 2014, pp. 1209–1214. ACM (2014)
11. Rendle, S., Schmidt-Thieme, L.: Pairwise interaction tensor factorization for personalized tag recommendation. In: ACM International Conference on Web Search and Data Mining 2010, pp. 81–90. ACM (2010)
12. Zhang, G.-W., Li, D.-Y., Li, P., et al.: A collaborative filtering recommendation algorithm based on cloud model. J. Softw. 18(10), 2403–2411 (2007)

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61272067), the Science and Technology Project of Guangdong Province (Nos. 2017A040405057 and 2016A030303058), and the Science and Technology Program of Guangzhou, China (No. 201604046017).

Author information

Authors and Affiliations

South China Normal University, Guangzhou, 510631, China
Jianguo Li, Chao Chang, Zuoxi Yang, Hailin Fu & Yong Tang


Corresponding author

Correspondence to Yong Tang.

Editor information

Editors and Affiliations

South China Normal University, Guangzhou, China
Yong Tang
Wuhan University of Technology, Wuhan City, China
Qiaohong Zu
CINVESTAV, Mexico City, Mexico
José G. Rodríguez García


Cite this paper

Li, J., Chang, C., Yang, Z., Fu, H., Tang, Y. (2019). Probability Matrix Factorization Algorithm for Course Recommendation System Fusing the Influence of Nearest Neighbor Users Based on Cloud Model. In: Tang, Y., Zu, Q., Rodríguez García, J. (eds) Human Centered Computing. HCC 2018. Lecture Notes in Computer Science(), vol 11354. Springer, Cham. https://doi.org/10.1007/978-3-030-15127-0_49


DOI: https://doi.org/10.1007/978-3-030-15127-0_49
Published: 22 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15126-3
Online ISBN: 978-3-030-15127-0
eBook Packages: Computer Science, Computer Science (R0)
