
1 Introduction

In recent years, the mobile Internet and e-commerce industries have developed rapidly, and the amount of information and data traffic has exploded, so people face a serious information overload problem. Against this background of rapidly developing Internet and communication technology, a good recommendation system is particularly important [1,2,3,4]. Built on mobile Internet platforms, recommendation systems use the interaction information between users and items to help users find information of interest and to ease information overload [5]. At the same time, the development of the mobile Internet has also greatly promoted the rapid development of recommendation systems.

Collaborative filtering is one of the most widely used methods in recommendation systems [6], which predicts user preferences simply and effectively by discovering and exploiting the similarities between users and items through the rating matrix. The most widely employed models are user-based and item-based collaborative filtering. However, these shallow models cannot learn the deep features of users and items, limiting their scalability for recommendation. In recent years, deep learning techniques represented by neural networks have made considerable progress in the fields of image and speech processing [6]. Consequently, more and more research applies neural networks to collaborative filtering, among which autoencoder models such as AutoRec [7,8,9,10] are the most prominent. Compared with traditional collaborative filtering algorithms, AutoRec greatly improves recommendation accuracy. Unfortunately, AutoRec cannot handle users' large-scale historical behavior data, and its shallow model structure can hardly extract the deep hidden features of users and items.

This paper proposes a collaborative filtering recommendation algorithm based on an improved Stacked Denoising AutoEncoder [11,12,13,14]. The hidden representations of users and items are learned from the ratings and auxiliary information through the Stacked Denoising AutoEncoder framework, whose deep feature extraction ability addresses the inefficiency and sparsity issues of matrix factorization in traditional collaborative filtering algorithms. In addition, both the user and item dimensions are taken into account, which effectively alleviates sparse data and the cold start of new items, thereby improving the effectiveness of the recommendation algorithm. Experiments are conducted on the MovieLens dataset and compared with several mainstream algorithms. The results show that the precision and recall rates of the proposed algorithm are significantly improved, and the cold start problem is alleviated.

2 Preliminaries

2.1 Autoencoder

The autoencoder [15] is a type of neural network commonly used to learn deep features of the input data, as shown in Fig. 1. The basic autoencoder consists of an input layer, a hidden layer, and an output layer. The input layer and the output layer have the same number of neurons, while the number of neurons in the hidden layer is typically smaller. The autoencoder tries to learn an identity function that makes the output as close to the input as possible. The autoencoder is an unsupervised learning approach and does not require labeled training data.

Fig. 1. The network structure of the AutoEncoder

The AutoEncoder's working process is elaborated below. Suppose the training set contains the rating vectors of m users, \(\{x_{1},x_{2},\cdots ,x_{m}\}\), where each sample \(x_{i} \in R^N\) is an N-dimensional vector. First, each sample is encoded to obtain the hidden layer feature \(h^i \in R^L\):

$$\begin{aligned} h^i={\sigma }(Wx_{i}+b) \end{aligned}$$
(1)

Where \(W\in R^{L*N}\) is the weight matrix of the encoding part, b is the bias vector, and \({\sigma }(x)=1/(1+e^{-x})\) is the Sigmoid function, applied element-wise to each dimension of the encoded input. The decoding operation then restores \(\hat{x}_{i}\in R^N\) from the L-dimensional hidden feature \(h^i\) as in (2).

$$\begin{aligned} \hat{x}_{i}={\sigma }(W'h^{i}+b') \end{aligned}$$
(2)

Where \(W'\in R^{N*L}\) is the weight matrix of the decoding part and \(b'\) is the bias vector. The training process of the AutoEncoder constantly adjusts the weight matrices W and \(W'\) and the bias vectors b and \(b'\) in order to minimize the objective function (3).

$$\begin{aligned} E=\frac{1}{2m}\sum _{i=1}^{m}||x_{i}-\hat{x}_{i}||^{2}+\frac{\lambda }{2}||W||^{2}+\frac{\lambda }{2}||W'||^{2} \end{aligned}$$
(3)

Where \(||x_{i}-\hat{x}_{i}||^{2}\) is the error term between the input data \(x_{i}\) and the output data \(\hat{x}_{i}\), which drives the output to reproduce the original data. \(\frac{\lambda }{2}||W||^{2}\) and \(\frac{\lambda }{2}||W'||^{2}\) are regularization terms that prevent over-fitting the training data. Finally, the hidden layer features \(h^i\) are computed with the trained parameters, yielding the hidden feature codes of the original data.
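To make the encode-decode-objective cycle of Eqs. (1)-(3) concrete, the following is a minimal NumPy sketch; the dimensions, toy data, and variable names (W2 standing in for \(W'\)) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

N, L, m = 20, 8, 100                      # input dim, hidden dim, sample count
lam = 0.01                                # regularization parameter lambda

W  = rng.normal(scale=0.1, size=(L, N))   # encoder weights W in R^{L x N}
b  = np.zeros(L)                          # encoder bias b
W2 = rng.normal(scale=0.1, size=(N, L))   # decoder weights W' in R^{N x L}
b2 = np.zeros(N)                          # decoder bias b'

X = rng.random((m, N))                    # toy stand-ins for ratings x_1..x_m

def forward(x):
    h = sigmoid(W @ x + b)                # Eq. (1): hidden feature h^i
    x_hat = sigmoid(W2 @ h + b2)          # Eq. (2): reconstruction
    return h, x_hat

def objective(X):
    # Eq. (3): mean squared reconstruction error plus L2 regularization
    err = sum(np.sum((x - forward(x)[1]) ** 2) for x in X) / (2 * m)
    reg = (lam / 2) * (np.sum(W ** 2) + np.sum(W2 ** 2))
    return err + reg

print("objective before training:", objective(X))
```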

2.2 Denoising AutoEncoder

The AutoEncoder pre-trains the model by minimizing the error between the input and the output. However, the AutoEncoder easily degenerates into an identity function owing to factors such as model complexity, the size of the training set, and data noise. To solve this problem, Vincent proposed the Denoising AutoEncoder (DAE) [16], which improves the robustness of the AutoEncoder. Random noise is added to the input data, and the corrupted input is encoded and decoded to reconstruct the original input. The objective of the DAE is to minimize the loss between the reconstructed output and the original, uncorrupted input, which also helps prevent over-fitting.
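As an illustration of the corruption step, here is a minimal sketch assuming masking noise (randomly zeroing a fraction of the input), which is one common choice; the paper does not specify the noise type.

```python
import numpy as np

rng = np.random.default_rng(1)

def corrupt(x, noise_ratio=0.3):
    # Randomly zero out a fraction of the input entries (masking noise).
    mask = rng.random(x.shape) >= noise_ratio
    return x * mask

x = rng.random(20)        # clean input vector
x_tilde = corrupt(x)      # corrupted input fed to the encoder
# The DAE is trained so that decode(encode(x_tilde)) reconstructs the
# ORIGINAL x, i.e. the loss compares the output with the clean input.
```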

2.3 Stacked Denoising AutoEncoder

Stacked Denoising AutoEncoder (SDAE) is a deep neural network constructed by stacking multiple DAEs [17]. SDAE is used to process larger datasets and to extract deeper features from the input data. The SDAE network is trained with the greedy layer-wise approach proposed by Hinton [18]. The first layer of the network is trained to obtain its parameters, and the hidden layer output of the first layer is then used as the input of the second layer. When training a layer, the parameters of the preceding layers remain unchanged. After each layer has been trained in this way, the whole network is initialized with the separately pre-trained weights, and the output of the last layer is used as the reconstruction data. Finally, the objective function in Eq. (3) is adopted to fine-tune all parameters.
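The layer-wise procedure can be sketched as follows; `train_dae` is a hypothetical helper that trains a single DAE on its input and returns the learned parameters together with the hidden representation.

```python
def pretrain_sdae(X, layer_sizes, train_dae):
    """Greedy layer-wise pretraining: train one DAE per layer, feeding
    each layer's hidden output to the next while earlier layers stay
    fixed; the collected parameters initialize the full network."""
    params = []
    data = X                                  # input to the first layer
    for size in layer_sizes:
        W, b, hidden = train_dae(data, size)  # train this layer in isolation
        params.append((W, b))
        data = hidden                         # becomes the next layer's input
    return params
```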

3 The Proposed Algorithm

In order to address the data sparseness and cold start issues in traditional collaborative filtering algorithms, two SDAEs are employed in this paper: one handles the users' ratings together with the users' auxiliary information, and the other handles the items' ratings together with the items' auxiliary information [13, 14, 19]. The hidden layer feature is taken as the deep-level feature of a user or an item and is used to calculate the similarity between users and between items.

3.1 User Similarity Calculation

The traditional collaborative filtering algorithms only consider user rating data when calculating user similarity, ignoring the user's auxiliary information, and they also suffer from a cold start problem for new users. In addition, the traditional algorithms only consider the shallow features of the user and cannot extract the deep hidden features of users and items, which results in low accuracy of the similarity calculation. The proposed algorithm integrates the deep neural network SDAE into collaborative filtering. Take movie recommendation as an example. Suppose there are m users and n movies, and user u rates movie v with an integer from 1 to 5, so that \(R^{m*n}\) is the user rating matrix. Three kinds of user auxiliary information are considered: gender, age, and occupation. After discretizing the user's age, the user information matrix \(U\in R^{m*l}\) is obtained. Each node of the SDAE input layer represents the user's rating of the current movie or a feature of the current user. The input data are trained layer by layer without labels to obtain the parameters of each layer, which are used to extract the deep features of users. The user-based SDAE network structure is denoted U-SDAE, and the item-based SDAE network structure is denoted I-SDAE.
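One plausible way to assemble the user information matrix described above is to concatenate each user's rating vector with one-hot encodings of the auxiliary attributes; the encoding choice and column counts below are assumptions (MovieLens 1M has 2 genders, 7 age groups, and 21 occupations).

```python
import numpy as np

def user_info_matrix(ratings, genders, age_groups, occupations,
                     n_ages=7, n_occ=21):
    """Concatenate each user's rating vector with one-hot auxiliary
    attributes to form the rows of U in R^{m x l}."""
    m = ratings.shape[0]
    gender_oh = np.zeros((m, 2));      gender_oh[np.arange(m), genders] = 1
    age_oh    = np.zeros((m, n_ages)); age_oh[np.arange(m), age_groups] = 1
    occ_oh    = np.zeros((m, n_occ));  occ_oh[np.arange(m), occupations] = 1
    return np.hstack([ratings, gender_oh, age_oh, occ_oh])
```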

Fig. 2. Improved Collaborative Filtering based on Stacked Denoising AutoEncoders

As shown in Fig. 2, the SDAE network in this paper consists of one input layer, two hidden layers, and one output layer. The algorithm takes the user information matrix \(U^{m*l}\) as input, where each row is a user feature vector \(U^i\in R^l\) and l is the number of neurons in the input layer, representing a user's ratings of the n items together with the features of the current user. The parameters are trained using the autoencoder training method as follows:

$$\begin{aligned} h_{u}^{1}=\sigma (W_{1}U^{i}+b_{1}) \end{aligned}$$
(4)
$$\begin{aligned} h_{u}^{2}=\sigma (W_{1}^{'}h_{u}^{1}+b_{1}^{'}) \end{aligned}$$
(5)
$$\begin{aligned} \hat{U}^{i}=\sigma (W_{1}^{''}h_{u}^{2}+b_{1}^{''}) \end{aligned}$$
(6)

Where \(W_{1}\in R^{k*l}\), \(W_{1}^{'}\in R^{j*k}\), and \(W_{1}^{''}\in R^{l*j}\) are weight matrices, and \(h_{u}^{1}\in R^{k}\) and \(h_{u}^{2}\in R^{j}\) are the hidden layer features of the user. \(b_{1}\in R^{k}\), \(b_{1}^{'}\in R^{j}\), and \(b_{1}^{''}\in R^{l}\) are bias vectors. The objective function for learning the user's latent features is defined as:

$$\begin{aligned} E=\frac{1}{2m}\sum _{i=1}^{m}||U^{i}-\hat{U}^{i}||^{2}+\frac{\lambda }{2}||W_{1}^{'}||^{2}+\frac{\lambda }{2}||W_{1}^{''}||^{2} \end{aligned}$$
(7)

Where \(\lambda \) is a regularization parameter used to prevent over-fitting. By continuously minimizing the objective function, the parameters \(\{W_{1},b_{1}\}\) of the first layer and the output of the first hidden layer are obtained; the latter forms the input of the next layer. The above training process is repeated layer by layer to record the parameters \(\{W_{1},W_{1}^{'},W_{1}^{''},b_{1},b_{1}^{'},b_{1}^{''}\}\). The trained parameters are then used to calculate \(h_{u}^{2}\) through formulas (4) and (5), compressing the original l-dimensional sample into a j-dimensional feature. Finally, user similarity is calculated with the user's low-dimensional feature vector:

$$\begin{aligned} sim(u,v)=\frac{h_{uu}^{2}\bullet h_{uv}^{2}}{|h_{uu}^{2}|\times |h_{uv}^{2}|} \end{aligned}$$
(8)

Where \(h_{uu}^{2}\) and \(h_{uv}^{2}\) represent the j-dimensional feature vectors of user u and user v compressed by the SDAE.
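A small sketch of this feature extraction and similarity computation, with parameter names mirroring Eqs. (4), (5), and (8) (`W1p`, `b1p` standing in for \(W_{1}^{'}\), \(b_{1}^{'}\)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compress(U_i, W1, b1, W1p, b1p):
    """Apply the trained encoders of Eqs. (4)-(5) to a user vector U^i."""
    h1 = sigmoid(W1 @ U_i + b1)       # Eq. (4): first hidden layer, R^k
    return sigmoid(W1p @ h1 + b1p)    # Eq. (5): second hidden layer, R^j

def user_similarity(h_uu, h_uv):
    # Eq. (8): cosine similarity between two compressed user features
    return (h_uu @ h_uv) / (np.linalg.norm(h_uu) * np.linalg.norm(h_uv))
```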

3.2 Item Similarity Calculation

In a recommendation system, the auxiliary information of an item is an important indicator for distinguishing different items. The traditional collaborative filtering algorithm ignores the contribution of item attributes to the similarity calculation. The proposed algorithm combines ratings and item attributes to calculate similarities between items. First, the item-attribute matrix is obtained by analyzing the item information. Assuming that the number of items is n and the number of attributes is r, the item-attribute matrix is shown in Table 1. Then, the user rating matrix and the item-attribute matrix are combined to obtain the item information matrix \(I^{n\times p}\). Each node of the SDAE input layer represents a user's rating of the current item or an attribute feature of the current item. The input data are trained layer by layer without labels to obtain the parameters of the SDAE network, which are used to extract the deep features of the item. The structure of the SDAE network is similar to that in Fig. 2. The proposed algorithm takes the item information matrix \(I^{n\times p}\) as input, where each row is an item feature vector \(I^i\in R^p\) and p is the number of neurons in the input layer, representing the ratings of the current item by the m users together with the attribute features of the current item. The training process of the I-SDAE model is basically the same as that of the U-SDAE. After training is completed, the hidden layer feature \(h_{I}^{2}\) of the item is calculated with the trained parameters, compressing the original sample from p dimensions to t dimensions. The learned low-dimensional features cover both the rating information received by the item and the attribute features of the item itself, and thus express the features of the item at a deeper level. Finally, the learned low-dimensional feature vector of the item is used to calculate the item similarity:

$$\begin{aligned} sim_{1}(i,j)=\frac{h_{Ii}^{2}\bullet h_{Ij}^{2}}{|h_{Ii}^{2}|\times |h_{Ij}^{2}|}, \end{aligned}$$
(9)

where \(h_{Ii}^{2}\) and \(h_{Ij}^{2}\) represent the t-dimensional feature vectors of item i and item j compressed by the SDAE.

Table 1. Item-Attribute Sheet

3.3 Prediction of Comprehensive Score

This paper uses a neighborhood-based rating prediction algorithm, which first makes a user-based prediction. First, formula (8) is used to calculate the user similarity sim(u, v); the users are sorted by similarity to obtain the set of nearest neighbors of the target user, \(U_{u}=\{ U_{u1},U_{u2},\cdots ,U_{uk}\}\). Then user u's predicted rating \(Q_{u}\) for item i is:

$$\begin{aligned} Q_{u}=\frac{\sum _{v\in S(u,K)\cap N(i)}sim(u,v)(r_{vi}-\bar{r_{v}})}{\sum _{v\in S(u,K)\cap N(i)}|sim(u,v)|}+\bar{r_{u}} \end{aligned}$$
(10)

Where S(u, K) is the set of the K users whose interests are most similar to user u's, N(i) is the set of users who have rated item i, sim(u, v) is the similarity between users u and v, \(\bar{r_{u}}\) is the average of user u's ratings over all items u has rated, \(r_{vi}\) is user v's rating of item i, and \(\bar{r_{v}}\) is the average of user v's ratings over all items v has rated.

This paper also considers the similarity between items to predict ratings. The item-based prediction algorithm uses user u's ratings of other items similar to item i. User u's predicted rating \(Q_{I}\) for item i is:

$$\begin{aligned} Q_{I}=\frac{\sum _{j\in S(i,K)\cap N(u)}sim(i,j)(r_{uj}-\bar{r_{j}})}{\sum _{j\in S(i,K)\cap N(u)}|sim(i,j)|}+\bar{r_{i}} \end{aligned}$$
(11)

Where S(i, K) is the set of the K items most similar to item i, N(u) is the set of items that user u has rated, sim(i, j) is the similarity between items i and j, \(\bar{r_{j}}\) is the average rating of item j, and \(\bar{r_{i}}\) is the average rating of item i. After obtaining the predicted ratings from the two dimensions of user and item, the fused predicted rating is calculated as follows:

$$\begin{aligned} Q=\beta Q_{u}+(1-\beta )Q_{I} \end{aligned}$$
(12)

Where \(\beta \in \left[ 0,1\right] \) is the weight that balances the two predicted ratings and is tuned in the experiments.
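A sketch of the fused prediction of Eqs. (10)-(12) might look as follows; the neighbor selection and the fallbacks for users or items without co-ratings are illustrative assumptions.

```python
import numpy as np

def predict(R, sim_u, sim_i, u, i, K=80, beta=0.5):
    """R is the m x n rating matrix with 0 for missing ratings; sim_u and
    sim_i are the user and item similarity matrices from Eqs. (8)-(9)."""
    rated_by = R[:, i] > 0                 # N(i): users who rated item i
    rated_items = R[u, :] > 0              # N(u): items user u rated
    user_means = np.array([r[r > 0].mean() if (r > 0).any() else 0 for r in R])
    item_means = np.array([c[c > 0].mean() if (c > 0).any() else 0 for c in R.T])

    # Eq. (10): user-based prediction over the K most similar users
    nbrs = [v for v in np.argsort(-sim_u[u]) if v != u][:K]
    vs = [v for v in nbrs if rated_by[v]]  # S(u, K) intersected with N(i)
    if vs:
        num = sum(sim_u[u, v] * (R[v, i] - user_means[v]) for v in vs)
        q_u = user_means[u] + num / sum(abs(sim_u[u, v]) for v in vs)
    else:
        q_u = user_means[u]

    # Eq. (11): item-based prediction over the K most similar items
    nbrs = [j for j in np.argsort(-sim_i[i]) if j != i][:K]
    js = [j for j in nbrs if rated_items[j]]  # S(i, K) intersected with N(u)
    if js:
        num = sum(sim_i[i, j] * (R[u, j] - item_means[j]) for j in js)
        q_i = item_means[i] + num / sum(abs(sim_i[i, j]) for j in js)
    else:
        q_i = item_means[i]

    return beta * q_u + (1 - beta) * q_i   # Eq. (12): fused prediction
```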

4 Experiments and Analysis

4.1 Datasets

In this paper, the MovieLens dataset is adopted to validate the recommendation algorithms. The dataset comes in three scales; we employ the 1M scale, which includes 6040 users, 3883 movies, and 1000209 ratings. Each rating record consists of a user ID, a movie ID, the rating, and a timestamp. In addition, the movie information includes the name and category of each movie, and the user information includes gender, age, and occupation. In the experiments, we use 80% of the dataset as the training set and the remaining 20% as the test set.
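A minimal loader for this setup might look as follows; the `::` field separator follows the public MovieLens 1M format, while the file path and the random split are assumptions.

```python
import numpy as np

def load_ratings(path="ml-1m/ratings.dat", train_frac=0.8, seed=0):
    """Load MovieLens-1M ratings and split 80%/20% into train/test."""
    with open(path, encoding="latin-1") as f:
        rows = [line.strip().split("::")[:3] for line in f]
    data = np.array(rows, dtype=int)          # columns: user, movie, rating
    idx = np.random.default_rng(seed).permutation(len(data))
    cut = int(train_frac * len(data))
    return data[idx[:cut]], data[idx[cut:]]   # (train set, test set)
```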

4.2 Evaluation Metrics

We take the precision rate and recall rate of the recommendation system as the evaluation metrics [20]. The precision rate and recall rate are defined in (13) and (14), respectively:

$$\begin{aligned} Precision=\frac{\sum _{u}|R(U)\bigcap T(U)|}{\sum _{u}|R(U)|} \end{aligned}$$
(13)
$$\begin{aligned} Recall=\frac{\sum _{u}|R(U)\bigcap T(U)|}{\sum _{u}|T(U)|} \end{aligned}$$
(14)

Where R(U) is the list of items recommended to the user based on the user's behavior on the training set, and T(U) is the list of the user's behaviors on the test set.
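A direct implementation of Eqs. (13) and (14) over per-user recommendation and test lists might look like this sketch:

```python
def precision_recall(rec, truth):
    """rec[u] is the recommendation list R(U) for user u; truth[u] is the
    user's test-set list T(U). Both are dicts keyed by the same users."""
    hits = sum(len(set(rec[u]) & set(truth[u])) for u in rec)
    precision = hits / sum(len(rec[u]) for u in rec)   # Eq. (13)
    recall = hits / sum(len(truth[u]) for u in truth)  # Eq. (14)
    return precision, recall
```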

As shown in Table 2, the traditional user-based, item-based, AE, and SDAE schemes are chosen for comparative analysis with our proposed algorithm (SDAE-U-I).

Table 2. Comparison between models

4.3 Results Analysis

Figure 3 shows the recall rate as a function of the weight \(\beta \). It can be seen from the figure that the recall rate is best when \(\beta \) is between 0.4 and 0.6; in this paper we set the weight \(\beta \) to 0.5. When \(\beta = 0\), the algorithm predicts ratings based only on the hidden features of the item learned by SDAE; when \(\beta = 1\), it predicts ratings based only on the hidden features of the user learned by SDAE.

Fig. 3. Effect of the parameter \(\beta \) on the recall rate

Figure 4 and Fig. 5 show the precision and recall rates of the SDAE-U-I algorithm and the other five algorithms under different numbers of neighbors. It can be seen from the figures that there is no linear relationship between the number of nearest neighbors and the recall rate of the recommendation results; the best number of nearest neighbors is between 80 and 100. Compared with user-based, item-based, and AE, the recall and precision rates of SDAE, SDAE-U, and SDAE-I are significantly improved, indicating that the feature extraction of the deep network is better than that of the shallow models and improves the quality of the recommendation system. In addition, compared with the AE, SDAE-U, and SDAE-I models, SDAE-U-I improves both precision and recall. When recommending an item list of the same length, SDAE-U-I achieves higher precision and more accurate results, which shows that the cold start problem is alleviated. Moreover, the results indicate that the user and item features learned by the deep network represent users and items better. Compared with recommendation algorithms that only consider one dimension, users or items, the recall and precision rates are improved, and so is the recommendation effect.

Fig. 4. Precision rate of each algorithm under different numbers of neighbors

Fig. 5. Recall rate of each algorithm under different numbers of neighbors

5 Conclusion and Future Studies

This paper proposes an improved collaborative filtering algorithm based on Stacked Denoising AutoEncoders. The information matrices of users and items are trained by two Stacked Denoising AutoEncoders, and the hidden feature vectors of both users and items are considered, which equips the proposed algorithm with the ability to recommend for new users and new items. The experimental results show that, compared with the traditional methods, the precision and recall rates of the proposed algorithm are higher, and the issues of data sparseness and the cold start of new items and new users are alleviated to some extent. In addition, it can be seen that deep neural networks extract features better than shallow models. However, a lot of time is spent on data preprocessing, which needs to be improved. As the volume of user and item data grows, how to optimize the computational efficiency of the recommendation algorithm and achieve real-time recommendation will be the focus of future research.