Abstract
Factorization Machines (FM) exploit second-order feature interactions, and some researchers combine FM with deep learning to capture high-order interactions. However, these models rely on negative sampling. ENSFM adopts non-sampling training and achieves fine results, but it does not consider high-order interactions. In this paper, we add high-order interactions to ENSFM. We also introduce a technique called Order-aware Embedding. Extensive experimental results demonstrate the effectiveness of our model.
Keywords
- Context-aware recommendation
- Factorization machines
- Non-sampling
- High-order interactions
- Order-aware embedding
1 Introduction
ENSFM [1] achieves non-sampling training, but considers only second-order interactions. In this paper, we continue to use non-sampling training, and on this basis add third-order and fourth-order interactions. We also observe that using a shared embedding across orders may cause some problems. Therefore, we adopt a technique called Order-aware Embedding to solve them. Its main idea is to apply a different embedding to each order of feature interaction.
The main contributions of this work are summarized as follows: (1) We argue that high-order interactions have an important influence on performance, so third-order and fourth-order interactions are added. (2) We believe that using a shared embedding makes the learned feature interactions less effective, so we adopt Order-aware Embedding.
2 Preliminaries
2.1 Factorization Machines (FM)
FM is a machine learning algorithm based on matrix factorization (MF). The model represents the weight of each feature with a low-dimensional dense vector. The numbers of user features and item features are denoted by m and n, respectively. By using the factorized parameters, FM captures all pairwise interactions between features:
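The equation body did not survive in the source; the standard FM prediction function [9], written in the notation above, is:

```latex
\hat{y}_{FM}(\mathbf{x}) = w_{0} + \sum_{i=1}^{m+n} w_{i} x_{i}
  + \sum_{i=1}^{m+n} \sum_{j=i+1}^{m+n} \langle \mathbf{v}_{i}, \mathbf{v}_{j} \rangle\, x_{i} x_{j}
```

where \(w_{0}\) is the global bias, \(w_{i}\) the first-order weights, and \(\mathbf{v}_{i}\) the factorized embedding of feature i.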
2.2 Efficient Non-sampling Matrix Factorization
Although the performance of non-sampling matrix factorization is excellent, its shortcoming is also obvious: inefficiency. To address this problem, researchers have proposed some effective solutions [2, 11, 12].
Theorem 1
A generalized matrix factorization whose prediction function is:
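Restated from ENSFM [1] (our reconstruction, with \(\mathbf{h}\) denoting the weight vector of the prediction layer):

```latex
\hat{y}(u, v) = \mathbf{h}^{T}\left(\mathbf{p}_{u} \odot \mathbf{q}_{v}\right)
```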
where \(\mathbf {p}_{u}\) and \(\mathbf {q}_{v}\) are the representation vectors of the user and item, respectively, and \(\odot \) denotes the element-wise product of two vectors. Its loss function is:
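A reconstruction of the whole-data weighted regression loss, following [1, 3]:

```latex
L(\Theta) = \sum_{u \in \mathbf{B}} \sum_{v \in \mathbf{V}} c_{uv}\left(y_{uv} - \hat{y}_{uv}\right)^{2}
```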
where \(c_{u v}\) is the weight of sample \(y_{u v}\). It is completely equivalent to that of:
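The equivalent efficient loss, as we reconstruct it from [1, 3] (dropping a constant independent of \(\Theta\); \(c^{+}\) and \(c^{0}\) are the weights of observed and missing entries, and k is the embedding size):

```latex
\tilde{L}(\Theta) = \sum_{u \in \mathbf{B}} \sum_{v \in \mathbf{V}^{+}}
  \left[\left(c^{+} - c^{0}\right)\hat{y}_{uv}^{2} - 2\,c^{+}\hat{y}_{uv}\right]
  + \sum_{d=1}^{k} \sum_{d'=1}^{k}
  \left[h_{d}\, h_{d'}
  \Big(\sum_{u \in \mathbf{B}} p_{u,d}\, p_{u,d'}\Big)
  \Big(c^{0} \sum_{v \in \mathbf{V}} q_{v,d}\, q_{v,d'}\Big)\right]
```

The second term sums over embedding dimensions rather than over all user-item pairs, which is what makes whole-data training tractable.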
3 Our Model—ONFM
3.1 Overview
Using the FM form, ONFM is expressed as:
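A plausible reconstruction of Eq. (5), consistent with the global weight and first-order terms discussed in Sect. 3.2:

```latex
\hat{y}_{ONFM}(\mathbf{x}) = w_{0} + \sum_{i} w_{i} x_{i}
  + f_{2}(\mathbf{x}) + f_{3}(\mathbf{x}) + f_{4}(\mathbf{x})
```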
where \(f_{2}(x)\), \(f_{3}(x)\) and \(f_{4}(x)\) denote the second-order, third-order and fourth-order interaction terms, respectively. Figure 1 shows the composition structure of ONFM.
There are five layers: Input, Order-aware Embedding, Feature Pooling, Fully-connected and Output. The Input layer takes high-dimensional sparse vectors obtained by one-hot encoding. These sparse vectors must be converted into low-dimensional dense vectors, which is the role of the Order-aware Embedding layer. After Order-aware Embedding processing, we obtain three different sets of low-dimensional dense vectors, one per interaction order. The embedding vectors of feature i for the different orders can be formulated as in [5]:
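As a minimal sketch of the idea (names and sizes hypothetical, not the paper's code): instead of one table shared by all orders, we keep one independent embedding table per interaction order and look features up in each.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, dim = 10, 4   # toy sizes; the paper uses an embedding size of 64

# One independent embedding table per interaction order (2nd, 3rd, 4th),
# instead of a single table shared across all orders.
order_tables = {order: rng.normal(scale=0.1, size=(n_features, dim))
                for order in (2, 3, 4)}

def order_aware_lookup(feature_ids):
    """Map one-hot feature indices to one set of dense vectors per order."""
    return {order: table[feature_ids] for order, table in order_tables.items()}

embs = order_aware_lookup([1, 3, 7])   # three active features -> three sets of vectors
```

Each order then builds its interactions from its own vectors, so the parameters of different orders no longer interfere with one another.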
These low-dimensional dense vectors then enter Feature Pooling for feature interaction processing. The goal of this layer is to rewrite the FM model in Eq. (5) as a completely equivalent generalized MF form:
where \(\mathbf {p}_{u}\) and \(\mathbf {q}_{v}\) are two vectors obtained by Feature Pooling. They depend only on the corresponding user and item, not on the objects they interact with.
Finally, the two vectors are fed into the Fully-connected layer, which outputs the final prediction \(\hat{y}_{FM}\), representing user u's preference for item v.
3.2 ONFM Theoretical Analysis
The basic theory behind ONFM is that the FM model incorporating high-order interactions in Eq. (5) can be transformed into the MF form in Eq. (10). We now prove this claim.
Recalling Eq. (5), its three interaction parts \(f_{2}(x)\), \(f_{3}(x)\) and \(f_{4}(x)\) can be transformed into the following form:
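The transformation rests on expressing sums over distinct pairs, triples and quadruples of embedding entries through power sums (Newton's identities), applied per embedding dimension. A numerical check of these identities on hypothetical toy values:

```python
import itertools
import numpy as np

a = np.array([0.5, -1.0, 2.0, 0.3, 1.5])  # entries of one embedding dimension (toy values)
idx = range(len(a))

# Brute-force interaction sums over distinct index tuples i < j < k < l.
f2 = sum(a[i] * a[j] for i, j in itertools.combinations(idx, 2))
f3 = sum(a[i] * a[j] * a[k] for i, j, k in itertools.combinations(idx, 3))
f4 = sum(a[i] * a[j] * a[k] * a[l] for i, j, k, l in itertools.combinations(idx, 4))

# Closed forms via power sums p_r = sum_i a_i^r (Newton's identities):
p1, p2, p3, p4 = ((a ** r).sum() for r in (1, 2, 3, 4))
f2_fast = (p1 ** 2 - p2) / 2
f3_fast = (p1 ** 3 - 3 * p1 * p2 + 2 * p3) / 6
f4_fast = (p1 ** 4 - 6 * p1 ** 2 * p2 + 3 * p2 ** 2 + 8 * p1 * p3 - 6 * p4) / 24

assert np.isclose(f2, f2_fast)
assert np.isclose(f3, f3_fast)
assert np.isclose(f4, f4_fast)
```

The closed forms replace the combinatorial sums with a few power sums, which is what allows each order to be pooled into per-user and per-item vectors.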
Next, we will describe the training process of ONFM in detail by constructing three auxiliary vectors—\(\mathbf {p}_{u}\), \(\mathbf {q}_{v}\) and \(\mathbf {h}_{aux}\):
The auxiliary vector \(\mathbf {p}_{u}\) is calculated by module-1, whose input is multiple sets of user feature embedding vectors. The first six elements share a unified format, \(p_{u, d}^{x,y}\), and are used for the user-item feature interactions. The last two elements correspond to the global weight, the first-order features and the self feature interactions. Each part takes the following form.
The second auxiliary vector \(\mathbf {q}_{v}\) is similar to the first, with the element positions adjusted accordingly. For the third auxiliary vector \(\mathbf {h}_{aux}\): \(h_{aux, d}^{2}=h_{2}\), \(h_{aux, d}^{4}=h_{4}\), \(h_{aux, d}^{6}=h_{6}\), \(h_{aux, d}^{7}=h_{7}\).
3.3 Optimization Method
Now we introduce some optimization methods for ONFM. First, the theorem proved above shows that FM can be expressed through two uncorrelated vectors, \(\mathbf {p}_{u}\) and \(\mathbf {q}_{v}\); by precomputing these vectors, we greatly improve training efficiency. Second, after transforming FM into generalized MF, the prediction function of ONFM satisfies the requirements of Theorem 1, so the loss function proposed there is available for ONFM:
where \(\mathbf {B}\) indicates a batch of users, and \(\mathbf {V}\) indicates all items.
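The equivalence underlying this loss can be illustrated with a toy check (hypothetical sizes and uniform weights \(c_{0}\), \(c_{1}\); not the paper's code): the naive loss over all user-item pairs matches a form whose expensive part collapses into sums over embedding dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 6, 3
P = rng.normal(size=(n_users, dim))   # user vectors p_u
Q = rng.normal(size=(n_items, dim))   # item vectors q_v
h = rng.normal(size=dim)              # prediction-layer weights
Y = (rng.random((n_users, n_items)) < 0.3).astype(float)  # implicit feedback
c0, c1 = 0.05, 1.0                    # weights for missing / observed entries

pred = np.einsum('ud,vd,d->uv', P, Q, h)   # y_hat_uv = h^T (p_u * q_v)

# Naive whole-data loss: sum over ALL user-item pairs of c_uv (y_uv - y_hat_uv)^2.
C = np.where(Y > 0, c1, c0)
naive = (C * (Y - pred) ** 2).sum()

# Efficient form: the c0 part over all pairs collapses into sums over embedding
# dimensions (no user-item loop); only observed entries need correction terms.
all_pairs = c0 * np.einsum('ud,ue,vd,ve,d,e->', P, P, Q, Q, h, h)
obs = Y > 0
efficient = all_pairs + ((c1 - c0) * pred[obs] ** 2
                         - 2 * c1 * pred[obs] + c1).sum()

assert np.isclose(naive, efficient)
```

The `all_pairs` term costs O(k^2) per batch regardless of the number of items, which is the source of the efficiency of non-sampling training.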
4 Experiments
4.1 Experimental Settings
Datasets. The two publicly available datasets used are Frappe and Last.fm. Frappe contains 957 users, 4082 items, 5382 features and 96203 instances; Last.fm contains 1000 users, 20301 items, 37358 features and 214574 instances.
Baselines. We compare ONFM with the following baselines:
- PopRank: This model returns the Top-k most popular items.
- FM [9]: The original factorization machines.
- NFM [6]: Neural Factorization Machine uses an MLP to learn nonlinear and high-order interaction signals.
- DeepFM [4]: This model combines FM and an MLP to make recommendations.
- ONCF [7]: This model improves MF with the outer product.
- CFM [10]: Convolutional Factorization Machine uses 3D convolution to capture high-order interactions between features.
- ENMF [3]: Efficient Neural Matrix Factorization uses a non-sampling neural recommendation method to generate recommendations.
- ENSFM [1]: Efficient Non-Sampling Factorization Machines conducts non-sampling training by transforming FM into MF.
Evaluation Protocols and Metrics. ONFM adopts the leave-one-out evaluation protocol [8, 10] to test its performance. For Frappe, which has no timestamps, we randomly choose one transaction as the test example for each specific user context. For Last.fm, the latest transaction of each user is held out for testing and the rest is treated as the training set. The evaluation metrics are Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG).
Parameter Settings. In ONFM, the weight of all missing data is uniformly set to \(c_{0}\), the batch size is set to 512, the embedding size to 64, the learning rate to 0.05, and the dropout ratio to 0.9. \(c_{0}\) is set to 0.05 for Frappe and 0.005 for Last.fm.
4.2 Performance Comparison
Table 1 summarizes the best performance of these models on Frappe and Last.fm. To evaluate different recommendation list lengths, we set K = 5, 10 and 20 in our experiments. The experimental results show that our model achieves the best performance on both datasets with respect to both HR and NDCG. ONFM-1 adds third-order interactions between features on top of ENSFM; note that ONFM-1 still uses a shared embedding. Its improvement over ENSFM indicates the effectiveness of third-order interactions. On the basis of ONFM-1, ONFM-2 introduces the Order-aware Embedding technique; the further improvement indicates that order-aware embedding is the better choice. ONFM-3 is the final form of our model, adding both third-order and fourth-order interactions while also using Order-aware Embedding. Compared with ENSFM, the performance of ONFM-3 is excellent.
5 Conclusion and Future Work
In this paper, we propose a novel model named Order-Aware Embedding Non-Sampling Factorization Machines (ONFM). Its key design is to transform an FM model incorporating high-order interactions into a MF form through mathematical transformation, yielding three auxiliary vectors \(\mathbf {p}_{u}\), \(\mathbf {q}_{v}\) and \(\mathbf {h}_{aux}\), where \(\mathbf {p}_{u}\) and \(\mathbf {q}_{v}\) depend only on the corresponding user and item. We also use Order-aware Embedding. Finally, through the optimization methods above, we train ONFM without sampling. Extensive experiments on two datasets demonstrate that ONFM successfully captures effective feature information.
Although the results of ONFM illustrate the importance of high-order interactions, our way of calculating them is crude. In the future, we will design a better method to compute high-order interactions. Moreover, different feature interactions contribute differently to the accuracy of the final prediction, so to better extract feature information we are also interested in applying an attention mechanism to our model.
References
Chen, C., Zhang, M., Ma, W., Liu, Y., Ma, S.: Efficient non-sampling factorization machines for optimal context-aware recommendation. In: WWW 2020
Chen, C., Zhang, M., Wang, C., Ma, W., Li, M., Liu, Y., Ma, S.: An efficient adaptive transfer neural network for social-aware recommendation. In: SIGIR 2019
Chen, C., Zhang, M., Zhang, Y., Liu, Y., Ma, S.: Efficient neural matrix factorization without sampling for recommendation. ACM Trans. Inf. Syst. 38(2), 14:1–14:28 (2020)
Guo, H., Tang, R., Ye, Y., Li, Z., He, X.: DeepFM: a factorization-machine based neural network for CTR prediction. In: IJCAI 2017
Guo, W., Tang, R., Guo, H., Han, J., Yang, W., Zhang, Y.: Order-aware embedding neural network for CTR prediction. In: SIGIR 2019
He, X., Chua, T.: Neural factorization machines for sparse predictive analytics. In: SIGIR 2017
He, X., Du, X., Wang, X., Tian, F., Tang, J., Chua, T.: Outer product-based neural collaborative filtering. In: IJCAI 2018
He, X., Liao, L., Zhang, H., Nie, L., Hu, X., Chua, T.: Neural collaborative filtering. In: WWW 2017
Rendle, S.: Factorization machines. In: ICDM 2010
Xin, X., Chen, B., He, X., Wang, D., Ding, Y., Jose, J.: CFM: convolutional factorization machines for context-aware recommendation. In: IJCAI 2019
Xin, X., Yuan, F., He, X., Jose, J.M.: Batch IS NOT heavy: Learning word representations from all samples. In: ACL 2018
Yuan, F., Xin, X., He, X., Guo, G., Zhang, W., Chua, T., Joemon, J.M.: f\({}_{\text{bgd}}\): Learning embeddings from positive unlabeled data with BGD. In: UAI 2018
© 2020 Springer Nature Switzerland AG
Hou, Q. et al. (2020). Order-Aware Embedding Non-sampling Factorization Machines for Context-Aware Recommendation. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_89