
1 Introduction

ENSFM [1] achieves non-sampling training, but it only considers the second-order interactions. In this paper, we keep the non-sampling strategy and, on this basis, add the third-order and fourth-order interactions. We also observe that sharing one embedding across all orders may cause problems, so we adopt a technique called Order-aware Embedding, whose main idea is to use a separate embedding for each interaction order.

The main contributions of this work are summarized as follows: (1) We argue that the high-order interactions have an important influence on performance, so we add the third-order and fourth-order interactions. (2) We believe that sharing one embedding across orders makes the learned feature interactions less effective, so we adopt Order-aware Embedding.

2 Preliminaries

2.1 Factorization Machines (FM)

FM is a machine learning model based on MF. It represents each feature with a low-dimensional dense vector. The numbers of user features and item features are denoted by m and n, respectively. Using these factorized parameters, FM captures all pairwise interactions between features:

$$\begin{aligned} \hat{y}_{F M}(x)=w_{0}+\sum _{i=1}^{m+n} w_{i} x_{i}+\sum _{i=1}^{m+n} \sum _{j=i+1}^{m+n} e_{i}^{T} e_{j} \cdot x_{i} x_{j} \end{aligned}$$
(1)
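
For concreteness, the following minimal NumPy sketch (our illustration, not code from the paper; all names are our own) evaluates Eq. (1) for a single input, using the standard reformulation of the pairwise term that avoids the explicit double sum.

```python
import numpy as np

def fm_predict(x, w0, w, E):
    """Evaluate Eq. (1): bias + first-order terms + all pairwise interactions.

    x : (m+n,) feature values, w0 : scalar bias,
    w : (m+n,) first-order weights, E : (m+n, d) rows are the embeddings e_i.
    """
    linear = w0 + w @ x
    xe = E * x[:, None]  # rows x_i * e_i
    # sum_{i<j} (e_i^T e_j) x_i x_j = 0.5 * (||sum_i x_i e_i||^2 - sum_i ||x_i e_i||^2)
    pairwise = 0.5 * (np.sum(xe.sum(axis=0) ** 2) - np.sum(xe ** 2))
    return linear + pairwise

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=6).astype(float)          # toy binary feature vector
print(fm_predict(x, 0.1, rng.normal(size=6), rng.normal(size=(6, 4))))
```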

2.2 Efficient Non-sampling Matrix Factorization

Although non-sampling matrix factorization performs well, its drawback is obvious: inefficiency. To address this problem, researchers have proposed several effective solutions [2, 11, 12].

Theorem 1

A generalized matrix factorization whose prediction function is:

$$\begin{aligned} \hat{y}_{u v}=\mathbf {h}^{T}\left( \mathbf {p}_{u} \odot \mathbf {q}_{v}\right) \end{aligned}$$
(2)

where \(\mathbf {p}_{u}\) and \(\mathbf {q}_{v}\) are the representation vectors of the user and the item, respectively, and \(\odot \) denotes the element-wise product of two vectors. Its loss function is:

$$\begin{aligned} \mathcal {L}(\varTheta )=\sum _{u \in \mathbf {U}} \sum _{v \in \mathbf {V}} c_{u v}\left( y_{u v}-\hat{y}_{u v}\right) ^{2} \end{aligned}$$
(3)

where \(c_{uv}\) is the weight of sample \(y_{uv}\), and the weight of each missing entry is assumed to depend only on the item, i.e. \(c_{uv}^{-}=c_{v}^{-}\). This loss is completely equivalent to:

$$\begin{aligned} \begin{aligned} \tilde{\mathcal {L}}(\varTheta )&=\sum _{u \in \mathrm {U}} \sum _{v \in \mathrm {V}^{+}}\left( \left( c_{uv}^{+}-c_{uv}^{-}\right) \hat{y}_{u v}^{2}-2 c_{uv}^{+} \hat{y}_{u v}\right) \\&+\sum _{i=1}^{d} \sum _{j=1}^{d}\left( \left( h_{i} h_{j}\right) \left( \sum _{u \in \mathrm {U}} p_{u, i} p_{u, j}\right) \left( \sum _{v \in \mathrm {V}} c_{uv}^{-} q_{v, i} q_{v, j}\right) \right) \end{aligned} \end{aligned}$$
(4)
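
To see why Eq. (4) is efficient, note that its second line only needs d x d Gram matrices of the user and item vectors. The sketch below is our own NumPy illustration of that term (assuming, as stated above, that the missing-data weight depends only on the item); its cost is O((|U| + |V|) d^2) rather than O(|U||V| d).

```python
import numpy as np

def efficient_negative_term(P, Q, h, c_neg):
    """Second line of Eq. (4): sum_{i,j} h_i h_j (sum_u p_ui p_uj)(sum_v c_v^- q_vi q_vj).

    P : (num_users, d) user vectors, Q : (num_items, d) item vectors,
    h : (d,) prediction weights, c_neg : (num_items,) missing-data weights c_v^-.
    """
    user_gram = P.T @ P                        # (d, d): sum_u p_{u,i} p_{u,j}
    item_gram = Q.T @ (Q * c_neg[:, None])     # (d, d): sum_v c_v^- q_{v,i} q_{v,j}
    return np.sum(np.outer(h, h) * user_gram * item_gram)
```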

3 Our Model—ONFM

3.1 Overview

Using the FM form, ONFM is expressed as:

$$\begin{aligned} \hat{y}_{F M}(x)=w_{0}+\sum _{i=1}^{m+n} w_{i} x_{i}+f_{2}(x)+f_{3}(x)+f_{4}(x) \end{aligned}$$
(5)
$$\begin{aligned} f_{2}(x)=h_{r}^{T} \sum _{i=1}^{m+n} \sum _{j=i+1}^{m+n}\left( x_{i} e_{i}^{2} \odot x_{j} e_{j}^{2}\right) \end{aligned}$$
(6)
$$\begin{aligned} f_{3}(x)=h_{s}^{T} \sum _{i=1}^{m+n} \sum _{j=i+1}^{m+n} \sum _{k=j+1}^{m+n}\left( x_{i} e_{i}^{3} \odot x_{j} e_{j}^{3} \odot x_{k} e_{k}^{3}\right) \end{aligned}$$
(7)
$$\begin{aligned} f_{4}(x)=h_{t}^{T} \sum _{i=1}^{m+n} \sum _{j=i+1}^{m+n} \sum _{k=j+1}^{m+n} \sum _{l=k+1}^{m+n}\left( x_{i} e_{i}^{4} \odot x_{j} e_{j}^{4} \odot x_{k} e_{k}^{4} \odot x_{l} e_{l}^{4}\right) \end{aligned}$$
(8)

where \(f_{2}(x)\), \(f_{3}(x)\) and \(f_{4}(x)\) denote the second-order, third-order and fourth-order interaction terms, respectively, and the superscript of \(e_{i}^{j}\) indicates the order-specific embedding of feature i. Figure 1 shows the overall structure of ONFM.
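
The definitions above can be read directly as a brute-force procedure. The sketch below (our own illustrative NumPy code; it enumerates all feature tuples and is exactly what ONFM later avoids) evaluates Eqs. (5)-(8) for one input, treating h_r, h_s and h_t as d-dimensional weight vectors applied through an inner product.

```python
import numpy as np
from itertools import combinations

def onfm_naive(x, E2, E3, E4, h_r, h_s, h_t, w0, w):
    """Naive evaluation of Eqs. (5)-(8) by enumerating feature pairs, triples and quadruples.

    x : (F,) feature values, w0 : bias, w : (F,) first-order weights,
    E2/E3/E4 : (F, d) order-specific embeddings, h_r/h_s/h_t : (d,) weight vectors.
    """
    F, d = E2.shape
    v2 = [x[i] * E2[i] for i in range(F)]
    v3 = [x[i] * E3[i] for i in range(F)]
    v4 = [x[i] * E4[i] for i in range(F)]
    y = w0 + w @ x
    y += h_r @ sum((v2[i] * v2[j] for i, j in combinations(range(F), 2)), np.zeros(d))
    y += h_s @ sum((v3[i] * v3[j] * v3[k]
                    for i, j, k in combinations(range(F), 3)), np.zeros(d))
    y += h_t @ sum((v4[i] * v4[j] * v4[k] * v4[l]
                    for i, j, k, l in combinations(range(F), 4)), np.zeros(d))
    return y
```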

There are five layers: Input, Order-aware Embedding, Feature Pooling, Fully-connected and Output. The Input layer receives high-dimensional sparse vectors obtained by one-hot encoding, which need to be converted into low-dimensional dense vectors. This is the role of the Order-aware Embedding layer: it produces three different sets of low-dimensional dense vectors, one for each interaction order. The embedding vector of feature i for order j can be formulated as [5]:

$$\begin{aligned} e_{i}^{j}=W_{i}^{j} X\left[ \mathrm {start}_{i}: \mathrm {end}_{i}\right] \end{aligned}$$
(9)
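
A minimal sketch of how such an order-aware lookup could be implemented (our own illustration of Eq. (9); the class and parameter names are ours): each order keeps its own embedding table, so the same feature is mapped to different vectors for the second-, third- and fourth-order interactions.

```python
import numpy as np

class OrderAwareEmbedding:
    """One embedding table per interaction order, in the spirit of Eq. (9)."""

    def __init__(self, num_features, dim, orders=(2, 3, 4), seed=0):
        rng = np.random.default_rng(seed)
        self.tables = {j: rng.normal(scale=0.01, size=(num_features, dim))
                       for j in orders}

    def lookup(self, feature_ids):
        """Return {order: (len(feature_ids), dim)} embeddings of the active features."""
        return {j: table[feature_ids] for j, table in self.tables.items()}

# Usage: the same active features receive a different vector set for each order.
emb = OrderAwareEmbedding(num_features=10, dim=4)
vectors = emb.lookup([1, 3, 7])
print(vectors[2].shape, vectors[3].shape, vectors[4].shape)   # (3, 4) three times
```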

Then these low-dimensional dense vectors enter the Feature Pooling layer for feature interaction processing. The goal of this layer is to reconstruct the FM model in Eq. (5) into a completely equivalent generalized MF form:

$$\begin{aligned} \hat{y}_{F M}(\mathbf {x})=\mathbf {h}_{a u x}^{T}\left( \mathbf {p}_{u} \odot \mathbf {q}_{v}\right) \end{aligned}$$
(10)

where \(\mathbf {p}_{u}\), \(\mathbf {q}_{v}\) are the two vectors produced by Feature Pooling. They depend only on the corresponding user and item, not on the objects they interact with.

Finally, the two vectors are fed into the Fully-connected layer, which produces the final prediction \(\hat{y}_{F M}\), representing user u's preference for item v.

Fig. 1. The overall framework of ONFM.

3.2 ONFM Theoretical Analysis

The basic theory of ONFM is that the FM model incorporating the high-order interactions in Eq. (5) can be transformed into the MF form in Eq. (10). We now prove the correctness of this claim.

Recalling Eq. (5), we consider its three interaction parts \(f_{2}(x)\), \(f_{3}(x)\) and \(f_{4}(x)\), which can be rewritten in the following forms:

$$\begin{aligned} \begin{aligned} f_{2}(x)&=h_{1}^{T}\left( \sum _{i=1}^{m} \sum _{j=i+1}^{m}\left( x_{i}^{u} e_{i}^{u,2} \odot x_{j}^{u} e_{j}^{u,2}\right) +\sum _{i=1}^{n} \sum _{j=i+1}^{n}\left( x_{i}^{v} e_{i}^{v,2} \odot x_{j}^{v} e_{j}^{v,2}\right) \right) \\ {}&+h_{2}^{T}\left( \sum _{i=1}^{m} x_{i}^{u} e_{i}^{u,2} \odot \sum _{i=1}^{n} x_{i}^{v} e_{i}^{v,2}\right) \end{aligned} \end{aligned}$$
(11)
$$\begin{aligned} \begin{aligned} f_{3}(x)&=h_{3}^{T}(a+b)+h_{4}^{T}\left( \sum _{i=1}^{m} \sum _{j=i+1}^{m}\left( x_{i}^{u} e_{i}^{u, 3} \odot x_{j}^{u} e_{j}^{u, 3}\right) \odot \sum _{i=1}^{n} x_{i}^{v} e_{i}^{v, 3}\right) \\&+h_{4}^{T}\left( \sum _{i=1}^{m} x_{i}^{u} e_{i}^{u, 3} \odot \sum _{i=1}^{n} \sum _{j=i+1}^{n}\left( x_{i}^{v} e_{i}^{v, 3} \odot x_{j}^{v} e_{j}^{v, 3}\right) \right) \end{aligned}\end{aligned}$$
(12)
$$\begin{aligned} a=\sum _{i=1}^{m} \sum _{j=i+1}^{m} \sum _{k=j+1}^{m}\left( x_{i}^{u} e_{i}^{u,3} \odot x_{j}^{u} e_{j}^{u,3} \odot x_{k}^{u} e_{k}^{u,3}\right) \end{aligned}$$
(13)
$$\begin{aligned} b=\sum _{i=1}^{n} \sum _{j=i+1}^{n} \sum _{k=j+1}^{n}\left( x_{i}^{v} e_{i}^{v,3} \odot x_{j}^{v} e_{j}^{v,3} \odot x_{k}^{v} e_{k}^{v,3}\right) \end{aligned}$$
(14)
$$\begin{aligned} \begin{aligned} f_{4}(x)&=h_{5}^{T}(c+d) \\ {}&+h_{6}^{T}\left( \sum _{i=1}^{m} \sum _{j=i+1}^{m} \sum _{k=j+1}^{m}\left( x_{i}^{u} e_{i}^{u,4} \odot x_{j}^{u} e_{j}^{u,4} \odot x_{k}^{u} e_{k}^{u,4}\right) \odot \sum _{i=1}^{n} x_{i}^{v} e_{i}^{v,4}\right) \\ {}&+h_{6}^{T}\left( \sum _{i=1}^{m} x_{i}^{u} e_{i}^{u,4} \odot \sum _{i=1}^{n} \sum _{j=i+1}^{n} \sum _{k=j+1}^{n}\left( x_{i}^{v} e_{i}^{v,4} \odot x_{j}^{v} e_{j}^{v,4} \odot x_{k}^{v} e_{k}^{v,4}\right) \right) \\ {}&+h_{7}^{T}\left( \sum _{i=1}^{m} \sum _{j=i+1}^{m}\left( x_{i}^{u} e_{i}^{u,4} \odot x_{j}^{u} e_{j}^{u,4}\right) \odot \sum _{i=1}^{n} \sum _{j=i+1}^{n}\left( x_{i}^{v} e_{i}^{v,4} \odot x_{j}^{v} e_{j}^{v,4}\right) \right) \end{aligned} \end{aligned}$$
(15)
$$\begin{aligned} c=\sum _{i=1}^{m} \sum _{j=i+1}^{m} \sum _{k=j+1}^{m} \sum _{l=k+1}^{m}\left( x_{i}^{u} e_{i}^{u,4} \odot x_{j}^{u} e_{j}^{u,4} \odot x_{k}^{u} e_{k}^{u,4} \odot x_{l}^{u} e_{l}^{u,4}\right) \end{aligned}$$
(16)
$$\begin{aligned} d=\sum _{i=1}^{n} \sum _{j=i+1}^{n} \sum _{k=j+1}^{n} \sum _{l=k+1}^{n}\left( x_{i}^{v} e_{i}^{v,4} \odot x_{j}^{v} e_{j}^{v,4} \odot x_{k}^{v} e_{k}^{v,4} \odot x_{l}^{v} e_{l}^{v,4}\right) \end{aligned}$$
(17)
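
These splits rely on the fact that every user-item cross term factorizes into a product of field-level sums, e.g. \(\sum _{i=1}^{m}\sum _{j=1}^{n}\left( x_{i}^{u} e_{i}^{u,2} \odot x_{j}^{v} e_{j}^{v,2}\right) =\sum _{i=1}^{m} x_{i}^{u} e_{i}^{u,2} \odot \sum _{j=1}^{n} x_{j}^{v} e_{j}^{v,2}\). The short numerical check below (our own illustration) verifies this identity for the second-order cross part of Eq. (11).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, d = 3, 4, 5
U = rng.normal(size=(m, d))   # rows x_i^u * e_i^{u,2} of the user's active features
V = rng.normal(size=(n, d))   # rows x_j^v * e_j^{v,2} of the item's active features

# cross part of Eq. (11): enumerate every user-item pair ...
cross_naive = sum(U[i] * V[j] for i in range(m) for j in range(n))
# ... or factor it into a product of two independent field-level sums
cross_factored = U.sum(axis=0) * V.sum(axis=0)
print(np.allclose(cross_naive, cross_factored))   # True
```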

Next, we will describe the training process of ONFM in detail by constructing three auxiliary vectors—\(\mathbf {p}_{u}\), \(\mathbf {q}_{v}\) and \(\mathbf {h}_{aux}\):

$$\begin{aligned} p_{u}=\left( \begin{array}{l} p_{u, d}^{2,1} \\ p_{u, d}^{3,1} \\ p_{u, d}^{3,2} \\ p_{u, d}^{4,1} \\ p_{u, d}^{4,2} \\ p_{u, d}^{4,3} \\ p_{u, 1} \\ \ 1 \end{array}\right) ; q_{v}=\left( \begin{array}{l} q_{v, d}^{2,1} \\ q_{v, d}^{3,2} \\ q_{v, d}^{3,1} \\ q_{v, d}^{4,3} \\ q_{v, d}^{4,2} \\ q_{v, d}^{4,1} \\ \ 1 \\ q_{v, 1} \end{array}\right) ; h_{aux}=\left( \begin{array}{l} h_{aux, d}^{2} \\ h_{aux, d}^{4} \\ h_{aux, d}^{4} \\ h_{aux, d}^{6} \\ h_{aux, d}^{7} \\ h_{aux, d}^{6} \\ \quad 1 \\ \quad 1 \end{array}\right) \end{aligned}$$
(18)

The auxiliary vector \(\mathbf {p}_{u}\) is calculated by module-1, whose inputs are multiple sets of user feature embedding vectors. The first six blocks share a unified format \(p_{u, d}^{x,y}\) and are used for the user-item feature interactions. The last two elements carry the global weight, the first-order terms and the user's self feature interactions. Each part is expressed as follows.

$$\begin{aligned} p_{u, d}^{2,1}=\sum _{i=1}^{m} x_{i}^{u} e_{i}^{u, 2} \end{aligned}$$
(19)
$$\begin{aligned} p_{u, d}^{3,1}=\sum _{i=1}^{m} x_{i}^{u} e_{i}^{u,3} \end{aligned}$$
(20)
$$\begin{aligned} p_{u, d}^{3,2}=\sum _{i=1}^{m} \sum _{j=i+1}^{m}\left( x_{i}^{u} e_{i}^{u,3} \odot x_{j}^{u} e_{j}^{u,3}\right) \end{aligned}$$
(21)
$$\begin{aligned} p_{u, d}^{4,1}=\sum _{i=1}^{m} x_{i}^{u} e_{i}^{u,4} \end{aligned}$$
(22)
$$\begin{aligned} p_{u, d}^{4,2}=\sum _{i=1}^{m} \sum _{j=i+1}^{m}\left( x_{i}^{u} e_{i}^{u,4} \odot x_{j}^{u} e_{j}^{u,4}\right) \end{aligned}$$
(23)
$$\begin{aligned} p_{u, d}^{4,3}=\sum _{i=1}^{m} \sum _{j=i+1}^{m} \sum _{k=j+1}^{m}\left( x_{i}^{u} e_{i}^{u,4} \odot x_{j}^{u} e_{j}^{u,4} \odot x_{k}^{u} e_{k}^{u,4}\right) \end{aligned}$$
(24)
$$\begin{aligned} \begin{aligned} p_{u, 1}&=w_{0}+\sum _{i=1}^{m} w_{i}^{u} x_{i}^{u}+h_{1}^{T} \sum _{i=1}^{m} \sum _{j=i+1}^{m}\left( x_{i}^{u} e_{i}^{u, 2} \odot x_{j}^{u} e_{j}^{u, 2}\right) \\ {}&+h_{3}^{T} \sum _{i=1}^{m} \sum _{j=i+1}^{m} \sum _{k=j+1}^{m}\left( x_{i}^{u} e_{i}^{u, 3} \odot x_{j}^{u} e_{j}^{u, 3} \odot x_{k}^{u} e_{k}^{u, 3}\right) \\ {}&+h_{5}^{T} \sum _{i=1}^{m} \sum _{j=i+1}^{m} \sum _{k=j+1}^{m} \sum _{l=k+1}^{m}\left( x_{i}^{u} e_{i}^{u, 4} \odot x_{j}^{u} e_{j}^{u, 4} \odot x_{k}^{u} e_{k}^{u, 4} \odot x_{l}^{u} e_{l}^{u, 4}\right) \end{aligned} \end{aligned}$$
(25)

The second auxiliary vector \(\mathbf {q}_{v}\) is constructed similarly, with the element positions adjusted accordingly. For the third auxiliary vector \(\mathbf {h}_{aux}\), we set \(h_{aux, d}^{2}=h_{2}\), \(h_{aux, d}^{4}=h_{4}\), \(h_{aux, d}^{6}=h_{6}\) and \(h_{aux, d}^{7}=h_{7}\).
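
As a rough sketch of how module-1 could assemble \(\mathbf {p}_{u}\) from Eqs. (18)-(25) (our own illustration; the helper names and the naive pooled() enumeration are ours, and an efficient implementation would replace the enumeration with sum-of-products identities), the code below concatenates the six interaction blocks with the two scalar entries. \(\mathbf {q}_{v}\) and \(\mathbf {h}_{aux}\) would be assembled analogously, following the block order of Eq. (18).

```python
import numpy as np
from itertools import combinations

def pooled(V, k):
    """Sum of element-wise products over all size-k subsets of the rows of V."""
    d = V.shape[1]
    return sum((np.prod(V[list(idx)], axis=0)
                for idx in combinations(range(len(V)), k)), np.zeros(d))

def build_p_u(U2, U3, U4, w0, wx, h1, h3, h5):
    """Assemble p_u of Eq. (18) from the user's order-specific embeddings.

    U2/U3/U4 : (m, d) rows x_i^u * e_i^{u,j}; wx : (m,) first-order terms w_i^u x_i^u;
    h1/h3/h5 : (d,) weights of the user self interactions.
    """
    p_u1 = (w0 + wx.sum()
            + h1 @ pooled(U2, 2) + h3 @ pooled(U3, 3) + h5 @ pooled(U4, 4))   # Eq. (25)
    return np.concatenate([
        U2.sum(axis=0),     # p_{u,d}^{2,1}, Eq. (19)
        U3.sum(axis=0),     # p_{u,d}^{3,1}, Eq. (20)
        pooled(U3, 2),      # p_{u,d}^{3,2}, Eq. (21)
        U4.sum(axis=0),     # p_{u,d}^{4,1}, Eq. (22)
        pooled(U4, 2),      # p_{u,d}^{4,2}, Eq. (23)
        pooled(U4, 3),      # p_{u,d}^{4,3}, Eq. (24)
        [p_u1], [1.0],      # scalar tail of Eq. (18)
    ])
```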

3.3 Optimization Method

Now we introduce some optimization methods for ONFM. First, the theory proved above shows that the FM prediction can be expressed through two independent vectors \(\mathbf {p}_{u}\) and \(\mathbf {q}_{v}\); by precomputing these vectors, we can greatly improve training efficiency. Second, after transforming FM into generalized MF, the prediction function of ONFM satisfies the requirements of Theorem 1, so the corresponding efficient loss function is also available for ONFM:

$$\begin{aligned} \begin{aligned} \tilde{\mathcal {L}}(\varTheta )&=\sum _{u \in \mathbf {B}} \sum _{v \in \mathbf {V}^{+}}\left( \left( c_{v}^{+}-c_{v}^{-}\right) \hat{y}(\mathbf {x})^{2}-2 c_{v}^{+} \hat{y}(\mathbf {x})\right) \\&+\sum _{i=1}^{d} \sum _{j=1}^{d}\left( \left( h_{a u x, i} h_{a u x, j}\right) \left( \sum _{u \in \mathbf {B}} p_{u, i} p_{u, j}\right) \left( \sum _{v \in \mathbf {V}} c_{v}^{-} q_{v, i} q_{v, j}\right) \right) \end{aligned}\end{aligned}$$
(26)

where \(\mathbf {B}\) indicates a batch of users, and \(\mathbf {V}\) indicates all items.
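
Under the uniform missing-data weight \(c_{0}\) used in Sect. 4.1, Eq. (26) can be computed per batch from the precomputed auxiliary vectors alone. The following sketch is our own NumPy illustration (the names and the scalar-weight simplification are assumptions, not the authors' code):

```python
import numpy as np

def onfm_batch_loss(P, Q, h_aux, pos_pairs, c_pos, c_neg):
    """Non-sampling loss of Eq. (26) for one batch of users.

    P : (B, D) auxiliary user vectors of the batch, Q : (num_items, D) auxiliary item vectors,
    h_aux : (D,), pos_pairs : list of (batch_row, item_id) observed interactions,
    c_pos / c_neg : scalar weights of observed / missing entries.
    """
    rows = np.array([r for r, _ in pos_pairs])
    items = np.array([i for _, i in pos_pairs])
    # first line of Eq. (26): correction computed on the observed entries only
    y_pos = (P[rows] * Q[items]) @ h_aux
    loss = np.sum((c_pos - c_neg) * y_pos ** 2 - 2.0 * c_pos * y_pos)
    # second line of Eq. (26): the "all items" part via D x D Gram matrices
    user_gram = P.T @ P
    item_gram = c_neg * (Q.T @ Q)
    loss += np.sum(np.outer(h_aux, h_aux) * user_gram * item_gram)
    return loss
```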

4 Experiments

4.1 Experimental Settings

Datasets. The two publicly available datasets used are Frappe and Last.fm. Frappe contains 957 users, 4082 items, 5382 features and 96203 instances; Last.fm contains 1000 users, 20301 items, 37358 features and 214574 instances.

Baselines. We compare ONFM with the following baselines:

  • PopRank: This model returns Top-k most popular items.

  • FM [9]: The original factorization machines.

  • NFM [6]: Neural Factorization Machine uses an MLP to learn nonlinear and high-order interaction signals.

  • DeepFM [4]: This model combines FM and MLP to make recommendations.

  • ONCF [7]: This model improves MF with outer product.

  • CFM [10]: Convolutional Factorization Machine uses 3D convolution to model the high-order interactions between features.

  • ENMF [3]: Efficient Neural Matrix Factorization uses a non-sampling neural recommendation method to generate recommendations.

  • ENSFM [1]: Efficient Non-Sampling Factorization Machines conducts non-sampling training by transforming FM into MF.

Evaluation Protocols and Metrics. ONFM adopts the leave-one-out evaluation protocol [8, 10]. For Frappe, since there is no timestamp, we randomly choose one transaction of each user context as the test example. For Last.fm, the latest transaction of each user is held out for testing and the rest is treated as the training set. The evaluation metrics are Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG).
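
For reference, the standard leave-one-out versions of the two metrics for a single user can be sketched as follows (our own illustration; with one held-out item per user, NDCG@K reduces to \(1/\log _{2}(position+2)\)):

```python
import numpy as np

def hr_ndcg_at_k(ranked_items, target_item, k):
    """HR@K and NDCG@K for one user under leave-one-out evaluation.

    ranked_items : item ids sorted by predicted score (best first),
    target_item  : the single held-out test item of this user.
    """
    topk = list(ranked_items[:k])
    if target_item not in topk:
        return 0.0, 0.0
    position = topk.index(target_item)            # 0-based rank inside the top-K list
    return 1.0, 1.0 / np.log2(position + 2)

# Usage: the held-out item is ranked third in a top-5 list.
print(hr_ndcg_at_k([10, 4, 7, 2, 9], 7, k=5))     # (1.0, 0.5)
```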

Parameter Settings. In ONFM, the weight of all missing data is uniformly set to \(c_{0}\), the batch size is set to 512, the embedding size to 64, the learning rate to 0.05, and the dropout ratio to 0.9. \(c_{0}\) is set to 0.05 and 0.005 for Frappe and Last.fm, respectively.

Table 1. The performance of different models on Frappe and Last.fm.

4.2 Performance Comparison

Table 1 summarizes the best performance of these models on Frappe and Last.fm. To evaluate different recommendation list lengths, we set K = 5, 10 and 20 in our experiments. The results show that our model achieves the best performance on both datasets in terms of both HR and NDCG. ONFM-1 adds the third-order interactions between features on top of ENSFM; note that ONFM-1 still uses shared embedding. Its performance is better than that of ENSFM, which indicates the effectiveness of the third-order interactions. On the basis of ONFM-1, ONFM-2 introduces Order-aware Embedding, and its performance is further improved, indicating that order-aware embedding is the better choice. ONFM-3 is the final form of our model: it adds both the third-order and the fourth-order interactions and also uses Order-aware Embedding. Compared with ENSFM, the performance of ONFM-3 is excellent.

5 Conclusion and Future Work

In this paper, we propose a novel model named Order-Aware Embedding Non-Sampling Factorization Machines (ONFM). The key design of ONFM is to transform the FM model incorporating the high-order interactions into an MF form through mathematical transformation, which yields three auxiliary vectors \(\mathbf {p}_{u}\), \(\mathbf {q}_{v}\) and \(\mathbf {h}_{aux}\), where \(\mathbf {p}_{u}\) and \(\mathbf {q}_{v}\) depend only on the corresponding user and item. We also use Order-aware Embedding. Finally, through several optimization methods, we train ONFM without sampling. Extensive experiments on two datasets demonstrate that ONFM successfully captures effective feature information.

Although the results of ONFM illustrate the importance of the high-order interactions, the way we compute them is still crude. In the future, we will design a more refined method to compute the high-order interactions. Moreover, different feature interactions have different influences on the accuracy of the final prediction, so in order to better extract feature information, we are also interested in applying an attention mechanism to our model.