
1 Introduction

In modern e-commerce applications, recommendation systems strive to recommend items that users are willing to purchase or interact with. However, the large number of user interactions in e-commerce (such as browsing, clicking, collecting, adding to shopping carts, and purchasing) makes it a great challenge to identify users' consumption patterns and interest preferences.

Existing recommendation systems model user-item interactions mainly in two ways. The first is traditional recommendation systems that focus on mining static associations from user-item interactions, represented by collaborative filtering models [3,4,5]. The second is sequential recommendation systems [6, 7] that try to capture the patterns hidden in a user's successive interaction behaviors, so that the user's long-term and short-term preferences can be represented dynamically. Specifically, the long-term preferences are reflected in the historical behaviors, while the short-term preferences are determined by recently purchased items.

Although existing sequential recommendation models have achieved promising results, two shortcomings remain. Firstly, these methods directly use the item sequence to represent the relationship between items, failing to take into account that different users pay attention to different aspects. Secondly, most existing models neglect users' instant interests. Instant interests refer to an instant and specific purchase demand and are more informative than short-term preferences. For example, consider the Q&A function on Amazon's earphone information page shown in Fig. 1: the users' question contents ("airplane use", "workouts") reflect attention to different aspects under the same instant purchase need (an earphone). Such signals cannot be captured from historical behaviors and are more accurate than short-term preferences.

Fig. 1. Q&A in an item page of Amazon. User1 and user2 have different instant requirements when buying an earphone.

To address the above challenges, we propose a sequential recommendation model, MGCN4REC. Firstly, inspired by works on user and item representation learning [8,9,10,11], we model the users' historical interaction sequences as a multi-graph (a user relationship graph, an item relationship graph, and a user-item bipartite graph) and learn a more expressive representation of each user and item through a graph convolutional network. Secondly, we use a recurrent neural network to model users' preferences from the historical interaction sequence and their instant interests from the users' question-and-answer data. Finally, we use an attention mechanism to aggregate users' preferences and instant interests and make the recommendation.

In summary, the main contributions of this paper are as follows:

  • This paper proposes a sequential recommendation model (MGCN4REC) based on preferences and instant interests. We leverage instant interests to better represent short-term preferences and use an attention mechanism to aggregate preferences and instant interests.

  • This paper is the first to model the user-item interaction sequence as a multi-graph structure, and leverages a graph convolutional network to learn the embedding representations of users and items in the same feature space.

  • To evaluate our model, experiments are conducted on real Amazon datasets, and the results show that MGCN4REC outperforms the state-of-the-art baselines by over 15%.

2 Related Work

Among many Internet-based services [12] such as e-commerce, the recommendation system plays a vital role and has attracted widespread attention in industry and academia. The related work of this paper mainly covers two aspects: representation learning and sequential recommendation systems.

2.1 Representation Learning

For user and item representation learning, recommendation systems have evolved around the embeddings of different entities in three families of models: user-based recommendation, which relies on user-user similarity; item-based recommendation, which relies on item-item similarity derived from users' interactions with items; and model-based recommendation, which extracts features of users and items and recommends items with matching features. All three inevitably need to embed users and items to obtain user-user, item-item, and user-item similarities. The user-based approach suits scenarios where the number of users is much larger than the number of items and few new items appear, and it mainly focuses on computing user-user similarity. Conversely, the item-based approach suits scenarios where the number of items is much larger than the number of users and few new users appear, focusing on computing item-item similarity. The model-based approach mainly focuses on computing the correlation between users and items.

Barkan \( et \;al. \) [9] consider the user's historical interaction sequence as a natural sentence and propose Item2Vec, which learns the vector representation of each item by maximizing the probability of other items appearing in the window around each central item. Perozzi \( et \;al. \) [10] convert users' historical interaction sequences into a graph and propose DeepWalk, which performs depth-first-search (DFS) style walks from each node to obtain multiple item sequences and then learns the vector representation of each item with Item2Vec. Different from DeepWalk, Grover \( et\; al. \) propose Node2Vec, an item sequence sampling method that jointly utilizes breadth-first search (BFS) and depth-first search (DFS) and allocates sampling probabilities according to the distance between items and the source item. After obtaining the sequences, they use Item2Vec to learn the representation of each item.

Most of the existing works focus on learning the similarity between items, or learn the low-dimensional embeddings of users and items separately. A challenge arises when directly combining the embedding of a user and his item sequence, since they lie in different vector spaces, making it impossible to directly use their inner product to produce a recommendation list. Wu \( et \;al. \) [13] propose SR-GNN, which constructs the user's interaction sequence of each session into a directed graph and then utilizes a gated graph neural network to learn the representation of each node and of the user's current session. Grbovic \( et\; al. \) [14] propose listing embeddings, which personalize recommendations by leveraging Word2vec to model users and items as a sequence. However, due to the specificity of the Airbnb data, that method does not transfer well to our setting.

Therefore, this work learns the embedding representation of each node from the user relationship graph, the item relationship graph, and the user-item interaction bipartite graph through a graph convolutional network, so as to obtain the similarities between users and items.

2.2 Sequential Recommendation System

In recent years, researchers have begun to focus on various sequential recommendation scenarios, such as the next-basket recommendation [15, 16], session-based recommendation [17,18,19], and next item recommendation [7, 20].

The existing works on sequential recommender systems are mainly based on sequential pattern mining. Thanks to the huge success of deep neural networks in the past few years, sequential data modeling has made great progress in applications such as NLP, social media, and recommender systems. For example, Wang \( et\; al. \) [16] make the next-basket recommendation through hierarchical interactions, capturing the sequential patterns and the user's overall preferences from historical interaction sequences. Hidasi \( et \;al. \) [17] utilize recurrent neural networks (RNNs) to model the entire session and build a session-based recommender system, which performs significantly better than item-based methods. Tang \( et \;al. \) [19] utilize horizontal and vertical convolution kernels in a convolutional neural network (CNN) to capture the user's long-term and short-term interests and make recommendations accordingly. Li \( et \;al. \) [20] utilize the w-item2vec method to learn item vectors, and then express the user's long-term and short-term preferences to make recommendations. Although these models consider the user's sequence information, they still fall short in representing users' long-term and short-term preferences. In contrast, the method proposed in this paper combines the user's preferences with instant interests to make recommendations; the preferences and instant interests are obtained from the user's historical interaction sequence with items and from the user's current question text, respectively.

In summary, the main work of this paper is to learn a unified representation of users and items and compute their correlations, and then, based on these representations, to model users' preferences and instant interests for sequential recommendation.

3 MGCN4REC Framework

3.1 Preliminaries

Let \( U = \left\{ {u_{1} ,u_{2} , \ldots ,u_{n} } \right\} \) denote the set of users and \( I = \left\{ {i_{1} , i_{2} , \ldots , i_{m} } \right\} \) denote the set of items, where n and m are the numbers of users and items respectively. In addition, \( E = \left\{ {e_{1} , e_{2} , \ldots , e_{{\left( {m + n} \right)}} } \right\} \) denotes the embedding vectors of all users and items, where \( e_{u} \) is the embedding vector of a user (\( e_{u} \in R^{d} ,u \in U \)) and \( e_{i} \) is the embedding vector of an item (\( e_{i} \in R^{d} , i \in I \)). For each user \( u \in U \), we assume that his/her enquiry time is \( t_{q} \). The historical baskets before the question time \( t_{q} \) are represented as:

$$ B_{{ < t_{q} }}^{u} = \left\{ {\left[ {e^{u} ,b_{{t_{1} }}^{u} } \right],\left[ {e^{u} ,b_{{t_{2} }}^{u} } \right], \ldots ,\left[ {e^{u} ,b_{{t_{q - 1} }}^{u} } \right]\;|\;b_{{t_{j} }} \subset I} \right\} $$

For each basket, there is usually more than one item (\( |b_{{t_{j} }}| > 1 \)). Therefore, we use a mean pooling operation to represent the embedding of the shopping basket at each timestep:

$$ b_{{t_{j} }} = \frac{1}{{\left| {b_{{t_{j} }} } \right|}} \mathop \sum \limits_{k = 1}^{{\left| {b_{{t_{j} }} } \right|}} e_{{i_{k} }} ,\,e_{{i_{k} }} \in E $$

where \( \left| {b_{{t_{j} }} } \right| \) is the number of items in the shopping basket at timestep \( t_{j} \), and \( e_{{i_{k} }} \) is the embedding representation of item \( i_{k} \) in the user's shopping basket. Similarly, the basket after \( t_{q} \) is denoted as \( b_{{t_{q + 1} }}^{u} \).
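For concreteness, a minimal sketch of this mean-pooling step is given below, assuming items are indexed by integer ids and a matrix `item_emb` holds the learned item embeddings (both names are hypothetical):

```python
import numpy as np

def basket_embedding(item_ids, item_emb):
    """Mean-pool the item embeddings of one basket b_{t_j} (hypothetical helper).

    item_ids: list of integer ids of the items in the basket
    item_emb: array of shape (num_items, d) whose rows are the embeddings e_i
    """
    vectors = item_emb[item_ids]      # (|b_{t_j}|, d)
    return vectors.mean(axis=0)       # averaged basket vector b_{t_j}
```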

To recommend items with preferences and instant interests, we formalize the problem as the following objective function:

$$ L = \sum\limits_{u} {L_{u} (f(B_{{ < t_{q} }}^{u} ,s^{u} ),b_{{t_{q + 1} }}^{u} )} $$
(1)

Formula (1) measures the difference between the recommended items and the items actually purchased at the next time \( t_{q + 1} \): \( f \left( \cdot \right) \) is a function that predicts, from the aggregation of the user's traditional preferences and instant interests, the probability of each item being recommended, and \( L_{u} \left( \cdot \right) \) is the loss between the recommended items and the actually purchased items. However, users may not immediately purchase the items they are interested in after asking questions: from the conclusions of Caser [19], there is skipping behavior in users' interactions (the previous several behaviors do not directly affect the purchase at t + 1, but affect the behavior at t + 2 or even t + 3). Accordingly, the aggregation of traditional preferences and instant interests may affect purchasing behavior at multiple moments after \( t_{q} \). Therefore, we redefine the objective function as formula (2), where \( B_{{t > t_{q} }}^{u} = \bigcup\nolimits_{{t > t_{q} }} b_{t}^{u} \) represents the set of items purchased after time \( t_{q} \).

$$ L = \sum\limits_{u} {L_{u} (f(B_{{ < t_{q} }}^{u} ,s^{u} ),B_{{t > t_{q} }}^{u} )} $$
(2)

Above, we formally defined the next-basket recommendation problem; we now present the MGCN4REC system in detail, as shown in Fig. 2. MGCN4REC is composed of three modules: multi-graph-based joint embedding of users and items, preferences and instant interests modeling, and attention-based preferences aggregating. Firstly, MGCN4REC converts the user's historical interaction sequence into graphs: a user relationship graph whose nodes are users and whose edge weights are the numbers of items two users both interacted with, an item relationship graph whose nodes are items and whose edge weights are the numbers of users who purchased both items, and a user-item bipartite graph whose nodes are users and items and whose edges are the user-item interactions. Secondly, a graph convolutional network (GCN) is used to learn the representation of each node from the three types of graphs, which yields the similarities between users and items. Afterwards, the preferences of users are obtained through a Bi-RNN, and the users' instant interests are analyzed from the user's question text at the current moment. Finally, the user's preferences and instant interests are aggregated based on the attention mechanism to make accurate recommendations.

Fig. 2. Architecture of the MGCN4REC system.

3.2 Multi-graph-Based User-Item Representation

In the first stage of MGCN4REC, the purpose of multi-graph-based user-item representation learning is to generate a unified representation for each user and item by learning the similarity between them from a large number of historical user interactions. In previous works, sequential recommenders have typically used one-hot encoding or an additional embedding layer in the deep learning architecture to represent items [17, 22]. However, for the item sets of large e-commerce platforms, one-hot encoding makes the network expensive to train and often poorly optimized due to high sparsity [23], while an additional embedding layer introduces extra error into the neural network [17]. Item2vec mines similarities between items from users' historical interaction sequences, but it ignores the users' interaction intensity for different items and therefore cannot effectively represent items with few interactions or user features. Graph neural networks (GNNs), in contrast, can learn rich structural information and the relationships between different nodes from different types of graphs, so MGCN4REC uses graph convolutional networks to learn the representations of users and items.

Different types of heterogeneous graphs present node characteristics along different dimensions. For example, we can learn the similarity between items from the purchase relationship graph, which indicates which items may be purchased together. Furthermore, we can learn the competition among items from the browsed-item graph, which indicates which item is eventually selected among several viewed candidates. Therefore, this paper uses a multi-graph to represent the relationships between items.

For the representation of items, we first obtain each user's historical interaction sequence \( s_{u} = \left\{ {i_{1}^{u} ,i_{2}^{u} ,i_{3}^{u} , \ldots ,i_{n}^{u} } \right\} \) (u denotes the index of the user), and build the purchase-sequence relationship graph between items from the historical interaction sequences, \( G_{bought\_seq} = \left\{ {I,e_{bought\_seq} } \right\} \), where I denotes the item set and \( e_{bought\_seq} = \left\{ {\left( {i_{1}^{1} ,i_{2}^{1} } \right),\left( {i_{2}^{1} ,i_{3}^{1} } \right), \ldots ,\left( {i_{{\left( {n - 1} \right)}}^{\left| U \right|} ,i_{n}^{\left| U \right|} } \right)} \right\} \) denotes the edge set, with \( \left( {i_{{\left( {n - 1} \right)}}^{\left| U \right|} ,i_{n}^{\left| U \right|} } \right) \) indicating that \( i_{{\left( {n - 1} \right)}}^{\left| U \right|} \) is linked to \( i_{n}^{\left| U \right|} \). Second, we build the graph of items purchased together, \( G_{also - bought} = \left\{ {I, e_{also - bought} } \right\} \), where \( e_{also - bought} \) links an item to the other items purchased at the same time. Third, we build the graph between purchased items and items browsed at the same time, \( G_{also - view} = \left\{ {I, e_{also - view} } \right\} \), where \( e_{also - view} \) links a purchased item to the other items that were browsed. We then compute the adjacency matrix \( A^{\left| I \right|*\left| I \right|} \) and degree matrix \( D^{\left| I \right|*\left| I \right|} \) of each graph and apply a graph convolutional network to each graph.
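As an illustration, the following minimal sketch builds the adjacency and degree matrices of the purchase-sequence graph \( G_{bought\_seq} \) from per-user item sequences; the function name, the undirected-edge choice, and the integer-id input format are our assumptions:

```python
import numpy as np

def build_bought_seq_graph(sequences, num_items):
    """Build adjacency and degree matrices of the purchase-sequence graph G_bought_seq.

    sequences: list of per-user item-id lists s_u (hypothetical input format);
    consecutive items in a user's history are connected, and edge weights count
    how often the pair co-occurs.
    """
    A = np.zeros((num_items, num_items))
    for seq in sequences:
        for prev, curr in zip(seq[:-1], seq[1:]):
            A[prev, curr] += 1.0
            A[curr, prev] += 1.0          # treat the link as undirected
    D = np.diag(A.sum(axis=1))            # degree matrix D of size |I| x |I|
    return A, D
```

The also-bought and also-view graphs can be built in the same way from the corresponding co-occurrence records.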

Specifically, for each node in the graph, the graph convolutional network aggregates the feature information of its neighbor nodes to update the node's own features, and then nonlinearly transforms the features to obtain robust representations. The feature update function is as follows:

$$ h_{{v_{i} }}^{l + 1} = \sigma (\sum\limits_{j} {\frac{1}{{c_{ij} }}h_{{v_{j} }}^{l} W^{l} } ) $$
(3)

where j runs over the neighboring nodes of \( v_{i} \), and \( c_{ij} \) is a normalization constant for the edge \( \left( {v_{i} ,v_{j} } \right) \), originating from the symmetrically normalized adjacency matrix \( D^{{ - \frac{1}{2}}} AD^{{ - \frac{1}{2}}} \) in MGCN4REC.

Rewriting the update function over all nodes in the graph, every neural network layer can be described as a non-linear function:

$$ H^{l + 1} = f(H^{l} ,A) $$
(4)

with \( H^{0} = X \) and \( H^{L} = Z \), where L is the number of layers and X contains the input features of the nodes. In this paper, we use the aspects extracted from all reviews of each item to represent its initial features, and each user's sentiment scores on all aspects to represent the user's initial features. The specific propagation rule of the convolutional layers \( f\left( \cdot \right) \) can be written as:

$$ f(H^{l} ,A) = \sigma (\widehat{D}^{{{ - }\frac{1}{2}}} \widehat{A}\widehat{D}^{{{ - }\frac{1}{2}}} H^{l} W^{l} ) $$
(5)

where \( W^{l} \) is a weight matrix for the neural network layer \( layer_{l} \) and \( \sigma \left( \cdot \right) \) is a non-linear activation function such as ReLU. \( \hat{A} = A + E \), where E is the identity matrix and \( \hat{D} \) is the diagonal degree matrix of \( \hat{A} \), i.e. \( \hat{D} = D + E \).
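A minimal NumPy sketch of this propagation rule is given below; the function name and the ReLU non-linearity are our choices:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One propagation step of formula (5): sigma(D^{-1/2} (A + E) D^{-1/2} H W),
    with ReLU as the non-linearity."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops: A + E
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # diagonal of D_hat^{-1/2}
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
```

Stacking two such layers, e.g. `Z = gcn_layer(gcn_layer(X, A, W0), A, W1)`, realizes a two-layer network with \( H^{0} = X \) and \( H^{2} = Z \).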

Since the multi-graph allows node features to be learned from different perspectives, the graph convolutional network is applied to the purchase-sequence graph \( G_{bought\_seq} \), the co-purchase graph \( G_{also - bought} \), and the purchase-browse graph \( G_{also - view} \), and the resulting embeddings are concatenated to obtain the representation of each item:

$$ Z_{item} = \left[ {Z_{seq} , Z_{also - bought} ,Z_{also - view} } \right] $$

For the representation of users, we repeat formulas (3)-(5) on the user relationship graph and obtain the embedding \( Z_{user} \) of each user.

Taking \( Z_{user} \) and \( Z_{item} \) as X, we apply the graph convolutional network to the user-item interaction bipartite graph, repeating formulas (3)-(5) to obtain the final embedding representations of all users and items:

$$ Z = \left\{ {z_{{u_{1} }} ,z_{{u_{2} }} , \ldots ,z_{{u_{n} }} , z_{{i_{1} }} ,z_{{i_{2} }} , \ldots ,z_{{i_{m} }} } \right\} $$

3.3 User Preferences and Interests Modeling

The user's decision-making process is mainly affected by two factors: preferences and instant interests. In fact, the user's interaction process is a series of implicit feedback over time. Therefore, unlike traditional recommendation systems that explore user-item interactions in a static way, this paper uses sequential modeling to handle the next-basket recommendation problem. Specifically, the system designs two models, preference learning and instant interest learning, to distinguish the user's preferences from the instant interests, and then makes personalized recommendations by combining the two.

Preferences Modeling

As the preferences of the user run through the user's entire interaction sequence, they depend not only on the previous purchase sequence but also on the future purchase sequence. Inspired by the bidirectional RNN [24], this paper applies the CLSTM cell in a bidirectional architecture (Bi-CLSTM), making full use of the interaction sequence in both the forward and backward directions to express the user's preferences. Accordingly, the system develops a deep recurrent neural network [25] to model the stable preferences of users. After initialization, the hidden state \( h_{j} \) of each interaction is updated from the previous hidden state \( h_{j - 1} \) at the jth interaction:

$$ \begin{aligned} & i_{j} = \delta (W_{vi} b_{j} + W_{hi} h_{j - 1} + W_{ci} c_{j - 1} + \widehat{{b_{i} }}), \\ & f_{j} = \delta (W_{vf} b_{j} + W_{hf} h_{j - 1} + W_{cf} c_{j - 1} + \widehat{{b_{f} }}), \\ & c_{j} = f_{j} c_{j - 1} + i_{j} \tanh (W_{vc} b_{j} + W_{hc} h_{j - 1} + \widehat{{b_{c} }}), \\ & o_{j} = \delta (W_{vo} b_{j} + W_{ho} h_{j - 1} + W_{co} c_{j} + \widehat{{b_{o} }}), \\ & h_{j} = o_{j} \tanh (c_{j} ) \\ \end{aligned} $$
(6)

where \( i_{j} \), \( f_{j} \), and \( o_{j} \) correspond to the input gate, forget gate, and output gate of the recurrent cell, \( b_{j} \) is the vector representation of the basket at the current time, \( c_{j} \) is the value of the memory cell, \( \hat{b} \) is the bias term, and \( h_{j} \) is the hidden state at timestep \( t_{j} \).

At each time \( t_{q} \), the hidden state \( \overrightarrow {{h_{{t_{q} }} }} \) of the forward RNN is computed from the previous hidden state \( \overrightarrow {{h_{{t_{q} - 1}} }} \) and the basket representation \( b_{{t_{q} }} \) at the current time; the hidden state \( \overleftarrow {{h_{{t_{q} }} }} \) of the backward RNN is computed from the next hidden state \( \overleftarrow {{h_{{t_{q} + 1}} }} \) and the same basket representation \( b_{{t_{q} }} \). The hidden state \( h_{{t_{q} }} \) of the bidirectional RNN at each moment \( t_{q} \) can therefore be expressed by concatenating the forward and backward hidden states:

$$ h_{{t_{q} }} = concatenate(\overrightarrow {{h_{{t_{q} }} }} ,\overleftarrow {{h_{{t_{q} }} }} ) $$
(7)

The preferences of user u are then represented through an average pooling layer as

$$ preference^{u} = average(h_{1} ,h_{2} , \ldots ,h_{{t_{q} }} ) $$
(8)
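As an illustration, the following PyTorch sketch encodes a sequence of (already mean-pooled) basket vectors with a bidirectional LSTM and averages the hidden states over time, in the spirit of formulas (6)-(8); the class name, the use of `nn.LSTM` in place of the CLSTM cell, and the hidden size are our assumptions:

```python
import torch
import torch.nn as nn

class PreferenceEncoder(nn.Module):
    """Bidirectional recurrent encoder over the basket sequence, cf. formulas (6)-(8).
    nn.LSTM stands in for the CLSTM cell used in the paper."""

    def __init__(self, d, hidden_size=64):
        super().__init__()
        self.rnn = nn.LSTM(d, hidden_size, batch_first=True, bidirectional=True)

    def forward(self, baskets):        # baskets: (batch, T, d) mean-pooled vectors
        h, _ = self.rnn(baskets)       # h_t = [forward; backward], cf. formula (7)
        return h.mean(dim=1)           # average pooling over time, cf. formula (8)
```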

Instant Interests Modeling

As shown in the user question data of Fig. 1, a user's question can clearly reflect the user's instant interests and the item characteristics the user cares most about in the recent period.

Instant interests are represented by the embedding vector of the item being questioned at the time \( t_{q} \) and the vector of the question text:

$$ interest^{u} = [e_{{t_{q} }} |Score_{{a_{1} }} ,Score_{{a_{2} }} , \ldots ,Score_{{a_{m} }} ],\quad e_{{t_{q} }} \in {\text{E}} $$
(9)

where \( Score_{{a_{i} }} \) is the sentiment score of aspect \( a_{i} \) in the user's question text. After obtaining the preferences and instant interests of user u at the question time \( t_{q} \), we combine the stable preferences with the dynamic instant interests to obtain the aggregated preferences \( P_{{t_{q} }}^{u} \) of user u at time \( t_{q} \), and then pass them through a fully connected layer to make personalized recommendations.
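A minimal sketch of this concatenation (formula (9)) is given below, assuming the questioned item's embedding and the aspect sentiment scores are already available as tensors (the function name is hypothetical):

```python
import torch

def instant_interest(item_embedding, aspect_scores):
    """Formula (9): concatenate the questioned item's embedding e_{t_q} with the
    aspect sentiment scores Score_{a_1..a_m} mined from the question text."""
    return torch.cat([item_embedding, aspect_scores], dim=-1)
```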

3.4 Attention-Based Preferences Aggregating

If the aggregated preferences \( P_{{t_{q} }}^{u} \) obtained in Sect. 3.3 were used for recommendation directly, it would imply that the degree of dependence on the user's preferences and on the instant interests is the same. In fact, preferences and instant interests influence the user's final decision in essentially different ways: instant interests determine which item the user wants to buy, while preferences determine the details of the item. Moreover, the relative dependence on preferences and instant interests may differ across users: user A may shop more in line with historical habits, while user B may prefer novel items. Therefore, the system introduces an attention mechanism [2] to aggregate the user's preferences and instant interests. The aggregated preferences under the attention mechanism are expressed as follows:

$$ Aggre \, P_{{t_{q} }}^{u} = \beta [preference^{u} ,interest^{u} ]^{T} $$
(10)
$$ \beta_{i}^{u} = \frac{{exp(Aggre \, p_{i}^{u} )}}{{\sum\nolimits_{j = 1}^{{|preference^{u} | + |interest^{u} |}} {exp(Aggre \, p_{j}^{u} )} }} $$
(11)

Finally, the system uses a fully connected layer to relate the aggregated preferences \( P_{{t_{q} }}^{u} \) to the target items, as shown in formula (12), where \( \hat{y}^{u} \) represents the probability of user u interacting with each item after asking a question:

$$ \widehat{y}^{u} = sigmoid(W{ (}Aggre \, p^{u} )^{T} + b) $$
(12)
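The following PyTorch sketch implements this aggregation and prediction step (formulas (10)-(12)) under the assumption that \( \beta \) is a softmax over the entries of the concatenated vector; the class name and layer shapes are ours:

```python
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    """Attention-based aggregation and prediction, cf. formulas (10)-(12)."""

    def __init__(self, pref_dim, interest_dim, num_items):
        super().__init__()
        self.fc = nn.Linear(pref_dim + interest_dim, num_items)

    def forward(self, preference, interest):
        z = torch.cat([preference, interest], dim=-1)   # [preference^u, interest^u]
        beta = torch.softmax(z, dim=-1)                 # attention weights, formula (11)
        aggre = beta * z                                # aggregated preferences, formula (10)
        return torch.sigmoid(self.fc(aggre))            # y_hat^u, formula (12)
```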

3.5 Model Training

The loss function of this system includes two parts: the first is the difference between the recommendation made from the aggregated preferences and the items actually purchased, as shown in formula (2); the other is an auxiliary loss measuring the error between the predicted preferences and the real preferences at each time t + 1 in the preference modeling of Sect. 3.3. The GRU layer can output the set of items that may be purchased at time t + 1; by reducing the error between the predicted set and the real set at each time, the recommendation achieves better accuracy. The set of items that may be purchased at time t + 1 can be expressed as:

$$ \widehat{\text{b}}_{t + 1}^{u} = f_{R} (b_{t}^{u} ,h_{t} ) $$
(13)

where \( f_{R} \left( \cdot \right) \) is a function based on the RNN structure, and \( h_{t} \in R^{d} \) is the hidden state at time t, a dynamic representation of user preferences. In this paper, GRU is chosen as \( f_{R} \left( \cdot \right) \). We further introduce the error between the predicted value and the true value at each time:

$$ L_{R} = \sum\limits_{t = 1}^{{t_{q} - 1}} {L_{t} (\widehat{b}_{t}^{u} ,b_{t}^{u} )} $$
(14)

where \( L_{t} \left( \cdot \right) \) is a loss function that measures the error between predicted and true preferences at each timestep; a cross-entropy loss is used in this system. The overall objective is shown in formula (15):

$$ L = \sum\limits_{u} {(\gamma L_{u} (\widehat{y}^{u} ,B_{{t > t_{q} }}^{u} ) + (1 - \gamma )L_{R} )} $$
(15)

where \( \gamma \in \left( {0,1} \right) \) is a parameter that balances the relative importance of the two loss terms. Due to the large number of items, using all negative samples to minimize the loss function would make the time and space complexity of training too high. Therefore, the system uses the negative sampling technique to train the \( L_{u} \left( \cdot \right) \) term.
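A minimal sketch of this combined objective (formula (15)) with sampled negatives is given below; the sampling scheme, the default \( \gamma \), and the function name are illustrative assumptions rather than the authors' exact training procedure:

```python
import torch
import torch.nn.functional as F

def total_loss(y_hat, pos_items, neg_items, pred_baskets, true_baskets, gamma=0.5):
    """Combined objective of formula (15): gamma * recommendation loss with sampled
    negatives + (1 - gamma) * auxiliary per-step loss L_R of formula (14)."""
    eps = 1e-8
    pos = -torch.log(y_hat[pos_items] + eps).mean()         # items bought after t_q
    neg = -torch.log(1.0 - y_hat[neg_items] + eps).mean()   # sampled negative items
    rec_loss = pos + neg                                     # L_u term of formula (2)
    aux_loss = F.binary_cross_entropy(pred_baskets, true_baskets)  # L_R term
    return gamma * rec_loss + (1.0 - gamma) * aux_loss
```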

4 Experiment

4.1 Dataset

We collect Amazon reviews and Q&A records from [26]. Users comment on the items they buy, and subsequent users can take these comments as advice. We first use the review records to obtain the initial features of users and items, and then leverage each user's history to construct the item graphs and the user graph. To extract features from the item review data, we first use Stanford CoreNLP (an NLP analysis toolkit) to extract aspect words and sentiment scores (on a scale of 1 to 5), and then use the TF-IDF technique to select the highest-weighted feature words as the features of the current item.
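For illustration, the following sketch selects the top TF-IDF feature words from an item's reviews; it is a simplified stand-in for the paper's pipeline, whose aspect words and sentiment scores actually come from Stanford CoreNLP:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def top_feature_words(review_texts, k=10):
    """Select the k highest-weighted feature words from an item's reviews via TF-IDF."""
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(review_texts)   # (num_reviews, vocab_size)
    weights = tfidf.sum(axis=0).A1                   # aggregate weight of each term
    vocab = vectorizer.get_feature_names_out()
    top = weights.argsort()[::-1][:k]
    return [vocab[i] for i in top]
```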

We conduct experiments on two sub-datasets of the Amazon dataset in [26], Electronics and Baby. Since our model mainly targets recommendation after a user asks a question, we filter out users who have never asked a question. The statistics of the two datasets are shown in Table 1.

Table 1. Statistics of dataset

4.2 Baselines

As the method in this paper includes both the representation learning of users and items and the sequential recommendation module, we compare our model with two user-item representation models and three sequential recommendation models:

  • Item2vec [9], which learns item vectors based on Word2Vec; we use it to replace the GCN module of our model as a baseline;

  • node2vec [11], which learns node vectors from the graph via biased random walks. Combining depth-first and breadth-first search, the model constructs sequences by exploring nodes that are structurally similar to the target node, so as to obtain the representation of each node; we use it instead of the GCN module as a baseline;

  • DREAM [18], which is a sequential recommendation model based on RNN that does not consider instant interests. We take the historical interactions as the input of an LSTM for personalized recommendation;

  • Caser [19], which is a sequential recommendation model based on CNN that considers embeddings of both items and users. To adapt it to the problem in this paper, the user embedding represents the user's stable long-term preferences, and Caser's vertical and horizontal convolution kernels are used to extract the user's short-term preferences, which differ from the instant interests in this paper;

  • BINN [20], which is a sequential recommendation model based on RNN that contains an item2vec module to learn item representations. A bidirectional RNN is used to model the user's long-term preferences, and the recent session is used to model the user's short-term preferences, which differ from the instant interests in this paper.

4.3 Evaluation Metric

To evaluate the performance of the MGCN4REC model, we use Recall@K and HR@K, which are widely used in recommendation systems. Given the top-K recommendation results \( R_{K}^{u} \) of user \( u \), the metrics are computed as shown in (16):

$$ \begin{aligned} & Recall@K = \frac{1}{|U|}\sum\limits_{u} {\frac{{|R_{K}^{u} \cap B_{{t > t_{q} }}^{u} |}}{{|B_{{t > t_{q} }}^{u} |}}} , \\ & HR@K = \frac{1}{|U|}\sum\limits_{u} {I(R_{K}^{u} \cap B_{{t > t_{q} }}^{u} \ne \emptyset )} \\ \end{aligned} $$
(16)

where \( \left| {B_{{t > t_{q} }}^{u} } \right| \) is the number of purchases after the user asks, I(·) is the indicator function, and |U| is the number of users.
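The per-user computation behind formula (16) can be sketched as follows (function names are ours):

```python
def recall_at_k(recommended, purchased):
    """Recall@K for one user: fraction of the items bought after t_q that appear
    in the top-K recommendation list R_K^u."""
    if not purchased:
        return 0.0
    return len(set(recommended) & set(purchased)) / len(purchased)

def hr_at_k(recommended, purchased):
    """HR@K for one user: 1 if at least one purchased item was recommended, else 0."""
    return 1.0 if set(recommended) & set(purchased) else 0.0

# Dataset-level Recall@K and HR@K (formula (16)) average these per-user values over |U|.
```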

4.4 Experimental Results

Performance Comparison

The results of the MGCN4REC model and the comparison algorithms are shown in Table 2, in which the first column lists the two datasets and the remaining columns show the performances on the different evaluation metrics; for each metric the best result is in bold and the second-best result is underlined. The last column reports the improvement of MGCN4REC over the second-best result.

Table 2. Performance comparisons of MGCN4REC with baseline methods on two datasets (Bold scores are the best in each row, while underlined scores are the second best).

According to the experimental results, the performances of the three baselines with item representation learning (Item2vec, node2vec, BINN) are almost always higher than those of the remaining two baselines, which shows that studying the representation of users and items is very important in recommendation systems. Apart from the MGCN4REC algorithm proposed in this paper, Item2vec with aggregated preferences achieves nearly the best performance on all metrics, because Item2vec learns item representations and models users' long- and short-term preferences; at the same time, this confirms that learning user-item representations is effective. Compared with the recommenders that aggregate long- and short-term preferences, DREAM has the lowest performance because it only considers users' long-term preferences.

On the Baby dataset, our MGCN4REC model achieves an average improvement of more than 10% over the other baselines on all metrics. On the Electronics dataset, MGCN4REC also outperforms the baselines on all metrics: each metric improves by an average of 15% over the best-performing comparison algorithm under the same parameters.

Impact of Multi-graph

We also study the effect of multi-graph user-item representation on the performance of the recommendation system. Table 3 shows the comparison results of MGCN4Rec with and without multi-graph on HR@10.

Table 3. MGCN4Rec with and without multi-graph on HR@10

According to Table 3, our MGCN4Rec model with the multi-graph outperforms the variant without the multi-graph by over 10% on HR@10. Therefore, the multi-graph-based user-item representation can better learn the similarity between users and items, thus improving the recommendation performance.

Impact of Preferences Dimensionality d

We also study the effect of user preferences dimension d on the performance of the recommendation system. Figure 3 shows HR@10 for MGCN4REC and other baselines with the preferences dimensionality d varying from 16 to 256 while keeping other optimal hyper-parameters unchanged. We make some observations from this figure.

Fig. 3. Effect of the preferences dimensionality d on HR@10 for neural sequential models.

The performance of each model tends to converge as the dimensionality increases, and larger dimensions do not necessarily lead to better performance. Moreover, the model presented in this paper is superior to the baselines in almost all dimensions.

5 Conclusion

This paper proposes a sequential recommendation model that is based on a multi-graph to learn the representations of users and items, and then models preferences and instant interests simultaneously. For user-item representation learning, this paper utilizes a graph convolutional network (GCN) to learn the similarities between user and item representations from different graphs. For preferences, this paper utilizes a recurrent neural network (RNN) to learn users' stable preferences from the shopping baskets of the historical interaction sequence. For instant interests, this paper utilizes the user's question text to model the user's instant interests. To aggregate preferences and instant interests, an attention mechanism is introduced to calculate the users' attention distribution, so as to obtain the users' dependence on these two kinds of preferences. By comparing with two state-of-the-art representation learning methods and three sequential recommendation systems, the results show that our MGCN4REC model can represent users and items more effectively, and thus make more effective recommendations.