Abstract
The sequential recommendation task based on the multi-interest framework aims to model a user's multiple interests from different aspects in order to predict future interactions. However, researchers rarely consider the differences between the interest features generated by the model. In extreme cases, all interest capsules carry the same meaning, and modeling users with multiple interests fails. To address this issue, we propose High-level Preferences as positive examples in Contrastive Learning for multi-interest Sequence Recommendation (HPCL4SR), which uses contrastive learning to distinguish interests based on user-item interaction information. To obtain high-quality contrastive examples, we introduce category information to construct a global graph and learn associations between categories, yielding the user's high-level preference interests. A multi-layer perceptron then adaptively fuses the low-level preference interest features derived from the user's items with the high-level preference interest features derived from categories. Finally, multi-interest contrastive samples are obtained from the item sequence and its corresponding categories and fed into contrastive learning to optimize model parameters, generating multi-interest representations that better fit the user sequence. In addition, when modeling the user's item sequence, item categories are used as supervision to increase the differentiation between item representations. Extensive experiments on three real-world datasets demonstrate that our method outperforms existing multi-interest recommendation models.
1 Introduction
The sequential recommendation (SR) [1, 2] task treats the interactions between users and items as a dynamic sequence, capturing user interests over time by modeling sequential dependencies and predicting future interactions. Sequential recommendation serves many aspects of daily life, such as e-commerce [3, 4], news media [5, 6], video and music [7], and social networks [8].
Hence, a variety of SR models, both shallow and deep, have been proposed to improve sequential recommendation performance. Specifically, recurrent neural networks built on Gated Recurrent Units (GRU) have been employed to model long- and short-term point-wise sequential dependencies over user-item interactions for next-item recommendation [9, 10]. Convolutional Neural Network (CNN) [11], self-attention [12, 13], and Graph Neural Network [14, 15] models have also been incorporated into sequential recommender systems to capture more complex sequential dependencies and further improve performance.
When modeling user interests with the above structures, it is common to represent a user as a single low-dimensional embedding, which contradicts the fact that each user may have multiple interests in reality [16]. Some studies [3, 17,18,19] therefore capture multiple user interest vectors from different aspects instead of a single vector. These methods explicitly generate diverse interest representations from users' behavior sequences, breaking the representation bottleneck of a single generic user embedding. Although these solutions achieve significant performance improvements, they do not account for the differences between the multiple interests. In the worst case, all interest capsules have the same meaning and cannot reflect the diversity of user interests. Recent work has improved the modeling of multiple user interests through routing regularization [20]. However, as shown in Figure 1, we calculated the similarity between the multiple interest vectors generated by REMI [20] and by our HPCL4SR model on the Amazon-Clothing dataset, using 128 randomly selected users for statistical analysis. We found that although REMI can represent multiple interests per user, its interest representations are highly similar to one another, whereas HPCL4SR significantly reduces the similarity between a user's interests. In other words, our model better represents multiple distinct interests, increasing the diversity between them.
Concretely, we propose a novel multi-interest sequential recommendation framework named HPCL4SR. HPCL4SR models users' high-level preference interests by constructing a global graph over categories and feeds them as positive examples into contrastive learning to optimize the users' multiple interest representations. Specifically, based on the item sequences of all users, the categories of items are organized into a global graph for high-level preference interest modeling. To alleviate the imbalance in category interactions, attention weights are used to reconstruct the adjacency matrix, and multi-hop aggregation is performed on categories that are not directly connected, reducing the sparsity of the interaction matrix and increasing the correlation between categories. For the user's item sequence, contextual information is obtained by encoding item positions and fusing attention weights. Capsule networks then learn the item sequence information for low-level preference interest modeling. To further enhance the representation ability of items, the network parameters are optimized through backpropagation using item categories as labels. Naturally, we integrate high-level and low-level preference interests to generate the user's multiple interest features. Contrastive learning maximizes the similarity between related samples and minimizes the similarity between unrelated samples. This paper draws on this idea to distinguish a user's multiple interests through learning, while not treating the interests as completely unrelated, thus preserving the hidden correlations between them. Therefore, we use each low-level preference interest and its corresponding fused high-level preference interest as a positive pair, and the other preference interests as negative examples, to learn the differences between user interests.
To summarize, the contributions of this paper are as follows:
-
We propose a novel multi-interest sequential recommendation framework (HPCL4SR), which addresses the inability of existing methods to represent users' multiple distinct interests.
-
We construct a global graph based on item category information to model the user's high-level preference interests, which in turn serve as an effective source of positive examples for contrastive learning.
-
We conduct extensive experiments on three real-world datasets to verify the effectiveness of HPCL4SR. Further analysis demonstrates that the proposed method models the diversity of users' multiple interests more reasonably.
2 Related work
This paper mainly uses category information as positive examples to solve the problem of differences among multiple interests in sequence recommendation through contrastive learning. Therefore, this section briefly overviews representative efforts relevant to our work from Multi-Interest Sequence Recommendation, Large Language Models for Recommendation, and Contrastive Learning.
2.1 Multi-interest sequence recommendation
In practical scenarios, users' historical behavior exhibits complex interaction patterns, and modeling interests as a single vector is insufficient to accurately reflect users' true multiple preference interests. Studying sequential recommendation models based on multiple interests has therefore become increasingly important and practical.
MIND [17] proposes a multi-interest extractor layer based on the capsule routing mechanism, which clusters historical behaviors and extracts diverse interests. SDM [21] uses a multi-head attention mechanism when encoding behavior sequences to capture a user's multiple interests. SIN [22] adaptively infers user interests from a large interest pool and outputs multiple interest embeddings, then uses the attention weights of items to generate the interest embeddings that best match user characteristics. ComiRec [18] captures multiple user interests via multi-head attention-based multi-interest routing and introduces a controllable factor to achieve diverse recommendations. PIMI [23] models the periodic features of temporal information between user behaviors and the interactive features between sequence items, and uses their representations to describe users' multiple interests. DuoRec [24] designs contrastive regularization to reshape the distribution of sequence representations and selects sequences with the same target item as hard positive samples, alleviating representation degradation in multi-interest sequential recommendation. UMI [25] argues that a user's interests are reflected not only in historical behavior but are also inherently regulated by profile information, and therefore introduces user profiles as a source of multi-interest features. REMI [20] mitigates the problem of easy negatives with an interest-aware hard negative sampling distribution and an approximation method at negligible computational cost, and incorporates a novel routing regularization to avoid routing collapse and further improve the modeling capacity of multi-interest models.
2.2 Large language models for recommendation
Recent years have witnessed the wide adoption of large language models (LLMs) in many fields, especially natural language processing and computer vision. The same trend can be observed in recommender systems (RS). However, due to the huge number of items in real-world systems, traditional RS usually adopt a two-stage filtering paradigm: a matching stage, which extracts a small subset of items from the extensive corpus with lightweight models to keep computational costs low, and a ranking stage, which uses more sophisticated models to rerank the retrieved items. Advanced recommendation algorithms are thus applied not to all items but only to a few hundred candidates [26]. Accordingly, existing LLM-based methods (e.g., ChatGPT) [27,28,29] focus on the ranking stage. We instead focus on improving the effectiveness of the matching stage, which serves as a crucial foundation for recommender systems. To our knowledge, there is currently no research applying LLMs to the matching stage; however, in the experimental section we analyze the performance of ChatGPT (GPT-3.5-Turbo-1106 and GPT-4-Turbo) in this stage.
2.3 Contrastive learning
Contrastive learning has been widely applied in computer vision. Methods such as CPC [30, 31] and DIM [32] use encodings of the same image at different scales as positive samples, while MoCo [33], SimSiam [34], CaCo [35] and others use multiple augmentations of an image as positive samples. In text processing, some studies apply data-transforming methods or strategies, such as dropout and masking, to vary the parameters and structure of the encoder and improve sentence representations [36,37,38]. In sequential recommendation, contrastive learning mainly addresses the sparsity of user-item interactions and noise; researchers improve recommendation performance by designing auxiliary tasks or loss functions [39]. CBiT [40] combines the cloze-task mask and the dropout mask to generate high-quality positive samples for multi-pair contrastive learning. ICLRec [39] models user intentions by clustering item sequences, maximizing the agreement between a view of a sequence and its corresponding intention. CL4SRec [41] employs contrastive learning over augmented sequence views to learn consistent representations from sequential pattern encoding and global collaborative relationship modeling.
Although researchers have tried to describe users' various interests in different ways, they rarely consider diversity among the interests. In the worst case, all interest capsules have the same meaning, or all items activate the same interest capsule, making it difficult to express multiple interests. ComiRec [18] uses a controllable factor to diversify recommendations, but the paper also notes that increasing diversity can reduce recall. REMI [20] observes that interests tend to over-focus on single items in the behavior sequence, which limits the expressiveness of multi-interest representations, and introduces a variance regularizer on the routing weights to eliminate this sparsity and effectively address the problem. MIRACLE [19] forces interest capsules to be orthogonal, explicitly providing each user with K unrelated interests. However, such K interests can cause unnecessary item recommendations, contradicting the intuition that interests may be implicitly correlated. It is therefore necessary and meaningful for multi-interest sequential recommendation to preserve implicit correlation information while ensuring differences between interests. We address this through contrastive learning, which distinguishes the differences between interests via self-supervised learning of data features while maintaining the correlations between interest representations.
3 Problem formulation
Assume U denotes a set of users, X a set of items, and C a set of categories. Each item \(x_{i}\) has a corresponding category \(c_{i}\). Given a user \(u \in U\), we have his/her chronological item interaction sequence \(S_{x}^{u}=\left\{ x_{1}^{u}, x_{2}^{u}, \ldots , x_{N}^{u}\right\} \) and the corresponding category interaction sequence \(S_{c}^{u}=\left\{ c_{1}^{u}, c_{2}^{u}, \ldots , c_{N}^{u}\right\} \), where \(x_{t}^{u} \in X\) and \(c_{t}^{u} \in C\) denote the item and the category that user u interacted with at time step t, respectively. N is the maximum sequence length. The candidate matching stage in RS aims to efficiently retrieve, from the huge item corpus X, a subset of items the user is likely to interact with.
4 Method
In this section, we propose the High-level Preferences as positive examples in Contrastive Learning for multi-interest Sequence Recommendation framework (HPCL4SR), as shown in Figure 2. There are three parts: high-level preference interest extraction module, low-level preference interest extraction module, and multi-interest contrastive learning module.
4.1 High-level preference interest extraction module
Experiments across the many application fields of contrastive learning have shown that selecting good positive and negative samples is the key to its effectiveness. In sequential recommendation tasks, most models construct contrastive samples by pruning sequences, dropout, or masking. As shown in Figure 3, a user's historical interactions are extremely sparse, so such operations cannot fully represent the user's interests and may even introduce errors. A large body of excellent work has shown that incorporating side information (user profile, category, brand, description, price, position, rating, etc.) into recommendation sequences better captures user preferences [25, 42, 43]. In real scenarios, item category information is the easiest to obtain and is a high-level conceptual representation of the item. Therefore, in this paper we use item category information to construct contrastive samples. In fact, even though the number of categories is much smaller than the number of items, interactions between categories are still sparse. Hence, rather than directly taking the category sequence corresponding to the items as the user's high-level preference, the proposed method models the user's high-level preference interests by constructing a category global graph, learning more preference correlation information from users.
For each user's category sequence \(S_{c}^{u}= \{c_{1}^{u}, c_{2}^{u}, \ldots , c_{N}^{u}\}\), we count the interactions between \(c_{i}^{u}\) and \(c_{j}^{u}\). A category global graph (\(A_{1}\)) is then constructed from the historical category sequences \(\{S_{c}^{1}, S_{c}^{2}, \ldots , S_{c}^{|U|}\}\) of all users, where the initial weight of the edge between two nodes is the total number of interactions \(a_{i j}\) between the two categories.
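As a concrete illustration, the construction of \(A_{1}\) from adjacent category pairs over all user sequences can be sketched as follows (treating each pair of consecutive categories as one interaction; ignoring self-transitions is our assumption, not stated in the paper):

```python
import numpy as np

def build_category_graph(category_seqs, num_categories):
    """Count co-occurrences of adjacent categories over all user
    sequences to form the weighted, symmetric adjacency matrix A1."""
    A1 = np.zeros((num_categories, num_categories))
    for seq in category_seqs:
        for c_i, c_j in zip(seq, seq[1:]):
            if c_i != c_j:          # assumption: skip self-transitions
                A1[c_i, c_j] += 1
                A1[c_j, c_i] += 1
    return A1

# two toy users interacting with 4 categories
seqs = [[0, 1, 2], [1, 2, 3]]
A1 = build_category_graph(seqs, 4)
```

The edge (1, 2) accumulates weight from both users, reflecting how popular category pairs dominate the raw counts.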
However, such a category global graph (\(A_{1}\)) still has two obvious problems: (1) analysis of category interaction frequencies shows that, because the number of items per category varies greatly, category interactions are imbalanced, which biases recommendations toward items in popular categories; (2) constructing the graph from sequential interactions only considers relationships between adjacent item categories and ignores interactions between non-adjacent categories, even though certain non-adjacent categories often appear together in a user's sequence.
In order to alleviate the imbalance of category interaction and avoid the impact of popular items on recommendation results, the adjacency matrix is redefined as follows:
where \(a_{i j}\) denotes the number of interactions between category \(c_{i}\) and category \(c_{j}\), \(a_{i}\) is the total number of interactions of category \(c_{i}\) with all other categories, and \(a_{j}\) is defined analogously.
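A popularity-debiasing reweighting of this kind can be sketched as follows; the symmetric square-root form \(a_{ij}/\sqrt{a_i a_j}\) is an assumption, and the paper's exact formula may differ:

```python
import numpy as np

def normalize_adjacency(A1, eps=1e-12):
    """Popularity-debiased reweighting: scale each edge count by the
    total interaction counts of both endpoint categories (symmetric
    square-root normalization; assumed form)."""
    degree = A1.sum(axis=1)                       # a_i: total interactions of category i
    denom = np.sqrt(np.outer(degree, degree)) + eps
    return A1 / denom

A1 = np.array([[0., 4., 0.],
               [4., 0., 1.],
               [0., 1., 0.]])
A2 = normalize_adjacency(A1)
```

Edges incident to high-degree (popular) categories are damped, while edges between rare categories keep relatively more weight.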
To learn the correlation between non-adjacent categories, we adapt the Multi-hop Attention Diffusion [44] method to aggregate information further. The attention score of multi-hop neighbors is calculated by:
where \(\sum _{i=0}^{\infty } \theta _{i}=1\) \((\theta _{i}>0)\), \(\theta _{i}\) is the attention decay factor with \(\theta _{i} > \theta _{i+1}\), and i is the power of the adjacency matrix \(A_{2}\), representing the farthest length of the graph diffusion relation path.
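The multi-hop diffusion can be sketched as a truncated weighted sum of adjacency powers; the geometric decay schedule \(\theta_i = \alpha(1-\alpha)^i\) (renormalized after truncation) is our assumption, chosen only because it satisfies the stated constraints \(\theta_i > \theta_{i+1} > 0\) and \(\sum_i \theta_i = 1\):

```python
import numpy as np

def multi_hop_diffusion(A2, hops=3, alpha=0.5):
    """Diffused attention: sum_{i=0..hops} theta_i * A2^i, with
    geometric decay theta_i = alpha*(1-alpha)^i renormalized so the
    truncated weights sum to 1 (assumed decay schedule)."""
    thetas = np.array([alpha * (1 - alpha) ** i for i in range(hops + 1)])
    thetas /= thetas.sum()
    out = np.zeros_like(A2)
    power = np.eye(A2.shape[0])
    for theta in thetas:
        out += theta * power
        power = power @ A2      # next power of the adjacency matrix
    return out

A2 = np.array([[0., 1.], [1., 0.]])
D = multi_hop_diffusion(A2, hops=2)
```

Non-adjacent categories reachable in two or three hops now receive nonzero attention, reducing the sparsity of the interaction matrix.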
Assume \(H^{(0)} \in {R}^{|C| \times d}\) denotes the initial embedding matrix of the category, and d represents the dimension of the node embedding. We use GCN to aggregate the features of neighbors as a new representation of the target node, and introduce residual connections in this process. The message-passing process is as follows:
where l is the number of GCN layers and W is a trainable parameter matrix.
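A minimal sketch of the residual message passing over the diffused graph, assuming a ReLU nonlinearity and one weight matrix per layer (both assumptions, since the paper only names GCN aggregation with residual connections):

```python
import numpy as np

def gcn_layer(H, A, W):
    """One message-passing step with a residual connection:
    H_next = ReLU(A @ H @ W) + H (residual form is an assumption)."""
    return np.maximum(A @ H @ W, 0.0) + H

rng = np.random.default_rng(0)
C, d, L = 5, 8, 2                      # |C| categories, dim d, L layers
H = rng.normal(size=(C, d))            # H^(0): initial category embeddings
A = np.full((C, C), 1.0 / C)           # toy diffused adjacency
layers = [rng.normal(size=(d, d)) * 0.1 for _ in range(L)]
for W in layers:
    H = gcn_layer(H, A, W)
```

The residual term keeps each category's own embedding visible alongside the aggregated neighborhood signal.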
The final graph representation \(\hat{H} \in {R}^{|C| \times d}\) is obtained by:
Based on the category information of user historical interactions, category node embedding representation Hg is selected from the graph:
where \(Hg \in {R}^{N \times d}\) and N is the sequence length of user interaction.
Finally, the user's high-level preference interest vector \(Q_{u}\) is calculated as follows:
where \(Q_{u} \in \mathbb {R}^{K \times d}\) and K is the number of preference interests.
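Gathering the user's category node embeddings \(Hg\) from the graph output and pooling them into K high-level interests \(Q_{u}\) might look like the following; the attention-pooling form with K learned queries is an assumption, since the paper specifies only the shapes:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def high_level_interests(H_hat, category_seq, W_k):
    """Gather node embeddings for the user's categories (Hg, N x d) and
    pool them into K interest vectors Q_u with learned attention
    queries W_k (assumed pooling form)."""
    Hg = H_hat[category_seq]                  # N x d, one row per interaction
    attn = softmax(W_k @ Hg.T, axis=-1)       # K x N attention over positions
    return attn @ Hg                          # Q_u: K x d

rng = np.random.default_rng(1)
H_hat = rng.normal(size=(10, 4))              # |C| x d graph output
W_k = rng.normal(size=(3, 4))                 # K = 3 hypothetical interest queries
Q_u = high_level_interests(H_hat, [2, 5, 5, 7], W_k)
```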
4.2 Low-level preference interest extraction module
In sequence recommendation, positional information can explicitly reflect contextual information between items. Therefore, in this paper, the attention mechanism is first used to encode sequence information:
where \(E_{i}^{emb}\) and \(E_{i}^{pos}\) are the embedding of the i-th item and its positional embedding, respectively, and \(X_{i}\) is the item embedding enriched with sequence position information.
where \(\alpha _{i j}\) is the attention weight of item j with respect to item i. We use neural networks to make each item in the sequence perceive the entire contextual information.
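The position-aware contextual encoding can be sketched with a single-head dot-product attention; the scaled dot-product scoring is our assumption, as the paper's exact attention network may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def contextual_encoding(E_emb, E_pos):
    """Add positional embeddings, then let every item attend to the
    whole sequence (single-head dot-product sketch)."""
    X = E_emb + E_pos                               # X_i = item + position
    scores = X @ X.T / np.sqrt(X.shape[1])          # pairwise relevance
    alpha = softmax(scores, axis=-1)                # alpha_ij weights
    return alpha @ X                                # context-aware items

rng = np.random.default_rng(2)
N, d = 6, 8
X_ctx = contextual_encoding(rng.normal(size=(N, d)), rng.normal(size=(N, d)))
```

Each output row is a convex combination of all position-encoded items, so every item "sees" the full sequence context.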
In multi-interest recommendation tasks, the effectiveness of the Capsule Network [45] has been verified, so we directly draw on this method to extract the user's low-level preference interests \(P_{u} \in \mathbb {R}^{K \times d}\).
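A compact sketch of behavior-to-interest dynamic routing in the spirit of MIND/ComiRec follows; the bilinear routing maps, their scale, and the iteration count are illustrative assumptions rather than the paper's exact configuration:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-9):
    """Capsule squashing: shrink vectors to norm < 1, preserving direction."""
    n2 = (v ** 2).sum(axis=axis, keepdims=True)
    return (n2 / (1 + n2)) * v / np.sqrt(n2 + eps)

def dynamic_routing(Z, K, iters=3, seed=0):
    """Route N item features Z (N x d) into K interest capsules P_u (K x d)."""
    rng = np.random.default_rng(seed)
    N, d = Z.shape
    S = rng.normal(size=(K, d, d)) * 0.1           # bilinear routing maps (assumed)
    u = np.einsum('kde,ne->knd', S, Z)             # prediction vectors, K x N x d
    b = np.zeros((K, N))                           # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=0, keepdims=True)  # softmax over K
        P = squash((c[..., None] * u).sum(axis=1))            # K x d capsules
        b = b + np.einsum('kd,knd->kn', P, u)                 # agreement update
    return P

rng = np.random.default_rng(3)
P_u = dynamic_routing(rng.normal(size=(10, 16)), K=4)
```

Items whose predictions agree with a capsule are routed to it more strongly in each iteration, clustering the behavior sequence into K interests.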
In addition, to enable the network to learn the differences between items, categories can serve as good supervision labels. Specifically, we take the first-layer information of the Capsule Network as the item features \(Z=\{z_{1}, z_{2}, \ldots , z_{N}\}\) and use a fully connected layer as the classifier, whose output \(\hat{Z} \in \mathbb {R}^{N \times |C|}\) can be represented as follows:
The categories \(S_{c} = \{c_{1}, c_{2}, \ldots , c_{N}\}\) are used as labels, and the cross-entropy loss function computes the classifier loss:
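This category supervision \(L_{class}\) amounts to a linear classifier over the item features trained with cross-entropy; a minimal sketch (the single linear layer is an assumption for illustration):

```python
import numpy as np

def category_supervision_loss(Z, W, labels):
    """Classify each item feature z_n into its category with a linear
    layer + softmax, and return the cross-entropy L_class."""
    logits = Z @ W                                    # N x |C|
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(4)
Z = rng.normal(size=(5, 8))           # first-layer capsule features
W = rng.normal(size=(8, 3))           # classifier weights, |C| = 3 categories
loss = category_supervision_loss(Z, W, [0, 2, 1, 1, 0])
```

Backpropagating this loss pushes items of different categories apart in feature space, increasing the differentiation between item representations.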
4.3 Multi-interest contrastive learning module
Differentiating between interests is the key to achieving multi-interest sequential recommendation. Existing methods rarely consider the differences between interest capsules, so in extreme cases the interest capsules of a user sequence all carry the same meaning. This paper uses contrastive learning to distinguish between interests while preserving their implicit correlations. The sequence information is used to represent the user's true interests through adaptive fusion.
We assume that the high-level preference interests from the category sequence (\(Q_{u}\)) are consistent with the low-level preference interests from the item sequence (\(P_{u}\)). We adaptively fuse them with fully connected layers to obtain the user's final multi-interest representation (\(M_{u}\)). Finally, \(P_{u}\) and \(M_{u}\) serve as the two views for contrastive learning.
where \(M_{u} \in \mathbb {R}^{K \times d}\).
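The adaptive fusion can be sketched as a small MLP over the concatenated interests; the two-layer ReLU form is an assumption, since the paper states only "fully connected layers":

```python
import numpy as np

def adaptive_fusion(P_u, Q_u, W1, W2):
    """Fuse low-level (P_u) and high-level (Q_u) interests per capsule:
    M_u = ReLU([P_u; Q_u] @ W1) @ W2 (assumed two-layer form)."""
    cat = np.concatenate([P_u, Q_u], axis=1)   # K x 2d
    return np.maximum(cat @ W1, 0.0) @ W2      # M_u: K x d

rng = np.random.default_rng(5)
K, d = 4, 8
M_u = adaptive_fusion(rng.normal(size=(K, d)), rng.normal(size=(K, d)),
                      rng.normal(size=(2 * d, d)), rng.normal(size=(d, d)))
```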
Most existing contrastive learning methods are based on InfoNCE:
where \(\tau \) is a temperature hyperparameter, (\(\textbf{h}_i\), \(\textbf{h}_{i^*}\)) is a positive pair, and (\(\textbf{h}_i\), \(\textbf{h}_{j \ne i^*}\)) are negative pairs.
However, due to the lack of a decision margin, a small perturbation around the decision boundary may lead to an incorrect decision. To overcome this problem, inspired by ArcFace [46], we propose a new training objective for multi-interest contrastive learning by adding an additive angular margin m between the positive pair \(\textbf{h}_{i}\) and \(\textbf{h}_{i^*}\). Therefore, (14) can be rewritten as follows:
where m is the additive angular margin.
To some extent, more negative samples lead to better performance in contrastive learning. In this paper, we set {\(i \in M_{u}^i\), \( i^* \in P_{u}^i\)} or {\( i \in P_{u}^i\), \(i^* \in M_{u}^i\)}, with \({j \in \{P_{u} \cup M_{u}\}}\). In this way, for a sequence, any i has (\(2K-2\)) negative samples, and the contrastive loss for multi-interest sequential recommendation is:
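Putting the pieces together, the margin-augmented InfoNCE over the K interest pairs might be implemented as follows. Interpreting m in degrees is an assumption (the tested values are {0, 5, 10, 15, 20}), and the cosine-similarity form of the scoring is likewise assumed:

```python
import numpy as np

def margin_infonce(M_u, P_u, tau=0.05, m_deg=10.0):
    """For anchor M_u[k] the positive is P_u[k]; the other (2K-2)
    interests in P_u and M_u are negatives. An additive angular margin
    m is applied to the positive pair as cos(theta + m), ArcFace-style."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    m = np.deg2rad(m_deg)
    K = M_u.shape[0]
    total = 0.0
    for k in range(K):
        theta = np.arccos(np.clip(cos(M_u[k], P_u[k]), -1.0, 1.0))
        pos = np.exp(np.cos(theta + m) / tau)          # margin-penalized positive
        negs = [np.exp(cos(M_u[k], v) / tau)
                for j in range(K) for v in (P_u[j], M_u[j]) if j != k]
        total += -np.log(pos / (pos + sum(negs)))
    return total / K

rng = np.random.default_rng(6)
loss = margin_infonce(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
```

Because only the paired interests across the two views are treated as positives, the loss separates the K interests from one another while still letting the fused view retain their correlations.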
4.4 Model training
For a given target item embedding y, we use an argmax operator to obtain the interest most related to the target item through (18):
The loss function between the predicted results of the model and the given target is :
where \(X^{'}\) denotes the items obtained through the sampled softmax objective [47].
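The interest selection and recommendation loss can be sketched as follows; the plain softmax over a handful of pre-chosen negatives is a simplified stand-in for the sampled softmax objective [47], and the negative indices here are arbitrary:

```python
import numpy as np

def recommend_loss(M_u, y, item_table, neg_ids, tau=1.0):
    """Pick the interest most related to target y via argmax, then
    score y against sampled negatives with a softmax cross-entropy."""
    k = int(np.argmax(M_u @ y))                  # most related interest
    v = M_u[k]
    logits = np.concatenate([[v @ y], item_table[neg_ids] @ v]) / tau
    logits -= logits.max()                       # numerical stability
    return -logits[0] + np.log(np.exp(logits).sum())

rng = np.random.default_rng(7)
M_u = rng.normal(size=(4, 8))                    # K = 4 user interests
items = rng.normal(size=(100, 8))                # toy item embedding table
loss = recommend_loss(M_u, items[0], items, neg_ids=[5, 9, 23])
```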
The joint loss is defined as a linear combination of these three losses:
where \(\lambda _1\) and \(\lambda _2\) are the hyperparameters to control the impact of different losses.
5 Experiments
5.1 Experimental settings
Dataset We consider three real-world e-commerce datasets. The specific statistics are shown in Table 1.
-
Amazon-Clothing. The Amazon Review dataset is a classic dataset commonly used in recommender systems, recording product reviews. We use the Clothing, Shoes and Jewelry subset in our experiments.
-
Tmall-Buy. The Tmall dataset is collected by Tmall.com, an online shopping website, and contains users' shopping history for about six months. We retain users' purchase behaviors as a subset for experiments.
-
Tafeng. The Tafeng dataset collects user transaction data from November 2000 to February 2001, covering everything from food and office supplies to furniture.
Baselines We compare our model with some sequential recommendation methods.
-
GRU4Rec [48]. GRU4Rec is a representative recommendation model that first introduces recurrent neural networks into sequence recommendation.
-
MIND [16]. MIND is one of the first frameworks to model users’ multiple interests based on dynamic routing algorithms.
-
ComiRec [18]. ComiRec is a representative baseline for the multi-interest recommendation. It uses two methods to represent user interests: attention mechanism and dynamic routing.
-
PIMI [23]. Considering the limitations of ComiRec, PIMI introduces the study of periodicity and interactivity of item sequences, capturing both global and local item features.
-
REMI [20]. REMI consists of an Interest-aware Hard Negative mining strategy and a Routing Regularization method to solve the issues of increased easy negatives and routing collapse during the training process.
Evaluation Metrics We use three common accuracy metrics for performance evaluation: Recall, Normalized Discounted Cumulative Gain (NDCG), and Hit Rate (HR). Metrics are computed over the top 20/50 recommended candidates (e.g., Recall@20). For all three metrics, higher scores indicate better recommendation performance.
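For reference, per-user computation of the three metrics at top-k can be sketched as:

```python
import numpy as np

def recall_ndcg_hr(ranked, relevant, k=20):
    """Top-k Recall, NDCG and Hit Rate for one user, given the ranked
    candidate list and the set of ground-truth items."""
    topk = ranked[:k]
    hits = [i for i, item in enumerate(topk) if item in relevant]
    recall = len(hits) / max(len(relevant), 1)
    dcg = sum(1.0 / np.log2(pos + 2) for pos in hits)       # log2(rank+1), 0-indexed
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return recall, ndcg, float(bool(hits))

# toy example: item 7 is relevant and ranked 2nd of the top-3
r, n, h = recall_ndcg_hr([3, 7, 1, 9], {7, 4}, k=3)
```

Dataset-level scores are then averages of these per-user values.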
Implementation Details For each dataset, we partition all users into training, validation, and test sets with a ratio of 8:1:1. The maximum sequence length is set to 30 for the Amazon-Clothing and Tafeng datasets and 20 for the Tmall-Buy dataset. User sequences exceeding the maximum length are truncated, and shorter sequences are padded with 0. We filter out users/items with fewer than 12 interactions to guarantee the length of recent sequences. Unless otherwise noted, all parameters follow [20]: the learning rate is 0.001, the mini-batch size is 128, the embedding size is 64, the interest number is K = 4, and Adam is used as the gradient optimizer. We analyze the effects of the other hyperparameters in detail in Section 5.4 and set them to \(\lambda _1\) = 0.1, \(\lambda _2\) = 1, \(\tau \) = 0.05, and m = 10.
5.2 Performance evaluation
To demonstrate the recommendation performance of our model HPCL4SR, we compare it with other multi-interest models. The experimental results of three datasets are presented in Table 2. We have the following observations.
First, although the three datasets have different characteristics, HPCL4SR consistently yields the best performance, indicating the robustness of our model. By modeling users' diverse interests through contrastive learning, even on the Tafeng dataset with limited user interests, we can still make full use of the limited information to capture user interests and make optimal recommendations. This demonstrates the effectiveness of a finer-grained characterization of the user persona contained in a user sequence. Overall, we model the high-level and low-level preference interests of user sequences and distinguish the feature representations between interests through contrastive learning, characterizing users more finely. By considering the meaning behind different user behaviors, that is, the actual use of each interacted item, we mine and exploit the information in the user behavior sequence to understand users' interests from the perspective of item usage, which helps model users more accurately.
Next, judging from the performance of the sequential recommendation models, PIMI, UMI, and HPCL4SR outperform most models, such as ComiRec and MIND, on all three datasets, indicating that adding side information is beneficial for user modeling.
Finally, on the Tafeng dataset with a small number of items (10,176), the model that uses a single vector for user interests (GRU4Rec) outperforms the simple multi-vector models (MIND and ComiRec). However, its performance is still worse than the PIMI model, which uses time information, and our proposed HPCL4SR model, which uses category information. The multi-interest models outperform the single-interest models on the Amazon-Clothing and Tmall-Buy datasets, which have many users and items. In addition, the results of the REMI model indicate that selecting high-quality negative samples can bring surprising gains, though at a significant time cost for screening negatives. Overall, modeling users' interests from different aspects is better than using only one vector to model users' overall interests, because multi-interest models can provide more diverse recommendation results and thereby improve recommendation accuracy.
In addition, to better demonstrate the effectiveness of our method, we supplement the experimental results of CL4SRec [41] and DuoRec [24] on the sequential recommendation task. As shown in Table 3, the experiment demonstrates the effectiveness of contrastive learning methods and also shows that modeling users with multiple interests improves recommendation performance.
5.3 Ablation study
In this section, we use the Amazon-Clothing dataset to analyze the effectiveness of the proposed method (HPCL4SR). First, we refer to the variant of HPCL4SR that uses only item information as the base, and the variant that additionally uses category as the supervision signal as base (w \(L_{class}\)). Then, after constructing a category global graph to obtain the user's high-level preference interest information, we try three methods of integrating the high-level preference interests (\(Q_{u}\)) and low-level preference interests (\(P_{u}\)): addition, multiplication, and adaptive fusion, denoted HPCL4SR(w '+'), HPCL4SR(w '*'), and HPCL4SR, respectively. Finally, we analyze the contribution of interest differentiation to multi-interest sequential recommendation. We not only replace the multi-interest contrastive learning module with Capsule Regularization [19], denoted HPCL4SR(w CR), but also analyze the impact of removing the additive angular margin m, denoted HPCL4SR(w/o m). The experimental results on the three datasets are shown in Table 4. The table shows that category, as a supervision signal, improves performance by optimizing item representations. When combined with \(Q_{u}\), the improvement is significant, especially with the MLP fusion. The experiment confirms that the difference between interests is the main factor affecting sequential recommendation performance, and contrastive learning shows greater advantages than the Capsule Regularization form because it distinguishes the differences between interests while preserving the correlations between them.
5.4 Hyper-parameter study
\(\lambda _1\) and \(\lambda _2\) are hyperparameters of the joint loss function during training, and directly affect the optimization of model parameters. We select \(\lambda _1 \in \{0.01, 0.05, 0.1, 0.5, 1\}\) and \(\lambda _2 \in \{0.01, 0.1, 1, 5, 10\}\), and conduct experiments on the three datasets using NDCG@50 as the evaluation metric. As shown on the left side of Figure 4, the best performance is achieved when \(\lambda _1=0.1\). This matches our intuition: using category as the supervision signal for items is effective, but an excessive weight overwhelms the recommendation loss and reduces model performance. As shown on the right side of Figure 4, increasing the weight \(\lambda _2\) of the contrastive loss helps distinguish the differences between multiple interests, but if \(\lambda _2\) is too large it also masks the recommendation loss and weakens the model's recommendation ability. Therefore, a reasonable value is \(\lambda _2=1\).
The temperature \(\tau \) and angular margin m in the multi-interest contrastive learning module affect its effectiveness. For \(\tau \), we carried out experiments with \(\tau \) varying from 0.01 to 0.1 at an interval of 0.01. The results are shown on the left side of Figure 5. On the Amazon-Clothing and Tmall-Buy datasets, the performance is best when \(\tau =0.05\); on the Tafeng dataset, the result is best when \(\tau =0.03\) (although the difference from \(\tau =0.05\) is small). Taking all factors into consideration, we chose \(\tau =0.05\) for all our experiments. For m, as shown on the right side of Figure 5, we selected \(m \in \{0, 5, 10, 15, 20\}\). Although the performance is best on the Tmall-Buy dataset when \(m=15\), it is best on the other two datasets when \(m=10\). Therefore, we set \(m=10\) in the experiments.
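To make the roles of \(\tau \) and m concrete, the following is a sketch of an InfoNCE-style contrastive loss with an additive angular margin applied to the positive pair, in the spirit of ArcFace [46]. The exact loss in HPCL4SR may differ; here m is assumed to be given in degrees, matching the grid \(\{0, 5, 10, 15, 20\}\) above.

```python
import math
import torch
import torch.nn.functional as F

def margin_infonce(anchor, positive, negatives, tau=0.05, m_deg=10.0):
    """InfoNCE with additive angular margin on the positive pair (sketch).
    anchor, positive: (B, d); negatives: (K, d) shared across the batch."""
    m = math.radians(m_deg)
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    # Penalize the positive by adding margin m to its angle before scoring;
    # clamp keeps acos numerically safe at the boundaries.
    cos_pos = (a * p).sum(-1).clamp(-1 + 1e-7, 1 - 1e-7)
    pos_logit = torch.cos(torch.acos(cos_pos) + m) / tau
    neg_logits = (a @ n.t()) / tau
    logits = torch.cat([pos_logit.unsqueeze(-1), neg_logits], dim=-1)
    # The positive sits at index 0 of each row of logits.
    target = torch.zeros(a.size(0), dtype=torch.long)
    return F.cross_entropy(logits, target)
```

Smaller \(\tau \) sharpens the softmax over similarities, while a larger m forces the positive pair to be separated from negatives by a wider angular gap, which is what drives the interests apart.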
5.5 Case study
We analyze the proposed model's effectiveness in solving the multi-interest recommendation problem by examining its recommendation results. Because the Amazon-Clothing dataset contains detailed information such as item titles and item categories, whereas Tmall-Buy provides only numeric IDs, we use Amazon-Clothing items to illustrate the recommendation results.
Figure 6 shows the recommendation results of the proposed HPCL4SR model for a given user behavior sequence. It can be seen from the figure that the user is more interested in baby-boy suits and boys' socks, but the PIMI model only recommends items related to men and women, failing to learn the two distinct demands for baby boys and boys. HPCL4SR models both the high-level and low-level preference interests of the user and diversifies the interests, recommending the "baby suit" and "socks" that the user wants. Moreover, in the list of items recommended by HPCL4SR, the "baby suit" item that the user actually interacts with ranks higher; that is, the ranking quality of the recommendation list is higher. In addition, the HPCL4SR model learns the interests of both baby boys and boys, while also capturing other categories such as socks and shoes.
Table 5 shows the Top-20 recommendation results of the PIMI and HPCL4SR models for the randomly selected user behavior sequence numbered 68079 in the Tmall-Buy dataset. As can be seen from the table, the HPCL4SR model correctly predicts two of the items that the user interacts with (shown in bold). Compared with the PIMI model, item ID 31744 also ranks higher in the recommendation list given by HPCL4SR. Therefore, the HPCL4SR model outperforms the PIMI model.
5.6 vs. LLMs
In order to compare the recommendation ability of HPCL4SR with that of large language models in the matching stage, we use two prompting methods and have ChatGPT (ChatGPT 3.5-Turbo-1106 & ChatGPT 4-Turbo) provide recommended items based on user interaction information. The first method inputs the user's historical interactions and prompts ChatGPT to generate 50 items of interest to the user. The second method inputs the user's interaction history together with a candidate set of 50 items, consisting of the user's next real item and randomly selected items, and prompts ChatGPT to select the items the user may be interested in based on the interaction history. The experimental details and results are shown in Table 6. The performance of ChatGPT 4-Turbo is better than that of ChatGPT 3.5-Turbo-1106, but the results of both fall far below our model and even below many existing models. This indicates that current general-purpose LLMs still cannot be applied well to specific task domains [26, 49]. In addition, although T5-small [50] has far fewer parameters than ChatGPT, we found that fine-tuning a language model can further improve recommendation performance; even so, it still cannot match the model we carefully designed for sequence recommendation. Nevertheless, the expressive power of large models is enlightening for our future work.
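The two prompting methods above can be sketched as simple template builders. The wording below is illustrative only; the paper's exact prompts are given in Table 6, and both function names are ours.

```python
def build_prompt_generate(history, k=50):
    """First prompting method (sketch): ask the LLM to freely generate
    k items the user may interact with next."""
    items = ", ".join(history)
    return (f"A user has interacted with the following items in order: {items}. "
            f"Recommend {k} items the user is likely to interact with next.")

def build_prompt_rank(history, candidates):
    """Second prompting method (sketch): ask the LLM to pick from a fixed
    candidate set that mixes the ground-truth next item with random items."""
    items = ", ".join(history)
    cands = "; ".join(candidates)
    return (f"A user has interacted with the following items in order: {items}. "
            f"From the candidate list [{cands}], select the items the user "
            f"is most likely to interact with next.")
```

The second method is the easier task for the LLM, since the ground-truth item is guaranteed to appear among the candidates; that both settings still trail HPCL4SR underlines the gap reported in Table 6.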
6 Conclusion
In this paper, we propose a novel framework named HPCL4SR for multi-interest sequence recommendation. To represent multiple user interests, HPCL4SR uses contrastive learning to differentiate interests while preserving their correlation information, which is more in line with user behavior in real scenarios. We verify the effectiveness of the proposed method through experiments on three datasets. Additionally, we compare the recommendation ability of our approach in a task-specific domain with LLMs (ChatGPT 3.5-Turbo-1106 & ChatGPT 4-Turbo), further showcasing the superiority of HPCL4SR in multi-interest sequential recommendation. In the future, we plan to enhance the interpretability of recommendation tasks based on multi-interest recommendation models.
Data Availability
No datasets were generated or analysed during the current study.
References
Wang, S., Hu, L., Wang, Y., Cao, L., Sheng, Q.Z., Orgun, M.A.: Sequential recommender systems: Challenges, progress and prospects. In: Proceedings of the twenty-eighth international joint conference on artificial intelligence: IJCAI, pp. 6332–6338 (2019). https://doi.org/10.24963/ijcai.2019/883
Quadrana, M., Cremonesi, P., Jannach, D.: Sequence-aware recommender systems. ACM Comput. Surv. 51(4), 1–36 (2018). https://doi.org/10.1145/3190616
Pi, Q., Bian, W., Zhou, G., Zhu, X., Gai, K.: Practice on long sequential user behavior modeling for click-through rate prediction. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining: KDD, pp. 2671–2679 (2019). https://doi.org/10.1145/3292500.3330666
Zhang, Y., Liu, Y., Xiong, H., Liu, Y., Yu, F., He, W., Xu, Y., Cui, L., Miao, C.: Cross-domain disentangled learning for e-commerce live streaming recommendation. In: 39th IEEE International conference on data engineering: ICDE, pp. 2955–2968 (2023). https://doi.org/10.1109/ICDE55515.2023.00226
Chu, Q., Liu, G., Sun, H., Zhou, C.: Next news recommendation via knowledge-aware sequential model. In: Chinese computational linguistics - 18th China national conference: CCL. Lecture Notes in Computer Science, vol. 11856, pp. 221–232 (2019). https://doi.org/10.1007/978-3-030-32381-3_18
Wu, C., Wu, F., Qi, T., Huang, Y.: User modeling with click preference and reading satisfaction for news recommendation. In: Proceedings of the twenty-ninth international joint conference on artificial intelligence: IJCAI, pp. 3023–3029 (2020). https://doi.org/10.5555/3491440.3491858
Chaves, P.D.V., Pereira, B.L., Santos, R.L.T.: Efficient online learning to rank for sequential music recommendation. In: WWW ’22: The ACM Web Conference 2022, pp. 2442–2450 (2022). https://doi.org/10.1145/3485447.3512116
Li, Q., Wang, X., Wang, Z., Xu, G.: Be causal: De-biasing social network confounding in recommendation. ACM Trans. Knowl. Discov. Data 17(1), 1–23 (2023). https://doi.org/10.1145/3533725
Song, W., Wang, S., Wang, Y., Wang, S.: Next-item recommendations in short sessions. In: RecSys ’21: fifteenth acm conference on recommender systems, pp. 282–291 (2021). https://doi.org/10.1145/3460231.3474238
Hidasi, B., Karatzoglou, A.: Recurrent neural networks with top-k gains for session-based recommendations. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management: CIKM, pp. 843–852 (2018). https://doi.org/10.1145/3269206.3271761
Yuan, F., Karatzoglou, A., Arapakis, I., Jose, J.M., He, X.: A simple convolutional generative network for next item recommendation. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining: WSDM, pp. 582–590 (2019). https://doi.org/10.1145/3289600.3290975
Kang, W., McAuley, J.J.: Self-attentive sequential recommendation. In: IEEE International Conference on Data Mining: ICDM, pp. 197–206 (2018). https://doi.org/10.1109/ICDM.2018.00035
Wang, S., Cao, L., Hu, L., Berkovsky, S., Huang, X., Xiao, L., Lu, W.: Hierarchical attentive transaction embedding with intra- and inter-transaction dependencies for next-item recommendation. IEEE Intell. Syst. 36(4), 56–64 (2021). https://doi.org/10.1109/MIS.2020.2997362
Wang, N., Wang, S., Wang, Y., Sheng, Q.Z., Orgun, M.A.: Modelling local and global dependencies for next-item recommendations. In: Web information systems engineering: WISE. Lecture Notes in Computer Science, vol. 12343, pp. 285–300 (2020). https://doi.org/10.1007/978-3-030-62008-0_20
Zheng, X., Su, J., Liu, W., Chen, C.: DDGHM: dual dynamic graph with hybrid metric training for cross-domain sequential recommendation. In: MM ’22: The 30th ACM international conference on multimedia: MM, pp. 471–481 (2022). https://doi.org/10.1145/3503161.3548072
Li, C., Liu, Z., Wu, M., Xu, Y., Zhao, H., Huang, P., Kang, G., Chen, Q., Li, W., Lee, D.L.: Multi-interest network with dynamic routing for recommendation at tmall. In: Proceedings of the 28th ACM international conference on information and knowledge management: CIKM, pp. 2615–2623 (2019). https://doi.org/10.1145/3357384.3357814
Xiao, Z., Yang, L., Jiang, W., Wei, Y., Hu, Y., Wang, H.: Deep multi-interest network for click-through rate prediction. In: CIKM ’20: The 29th ACM international conference on information and knowledge management, pp. 2265–2268 (2020). https://doi.org/10.1145/3340531.3412092
Cen, Y., Zhang, J., Zou, X., Zhou, C., Yang, H., Tang, J.: Controllable multi-interest framework for recommendation. In: KDD ’20: The 26th ACM SIGKDD conference on knowledge discovery and data mining: KDD, pp. 2942–2951 (2020). https://doi.org/10.1145/3394486.3403344
Tang, Z., Wang, L., Zou, L., Zhang, X., Zhou, J., Li, C.: Towards multi-interest pre-training with sparse capsule network. In: Proceedings of the 46th international ACM SIGIR conference on research and development in information retrieval: SIGIR, pp. 311–320 (2023). https://doi.org/10.1145/3539618.3591778
Xie, Y., Gao, J., Zhou, P., Ye, Q., Hua, Y., Kim, J.B., Wu, F., Kim, S.: Rethinking multi-interest learning for candidate matching in recommender systems. In: Proceedings of the 17th ACM conference on recommender systems: RecSys, pp. 283–293 (2023). https://doi.org/10.1145/3604915.3608766
Lv, F., Jin, T., Yu, C., Sun, F., Lin, Q., Yang, K., Ng, W.: SDM: sequential deep matching model for online large-scale recommender system. In: Proceedings of the 28th ACM international conference on information and knowledge management: CIKM, pp. 2635–2643 (2019). https://doi.org/10.1145/3357384.3357818
Tan, Q., Zhang, J., Yao, J., Liu, N., Zhou, J., Yang, H., Hu, X.: Sparse-interest network for sequential recommendation. In: WSDM ’21, The Fourteenth ACM international conference on web search and data mining: WSDM, pp. 598–606 (2021). https://doi.org/10.1145/3437963.3441811
Chen, G., Zhang, X., Zhao, Y., Xue, C., Xiang, J.: Exploring periodicity and interactivity in multi-interest framework for sequential recommendation. In: Proceedings of the thirtieth international joint conference on artificial intelligence: IJCAI, pp. 1426–1433 (2021). https://doi.org/10.24963/ijcai.2021/197
Qiu, R., Huang, Z., Yin, H., Wang, Z.: Contrastive learning for representation degeneration problem in sequential recommendation. In: WSDM ’22: The fifteenth acm international conference on web search and data mining: WSDM, pp. 813–823 (2022). https://doi.org/10.1145/3488560.3498433
Chai, Z., Chen, Z., Li, C., Xiao, R., Li, H., Wu, J., Chen, J., Tang, H.: User-aware multi-interest learning for candidate matching in recommenders. In: SIGIR ’22: The 45th international acm sigir conference on research and development in information retrieval: SIGIR, pp. 1326–1335 (2022). https://doi.org/10.1145/3477495.3532073
Li, L., Zhang, Y., Liu, D., Chen, L.: Large language models for generative recommendation: A survey and visionary discussions. CoRR abs/2309.01157 (2023). https://doi.org/10.48550/arXiv.2309.01157
Hou, Y., Zhang, J., Lin, Z., Lu, H., Xie, R., McAuley, J.J., Zhao, W.X.: Large language models are zero-shot rankers for recommender systems. CoRR abs/2305.08845 (2023). https://doi.org/10.48550/arXiv.2305.08845
Luo, S., He, B., Zhao, H., Huang, Y., Zhou, A., Li, Z., Xiao, Y., Zhan, M., Song, L.: Instruction tuning large language model as ranker for top-k recommendation. CoRR abs/2312.16018 (2023). https://doi.org/10.48550/arXiv.2312.16018
Bao, K., Zhang, J., Zhang, Y., Wang, W., Feng, F., He, X.: Tallrec: An effective and efficient tuning framework to align large language model with recommendation. In: Proceedings of the 17th ACM conference on recommender systems: RecSys, pp. 1007–1014 (2023). https://doi.org/10.1145/3604915.3608857
Hénaff, O.J.: Data-efficient image recognition with contrastive predictive coding. In: Proceedings of the 37th international conference on machine learning:icml. proceedings of machine learning research, vol. 119, pp. 4182–4192 (2020). https://proceedings.mlr.press/v119/henaff20a.html
Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. CoRR abs/1807.03748 (2018) https://doi.org/10.48550/arXiv.1807.03748
Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. In: 7th International conference on learning representations: ICLR (2019). https://openreview.net/forum?id=Bklr3j0cKX
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.B.: Momentum contrast for unsupervised visual representation learning. In: 2020 IEEE/CVF Conference on computer vision and pattern recognition: CVPR, pp. 9726–9735 (2020). https://doi.org/10.1109/CVPR42600.2020.00975
Chen, X., He, K.: Exploring simple siamese representation learning. In: IEEE Conference on computer vision and pattern recognition: CVPR, pp. 15750–15758 (2021). https://doi.org/10.1109/CVPR46437.2021.01549
Wang, X., Huang, Y., Zeng, D., Qi, G.: Caco: Both positive and negative samples are directly learnable via cooperative-adversarial contrastive learning. IEEE Trans. Pattern Anal. Mach. Intell. 45(9), 10718–10730 (2023). https://doi.org/10.1109/TPAMI.2023.3262608
Yan, Y., Li, R., Wang, S., Zhang, F., Wu, W., Xu, W.: Consert: A contrastive framework for self-supervised sentence representation transfer. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing: ACL/IJCNLP, pp. 5065–5075 (2021). https://doi.org/10.18653/v1/2021.acl-long.393
Gao, T., Yao, X., Chen, D.: Simcse: Simple contrastive learning of sentence embeddings. In: Proceedings of the 2021 conference on empirical methods in natural language processing: EMNLP, pp. 6894–6910 (2021). https://doi.org/10.18653/v1/2021.emnlp-main.552
Zhang, Y., Zhu, H., Wang, Y., Xu, N., Li, X., Zhao, B.: A contrastive framework for learning sentence representations from pairwise and triple-wise perspective in angular space. In: Proceedings of the 60th annual meeting of the association for computational linguistics: ACL, pp. 4892–4903 (2022). https://doi.org/10.18653/v1/2022.acl-long.336
Chen, Y., Liu, Z., Li, J., McAuley, J.J., Xiong, C.: Intent contrastive learning for sequential recommendation. In: WWW ’22: The ACM web conference 2022: WWW, pp. 2172–2182 (2022). https://doi.org/10.1145/3485447.3512090
Du, H., Shi, H., Zhao, P., Wang, D., Sheng, V.S., Liu, Y., Liu, G., Zhao, L.: Contrastive learning with bidirectional transformers for sequential recommendation. In: Proceedings of the 31st ACM international conference on information & knowledge management: CIKM, pp. 396–405 (2022). https://doi.org/10.1145/3511808.3557266
Yang, Y., Huang, C., Xia, L., Huang, C., Luo, D., Lin, K.: Debiased contrastive learning for sequential recommendation. In: Proceedings of the ACM Web Conference 2023: WWW, pp. 1063–1073 (2023). https://doi.org/10.1145/3543507.3583361
Yuan, X., Duan, D., Tong, L., Shi, L., Zhang, C.: ICAI-SR: item categorical attribute integrated sequential recommendation. In: SIGIR ’21: The 44th international ACM SIGIR conference on research and development in information retrieval: SIGIR, pp. 1687–1691 (2021). https://doi.org/10.1145/3404835.3463060
Xie, Y., Zhou, P., Kim, S.: Decoupled side information fusion for sequential recommendation. In: SIGIR ’22: The 45th international ACM SIGIR conference on research and development in information retrieval :SIGIR, pp. 1611–1621 (2022). https://doi.org/10.1145/3477495.3531963
Wang, G., Ying, R., Huang, J., Leskovec, J.: Multi-hop attention graph neural networks. In: Proceedings of the thirtieth international joint conference on artificial intelligence: IJCAI, pp. 3089–3096 (2021). https://doi.org/10.24963/ijcai.2021/425
Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in neural information processing systems 30: annual conference on neural information processing systems 2017: NIPS, pp. 3856–3866 (2017). https://proceedings.neurips.cc/paper_files/paper/2017/file/2cad8fa47bbef282badbb8de5374b894-Paper.pdf
Deng, J., Guo, J., Yang, J., Xue, N., Kotsia, I., Zafeiriou, S.: Arcface: Additive angular margin loss for deep face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 44(10), 5962–5979 (2022). https://doi.org/10.1109/TPAMI.2021.3087709
Jean, S., Cho, K., Memisevic, R., Bengio, Y.: On using very large target vocabulary for neural machine translation. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing of the asian federation of natural language processing: ACL, pp. 1–10 (2015). https://doi.org/10.3115/v1/P15-1001
Hidasi, B., Karatzoglou, A., Baltrunas, L., Tikk, D.: Session-based recommendations with recurrent neural networks. In: 4th International conference on learning representations: ICLR (2016). https://doi.org/10.48550/arXiv.1511.06939
Fan, S., Wang, Y., Pang, X., Chen, L., Han, P., Shang, S.: Uamc: user-augmented conversation recommendation via multi-modal graph learning and context mining. World Wide Web, pp. 1–21 (2023). https://doi.org/10.1007/s11280-023-01219-2
Xu, S., Hua, W., Zhang, Y.: Openp5: Benchmarking foundation models for recommendation. CoRR abs/2306.11134 (2023). https://doi.org/10.48550/ARXIV.2306.11134
Funding
National Natural Science Foundation of China (NSFC) (61972455)
Author information
Contributions
Zizhong.Zhu. and Shuang.Li. wrote the main manuscript text, Yaokun.Liu. revised the figure, Xiaowang.Zhang. served as the corresponding author to coordinate and promote the completion of the paper, and Zhiyong.Feng. and Yuexian.Hou. provided reasonable revision opinions. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
This article belongs to the Topical Collection: Special Issue on Advancing recommendation systems with foundation models Guest Editors: Kai Zheng, Renhe Jiang, and Ryosuke Shibasaki
Zhu, Z., Li, S., Liu, Y. et al. High-level preferences as positive examples in contrastive learning for multi-interest sequential recommendation. World Wide Web 27, 21 (2024). https://doi.org/10.1007/s11280-024-01263-6