
1 Introduction

Recommender systems, which help users sift through a large amount of information and provide personalized recommendation services, play an indispensable role in today's web services. Among recommendation models, collaborative filtering (CF) is one of the most widely used algorithms; it leverages users' historical interactions to infer their preferences. However, CF suffers from the cold-start problem, as some users may have few interactions.

With the rapid development of web services, various kinds of side information have become available for recommender systems. Such information forms the so-called heterogeneous information networks (HINs) [1] and can be used to alleviate the cold-start problem. Recently, some efforts use metapaths [8] or graph neural networks (GNNs) [1, 8] to learn the embeddings of users and items on HINs, since both metapaths and GNNs are capable of capturing high-order semantic relations. As cold-start users and items may have many more high-order neighbors than first-order ones, aggregating these neighbors can help learn better embeddings for them.

However, most existing models exploit the rich side information in a supervised manner [1, 8], where the supervision signals are still user-item interactions. As cold-start users and items have few interactions, their embeddings are not fully trained during the training process. Thus, the side information is not fully exploited, especially for cold-start users and items with rich side information. Besides, user-item interactions merely describe the direct interaction relation between users and items, while the various kinds of side information describe many other first-order and high-order relations that reflect different aspects of users and items. Therefore, the interactions can help learn the direct interaction relation better, but may introduce noise when guiding the learning of other relations.

To tackle the above problems, a feasible solution is to design a pre-training task specifically for assisting the aggregation of rich side information. However, most existing pre-training models are not designed for the HIN-based recommendation scenario [6, 8], in which the first-order neighbors of a user directly reflect one part of the user's preference, while the high-order neighbors imply another part. The two kinds of neighbors describe the preference from two perspectives and together form a more complete picture. Therefore, a key challenge is how to jointly consider them in the pre-training task.

In this paper, we propose a novel pre-training model named MHGP to exploit the rich side information in a HIN for cold-start recommendation. We first encode users and items in both first-order and high-order structure views with GNNs and three different attention mechanisms. Then, we collect users and items that are connected with each other by multiple metapaths as positive samples and leverage contrastive learning to make the first-order structure-view embeddings of positive samples similar, while also aligning their embeddings in the high-order structure view. Once the pre-training process converges, the pre-trained embeddings are fine-tuned with the recommendation model.

We conduct comparative experiments on three real-world datasets. The results demonstrate that our pre-training model can improve the performance of recommendation models in the cold-start scenario and outperforms several state-of-the-art pre-training GNN models.

2 Related Work

2.1 Pre-training GNNs

Recently, pre-training GNNs, which aims to improve the performance of GNNs, has attracted plenty of attention. The pre-training task can be performed with contrastive learning, as in DGI [7], DMGI [5] and GraphCL [10]. Other works perform the pre-training task in different ways, such as L2P-GNN [4], which adopts a meta-learning approach.

A few works aim to improve the recommendation task. The work in [2] simulates the cold-start scenario and takes embedding reconstruction as the pre-training task. SGL [9] performs graph data augmentation for contrastive learning, which can be implemented in a pre-training manner. Overall, these models cannot fully exploit the various types of nodes and relations in a HIN for pre-training to enhance the recommendation task.

2.2 Cold-Start Recommendation

In recent years, studies on the cold-start problem have mainly focused on two directions. One is how to leverage side information to learn better embeddings of users and items, such as DisenHAN [8] and HGRec [6]. The other is how to exploit the underlying patterns in the interactions. Most studies adopt GNNs to mine the high-order collaborative information behind the user-item bipartite graph, such as LightGCN [3]. However, these models exploit the high-order information in a supervised manner, so the embeddings of cold-start users and items are rarely trained as they have very few interactions.

Fig. 1. The MHGP framework.

3 The Proposed MHGP Model

In this section, we introduce our mixed-order heterogeneous graph pre-training (MHGP) model to pre-train the embeddings of users and items. The overall architecture is illustrated in Fig. 1.

3.1 First-order Structure View Encoding

As our purpose is to pre-train the embeddings of users and items for recommendation, we do not consider the embeddings of other node types. Therefore, in the HIN-based recommendation scenario, a user's first-order neighbors can be users or items, while an item's first-order neighbors can only be users.

Item’s First-Order Structure Encoding. Different users who have interacted with the same item may contribute differently to that item's representation. Therefore, we apply a node-level attention mechanism to encode the first-order structures of items:

$$\begin{aligned} \textbf{h}^F_i&=\sum _{u\in \mathcal {N}_i}\boldsymbol{\alpha }_{i,u}\textbf{h}_u,\end{aligned}$$
(1)
$$\begin{aligned} \boldsymbol{\alpha }_{i,u}&=\frac{\exp (LeakyReLU(\textbf{c}_n^\top [\textbf{h}_i\,||\,\textbf{h}_u]))}{\sum _{u'\in \mathcal {N}_i}\exp (LeakyReLU(\textbf{c}_n^\top [\textbf{h}_i\,||\,\textbf{h}_{u'}]))}, \end{aligned}$$
(2)

where \(\textbf{h}_u\in \mathbb {R}^d\) is the embedding of user u. \(\mathcal {N}_i\) is the first-order neighbor set of item i. \(\textbf{c}_n\in \mathbb {R}^{2d \times 1}\) is the attention vector, and \(\Vert \) denotes the concatenation operation.
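To make the computation concrete, here is a minimal PyTorch sketch of Eqs. (1)-(2); the function name and the assumption that neighbor embeddings are passed as a dense tensor are illustrative, not part of the model specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 64                                   # embedding size
c_n = nn.Parameter(torch.randn(2 * d))   # attention vector c_n

def encode_item_first_order(h_i, h_neighbors):
    """h_i: (d,) item embedding; h_neighbors: (n, d) embeddings of users in N_i."""
    # score each (item, user) pair with LeakyReLU(c_n^T [h_i || h_u])
    pairs = torch.cat([h_i.expand_as(h_neighbors), h_neighbors], dim=-1)  # (n, 2d)
    scores = F.leaky_relu(pairs @ c_n)                                    # (n,)
    alpha = torch.softmax(scores, dim=0)                                  # Eq. (2)
    return (alpha.unsqueeze(-1) * h_neighbors).sum(dim=0)                 # Eq. (1)
```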

User’s First-Order Structure Encoding. For a user, the two types of first-order neighbors (items and users) contribute differently to his/her preference. Therefore, we design a hierarchical attention mechanism consisting of node-level attention and type-level attention to fully capture the influence of users' first-order neighbors:

$$\begin{aligned} \textbf{h}^F_u&=\boldsymbol{\beta }_1\sum _{i\in \mathcal {N}^I_u}\boldsymbol{\alpha }_{u,i}\textbf{h}_i+\boldsymbol{\beta }_2\sum _{v\in \mathcal {N}^U_u}\boldsymbol{\alpha }_{u,v}\textbf{h}_v, \end{aligned}$$
(3)

where \(\textbf{h}_i,\textbf{h}_v\in \mathbb {R}^d\) are the embeddings of item i and user v, respectively. \(\mathcal {N}^I_u\) and \(\mathcal {N}^U_u\) denote user u’s first-order neighbor sets of items and users, respectively. \(\boldsymbol{\alpha }_{u,i}\) and \(\boldsymbol{\alpha }_{u,v}\) are the node-level attention values, calculated similarly to \(\boldsymbol{\alpha }_{i,u}\). \(\boldsymbol{\beta }_1\) and \(\boldsymbol{\beta }_2\) are the type-level attention values:

$$\begin{aligned} \boldsymbol{\beta }_i&=\frac{\exp (\textbf{w}_i)}{\sum \limits _{j\in \left\{ 1,2\right\} } \exp (\textbf{w}_j)},\end{aligned}$$
(4)
$$\begin{aligned} \textbf{w}_i&=\frac{1}{|\mathcal {U}|}\sum \limits _{u\in \mathcal {U}}\textbf{c}_t^\top \tanh (\textbf{W}^F \textbf{h}^i_u+\textbf{b}^F), \end{aligned}$$
(5)

where \(\textbf{c}_t\in \mathbb {R}^d\) is the attention vector, \(\textbf{W}^F\in \mathbb {R}^{d \times d}\) and \(\textbf{b}^F\in \mathbb {R}^{d \times 1}\) are the learnable parameters.
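Below is a minimal sketch of the type-level attention in Eqs. (4)-(5), assuming PyTorch and that the node-level aggregations of the two neighbor types have already been computed for all users; all names are illustrative.

```python
import torch
import torch.nn as nn

d = 64
W_F = nn.Linear(d, d)               # implements W^F h + b^F
c_t = nn.Parameter(torch.randn(d))  # attention vector c_t

def type_level_weights(h_item_agg, h_user_agg):
    """h_item_agg, h_user_agg: (|U|, d) node-level aggregations per neighbor type."""
    w = torch.stack([
        (torch.tanh(W_F(h_item_agg)) @ c_t).mean(),   # Eq. (5), type 1 (item neighbors)
        (torch.tanh(W_F(h_user_agg)) @ c_t).mean(),   # Eq. (5), type 2 (user neighbors)
    ])
    return torch.softmax(w, dim=0)                    # Eq. (4): beta_1, beta_2

# Eq. (3): h_F_u = beta[0] * h_item_agg[u] + beta[1] * h_user_agg[u]
```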

3.2 High-order Structure View Encoding

In a HIN, we can obtain the high-order neighbors by exploiting the rich metapath-based neighbors [1]. As each metapath carries a specific semantic relation, different kinds of metapath-based neighbors imply different preference characteristics.

Metapath-Based Neighbor Generation. In a HIN, the whole graph can be described by several adjacency matrices, including the user-item interaction matrix \(\textbf{Y}\), each describing one kind of first-order relation. Thus, we can obtain the metapath-based neighbors by multiplying these adjacency matrices, e.g., \(\textbf{Y}\textbf{Y}^T\) for the metapath “user-item-user”. Afterwards, we set all nonzero values to 1 to form the final adjacency matrix.
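As an illustration, the following sketch builds the “user-item-user” adjacency matrix with SciPy sparse matrices, following the multiplication and binarization described above; the function name is illustrative.

```python
import numpy as np
import scipy.sparse as sp

def metapath_adjacency(Y: sp.csr_matrix) -> sp.csr_matrix:
    """Y: user-item interaction matrix. Returns the 0/1 adjacency of 'user-item-user'."""
    A = (Y @ Y.T).tocsr()            # nonzero where two users share at least one item
    A.data = np.ones_like(A.data)    # set all nonzero values to 1
    return A
```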

Metapath-Based Neighbor Aggregation. Assume that there are M metapaths \(\left\{ \varPhi _1, \varPhi _2, ..., \varPhi _M\right\} \) and that their corresponding adjacency matrices have been obtained. For each metapath \(\varPhi _m\), we use a GCN to aggregate the corresponding neighbors and obtain \(\textbf{h}^{\varPhi _m}_u\). Then, we apply a semantic-level attention mechanism to fuse the embeddings of all metapaths starting from user u:

$$\begin{aligned} \textbf{h}^H_u&=\sum _{m=1}^M\boldsymbol{\beta }_{\varPhi _m}\textbf{h}^{\varPhi _m}_u, \end{aligned}$$
(6)

where \(\boldsymbol{\beta }_{\varPhi _m}\) is the semantic-level attention value, calculated similarly to \(\boldsymbol{\beta }_1\) and \(\boldsymbol{\beta }_2\). The fused metapath-based embeddings of items are calculated in the same way and denoted by \(\textbf{h}^H_i\).
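A minimal sketch of the per-metapath aggregation and the fusion of Eq. (6) is given below; the mean-style one-layer GCN propagation is an assumption made for illustration, and dense adjacency tensors are used only for brevity.

```python
import torch

def aggregate_metapaths(A_list, h, beta):
    """A_list: list of M (n, n) 0/1 adjacency tensors; h: (n, d) node embeddings;
    beta: (M,) semantic-level attention values (computed as in Eqs. (4)-(5))."""
    per_path = []
    for A in A_list:
        deg = A.sum(dim=1, keepdim=True).clamp(min=1)
        per_path.append((A @ h) / deg)            # mean-style one-layer GCN (assumption)
    H = torch.stack(per_path)                     # (M, n, d)
    return (beta.view(-1, 1, 1) * H).sum(dim=0)   # Eq. (6)
```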

3.3 Pre-training with Contrastive Learning

In recommendation scenarios there are always some users sharing similar preferences. Therefore, the embeddings of these users should be similar within the first-order structure view and within the high-order structure view, respectively. The same applies to items. We treat such users and items as positive samples and leverage contrastive learning to force the two kinds of embeddings of positive nodes to be consistent.

We first count how many kinds of metapaths connect each pair of nodes i and j, denoted by connectivity(i, j). For each node i, we select all the nodes j with \(connectivity(i, j) > 0\) and sort them in descending order of connectivity to form \(\mathcal {S}_i\). As \(\mathcal {S}_i\) can be very large and nodes with lower connectivity(i, j) values may introduce noise, we set a threshold \(T_\mathcal {S}\): if \(|\mathcal {S}_i|>T_\mathcal {S}\), we keep only the top-\(T_\mathcal {S}\) nodes as the positive nodes of node i.
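The selection procedure can be sketched as follows with NumPy arrays; excluding a node from its own positive set is an assumption made for illustration.

```python
import numpy as np

def select_positives(metapath_adjs, T_S):
    """metapath_adjs: list of (n, n) 0/1 arrays, one per metapath kind.
    Returns, for each node, up to T_S positive nodes ranked by connectivity."""
    connectivity = np.sum(metapath_adjs, axis=0)           # kinds of metapaths per node pair
    np.fill_diagonal(connectivity, 0)                      # exclude the node itself (assumption)
    positives = []
    for i in range(connectivity.shape[0]):
        candidates = np.nonzero(connectivity[i])[0]
        order = np.argsort(-connectivity[i, candidates])   # descending by connectivity
        positives.append(candidates[order][:T_S])
    return positives
```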

After obtaining the embeddings of the first-order and high-order structure views, we feed them into a feed-forward neural network to project them into the same semantic space. Then, the final loss is calculated as follows:

$$\begin{aligned} \mathcal {L}=\lambda \mathcal {L}_u+(1-\lambda )\mathcal {L}_i, \end{aligned}$$
(7)

where \(\mathcal {L}_u\) and \(\mathcal {L}_i\) denote the losses from user side and item side, respectively. \(\lambda \) is a learnable parameter to adaptively balance the importance of the two sides. The calculation of \(\mathcal {L}_u\) is given as follows:

$$\begin{aligned} \mathcal {L}_u&=\lambda _u\mathcal {L}^F_u+(1-\lambda _u)\mathcal {L}^H_u,\end{aligned}$$
(8)
$$\begin{aligned} \mathcal {L}^F_u&=\frac{1}{|\mathcal {U}|}\sum _{u\in \mathcal {U}}-\log \frac{\sum _{v\in \mathcal {S}_u} \exp (sim(\textbf{h}^F_u, \textbf{h}^F_v)/\tau )}{\sum _{w\in \mathcal {U}}\exp (sim(\textbf{h}^F_u, \textbf{h}^F_w)/\tau )},\end{aligned}$$
(9)
$$\begin{aligned} \mathcal {L}^H_u&=\frac{1}{|\mathcal {U}|}\sum _{u\in \mathcal {U}}-\log \frac{\sum _{v\in \mathcal {S}_u} \exp (sim(\textbf{h}^H_u, \textbf{h}^H_v)/\tau )}{\sum _{w\in \mathcal {U}}\exp (sim(\textbf{h}^H_u, \textbf{h}^H_w)/\tau )}, \end{aligned}$$
(10)

where \(sim(\cdot )\) denotes the cosine similarity. \(\tau \) denotes the temperature hyperparameter. \(\lambda _u\) is a learnable parameter to adaptively balance the importance of the two kinds of embeddings of users. The calculation of \(\mathcal {L}_i\) is similar to \(\mathcal {L}_u\).
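A minimal PyTorch sketch of the view-specific losses in Eqs. (9)-(10) is shown below; it assumes the embeddings have already been projected into the same semantic space and that every node has at least one positive sample.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(h, pos_mask, tau=0.5):
    """h: (n, d) projected embeddings of one view; pos_mask: (n, n) bool matrix,
    True where the column node is a positive sample of the row node."""
    z = F.normalize(h, dim=-1)
    sim = torch.exp(z @ z.t() / tau)               # exp(cosine similarity / tau)
    pos = (sim * pos_mask).sum(dim=1)              # numerator: sum over S_u
    denom = sim.sum(dim=1)                         # denominator: sum over all nodes
    return -torch.log((pos + 1e-12) / denom).mean()
```

The user-side loss of Eq. (8) then combines the two view-specific losses with the learnable weight \(\lambda _u\), and Eq. (7) combines the user and item sides with \(\lambda \).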

3.4 Fine-Tuning with Recommendation Models

Many existing GNN-based recommendation models initialize the embeddings of users and items randomly, which may lead to local optima during training and further affect the performance of recommendation. To alleviate this problem, we use the pre-trained embeddings to initialize the recommendation model. The embeddings are further fine-tuned with the recommendation model under the supervision of interactions.
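The initialization step can be sketched as follows; the embedding attribute names of the recommendation model are illustrative and depend on the concrete implementation.

```python
import torch

def init_with_pretrained(rec_model, user_emb, item_emb):
    """Copy the pre-trained embeddings into the recommendation model's embedding tables."""
    with torch.no_grad():
        rec_model.user_embedding.weight.copy_(user_emb)
        rec_model.item_embedding.weight.copy_(item_emb)
    # The model is then trained as usual under the supervision of interactions.
```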

Table 1. Statistics of the datasets.

4 Experiments and Results

4.1 Experiment Settings

We conduct the experiments on three real-world datasets: Last.FM, Ciao and Douban Movie. All three datasets contain relatively few interactions and rich side information. The statistics of the datasets are summarized in Table 1.

We choose LightGCN [3] as the base recommendation model and choose three pre-training models DGI [7], DMGI [5] and SGL [9] for comparison.

For each dataset, we randomly choose \(x\%\) of the interactions as the training set and split the remaining interactions evenly into the validation set and the testing set. To simulate different cold-start environments, we set x to 20 and 40, respectively.
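For illustration, this split can be sketched as follows with pandas; the random seed and the DataFrame layout are assumptions.

```python
import pandas as pd

def split_interactions(df: pd.DataFrame, x: int, seed: int = 0):
    """Randomly keep x% of interactions for training and split the rest evenly."""
    train = df.sample(frac=x / 100, random_state=seed)
    rest = df.drop(train.index).sample(frac=1.0, random_state=seed)
    half = len(rest) // 2
    return train, rest.iloc[:half], rest.iloc[half:]   # train / validation / test
```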

All the pre-training models are trained from scratch, with an early stopping patience of 20 epochs. We tune the learning rate in \(\left\{ 0.01, 0.001, 0.0001\right\} \). For our MHGP, the number of GNN layers is set to 1 for both encoders. For LightGCN, the number of GNN layers is 2 and the embedding size is fixed to 64. We tune the other hyperparameters according to the original papers.

Table 2. Performance of top-20 recommendation with LightGCN as the base model.
Table 3. Ablation study results with \(20\%\) interactions as the training set. P for Precision@20, R for Recall@20 and N for NDCG@20.

4.2 Overall Performance Comparison

The overall performance is shown in Table 2. We can see that our pre-training model MHGP can consistently improve the performance of LightGCN, which demonstrates the effectiveness of MHGP for cold-start recommendation. In addition, the relative improvement increases as the training data decreases. This indicates that when the user-item interactions are sparse, our model can learn better embeddings of users and items by reasonably exploiting the rich side information. Besides, in most cases, our proposed model outperforms other state-of-the-art pre-training models. This indicates that MHGP is more suitable for the HIN-based recommendation task. By contrasting the first-order and high-order structures of the positive samples, MHGP can effectively capture the inherent structure information in a HIN and further benefit the recommendation task.

4.3 Ablation Study

We design two variants MHGP\(_f\) and MHGP\(_h\) to perform the ablation study. MHGP\(_f\) only considers the first-order neighbors while MHGP\(_h\) only considers the high-order neighbors. We compare them with MHGP and the results are given in Table 3. We can see that MHGP always achieves the best performance, indicating the necessity of jointly considering the two kinds of neighbors. Furthermore, all of them can improve the performance of LightGCN, which demonstrates the effectiveness of aggregating each kind of neighbor in pre-training. We also observe that MHGP\(_h\) performs better than MHGP\(_f\) on the Ciao and Douban Movie datasets. However, on the Last.FM dataset, the performance of MHGP\(_f\) is better. This is reasonable since the interactions are sparser and the side information is richer on Ciao and Douban Movie than on Last.FM.

5 Conclusion and Future Work

In this paper, we introduce a novel pre-training model MHGP to exploit the rich information in a HIN for enhancing cold-start recommendation. MHGP uses contrastive learning to force the embeddings of the first-order and high-order structures of positive nodes to be similar, and can thus learn better embeddings of users and items. Experiments show that MHGP outperforms other state-of-the-art pre-training GNN models. In future work, we will explore whether MHGP can benefit other recommendation scenarios such as sequential recommendation.