
1 Introduction

In recent years, recommender systems have been providing personalized item recommendations on online services. With the growth of users' social activities, offering group recommendations is poised to become a new and viable way to attract users and continuously boost user engagement.

Group recommendation aims to recommend items of common interest (such as dining restaurants, travel destinations, and gathering venues) to a group of members, typically taking group preferences as the guideline [9, 10, 13, 19]. Depending on how a group is established, group recommendation can be divided into two categories: persistent group recommendation (PGR) and ephemeral group recommendation (EGR). The former serves a group with fixed members and long-term, extensive group-item interactions. The latter serves a temporarily formed group without fixed members, which has little or no historical interaction with items, making it impossible to learn group preferences directly from such interactions. We focus on EGR in this paper.

We note that group recommendation, whether PGR or EGR, should adhere to the norm of seeking common ground, which facilitates the smooth progress of collective activities. Specifically, group recommendation models are supposed to treat the common preferences of all members as the group preferences and to seek the factors behind the consensus among group members. In particular, a model should treat the historical and current preferences of a group member equally, without weakening the role of the member’s historical preferences over time. This stems from real-life experience: if a member’s historical preferences coincide with the current preferences of the other members in a group, then even if the member’s current preferences have deviated from his/her historical ones, the member may still reach a compromise with the other members and accept an item that is consistent with his/her historical preferences. Moreover, a model is expected to treat the strong preferences of one member and the weak preferences of another equally, and should not be misled by the strong preferences exhibited by one or a few members.

Fig. 1. Case study. The icon denotes the type of restaurant. The sequence of icons in a rectangle denotes the interaction history of the person on the left side.

Take a real ephemeral group from the Yelp dataset as an example. As depicted in Fig. 1, the group consists of three persons. The person at the top has wide-ranging interests, with steak being an early preference. The person on the middle left is interested in steak and hamburgers, and the person on the middle right is interested in steak only. This group actually visits a steakhouse, a restaurant that is acceptable to all three of them. Unfortunately, existing EGR models fail to obtain the correct result. For the case in Fig. 1, GroupIM [14] and CubeRec [6] recommend a burger shop and a dessert shop, respectively; these two models appear to be misled by the strong personal preferences of two group members. \(\mathrm {S^2}\)-HHGR [20] and HyperGroup [8] recommend a noodle shop and a pizza shop, respectively; these two models seem to ignore the members' historical preferences and to be swayed by the user with wide-ranging preferences. In addition, two PGR models, i.e., AGREE [4] and ConsRec [16], also give incorrect results.

To realize “seeking common ground” in group recommendation and to address the inherent problem in EGR, namely that group-item interactions are extremely sparse or nonexistent, we propose a multi-hypergraph model named HL4EGR (Hypergraph Learning for Ephemeral Group Recommendation). The model employs hypergraphs to capture the relationships among users, items, and groups, and adopts a two-stage framework consisting of pre-training and fine-tuning. During pre-training, we use a hypergraph to model user-item interactions, where a hyperedge connects all the items that a user interacts with, thus treating historical and current interactions equally. The item embeddings obtained by pre-training are subsequently clustered to identify user preferences. At the fine-tuning stage, we construct three hypergraphs to model user-group affiliations and two types of group-group similarities, respectively. The two types of similarities are defined from two different perspectives: one explicit, from the perspective of the items interacted with by members, and one implicit, from the perspective of members' common preferences. Both emphasize the commonality of member behavior or preferences and weaken their intensity. Further, we maximize the agreement between contrastive views of groups via cross-hypergraph contrastive learning. Finally, we aggregate the group embeddings to generate group preferences and then perform prediction for groups. For the case in Fig. 1, our model recommends a steakhouse, which is in line with the ground truth. Our contributions are summarized as follows.

  • We construct four hypergraphs and learn the complex relationships among users, items, and groups through hypergraph convolutions. In particular, by means of hypergraphs, we weaken the timeliness and intensity of user behavior and preferences and capture users' common preferences effectively, thus satisfying the intrinsic requirement of group recommendation.

  • We highlight that identifying and leveraging similarities between groups provides a practicable way to cope with the absence of group-item interactions. Moreover, we take group self-discrimination as the self-supervised task, which offers auxiliary supervision signals via two views of a group w.r.t. explicit and implicit group-group similarities for reinforcing group representation learning.

  • We conduct extensive experiments on three public datasets. The experimental results show that HL4EGR consistently outperforms the state-of-the-art models, showing relative gains of 8.92%-15.93% on Recall@50 and 13.37%-18.88% on NDCG@50, respectively.

2 Related Work

Early group recommendation adopts collaborative filtering to obtain each member's scores on items and then aggregates these scores into group preferences via hand-crafted heuristic rules. Customary aggregation methods include least misery [1], average [3], and maximum satisfaction [2]. However, these predefined aggregation strategies lack the flexibility to achieve optimal performance in group recommendation. Subsequent work shifts towards how to effectively aggregate the preference representations of all group members into the group preference. For example, group recommendation models such as AGREE [4], SoAGREE [5] and MoSAN [15] propose different attention-based aggregation methods.

With the development of graph neural networks, the tripartite graph [12] has been employed to model the relationships among users, items, and groups and then learn group representations. Furthermore, hypergraphs have been found to be more suitable for modeling groups because a hyperedge can connect two or more nodes and thus represents a more general topological relationship. Some models [8, 10, 16, 20] apply hypergraphs to model groups and then employ Hypergraph Neural Networks (HNNs) [17] to generate group representations. For example, ConsRec [16] models users and items as nodes and groups as hyperedges, and learns group representations through HNNs. In addition, CubeRec [6] adaptively generates a hypercube representation for each group. However, these models overlook how user preferences actually function in group recommendation scenarios: they neither treat strong and weak preferences equally nor give equal weight to historical and current preferences.

Recently, research on group recommendation [6, 14, 20] has attempted to incorporate self-supervised learning to alleviate the data sparsity problem. For example, to enhance user and group representations, GroupIM [14] maximizes the mutual information between a group and its members. \(\mathrm {S^2}\)-HHGR [20] designs a double-scale node dropout strategy and performs node self-discrimination on different user representations. However, existing methods mainly look for self-supervision signals in user-group relationships without considering group-group relationships. Besides, some studies rely on additional information to improve the performance of group recommendation. For example, KGAG [7] introduces knowledge graphs into group recommendation, while SIGR [18] and HyperGroup [8] introduce social relationships among users to learn group preferences influenced by those relationships.

Compared to existing work, our model employs multiple hypergraphs to model different relationships among users, items, and groups from multiple perspectives, using prior knowledge about the role of user preferences in group recommendation as an inductive bias. Moreover, our model captures self-supervision signals from the similarities between groups and thereby learns more comprehensive group representations.

Fig. 2. Architecture of our HL4EGR.

3 Methodology

3.1 Model Overview

Let \(\mathcal {U}\), \(\mathcal {V}\) and \(\mathcal {G}\) denote the user set, item set, and ephemeral group set, respectively. An ephemeral group \(g_k \in \mathcal {G}\) consists of \(|g_k|\) users, i.e., \(g_k={\{u^{g_k}_i\}}^{|g_k|}_{i=1}\), where \(u^{g_k}_i \in \mathcal {U}\). There are two types of observed interactions among users, items, and ephemeral groups, i.e., user-item interactions denoted as \(\textbf{X} \in \mathbb {R}^{|\mathcal {U}|\times |\mathcal {V}|}\), and group-item interactions denoted as \(\textbf{Y} \in \mathbb {R}^{|\mathcal {G}|\times |\mathcal {V}|}\), where the element \(x_{ij}\) of \(\textbf{X}\) is equal to 1 if the user \(u_i\) has historical interactions with the item \(v_j\) (otherwise \(x_{ij}=0\)), and the element \(y_{kj}\) of \(\textbf{Y}\) is equal to 1 if the group \(g_k\) has historical interactions with the item \(v_j\) (otherwise \(y_{kj}=0\)).

Given an ephemeral group \(g_k\), our task is to predict the item that the group \(g_k\) is most likely to be satisfied with.

For this task, we propose a multi-hypergraph model HL4EGR, whose architecture is shown in Fig. 2. We build four hypergraphs to model the user-item interactions, the user-group affiliations, and the explicit and implicit group-group similarities.

As shown in Fig. 2, the training of HL4EGR is divided into two stages, i.e., pre-training and fine-tuning.

In the first stage, we construct the user-item hypergraph \(H^{UV}\) and perform the convolution operation on \(H^{UV}\), thus obtaining the user embeddings \(\textbf{U}\) and item embeddings \(\textbf{V}\). \(\textbf{U}\) is utilized for initializing the group embeddings used in the second stage and \(\textbf{V}\) is applied to characterize the user preferences by clustering.

In the second stage of training, i.e., fine-tuning, except for constructing the user-group hypergraph \(H^{UG}\), we also construct two group-group hypergraphs \(H^{V}\) and \(H^{P}\), portraying explicit and implicit similarities between groups, respectively. Then we perform the hypergraph convolution operations to obtain group embeddings. Furthermore, we adopt a cross-hypergraph contrastive learning strategy to align embeddings of the same group from both explicit and implicit perspectives, thus obtaining more comprehensive group preferences that are devoted to group recommendation.

3.2 Hypergraph Construction

User-Item Hypergraph. We define the user-item hypergraph as \(H^{UV} = (\mathcal {V}, \mathcal {E}^{UV})\), where a node of \(H^{UV}\) is an item in \(\mathcal {V}\), a hyperedge \(e_i^{UV} \in \mathcal {E}^{UV} \), \(i \in [1,|\mathcal {U}|]\) connects all the items that user \(u_i\) interacts with, and \(|\mathcal {E}^{UV}| = |\mathcal {U}|\). As shown in Fig. 2, user \(u_1\) has historical interactions with item \(v_1\) and item \(v_2\), thus we connect {\(v_1, v_2\)} with a hyperedge. Such hyperedges eliminate temporal differentiations of interactions, treating historical interactions and current interactions equally.
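Below is a minimal sketch of how the user-item hypergraph can be represented as an incidence matrix; the function name and the toy interaction matrix are hypothetical, and the sketch assumes a binary interaction matrix \(\textbf{X}\) and unweighted hyperedges.

```python
import torch

def build_user_item_hypergraph(X: torch.Tensor) -> torch.Tensor:
    """Build the incidence matrix of H^{UV} from user-item interactions X.

    X: |U| x |V| binary interaction matrix.
    Returns H: |V| x |U| incidence matrix whose i-th column is the hyperedge of
    user u_i, connecting every item u_i has interacted with, regardless of when
    the interaction happened (no temporal differentiation).
    """
    # Transposing X makes items the nodes (rows) and users the hyperedges (columns).
    return (X > 0).float().t()

# Toy example: u_1 interacted with {v_1, v_2}, u_2 with {v_2, v_3}.
X = torch.tensor([[1., 1., 0.],
                  [0., 1., 1.]])
H_UV = build_user_item_hypergraph(X)   # shape: (3 items, 2 hyperedges)
```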

User-Group Hypergraph. We define the user-group hypergraph as \(H^{UG} = (\mathcal {U} , \mathcal {E}^{UG})\), where a node of \(H^{UG}\) is a user in \(\mathcal {U}\), a hyperedge \(e_k^{UG} \in \mathcal {E}^{UG}\), \(k \in [1, |\mathcal {G}|]\) connects all the users in group \(g_k\), and \(|\mathcal {E}^{UG}| = |\mathcal {G}|\). As shown in Fig. 2, we connect the group members \(\left\{ u_1, u_2\right\} \) of the group \(g_1\) with a hyperedge, which reflects the user-group affiliation.

Group-Group Hypergraphs. For alleviating the data sparsity issue, we construct two group-group hypergraphs, i.e., \(H^{V} = ( \mathcal {G}, \mathcal {E}^{V})\) and \(H^{P} = (\mathcal {G}, \mathcal {E}^{P})\).

In hypergraph \(H^{V}\), \(\mathcal {G}\) is taken as the node set, and a hyperedge \(e^V_k \in \mathcal {E}^{V}\), \(k \in [1, |\mathcal {G}|]\) connects group \(g_k\) with every group in which some member has interacted with the same item as a member of \(g_k\). In other words, hypergraph \(H^{V}\) encodes the explicit similarities between groups.

Complementary to \(H^{V}\), hypergraph \(H^{P}\) captures the implicit similarities between groups, i.e., the preference similarities between groups. The group preference is essentially a collection of member preferences. Specifically, to reduce the interference of noisy behavior, we derive user preferences from clusters over the items that users have interacted with, rather than reading a user's preferences directly off his/her raw behavior. We perform K-means clustering on the item embeddings \(\textbf{V}\) obtained by the pre-training on the hypergraph \(H^{UV}\) and generate c clustering centers. Next, for each user and each item that the user has interacted with, if the distance between the item embedding and the center of the cluster that the item belongs to is less than \(\mu \), this center is regarded as a preference of this user. Subsequently, the preferences of the group members are merged to form the set of group preferences. Then, we build a hyperedge \(e^P_k \in \mathcal {E}^{P}\), \(k \in [1, |\mathcal {G}|]\) to connect the groups that have common preferences with group \(g_k\).
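The following sketch illustrates this preference-derivation step; the helper names, the use of scikit-learn's KMeans, and the data layout (NumPy item embeddings plus a dict mapping each user to interacted item indices) are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def user_preferences(V, user_items, c=64, mu=0.2):
    """Derive each user's preference set from pre-trained item embeddings V.

    V: (|V|, d) NumPy array of item embeddings; user_items: dict user_id -> list
    of item indices. A cluster center counts as a user's preference if one of the
    user's items lies within distance mu of that center.
    """
    km = KMeans(n_clusters=c, n_init=10).fit(V)
    prefs = {}
    for u, items in user_items.items():
        labels = km.labels_[items]
        dists = np.linalg.norm(V[items] - km.cluster_centers_[labels], axis=1)
        prefs[u] = set(labels[dists < mu].tolist())
    return prefs

def implicit_hyperedges(groups, prefs):
    """Hyperedge e^P_k connects the groups sharing at least one preference with g_k."""
    # Group preference = union of member preferences; preference intensity is ignored.
    gp = [set().union(*(prefs[u] for u in members)) for members in groups]
    return [[k2 for k2, p2 in enumerate(gp) if p2 & gp[k]] for k in range(len(groups))]
```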

When building the group-group hypergraphs, we treat all hyperedges equally (i.e., all hyperedges carry the same weight), thus flattening the intensity of a user's individual behavior and preferences. This enables HL4EGR to learn the common preferences of users within a group more fairly, reducing the impact of the intensity of a user's personal behavior and preferences on the group-group similarity.

3.3 Hypergraph Convolution

In HL4EGR, we design a HyperGraph Convolutional Network (HGCN) to learn representations of nodes and hyperedges in a hypergraph. Without loss of generality, we formalize four hypergraphs uniformly as \(H=(\mathcal {N},\mathcal {E})\), where \(\mathcal {N}\) denotes the node set and \(\mathcal {E}\) denotes the hyperedge set. The learning process of the l-th layer of HGCN is as follows.

Firstly, we aggregate representations of all nodes connected by hyperedge \(e_k\) as follows.

$$\begin{aligned} \textbf{m}_k^{(l)} = \text {AGG}(\{\textbf{n}_i^{(l-1)} \mid n_i \in e_k\}) \end{aligned}$$
(1)

where \(e_k \in \mathcal {E}\) denotes the k-th hyperedge, \(\textbf{n}_i^{(0)}\) is the initial embedding of node \(n_i \in \mathcal {N}\), \(\textbf{n}_i^{(l-1)}\) is the embedding of the node \(n_i\) in the \((l-1)\)-th layer, AGG(\(\cdot \)) denotes an aggregation function, realized as an average pooling function.

Then, we concatenate node aggregation representation \(\textbf{m}_k^{(l)}\) and hyperedge representation \(\textbf{e}_k^{(l-1)}\) to update the hyperedge representation as follows.

$$\begin{aligned} \textbf{e}_k^{(l)} = \text {CONCAT}(\textbf{m}_k^{(l)}, \textbf{e}_k^{(l-1)}) \textbf{W}^{H} \end{aligned}$$
(2)

where \(\textbf{e}_k^{(0)}\) is the initial embedding of hyperedge \(e_k\), \(\textbf{e}_k^{(l)}\) denotes the embedding of the hyperedge \(e_k\) in the l-th layer. \(\textbf{W}^{H} \in \mathbb {R}^{2d \times d}\) is a learnable matrix.

Moreover, node representations can be updated as follows.

$$\begin{aligned} \textbf{n}_i^{(l)} = \text {AGG}(\{\textbf{e}_k^{(l)} \mid e_k \in \mathcal {E}_i\}) \end{aligned}$$
(3)

where \(\mathcal {E}_i\) represents the set of hyperedges connected to the node \(n_i\).

Finally, we can obtain the embedding \(\textbf{n}_i\) of the node \(n_i\), and the embedding \(\textbf{e}_k\) of the hyperedge \(e_k\) as follows.

$$\begin{aligned} \textbf{n}_i = \sum _{l=1}^{L} \textbf{n}_i^{(l)},~~~~ \textbf{e}_k= \sum _{l=1}^{L} \textbf{e}_k^{(l)} \end{aligned}$$
(4)

where L is the number of convolutional layers.
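As a concrete illustration of Eqs. (1)-(4), the following PyTorch sketch implements one HGCN with average pooling as AGG(\(\cdot \)) and a dense incidence matrix for brevity; the class and argument names are hypothetical, and a sparse incidence matrix would be preferable in practice.

```python
import torch
import torch.nn as nn

class HGCN(nn.Module):
    """Sketch of the hypergraph convolution in Eqs. (1)-(4)."""
    def __init__(self, d, num_layers=2):
        super().__init__()
        # One W^H in R^{2d x d} per layer (Eq. 2).
        self.W = nn.ModuleList([nn.Linear(2 * d, d, bias=False) for _ in range(num_layers)])

    def forward(self, H, node_emb, edge_emb):
        # H: (|N|, |E|) incidence matrix; node_emb: (|N|, d); edge_emb: (|E|, d).
        edge_deg = H.sum(dim=0).clamp(min=1).unsqueeze(1)  # nodes per hyperedge
        node_deg = H.sum(dim=1).clamp(min=1).unsqueeze(1)  # hyperedges per node
        node_out, edge_out = 0, 0
        for W in self.W:
            m = (H.t() @ node_emb) / edge_deg              # Eq. (1): average of connected nodes
            edge_emb = W(torch.cat([m, edge_emb], dim=1))  # Eq. (2): update hyperedge embeddings
            node_emb = (H @ edge_emb) / node_deg           # Eq. (3): average of incident hyperedges
            node_out = node_out + node_emb                 # Eq. (4): sum the layer outputs
            edge_out = edge_out + edge_emb
        return node_out, edge_out
```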

During pre-training, we first randomly initialize the representations of nodes and hyperedges in \(H^{UV}\) and feed them into an HGCN. Then, we iterate and optimize the HGCN by the cross entropy loss (\(L_U\) in Fig. 2). After pre-training, we obtain user embeddings \(\textbf{U}\in \mathbb {R}^{|\mathcal {U}|\times d}\) and item embeddings \(\textbf{V}\in \mathbb {R}^{|\mathcal {V}|\times d}\).

During fine-tuning, we first aggregate user embeddings \(\textbf{U}\) to generate group embeddings \(\textbf{G}^U\in \mathbb {R}^{|\mathcal {G}|\times d}\). In detail, taking the group \(g_k\) as an example, we leverage an attention mechanism to aggregate embeddings of users in the group \(g_k\), thereby obtaining the initial group representation \(\textbf{g}_k^{U} = \textbf{G}^U(k,:)\). This process can be formalized as follows.

$$\begin{aligned} \textbf{g}_k^{U} = \sum _{u_i \in g_k} {\alpha }_{i} \textbf{u}_i \end{aligned}$$
(5)
$$\begin{aligned} {\alpha }_{i} = \frac{\exp (tanh(\textbf{u}_i \textbf{W}^{AGG} + b))}{\sum _{u_{i^{\prime }} \in {g_k}} \exp (tanh(\textbf{u}_{i^{\prime }} \textbf{W}^{AGG} + b))} \end{aligned}$$
(6)

where \( \textbf{u}_i = \textbf{U}(i,:)\) denotes the embedding of the user \(u_i\) obtained by pre-training, \({\alpha }_{i}\) is the attention weight w.r.t. the user \(u_i\). \(\textbf{W}^{AGG} \in \mathbb {R}^{d}\) is a learnable vector and b is a bias.
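A possible PyTorch implementation of this attentive aggregation (Eqs. (5)-(6)) is sketched below; the module name is hypothetical, and the input is assumed to be the stacked pre-trained embeddings of one group's members.

```python
import torch
import torch.nn as nn

class MemberAttention(nn.Module):
    """Sketch of the attention-based member aggregation in Eqs. (5)-(6)."""
    def __init__(self, d):
        super().__init__()
        self.proj = nn.Linear(d, 1)   # W^{AGG} in R^d together with the bias b

    def forward(self, member_emb):
        # member_emb: (|g_k|, d) pre-trained embeddings of users in group g_k.
        scores = torch.tanh(self.proj(member_emb))   # tanh(u_i W^{AGG} + b)
        alpha = torch.softmax(scores, dim=0)         # attention weights over members, Eq. (6)
        return (alpha * member_emb).sum(dim=0)       # initial group embedding g_k^U, Eq. (5)
```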

Next, we use \(\textbf{G}^U\) to initialize hyperedges of \(H^{UG}\), \(H^{V}\) and \(H^{P}\), and nodes of \(H^{V}\) and \(H^{P}\), and use user embeddings \(\textbf{U}\) obtained by the pre-training to initialize node representations on the hypergraph \(H^{UG}\), and then feed them into corresponding HGCNs.

Finally, by performing the calculations over these three HGCNs, we obtain representations of hyperedges in three hypergraphs, denoted as \(\textbf{G}^{UG}\), \(\textbf{G}^{V}\), and \(\textbf{G}^{P}\), respectively. Given group \(g_k\), its embeddings from three hypergraphs are \(\textbf{g}_k^{UG} = \textbf{G}^{UG}(k,:)\), \(\textbf{g}_k^{V} = \textbf{G}^V(k,:)\), and \(\textbf{g}_k^{P} = \textbf{G}^P(k,:)\), respectively.

3.4 Cross-Hypergraph Contrastive Learning

To learn more comprehensive group preferences, we design a contrastive learning strategy on two group-group hypergraphs, i.e., the hypergraph \(H^{V}\) reflecting explicit similarities and the hypergraph \(H^{P}\) implying implicit similarities, aligning two embeddings of the same group in \(H^{V}\) and \(H^{P}\). Concretely, we regard the representations w.r.t. the same group in two hypergraphs \(H^{V}\) and \(H^{P}\) as positive sample pairs. The representations w.r.t. different groups in the same batch in two hypergraphs \(H^{V}\) and \(H^{P}\) are considered as negative sample pairs. We take InfoNCE loss as the contrastive learning loss as follows.

$$\begin{aligned} L_{CL} =-\sum _{g_k \in \mathcal {G}} \log {\frac{\exp (sim(\textbf{g}_{k}^{V},\textbf{g}_{k}^{P})/{\tau })}{{\exp (sim(\textbf{g}_{k}^{V},\textbf{g}_{k}^{P})/{\tau })}+N_V+N_P}} \end{aligned}$$
(7)
$$\begin{aligned} N_V = \sum _{g_{k^{\prime }}\in \mathcal {G}_k^-}{\exp (sim(\textbf{g}_{k^{\prime }}^{V},\textbf{g}_{k}^{P})/{\tau })},~~~N_P = \sum _{g_{k^{\prime }}\in \mathcal {G}_k^-}{\exp (sim(\textbf{g}_{k}^{V},\textbf{g}_{k^{\prime }}^{P})/{\tau })} \end{aligned}$$
(8)

where \(\textbf{g}_{k}^{V}\) and \(\textbf{g}_{k}^{P}\) form a pair of positive samples, corresponding to the representations of the group \(g_k\) in the hypergraph \(H^V\) and \(H^P\), respectively. \(\mathcal {G}_k^-\) is the set of negative samples w.r.t. the group \(g_k\), which is composed of other groups (i.e., \(k^{\prime } \ne k\)) within the same batch. \(sim(\cdot )\) function is adopted for calculating the similarity of a pair of vectors, which refers to the cosine similarity in this paper. \(\tau \) is the temperature parameter.
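The loss in Eqs. (7)-(8) can be computed for a mini-batch as in the sketch below; the function name is hypothetical, cosine similarity is realized by normalizing the embeddings before the dot product, and the loss is averaged (rather than summed) over the groups in the batch.

```python
import torch
import torch.nn.functional as F

def cross_hypergraph_infonce(g_v, g_p, tau=1.0):
    """Sketch of the cross-hypergraph InfoNCE loss in Eqs. (7)-(8).

    g_v, g_p: (B, d) embeddings of the same B groups from H^V and H^P; row k of
    g_v and row k of g_p are a positive pair, other in-batch rows are negatives.
    """
    g_v = F.normalize(g_v, dim=1)
    g_p = F.normalize(g_p, dim=1)
    sim = (g_v @ g_p.t()) / tau               # sim[i, j] = cos(g_i^V, g_j^P) / tau
    pos = torch.diag(sim)                     # positive pairs sim(g_k^V, g_k^P)
    mask = 1.0 - torch.eye(sim.size(0), device=sim.device)
    neg = sim.exp() * mask                    # off-diagonal (negative) terms
    denom = pos.exp() + neg.sum(dim=1) + neg.sum(dim=0)   # exp(pos) + N_P + N_V
    return -(pos - denom.log()).mean()
```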

3.5 Model Optimization

During pre-training, we predict the interaction probabilities \(\mathbf {\hat{x}_i} \in R^{|\mathcal {V}|}\) of user \(u_i\) on the item set \(\mathcal {V}\) as follows.

$$\begin{aligned} \mathbf {\hat{x}_i} = softmax(\textbf{u}_i \textbf{W}^{UV}) \end{aligned}$$
(9)

where \(\textbf{u}_i = \textbf{U}(i,:)\) is obtained from hypergraph \(H^{UV}\), and \(\textbf{W}^{UV} \in \mathbb {R}^{d \times |\mathcal {V}|}\) is a learnable matrix.

Then we calculate the cross entropy loss \(L_U\) as follows.

$$\begin{aligned} L_U = -\frac{1}{|\mathcal {U}|}\sum _{i=1}^{|\mathcal {U}|} \sum _{j=1}^{|\mathcal {V}|} x_{ij} \log \hat{x}_{ij} \end{aligned}$$
(10)

where \(\hat{x}_{ij}\) refers to the interaction probability of the user \(u_i\) w.r.t. the item \(v_j\). \(x_{ij}\) is the ground truth of user-item interaction.

During fine-tuning, given group representations from different hypergraphs, we adopt an adaptive aggregation strategy to fuse different group embeddings, i.e., \(\textbf{g}_k^{U}\) obtained from Eq. 5, \(\textbf{g}_k^{UG}\) in the hypergraph \(H^{UG}\), and \(\textbf{g}_k^{V}\) in the hypergraph \(H^{V}\), to generate the group preference \(\textbf{g}_k\) for the group \(g_k\) as follows.

$$\begin{aligned} \textbf{g}_k = \alpha \textbf{g}_k^{U} + \beta \textbf{g}_k^{UG} + \gamma \textbf{g}_k^{V} \end{aligned}$$
(11)

where \(\alpha = \sigma (\textbf{g}_k^{U}\textbf{W}^{U})\), \(\beta = \sigma (\textbf{g}_k^{UG}\textbf{W}^{UG})\), and \(\gamma = \sigma (\textbf{g}_k^{V}\textbf{W}^{V})\). \(\textbf{W}^{U}\), \(\textbf{W}^{UG}\), and \(\textbf{W}^{V} \in \mathbb {R}^d\) are learnable vectors. \(\sigma \) is the sigmoid activation function.

We predict the interaction probabilities \(\mathbf {\hat{y}}_k \in \mathbb {R}^{|\mathcal {V}|}\) of the group \(g_k\) on the item set \(\mathcal {V}\) as follows.

$$\begin{aligned} \mathbf {\hat{y}}_k = softmax(\textbf{g}_k \textbf{W}^{GV}) \end{aligned}$$
(12)

where \(\textbf{W}^{GV} \in \mathbb {R}^{d \times |\mathcal {V}|}\) is a learnable matrix.

Then, we adopt the cross entropy loss as the main loss, calculated as follows.

$$\begin{aligned} L_G = -\frac{1}{|\mathcal {G}|}\sum _{k=1}^{|\mathcal {G}|} \sum _{j=1}^{|\mathcal {V}|} y_{kj} \log \hat{y}_{kj} \end{aligned}$$
(13)

where \(\hat{y}_{kj}\) refers to the interaction probability of the group \(g_k\) w.r.t. the item \(v_j\). \(y_{kj}\) is the ground truth.

We adopt a multi-task strategy to jointly optimize the main group recommendation task and the auxiliary contrastive learning task as follows.

$$\begin{aligned} L = L_G + \lambda L_{CL} \end{aligned}$$
(14)

where \(\lambda \) is a hyperparameter.
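The sketch below ties Eqs. (11)-(14) together for a mini-batch of groups; the module name, the multi-hot ground-truth layout, and the use of a mean rather than a sum over groups are assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupPredictor(nn.Module):
    """Sketch of adaptive fusion (Eq. 11), prediction (Eq. 12) and the joint loss (Eqs. 13-14)."""
    def __init__(self, d, num_items, lam=0.3):
        super().__init__()
        self.w_u = nn.Linear(d, 1, bias=False)            # W^U
        self.w_ug = nn.Linear(d, 1, bias=False)           # W^{UG}
        self.w_v = nn.Linear(d, 1, bias=False)            # W^V
        self.pred = nn.Linear(d, num_items, bias=False)   # W^{GV}
        self.lam = lam                                    # lambda in Eq. (14)

    def forward(self, g_u, g_ug, g_v):
        # Adaptive gates and fused group preference (Eq. 11).
        g = (torch.sigmoid(self.w_u(g_u)) * g_u
             + torch.sigmoid(self.w_ug(g_ug)) * g_ug
             + torch.sigmoid(self.w_v(g_v)) * g_v)
        return F.log_softmax(self.pred(g), dim=1)          # log of Eq. (12)

    def loss(self, log_probs, y, l_cl):
        # Cross entropy over observed group-item interactions (Eq. 13)
        # plus the weighted contrastive term (Eq. 14).
        l_g = -(y * log_probs).sum(dim=1).mean()
        return l_g + self.lam * l_cl
```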

3.6 Complexity Analysis

Space Complexity. In HL4EGR, the learnable parameters are mainly from embeddings of users, items, and groups. In addition, as for hypergraph convolutions, since we have four hypergraphs in two stages, each with L layers, the number of parameters is \(4L\times 2d^2\). The number of parameters for two prediction layers in two stages is \(2|\mathcal {V}|d\). Thus, the space complexity of HL4EGR is \(O(L{d}^2 + |\mathcal {U}|d + |\mathcal {V}|d + |\mathcal {G}|d)\).

Time Complexity. The computation amount of HL4EGR is mainly concentrated on the hypergraph convolutions. Let |H| be the number of nonzero elements in the adjacency matrix of hypergraph H. The time complexity of each hypergraph convolution computation is \(O(L\times (2|H|d + 2|\mathcal {E}|d^2))\), where \(|\mathcal {E}|\) is the number of hyperedges. For hypergraphs \(H^{UV}\), \(H^{UG}\), \(H^{V}\), and \(H^{P}\), the numbers of hyperedges are \(|\mathcal {U}|\), \(|\mathcal {G}|\), \(|\mathcal {G}|\), and \(|\mathcal {G}|\), respectively. The total time complexity of HL4EGR is \(O(Ld^2|\mathcal {G}| + Ld^2|\mathcal {U}| + Ld(|H^{UV}|+|H^{UG}|+|H^{V}|+|H^{P}|))\).

Table 1. Statistics of datasets.

4 Experiments

4.1 Experimental Settings

Datasets. We conduct experiments on three public datasets.

  • Weeplaces. It records users’ check-ins in location-based social networks. We extract check-ins from points of interest (POIs) in all major cities in the U.S. We follow the same operations as in GroupIM [14] for constructing user-POI interactions and group-POI interactions.

  • Yelp. It records users’ check-ins in local businesses (e.g., restaurants). We use the dataset published in [18], which includes users’ check-ins on businesses located in Los Angeles, as well as groups’ check-in information.

  • Douban. It is also published in [18], recording the information of users organizing and participating in social activities. We filter out users and items with fewer than 10 interactions.

Table 1 lists the statistics of the three datasets. As shown in Table 1, the average number of group-item interactions is less than 3, which indicates that our experiments are conducted on ephemeral groups. We randomly split all the groups of each dataset into training, validation, and test sets with a ratio of 7:1:2, ensuring that each group appears in only one of the three sets.
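A minimal sketch of such a group-level split is given below; the helper name and the group-id representation are assumptions.

```python
import numpy as np

def split_groups(group_ids, seed=0):
    """Randomly split groups into train/val/test with a 7:1:2 ratio.

    Each group, together with all of its interactions, falls into exactly one split.
    """
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.asarray(group_ids))
    n_train, n_val = int(0.7 * len(ids)), int(0.1 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```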

Baselines. We compare HL4EGR to the following baselines:

  • Two PGR models: AGREE, which is a classical PGR model using an attention mechanism for member aggregation [4]. ConsRec, the state-of-the-art model for PGR, which proposes an HNN to learn member-level aggregation and captures the group consensus on three views [16].

  • Four EGR models: GroupIM, which maximizes user-group mutual information for group recommendation [14]. HyperGroup, which models groups as hyperedges to learn group representations [8]. \(\mathbf {S^2}\)-HHGR, which uses a hierarchical hypergraph and a node dropout strategy on the hypergraph to learn group preferences [20]. CubeRec, the state-of-the-art model for EGR, which utilizes the geometric expressiveness of hypercubes and hypercube intersection-based self-supervision to obtain the group representations [6].

Table 2. Overall performance. The values in bold and underlined are the best and second best results in each row.

Implementation Details. We implement our model in PyTorch. In our model, the number of hypergraph convolutional layers L is set to 2 and the temperature \(\tau \) is set to 1. We tune the weight of the contrastive learning loss \(\lambda \), the number of clustering centers c, and the distance threshold \(\mu \) for every dataset, finally setting \(\lambda \) to 0.3 and \(\mu \) to 0.2 for all datasets, and c to 64 for Weeplaces and 128 for Yelp and Douban. We optimize the model via the Adam optimizer with a learning rate of 0.001. The implementation code has been released. For the sake of fairness, we set the embedding size d to 64 and the batch size to 256 in all the experiments. For all baselines, the hyperparameters are set to the values corresponding to the best performance reported in their respective papers. Experiments are conducted on an NVIDIA RTX 3090 GPU with 24 GB memory.

Metrics. To evaluate the performance of recommending items to groups, we adopt two metrics, i.e., Recall@K and NDCG@K (R@K and N@K for short), where Recall measures whether the group actually chooses the recommended item, NDCG additionally accounts for the ranking position of the recommended items, and K is set to either 20 or 50.
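For a single group with one ground-truth item, these metrics can be computed as in the sketch below (the single-target assumption is made here for illustration; in that case Recall@K reduces to a hit indicator).

```python
import numpy as np

def recall_ndcg_at_k(scores, target, k=50):
    """Recall@K and NDCG@K for one group with a single ground-truth item.

    scores: (|V|,) predicted scores over all items; target: index of the item
    the group actually interacted with.
    """
    top_k = np.argsort(-scores)[:k]          # indices of the K highest-scored items
    hit_pos = np.where(top_k == target)[0]
    recall = float(hit_pos.size > 0)
    ndcg = 1.0 / np.log2(hit_pos[0] + 2) if hit_pos.size > 0 else 0.0
    return recall, ndcg
```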

Fig. 3. Group recommendation performance on groups of different sizes.

4.2 Performance Comparison

Overall Performance. Table 2 lists the experimental results of our proposed model and compared models on the three datasets. From Table 2, we have the following observations.

  • The PGR models are far inferior to the EGR models in all metrics. This is because PGR models depend on group-item interactions to learn group preferences; however, these interactions become extremely sparse or nonexistent in the context of ephemeral groups, ultimately leading to a decline in performance.

  • Hypergraph-based models, i.e., ConsRec, HyperGroup, \(\mathrm {S^2}\)-HHGR, and HL4EGR outperform the traditional attention-based model, i.e., AGREE, which demonstrates that the hypergraph structure excels in modeling user-group affiliations.

  • Three EGR models equipped with self-supervised learning, i.e., GroupIM, CubeRec, and HL4EGR, outperform other models. This might be attributed to the fact that these EGR models can discover and utilize additional supervision signals, thus improving the quality of group embeddings. This shows the advantages of self-supervised learning in EGR.

  • Our HL4EGR outperforms all baselines on three datasets. Taking Recall@20 as an example, compared to the best baseline on each of the three datasets, HL4EGR shows improvements of 14.56% - 23.48%, averaging at 17.87%.

Performance on Groups of Different Sizes. We split the test set into five subsets by the range of the number of group members, i.e., 2-3, 4-5, 6-7, 8-9, and >=10 members. We choose GroupIM and CubeRec for comparison because they are the top-2 best baselines, and we conduct experiments on Weeplaces and Yelp datasets.

As shown in Fig. 3, HL4EGR outperforms GroupIM and CubeRec in almost all cases, except on groups of 10 or more members in Weeplaces, where all three models have the same Recall value (reaching the maximum value of 1). In particular, HL4EGR outperforms the other two models for groups of 2-3 members, indicating that HL4EGR is well suited to real-life group recommendation, where group sizes follow a long-tail distribution. Meanwhile, compared to the other two models, HL4EGR gains more significantly for groups of 10 or more members in Yelp. The reason might be that HL4EGR corrects group representations by treating the behavior and preferences of all members indiscriminately in terms of timeliness and intensity, and thus captures the common preferences of groups more accurately as the number of group members increases.

Table 3. Ablation study.

4.3 Ablation Study

Effect of Multiple Hypergraphs. We design four variants to observe the effect of the four hypergraphs in HL4EGR on performance. Variant A removes the HGCN on \(H^{UV}\) but still performs pre-training, taking randomly initialized user and item embeddings as input and the cross entropy loss \(L_U\) as the optimization goal. Variant B removes the HGCN on \(H^P\), which triggers a cascading removal of the contrastive learning module, since \(H^P\) is treated as a source of self-supervision signals. Variant C removes the HGCN on \(H^V\), which leads to the removal of the contrastive learning module as well as the loss of one source of the group preference. Variant D eliminates the HGCN on \(H^{UG}\).

The experimental results on Weeplaces and Yelp are listed in Table 3(a). Compared to HL4EGR, all variants show different degrees of performance degradation, illustrating that each hypergraph is effective. Variant A shows the most notable degradation, indicating that hypergraph \(H^{UV}\) is the underpinning of the whole model. The direct reason is that the user and item embeddings derived from \(H^{UV}\) are subsequently used for the construction and learning of the other hypergraphs, which has a great positive impact on performance. The performance decrease of variant B shows the effect of alleviating data sparsity and adjusting group embeddings via contrastive learning. Variant C declines more significantly than variant D, which shows that group-group relationships play a more important role than the inherent memberships of groups in group recommendation.

Effect of Pre-Training. To observe the impact of pre-training, we construct two extra variants of HL4EGR, namely variants E and F. Variant E removes the pre-training, collapsing into a scaled-down version that contains only \(H^{UG}\) and \(H^V\) and takes randomly initialized user embeddings as the input of fine-tuning. Variant F substitutes SASRec [11] for the HGCN on \(H^{UV}\).

The performance of variants E and F is shown in the top half of Table 3(b). Variant E without pre-training shows the worst performance, indicating that the pre-training is indispensable.

Fig. 4. Sensitivity analysis of hyperparameters \(\lambda \), c and \(\mu \) on the Weeplaces dataset.

Variant F was originally anticipated to perform well because SASRec adopts a self-attention mechanism that learns both long-term and short-term dependencies and produces high-quality user and item embeddings. However, the experimental results show that variant F does not surpass the original HL4EGR. This observation reveals that the self-attention mechanism, which excels at capturing temporal dependencies embedded in sequences, does little to help elicit group preferences. Presumably, this is because group preferences are time-insensitive.

Effect of Hyperedge Weights. As mentioned in Sect. 3.2, when constructing the group-group similarity hypergraphs in HL4EGR, all hyperedges are assigned the same weight, aiming to weaken the effect of the intensity of individual member behavior and preferences. To observe how weight values affect performance, we modify \(H^V\) and \(H^P\) by setting the weights to the number of items interacted with by members of both groups and to the number of common preferences of the two groups, respectively, and construct three variants of HL4EGR. Variant G introduces weights on both \(H^V\) and \(H^P\). Variant H introduces weights only on \(H^V\), while variant I does so only on \(H^P\).

The experimental results are shown in the bottom half of Table 3(b). It can be seen that variants G, H and I have lower performance than HL4EGR. In particular, variant G has a significant performance degradation, which shows that simultaneously emphasizing the intensities of member behavior and preferences has a significant negative effect on group recommendation.

4.4 Hyperparameter Sensitivity Analysis

Impact of Contrastive Loss Weight \(\lambda \). Figure 4(a) shows the results on the Weeplaces dataset. The results indicate that a contrastive learning loss with an appropriate weight effectively regularizes the group representations.

Impact of Number of Clustering Centers c. Figure 4(b) shows the results. HL4EGR achieves best performance on Weeplaces when \(c=64\). From Fig. 4(b), we believe that when c is very small, the model is unable to distinguish a user’s different preferences, leading to false similarity when modeling group-group similarity. When c is too large, the model tends to identify the same preference of a user as multiple preferences, which fails to weaken the intensity of user’s individual preference.

Impact of Distance Threshold \(\mu \). Figure 4(c) shows the results. HL4EGR achieves best performance when \(\mu =0.2\). We think that when \(\mu \) is very small, items that indicate user preferences are filtered out; when \(\mu \) becomes large, more items including noisy items are retained. Both cause performance reduction.

5 Conclusion

Ephemeral group recommendation is a challenging task, not only because group-item interactions are insufficient for learning group preferences directly, but also because there are essential differences between group recommendation and personalized recommendation. This paper proposes HL4EGR, a model that encodes user-item interactions, user-group affiliations, and group-group similarities into multiple hypergraphs, reflecting the essence of group recommendation. Meanwhile, HL4EGR designs a contrastive learning strategy on the hypergraphs, which enables it to learn more comprehensive group preferences. Experimental results on public datasets show that HL4EGR substantially improves the accuracy of ephemeral group recommendation.